The study of the relative contribution of genes and environment to the risk of common diseases presents a number of statistical challenges, from study design to analysis. My research focus is statistical methodology in genetic epidemiology, including family-based and population-based case-control studies.
My current projects include methods to measure association between haplotypes of multiple tightly linked markers and disease in matched case-control studies, and to detect gene × gene and gene × environment interactions. I am also interested in using joint variation in DNA sequence and gene expression to better understand disease etiology.
I collaborate with colleagues in the Department of Epidemiology and the Channing Laboratory on a number of large-scale cohort studies, such as the Nurses' Health Study, as well as the international Cohort Consortium for Breast and Prostate Cancer.
"GECOR is a Windows program written in Java (i.e., a Java desktop GUI wrapper for an R function) for calculating sample sizes in matched case-control studies examining genetic and environmental factors and/or gene-environment interaction. It allows sample size calculations for the main effects of gene and/or environment, as well as for gene-environment interaction; the main effects of gene and/or environment may also be calculated without gene-environment interaction. Environmental effects may be modelled as either dichotomous or categorical. Genetic models are restricted to additive, dominant, or recessive modes of inheritance. Users may specify multiple controls per case. Additionally, the sample size or power calculations can accommodate correlation between case and control environmental exposure levels."
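The following sketch gives a feel for what such a calculation involves. It uses the textbook sample-size formula for the simplest case in GECOR's scope: a 1:1 matched case-control study with a dichotomous exposure, a main effect only, and exposures assumed independent within pairs. It is not GECOR's algorithm, and the function name `matched_pairs_needed` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def matched_pairs_needed(odds_ratio: float, p_exposed_controls: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate number of 1:1 matched pairs needed to detect an exposure odds ratio.

    Hypothetical helper. Assumes a dichotomous exposure, a two-sided McNemar-type test
    on discordant pairs, and case/control exposures independent within pairs.
    Undefined for odds_ratio == 1.
    """
    p0 = p_exposed_controls
    # Exposure prevalence among cases implied by the odds ratio.
    p1 = odds_ratio * p0 / (1.0 + p0 * (odds_ratio - 1.0))
    # Probability that a pair is discordant for exposure (independence assumed).
    p_discordant = p0 * (1.0 - p1) + p1 * (1.0 - p0)
    # Under H1, the case is the exposed member of a discordant pair with probability P.
    P = odds_ratio / (1.0 + odds_ratio)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    n_discordant = (z_alpha / 2.0 + z_beta * sqrt(P * (1.0 - P))) ** 2 / (P - 0.5) ** 2
    # Scale the required discordant pairs up to total pairs.
    return ceil(n_discordant / p_discordant)


print(matched_pairs_needed(odds_ratio=2.0, p_exposed_controls=0.3))
```

GECOR goes further. It handles correlated exposures within matched sets, multiple controls per case, genetic effects, and interaction terms, and this simplified formula covers none of those.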
"GEmis calculates power in case-control studies examining genetic factors in the presence of misclassification of an environmental factor E and of dependence between the genetic variant G and the environmental exposure E. Three tests are considered: the test of the marginal effect of the gene, the standard test for gene-environment interaction, and the joint test for a genetic marginal effect and gene-environment interaction. The environmental effect is modelled as dichotomous, and the genetic model is restricted to dominant inheritance."
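A small calculation shows why misclassification of E erodes power. The sketch assumes nondifferential misclassification, meaning the same sensitivity and specificity in cases and controls. It is not GEmis's method. It only shows how the observed exposure odds ratio is pulled toward the null, which shrinks the effect any test can detect. Function names and parameter values are illustrative.

```python
def observed_prevalence(true_prev: float, sensitivity: float, specificity: float) -> float:
    """Probability of being classified as exposed, given the true exposure prevalence."""
    return sensitivity * true_prev + (1.0 - specificity) * (1.0 - true_prev)


def odds(p: float) -> float:
    return p / (1.0 - p)


def observed_odds_ratio(true_or: float, p_exposed_controls: float,
                        sensitivity: float, specificity: float) -> float:
    """Exposure odds ratio seen in the data under nondifferential misclassification of E."""
    p0 = p_exposed_controls
    # True exposure prevalence among cases implied by the true odds ratio.
    p1 = true_or * p0 / (1.0 + p0 * (true_or - 1.0))
    q1 = observed_prevalence(p1, sensitivity, specificity)
    q0 = observed_prevalence(p0, sensitivity, specificity)
    return odds(q1) / odds(q0)


print(observed_odds_ratio(2.0, 0.3, sensitivity=1.0, specificity=1.0))  # perfect classification
print(observed_odds_ratio(2.0, 0.3, sensitivity=0.8, specificity=0.9))  # attenuated toward 1
```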
"HAPPY estimates haplotype-specific odds ratios from genotype data on unrelated cases and controls using unconditional logistic regression. It can adjust for the main effects of relevant covariates and estimate stratum-specific haplotype effects. Aside from confidence intervals around individual odds ratio estimates, HAPPY calculates omnibus tests of haplotype association and haplotype-environment interactions. HAPPY uses the "expectation substitution" approach [1,2], which treats expected haplotype scores (calculated under a user-specified inheritance model) as observed covariates in a standard unconditional logistic analysis. The macro outputs these expected scores to an auxiliary data set; the scores can then be used in customized analyses."
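The expectation-substitution idea can be sketched for biallelic markers. Haplotype frequencies are assumed known here (for example, estimated beforehand). Each genotype's expected haplotype score is then an average over all phase configurations consistent with it, weighted by their Hardy-Weinberg probabilities. The function name, the 0/1/2 genotype coding, and the frequency dictionary are assumptions for illustration; HAPPY itself is a macro and its interface may differ.

```python
from itertools import product


def expected_haplotype_score(genotype, hap_freqs, target, model="additive"):
    """Expected score for the `target` haplotype given an unphased genotype.

    genotype:  tuple of minor-allele counts (0, 1, 2), one per biallelic marker
    hap_freqs: dict mapping haplotype tuples of 0/1 alleles to population frequencies
    model:     'additive' (expected copy number), 'dominant' or 'recessive'
    """
    total_weight = 0.0
    weighted_score = 0.0
    # Enumerate ordered haplotype pairs; an unordered heterozygous pair appears twice,
    # which reproduces the 2*f1*f2 Hardy-Weinberg probability.
    for h1, h2 in product(hap_freqs, repeat=2):
        if any(a + b != g for a, b, g in zip(h1, h2, genotype)):
            continue  # this pair is inconsistent with the observed genotype
        w = hap_freqs[h1] * hap_freqs[h2]
        copies = (h1 == target) + (h2 == target)
        if model == "additive":
            score = copies
        elif model == "dominant":
            score = 1 if copies >= 1 else 0
        elif model == "recessive":
            score = 1 if copies == 2 else 0
        else:
            raise ValueError(f"unknown model: {model}")
        total_weight += w
        weighted_score += w * score
    if total_weight == 0.0:
        raise ValueError("genotype is incompatible with the supplied haplotypes")
    return weighted_score / total_weight


freqs = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
# A double heterozygote is phase-ambiguous: (0,0)/(1,1) versus (0,1)/(1,0).
print(expected_haplotype_score((1, 1), freqs, target=(1, 1)))  # 6/7, about 0.857
```

Scores computed this way would then enter an ordinary unconditional logistic regression as if they were observed covariates.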
Odds ratios for phase-known haplotypes measured with error.
"MULTIPOW calculates the power for both joint and replication-based analysis of general multi-stage genetic association studies. It differs from other packages that calculate the power for joint analysis in that: (a) it allows for an arbitrary number of stages (three, four or more instead of just two); (b) it can incorporate the efficiency of the genotyped marker panel into the power calculations; and (c) it is based on the 2 d.f. Pearson's chi-squared test statistic from the 2×3 disease-by-genotype table, rather than the Z test comparing allele frequencies between cases and controls."
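The test statistic underlying MULTIPOW's calculations can be computed directly from a 2×3 disease-by-genotype table. The sketch below computes Pearson's chi-squared statistic and its 2 d.f. p-value. For 2 degrees of freedom the chi-squared upper tail has the closed form exp(-x/2), so no statistics library is needed. The counts are made up for illustration, and the sketch covers only the test, not MULTIPOW's multi-stage power calculation.

```python
from math import exp


def pearson_chi2_2x3(cases, controls):
    """Pearson chi-squared statistic and p-value (2 d.f.) for a 2x3 disease-by-genotype table.

    cases, controls: genotype counts in the order (AA, Aa, aa); every genotype
    column must have at least one observation so that expected counts are nonzero.
    """
    rows = [list(cases), list(controls)]
    row_totals = [sum(r) for r in rows]
    col_totals = [c + k for c, k in zip(cases, controls)]
    n = sum(row_totals)
    stat = 0.0
    for r, row in enumerate(rows):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed - expected) ** 2 / expected
    # With (2-1)*(3-1) = 2 d.f., P(X > x) = exp(-x/2).
    p_value = exp(-stat / 2.0)
    return stat, p_value


stat, p = pearson_chi2_2x3(cases=(30, 50, 20), controls=(40, 45, 15))
print(f"chi2 = {stat:.3f}, p = {p:.3f}")  # chi2 = 2.406, p = 0.300
```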
These functions are useful for designs with more than two stages.
Permutation adjustment for multiple testing.
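One common form of permutation adjustment for multiple testing is the single-step max-statistic approach, and this sketch assumes it. Case-control labels are shuffled, and the largest statistic across all markers is recorded for each permutation. Each marker's observed statistic is then compared with that distribution. The per-marker statistic is deliberately simple: the absolute difference in mean allele count between cases and controls. This is an illustration, not necessarily the procedure the original software implements, and the function names and data are hypothetical.

```python
import random


def mean_diff_stats(genotypes, labels):
    """|mean allele count in cases - mean in controls| for each marker.

    genotypes: list of per-subject lists of allele counts (0/1/2), one entry per marker
    labels:    list of 1 (case) / 0 (control), one per subject
    """
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    stats = []
    for j in range(len(genotypes[0])):
        case_sum = sum(g[j] for g, y in zip(genotypes, labels) if y == 1)
        control_sum = sum(g[j] for g, y in zip(genotypes, labels) if y == 0)
        stats.append(abs(case_sum / n_cases - control_sum / n_controls))
    return stats


def maxt_adjusted_pvalues(genotypes, labels, n_perm=999, seed=1):
    """Single-step max-statistic permutation p-values, adjusted across all markers."""
    rng = random.Random(seed)
    observed = mean_diff_stats(genotypes, labels)
    exceed = [0] * len(observed)
    perm_labels = list(labels)
    for _ in range(n_perm):
        rng.shuffle(perm_labels)  # break any genotype-disease association
        max_stat = max(mean_diff_stats(genotypes, perm_labels))
        for j, t in enumerate(observed):
            if max_stat >= t:
                exceed[j] += 1
    # Count the observed labelling as one of the permutations.
    return [(k + 1) / (n_perm + 1) for k in exceed]


data_rng = random.Random(0)
labels = [1] * 50 + [0] * 50
# Hypothetical data: marker 0 is strongly associated with disease; marker 1 is noise.
genotypes = [[2 if y else 0, data_rng.randint(0, 2)] for y in labels]
print(maxt_adjusted_pvalues(genotypes, labels, n_perm=199))
```

Using the maximum over markers controls the family-wise error rate while keeping the correlation among markers (for example, linkage disequilibrium) intact in the permutation distribution.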