Accounting for Population Stratification in DNA Methylation Studies (Public)

Barfield, Richard Thomas (2012)

Permanent URL:


DNA methylation is an important epigenetic mechanism that helps regulate gene
expression and can be influenced by both the environment and the genome. DNA
methylation has also been linked to some cancers, complex diseases, and
transgenerational effects, and is thus of great interest to public health researchers as a
potential link between genome, environment, and disease. In recent years, falling costs and
improved technology have led to an increase in the number of genome-wide DNA methylation
association studies. We can now perform DNA methylation association studies at the scale
at which genome-wide association studies (GWAS) were performed a few years ago. As with
GWAS, problems such as population stratification
will also need to be addressed in these DNA methylation studies. Failure to adjust for
population stratification in genetic association studies can lead to false positives
and erroneous results, but population stratification has yet to be accounted for in DNA
methylation studies. To address this, we analyzed DNA methylation for association with
race in two separate datasets, and identified widespread associations with race across the
genome in both cases. We then performed principal components analysis on different
forms of the data and included these principal components in the model to determine
whether this approach would reduce the number of sites significantly associated with
race. We examined principal components computed from data pruned based on
correlation and principal components based on CpG sites within a certain distance of a
SNP ("informed pruning"). We found that the principal components from the informed
pruning performed the best in reducing the number of sites significantly associated with
race (90.55-97.82% reductions in the number of FDR-significant and 84.07-94.38%
reductions in the number of Holm-significant sites); this approach was also less
computationally intensive than approaches requiring correlation-based pruning. We have
therefore developed an effective method to account for population stratification in DNA
methylation studies that does not require the collection of data on genetic variants.
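The informed-pruning adjustment described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the thesis's code, and it covers only the informed-pruning approach (not correlation-based pruning). The helper names (`informed_prune`, `top_pcs`, `race_pvalues`, `bh_count`, `holm_count`) are hypothetical. The same is true of the 1,000 bp SNP window, the use of two principal components, the per-site linear model, the normal approximation to the t distribution for p-values, and the choice of Benjamini-Hochberg as the FDR procedure. The data are invented: a latent ancestry variable shifts methylation only at SNP-proximal CpG sites and is correlated with a binary race indicator. The sketch counts FDR- and Holm-significant race associations before and after adding the principal components to the model.

```python
import math

import numpy as np


def informed_prune(cpg_pos, snp_pos, window):
    """Hypothetical helper: mask of CpG sites within `window` bp of any SNP.

    Assumes all positions lie on a single chromosome.
    """
    snps = np.sort(np.asarray(snp_pos))
    cpg = np.asarray(cpg_pos)
    idx = np.searchsorted(snps, cpg)                  # index of first SNP at or above each CpG
    below = snps[np.clip(idx - 1, 0, len(snps) - 1)]  # nearest SNP below (clipped at edges)
    above = snps[np.clip(idx, 0, len(snps) - 1)]      # nearest SNP at or above (clipped)
    dist = np.minimum(np.abs(cpg - below), np.abs(cpg - above))
    return dist <= window


def top_pcs(M, k):
    """Scores of the top-k principal components for the samples (rows of M)."""
    Mc = M - M.mean(axis=0)                           # center each CpG site
    U, S, _ = np.linalg.svd(Mc, full_matrices=False)
    return U[:, :k] * S[:k]


def race_pvalues(M, race, covars=None):
    """Two-sided p-value for the race term in M[:, j] ~ 1 + race (+ covars), per site.

    Assumption: normal approximation to the t distribution (adequate for large n).
    """
    n = M.shape[0]
    cols = [np.ones(n), race]
    if covars is not None:
        cols.extend(covars.T)                         # one column per covariate
    X = np.column_stack(cols)
    XtX_inv = np.linalg.inv(X.T @ X)
    B = XtX_inv @ X.T @ M                             # OLS coefficients for every site at once
    resid = M - X @ B
    sigma2 = (resid ** 2).sum(axis=0) / (n - X.shape[1])
    t = B[1] / np.sqrt(sigma2 * XtX_inv[1, 1])
    return np.array([math.erfc(abs(v) / math.sqrt(2)) for v in t])


def bh_count(p, alpha=0.05):
    """Number of sites rejected by the Benjamini-Hochberg FDR procedure."""
    p = np.sort(p)
    m = len(p)
    passed = p <= alpha * np.arange(1, m + 1) / m
    return int(np.nonzero(passed)[0].max() + 1) if passed.any() else 0


def holm_count(p, alpha=0.05):
    """Number of sites rejected by Holm's step-down procedure."""
    p = np.sort(p)
    m = len(p)
    passed = p <= alpha / (m - np.arange(m))          # alpha/m, alpha/(m-1), ..., alpha
    return m if passed.all() else int(np.argmin(passed))  # stop at first failure


# Simulated data, invented for illustration only.
rng = np.random.default_rng(0)
n, m = 200, 2000
ancestry = rng.normal(size=n)                                      # latent ancestry
race = (ancestry + rng.normal(scale=0.5, size=n) > 0).astype(float)
cpg_pos = rng.integers(0, 10_000_000, size=m)
snp_pos = rng.integers(0, 10_000_000, size=500)
near_snp = informed_prune(cpg_pos, snp_pos, window=1_000)

M = rng.normal(size=(n, m))
# Ancestry shifts methylation only at SNP-proximal sites.
M[:, near_snp] += np.outer(ancestry, rng.normal(size=int(near_snp.sum())))

pcs = top_pcs(M[:, near_snp], k=2)   # PCs from the informed-pruned sites only
p_raw = race_pvalues(M, race)
p_adj = race_pvalues(M, race, covars=pcs)
print("BH-significant:  ", bh_count(p_raw), "->", bh_count(p_adj))
print("Holm-significant:", holm_count(p_raw), "->", holm_count(p_adj))
```

Because the simulated race indicator has no effect once ancestry is accounted for, the adjusted counts should fall well below the unadjusted ones. On real data the window size and number of components would need to be chosen and checked, and positions would be matched per chromosome.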

Table of Contents

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Subfield / Discipline
  • English
Research field
Committee Chair / Thesis Advisor
Partnering Agencies
