Identification of the Effect of Population Stratification on Association Studies of Rare Variants Open Access

Jiang, Yunxuan (2011)

Permanent URL: https://etd.library.emory.edu/concern/etds/r781wg25f?locale=en
Published

Abstract





Human genome research, which aims to find the genetic etiology of disease, is having an increasingly
profound influence on public health. Rare variants, which both have large effect sizes and can
explain a large proportion of heritability, are becoming the focus of current human genome
research. Although several statistical methods have been developed to increase the power to
detect rare variants and to reduce the false positive rate, none of these methods addresses an
important issue that often arises in genetic studies: false positives due to population stratification.
Population stratification is a well-known problem that can substantially inflate the false
positive rate and decrease the power to detect real associations. We simulated several case-control
studies with different sample sizes and population structures, using a series of disease
prevalences for each population (European and African), and found that population stratification can
have a significant influence on rare variant studies. The false positive rate increases dramatically
as sample size increases and population structure becomes more extreme. We applied principal
component analysis to control for population structure. Our results showed that the principal
component method performed very well even for highly structured data: the false positive rate
remained around 0.05 in our simulations. Our results imply that researchers need to carefully
match case and control ancestry in order to avoid false positives caused by population structure in
rare variant studies. If recruiting samples from different populations is unavoidable, researchers
can correct for the resulting structure with our easily implemented method.
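
The mechanism summarized in the abstract can be illustrated with a small simulation. The sketch below is not the thesis's procedure: the outline indicates that the thesis builds population-specific haplotypes from a simulated genealogy, whereas this sketch simply draws genotypes from hypothetical allele frequencies for two populations. It also fixes the ancestry mix of cases and controls directly instead of sampling by disease prevalence, uses a per-subject count of rare alleles as the test variable, and applies an EIGENSTRAT-style adjustment (regressing the top principal components out of both phenotype and rare-allele count). All function names, parameter values, and frequencies are assumptions chosen for illustration.

```python
import numpy as np

CHI2_1DF_CRIT = 3.841  # upper 5% critical value of chi-square with 1 df


def simulate_null_study(rng, n_per_group=200, case_frac_pop0=0.8,
                        n_rare=50, n_common=300):
    """Simulate one case-control study in which no variant affects disease."""
    n_case0 = int(n_per_group * case_frac_pop0)
    # Population label (0 or 1) per subject; controls mirror the case mix.
    pop_cases = np.r_[np.zeros(n_case0, int),
                      np.ones(n_per_group - n_case0, int)]
    pop = np.r_[pop_cases, 1 - pop_cases]
    status = np.r_[np.ones(n_per_group), np.zeros(n_per_group)]

    # Hypothetical frequencies: rare alleles are more frequent in population 1.
    rare_freq = np.vstack([rng.uniform(0.001, 0.005, n_rare),
                           rng.uniform(0.005, 0.010, n_rare)])
    base = rng.uniform(0.1, 0.9, n_common)
    shifted = np.clip(base + rng.normal(0.0, 0.2, n_common), 0.05, 0.95)
    common_freq = np.vstack([base, shifted])

    rare = rng.binomial(2, rare_freq[pop])      # subjects x rare variants
    common = rng.binomial(2, common_freq[pop])  # subjects x common markers
    burden = rare.sum(axis=1).astype(float)     # rare alleles per subject
    return status, burden, common


def top_pcs(common, k=2):
    """Top-k principal component scores of standardized common-marker genotypes."""
    g = common - common.mean(axis=0)
    sd = g.std(axis=0)
    g = g[:, sd > 0] / sd[sd > 0]               # drop monomorphic markers
    u, _, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :k]                             # orthonormal, mean-zero columns


def assoc_stat(status, burden, pcs=None):
    """(N - K - 1) * r^2, approximately chi-square(1) when there is no association."""
    y = status - status.mean()
    x = burden - burden.mean()
    k = 0
    if pcs is not None:
        # Project the principal components out of phenotype and burden.
        y = y - pcs @ (pcs.T @ y)
        x = x - pcs @ (pcs.T @ x)
        k = pcs.shape[1]
    r2 = (x @ y) ** 2 / ((x @ x) * (y @ y))
    return (len(y) - k - 1) * r2


def false_positive_rates(n_rep=200, seed=0, k=2):
    """Fraction of null studies rejected at the 0.05 level, without and with PCs."""
    rng = np.random.default_rng(seed)
    raw_hits = adj_hits = 0
    for _ in range(n_rep):
        status, burden, common = simulate_null_study(rng)
        raw_hits += assoc_stat(status, burden) > CHI2_1DF_CRIT
        adj_hits += assoc_stat(status, burden, top_pcs(common, k)) > CHI2_1DF_CRIT
    return raw_hits / n_rep, adj_hits / n_rep


raw_rate, adjusted_rate = false_positive_rates()
print(f"uncorrected: {raw_rate:.3f}  PC-adjusted: {adjusted_rate:.3f}")
```

Because disease status in this sketch is assigned independently of genotype, every rejection is a false positive. Under these assumptions the uncorrected rate should come out well above the nominal 0.05 and the adjusted rate close to it, although the exact values depend on the seed and on the hypothetical frequencies chosen here.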




Identification of the Effect of Population Stratification on Association Studies of Rare
Variants
BY


Yunxuan Jiang
Bachelor of Science
Beijing Forestry University
2009


Thesis Committee Chair: Karen N. Conneely, Ph.D
Michael P. Epstein, Ph.D

A thesis submitted to the Faculty of the
Rollins School of Public Health of Emory University
in partial fulfillment of the requirements for the degree of
Master of Science in Public Health
in Biostatistics
2011

Table of Contents

Chapter 1 Introduction

Chapter 2 Review of the Literatures

2.1 Common vs. Rare Variants

2.2 Study Design

2.2.1 Linkage Analysis

2.2.2 Association Studies

2.2.2.1 Candidate Gene Association Studies

2.2.2.2 Genome Wide Association Studies

2.3 Population Stratification

2.3.1 Genomic Control

2.3.2 Structure

2.3.3 Principal Component Analysis

2.4 Statistical Methods for Analyzing Rare Variants

Chapter 3 Methodology

3.1 Simulating Population Specific Haplotype

3.1.1 Building the Genealogy

3.1.2 Mutation

3.1.3 Adding Neutral Mutations to the genealogy

3.1.4 Migration

3.1.5 Recombination

3.2 Simulating a case-control Study

3.3 Simulating GWAS Data

3.4 Methods to Calculate Principal Components

3.5 Testing for Association Between Rare Variants and Disease

Chapter 4 Results

4.1 Simulated Study

4.2 False Positive Rate

4.3 Correcting for Population Stratification using Principal Components

Chapter 5 Conclusions, Implications and Recommendations

Reference

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English