Statistical Methods for Rare-Variant Sequencing Studies in Pedigrees (Open Access)

Jiang, Yunxuan (2017)

Next-generation sequencing studies have the potential to reveal the genetic architecture of complex diseases in greater depth than ever before, but they require robust and powerful statistical methods to identify trait-influencing variation. During the past few years, interest has shifted from identifying common susceptibility variation in the population to identifying rare susceptibility variation. However, the infrequent observation of rare variants (frequency <5% in the population) makes it difficult to develop powerful statistical methods. Although methods have been proposed to analyze rare susceptibility variation in population-based or case-control designs, few of these methods can be applied to family-based study designs. Family-based designs have several advantages, including higher power due to increased genetic load, robustness to population stratification, and the ability to identify de novo mutations by sequencing trios. In our first project, we developed a flexible and robust method for rare-variant analysis of quantitative traits in nuclear families and trios. Our method uses a kernel-machine framework to analyze rare variants in aggregate, and has the advantages of analytical calculation of p-values and robustness to population stratification. The method also employs a screening step to improve power. This method, like other existing methods, focuses mainly on trios and nuclear families and ignores the information provided by other types of relatives. As more studies re-sequence subjects from previous linkage analysis studies, which normally involve more than two generations, statistical methods to analyze sequencing studies of large pedigrees are needed. In our second project, we develop a method for family-based rare-variant analysis of quantitative outcomes that can accommodate any family structure and size.
Our first and second projects are designed to perform family-based tests of association between a gene and a single phenotype. However, there has been increasing interest in identifying pleiotropic genes through joint testing of multiple phenotypes; such approaches are both biologically meaningful and statistically more powerful than univariate testing of individual phenotypes. Therefore, in our third project, we develop a cross-phenotype association test for case-parent trio studies. Based on the kernel distance covariance framework, our test can incorporate multiple traits (both binary and continuous) and is more powerful than analogous univariate tests of individual phenotypes.

Table of Contents


Chapter 1. Introduction

1.1 Background

1.2 Literature review

1.2.1 Existing methods for rare variant analysis

1.2.2 Existing methods for family-based studies

1.2.3 Population stratification and QTDT

1.2.4 Screening methods

1.3 Summary

Chapter 2. Flexible and Robust Methods for Rare-Variant Testing of Quantitative Traits in Trios and Nuclear Families

2.1 Introduction

2.2 Materials and methods

2.2.1 Notation and KMFAM model

2.2.2 Robust rare-variant association test

2.2.3 Screening procedure

2.2.4 Type I error simulations

2.2.5 Power simulations

2.3 Results

2.3.1 Type I error

2.3.2 Power

Chapter 3. Robust Rare-Variant Association Tests for Quantitative Traits in General Pedigrees

3.1 Introduction

3.2 Materials and methods

3.2.1 Study design and notation

3.2.2 KMR framework for pedigree data

3.2.3 QTDT framework for general pedigrees

3.2.4 Screening methods

3.2.5 Simulation studies

3.2.6 GAW18 data

3.3 Results

3.3.1 Type I error

3.3.2 Power

3.3.3 Application to GAW18 database

3.4 Discussion

Chapter 4. Powerful and Robust Cross-Phenotype Association Test of Rare Variants in Case-Parent Trios

4.1 Introduction

4.2 Materials and methods

4.2.1 Notation

4.2.2 Kernel distance covariance test of independence

4.2.3 KDC test for case-parent trios

4.2.4 Simulations

4.2.5 GoKind data analysis

4.3 Results

4.3.1 Type I error

4.3.2 Power

4.5 Discussion

Chapter 5. Conclusion and future work

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English