Maximizing the ability to detect modifying genetic factors of rare complex disorders – Fragile X-Associated Primary Ovarian Insufficiency and Down Syndrome - Congenital Heart Defects Open Access

Trevino, Cristina (Summer 2020)


In order to better identify and understand the genetic architecture of complex traits, modern genomic methods increasingly focus on using the large amount of data collected over the last decade and on examining the genome in different ways. However, prioritizing functional variants in this framework remains challenging. Strategies such as faster, easier-to-use annotation and filtering methods are increasingly important for genomic analyses today. Selecting cohorts from genetically-sensitized populations, or constructing a cohort from those with the extreme phenotypes of a complex trait, are other strategies to maximize the ability to detect susceptibility variants. In this dissertation, I employ these strategies to study primary ovarian insufficiency (POI) in a cohort of women with a fragile X premutation (PM) and to study atrioventricular septal defects (AVSD) in a cohort of individuals with Down syndrome (DS). In both groups, the co-occurring trait is far more frequent than in the general population: women with a PM are at a 20-fold increased risk for POI, and individuals with DS are at a >2000-fold increased risk for AVSD.
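As a rough check on the fold increases quoted above, the ratio can be computed from the approximate frequencies given elsewhere in this abstract (about 20% versus 1% for POI, and about 20% versus about 1 in 10,000 for AVSD). This is a minimal sketch; the function name `fold_increase` is illustrative and not taken from the dissertation, and the inputs are the rounded figures from the abstract rather than the underlying study data.

```python
from fractions import Fraction


def fold_increase(freq_in_cohort: Fraction, freq_in_population: Fraction) -> Fraction:
    """Ratio of a trait's frequency in a sensitized cohort to its
    frequency in the general population."""
    return freq_in_cohort / freq_in_population


# Rounded frequencies as quoted in the abstract (approximate values).
poi_fold = fold_increase(Fraction(20, 100), Fraction(1, 100))      # PM carriers vs. general population
avsd_fold = fold_increase(Fraction(20, 100), Fraction(1, 10_000))  # DS vs. general population

print(f"POI: about {poi_fold}-fold")
print(f"AVSD: about {avsd_fold}-fold")
```

Exact fractions are used instead of floating-point numbers so that the rounded inputs give exact ratios. The abstract's ">2000" figure reflects unrounded frequencies that are not given here.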

POI, which affects 1% of women in the general population, is a condition characterized by symptoms of early menopause and is a leading cause of infertility. About 20% of women who carry a PM, a CGG repeat expansion in the range of 55-200 repeats in the 5’ UTR of the X-linked FMR1 gene, are diagnosed with fragile X-associated POI (FXPOI). We hypothesize that there are genetic modifiers that contribute to the age of onset and severity of FXPOI. To test this, we conducted a case/control study among women with a PM taken from the extremes of the distribution of age at onset of FXPOI/menopause (onset before age 35 and after age 50). We analyzed whole genome sequencing (WGS) data both in an untargeted way and by examining candidate genes involved in the underlying mechanism of PM-associated disorders. Top-ranked genes were then screened using the Drosophila model as a high-throughput, whole-organism functional screen to gain further evidence of their involvement in ovarian dysfunction.
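The extreme-phenotype design described above can be sketched as a simple selection rule. Only the two age cutoffs (before 35 and after 50) come from the abstract. The participant IDs and ages are hypothetical, and labeling the early-onset group "case" and the late-onset group "control" is an assumption made for illustration.

```python
from typing import Dict, Optional

EARLY_ONSET_BEFORE = 35  # cutoff stated in the abstract
LATE_ONSET_AFTER = 50    # cutoff stated in the abstract


def classify_extreme(age_at_onset: float) -> Optional[str]:
    """Assign a participant to one extreme of the age-at-onset distribution,
    or return None if she falls in the excluded middle."""
    if age_at_onset < EARLY_ONSET_BEFORE:
        return "case"      # assumed label for early onset
    if age_at_onset > LATE_ONSET_AFTER:
        return "control"   # assumed label for late onset
    return None


# Hypothetical participants: ID -> age at onset of FXPOI/menopause.
ages: Dict[str, float] = {"P1": 29, "P2": 41, "P3": 53, "P4": 34, "P5": 50}

cohort: Dict[str, str] = {}
for participant, age in ages.items():
    label = classify_extreme(age)
    if label is not None:
        cohort[participant] = label

print(cohort)
```

Dropping the middle of the distribution costs sample size but sharpens the contrast between the groups, which is the rationale the abstract gives for sampling from the extremes.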

AVSDs are a rare and severe form of congenital heart defect (CHD) and require surgery soon after birth. CHDs overall occur in almost 1% of infants in the general population, whereas AVSD occurs in about 1 in 10,000. Most genetic studies of CHD examine all forms together, although there is strong evidence of etiological heterogeneity. We took the same strategy as above and identified a genetically-sensitized population to increase the ability to identify risk variants of AVSD. About 20% of infants with Down syndrome, or trisomy 21, are born with an AVSD, an enormous increase in frequency over the general population. Thus, we based our study on 702 individuals with DS, some with and some without an AVSD, again drawing from the extremes of heart development. We used available whole exome sequencing, WGS, and/or array-based imputation data and took a variety of statistical approaches to examine risk-associated genes and pathways and to assess the contribution of many common variants of small effect size using polygenic risk score (PRS) methods.
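For readers unfamiliar with PRS methods, the conventional form of a polygenic risk score is a weighted sum of risk-allele dosages, with per-variant weights taken from a discovery GWAS. The sketch below illustrates that general idea only. The SNP identifiers, weights, and dosages are hypothetical, and the abstract does not specify the scoring pipeline used in the dissertation.

```python
from typing import Dict


def polygenic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies, or an imputed
    fractional value) over the variants that have a discovery-GWAS weight.

    Raises KeyError if an individual lacks a genotype for a weighted variant,
    rather than silently treating it as zero.
    """
    return sum(weight * dosages[snp] for snp, weight in weights.items())


# Hypothetical per-variant effect sizes from a discovery GWAS.
weights = {"snp_a": 0.5, "snp_b": -0.25, "snp_c": 0.125}

# Hypothetical risk-allele dosages for one individual.
individual = {"snp_a": 2, "snp_b": 1, "snp_c": 0}

score = polygenic_risk_score(individual, weights)
print(score)
```

In practice the score would be computed for every individual in the target cohort and then tested for association with case status. The threshold used to decide which variants enter the sum is a tuning choice.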

In both studies, combining multiple statistical approaches to genetic data from extreme phenotypes within genetically-sensitized cohorts proved successful. Identified candidate genes can now be moved to mammalian model systems to test for functional involvement. These studies benefit not only those at increased risk (i.e., women with a PM or people with DS), but may also be translated to those with idiopathic forms of the disorders.

Table of Contents

I. Introduction

I.I Understanding the genetic architecture of complex traits

I.I.i. Gene set analyses SKAT-O in Human Genetics

I.I.ii. Understanding contribution of polygenes

I.II Fragile X-Associated Primary Ovarian Insufficiency

I.II.i. Prevalence of Primary Ovarian Insufficiency

I.II.ii. Risk factors for FXPOI

I.II.iii. Mechanisms of the PM leading to FXPOI

I.II.iv. Animal models for FXPOI

I.III Congenital Heart Defects in Down Syndrome

I.III.i. Prevalence and variability in phenotype for DS

I.III.ii. Genetic studies of CHD

I.III.iii. Genetic studies of DS CHD


II. Identifying modifying genes to explain the variation in severity of fragile X‐associated primary ovarian insufficiency

II.I. Abstract

II.II. Introduction

II.III. Methods

II.III.i. Participants

II.III.ii. Laboratory Methods

II.III.iii. Bioinformatic Analysis

II.III.iv. Common variant analysis

II.III.v. Rare variant analysis

II.III.vi. Polygenic risk score analyses

II.III.vii. Generation of a stable line expressing 90 CGG in the Drosophila germline

II.III.viii. Fecundity Testing

II.IV. Results

II.IV.i. Genome-wide association study of common variants

II.IV.ii. Age at Menopause Polygenic Risk Score Analysis and its association with FXPOI

II.IV.iii. Identifying modifying gene candidates with SKAT-O analysis

II.IV.iv. Drosophila fecundity as a whole organism functional study

II.IV.v. Fecundity of RNA binding proteins

II.V. Discussion

II.VI. Tables and Figures

II.VII. References


III. Identifying genetic factors that contribute to the increased risk of congenital heart defects in infants with Down syndrome

III.I. Abstract

III.II. Introduction

III.III. Methods

III.III.i. Subjects

III.III.ii. Whole exome sequencing

III.III.iii. Whole genome sequencing

III.III.iv. Samples with imputed genotypes based on microarray

III.III.v. SKAT-O variant analysis

III.III.vi. Polygenic risk score analyses

dataset for primary analyses

dataset for secondary PRS analyses

PRS for the primary analyses

PRS for the secondary analyses

association of PRS with DS+AVSD

III.IV. Results

III.IV.i. Gene discovery using SKAT analyses

III.IV.ii. CHD polygenic risk score and its association with DS+AVSD

III.IV.ii.i Primary analyses indicate a non-significant association of the CHD-based PRS with DS+AVSD

III.IV.ii.ii Adding data from chromosome 21 into the PRS calculation did not change the association with DS+AVSD

III.V. Discussion

III.VI. Tables and Figures

III.VII. References

III.VIII. Supplemental methods and References


IV. Bystro: rapid online variant annotation and natural-language filtering at whole-genome scale

IV.I. Abstract

IV.II. Introduction

IV.III. Results

IV.IV. Discussion

IV.V. Methods

IV.V.i. Accessing Bystro

IV.V.ii. Bystro Database

IV.V.iii. WGS Datasets

IV.V.iv. Online annotation comparisons

IV.V.v. Variant filtering comparisons

IV.V.vi. Filtering accuracy comparison

IV.V.vii. Offline annotation comparisons

IV.V.viii. Annotation accuracy comparison

IV.VI. Tables and Figures


IV.VII. References


V. Discussion

V.I Conclusions

V.II Limitations

V.III Implications and future directions

V.IV References


Table 2.1. Bloomington TRiP line stocks and corresponding human gene orthologs

Table 2.2. Odds ratios for PRS Quartiles

Table 2.2. Top candidate genes from SKAT-O analysis

Table 2.3. Quasipoisson regression model for top three candidates

Table 3.1. Summary of cohort

Table 3.2. Summary of gene sets for SKAT-O

Table 3.3. Diagnoses for first training set

Table 3.4. Diagnoses for second training set

Table 3.5. SKAT-O results of rare variants

Table 3.6. SKAT-O results of ultra-rare variants

Table 3.7. SKAT-O results of common variants

Table 3.8. SKAT-O results of rare variants for top two pathways

Table 3.9. PRS results using discovery GWAS of 2,594 mixed CHD cases and 5,159 controls and SNPs with MAF ≥ 0.35

Table 4.1. Bystro, VEP, ANNOVAR offline command-line performance.

Table 4.2. Online comparison of Bystro and recent programs in filtering

Table 4.3. Online comparison of Bystro and GEMINI/Galaxy in filtering 10^6 sites



Figure 1.1. Expression of the FMR1 mRNA and its translation into FMRP differ with the size of the CGG repeat in the 5’ UTR of the FMR1 gene, resulting in different phenotypes

Figure 1.2. Potential mechanisms involved in CGG PM-related pathology

Figure 2.1. Distribution of cohort

Figure 2.2. Manhattan plot of common variant (MAF > 0.05) GWAS

Figure 2.3. PRS analysis reveals a Nagelkerke’s R² of 7.5% at a p-value threshold of < 0.002 in the discovery set

Figure 2.4. Fecundity of Drosophila Controls

Figure 2.5. Initial screen for top WGS candidate genes

Figure 2.6. Follow-up fecundity testing on top three WGS candidates

Figure 2.7. Fecundity screen of RNA binding proteins previously associated with fragile X-associated disorders

Figure 3.1. Flowchart showing the multiple steps involved in generating the final data set for the primary PRS analyses

Figure 3.2. Representative SKAT-O Manhattan plot and QQ plot of common variants

Figure 3.3. PRS results using discovery GWAS of 2,594 mixed CHD cases and 5,159 controls and SNPs with MAF ≥ 0.35

Figure 3.4. PRS results using discovery GWAS of 2,594 mixed CHD cases and 5,159 controls and various MAF thresholds

Figure 3.5. PRS results using discovery GWAS of 406 mixed CHD cases and 2,976 controls and various MAF thresholds

Figure 3.6. PRS results using meta-analysis of two GWAS as discovery dataset and employing inverse variance weighted SNP effects for scoring, for various MAF thresholds

Figure 3.7. PRS results for all autosomes excluding chromosome 21

Figure 3.8. Maximum variance in target phenotype that can be explained by PRS (y-axis: liability scale r²) given a range of training sample sizes (x-axis: number of cases in thousands)

Figure 4.1. A) Bystro use overview

Figure 4.1. B) Variant selection using Bystro

Figure 4.2. Online performance comparison of Bystro, VEP, wANNOVAR, and GEMINI

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English