Maximizing the ability to detect modifying genetic factors of rare complex disorders – Fragile X-Associated Primary Ovarian Insufficiency and Down Syndrome - Congenital Heart Defects
Trevino, Cristina (Summer 2020)
Abstract
To better identify and understand the genetic architecture of complex traits, modern genomic approaches increasingly draw on the large amount of data collected over the last decade and examine the genome in new ways. However, prioritizing functional variants within this framework remains challenging. Faster, easier-to-use annotation and filtering methods are therefore increasingly important for genomic analyses. Two further strategies that maximize the ability to detect susceptibility variants are selecting cohorts from genetically-sensitized populations and constructing cohorts from individuals at the extremes of a complex trait's phenotypic distribution. In this dissertation, I employ these strategies to study primary ovarian insufficiency (POI) in a cohort of women with a fragile X premutation (PM) and atrioventricular septal defects (AVSD) in a cohort of individuals with Down syndrome (DS). Both groups have these co-occurring traits at a much higher frequency than the general population: women with a PM have a 20-fold increased risk for POI, and individuals with DS have a >2000-fold increased risk for AVSD.
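The fold increases quoted above follow from the approximate prevalences given for POI and AVSD elsewhere in this abstract. A minimal sketch of that arithmetic, assuming the crude ratio of prevalences is what is meant by "fold increased risk"; the function name `fold_increase` is hypothetical and the calculation ignores sampling uncertainty and differences in how each prevalence was ascertained:

```python
def fold_increase(group_prevalence: float, population_prevalence: float) -> float:
    """Ratio of a trait's prevalence in a sensitized group to its general-population prevalence."""
    if population_prevalence <= 0:
        raise ValueError("population prevalence must be positive")
    return group_prevalence / population_prevalence


# Approximate prevalences quoted in the abstract.
poi_fold = fold_increase(0.20, 0.01)         # FXPOI in PM carriers vs. POI in all women
avsd_fold = fold_increase(0.20, 1 / 10_000)  # AVSD in infants with DS vs. all infants

print(f"POI: ~{poi_fold:.0f}-fold; AVSD: ~{avsd_fold:.0f}-fold")
```

Both groups have roughly the same absolute frequency of the co-occurring trait (about 20%); the much larger fold increase for AVSD reflects how rare AVSD is in the general population.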
POI, which affects 1% of women in the general population, is characterized by symptoms of early menopause and is a leading cause of infertility. About 20% of women who carry a PM, a CGG repeat expansion of 55-200 repeats in the 5’ UTR of the X-linked FMR1 gene, are diagnosed with fragile X-associated POI (FXPOI). We hypothesized that genetic modifiers contribute to the age of onset and severity of FXPOI. To test this, we conducted a case/control study of women with a PM drawn from the extremes of the distribution of age at onset of FXPOI/menopause (onset before age 35 versus after age 50). We analyzed whole genome sequencing (WGS) data both in an untargeted way and by examining candidate genes involved in the underlying mechanism of PM-associated disorders. Top-ranked genes were then screened in a Drosophila model, used as a high-throughput, whole-organism functional screen, to gain further evidence of their involvement in ovarian dysfunction.
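The extreme-phenotype design described above keeps only women at the two tails of the onset-age distribution and drops everyone in between. A minimal sketch of that selection step, using the cutoffs stated in the text (onset before 35 as cases, after 50 as controls); the `Participant` class, function names, sample IDs, and ages are all hypothetical toy values, not the study's data or pipeline:

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Participant:
    sample_id: str    # hypothetical identifier
    onset_age: float  # age (years) at FXPOI diagnosis or menopause


def assign_extreme_group(age: float, early: float = 35.0, late: float = 50.0) -> Optional[str]:
    """Return 'case' for onset before `early`, 'control' for onset after `late`, else None."""
    if age < early:
        return "case"
    if age > late:
        return "control"
    return None  # intermediate ages are left out of the extreme-phenotype comparison


def select_extremes(participants: Iterable[Participant]) -> Dict[str, List[str]]:
    """Group sample IDs into early-onset cases and late-onset controls."""
    groups: Dict[str, List[str]] = {"case": [], "control": []}
    for p in participants:
        label = assign_extreme_group(p.onset_age)
        if label is not None:
            groups[label].append(p.sample_id)
    return groups


# Toy cohort with made-up IDs and ages.
cohort = [
    Participant("PM001", 31.0),
    Participant("PM002", 42.0),
    Participant("PM003", 53.0),
    Participant("PM004", 34.5),
]
groups = select_extremes(cohort)
print(groups)
```

Discarding the middle of the distribution trades sample size for contrast: the remaining cases and controls differ more sharply in phenotype, which is the rationale the abstract gives for drawing on the extremes.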
AVSDs are a rare and severe form of congenital heart defect (CHD) that requires surgery soon after birth. CHDs occur in almost 1% of infants in the general population, whereas AVSD occurs in about 1 in 10,000. Most genetic studies of CHD examine all forms together, although there is strong evidence of etiological heterogeneity. We took the same strategy as above, identifying a genetically-sensitized population to increase the ability to identify risk variants for AVSD. About 20% of infants with Down syndrome (trisomy 21) are born with an AVSD, an enormous increase in frequency over the general population. We therefore based our study on 702 individuals with DS with and without an AVSD, again drawing from the extremes of heart development. We used available whole exome sequencing, WGS, and/or array-based imputation data and applied a variety of statistical approaches to identify risk-associated genes and pathways and to assess the contribution of many common variants of small effect size using polygenic risk score (PRS) methods.
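At its core, a PRS of the kind mentioned above is a weighted sum: each individual's effect-allele dosage at a SNP is multiplied by that SNP's effect size from a discovery GWAS, and the products are summed over SNPs passing chosen filters. A minimal sketch, assuming simple p-value thresholding plus a MAF filter; the class and function names are hypothetical, the example thresholds only echo values that appear in the figure and table lists (p < 0.002, MAF ≥ 0.35), and the LD clumping and score standardization that dedicated tools such as PLINK or PRSice typically perform are omitted:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GwasSnp:
    snp_id: str
    beta: float     # per-allele effect (e.g., log odds ratio) from the discovery GWAS
    p_value: float
    maf: float      # minor allele frequency


def polygenic_risk_score(
    dosages: Dict[str, float],  # SNP ID -> effect-allele dosage (0 to 2) for one individual
    gwas: List[GwasSnp],
    p_threshold: float,
    min_maf: float = 0.0,
) -> float:
    """Sum of beta * dosage over SNPs passing the p-value and MAF filters.

    SNPs absent from `dosages` (e.g., not genotyped or imputed) are skipped.
    """
    score = 0.0
    for snp in gwas:
        if snp.p_value < p_threshold and snp.maf >= min_maf and snp.snp_id in dosages:
            score += snp.beta * dosages[snp.snp_id]
    return score


# Toy discovery summary statistics and one individual's dosages (made-up values).
gwas = [
    GwasSnp("snp1", 0.10, 1e-4, 0.40),
    GwasSnp("snp2", -0.05, 1e-3, 0.36),
    GwasSnp("snp3", 0.30, 0.20, 0.45),  # fails the p-value threshold
    GwasSnp("snp4", 0.20, 1e-5, 0.10),  # fails the MAF filter
]
person = {"snp1": 2, "snp2": 1, "snp3": 2, "snp4": 1}
score = polygenic_risk_score(person, gwas, p_threshold=0.002, min_maf=0.35)
print(round(score, 4))
```

In a case/control setting, scores computed this way for every individual would then be tested for association with the phenotype (here, DS with or without AVSD), typically in a regression that also includes covariates.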
Both studies, which combined multiple statistical approaches applied to genetic data from individuals with extreme phenotypes within genetically-sensitized cohorts, proved successful. The candidate genes identified can now be moved into mammalian model systems to test for functional involvement. These studies benefit not only those at increased risk (i.e., women with a PM or people with DS) but may also translate to those with idiopathic forms of these disorders.
Table of Contents
I. Introduction
I.I Understanding the genetic architecture of complex traits
I.I.i. Gene set analyses using SKAT-O in human genetics
I.I.ii. Understanding contribution of polygenes
I.II Fragile X-Associated Primary Ovarian Insufficiency
I.II.i. Prevalence of Primary Ovarian Insufficiency
I.II.ii. Risk factors for FXPOI
I.II.iii. Mechanisms of the PM leading to FXPOI
I.II.iv. Animal models for FXPOI
I.III Congenital Heart Defects in Down Syndrome
I.III.i. Prevalence and variability in phenotype for DS
I.III.ii. Genetic studies of CHD
I.III.iii. Genetic studies of DS CHD
II. Identifying modifying genes to explain the variation in severity of fragile X‐associated primary ovarian insufficiency
II.I. Abstract
II.II. Introduction
II.III. Methods
II.III.i. Participants
II.III.ii. Laboratory Methods
II.III.iii. Bioinformatic Analysis
II.III.iv. Common variant analysis
II.III.v. Rare variant analysis
II.III.vi. Polygenic risk score analyses
II.III.vii. Generation of a stable line expressing 90 CGG in the Drosophila germline
II.III.viii. Fecundity Testing
II.IV. Results
II.IV.i. Genome-wide association study of common variants
II.IV.ii. Age at Menopause Polygenic Risk Score Analysis and its association with FXPOI
II.IV.iii. Identifying modifying gene candidates with SKAT-O analysis
II.IV.iv. Drosophila fecundity as a whole organism functional study
II.IV.v. Fecundity of RNA binding proteins
II.V. Discussion
II.VI. Tables and Figures
II.VII. References
III. Identifying genetic factors that contribute to the increased risk of congenital heart defects in infants with Down syndrome
III.I. Abstract
III.II. Introduction
III.III. Methods
III.III.i. Subjects
III.III.ii. Whole exome sequencing
III.III.iii. Whole genome sequencing
III.III.iv. Samples with imputed genotypes based on microarray
III.III.v. SKAT-O variant analysis
III.III.vi. Polygenic risk score analyses
III.III.vi.i. Target dataset for primary analyses
III.III.vi.ii. Target dataset for secondary PRS analyses
III.III.vi.iii. Generating PRS for the primary analyses
III.III.vi.iv. Generating PRS for the secondary analyses
III.III.vi.v. Testing association of PRS with DS+AVSD
III.IV. Results
III.IV.i. Gene discovery using SKAT analyses
III.IV.ii. CHD polygenic risk score and its association with DS+AVSD
III.IV.ii.i Primary analyses indicate a non-significant association of the CHD-based PRS with DS+AVSD
III.IV.ii.ii Adding data from chromosome 21 into the PRS calculation did not change the association with DS+AVSD
III.V. Discussion
III.VI. Tables and Figures
III.VII. References
III.VIII. Supplemental methods and references
IV. Bystro: rapid online variant annotation and natural-language filtering at whole-genome scale
IV.I. Abstract
IV.II. Introduction
IV.III. Results
IV.IV. Discussion
IV.V. Methods
IV.V.i. Accessing Bystro
IV.V.ii. Bystro Database
IV.V.iii. WGS Datasets
IV.V.iv. Online annotation comparisons
IV.V.v. Variant filtering comparisons
IV.V.vi. Filtering accuracy comparison
IV.V.vii. Offline annotation comparisons
IV.V.viii. Annotation accuracy comparison
IV.VI. Tables and Figures
IV.VII. References
V. Discussion
V.I Conclusions
V.II Limitations
V.III Implications and future directions
V.IV References
Tables
Table 2.1. Bloomington TRiP line stocks and corresponding human gene orthologs
Table 2.2. Odds ratios for PRS Quartiles
Table 2.3. Top candidate genes from SKAT-O analysis
Table 2.4. Quasipoisson regression model for top three candidates
Table 3.1. Summary of cohort
Table 3.2. Summary of gene sets for SKAT-O
Table 3.3. Diagnoses for first training set
Table 3.4. Diagnoses for second training set
Table 3.5. SKAT-O results of rare variants
Table 3.6. SKAT-O results of ultra-rare variants
Table 3.7. SKAT-O results of common variants
Table 3.8. SKAT-O results of rare variants for top two pathways
Table 3.9. PRS results using discovery GWAS of 2,594 mixed CHD cases and 5,159 controls and SNPs with MAF ≥ 0.35
Table 4.1. Bystro, VEP, and ANNOVAR offline command-line performance
Table 4.2. Online comparison of Bystro and recent programs in filtering
Table 4.3. Online comparison of Bystro and GEMINI/Galaxy in filtering 106 sites
Figures
Figure 1.1. FMR1 mRNA expression and translation into FMRP differ with the size of the CGG repeat in the 5’ UTR of FMR1, resulting in different phenotypes
Figure 1.2. Potential mechanisms involved in CGG PM-related pathology
Figure 2.1. Distribution of cohort
Figure 2.2. Manhattan plot of common variant (MAF > 0.05) GWAS
Figure 2.3. PRS analysis reveals a Nagelkerke’s R² of 7.5% at a p-value threshold of < 0.002 in the discovery set
Figure 2.4. Fecundity of Drosophila Controls
Figure 2.5. Initial screen for top WGS candidate genes
Figure 2.6. Follow-up fecundity testing on top three WGS candidates
Figure 2.7. Fecundity screen of RNA binding proteins previously associated with fragile X-associated disorders
Figure 3.1. Flowchart showing the multiple steps involved in generating the final data set for the primary PRS analyses
Figure 3.2. Representative SKAT-O Manhattan plot and QQ plot of common variants
Figure 3.3. PRS results using discovery GWAS of 2,594 mixed CHD cases and 5,159 controls and SNPs with MAF ≥ 0.35
Figure 3.4. PRS results using discovery GWAS of 2,594 mixed CHD cases and 5,159 controls and various MAF thresholds
Figure 3.5. PRS results using discovery GWAS of 406 mixed CHD cases and 2,976 controls and various MAF thresholds
Figure 3.6. PRS results using meta-analysis of two GWAS as discovery dataset and employing inverse variance weighted SNP effects for scoring, for various MAF thresholds
Figure 3.7. PRS results for all autosomes excluding chromosome 21
Figure 3.8. Maximum variance in target phenotype that can be explained by PRS (y-axis: liability-scale r²) given a range of training sample sizes (x-axis: number of cases in thousands)
Figure 4.1. A) Bystro use overview
Figure 4.1. B) Variant selection using Bystro
Figure 4.2. Online performance comparison of Bystro, VEP, wANNOVAR, and GEMINI
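Figure 2.3 reports PRS performance as Nagelkerke’s R², a pseudo-R² for logistic regression that rescales the Cox-Snell R² so that its maximum is 1. A minimal sketch of the calculation from model log-likelihoods, assuming the common PRS convention of reporting the increase in R² when the score is added to a covariates-only model; whether Figure 2.3 uses exactly this convention is an assumption, and the function names and toy log-likelihoods are hypothetical:

```python
import math


def nagelkerke_r2(loglik_intercept: float, loglik_model: float, n: int) -> float:
    """Nagelkerke's pseudo-R^2 for a fitted logistic regression.

    loglik_intercept: log-likelihood of the intercept-only model
    loglik_model:     log-likelihood of the fitted model
    n:                number of individuals
    """
    cox_snell = 1.0 - math.exp(2.0 * (loglik_intercept - loglik_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * loglik_intercept / n)  # upper bound of Cox-Snell R^2
    return cox_snell / max_cox_snell


def incremental_r2(loglik_intercept: float, loglik_covariates: float, loglik_full: float, n: int) -> float:
    """Variance attributed to the PRS: R^2(covariates + PRS) minus R^2(covariates only)."""
    return (nagelkerke_r2(loglik_intercept, loglik_full, n)
            - nagelkerke_r2(loglik_intercept, loglik_covariates, n))


# Toy numbers: 100 individuals, 50 cases and 50 controls.
n = 100
ll0 = n * math.log(0.5)  # intercept-only log-likelihood for a 50/50 case/control split
r2 = incremental_r2(ll0, loglik_covariates=-65.0, loglik_full=-60.0, n=n)
print(f"incremental Nagelkerke R^2: {r2:.3f}")
```

The rescaling matters for case/control data because the raw Cox-Snell R² can never reach 1; dividing by its maximum makes values comparable across p-value thresholds and sample sizes.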
About this Dissertation
Primary PDF uploaded 2020-07-05 15:38:30 -0400