A new high-throughput variant annotation and classification method and its implementation in the investigation of schizophrenia-associated copy number regions Open Access

Kotlar, Alex (Fall 2018)

Permanent URL: https://etd.library.emory.edu/concern/etds/bk128b833?locale=en
Published

Abstract

Schizophrenia research has recently been transformed by advances in sequencing technology and study design. This thesis reviews recent advances in the genetics of schizophrenia, presents two new online, cloud-based computational methods, and then applies these methods to a rare-variant analysis of 3,504 individuals.

The first section of the thesis reviews recent advances in schizophrenia genetics, especially the identification of copy number variants of large effect size. Among these, the 3q29 microdeletion is now known to be the single largest schizophrenia risk factor. Next-generation sequencing studies are increasingly used for rare-variant association testing and have already facilitated the identification of large-effect alleles. Taken together, these results suggest the possibility of imminent breakthroughs in the molecular understanding of schizophrenia.

The second section describes a new computational method, Bystro, which is the first cloud-based application that makes variant annotation and filtering accessible online for terabyte-sized whole-genome experiments. Its key innovation is a general-purpose, natural-language search engine that enables users to identify and export alleles and samples of interest in milliseconds. The search engine dramatically simplifies complex filtering tasks that previously required programming experience or specialty command-line programs. Critically, Bystro’s annotation and filtering capabilities are orders of magnitude faster than previous solutions, saving weeks of processing time for large experiments.

The third section extends Bystro’s work by bringing online a new primer design method, called MPD. For the first time, it enables large-scale, multiplexed, genome-wide primer design in a web application.

The fourth and final section describes the application of Bystro to the analysis of the schizophrenia-associated copy number variants (CNVs) with the largest effect sizes. While disruption of these loci increases schizophrenia (SZ) risk by as much as 41x, it remains unclear which of the genes in those intervals explain the effect. By conducting high-coverage targeted sequencing of SZ-associated CNV genomic intervals, we identify an excess of ultra-rare, deleterious single-nucleotide polymorphisms (SNPs) in the 16p11 region. To our knowledge, this is the first time the joint effects of rare SNPs have implicated 16p11, and the result suggests that targeted sequencing approaches in SZ-associated regions can yield valuable insights into this polygenic disorder.

Table of Contents

1 Review of recent advances in the genetics of schizophrenia

1.1 Background

1.2 Copy number variants

1.2.1 1q21 Deletion and Duplication

1.2.2 2p16.3/NRXN1 Deletion

1.2.3 3q29 Deletion

1.2.4 7q36.3/VIPR2 Duplication

1.2.5 7q11.23 Duplication

1.2.6 15q11.2 and 15q13.3 Deletions and 15q11-q13 Duplication

1.2.7 16p11.2 and 16p12.1 Deletion, 16p13.1 Deletion and Duplication

1.2.8 17p12 Deletion, 17q12 Deletion and Duplication

1.2.9 22q11.2 Deletion

1.3 Rare variant and genome wide association studies

1.4 Next generation sequencing for rare variants

1.5 Genome wide association studies

1.6 Biological pathways

1.7 Overlap with other psychiatric disorders

1.8 Discussion

2 Bystro: rapid online variant annotation and natural-language filtering at whole-genome scale

2.1 Background

2.2 Results

2.3 Discussion

2.4 Conclusions

2.5 Methods

2.5.1 Accessing Bystro

2.5.2 Bystro Database

2.5.3 WGS Datasets

2.5.4 Online annotation comparisons

2.5.5 Variant filtering comparisons

2.5.6 Filtering accuracy comparison

2.5.7 Offline annotation comparisons

2.5.8 Annotation accuracy comparison

3 MPD: multiplex primer design for next-generation targeted sequencing

3.1 Background

3.2 Implementation

3.3 Web Application

3.4 Primer Design and Capture

3.5 Primer Design Validation

3.6 Results

3.7 Discussion

3.8 Conclusion

4 Targeted sequencing of schizophrenia-associated copy number intervals

4.1 Background

4.2 Methods Summary

4.2.1 Sequencing

4.2.2 Calling

4.2.3 Pre-Annotation Quality Control

4.2.4 Annotation quality control

4.2.5 Variant filters and sets

4.2.6 Variance-component tests

4.3 Results

4.3.1 Quality Control

4.3.2 Statistical tests

4.3.3 Deleterious variant tests

4.3.4 Rare deleterious variant set

4.3.5 Ultra-rare deleterious variant set

4.4 Discussion

4.5 Conclusion

5 Discussion

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English