High Resolution X Chromosome Copy Number Variation in Autism
The autism spectrum disorders (ASD) are a broadly defined set of developmental
disorders that include autism and Asperger syndrome. Individuals with ASD are defined
by impairments in social interaction, deficits in communication, and restricted and
stereotyped behaviors and interests. Leo Kanner first described autism in 1943, and
subsequent twin and family studies have demonstrated a substantial genetic component
underlying ASD. A marked increase in the prevalence of ASD has been noted in the last
decade, and the most recent estimate suggests a prevalence of 1 in 88. A four- to
ten-fold male preponderance of ASD suggests the existence of sex-specific risk alleles
and the possibility of a recessive susceptibility locus on the X chromosome. In the last
eight years, copy number variation (CNV) has been appreciated as a rich source of both
inherited and de novo human genomic variation.
Technological advances in array Comparative Genomic Hybridization (aCGH), the
common microarray-based assay used to assess copy number state, have enabled
detection of increasingly small variants. We developed a custom high-density array
of 2.1 million oligonucleotide probes dedicated to the X chromosome. Non-repetitive
sequence is probed at a resolution of one probe per 50 bp across 96.8 megabases of
unique sequence. We further increased the stringency, and thus the rigor, with which
samples are interrogated by developing a hybridization and wash protocol for the
Tecan HSPro 4800. Use of this instrument standardized hybridization, wash, and drying
conditions across all samples.
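As a rough consistency check on the density figures above (not a calculation taken from the dissertation itself), tiling 96.8 Mb of unique sequence at one probe per 50 bp calls for about 1.94 million probes, roughly 92% of the 2.1 million probes on the array. The sketch below assumes the 50 bp figure is an average spacing; any probes beyond that tiling are not accounted for here.

```python
# Back-of-the-envelope check of the probe density quoted in the abstract.
# Assumption: "one probe per 50 bp" is an average spacing across the
# 96.8 Mb of non-repetitive X-chromosome sequence.

UNIQUE_SEQUENCE_BP = 96_800_000   # 96.8 Mb of unique X-chromosome sequence
PROBE_SPACING_BP = 50             # one probe per 50 bp
TOTAL_PROBES = 2_100_000          # 2.1 million probes on the custom array


def probes_needed(sequence_bp: int, spacing_bp: int) -> int:
    """Number of probes needed to tile sequence_bp at a fixed average spacing."""
    return sequence_bp // spacing_bp


needed = probes_needed(UNIQUE_SEQUENCE_BP, PROBE_SPACING_BP)
print(f"Probes implied by 1 probe/{PROBE_SPACING_BP} bp: {needed:,}")  # → 1,936,000
print(f"Fraction of the 2.1M probes: {needed / TOTAL_PROBES:.1%}")     # → 92.2%
```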
Using our custom X-chromosome CGH microarrays, we screened three cohorts for
X-chromosome CNV. The first cohort comprised 100 ASD males from the Autism Genetic
Resource Exchange (AGRE) collection. The second cohort, 64 ASD males, was drawn from the
Simons Simplex Collection (SSC). The third cohort consisted of 100 males from a National
Institute of Mental Health (NIMH) control population, in which individuals were selected
as controls for the study of neuropsychiatric disease. In total, 164 ASD males and 100
non-ASD males were evaluated for X-chromosome CNV.
Table of Contents
ABSTRACT COVER PAGE
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
I.I.a Background and prevalence of the autism spectrum disorders
I.I.b A strong genetic component underlies ASD
I.I.c Genes and loci causal and implicated in ASD
I.II Known genomic structural changes in ASD
I.II.a Knowledge of CNV as discerned from oligonucleotide-array based studies
I.II.b CNV in the ASD
I.III A role for the X chromosome in ASD
I.III.a A hemizygous X chromosome as a susceptibility to ASD
I.III.b Sex-specific risk loci exist in ASD
I.III.c Skewed X-inactivation is increased in autistic females
I.IV.a Summary of autism, CNV in disease, and the X chromosome in ASD
I.IV.b Identification of novel ASD loci: Hypothesis and Experimental design
I.IV.c Summary of our experiments and samples studied
II. OPTIMIZATION OF THE NIMBLEGEN ARRAY COMPARATIVE GENOMIC HYBRIDIZATION PROTOCOL
II.I.a Properties of Oligonucleotide Microarray Data
II.I.b Specifications for the CGH arrays (385K, 2.1M)
II.I.c General description of the manufacturer's protocol
II.I.d Experimental steps that were altered
II.II Optimized Steps
II.II.a Experimental Protocol Changes and quality controls instituted
II.II.b Computational Protocol Changes and Quality Control Steps Instituted
II.III.a Results from Experimental Changes
II.III.b Results from Computational Changes
II.III.c Results from the Sum of All Optimized (Experimental and Computational) Steps
III. X CHROMOSOME COPY NUMBER VARIATION AND BREAKPOINT ANALYSIS
III.I Introduction to Copy Number Variation on the X Chromosome
III.II Copy Number Changes Identified by array CGH
III.II.a Study One: Four individuals from AGRE on the 385K array using the NimbleGen protocol
III.II.b Study Two: Fifty individuals from AGRE on the 2.1M array using the NimbleGen protocol
III.II.c Study Three: 100 affected males from AGRE and 64 affected males from SSC on the 2.1M array using the Optimized protocol
III.II.d Study Four: 100 unaffected males from NIMH on the 2.1M array using the NimbleGen protocol
III.III Validated Copy Number Changes
III.III.a Characteristics of validation assays developed for CNV identified in AGRE and SSC cohorts
III.III.b Inheritance and Population Data for validated CNV
III.III.c Candidate genes that have been validated and remain to be validated
III.IV Analysis of Junction Sequence
III.IV.a Validated CNV with breakpoint sequencing
III.IV.b Analysis of sequence motifs at breakpoint junctions
III.IV.c Development and evaluation of randomly chosen breakpoint sequences
IV.I The Optimization of array Comparative Genomic Hybridization and CNV Identification
IV.II Copy Number Variation in Autistic and Normal Populations
IV.II.a Chromosome X CNV in individuals with autism
IV.II.b CNV identification in 102 normal males
IV.III Why might we have not found loci involved with ASD?
IV.III.a The technological limitations of array CGH
IV.III.b Consistency of sample phenotyping in autism
IV.IV Alternative hypotheses that may explain our findings
IV.IV.a Our cohorts are comprised of a genetically heterogeneous population
IV.IV.b Observed CNV may have reduced penetrance in a normal population
IV.IV.c CNV identified in our cohorts may require additionally altered loci for a manifestation of autism
IV.IV.d Partial inactivation of X chromosome loci can act as susceptibility loci to ASD in males.
IV.V Final Summary
V. SUBJECTS, MATERIALS, and METHODS
V.I Sample DNA
V.I.a Autism Genetic Resource Exchange (AGRE)
V.I.b Simons Simplex Collection (SSC)
V.I.c National Institute of Mental Health (NIMH) control population
V.I.d Array Comparative Genomic Hybridization (aCGH) Reference DNA: NA10851
V.I.e Validation control DNA
V.II AGRE Pedigree Selection
V.III Database of Genomic Variants: CNV and indel analysis
V.IV Comparative Genomic Hybridization (CGH) Arrays
V.V aCGH Protocol and Scanning
V.V.a NimbleGen's protocol for sample processing, array hybridization and wash, and
V.V.b Optimized protocol for sample processing, array hybridization and wash
V.V.c Array scanning
V.VI CNV identification
V.VI.a Create .pair reports in NimbleScan
V.VI.b Remove poorly behaving probes (5% of all experimental)
V.VI.c Run NimbleScan analysis algorithm (segMNT)
V.VI.d Quality Control
V.VII Analysis of CNV identified by the NimbleGen and Optimized protocols
V.VII.a All CNV called by the NimbleGen and Optimized protocols
V.VII.b CNV characteristics generated by the NimbleGen and Optimized protocols
V.VII.c Analysis of false positive and true positive calls
V.VIII Probe Analysis
V.VIII.a Identification of 'poorly' behaving probes
V.VIII.b Evaluation of 'poor' and 'good' probes
V.VIII.c Evaluation of probe log2 values by processing protocol: NimbleGen versus
V.VIII.d Evaluation of probe coverage (probes/kb) by processing protocol: NimbleGen
V.IX Select high-confidence segments or copy number variants
V.IX.a Select segments greater than one standard deviation from the mean
V.IX.b Identify and merge multiple segments calling a single, variant locus
V.IX.c Select segments which have more than nine probes/kb
V.IX.d Remove samples with relatively high call rates
V.X Validation of array identified CNV
V.X.a PCR Confirmation
V.X.b Breakpoint sequencing
V.XI Junction analysis
V.XI.a Literature breakpoint evaluation
V.XI.b Single nucleotide insertion or deletion at the breakpoint
V.XI.c Development of random data set
V.XI.d Evaluation of Real and Random CNV for the frequency of motif occurrence at the
V.XII Assess functional activity of GRIA3 deletion: Luciferase Reporter Assay
V.XII.a Ligate region of interest into reporter backbone
V.XII.b Transfect reporter plasmids into Neuro2A cells
V.XII.c Assay luciferase activity
V.XII.d Evaluate luciferase data
A.I Tables A.1 - A.4
A.II Table A.5
A.III Tables A.6.a-b
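Sections V.IX.a-d list the filters used to select high-confidence segments from the segMNT output. The front matter does not define them precisely, so the following Python is only a minimal sketch under stated assumptions: the `Segment` fields, the one-SD threshold computed over a sample's own segments, the 1 kb merge gap, and the mean + 2 SD call-rate cutoff are all hypothetical choices made for illustration, not the author's parameters.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Segment:
    """One segment call (hypothetical fields, not the NimbleScan file format)."""
    start: int      # genomic start (bp), inclusive
    end: int        # genomic end (bp), inclusive
    log2: float     # mean log2 ratio (test / reference) across the segment
    n_probes: int   # number of probes supporting the segment

    @property
    def probes_per_kb(self) -> float:
        return self.n_probes / ((self.end - self.start + 1) / 1000)


def beyond_one_sd(segments):
    """V.IX.a: keep segments whose log2 ratio is more than one SD from the mean.

    Assumption: mean and SD are taken over all segments of one sample.
    """
    if not segments:
        return []
    values = [s.log2 for s in segments]
    mu, sd = mean(values), pstdev(values)
    return [s for s in segments if abs(s.log2 - mu) > sd]


def merge_adjacent(segments, max_gap_bp=1000):
    """V.IX.b: merge same-direction segments separated by <= max_gap_bp.

    max_gap_bp is a hypothetical parameter chosen for illustration.
    """
    merged = []
    for s in sorted(segments, key=lambda seg: seg.start):
        if merged:
            prev = merged[-1]
            same_direction = (prev.log2 > 0) == (s.log2 > 0)
            if same_direction and s.start - prev.end <= max_gap_bp:
                n = prev.n_probes + s.n_probes
                # Probe-weighted mean log2 ratio of the combined call.
                log2 = (prev.log2 * prev.n_probes + s.log2 * s.n_probes) / n
                merged[-1] = Segment(prev.start, max(prev.end, s.end), log2, n)
                continue
        merged.append(s)
    return merged


def dense_enough(segments, min_probes_per_kb=9.0):
    """V.IX.c: keep segments with more than nine probes per kb."""
    return [s for s in segments if s.probes_per_kb > min_probes_per_kb]


def high_confidence_calls(segments):
    """Apply V.IX.a-c, in that order, to one sample's segments."""
    return dense_enough(merge_adjacent(beyond_one_sd(segments)))


def drop_high_call_rate_samples(calls_by_sample, n_sd=2.0):
    """V.IX.d: drop samples whose call count exceeds mean + n_sd * SD.

    The mean + 2 SD cutoff is an assumption; the source says only
    "relatively high call rates".
    """
    if not calls_by_sample:
        return {}
    counts = [len(calls) for calls in calls_by_sample.values()]
    cutoff = mean(counts) + n_sd * pstdev(counts)
    return {name: calls for name, calls in calls_by_sample.items()
            if len(calls) <= cutoff}
```

Running `high_confidence_calls` on each sample and then `drop_high_call_rate_samples` across the cohort follows the order of V.IX.a-d; merging before the density filter means the gap between merged pieces counts toward the span, which is one of several defensible choices.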
LIST OF TABLES
Table 1.1 CNV detection and genotyping in autistic and unaffected populations
Table 2.1 Number of high-variance probes.
Table 2.2 'Poor' and Well-Behaving Probes Evaluated Across Four Parameters
Table 2.3 Probe Behavior by NimbleGen and Optimized Protocols
Table 2.4 CNV identified by the NimbleGen and Optimized protocols.
Table 2.5 CNV characteristics by the NimbleGen and Optimized protocols.
Table 2.6 True and False positive loci by the NimbleGen and Optimized protocols.
Table 3.1.a All Copy Number Variants Identified by high-density aCGH
Table 3.1.b Distinct Copy Number Variants Identified by high-density aCGH
Table 3.2 Characteristics of validated copy number variants.
Table 3.3 CNV genotyping in progress for autistic and unaffected populations.
Table 3.4 Validated and bidirectionally sequenced CNV breakpoints identified in our study.
Table 3.5 Summary of articles identified in the literature.
Table 3.6 Characteristics of breakpoint sequenced CNV from the literature and our studies.
Table 3.7.a Characteristics of junction breaks from our study and the literature: Deletion and Duplication
Table 3.7.b Characteristics of junction breaks from our study and the literature: Homology Characteristics
Table 3.7.c Characteristics of junction breaks from our study and the literature: Insertion Characteristics
Table 3.8 Motifs previously associated with genomic rearrangement.
Table 3.9 Motifs with literature frequencies significantly different from random.
Table A.1 CNV from four AGRE samples run on 385K arrays by the NimbleGen protocol.
Table A.2 CNV from 50 AGRE samples run on 2.1M arrays by the NimbleGen protocol.
Table A.3 CNV from 100 AGRE and 64 SSC samples run on 2.1M arrays by the Optimized protocol.
Table A.4 CNV from 102 NIMH samples run on 2.1M arrays by NimbleGen protocol.
Table A.5 Sequenced Breakpoints Identified in the Literature
Table A.6.a Truly called CNV by the NimbleGen and Optimized protocols.
Table A.6.b Falsely called CNV by the NimbleGen and Optimized protocols.
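Sections V.XI.c-d, Table 3.9, and Figure 3.9.a describe comparing motif frequencies at real breakpoint junctions against a random data set of 1,000 sequences. How that random set was built is not stated in the front matter; one plausible approach, assumed in this sketch, draws random sequences matched in length and expected GC content to each real junction sequence and reports an empirical one-sided p-value. The function names and the GC-matching strategy are hypothetical.

```python
import random
import re


def count_motif(seq: str, motif: str) -> int:
    """Count overlapping occurrences of motif in seq (case-insensitive)."""
    return len(re.findall(f"(?={re.escape(motif.upper())})", seq.upper()))


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def random_like(seq: str, rng: random.Random) -> str:
    """Random sequence with the same length and expected GC fraction as seq."""
    gc = gc_fraction(seq)
    weights = [gc / 2, gc / 2, (1 - gc) / 2, (1 - gc) / 2]
    return "".join(rng.choices("GCAT", weights=weights, k=len(seq)))


def motif_enrichment(real_seqs, motif, n_random=1000, seed=0):
    """Compare a motif's count in real junction sequences with random sets.

    Returns (observed count, mean random count, empirical one-sided p-value).
    """
    rng = random.Random(seed)
    observed = sum(count_motif(s, motif) for s in real_seqs)
    null_counts = [
        sum(count_motif(random_like(s, rng), motif) for s in real_seqs)
        for _ in range(n_random)
    ]
    # Add-one correction keeps the empirical p-value strictly positive.
    p_value = (1 + sum(c >= observed for c in null_counts)) / (n_random + 1)
    return observed, sum(null_counts) / n_random, p_value
```

Matching GC content per sequence keeps a GC-rich motif from looking enriched simply because junction sequence is GC-rich; a shuffle-based null that preserves exact base composition would be an equally reasonable alternative.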
LIST OF FIGURES
Figure 1.1.a Proportion of References and CNV or indels reported in the DGV.
Figure 1.1.b Proportion of CNV or indels reported in the DGV by chromosome.
Figure 2.1 NimbleGen and Optimized DNA labeling strategy.
Figure 2.2 Scanning parameters as informed by the Intensity Distribution Histogram
Figure 2.3.a Variance Analysis of Subarray A01 Probes
Figure 2.3.b Variance Analysis of Subarray A02 Probes
Figure 2.3.c Variance Analysis of Subarray A03 Probes
Figure 2.4.a Acceptable MA Plot
Figure 2.4.b Poor MA Plot #1
Figure 2.4.c Poor MA Plot #2
Figure 2.5 Hypothetical example of merging multiple segments representing a single, deleted locus.
Figure 2.6 Array performance by NimbleGen and Optimized protocols.
Figure 2.7.a Boxplots of Good and Bad probes by Length.
Figure 2.7.b Boxplots of Good and Bad probes by GC Content.
Figure 2.7.c Boxplots of Good and Bad probes by AG Content.
Figure 2.7.d Boxplots of Good and Bad probes by melting temperature.
Figure 2.8.a CNV call overlap of NimbleGen and Optimized protocols: All calls
Figure 2.8.b CNV call overlap of NimbleGen and Optimized protocols: All deletions
Figure 2.8.c CNV call overlap of NimbleGen and Optimized protocols: All duplications
Figure 2.9.a CNV size distributions by the NimbleGen and Optimized protocols.
Figure 2.9.b CNV probes/kb by the NimbleGen and Optimized protocols.
Figure 2.9.c CNV GC content by the NimbleGen and Optimized protocols.
Figure 2.10.a Overlap of CNV truly called by NimbleGen and Optimized protocols.
Figure 2.10.b Overlap of CNV falsely called by NimbleGen and Optimized protocols.
Figure 2.11.a NimbleGen Protocol: Proportion of true and false calls plotted by GC Content
Figure 2.11.b Optimized Protocol: Proportion of true and false calls plotted by GC Content
Figure 3.1 Proportion of References and CNV or indels reported in the DGV.
Figure 3.2.a Size distribution of CNV and Indels from the Database of Genomic Variants.
Figure 3.2.b Size distribution of CNV and indels from all autosomes reported by Conrad et al.
Figure 3.2.c Size distribution of CNV and indels identified in the AGRE cohort following the NimbleGen protocol.
Figure 3.2.d Size distribution of CNV and indels from the AGRE cohort following the Optimized protocol.
Figure 3.2.e Size distribution of CNV and indels from the NIMH cohort following the NimbleGen protocol.
Figure 3.3 Proportion of true and false calls plotted by GC Content.
Figure 3.4 2.5 kb intragenic deletion of FRMPD4.
Figure 3.5 561 bp deletion 1.5 kb upstream of GRIA3.
Figure 3.6 The GRIA3 promoter deletion has increased activity.
Figure 3.7.a Literature breakpoint sequence distribution throughout the genome.
Figure 3.7.b Median size of literature breakpoint sequence by chromosome.
Figure 3.8.a Size distribution of homologies at breakpoints.
Figure 3.8.b Grouped size distribution of homologies at breakpoints.
Figure 3.8.c Size distribution of insertions at breakpoints.
Figure 3.9.a Average GC content of 1,000 Random Sequences and Real Sequence.
Figure 3.9.b Average N content of 1,000 Random Sequences and Real Sequence.
Figure 3.10 Schematic for breakpoint junction analysis.
Figure 4.1 Three duplication calls map to SYP, an X-linked mental retardation gene.
Figure 5.1 NimbleGen aCGH Protocol used.
Figure 5.2 Optimized aCGH Protocol.
Figure 5.3 Scanning parameters as informed by the Intensity Distribution Histogram
Figure 5.4 Probe Variance Analysis
Figure 5.5 Hypothetical example of merging multiple segments representing a single, deleted locus.
Figure 5.6 Histogram of the number of CNV calls per individual.
Figure 5.7 Validation of an array identified deletion.
Figure 5.8 Validation strategy for tandem duplications.
Figure 5.9 Validation of an array identified duplication.
Figure 5.10 Schematic for breakpoint junction analysis.
About this Dissertation
High Resolution X Chromosome Copy Number Variation in Autism, 2018-08-28