High Resolution X Chromosome Copy Number Variation in Autism Open Access

Ikeda, Morna Aiko (2012)

Permanent URL: https://etd.library.emory.edu/concern/etds/tx31qj52t?locale=en
Published

Abstract

The autism spectrum disorders (ASD) are a broadly defined set of developmental
disorders that include autism and Asperger syndrome. Individuals with ASD are defined
as having impairments in social interaction, deficiencies in communication, and
restricted and stereotyped behaviors and interests. Leo Kanner first described autism in
1943, and subsequent twin and family studies have demonstrated a substantial genetic
component underlying ASD. A marked increase in the prevalence of ASD has been noted
in the last decade, and the most recent estimate suggests a prevalence of 1:88. A four- to
ten-fold male preponderance of ASD suggests the existence of sex-specific risk alleles
and the possibility of a recessive susceptibility locus on the X-chromosome. In the last
eight years, copy number variation (CNV) has been appreciated as a rich source of both
inherited and de novo human genomic variation.
Technological advances in array Comparative Genomic Hybridization (aCGH), the
common microarray-based assay used to assess copy number state, have enabled
detection of increasingly small variants. We developed a custom high-density array
consisting of 2.1 million oligonucleotide probes dedicated to the X-chromosome.
Non-repetitive sequence is probed at a resolution of one probe per 50 bp across 96.8
megabases of unique sequence. We further enhanced the stringency, and thus the rigor,
with which samples are interrogated by developing a hybridization and wash protocol for
the Tecan HSPro 4800. Use of this machine standardized hybridization, wash,
and dry conditions across all samples.
Using our custom X-chromosome CGH microarrays, we screened three cohorts for
X-chromosome CNV. The first cohort was a series of 100 ASD males from the Autism
Genetic Resource Exchange collection. Our second cohort of 64 ASD males was derived
from the Simons Simplex Collection. Finally, our third cohort consisted of 100 males
from a National Institute of Mental Health control population where individuals were
selected as controls for the study of neuropsychiatric disease. In total, 164 ASD males
and 100 non-ASD males were evaluated for X-chromosome CNV.

Table of Contents

DISTRIBUTION AGREEMENT

APPROVAL SHEET

ABSTRACT COVER PAGE

ABSTRACT

COVER PAGE

ACKNOWLEDGMENTS

TABLE OF CONTENTS

LIST OF TABLES

LIST OF FIGURES

I. INTRODUCTION

I.I Introduction

I.I.a Background and prevalence of the autism spectrum disorders

I.I.b A strong genetic component underlies ASD

I.I.c Genes and loci causal and implicated in ASD

I.II Known genomic structural changes in ASD

I.II.a Knowledge of CNV as discerned from oligonucleotide-array based studies

I.II.b CNV in the ASD

I.III A role for the X chromosome in ASD

I.III.a A hemizygous X chromosome as a susceptibility to ASD

I.III.b Sex-specific risk loci exist in ASD

I.III.c Skewed X-inactivation is increased in autistic females

I.IV Conclusion

I.IV.a Summary of autism, CNV in disease, and the X chromosome in ASD

I.IV.b Identification of novel ASD loci: Hypothesis and Experimental design

I.IV.c Summary of our experiments and samples studied

I.V References

I.VI Tables

I.VII Figures

II. OPTIMIZATION OF THE NIMBLEGEN ARRAY COMPARATIVE GENOMIC HYBRIDIZATION PROTOCOL

II.I Background

II.I.a Properties of Oligonucleotide Microarray Data

II.I.b Specifications for the CGH arrays (385K, 2.1M)

II.I.c General description of the manufacturer's protocol

II.I.d Experimental steps that were altered

II.II Optimized Steps

II.II.a Experimental Protocol Changes and quality controls instituted

II.II.b Computational Protocol Changes and Quality Control Steps Instituted

II.III Results

II.III.a Results from Experimental Changes

II.III.b Results from Computational Changes

II.III.c Results from the Sum of All Optimized (Experimental and Computational) Steps

II.IV Summary

II.V Tables

II.VI Figures

III. X CHROMOSOME COPY NUMBER VARIATION AND BREAKPOINT ANALYSIS

III.I Introduction to Copy Number Variation on the X Chromosome

III.II Copy Number Changes Identified by array CGH

III.II.a Study One: Four individuals from AGRE on the 385K array using the NimbleGen protocol

III.II.b Study Two: Fifty individuals from AGRE on the 2.1M array using the NimbleGen protocol

III.II.c Study Three: 100 affected males from AGRE and 64 affected males from SSC on the 2.1M array using the Optimized protocol

III.II.d Study Four: 100 unaffected males from NIMH on the 2.1M array using the NimbleGen protocol

III.III Validated Copy Number Changes

III.III.a Characteristics of validation assays developed for CNV identified in AGRE and SSC cohorts

III.III.b Inheritance and Population Data for validated CNV

III.III.c Candidate genes that have been validated and remain to be validated

III.IV Analysis of Junction Sequence

III.IV.a Validated CNV with breakpoint sequencing

III.IV.b Analysis of sequence motifs at breakpoint junctions

III.IV.c Development and evaluation of randomly chosen breakpoint sequences

III.V Summary

III.VI References

III.VII Tables

III.VIII Figures

IV. CONCLUSION

IV.I The Optimization of array Comparative Genomic Hybridization and CNV Identification

IV.II Copy Number Variation in Autistic and Normal Populations

IV.II.a Chromosome X CNV in individuals with autism

IV.II.b CNV identification in 102 normal males

IV.III Why might we have not found loci involved with ASD?

IV.III.a The technological limitations of array CGH

IV.III.b Consistency of sample phenotyping in autism

IV.IV Alternative hypotheses that may explain our findings

IV.IV.a Our cohorts are comprised of a genetically heterogeneous population

IV.IV.b Observed CNV may have reduced penetrance in a normal population

IV.IV.c CNV identified in our cohorts may require additionally altered loci for a manifestation of autism

IV.IV.d Partial inactivation of X chromosome loci can act as susceptibility loci to ASD in males

IV.V Final Summary

IV.VI References

IV.VII Figures

V. SUBJECTS, MATERIALS, and METHODS

V.I Sample DNA

V.I.a Autism Genetic Resource Exchange (AGRE)

V.I.b Simons Simplex Collection (SSC)

V.I.c National Institute of Mental Health (NIMH) control population

V.I.d Array Comparative Genomic Hybridization (aCGH) Reference DNA: NA10851

V.I.e Validation control DNA

V.II AGRE Pedigree Selection

V.III Database of Genomic Variants: CNV and indel analysis

V.IV Comparative Genomic Hybridization (CGH) Arrays

V.V aCGH Protocol and Scanning

V.V.a NimbleGen's protocol for sample processing, array hybridization and wash, and scanning

V.V.b Optimized protocol for sample processing, array hybridization and wash

V.V.c Array scanning

V.VI CNV identification

V.VI.a Create .pair reports in NimbleScan

V.VI.b Remove poorly behaving probes (5% of all experimental)

V.VI.c Run NimbleScan analysis algorithm (segMNT)

V.VI.d Quality Control

V.VII Analysis of CNV identified by the NimbleGen and Optimized protocols

V.VII.a All CNV called by the NimbleGen and Optimized protocols

V.VII.b CNV characteristics generated by the NimbleGen and Optimized protocols

V.VII.c Analysis of false positive and true positive calls

V.VIII Probe Analysis

V.VIII.a Identification of 'poorly' behaving probes

V.VIII.b Evaluation of 'poor' and 'good' probes

V.VIII.c Evaluation of probe log2 values by processing protocol: NimbleGen versus Optimized

V.VIII.d Evaluation of probe coverage (probes/kb) by processing protocol: NimbleGen versus Optimized

V.IX Select high-confidence segments or copy number variants

V.IX.a Select segments greater than one standard deviation from the mean

V.IX.b Identify and merge multiple segments calling a single, variant locus

V.IX.c Select segments which have more than nine probes/kb

V.IX.d Remove samples with relatively high call rates

V.X Validation of array identified CNV

V.X.a PCR Confirmation

V.X.b Breakpoint sequencing

V.XI Junction analysis

V.XI.a Literature breakpoint evaluation

V.XI.b Single nucleotide insertion or deletion at the breakpoint

V.XI.c Development of random data set

V.XI.d Evaluation of Real and Random CNV for the frequency of motif occurrence at the breakpoints

V.XII Assess functional activity of GRIA3 deletion: Luciferase Reporter Assay

V.XII.a Ligate region of interest into reporter backbone

V.XII.b Transfect reporter plasmids into Neuro2A cells

V.XII.c Assay luciferase activity

V.XII.d Evaluate luciferase data

V.XIII References

V.XIV Tables

V.XV Figures

APPENDIX

A.I Tables A.1 - A.4

A.II Table A.5

A.III Tables A.6.a-b

A.IV Tables


TABLES

Table 1.1 CNV detection and genotyping in autistic and unaffected populations

Table 2.1 Number of high variance probes.

Table 2.2 'Poor' and Well Behaving Probes Evaluated Across Four Parameters

Table 2.3 Probe Behavior by NimbleGen and Optimized Protocols

Table 2.4 CNV identified by the NimbleGen and Optimized protocols.

Table 2.5 CNV characteristics by the NimbleGen and Optimized protocols.

Table 2.6 True and False positive loci by the NimbleGen and Optimized protocols.

Table 3.1.a All Copy Number Variants Identified by high-density aCGH

Table 3.1.b Distinct Copy Number Variants Identified by high-density aCGH

Table 3.2 Characteristics of validated copy number variants.

Table 3.3 CNV genotyping in progress for autistic and unaffected populations.

Table 3.4 Validated and bidirectionally sequenced CNV breakpoints identified in our study.

Table 3.5 Summary of articles identified in the literature.

Table 3.6 Characteristics of breakpoint sequenced CNV from the literature and our studies.

Table 3.7.a Characteristics of junction breaks from our study and the literature: Deletion and Duplication Proportions

Table 3.7.b Characteristics of junction breaks from our study and the literature: Homology Characteristics

Table 3.7.c Characteristics of junction breaks from our study and the literature: Insertion Characteristics

Table 3.8 Motifs previously associated with genomic rearrangement.

Table 3.9 Motifs with literature frequencies significantly different from random.

Table A.1 CNV from four AGRE samples run on 385K arrays by the NimbleGen protocol.

Table A.2 CNV from 50 AGRE samples run on 2.1M arrays by the NimbleGen protocol.

Table A.3 CNV from 100 AGRE and 64 SSC samples run on 2.1M arrays by the Optimized protocol.

Table A.4 CNV from 102 NIMH samples run on 2.1M arrays by NimbleGen protocol.

Table A.5 Sequenced Breakpoints Identified in the Literature

Table A.6.a Truly called CNV by the NimbleGen and Optimized protocols.

Table A.6.b Falsely called CNV by the NimbleGen and Optimized protocols.


FIGURES

Figure 1.1.a Proportion of References and CNV or indels reported in the DGV.

Figure 1.1.b Proportion of CNV or indels reported in the DGV by chromosome.

Figure 2.1 NimbleGen and Optimized DNA labeling strategy.

Figure 2.2 Scanning parameters as informed by the Intensity Distribution Histogram

Figure 2.3.a Variance Analysis of Subarray A01 Probes

Figure 2.3.b Variance Analysis of Subarray A02 Probes

Figure 2.3.c Variance Analysis of Subarray A03 Probes

Figure 2.4.a Acceptable MA Plot

Figure 2.4.b Poor MA Plot #1

Figure 2.4.c Poor MA Plot #2

Figure 2.5 Hypothetical example of merging multiple segments representing a single, deleted locus.

Figure 2.6 Array performance by NimbleGen and Optimized protocols.

Figure 2.7.a Boxplots of Good and Bad probes by Length.

Figure 2.7.b Boxplots of Good and Bad probes by GC Content.

Figure 2.7.c Boxplots of Good and Bad probes by AG Content.

Figure 2.7.d Boxplots of Good and Bad probes by melting temperature.

Figure 2.8.a CNV call overlap of NimbleGen and Optimized protocols: All calls

Figure 2.8.b CNV call overlap of NimbleGen and Optimized protocols: All deletions

Figure 2.8.c CNV call overlap of NimbleGen and Optimized protocols: All duplications

Figure 2.9.a CNV size distributions by the NimbleGen and Optimized protocols.

Figure 2.9.b CNV probes/kb by the NimbleGen and Optimized protocols.

Figure 2.9.c CNV GC content by the NimbleGen and Optimized protocols.

Figure 2.10.a Overlap of CNV truly called by NimbleGen and Optimized protocols.

Figure 2.10.b Overlap of CNV falsely called by NimbleGen and Optimized protocols.

Figure 2.11.a NimbleGen Protocol: Proportion of true and false calls plotted by GC Content

Figure 2.11.b Optimized Protocol: Proportion of true and false calls plotted by GC Content

Figure 3.1 Proportion of References and CNV or indels reported in the DGV.

Figure 3.2.a Size distribution of CNV and Indels from the Database of Genomic Variants.

Figure 3.2.b Size distribution of CNV and indels from all autosomes reported by Conrad et al.

Figure 3.2.c Size distribution of CNV and indels identified in the AGRE cohort following the NimbleGen protocol.

Figure 3.2.d Size distribution of CNV and indels from the AGRE cohort following the Optimized protocol.

Figure 3.2.e Size distribution of CNV and indels from the NIMH cohort following the NimbleGen protocol.

Figure 3.3 Proportion of true and false calls plotted by GC Content.

Figure 3.4 2.5 kb intragenic deletion of FRMPD4.

Figure 3.5 561 bp deletion 1.5 kb upstream of GRIA3.

Figure 3.6 The GRIA3 promoter deletion has increased activity.

Figure 3.7.a Literature breakpoint sequence distribution throughout the genome.

Figure 3.7.b Median size of literature breakpoint sequence by chromosome.

Figure 3.8.a Size distribution of homologies at breakpoints.

Figure 3.8.b Grouped size distribution of homologies at breakpoints.

Figure 3.8.c Size distribution of insertions at breakpoints.

Figure 3.9.a Average GC content of 1,000 Random Sequences and Real Sequence.

Figure 3.9.b Average N content of 1,000 Random Sequences and Real Sequence.

Figure 3.10 Schematic for breakpoint junction analysis.

Figure 4.1 Three duplication calls map to SYP, an X-linked Mental Retardation gene.

Figure 5.1 NimbleGen aCGH Protocol used.

Figure 5.2 Optimized aCGH Protocol.

Figure 5.3 Scanning parameters as informed by the Intensity Distribution Histogram

Figure 5.4 Probe Variance Analysis

Figure 5.5 Hypothetical example of merging multiple segments representing a single, deleted locus.

Figure 5.6 Histogram of the number of CNV calls per individual.

Figure 5.7 Validation of an array identified deletion.

Figure 5.8 Validation strategy for tandem duplications.

Figure 5.9 Validation of an array identified duplication.

Figure 5.10 Schematic for breakpoint junction analysis.

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English