A Biochemical Model of Hybridization on DNA Microarrays and its Application to Single Nucleotide Polymorphism and Copy Number Variation Genotyping in Trisomy 21 Individuals
Jakubek, Yasminka Aleksandra (2014)
Abstract
DNA microarrays have several uses in biological research. In human genetics, they are used to characterize genome-wide patterns of variation. The Affymetrix Genome-Wide Human SNP Array 6.0 genotypes ~900,000 single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) across the genome. Analysis methods for genotyping arrays rely on statistical approaches to generate accurate data. Two recurrent problems with these methods are evident. The first is the existence of batch effects. The second is that these approaches often discard a large fraction of the raw data from probes that systematically fail across experiments. To understand and address these problems, we developed a novel analysis method based on a low-level model of hybridization on microarrays. We model binding between all probe-DNA duplexes that form on the array. In addition, we model errors in probe synthesis, hybridization conditions (temperature, salt concentration), and details of the experimental protocol (target concentration, target fragmentation, wash stringency, and scanner settings). We used this model to predict probe intensities. The average correlation between expected and observed intensities was 0.701, with a range of 0.55 to 0.88. In this model, batch effects are caused by differences in probe synthesis efficiency, target concentration, target fragmentation, wash stringency, and scanner settings. We used this model to develop a SNP and CNV genotyping algorithm that explicitly models batch effects and cross-hybridization. Our approach allows chips to be analyzed individually and can call SNPs and CNVs on chromosomes of any ploidy. We used this approach to analyze Down syndrome and normal samples. A significant percentage (13%) of the SNPs targeted by the Affymetrix 6.0 array have high levels of cross-hybridization. Each SNP call has a quality score (QS). SNPs on trisomic chromosomes had lower quality scores (57% with QS > 0.99) than SNPs on diploid chromosomes (84% with QS > 0.99). Our approach uses direct estimates of DNA concentration to call CNVs. We called an average of 50 CNVs per sample, of which 68% are in known CNV regions. Using only first principles, our method detects genetic variants with accuracy comparable to that of current approaches.
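To make the idea of a first-principles binding model concrete, the sketch below uses the standard textbook two-state approximation for duplex formation: a free energy computed from enthalpy and entropy at the hybridization temperature, followed by a Langmuir-type fraction of bound probes. It is an illustration only, not the dissertation's model, which also accounts for synthesis errors, fragmentation, washing, and scanner settings. The function names and all numerical values are hypothetical.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)


def duplex_free_energy(delta_h, delta_s, temp_c):
    """Return dG = dH - T*dS in kcal/mol.

    delta_h is in kcal/mol, delta_s in kcal/(mol*K), temp_c in Celsius.
    """
    temp_k = temp_c + 273.15
    return delta_h - temp_k * delta_s


def fraction_bound(delta_g, target_conc, temp_c):
    """Two-state (Langmuir) fraction of probes bound at equilibrium.

    target_conc is the free target concentration in mol/L.
    """
    temp_k = temp_c + 273.15
    k_eq = math.exp(-delta_g / (R * temp_k))  # association constant, L/mol
    return k_eq * target_conc / (1.0 + k_eq * target_conc)


# Hypothetical thermodynamic values, chosen for illustration only;
# they are not taken from the dissertation.
dg_match = duplex_free_energy(delta_h=-190.0, delta_s=-0.52, temp_c=50.0)
dg_mismatch = duplex_free_energy(delta_h=-180.0, delta_s=-0.50, temp_c=50.0)

theta_match = fraction_bound(dg_match, target_conc=1e-12, temp_c=50.0)
theta_mismatch = fraction_bound(dg_mismatch, target_conc=1e-12, temp_c=50.0)

print(f"perfect match: dG = {dg_match:.3f} kcal/mol, bound = {theta_match:.3f}")
print(f"mismatch:      dG = {dg_mismatch:.3f} kcal/mol, bound = {theta_mismatch:.3f}")
```

Under these assumed values, the less stable mismatched duplex occupies a smaller fraction of probes than the perfect match at the same target concentration. This is the qualitative behavior that lets a binding model separate specific signal from cross-hybridization.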
Table of Contents
Chapter 1
General Introduction
References
Chapter 2
A Model of Binding on DNA Microarrays: Understanding the Combined Effect of Probe Synthesis Failure, Cross-Hybridization, DNA Fragmentation and other Experimental Details of Affymetrix Arrays
Abstract
Background
Methods
Results
Discussion
References
Chapter 3
SNP and CNV Detection in Normal and Trisomy 21 Individuals using a First-Principles Approach
Abstract
Introduction
Methods
Results
Discussion
References
Chapter 4
Conclusions
References
Figures and Tables
Chapter 1
General Introduction
Chapter 2
A Model of Binding on DNA Microarrays: Understanding the Combined Effect of Probe Synthesis Failure, Cross-Hybridization, DNA Fragmentation and other Experimental Details of Affymetrix Arrays
Table 1. Synthesis Errors
Table 2. Correlations
Table 3. Incorporation Rates
Table 4. Retention Rates
Figure 1. Probe and Target Sequences
Figure 2. Probe-Target Binding
Figure 3. Gompertz Curve
Figure 4. Observed and Expected Intensity Plots
Figure 5. Nearest-Neighbor Parameter Search
Figure 6. Predicted Effect of Mismatches on Intensity
Supplementary Table 1. Entropy and Enthalpy NN values
Supplementary Table 2. Individual Chip Data
Supplementary Figure 1. Forward/Reverse Probe intensities as a function of nucleotide composition
Chapter 3
SNP and CNV Detection in Normal and Trisomy 21 Individuals using a First-Principles Approach
Table 1. Data Summary
Table 2. Fraction of Mendelian Consistent SNPs
Table 3. Duplicate Agreement, Heterozygous calls Trisomy 21
Table 4. Accuracy for Heterozygous Trisomy 21 calls
Table 5. Fraction of Autosomal Deletions and Duplications in DGV
Table 6. CNV Trio Data
Figure 1. Distribution of Chi-Square Values for Hardy-Weinberg Equilibrium
Figure 2. Duplicate Agreement
Figure 3. Genotype Combinations with Power
Chapter 4
Conclusions
Primary PDF
Title | Date Uploaded |
---|---|
A Biochemical Model of Hybridization on DNA Microarrays and its Application to Single Nucleotide Polymorphism and Copy Number Variation Genotyping in Trisomy 21 Individuals | 2018-08-28 15:26:36 -0400 |