A Biochemical Model of Hybridization on DNA Microarrays and its Application to Single Nucleotide Polymorphism and Copy Number Variation Genotyping in Trisomy 21 Individuals

Jakubek, Yasminka Aleksandra (2014)

Permanent URL: https://etd.library.emory.edu/concern/etds/bk128b655?locale=fr
Published

Abstract

DNA microarrays have several uses in biological research. In the field of human genetics, they are used to characterize genome-wide patterns of variation. Affymetrix Genome-Wide Human SNP Array 6.0 microarrays genotype ~900,000 single nucleotide polymorphisms (SNPs) and copy number variants (CNVs) across the genome. Analysis methods for genotyping arrays rely on statistical approaches to generate accurate data. Two recurrent problems with these methods are evident. The first is the existence of batch effects. The second is that these approaches often discard a large fraction of the raw data, namely data from probes that systematically fail across experiments. To understand and address these problems, we developed a novel analysis method based on a low-level model of hybridization on microarrays. We model binding between all probe-DNA duplexes that form on the array. In addition, we model errors in probe synthesis, hybridization conditions (temperature, salt concentration), and details of the experimental protocol (target concentration, target fragmentation, wash stringency, and scanner settings). We used this model to predict probe intensities. The average correlation between expected and observed intensities was 0.701 (range 0.55 to 0.88). In this model, batch effects are caused by differences in probe synthesis efficiency, target concentration, target fragmentation, wash stringency, and scanner settings. We used the model to develop a SNP and CNV genotyping algorithm that explicitly accounts for batch effects and cross-hybridization. Our approach allows chips to be analyzed individually and can call SNPs and CNVs on chromosomes of any ploidy. We used this approach to analyze Down syndrome and normal samples. A substantial fraction (13%) of the SNPs targeted by the Affymetrix 6.0 array have high levels of cross-hybridization. Each SNP call has a quality score (QS). SNPs on trisomic chromosomes had lower quality scores (57% with QS > 0.99) than SNPs on diploid chromosomes (84% with QS > 0.99). Our approach uses direct estimates of DNA concentration to call CNVs. We called an average of 50 CNVs per sample, of which 68% are in known CNV regions. Using only first principles, our method detects genetic variants with an accuracy comparable to that of current approaches.
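To illustrate what calling SNPs "on chromosomes of any ploidy" implies for the set of candidate genotypes, here is a minimal sketch. It is not the dissertation's algorithm: the function name `possible_genotypes` and the allele labels "A" and "B" are assumptions made for this example, and the sketch only enumerates the unordered genotypes a caller would have to choose among at a biallelic SNP with a given copy number.

```python
from itertools import combinations_with_replacement


def possible_genotypes(ploidy, alleles=("A", "B")):
    """Enumerate unordered genotypes for one SNP at a given copy number.

    `ploidy` is the number of chromosome copies carrying the SNP
    (2 for a diploid chromosome, 3 for a trisomic one).
    The allele labels are illustrative placeholders.
    """
    if ploidy < 0:
        raise ValueError("ploidy must be non-negative")
    # Each genotype is a multiset of alleles of size `ploidy`.
    return ["".join(g) for g in combinations_with_replacement(alleles, ploidy)]


if __name__ == "__main__":
    print(possible_genotypes(2))  # ['AA', 'AB', 'BB']
    print(possible_genotypes(3))  # ['AAA', 'AAB', 'ABB', 'BBB']
```

A trisomic chromosome has two distinct heterozygous states (AAB and ABB) where a diploid chromosome has one (AB), so a caller has more closely spaced classes to separate, which is consistent with the lower quality scores reported above for SNPs on trisomic chromosomes.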

Table of Contents

Chapter 1

General Introduction

References

Chapter 2

A Model of Binding on DNA Microarrays: Understanding the Combined Effect of Probe Synthesis Failure, Cross-Hybridization, DNA Fragmentation and other Experimental Details of Affymetrix Arrays

Abstract

Background

Methods

Results

Discussion

References

Chapter 3

SNP and CNV Detection in Normal and Trisomy 21 Individuals using a First-Principles Approach

Abstract

Introduction

Methods

Results

Discussion

References

Chapter 4

Conclusions

References

Figures and Tables

Chapter 1

General Introduction

Chapter 2

A Model of Binding on DNA Microarrays: Understanding the Combined Effect of Probe Synthesis Failure, Cross-Hybridization, DNA Fragmentation and other Experimental Details of Affymetrix Arrays

Table 1. Synthesis Errors

Table 2. Correlations

Table 3. Incorporation Rates

Table 4. Retention Rates

Figure 1. Probe and Target Sequences

Figure 2. Probe-Target Binding

Figure 3. Gompertz Curve

Figure 4. Observed and Expected Intensity Plots

Figure 5. Nearest-Neighbor Parameter Search

Figure 6. Predicted Effect of Mismatches on Intensity

Supplementary Table 1. Entropy and Enthalpy NN values

Supplementary Table 2. Individual Chip Data

Supplementary Figure 1. Forward/Reverse Probe intensities as a function of nucleotide composition

Chapter 3

SNP and CNV Detection in Normal and Trisomy 21 Individuals using a First-Principles Approach

Table 1. Data Summary

Table 2. Fraction of Mendelian Consistent SNPs

Table 3. Duplicate Agreement, Heterozygous calls Trisomy 21

Table 4. Accuracy for Heterozygous Trisomy 21 calls

Table 5. Fraction of Autosomal Deletions and Duplications in DGV

Table 6. CNV Trio Data

Figure 1. Distribution of Chi-Square Values for Hardy-Weinberg Equilibrium

Figure 2. Duplicate Agreement

Figure 3. Genotype Combinations with Power

Chapter 4

Conclusions

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English