Evaluation of the impact of DNA Sequence Variations on in vivo Transcription Factor Binding Affinity

Open Access

Jiang, Jiahui (Spring 2021)

Permanent URL: https://etd.library.emory.edu/concern/etds/x346d545t?locale=en
Published

Abstract

Genome-wide association studies (GWASs) have identified a large number of single nucleotide variants (SNVs), and thousands of SNVs within non-coding regions are associated with complex diseases. However, how non-coding SNVs affect disease is not yet clear. Recently, the number of studies focusing on the impact of these SNVs has been increasing rapidly. One possible mechanism is that some non-coding SNVs alter regulatory elements, for example by disrupting transcription factor (TF) binding sites, leading to changes in gene expression that result in disease. Traditionally, it has been assumed that SNVs within TF binding sites affect TF binding. However, a growing number of studies show that not all such SNVs affect TF binding, since most TF binding motifs are not well conserved. Therefore, more information is needed to annotate SNVs within TF binding sites. In this study, we conducted a comprehensive survey to quantify the impact of SNVs on TF binding affinity using a creative sequence-based machine learning method. We found that only 20% of SNVs within putative TF binding sites could significantly affect in vivo TF binding.

Table of Contents

1. Introduction

2. Method

2.1 Using Phenotype–Genotype Integrator (PheGenI) and UCSC Table Browser to find SNVs

2.2 Using PWM method to evaluate motif

2.3 Using gkm-SVM method to evaluate motif

3. Results

3.1 Correlation between PWM scores and gkm-SVM weights

3.2 Potential association between TFs and complex diseases

4. Discussion

5. References

6. Figures

7. Supplementary Materials

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English
