Systematic Evaluation of The Impact of DNA Sequence Variants on The in vivo Binding Affinity at Transcription Factor Binding Sites

Open Access

Jin, Yutong (Fall 2018)

Permanent URL: https://etd.library.emory.edu/concern/etds/9k41zd523?locale=en%255D
Published

Abstract

The majority of the single nucleotide variants (SNVs) identified by genome-wide association studies (GWASs) fall outside protein-coding regions. Elucidating the functional implications of these variants has been a major challenge because functional annotation is lacking in the non-coding part of the genome. A possible mechanism for some functional non-coding variants is that they disrupt canonical transcription factor (TF) binding sites and thereby alter the in vivo binding affinity of the TF. However, not all variants located within TF binding sites will affect TF binding, since a substantial proportion of the positions in most TF binding motifs is not well conserved. Therefore, simply annotating all variants located in putative TF binding sites is not ideal. In this project, we conducted a comprehensive survey of the effect of SNVs on TF binding affinity. Using CTCF as an example, we found that mutations occurring at about 30% of the positions inside putative CTCF binding motif sites are likely to have a significant effect on TF-DNA binding. Our results provide key guidance on annotating variants in terms of their impact on TF binding.
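The point that only some motif positions matter can be illustrated with a position weight matrix (PWM) score difference, the quantity named in Section 2.2.1 of the table of contents. The following is a minimal sketch, not the thesis's actual pipeline: the four-position matrix `TOY_PPM`, the uniform background, and the function names `pwm_score` and `delta_pwm` are all hypothetical, and the matrix is not a real CTCF motif.

```python
import math

# Hypothetical 4-position motif given as per-position base probabilities.
# This is a toy matrix for illustration only, NOT a real CTCF motif.
TOY_PPM = [
    {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01},  # well conserved
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # not conserved
    {"A": 0.01, "C": 0.01, "G": 0.97, "T": 0.01},  # well conserved
    {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},  # weakly conserved
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def pwm_score(seq, ppm=TOY_PPM, background=BACKGROUND):
    """Sum of per-position log2 odds of the motif against the background."""
    if len(seq) != len(ppm):
        raise ValueError("sequence length must match motif length")
    return sum(math.log2(ppm[i][base] / background) for i, base in enumerate(seq))


def delta_pwm(ref_seq, pos, alt_base, ppm=TOY_PPM, background=BACKGROUND):
    """Change in PWM score when the base at 0-based `pos` becomes `alt_base`."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return pwm_score(alt_seq, ppm, background) - pwm_score(ref_seq, ppm, background)


if __name__ == "__main__":
    ref = "ACGA"
    # Same substitution type at a conserved and at a non-conserved position.
    print("conserved position 0, A>C:    ", delta_pwm(ref, 0, "C"))
    print("non-conserved position 1, C>G:", delta_pwm(ref, 1, "G"))
```

In this toy setting, a substitution at the well-conserved first position lowers the score sharply, while a substitution at the uniform second position leaves it unchanged, which is why annotating every variant inside a putative binding site as functional would overcall.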

Table of Contents

1. Introduction

2. Materials and Methods

2.1 Overview

2.2 Statistical Analysis Methods

2.2.1 Comparing correlation between difference in PWM score and weights

2.2.2 Using delta-SVM score to determine the impact of variations on each particular position

2.2.3 Determine the threshold value to identify significant SNPs

3. Result

4. Discussion

5. Reference

6. Tables and Figures

7. Supplementary Materials

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English