Transmission Bottleneck Size Estimation from De Novo Viral Genetic Variation
Open Access

Shi, Yike (Spring 2024)

Permanent URL: https://etd.library.emory.edu/concern/etds/8p58pf657?locale=en%255D
Published

Abstract

Sequencing of viral infections has become increasingly common over the last decade. Deep sequencing data in particular have proven useful in characterizing the roles that genetic drift and natural selection play in shaping within-host viral populations. They have also been used to estimate transmission bottleneck sizes from identified donor–recipient pairs. These bottleneck sizes quantify the number of viral particles that establish genetic lineages in the recipient host, and they are important to estimate because of their impact on viral evolution. Current approaches for estimating bottleneck sizes consider only the subset of viral sites that are observed as polymorphic in the donor individual. However, these approaches have the potential to substantially underestimate true transmission bottleneck sizes. Here, we present a new statistical approach that instead estimates bottleneck sizes from patterns of viral genetic variation that arise de novo within a recipient individual. Specifically, our approach makes use of the number of clonal viral variants observed in a transmission pair, defined as the number of viral sites that are monomorphic in both the donor and the recipient but carry different alleles. We first test our approach on a simulated dataset and then apply it to both influenza A virus sequence data and SARS-CoV-2 sequence data from identified transmission pairs. Our results confirm the existence of extremely tight transmission bottlenecks for these two respiratory viruses.
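The definition of a clonal variant given in the abstract (a site that is monomorphic in both donor and recipient but carries different alleles) can be illustrated with a minimal sketch. This is not the thesis's pipeline: the per-site input layout, the function names, and the `min_major` frequency cutoff for treating a site as monomorphic are all assumptions made here for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical input layout: one dict per genome site, mapping allele -> frequency.
SiteFreqs = Dict[str, float]


def fixed_allele(freqs: SiteFreqs, min_major: float = 0.97) -> Optional[str]:
    """Return the major allele if the site is treated as monomorphic, else None.

    The 0.97 default is an illustrative cutoff, not a value from the thesis.
    """
    if not freqs:
        return None
    allele, freq = max(freqs.items(), key=lambda kv: kv[1])
    return allele if freq >= min_major else None


def count_clonal_variants(
    donor: List[SiteFreqs],
    recipient: List[SiteFreqs],
    min_major: float = 0.97,
) -> int:
    """Count sites monomorphic in both hosts that carry different alleles."""
    count = 0
    for donor_site, recipient_site in zip(donor, recipient):
        donor_allele = fixed_allele(donor_site, min_major)
        recipient_allele = fixed_allele(recipient_site, min_major)
        if (
            donor_allele is not None
            and recipient_allele is not None
            and donor_allele != recipient_allele
        ):
            count += 1
    return count


# Toy data for four sites (made up for this example).
donor = [{"A": 1.0}, {"C": 0.6, "T": 0.4}, {"G": 1.0}, {"T": 0.99, "C": 0.01}]
recipient = [{"A": 1.0}, {"C": 1.0}, {"A": 1.0}, {"C": 1.0}]
print(count_clonal_variants(donor, recipient))
```

In this toy input the second site is excluded because it is polymorphic in the donor, which is exactly the class of sites that existing donor-iSNV approaches rely on. Only sites fixed for different alleles in the two hosts are counted.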

Table of Contents

Introduction

Methods

The Stochastic Within-Host Model

Derivation of the Probability Distribution for the Number of Clonal Variants

Results

Application to Simulated Data

Application to Empirical Data

Application to IAV

Application to SARS-CoV-2

Guarding Against the Erroneous Calling of Clonal Variants

Considering Alternative Distributions for the Initial Number of Viral Particles That Start an Infection

Discussion

Supplemental Material

Derivation of the probability distribution for the number of clonal variants

Rederivation of the Bozic et al. (2016) equation for the mean number of clonal variants

Calculation of the mean transmission bottleneck size 𝐍𝐛

Quantification of the number of clonal variants for the influenza A virus data set

Quantification of the number of clonal variants for the SARS-CoV-2 data set

Probability of a donor iSNV transmitting and fixing in a recipient

Supplemental Tables

Supplemental Figures

References

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English