From species-wide to single colony: Multi-scale analysis of the ubiquitous pathogen Staphylococcus aureus Open Access
Raghuram, Vishnu (Summer 2023)
Abstract
Whole genome sequencing (WGS) is a powerful tool for both large- and small-scale analysis of any given species. With the increasing accessibility of WGS, bacterial pathogens have been sequenced at an explosive rate over the past decade. This abundance of sequencing data has allowed us to answer questions about pathogens in the context of human infection like never before. In this dissertation, I use collections of sequences from a ubiquitous opportunistic pathogen, Staphylococcus aureus, as a model system to answer questions regarding speciation, genome evolution, mutation signatures, and human clinical sampling strategies. S. aureus is a prominent healthcare-associated pathogen that causes bloodstream, skin, and respiratory infections. S. aureus comprises many genomically distinct strains that undergo abundant but selective gene exchange. Therefore, diverse sampling of S. aureus is key to understanding the introduction and evolution of new lineages in a given population. Here, I (1) used a dataset of >80,000 S. aureus sequences to outline genomic characteristics that distinguish strains and substrains; (2) developed a software pipeline for rapid mutational analysis of specific loci and identified signatures of convergent evolution on a key S. aureus virulence regulator; (3) described optimal clinical sampling strategies for maximising observed genomic diversity; and finally (4) showed how strain-specific microdiversity can impact polymicrobial interactions. The overall goal of this work is to describe methods and provide resources to the broader scientific community for analysis of large bacterial sequence datasets, as well as sampling strategies for small-scale pathogen evolution studies.
Table of Contents
Chapter I – Introduction
Using WGS to study pathogen evolution across scales 1
Bacterial species, strains and genomes 3
Fig 1: Simplified schematic depicting two proposed mechanisms of speciation. 3
Fig 2: Bar chart showing number of bacterial short-read sequences deposited per year from January 2010 to February 2023 in NCBI. 5
Staphylococcus aureus 5
General background 5
Sequence Types, Clonal Complexes and genomic characterization 7
Fig 3: Core genome unrooted maximum likelihood phylogeny of 380 diverse S. aureus strains comprising several STs and CCs (25). 8
S. aureus and polymicrobial interactions 9
Quorum sensing and regulation of virulence 10
Fig 4: The agr (accessory gene regulator) operon is a global transcriptional regulator in S. aureus. 12
Clonal lineages and population structure 14
The Pangenome and species-wide diversity 16
The goals of this dissertation 18
Macrodiversity 18
Microdiversity 19
References 21
Chapter II - A species-scale pangenomic exploration of Staphylococcus aureus
Author contributions 30
Abstract 31
Importance 32
Introduction 33
Results 37
Filtering 37
Fig 1: Samples with high average minor allele frequency (MAF), low bactopia quality or high number of variants were filtered out. 37
Clustering 38
Fig 2: Sankey diagram showing the fate of 83,383 S. aureus genomes after processing and filtering. 39
Pangenome construction 40
Fig 3: Description of the pangenome of S. aureus. 40
Lineage assignment 41
Fig 4: Natural boundaries in core genome SNP distances can be used to categorise strains. 41
Gene discovery and lineage discovery 42
Fig 5: The dereplicated dataset provides an increased number of genes and an increased number of strain groups with the same number of genomes sampled compared to the total dataset. 43
Relationship between core and accessory genome 44
Fig 6: Prominent strain groups form their own clades on a core genome phylogeny and distinct clusters based on their accessory genome composition. 44
Fig S1: Pairwise core-genome SNP distance histogram (LEFT) and tSNE based on accessory genome composition (RIGHT) suggest CC1 S. aureus strains defined by MLST are likely multiple strain groups. 46
Lineage specific genes and fixation index 46
Fig S2: Intermediate FST genes show bimodal distribution of either high or low FST. 47
Fig 7: Mobile genetic elements were not associated with high or low FST in intermediate genes. 47
Fig 8: Strain-group specificity and co-occurrence of specific Staphylococcal toxins. 49
Fig S3: There are no agr group specific intermediate genes aside from agrD. 49
Discussion 51
Why our pangenome is “not another pangenome” 51
What did we learn about S. aureus? 51
What can we do further? 54
Methods 56
Genome collection and processing by Bactopia 56
Filtering low quality samples: 56
Filtering mixed strain samples: 56
Clustering and dereplication: 57
Pangenome analysis: 57
Statistical analysis and data visualisation: 58
References 59
Chapter III - Species-wide phylogenomics of the Staphylococcus aureus agr operon reveals convergent evolution of frameshift mutations
Author contributions 65
Abstract 66
Importance 67
Introduction 68
Results 71
AgrVATE: A tool for kmer based assignment of agr groups and agr operon frameshift detection 71
agr type distribution in the Staphopia database 74
Fig 1: Distribution of agr groups across 40,890 S. aureus genomes from the Staphopia database. 74
agr cluster recombination within clonal complexes is rare 76
Fig 2: AgrA evolves independently of agr group with only two major amino acid sequence configurations across S. aureus. 77
Non-functional agr operons are common across diverse S. aureus genomes 79
Fig 3: Presence of putative non-functional variants of the agr operon. 81
Some agr frameshift mutations have occurred repeatedly through convergent evolution 83
Fig 4: Identical agr mutations evolve independent of phylogeny across different clonal complexes. 85
Discussion 86
Methods 90
AgrVATE workflow: 90
Identifying a unique set of 31mers for each agr group: 91
agr operon and agr gene extraction: 91
Identifying variants in the agr operon: 92
Staphopia metadata 92
Whole genome phylogeny and Linkage Disequilibrium 93
Comparing indel rate of agr to other S. aureus global regulators 93
Classifiers for predicting frameshift mutations in the agr operon: 94
Dereplication of Staphopia database genomes: 95
Simulating mutant agr operons: 95
Calculating consistency indices: 96
Sputum sample collection and whole genome sequencing: 96
Data availability 98
Supplementals 98
Acknowledgements 98
References 100
Chapter IV - Comparison of genomic diversity between single and pooled Staphylococcus aureus colonies isolated from human colonisation cultures
Author contributions 110
Abstract 111
Data summary 113
Importance 113
Introduction 114
Results 116
Fig 1: Schematic representation of colony collection strategy, names, and descriptions of isolate groups analysed in this study. 117
82% of collections (eight single genomes) and their corresponding pools have only one Multilocus Sequence Type (MLST). 118
Fig 2: Pairwise SNP distance between and within collections. 119
Pool-seq samples with elevated average minor allele frequency, elevated number of contigs, higher nucleotide diversity and untypable MLST were associated with strain mixtures 120
Fig 3: Assembly quality can be used to assess population heterogeneity. 122
Fig 4: Average MAF and average π can be used to detect multi-ST pools 126
Fig S1: Variation in pools was primarily driven by contamination and allelic diversity. 128
Table S1: Summary of all five principal components (PC1 - PC5) for five parameters used in Fig S1. All 254 pools and 2032 singles were used for principal component analysis. 129
Numbers of variants in pool-seq and eight singles from the same sample are correlated, but pool-seq had a greater number 129
Fig 5: Pools were a better representation of the total number of variants in the population 132
Numbers of segregating sites in pools and singles from the same sample are positively correlated 133
Fig 6: Allelic variation in pools and singles from the same sample were positively correlated 135
A median of one more AMR gene was detected in the pools compared to singles 136
Fig 7: A median of one additional AMR class can be observed in the pools compared to singles. 138
Discussion 139
Fig 8: Number of new variants or new AMR genes observed with the addition of more sequencing runs. 140
Methods 143
Strain sampling 143
Library preparation and sequencing 144
Genome assembly, annotation and variant calling using Bactopia 144
Pairwise SNP distance calculation, dereplication, and phylogeny 145
Number of variants, segregating sites, and allele frequency calculation 145
Logistic regression 146
Statistical analyses and data visualisation 147
Data availability 147
Acknowledgements 147
References 148
Chapter V - Staphylococcus aureus and Pseudomonas aeruginosa isolates from the same cystic fibrosis respiratory sample coexist in coculture
Author contributions 153
Abstract 154
Importance 155
Introduction 156
Materials and Methods 158
Bacterial strains 158
Coculture assay 158
Statistical analysis 159
Results 161
S. aureus survives better with its co-infecting CF P. aeruginosa 161
Table 1: Survival of S. aureus (Sa) isolates when cocultured with concurrently isolated P. aeruginosa (Pa), grouped by patient ID. 163
Figure 1: S. aureus (Sa) survives better with its co-infecting cystic fibrosis (CF) P. aeruginosa (Pa). 164
Figure 2: P. aeruginosa (Pa) survives similarly with its co-infecting cystic fibrosis (CF) S. aureus (Sa) and JE2. 165
P. aeruginosa survives similarly with its co-infecting CF S. aureus as it does with JE2 167
Discussion 168
Supplementals 171
Acknowledgments 171
References 173
Chapter VI – Conclusions and future directions
Macrodiversity 177
Microdiversity 182
Fig 1: Relative proportions and relative genetic identity are major determinants of our ability to detect sequence mixtures 183
Infectious disease microbiology in the era of big data 187
Fig 2: Bar chart showing total number of short-read sequences available per species from January 2010 to February 2023 in NCBI. 188
References 189
Primary PDF
Title | Date Uploaded |
---|---|
From species-wide to single colony: Multi-scale analysis of the ubiquitous pathogen Staphylococcus aureus | 2023-05-31 12:22:44 -0400 |