From species-wide to single colony: Multi-scale analysis of the ubiquitous pathogen Staphylococcus aureus Open Access

Raghuram, Vishnu (Summer 2023)

Permanent URL: https://etd.library.emory.edu/concern/etds/qj72p873f?locale=zh (accessed 20 Jul. 2024)
Published

Abstract

Whole genome sequencing (WGS) is a powerful tool for both large- and small-scale analysis of any given species. With the increasing accessibility of WGS, bacterial pathogens have been sequenced at an explosive rate over the past decade. This abundance of sequencing data has allowed us to answer questions about pathogens in the context of human infection like never before. In this dissertation, I use collections of sequences from a ubiquitous opportunistic pathogen, Staphylococcus aureus, as a model system to answer questions regarding speciation, genome evolution, mutation signatures, and human clinical sampling strategies. S. aureus is a prominent healthcare-associated pathogen that causes bloodstream, skin, and respiratory infections. S. aureus comprises many genomically distinct strains that undergo abundant but selective gene exchange. Therefore, diverse sampling of S. aureus is key to understanding the introduction and evolution of new lineages in a given population. Here, I (1) used a dataset of >80,000 S. aureus sequences to outline genomic characteristics that distinguish strains and substrains; (2) developed a software pipeline for rapid mutational analysis of specific loci and identified signatures of convergent evolution on a key S. aureus virulence regulator; (3) described optimal clinical sampling strategies for maximising observed genomic diversity; and finally (4) showed how strain-specific microdiversity can impact polymicrobial interactions. The overall goal of this work is to describe methods and provide resources to the broader scientific community for analysis of large bacterial sequence datasets as well as sampling strategies for small-scale pathogen evolution studies.

Table of Contents

Chapter I – Introduction

Using WGS to study pathogen evolution across scales 1

Bacterial species, strains and genomes 3

Fig 1: Simplified schematic depicting two proposed mechanisms of speciation. 3

Fig 2: Bar chart showing number of bacterial short-read sequences deposited per year from January 2010 to February 2023 in NCBI. 5

Staphylococcus aureus 5

General background 5

Sequence Types, Clonal Complexes and genomic characterization 7

Fig 3: Core genome unrooted maximum likelihood phylogeny of 380 diverse S. aureus strains comprising several STs and CCs (25). 8

S. aureus and polymicrobial interactions 9

Quorum sensing and regulation of virulence 10

Fig 4: The agr (accessory gene regulator) operon is a global transcriptional regulator in S. aureus. 12

Clonal lineages and population structure 14

The Pangenome and species-wide diversity 16

The goals of this dissertation 18

Macrodiversity 18

Microdiversity 19

References 21

Chapter II – A species-scale pangenomic exploration of Staphylococcus aureus

Author contributions 30

Abstract 31

Importance 32

Introduction 33

Results 37

Filtering 37

Fig 1: Samples with high average minor allele frequency (MAF), low bactopia quality or high number of variants were filtered out. 37

Clustering 38

Fig 2: Sankey diagram showing the fate of 83,383 S. aureus genomes after processing and filtering. 39

Pangenome construction 40

Fig 3: Description of the pangenome of S. aureus. 40

Lineage assignment 41

Fig 4: Natural boundaries in core genome SNP distances can be used to categorise strains. 41

Gene discovery and lineage discovery 42

Fig 5: The dereplicated dataset provides an increased number of genes and an increased number of strain groups with the same number of genomes sampled compared to the total dataset. 43

Relationship between core and accessory genome 44

Fig 6: Prominent strain groups form their own clades on a core genome phylogeny and distinct clusters based on their accessory genome composition. 44

Fig S1: Pairwise core-genome SNP distance histogram (LEFT) and tSNE based on accessory genome composition (RIGHT) suggest CC1 S. aureus strains defined by MLST are likely multiple strain groups. 46

Lineage specific genes and fixation index 46

Fig S2: Intermediate FST genes show bimodal distribution of either high or low FST. 47

Fig 7: Mobile genetic elements were not associated with high or low FST in intermediate genes. 47

Fig 8: Strain-group specificity and co-occurrence of specific Staphylococcal toxins. 49

Fig S3: There are no agr group specific intermediate genes aside from agrD. 49

Discussion 51

Why our pangenome is “not another pangenome” 51

What did we learn about S. aureus? 51

What can we do further? 54

Methods 56

Genome collection and processing by Bactopia 56

Filtering low quality samples: 56

Filtering mixed strain samples: 56

Clustering and dereplication: 57

Pangenome analysis: 57

Statistical analysis and data visualisation: 58

References 59

Chapter III – Species-wide phylogenomics of the Staphylococcus aureus agr operon reveals convergent evolution of frameshift mutations

Author contributions 65

Abstract 66

Importance 67

Introduction 68

Results 71

AgrVATE: A tool for kmer based assignment of agr groups and agr operon frameshift detection 71

agr type distribution in the Staphopia database 74

Fig 1: Distribution of agr groups across 40,890 S. aureus genomes from the Staphopia database. 74

agr cluster recombination within clonal complexes is rare 76

Fig 2: AgrA evolves independently of agr group with only two major amino acid sequence configurations across S. aureus. 77

Non-functional agr operons are common across diverse S. aureus genomes 79

Fig 3: Presence of putative non-functional variants of the agr operon. 81

Some agr frameshift mutations have occurred repeatedly through convergent evolution 83

Fig 4: Identical agr mutations evolve independent of phylogeny across different clonal complexes. 85

Discussion 86

Methods 90

AgrVATE workflow: 90

Identifying a unique set of 31mers for each agr group: 91

agr operon and agr gene extraction: 91

Identifying variants in the agr operon: 92

Staphopia metadata 92

Whole genome phylogeny and Linkage Disequilibrium 93

Comparing indel rate of agr to other S. aureus global regulators 93

Classifiers for predicting frameshift mutations in the agr operon: 94

Dereplication of Staphopia database genomes: 95

Simulating mutant agr operons: 95

Calculating consistency indices: 96

Sputum sample collection and whole genome sequencing: 96

Data availability 98

Supplementals 98

Acknowledgements 98

References 100

Chapter IV – Comparison of genomic diversity between single and pooled Staphylococcus aureus colonies isolated from human colonisation cultures

Author contributions 110

Abstract 111

Data summary 113

Importance 113

Introduction 114

Results 116

Fig 1: Schematic representation of colony collection strategy, names, and descriptions of isolate groups analysed in this study. 117

82% of collections (eight single genomes) and their corresponding pools have only one Multilocus Sequence Type (MLST). 118

Fig 2: Pairwise SNP distance between and within collections. 119

Pool-seq samples with elevated average minor allele frequency, elevated number of contigs, higher nucleotide diversity and untypable MLST were associated with strain mixtures 120

Fig 3: Assembly quality can be used to assess population heterogeneity. 122

Fig 4: Average MAF and average π can be used to detect multi-ST pools 126

Fig S1: Variation in pools was primarily driven by contamination and allelic diversity. 128

Table S1: Summary of all five principal components (PC1 - PC5) for five parameters used in Fig S1. All 254 pools and 2032 singles were used for principal component analysis. 129

Numbers of variants in pool-seq and eight singles from the same sample are correlated, but pool-seq had a greater number 129

Fig 5: Pools were a better representation of the total number of variants in the population 132

Numbers of segregating sites in pools and singles from the same sample are positively correlated 133

Fig 6: Allelic variation in pools and singles from the same sample were positively correlated 135

A median of one more AMR gene was detected in the pools compared to singles 136

Fig 7: A median of one additional AMR class can be observed in the pools compared to singles. 138

Discussion 139

Fig 8: Number of new variants or new AMR genes observed with the addition of more sequencing runs. 140

Methods 143

Strain sampling 143

Library preparation and sequencing 144

Genome assembly, annotation and variant calling using Bactopia 144

Pairwise SNP distance calculation, dereplication, and phylogeny 145

Number of variants, segregating sites, and allele frequency calculation 145

Logistic regression 146

Statistical analyses and data visualisation 147

Data availability 147

Acknowledgements 147

References 148

Chapter V – Staphylococcus aureus and Pseudomonas aeruginosa isolates from the same cystic fibrosis respiratory sample coexist in coculture

Author contributions 153

Abstract 154

Importance 155

Introduction 156

Materials and Methods 158

Bacterial strains 158

Coculture assay 158

Statistical analysis 159

Results 161

S. aureus survives better with its co-infecting CF P. aeruginosa 161

Table 1: Survival of S. aureus (Sa) isolates when cocultured with concurrently isolated P. aeruginosa (Pa), grouped by patient ID. 163

Figure 1: S. aureus (Sa) survives better with its co-infecting cystic fibrosis (CF) P. aeruginosa (Pa). 164

Figure 2: P. aeruginosa (Pa) survives similarly with its co-infecting cystic fibrosis (CF) S. aureus (Sa) and JE2. 165

P. aeruginosa survives similarly with its co-infecting CF S. aureus as it does with JE2 167

Discussion 168

Supplementals 171

Acknowledgments 171

References 173

Chapter VI – Conclusions and future directions

Macrodiversity 177

Microdiversity 182

Fig 1: Relative proportions and relative genetic identity are major determinants of our ability to detect sequence mixtures 183

Infectious disease microbiology in the era of big data 187

Fig 2: Bar chart showing total number of short-read sequences available per species from January 2010 to February 2023 in NCBI. 188

References 189

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English