Macro-scale genomic studies of bacterial pathogens

Petit, Robert (Summer 2018)

Permanent URL: https://etd.library.emory.edu/concern/etds/tq57nr08s?locale=de

Published

Abstract

The low cost of genome sequencing has led to a significant increase in publicly available datasets of bacterial pathogens. Taking advantage of these data requires new strategies for using computational resources and bioinformatics, as well as applying traditional organism-specific knowledge. With this understanding, I used public datasets to investigate two important bacterial pathogens: Bacillus anthracis and Staphylococcus aureus.

In my first research project, I focused on Bacillus anthracis, the etiologic agent of anthrax, which shares over 99% average nucleotide identity with other Bacillus cereus Group (BCerG) bacteria. This close relatedness, coupled with sequencing error rates, can cause B. cereus to be falsely identified as B. anthracis. To address this issue, I developed a typing scheme for fine-scale differentiation of these two species. I identified one set of 31-mers specific to B. anthracis and another set specific to all BCerG, including B. anthracis. I determined the limits of detection of these k-mers on synthetic data and developed a model to predict the presence of true B. anthracis sequences. I then reanalyzed a New York subway metagenome dataset in which evidence of B. anthracis had been falsely reported. I found no evidence of B. anthracis, but I did find evidence of previously unsampled close relatives of B. anthracis.
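The idea of a k-mer set that is specific to one group can be illustrated with a minimal Python sketch. This is not the dissertation's pipeline: the function names `canonical_kmers` and `specific_kmers` are hypothetical, the sequences are assumed to be plain strings held in memory, and "specific" is assumed to mean conserved in every target sequence and absent from every background sequence. Work at the scale described here would normally rely on a dedicated k-mer counting tool rather than Python sets.

```python
from typing import Iterable, Set

# Translation table used to build reverse complements.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int = 31) -> Set[str]:
    """Return the canonical k-mers of seq.

    A canonical k-mer is the lexicographically smaller of a k-mer and its
    reverse complement, so both strands map to the same key.
    """
    seq = seq.upper()
    kmers: Set[str] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip k-mers containing ambiguous bases such as N
        revcomp = kmer.translate(_COMPLEMENT)[::-1]
        kmers.add(min(kmer, revcomp))
    return kmers


def specific_kmers(targets: Iterable[str],
                   background: Iterable[str],
                   k: int = 31) -> Set[str]:
    """Return k-mers found in every target and in no background sequence."""
    target_sets = [canonical_kmers(s, k) for s in targets]
    if not target_sets:
        return set()
    conserved = set.intersection(*target_sets)
    for s in background:
        conserved -= canonical_kmers(s, k)
    return conserved
```

With toy sequences and k=3, `specific_kmers(["AAACCC", "AAACCG"], ["GGT"], k=3)` keeps only the k-mers shared by both targets and drops `ACC`, because the background k-mer `GGT` is its reverse complement. Using canonical k-mers is a design choice that makes the comparison independent of which strand was sequenced.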

My second project concerned Staphylococcus aureus, a major antibiotic-resistant pathogen responsible for a wide spectrum of hospital- and community-associated infections. S. aureus was well represented in genome sequencing studies submitted to public repositories, but no tools were available to make use of these data. To fill this void, I developed Staphopia, an analysis pipeline, database, and application programming interface focused on S. aureus, and processed over 44,000 publicly available S. aureus genomes. I found patterns of antibiotic resistance across S. aureus sequence types and a bias towards sequencing clinically relevant methicillin-resistant S. aureus strains.

I conclude with a discussion of future macro-scale comparative genomic studies consisting of tens of thousands of genomes, and I comment on the expected rewards and challenges of such studies. Overall, this body of work illustrates the importance of public datasets for bacterial pathogens and of integrating organism-specific knowledge into bacterial sequence analyses.

Table of Contents

Chapter 1: Introduction

Bacterial sequence analysis step by step

Sequence Quality Control

Genome assembly

Genome annotation

Genotyping bacteria based on genome sequence

Identifying Variation

Antimicrobial Resistance and Virulence Factors

Comparative genomic analyses

Phylogenetics

Pan-genome

Genome wide association studies

A deluge of bacterial sequences

A brief history of DNA sequencing technologies

Affordable high-throughput sequencing

New opportunities in existing data

Outline for this dissertation

Appendix

Chapter 2: Fine-scale differentiation between Bacillus anthracis and Bacillus cereus group signatures in metagenome shotgun data

Abstract

Introduction

Methods

Metagenome data and reference genome sequences

Mapping metagenome data to B. anthracis plasmids and chromosomes

Custom 31-mer assay for B. anthracis and Bacillus cereus Group

Finding the limits for lethal factor-based detection of B. anthracis

Assessing Quality of B. anthracis and B. cereus Group specific 31-mers

Prediction of low coverage B. anthracis chromosome in shotgun sequencing datasets

Results

NY subway metagenome sequences map to core regions of B. anthracis and B. cereus chromosome and plasmids but not to lethal factor gene

B. anthracis genome coverage below 0.18x is a “gray area” for detection, where lethal toxin genes may not be sampled

Conserved and specific 31-mer sets for B. anthracis and BCerG chromosomes

High background levels of B. cereus strains produce false positive B. anthracis specific k-mers due to random sequence errors

A “specialist” model to interpret patterns of B. anthracis genetic signatures in metagenome samples

Discussion

Conclusions

Acknowledgements

Funding

Appendix

Chapter 3: Staphylococcus aureus viewed from the perspective of 40,000+ genomes

Abstract

Introduction

Materials & Methods

Staphopia Analysis Pipeline

Web Application, Relational Database and Application Programming Interface

Processing Public Data

Metadata Collection

Creating non-redundant S. aureus diversity set

Results

Design of the Staphopia Analysis Pipeline and processing 43,000+ genomes

Sequence and assembly quality trends

Genetic diversity measured by MLST

Antibiotic resistance genes

Publication, metadata and strain geographic distribution

A non-redundant S. aureus diversity set

Discussion

Conclusions

Links

Appendix

Chapter 4: The influence of horizontal gene transfer barriers on Staphylococcus aureus and the potential of gene transfer networks to identify novel barriers

Abstract

Introduction

MRSA - a case where human action can break down barriers to HGT

VRSA - a case where barriers to HGT can have great public health consequences

Using high-throughput DNA sequencing to build gene transfer networks

Using gene transfer networks to predict and monitor future spread of antibiotic resistance

Conclusions

Appendix

Chapter 5: Summary and Future Directions

Summary

Future Directions: Macro-scale bacterial genomics

Rewards of macro-scale genomics

Statistical power

A better overview of a species

Rational sampling

Challenges of macro-scale genomics

Imperfect data

Evolving sequencing technologies

Data management and distribution

Scalability

Emerging macro-scale genomic projects

Final remarks

Appendix: Other Published Work

Bibliography

About this Dissertation

Rights statement

Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.

School	Laney Graduate School
Department	Biological and Biomedical Sciences
Subfield / Discipline	Population Biology, Ecology & Evolution
Degree	Ph.D.
Submission	Dissertation
Language	English
Research Field	Biology, Bioinformatics; Health Sciences, Public Health; Biology, Microbiology
Keyword	MRSA; Bacillus anthracis; Staphylococcus aureus; anthrax; bioinformatics
Committee Chair / Thesis Advisor	Read, Timothy, Emory University
Committee Members	Conneely, Karen, Emory University; Morran, Levi, Emory University; Goldberg, Joanna, Emory University; Wu, Hao, Emory University

Last modified

Primary PDF

Title	Date Uploaded
Macro-scale genomic studies of bacterial pathogens	2018-07-09 17:23:57 -0400
