New Statistical Methods for Analyzing Microbiome Data

Open Access

Yue, Ye (Summer 2023)

Permanent URL: https://etd.library.emory.edu/concern/etds/q524jq18c?locale=en
Published

Abstract

Microbiome research has proliferated thanks to growing interest in the scientific community, the increasing power of high-throughput sequencing, and rapid advances in data analytics. Analyzing microbiome data from sequencing studies is challenging because of their high dimensionality, overdispersion, sparsity, compositionality, and experimental bias. In addition, microbiome studies typically have small sample sizes, complex traits of interest, and confounding covariates. New methods that fully account for these complexities are needed.

In the first topic, we develop a new statistical method for testing mediation effects of the microbiome at both the community and individual taxon levels. A rapidly growing body of evidence links the microbiome to human diseases and clinical outcomes, and also links the microbiome to environmental exposures. Understanding whether, and which, microbes play a mediating role between an exposure and a disease outcome is essential for researchers developing clinical interventions that modulate those microbes. Our new method allows an arbitrary number of taxa to be tested simultaneously and supports different types of exposures and outcomes, among other features.

In the second topic, we extend PERMANOVA, the most commonly used distance-based method, to test microbiome mediation effects at the community level. Distance matrices are a popular tool for analyzing complex microbiome data. Our extension allows adjustment for confounders, accommodates various types of exposures and outcomes, and provides an omnibus test that combines the results from analyzing multiple distance matrices.
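For readers unfamiliar with the baseline technique, the following is a minimal sketch of standard one-way PERMANOVA on a single distance matrix. It is not the mediation extension described in this dissertation, and it omits confounder adjustment and the omnibus test. The function name `permanova` and its parameters are illustrative assumptions, not taken from the author's software.

```python
import numpy as np

def permanova(dist, groups, n_perm=999, seed=0):
    """Standard one-way PERMANOVA on a square distance matrix.

    Illustrative sketch only; names and defaults are assumptions.
    Returns the observed pseudo-F statistic and a permutation p-value.
    """
    dist = np.asarray(dist, dtype=float)
    groups = np.asarray(groups)
    n = dist.shape[0]
    labels = np.unique(groups)
    a = len(labels)          # number of groups
    d2 = dist ** 2           # squared pairwise distances

    def pseudo_f(g):
        # Total sum of squares from all pairwise squared distances.
        ss_total = d2[np.triu_indices(n, k=1)].sum() / n
        # Within-group sum of squares, accumulated group by group.
        ss_within = 0.0
        for lab in labels:
            idx = np.flatnonzero(g == lab)
            sub = d2[np.ix_(idx, idx)]
            ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
        ss_between = ss_total - ss_within
        return (ss_between / (a - 1)) / (ss_within / (n - a))

    f_obs = pseudo_f(groups)
    rng = np.random.default_rng(seed)
    # Null distribution: recompute pseudo-F under shuffled group labels.
    f_perm = np.array([pseudo_f(rng.permutation(groups)) for _ in range(n_perm)])
    p_value = (1 + np.sum(f_perm >= f_obs)) / (n_perm + 1)
    return f_obs, p_value
```

Given a Bray-Curtis or UniFrac distance matrix `dist` and a vector of group labels, `permanova(dist, groups)` returns the pseudo-F statistic and its permutation p-value. Because the p-value comes from shuffling labels rather than from a parametric distribution, the test makes few assumptions about how the distances are distributed.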

In the third topic, we develop a novel method for integrative analysis of datasets generated by both 16S marker-gene sequencing and shotgun metagenomic sequencing. Many microbiome studies have performed both experiments on the same cohort of samples. The two datasets often yield consistent patterns; however, each is subject to its own experiment-specific biases. These experimental biases, together with partially overlapping samples and differential library sizes between the two datasets, pose substantial challenges when combining the datasets. Our new method combines data from both experiments for differential abundance tests, while accounting for differential experimental biases, assigning adaptive weights to each observation, and accommodating samples and taxa unique to one experiment.

Table of Contents

1 Introduction

1.1 Introduction to high-throughput microbiome data

1.2 Mediation analysis of microbiome data

1.2.1 Mediation analysis at both the taxon and community levels

1.2.2 Mediation analysis based on distance matrices

1.3 Integrative analysis of 16S marker-gene and shotgun metagenomic sequencing data

2 Topic 1: A new approach to testing mediation of the microbiome at both the community and individual taxon levels

2.1 Method

2.1.1 Motivation

2.1.2 Inverse regression model

2.1.3 Testing mediation effects at individual taxa

2.1.4 Testing the overall mediation effect in a community

2.2 Numerical Studies

2.2.1 Simulation studies

2.2.2 Simulation results

2.2.3 Murine microbiome study

2.3 Remarks

3 Topic 2: Extension of PERMANOVA to testing the mediation effect of the microbiome

3.1 Method

3.1.1 Motivation

3.1.2 PERMANOVA-med: Extension of PERMANOVA to mediation analysis

3.2 Numerical Studies

3.2.1 Simulation studies

3.2.2 Simulation results

3.2.3 Real data on melanoma immunotherapy response

3.3 Remarks

4 Topic 3: Integrative analysis of 16S marker-gene and shotgun metagenomic sequencing data improves efficiency of testing microbiome hypotheses

4.1 Method

4.1.1 Motivation

4.1.2 Integrative analysis of 16S marker-gene and shotgun metagenomic sequencing data

4.2 Numerical Studies

4.2.1 Simulation studies

4.2.2 Simulation results

4.2.3 ORIGINS data

4.2.4 Dietary data

4.3 Remarks

Appendices

A Topic 1

B Topic 3

Bibliography

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English