Techniques for Pattern Recognition in High-Throughput Metabolomic Data
Open Access
Blackstock, Anna Jolly (2011)
Abstract
Nuclear magnetic resonance spectroscopy (NMR) and mass spectrometry (MS) have been introduced to studies of metabolite composition of biological fluids and tissues, providing information relating to disease states and other conditions. The collection of data from many types of biological samples is both simple and inexpensive, and most of these substances require very little pre-processing. However, the best methods for processing and analyzing complex metabolomic data are still being sought. We present NMR data pre-processing methods appropriate for metabolomic studies and techniques for recognizing patterns of metabolite levels in time course and cross-sectional data.
First, methods for overcoming issues that complicate metabolomic data are discussed, with a focus on NMR data. While NMR spectroscopy and MS provide a wealth of information about the collection of metabolites found in biological substances, the resulting data are extremely complex. Small changes in conditions can lead to significant shifts in NMR spectra, making it critical that metabolomic data be appropriately processed before analysis. NMR data are used to demonstrate existing pre-processing methods and to introduce a set of techniques appropriate for studies aiming to characterize the behavior of individual features.
Next, the possible periodic behavior of NMR features is assessed using time course data. Metabolite levels change throughout the day, and the identification of features with sinusoidal periodic behavior is of interest. Periodic regression is used to obtain estimates of the parameter corresponding to period for individual NMR features. A mixture model is then used to develop clusters of peaks, taking into account the variability of the regression parameter estimates. Methods are applied to NMR data collected from human blood plasma over a 24-hour period, and simulation results are presented.
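As a rough illustration of what estimating a period by periodic regression can look like, the sketch below fits a mean plus one cosine and one sine term to a single feature and searches a grid of candidate periods for the smallest residual sum of squares. It is not the dissertation's estimation procedure: the function name `fit_periodic`, the grid of candidate periods, and the synthetic hourly data are all assumptions made for the example, and the mixture-model clustering step is not shown.

```python
import numpy as np

def fit_periodic(t, y, periods):
    """For each candidate period, fit mean + cosine + sine by least squares;
    return the candidate with the smallest residual sum of squares."""
    best = None
    for tau in periods:
        omega = 2.0 * np.pi / tau
        # Design matrix: intercept, cosine term, sine term at this period.
        X = np.column_stack([np.ones_like(t), np.cos(omega * t), np.sin(omega * t)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best["rss"]:
            best = {"period": float(tau), "coef": coef, "rss": rss}
    return best

# Synthetic feature (hypothetical): hourly samples over 24 h, true period 12 h.
rng = np.random.default_rng(0)
t = np.arange(0.0, 24.0, 1.0)
y = 5.0 + 2.0 * np.cos(2.0 * np.pi * t / 12.0 - 1.0) + rng.normal(0.0, 0.1, t.size)

fit = fit_periodic(t, y, np.arange(6.0, 30.5, 0.5))
# Amplitude of the fitted sinusoid from its cosine and sine coefficients.
amplitude = float(np.hypot(fit["coef"][1], fit["coef"][2]))
print(fit["period"], amplitude)
```

Fixing the period makes the model linear in the remaining coefficients, which is why an ordinary least-squares solve inside a grid search is enough for a sketch; a nonlinear fit would additionally give a standard error for the period, which is the kind of variability the clustering step described above takes into account.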
Lastly, we present a method for investigating age-related changes in metabolite levels using MS data. Metabolite levels associated with some biological processes are thought to experience shifts at different times in life. An algorithm is used to identify metabolite level changes in blood plasma collected from marmosets of different ages. Clusters of breakpoints with similar locations of metabolite level shifts are determined, and metabolites corresponding to MS features with breakpoints are identified.
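To make the idea of a breakpoint in a metabolite level concrete, the sketch below scans every admissible split of one feature, ordered by age, and keeps the split that minimizes the within-segment sum of squares around two segment means. This is a simple single-changepoint scan chosen for illustration, not the segmentation algorithm used in the dissertation; the function name `best_breakpoint`, the `min_seg` parameter, and the synthetic ages and levels are assumptions.

```python
import numpy as np

def best_breakpoint(age, level, min_seg=3):
    """Return (breakpoint_age, rss) for the single mean shift that best
    splits `level`, ordered by `age`, into two constant segments."""
    order = np.argsort(age)
    age = np.asarray(age, dtype=float)[order]
    level = np.asarray(level, dtype=float)[order]
    n = level.size
    best_k, best_rss = None, np.inf
    # k is the number of samples in the left segment.
    for k in range(min_seg, n - min_seg + 1):
        left, right = level[:k], level[k:]
        rss = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if rss < best_rss:
            best_k, best_rss = k, rss
    # Report the breakpoint as the midpoint between the two neighboring ages.
    return 0.5 * (age[best_k - 1] + age[best_k]), float(best_rss)

# Synthetic feature (hypothetical): level shifts from about 1 to about 3 at age 6.
rng = np.random.default_rng(1)
ages = np.linspace(1.0, 12.0, 24)
levels = np.where(ages < 6.0, 1.0, 3.0) + rng.normal(0.0, 0.1, ages.size)

bp_age, bp_rss = best_breakpoint(ages, levels)
print(bp_age)
```

Running such a scan per MS feature yields one estimated breakpoint age per feature; grouping features whose breakpoints fall at similar ages is the kind of clustering the paragraph above describes.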
Table of Contents
1 Introduction
  1.1 Overview
  1.2 Nuclear Magnetic Resonance (NMR) Spectroscopy and Mass Spectrometry (MS)
  1.3 Challenges of Metabolomic Data
  1.4 Analysis of Metabolomic Data
    1.4.1 Applications of Metabolomic Methodology
    1.4.2 Statistical Tools for Analysis
  1.5 Proposed Methods
    1.5.1 Motivating Datasets
    1.5.2 Peak Identification and Pre-processing Methods for NMR Data
    1.5.3 Clustering on Periodicity Using Metabolomic Time Course Data
    1.5.4 Using Mass Spectra to Determine Possible Ages of Metabolic System-Level Shifts
2 Peak Identification and Pre-processing Methods for NMR Data
  2.1 Introduction
  2.2 Existing Adjustment and Processing Techniques
    2.2.1 Shimming, Phasing, Baseline Correction, and Assigning a Calibration Peak
    2.2.2 Correcting Chemical Shift Differences
    2.2.3 Elimination of Uninformative Peaks
    2.2.4 Normalization and Scaling
  2.3 Application of Processing Techniques: Blood Plasma NMR Data
    2.3.1 Initial Processing
    2.3.2 Wavelet Smoothing
    2.3.3 Identification of Peaks
    2.3.4 Determination of Peak Locations
  2.4 Discussion
3 Clustering on Periodicity Using Metabolomic Time Course Data
  3.1 Introduction
  3.2 Data Pre-Processing and Peak Identification
  3.3 Methods
    3.3.1 The Model
    3.3.2 Estimation
  3.4 Application to Blood Plasma NMR Data
  3.5 Simulation Results
  3.6 Discussion
4 Using Mass Spectra to Determine Possible Ages of Metabolic System-Level Shifts
  4.1 Introduction
  4.2 Pre-Processing of the Data
  4.3 Segmentation and Clustering
    4.3.1 Segmentation
    4.3.2 Classification
  4.4 Application to Marmoset MS Data
  4.5 Discussion
  4.6 Future Work
Appendices
A Appendix for Chapter 3
  A.1 Full Simulation Results
  A.2 Derivation of Maximum Likelihood Estimates for $\mu_k$, $\sigma_k^2$, and $\pi_k$
B Appendix for Chapter 4
  B.1 Selection of parameters for the implementation of GLAD
  B.2 Metabolite Matches from the Madison-Qingdao Metabolomics Consortium Database (MMCD)
Primary PDF
Title | Date Uploaded |
---|---|
Techniques for Pattern Recognition in High-Throughput Metabolomic Data | 2018-08-28 11:49:54 -0400 |