Techniques for Pattern Recognition in High-Throughput Metabolomic Data
Open Access

Blackstock, Anna Jolly (2011)

Permanent URL: https://etd.library.emory.edu/concern/etds/1v53jx265?locale=en
Published

Abstract

Nuclear magnetic resonance spectroscopy (NMR) and mass spectrometry (MS) have
been introduced to studies of metabolite composition of biological fluids and tissues,
providing information relating to disease states and other conditions. The collection
of data from many types of biological samples is both simple and inexpensive, and
most of these substances require very little pre-processing. However, the best methods
for processing and analyzing complex metabolomic data are still being sought. We
present NMR data pre-processing methods appropriate for metabolomic studies and
techniques for recognizing patterns of metabolite levels in time course and cross-
sectional data.


First, methods for overcoming issues that complicate metabolomic data are discussed,
with a focus on NMR data. While NMR spectroscopy and MS provide a wealth
of information about the collection of metabolites found in biological substances,
the resulting data are extremely complex. Small changes in conditions can lead
to significant shifts in NMR spectra, making it critical that metabolomic data be
appropriately processed before analysis. NMR data are used to demonstrate existing
pre-processing methods and to introduce a set of techniques appropriate for
studies aiming to characterize the behavior of individual features.


Next, the possible periodic behavior of NMR features is assessed using time course
data. Metabolite levels change throughout the day, and the identification of features
with sinusoidal periodic behavior is of interest. Periodic regression is used to obtain
estimates of the parameter corresponding to period for individual NMR features. A
mixture model is then used to develop clusters of peaks, taking into account the
variability of the regression parameter estimates. Methods are applied to NMR data
collected from human blood plasma over a 24-hour period, and simulation results are
presented.
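
As a rough illustration of the periodic regression step described above, the sketch below fits a mean plus one cosine and one sine term by least squares for each candidate period and keeps the period with the smallest residual sum of squares. It is not the dissertation's implementation: the function name `fit_periodic`, the hourly sampling grid, the candidate periods, and the synthetic data are all assumptions made for this example, and the mixture-model clustering step is not shown.

```python
import numpy as np

def fit_periodic(t, y, periods):
    """Grid-search the period; for each candidate, fit mean + cosine + sine by least squares."""
    best = None
    for tau in periods:
        w = 2.0 * np.pi * t / tau
        # Design matrix: intercept, cosine term, sine term at the candidate period
        X = np.column_stack([np.ones_like(t), np.cos(w), np.sin(w)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best["rss"]:
            best = {"period": float(tau), "coef": coef, "rss": rss}
    return best

# Synthetic feature (hypothetical): hourly samples over 24 hours, true period 12 hours
rng = np.random.default_rng(0)
t = np.arange(0.0, 24.0, 1.0)
y = 10.0 + 2.0 * np.cos(2.0 * np.pi * t / 12.0) + rng.normal(0.0, 0.1, t.size)

fit = fit_periodic(t, y, periods=np.arange(6.0, 25.0, 1.0))
# Amplitude of the fitted sinusoid from the cosine and sine coefficients
amplitude = float(np.hypot(fit["coef"][1], fit["coef"][2]))
print(fit["period"], round(amplitude, 2))
```

A grid search is used here only because it keeps the model linear for each candidate period; estimating the period jointly with the other parameters would require nonlinear least squares.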


Lastly, we present a method for investigating age-related changes in metabolite levels
using MS data. Metabolite levels associated with some biological processes are
thought to experience shifts at different times in life. An algorithm is used to identify
metabolite level changes in blood plasma collected from marmosets of different ages.
Clusters of breakpoints with similar locations of metabolite level shifts are determined,
and metabolites corresponding to MS features with breakpoints are identified.
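
To make the idea of a metabolite level shift concrete, the sketch below searches for a single breakpoint in one feature by choosing the split that minimizes the within-segment sum of squares. This is a deliberately simplified stand-in: the table of contents indicates that the dissertation's segmentation is based on GLAD, which is not reproduced here. The function name `single_breakpoint`, the `min_seg` parameter, and the synthetic ages and levels are assumptions made for this example.

```python
import numpy as np

def single_breakpoint(age, level, min_seg=3):
    """Find one mean-shift breakpoint by least squares (assumes len(age) >= 2 * min_seg)."""
    order = np.argsort(age)
    age, level = age[order], level[order]
    n = level.size
    best_k, best_rss = None, np.inf
    for k in range(min_seg, n - min_seg + 1):
        left, right = level[:k], level[k:]
        # Residual sum of squares when each segment is summarized by its own mean
        rss = float(np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2))
        if rss < best_rss:
            best_k, best_rss = k, rss
    # Report the breakpoint as the midpoint between the two adjacent ages
    return 0.5 * (age[best_k - 1] + age[best_k]), best_rss

# Synthetic feature (hypothetical): the level shifts from 5 to 8 at age 6
rng = np.random.default_rng(1)
age = np.linspace(1.0, 12.0, 40)
level = np.where(age < 6.0, 5.0, 8.0) + rng.normal(0.0, 0.2, age.size)

bp, rss = single_breakpoint(age, level)
print(round(bp, 2))
```

Running such a search per feature would yield one estimated breakpoint location per feature, which is the kind of quantity that could then be clustered by location.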

Table of Contents

1 Introduction
  1.1 Overview
  1.2 Nuclear Magnetic Resonance (NMR) Spectroscopy and Mass Spectrometry (MS)
  1.3 Challenges of Metabolomic Data
  1.4 Analysis of Metabolomic Data
    1.4.1 Applications of Metabolomic Methodology
    1.4.2 Statistical Tools for Analysis
  1.5 Proposed Methods
    1.5.1 Motivating Datasets
    1.5.2 Peak Identification and Pre-processing Methods for NMR Data
    1.5.3 Clustering on Periodicity Using Metabolomic Time Course Data
    1.5.4 Using Mass Spectra to Determine Possible Ages of Metabolic System-Level Shifts
2 Peak Identification and Pre-processing Methods for NMR Data
  2.1 Introduction
  2.2 Existing Adjustment and Processing Techniques
    2.2.1 Shimming, Phasing, Baseline Correction, and Assigning a Calibration Peak
    2.2.2 Correcting Chemical Shift Differences
    2.2.3 Elimination of Uninformative Peaks
    2.2.4 Normalization and Scaling
  2.3 Application of Processing Techniques: Blood Plasma NMR Data
    2.3.1 Initial Processing
    2.3.2 Wavelet Smoothing
    2.3.3 Identification of Peaks
    2.3.4 Determination of Peak Locations
  2.4 Discussion
3 Clustering on Periodicity Using Metabolomic Time Course Data
  3.1 Introduction
  3.2 Data Pre-Processing and Peak Identification
  3.3 Methods
    3.3.1 The Model
    3.3.2 Estimation
  3.4 Application to Blood Plasma NMR Data
  3.5 Simulation Results
  3.6 Discussion
4 Using Mass Spectra to Determine Possible Ages of Metabolic System-Level Shifts
  4.1 Introduction
  4.2 Pre-Processing of the Data
  4.3 Segmentation and Clustering
    4.3.1 Segmentation
    4.3.2 Classification
  4.4 Application to Marmoset MS Data
  4.5 Discussion
  4.6 Future Work
Appendices
A Appendix for Chapter 3
  A.1 Full Simulation Results
  A.2 Derivation of Maximum Likelihood Estimates for µ_k, σ_k^2, and π_k
B Appendix for Chapter 4
  B.1 Selection of parameters for the implementation of GLAD
  B.2 Metabolite Matches from the Madison-Qingdao Metabolomics Consortium Database (MMCD)

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English