Machine Learning Methods in Large Scale Neuroimaging Study Open Access

He, Qing (Fall 2017)

Permanent URL: https://etd.library.emory.edu/concern/etds/h128nd70b?locale=en
Published

Abstract

This dissertation develops machine learning methods for the analysis of large-scale neuroimaging data. It consists of three topics. In the first topic, we develop a spatial-temporal Gaussian process regression (STGPR) model for Bayesian analysis of longitudinal imaging data. Our goal is to study the progression of brain activity in different brain regions and how it is associated with time-independent predictors (disease status, gender, etc.) and time-varying predictors (age, weight, etc.). We assign Gaussian process priors to the spatially and temporally varying coefficients in the model. To cope with the large-scale dataset, we develop three fast posterior computation algorithms based on Karhunen-Loève expansions of the Gaussian processes. In a simulation study, we demonstrate the advantages of the proposed method over a voxel-wise linear model approach, using two metrics that we propose for measuring coefficient estimation accuracy: the relative L1 loss and the gradient relative L1 loss. We apply the proposed method to the longitudinal positron emission tomography (PET) data in the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study and obtain meaningful results.

In the second topic, we use ensemble classification methods to predict disease status using neuroimaging data as biomarkers in clinical studies. Using an existing brain atlas, the whole brain can be partitioned into many brain regions. For each brain region, we use voxel-level brain images to generate classification features, from which we develop region-level basic classifiers. We then combine these basic classifiers through linear programming boosting (LPBoost) to find an optimal feature combination rule for classification. We develop an efficient column generation algorithm to solve both binary and multi-class LPBoost problems in a high-dimensional feature space. We show that the proposed method can dramatically improve on the performance of basic support vector classifiers (SVC) and outperform existing alternatives. We apply the proposed method to large-scale resting-state fMRI data from the Autism Brain Imaging Data Exchange (ABIDE) study, achieving better prediction accuracy than the best existing result.
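As a rough illustration of the combination step in the second topic, the sketch below solves the standard soft-margin LPBoost linear program for a fixed set of base-classifier outputs. It is not the dissertation's algorithm: it hands the full problem to a generic LP solver instead of using column generation, and the names `lpboost_weights`, `H`, and `nu` are hypothetical choices made for this example.

```python
import numpy as np
from scipy.optimize import linprog


def lpboost_weights(H, y, nu=0.5):
    """Combine base classifiers with the soft-margin LPBoost linear program.

    H  : (n_samples, n_classifiers) base-classifier outputs in [-1, 1]
    y  : (n_samples,) labels in {-1, +1}
    nu : value in (0, 1]; the slack penalty is D = 1 / (nu * n_samples)
    Returns the classifier weights and the achieved margin rho.
    """
    n, m = H.shape
    D = 1.0 / (nu * n)

    # Decision vector: [a_1..a_m, xi_1..xi_n, rho].
    # linprog minimizes, so maximizing rho - D * sum(xi) becomes
    # minimizing -rho + D * sum(xi).
    c = np.concatenate([np.zeros(m), D * np.ones(n), [-1.0]])

    # Margin constraints: y_i * (H_i . a) + xi_i >= rho, rewritten as
    # -y_i * (H_i . a) - xi_i + rho <= 0.
    A_ub = np.hstack([-(y[:, None] * H), -np.eye(n), np.ones((n, 1))])
    b_ub = np.zeros(n)

    # The weights sum to one.
    A_eq = np.concatenate([np.ones(m), np.zeros(n), [0.0]])[None, :]
    b_eq = [1.0]

    # Weights and slacks are non-negative; rho is unconstrained in sign.
    bounds = [(0, None)] * (m + n) + [(None, None)]

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:m], res.x[-1]


if __name__ == "__main__":
    # Toy data: three hypothetical region-level classifiers on six subjects.
    y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
    H = np.column_stack([y, [1.0, -1.0, 1.0, -1.0, 1.0, -1.0], -y])
    weights, rho = lpboost_weights(H, y)
    print("weights:", weights, "margin:", rho)
```

Column generation would reach the same optimum by adding one base classifier (one column of `H`) at a time, which matters when the number of candidate classifiers is too large to enumerate.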

In the third topic, we make Bayesian inference on the peaks of smooth curves in a nonparametric regression model, where we determine the peak locations based on the gradients of the curve. We assign a Gaussian process prior to the smooth curve of interest. We show that the joint posterior distribution of the curve, its first derivative, and its second derivative is a multivariate Gaussian process. This result leads to straightforward posterior inference on peak locations and magnitudes. In the simulation study, we demonstrate that the proposed peak identifier outperforms the existing nonparametric kernel smoothing method in different scenarios. We apply the proposed method to the analysis of electroencephalogram (EEG) time series in a study of alcoholism. In particular, the proposed method is applied to find the peaks of the EEG time series in the temporal domain and the peaks of the signal power in the frequency domain. We construct a peak-based classifier to distinguish alcoholic subjects from normal controls, which achieves 80% classification accuracy.
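The derivative-based idea in the third topic can be sketched in a few lines. The example below is a minimal illustration under simplifying assumptions, not the dissertation's method: it uses a stationary squared-exponential kernel with fixed hyperparameters (the dissertation's chapter title refers to spatially adaptive non-stationary Gaussian processes) and it works only with posterior means, not the full joint posterior. The function names and default values are hypothetical.

```python
import numpy as np


def se_kernel(a, b, scale, length):
    """Squared-exponential kernel matrix and the pairwise differences a_i - b_j."""
    d = a[:, None] - b[None, :]
    return scale**2 * np.exp(-0.5 * (d / length) ** 2), d


def gp_peaks(x, y, grid, scale=1.0, length=0.2, noise=0.1):
    """Locate peaks of the GP posterior mean using its first two derivatives.

    Derivatives of a GP are again Gaussian, so their posterior means follow
    from differentiating the kernel with respect to the prediction point.
    """
    K, _ = se_kernel(x, x, scale, length)
    alpha = np.linalg.solve(K + noise**2 * np.eye(len(x)), y)

    Ks, d = se_kernel(grid, x, scale, length)
    mean = Ks @ alpha                                          # E[f(t) | y]
    d1 = (-(d / length**2) * Ks) @ alpha                       # E[f'(t) | y]
    d2 = ((d**2 / length**4 - 1.0 / length**2) * Ks) @ alpha   # E[f''(t) | y]

    # A peak: first derivative crosses zero from above, curvature is negative.
    idx = np.where((d1[:-1] > 0) & (d1[1:] <= 0) & (d2[:-1] < 0))[0]

    # Linear interpolation of the zero crossing between neighboring grid points.
    t = d1[idx] / (d1[idx] - d1[idx + 1])
    peaks = grid[idx] + t * (grid[idx + 1] - grid[idx])
    return peaks, mean, d1, d2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 60)
    y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
    grid = np.linspace(0.0, 1.0, 501)
    peaks, _, _, _ = gp_peaks(x, y, grid)
    print("estimated peak locations:", peaks)
```

A fully Bayesian treatment would draw joint posterior samples of the curve and its derivatives and summarize the resulting distribution of peak locations, which this sketch does not attempt.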

Table of Contents

ACKNOWLEDGEMENTS

LIST OF FIGURES

LIST OF TABLES

CHAPTER

I. Introduction
    1.1 Overview
    1.2 Human Brains and Functional Neuroimaging Data
        1.2.1 Basic Knowledge of Human Brain
        1.2.2 Positron Emission Tomography (PET) Imaging
        1.2.3 Functional Magnetic Resonance Imaging (fMRI)
        1.2.4 Electroencephalography (EEG) of Brain Electrical Activity
    1.3 Statistical Methods of Neuroimaging Studies
        1.3.1 Preprocessing Pipeline
        1.3.2 Activation Studies
        1.3.3 Connectivity Analysis
        1.3.4 Prediction and Classification
    1.4 Motivation Example and Proposed Research
        1.4.1 Alzheimer’s Disease Neuroimaging Initiative Study (ADNI)
        1.4.2 Autism Spectrum Disorder (ASD) Study
        1.4.3 Electroencephalogram Alcoholism Study

II. Topic 1: Bayesian Gaussian Process Modeling of Large Scale Longitudinal Neuroimaging Data
    2.1 Introduction
        2.1.1 Longitudinal Neuroimaging Analysis
        2.1.2 Machine Learning methods for Neuroimaging Study
        2.1.3 Gaussian Processes and Its Properties
    2.2 The Model
        2.2.1 A Spatial-Temporal Model
        2.2.2 Prior Specifications
        2.2.3 Model Representation
        2.2.4 Posterior Computation
        2.2.5 An STGPR based Classifier
    2.3 Simulation Studies
    2.4 ADNI Data Analysis
    2.5 Discussion
    2.6 Appendix
        2.6.1 Explicit Forms of Eigen Values and Eigen Vectors
        2.6.2 Full Conditional Posterior Density of Hyperparameters

III. Topic 2: Ensemble Classification Methods For Feature Combination of Large Scale Neuroimaging Data
    3.1 Introduction
        3.1.1 Disease Prediction Using Neuroimaging Data
        3.1.2 Support Vector Machine Classifiers
        3.1.3 Ensemble Classification Methods
    3.2 Methods
        3.2.1 Binary Support Vector Classifiers
        3.2.2 Binary LP Boosting through Column Generation
        3.2.3 Multiclass SVM and LPBoost
    3.3 Simulation Study
    3.4 ABIDE Data Analysis

IV. Topic 3: Bayesian Nonparametric Inference on Peaks via Spatially Adaptive Non-stationary Gaussian Processes
    4.1 Introduction
    4.2 Methods
        4.2.1 Posterior Computation
    4.3 Simulation Study
        4.3.1 Effect of Noise
        4.3.2 Effect of Multiple Equilibrium Points
    4.4 EEG Data Analysis
        4.4.1 Peak of Alpha Band in Temporal Domain
        4.4.2 Peak of Power in Frequency Domain
        4.4.3 Classification based on Peak and Magnitude
    4.5 Appendix

BIBLIOGRAPHY

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English