Monkeying Around: Automatically Analyzing Malaria Infections in Rhesus Macaques Open Access

Hexter, Lindsay (Spring 2018)

Permanent URL: https://etd.library.emory.edu/concern/etds/bk128992v?locale=en
Published

Abstract

In today’s age of big data, automatic processing techniques are becoming more important than ever, especially in the fields of biology and medicine. Many studies focus on genomic data, following the rise of high-throughput sequencing; this project instead analyzes selected blood parameters measured in rhesus macaques housed at the Yerkes National Primate Research Center at Emory University.

The initial motivation for this study was the paper by Joyner et al. (2016), “Plasmodium cynomolgi infections in rhesus macaques display clinical and parasitological features pertinent to modelling vivax malaria pathology and relapse infections.” Joyner and colleagues followed infections with the malaria species P. cynomolgi in monkeys, collecting blood data and other biological information daily. While the paper discusses possible points of difference between monkeys of varying disease severity, we sought an automatic way to use these “clinical and parasitological features” to characterize and predict aspects of malaria, including the severity and stage of the longitudinal infection.

We propose to replicate existing analyses and to add new insights via various computational techniques. Machine learning is traditionally applied to very large datasets, so this thesis aims to provide a proof of concept for automatically analyzing smaller datasets of this kind, whose size is limited by the restrictions on studying monkeys. The flow of computation is as follows: normalization of the data, creation of mathematical models, residual calculation, formation of residual matrices for clustering, and lastly the generation of regression models. The same procedure is then applied to shifted data for comparison, using Bayesian optimization. This study therefore provides a comprehensive framework for the automatic analysis of medical data, which can be applied to other datasets.
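The first steps of this flow (normalization, model fitting, residual calculation, residual matrix, regression) can be illustrated with a minimal Python sketch. It is not the thesis code: the data are synthetic, every function name and parameter is a hypothetical chosen for illustration, the single-Gaussian model and z-score normalization are assumptions suggested by the section titles, and the clustering and Bayesian optimization steps are left out.

```python
import numpy as np
from scipy.optimize import curve_fit


def zscore(series):
    """Normalize one measurement series to zero mean and unit variance."""
    series = np.asarray(series, dtype=float)
    return (series - series.mean()) / series.std()


def gaussian(t, amplitude, center, width, offset):
    """Single Gaussian peak on a constant baseline (assumed model form)."""
    return amplitude * np.exp(-((t - center) ** 2) / (2.0 * width ** 2)) + offset


def fit_residuals(t, y):
    """Fit the Gaussian by nonlinear least squares; return (params, residuals)."""
    # Crude initial guess taken from the data itself.
    p0 = [y.max() - y.min(), t[np.argmax(y)], (t.max() - t.min()) / 4.0, y.min()]
    params, _ = curve_fit(gaussian, t, y, p0=p0, maxfev=10000)
    return params, y - gaussian(t, *params)


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression: solve (X^T X + alpha I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


# Synthetic stand-in data: 3 hypothetical subjects, 30 daily samples each.
rng = np.random.default_rng(0)
days = np.arange(30, dtype=float)
subjects = [
    gaussian(days, 5.0, center, 3.0, 1.0) + rng.normal(0.0, 0.1, days.size)
    for center in (12.0, 15.0, 18.0)
]

# One row of residuals per subject forms the residual matrix.
residual_matrix = np.vstack([fit_residuals(days, zscore(y))[1] for y in subjects])

# Ridge regression of a made-up per-subject target on the residual rows.
target = np.array([0.0, 1.0, 2.0])
weights = ridge_fit(residual_matrix, target, alpha=1.0)
```

In this sketch each row of `residual_matrix` holds what the fitted curve fails to explain for one subject, which is the kind of input a clustering step could then group. The ridge penalty `alpha` keeps the regression solvable even though there are far fewer subjects than time points.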

Table of Contents

1 Introduction

1.1 Thesis Statement

2 Background

2.1 Why Malaria

2.2 Preprocessing and Normalization of Data

2.3 Nonlinear Least Squares and Gaussian Fitting

2.4 Regression Modeling

      2.4.1 Ridge Regression

2.5 Clustering

2.6 Bayesian Optimization

3 Approach

3.1 Data Storage

      3.1.1 Normalization of Data

3.2 Gaussian Fitting

      3.2.0.1 Peak finding

3.3 Regression Modeling

      3.3.1 Combined Model

      3.3.2 ‘Phased’ Regression

3.4 Clustering

      3.4.1 Residual Matrices

3.5 Bayesian Optimization

      3.5.1 Spearmint Package

      3.5.2 Sign-Match Matrices

4 Results

4.1 Gaussian Fitting and Normalization

4.2 Regression Modeling

      4.2.1 Combined Model with Bayesian Optimization Shifts

      4.2.2 ‘Phased’ Regression

4.3 Clustering

      4.3.1 Residual Matrices

            4.3.1.1 Clustering Results

      4.3.2 Sign-Match Matrices

            4.3.2.1 Kmeans Clustering

      4.3.3 Other Experiments

5 Conclusions

5.1 Contributions

5.2 Future Work

5.3 Challenges and Learning

A Other Results

A.1 Normalized Results

      A.1.1 Residual Matrices

B Abbreviations

References

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English