This work will build upon the research of Mustonen et al. on using a Bayesian method to identify strains of the P. falciparum species of malaria from mixed diagnostic samples. Their StrainRecon algorithm takes a single weight vector measuring the presence of malaria in an infected individual and uses it to infer the number of malaria strains, the identity of each strain, and the proportion in which each strain is present. Each solution is expressed as a matrix-vector pair: the matrix encodes the identity of each strain, and the corresponding vector gives the proportion in which each strain is represented. Because this inference problem is underdetermined, multiple matrix-vector pairs are presented as possible solutions. This work will extend the prior research by deriving a novel metric for comparing the solutions produced by the StrainRecon algorithm. We will rigorously justify this metric and develop an efficient implementation before performing hierarchical clustering over real-world data from the Centers for Disease Control and Prevention (CDC). In particular, we will focus our analysis on understanding how malaria outbreaks have changed over time and attempt to track how the number of malaria strains has changed in the field. This analysis is of key importance to researchers at the CDC, since little information is available on how the number of malaria strains has changed over time. Throughout this work, emphasis will be placed on making the mathematical results accessible to practitioners at the CDC.
Table of Contents
This table of contents is under embargo until 24 May 2020
About this Honors Thesis
|File download under embargo until 24 May 2020|2018-04-10|