Deriving a Metric to Compare Solutions of Malarial Strain Identification Problems and Performing Network Analysis of Disease Outbreaks Across Time

Bharwani, Safiyah (Spring 2018)

Permanent URL: https://etd.library.emory.edu/concern/etds/12579s24j?locale=fr
Published

Abstract

This thesis builds upon the research conducted by Mustonen et al. on using a Bayesian method to identify strains of the P. falciparum species of malaria from mixed diagnostic samples. Their StrainRecon algorithm takes a single weight vector, which measures the presence of malaria in an infected individual, and uses it to infer the number of strains of malaria, the identity of each strain, and the proportion in which each strain is present. This information is grouped into matrix-vector pairs: each matrix contains information on the identity of each strain, and the corresponding vector contains information on the proportion in which each strain is represented. Because this inference problem is underdetermined, multiple matrix-vector pairs are presented as possible solutions. This work derives a novel metric for comparing the solutions produced by the StrainRecon algorithm. We rigorously justify this metric and find an efficient implementation before performing hierarchical clustering on real-world data from the Centers for Disease Control and Prevention (CDC). In particular, we focus our analysis on understanding how malaria outbreaks have changed over time, and we attempt to track how the number of strains of malaria in the field has changed. This analysis is of key importance to researchers at the CDC, since information on how the number of strains of malaria has changed over time is scarce. Throughout this work, emphasis is placed on making the mathematical results accessible to practitioners at the CDC.

Table of Contents

Abstract

Introduction

Motivation

Dataset Description

Developing the SR Metric

Implementing the Metric

Clustering Methodology

Exploring the CDC Pilot Data

Analyzing Changes in Malaria Over Time

Conclusion

References

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School
Department
Degree
Submission
Language
  • English
Research Field
Keyword
Committee Chair / Thesis Advisor
Committee Members
Last modified
