Predicting Combined Chemotherapeutic Agents’ Efficacy Synergy via Multiple Regression Models and Cross Validation Technique, Upon Inter-Patient and Intra-Patient Levels

Open Access

Zhang, Xiaozhu (Spring 2019)

Permanent URL: https://etd.library.emory.edu/concern/etds/h128nf92k?locale=en
Published

Abstract

Background: Precision medicine is crucial to cancer treatment because it minimizes potentially lethal side effects and maximizes drug efficacy, and accurately modeling individual drug efficacy is the key step. However, previous studies mostly modeled single-drug efficacy, while combination chemotherapy is more frequently applied in practice. In this study, we compared and integrated several models and algorithms to predict individual multiple-drug response at both the intra-patient and inter-patient levels. Ultimately, we aim to push cancer treatment one step further down the road to precision medicine.

Methods: We are interested in three key variables: two drug dosages, gene expression, and gene mutation. By adding these variables one by one to the model and evaluating model performance, we can determine their relative importance for prediction. Linear regression, ridge regression, lasso regression, elastic net regression, and random forest algorithms are applied in model construction. Goodness of fit is evaluated by the R-square value under 10-fold cross-validation (CV) and leave-one-out cross-validation (LOOCV). Models were built on data from a single cell line as well as on data combining four cell lines, to investigate the models’ ability to predict synergy at the intra-cell-line and inter-cell-line levels, respectively.
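The evaluation scheme described above can be illustrated with a minimal, dependency-free sketch. It is not the thesis's code: the data are synthetic, the response is constructed to depend on a single hypothetical expression feature, ridge regression stands in for the five methods, and the penalty `lam`, the sample size, and all function names are assumptions made for illustration. R-square is computed on pooled out-of-fold predictions, since a per-fold R-square is undefined when a fold holds one sample, as in LOOCV.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ridge(X, y, lam):
    """Ridge regression with an unpenalized intercept; returns [b0, b1, ...]."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    for i in range(1, p):  # penalize slopes only, not the intercept
        xtx[i][i] += lam
    return solve(xtx, xty)

def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))

def cv_r2(X, y, k, lam=1.0, seed=0):
    """R-square on pooled out-of-fold predictions; k == len(y) gives LOOCV."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [0.0] * len(y)
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        beta = fit_ridge([X[i] for i in train], [y[i] for i in train], lam)
        for i in fold:
            preds[i] = predict(beta, X[i])
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (assumption): two dosages plus one expression feature.
rng = random.Random(42)
n = 60
dose = [[rng.uniform(0, 1), rng.uniform(0, 1)] for _ in range(n)]
expr = [rng.gauss(0, 1) for _ in range(n)]
y = [0.5 * d[0] + 0.5 * d[1] + 1.5 * g + rng.gauss(0, 0.2)
     for d, g in zip(dose, expr)]

feature_sets = {
    "baseline (dosages only)": dose,
    "secondary (+ expression)": [d + [g] for d, g in zip(dose, expr)],
}
# For each feature set: (10-fold CV R-square, LOOCV R-square).
results = {name: (cv_r2(X, y, 10), cv_r2(X, y, n))
           for name, X in feature_sets.items()}
for name, (r2_10fold, r2_loocv) in results.items():
    print(f"{name}: 10-fold R2 = {r2_10fold:.3f}, LOOCV R2 = {r2_loocv:.3f}")
```

Because the synthetic response is built to depend on the expression feature, the gain from adding it here is by construction and says nothing about the thesis's actual results; the sketch only shows how a stepwise comparison of feature sets under two cross-validation schemes can be organized.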

Results and Conclusion: Compared with the baseline model, in which dosage information is the only explanatory variable, the secondary model with added gene expression data generated a significantly larger R-square. However, adding mutation data to the final model did not improve model accuracy; its R-square values are nearly the same as those of the secondary model. In addition, models built on multiple cell lines were unable to predict drug synergy. Among the five regression methods, the random forest algorithm consistently produced the largest R-square in each model. 10-fold CV showed better generalizability, and LOOCV coupled with the random forest algorithm built the best model. In conclusion, this study demonstrated the feasibility of predicting the efficacy synergy of multiple chemotherapeutic agents within a cell line using their dosage information and gene expression data. Adding mutation information yielded results that fell short of expectations. More information is needed to model drug synergy among patients.

Table of Contents

Introduction

Methods

1. Data description

2. Baseline model establishment

3. Secondary model construction

4. Final model and model comparison

Results

Discussion

Acknowledgment

Bibliography

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English