Epigenetic prediction of smoking status using machine-learning methods Open Access

Liu, Tianxiao (Fall 2019)

Permanent URL: https://etd.library.emory.edu/concern/etds/0k225c080?locale=en


Background: Tobacco smoking is a recognized major risk factor for many adverse health outcomes. Although many DNA methylation sites have been reported to be associated with tobacco smoking, few studies have focused on building models that predict smoking status from DNA methylation data. This study aims to predict smoking status with machine learning algorithms that are precise, generalizable, and use a small number of predictors.

Methods: An epigenetic prediction analysis of smoking status was performed on 218 male Caucasian twins using DNA methylation data and two machine learning methods, random forests and elastic net. The prediction models were trained and tested on two non-overlapping subsets.

Results: Prediction accuracy was higher for differentiating current from non-current smokers than for differentiating past from never smokers. For past versus never smokers, elastic net was more accurate than random forests on smaller predictor sets. After tuning-parameter optimization and predictor selection, the performance of random forests in predicting past versus never smokers improved for all predictor sets.

Conclusion: This study suggests that machine learning approaches can be used to understand smoking risks from a relatively small set of DNA methylation predictors.
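As an illustration of the workflow summarized in the abstract, the sketch below fits an elastic-net-penalized logistic classifier on one subset of samples and scores it on a disjoint subset. It is a minimal sketch under stated assumptions, not the thesis's implementation: the data are synthetic, the sample and site counts, effect size, and penalty settings (lam, alpha, lr, epochs) are arbitrary, all function names are hypothetical, and the twin structure of the real cohort is ignored. Random forests, the second method named in the abstract, are not shown.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal step for the L1 term)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    # Clamp the score so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_elastic_net_logistic(X, y, lam=0.05, alpha=0.5, lr=0.5, epochs=300):
    """Proximal gradient descent for logistic loss plus the penalty
    lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= lr * grad_b
        for j in range(p):
            # Gradient step on loss + ridge term, then L1 soft-thresholding.
            step = w[j] - lr * (grad_w[j] + lam * (1 - alpha) * w[j])
            w[j] = soft_threshold(step, lr * lam * alpha)
    return w, b


def predict(X, w, b):
    return [1 if sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) >= 0.5 else 0
            for xi in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic stand-in for standardized methylation values (assumption, not real data).
rng = random.Random(0)
n_samples, n_sites, n_informative = 218, 20, 3
X, y = [], []
for _ in range(n_samples):
    label = rng.randint(0, 1)
    row = [rng.gauss(0.0, 1.0) for _ in range(n_sites)]
    for j in range(n_informative):
        row[j] -= 1.5 * label  # arbitrary shift so the first sites carry signal
    X.append(row)
    y.append(label)

# Two non-overlapping subsets: one for training, one for testing.
indices = list(range(n_samples))
rng.shuffle(indices)
cut = n_samples // 2
train_idx, test_idx = indices[:cut], indices[cut:]

weights, intercept = fit_elastic_net_logistic(
    [X[i] for i in train_idx], [y[i] for i in train_idx])
test_accuracy = accuracy(
    [y[i] for i in test_idx], predict([X[i] for i in test_idx], weights, intercept))
n_selected = sum(1 for wj in weights if wj != 0.0)
print(f"test accuracy: {test_accuracy:.2f}, non-zero predictors: {n_selected}")
```

The L1 part of the penalty can set coefficients exactly to zero, which is one way a model ends up with a small predictor set; the ridge part keeps correlated predictors from being dropped arbitrarily.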

Table of Contents

  • Background
  • Methods
  • Study population
  • DNA methylation data
  • DNAm sites selection
  • Machine learning methods and data analysis
  • Random forests
  • Elastic net regularization
  • Results
  • Sample characteristics
  • Classification of current and non-current smoking
  • Classification of past and never smoking
  • Optimization of tuning parameters
  • Predictor selection
  • Summary
  • Discussion
  • References
  • Appendix

About this Master's Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Subfield / Discipline
  • English
