Epigenetic prediction of smoking status using machine-learning methods Public
Liu, Tianxiao (Fall 2019)
Abstract
Background: Tobacco smoking is a recognized major risk factor for many adverse health outcomes. Although many DNA methylation sites have been reported to be associated with tobacco smoking, few studies have focused on building prediction models of smoking status from DNA methylation data. This study aims to predict smoking status with machine learning algorithms that are precise, generalizable, and based on a small number of predictors.
Methods: An epigenetic prediction analysis of smoking status was performed on 218 male Caucasian twins using DNA methylation data and two machine learning methods, random forests and elastic net. The prediction models were trained and tested on two non-overlapping subsets.
Results: Prediction accuracy was higher for differentiating current from non-current smokers than for differentiating past from never smokers. In predicting past versus never smokers, elastic net was more accurate than random forests for smaller predictor sets. After optimization of tuning parameters and predictor selection, the performance of random forests in predicting past versus never smokers improved for all predictor sets.
Conclusion: This study suggests that machine learning approaches applied to a relatively small set of DNA methylation sites could be used to understand smoking risks.
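For readers unfamiliar with the elastic net, the penalized objective for a binary outcome such as current versus non-current smoker can be written as below. This is the standard textbook form for penalized logistic regression, not necessarily the exact parametrization used in the thesis. The symbols are notation introduced here for illustration: $y_i \in \{0, 1\}$ is the smoking label of subject $i$, $x_i$ is that subject's vector of DNA methylation values, $\lambda \ge 0$ is the overall penalty strength, and $\alpha \in [0, 1]$ is the mixing parameter.

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}
  \left\{
    -\frac{1}{n} \sum_{i=1}^{n}
      \left[ y_i \eta_i - \log\left(1 + e^{\eta_i}\right) \right]
    + \lambda \left[ \alpha \lVert \beta \rVert_1
      + \frac{1 - \alpha}{2} \lVert \beta \rVert_2^2 \right]
  \right\},
\qquad \eta_i = \beta_0 + x_i^{\top} \beta
```

In this form, $\alpha = 1$ gives the lasso and $\alpha = 0$ gives ridge regression. The $\ell_1$ term can shrink coefficients exactly to zero, which is one way such a model ends up relying on a small number of predictors.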
Table of Contents
Background
Methods
Study population
DNA methylation data
DNAm sites selection
Machine learning methods and data analysis
Random forests
Elastic net regularization
Results
Sample characteristics
Classification of current and non-current smoking
Classification of past and never smoking
Optimization of tuning parameters
Predictor selection
Summary
Discussion
References
Appendix
About this Master's Thesis
School | |
---|---|
Department | |
Subfield / Discipline | |
Degree | |
Submission | |
Language | |
Research Field | |
Keyword | |
Committee Chair / Thesis Advisor | |
Primary PDF
Title | Date Uploaded |
---|---|
Epigenetic prediction of smoking status using machine-learning methods | 2019-12-09 03:00:13 -0500 |