Machine Learning-Based Drug-Drug Interaction Prediction Using Multi-Feature Analysis
Feng, Qiuyang (Spring 2025)
Abstract
Drug–drug interactions (DDIs) remain a critical concern in healthcare, as the concurrent administration of multiple drugs often results in adverse effects through pharmacodynamic mechanisms. Although various machine learning algorithms have been proposed to predict potential DDIs, most focus solely on whether an interaction exists. In this study, we integrate data from DrugBank, KEGG, and PubChem to derive four key drug features: molecular fingerprints, enzymes, pathways, and targets. We also extract 138 categories of DDIs from DrugBank, use a large language model (LLM) to convert their free-text descriptions into triplet representations, and classify the resulting interactions as either direction-independent or direction-dependent. We then evaluate multiple machine learning models (k-nearest neighbors, random forest, XGBoost, and deep neural networks [DNN]) on DDI prediction, addressing class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE). Our DNN model achieves an overall accuracy of 88.32% and a 6% increase in the area under the precision-recall curve (PR AUC), underscoring the effectiveness of the proposed methodology for more accurate DDI prediction.
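The abstract and the outline name two core preprocessing steps: Jaccard similarity over drug features and SMOTE oversampling of rare interaction classes. The sketch below illustrates both under stated assumptions; it is not the thesis's implementation. It assumes that each feature type (for example, enzymes) is stored as a set of identifiers per drug and that a drug pair is represented by concatenating the two drugs' similarity profiles, so that (A, B) and (B, A) yield different vectors, which is relevant for direction-dependent interactions. The function names, the toy drugs `drugA`/`drugB`/`drugC`, and the from-scratch SMOTE are all hypothetical. In practice, the `SMOTE` class from imbalanced-learn (`fit_resample`) is the usual choice.

```python
import math
import random
from typing import Dict, List, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similarity_profile(drug: str, features: Dict[str, Set[str]],
                       order: List[str]) -> List[float]:
    """Describe one drug by its Jaccard similarity to every drug in a fixed order."""
    return [jaccard(features[drug], features[other]) for other in order]


def pair_vector(d1: str, d2: str, features: Dict[str, Set[str]],
                order: List[str]) -> List[float]:
    """Concatenate both drugs' profiles (hypothetical pair encoding).
    Order matters, so (d1, d2) and (d2, d1) generally give different vectors."""
    return (similarity_profile(d1, features, order)
            + similarity_profile(d2, features, order))


def smote(minority: Sequence[Sequence[float]], n_new: int,
          k: int = 5, seed: int = 0) -> List[List[float]]:
    """Minimal SMOTE: each synthetic sample lies on the segment between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(x, minority[j]),
        )[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


if __name__ == "__main__":
    # Toy data: hypothetical drugs with enzyme sets (not taken from DrugBank).
    features = {
        "drugA": {"CYP3A4", "CYP2D6"},
        "drugB": {"CYP3A4"},
        "drugC": {"CYP1A2"},
    }
    order = sorted(features)
    print(pair_vector("drugA", "drugB", features, order))
    # → [1.0, 0.5, 0.0, 0.5, 1.0, 0.0]
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    print(len(smote(minority, n_new=4, k=2)))  # → 4
```

Because SMOTE interpolates only between existing minority samples, the synthetic points stay inside the region spanned by the rare class. It should be applied to the training split only, so that no synthetic samples leak into evaluation.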
The source code and data are available at https://github.com/FrankFengF/Drug-drug-interaction-prediction-
Keywords: drug-drug interaction, machine learning, LLM, direction-dependent interaction, SMOTE, deep neural networks
Table of Contents
1. Introduction
2. Material and Methods
2.1 Datasets
2.2 Feature Extraction
3. Method
3.1 Jaccard Similarity
3.2 SMOTE
4. Results
Conclusion
References
About this Honors Thesis
Primary PDF: file download under embargo until 22 May 2027 (uploaded 2025-04-09 00:36:33 -0400).