Machine Learning-Based Drug-Drug Interaction Prediction Using Multi-Feature Analysis

Restricted; Files Only

Feng, Qiuyang (Spring 2025)

Permanent URL: https://etd.library.emory.edu/concern/etds/0k225c57x?locale=en
Published

Abstract

Drug–drug interactions (DDIs) remain a critical concern in healthcare, as the concurrent administration of multiple drugs often results in adverse effects through pharmacodynamic mechanisms. While various machine learning algorithms have been proposed to predict potential DDIs, most focus solely on whether an interaction exists. In this study, we integrate data from DrugBank, KEGG, and PubChem to derive four key drug features: molecular fingerprints, enzymes, pathways, and targets. Additionally, we extract 138 categories of DDIs from DrugBank, using a large language model (LLM) to convert free-text descriptions into triplet representations and categorizing the resulting interactions as either direction-independent or direction-dependent. We evaluate multiple machine learning models (k-nearest neighbors, random forest, XGBoost, and deep neural networks [DNN]) on DDI prediction, addressing class imbalance with the Synthetic Minority Over-sampling Technique (SMOTE). Our DNN model achieves an overall accuracy of 88.32% and demonstrates a 6% increase in the area under the precision-recall curve (PR AUC), underscoring the effectiveness of the proposed methodology for more accurate DDI prediction.

The source code and data are available at https://github.com/FrankFengF/Drug-drug-interaction-prediction-

Keywords: drug-drug interaction, machine learning, LLM, direction-dependent interaction, SMOTE, deep neural networks   
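
The abstract names SMOTE as the technique used to address class imbalance. As a rough illustration of the idea only, and not the thesis's implementation, the sketch below creates synthetic minority-class samples by interpolating between a sample and one of its nearest minority neighbours. The function name `smote_oversample`, its parameters, and the toy data are hypothetical, and the features are continuous toy values rather than the drug features described above; in practice a maintained implementation such as the one in the imbalanced-learn package would normally be used.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from minority-class samples.

    minority: list of equal-length numeric feature vectors (one class only).
    Hypothetical helper for illustration; not taken from the thesis code.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the chosen sample (Euclidean distance).
        others = [x for j, x in enumerate(minority) if j != i]
        others.sort(key=lambda x: math.dist(base, x))
        neighbour = rng.choice(others[:k])
        # The new point sits at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class: four points at the corners of the unit square.
rare_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_oversample(rare_class, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by that class. Oversampling of this kind is usually applied to the training split only, so that evaluation data contain no synthetic samples.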

Table of Contents

1. Introduction

2. Material and Methods

       2.1 Datasets

       2.2 Feature Extraction

3. Method

       3.1 Jaccard Similarity

       3.2 SMOTE

4. Results

Conclusion

References

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English