Multi-Modal Data Integration with LLMs for Cardiovascular Disease Prediction
Liu, Yiyi (Fall 2024)
Abstract
The integration of multimodal data in healthcare has emerged as a promising avenue for improving diagnostic accuracy and risk assessment. This study explores an approach to predicting cardiovascular disease (CVD) risk by integrating CT images, lung cancer diagnostics, and electronic health records (EHR), aiming to provide a more comprehensive view of patient health. The proposed model leverages transformers and large language models (LLMs) fine-tuned for biomedical applications to perform feature extraction and multimodal data fusion.
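To make the feature-extraction step concrete, the following is a minimal sketch of how image and text embeddings could be obtained with off-the-shelf encoders. It is illustrative only: the checkpoints `bert-base-uncased` and `google/vit-base-patch16-224-in21k`, the example EHR note, the [CLS] pooling, and the tensor shapes are assumptions standing in for the fine-tuned BiomedGPT and BERT components described in Chapter 3, not the thesis's exact pipeline.

```python
# Illustrative sketch only: generic Hugging Face checkpoints stand in for the
# fine-tuned BiomedGPT / BERT encoders used in the thesis.
import numpy as np
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer, ViTImageProcessor, ViTModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Text branch: embed an EHR-style note (hypothetical example text).
text_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
text_enc = AutoModel.from_pretrained("bert-base-uncased").to(device).eval()
note = "67-year-old male, 40 pack-year smoking history, hypertension, stage I NSCLC."
tokens = text_tok(note, return_tensors="pt", truncation=True).to(device)
with torch.no_grad():
    text_emb = text_enc(**tokens).last_hidden_state[:, 0]   # [CLS] vector, shape (1, 768)

# Image branch: embed a (dummy) CT slice as ViT patch features.
img_proc = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
img_enc = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k").to(device).eval()
dummy_slice = Image.fromarray((np.random.rand(224, 224, 3) * 255).astype("uint8"))
pixels = img_proc(images=dummy_slice, return_tensors="pt").to(device)
with torch.no_grad():
    image_emb = img_enc(**pixels).last_hidden_state          # patch features, shape (1, 197, 768)

print(text_emb.shape, image_emb.shape)
```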
The method utilizes a concatenation-based fusion strategy and a cross-attention mechanism to combine image and text features, leveraging their complementary information. Experimental results highlight the critical role of EHR data in achieving high predictive accuracy (72.91% in the text-only model), while image data, although secondary, offers valuable supplementary insights. The multimodal concatenation fusion model achieved an accuracy of 71.47%, indicating that further optimization of fusion strategies is needed to fully exploit the potential of multimodal integration.
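The two fusion strategies mentioned above can be sketched as follows. This is a rough, assumed implementation for illustration: concatenation fusion pools each modality and concatenates the vectors before a classification head, while cross-attention fusion lets the text embedding attend over image patch features (here via PyTorch's `nn.MultiheadAttention`). Layer sizes, pooling, and the residual connection are assumptions; the thesis's actual fusion architecture is detailed in Sections 3.1 and 3.5.

```python
# Minimal sketch of concatenation fusion vs. cross-attention fusion (assumed dimensions).
import torch
import torch.nn as nn

class ConcatFusionClassifier(nn.Module):
    """Pool each modality, concatenate, and classify CVD risk."""
    def __init__(self, dim: int = 768, num_classes: int = 2):
        super().__init__()
        self.head = nn.Sequential(nn.Linear(2 * dim, 256), nn.ReLU(), nn.Linear(256, num_classes))

    def forward(self, text_emb, image_emb):
        # text_emb: (B, dim) pooled text feature; image_emb: (B, N, dim) patch features.
        fused = torch.cat([text_emb, image_emb.mean(dim=1)], dim=-1)
        return self.head(fused)

class CrossAttentionFusionClassifier(nn.Module):
    """Text features attend over image patch features before classification."""
    def __init__(self, dim: int = 768, heads: int = 8, num_classes: int = 2):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=dim, num_heads=heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, text_emb, image_emb):
        # Query = text embedding, Key/Value = image patch features.
        q = text_emb.unsqueeze(1)                        # (B, 1, dim)
        attended, _ = self.attn(q, image_emb, image_emb)
        fused = self.norm(q + attended).squeeze(1)       # residual + layer norm, (B, dim)
        return self.head(fused)

# Toy usage with random embeddings in place of real encoder outputs.
text_emb = torch.randn(4, 768)        # e.g., BERT [CLS] embeddings
image_emb = torch.randn(4, 197, 768)  # e.g., ViT patch embeddings
print(ConcatFusionClassifier()(text_emb, image_emb).shape)          # torch.Size([4, 2])
print(CrossAttentionFusionClassifier()(text_emb, image_emb).shape)  # torch.Size([4, 2])
```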
Challenges remain in the lung cancer diagnostics component: BiomedGPT's limited performance on this dataset constrains the overall model's effectiveness. Future work will focus on refining fusion methods, enhancing feature representation with advanced LLMs, and incorporating more detailed diagnostic information to further improve CVD risk prediction.
Table of Contents
Chapter 1 Introduction
1.1 Background and Motivation
1.2 Clinical Relevance between Lung Cancer and Cardiovascular Disease (CVD)
1.3 Problem Statement and Research Goals
1.4 Contributions of This Work
Chapter 2 Related Work
2.1 Cardiovascular Disease Prediction
2.2 Lung Cancer Detection and Prediction
2.3 Multimodal Data Fusion in Medical Diagnostics
2.4 LLMs in Medical Applications
Chapter 3 Methodology
3.1 Overall Model Architecture
3.2 Fine-Tuning BiomedGPT
3.3 Transformer
3.4 Bidirectional Encoder Representations from Transformers (BERT)
3.5 Cross-Attention for Embedding Fusion
Chapter 4 Experiments and Results
4.1 Dataset
4.2 Preprocessing
4.3 Experimental Setup
Chapter 5 Result Analysis and Discussion
5.1 Multimodal CVD Prediction
5.2 Advantages and Limitations of the Proposed Model
5.3 Challenges in Multimodal Data Integration
5.5 Future Works
Chapter 6 Conclusion
References

Primary PDF: file download under embargo until 09 January 2027 (uploaded 2024-11-30).