Low Resource RAG: From Slide Data Processing to RAG Systems
Chung, Andrew (Spring 2025)
Abstract
To advance the understanding and development of successful retrieval-augmented generation (RAG) systems, we examine their individual components to identify essential elements and potential performance improvements across different methodologies. In collaboration with Hyundai, we develop a retrieval-augmented generation system for a low-resource domain, designed to answer questions about automotive safety collision tests using information from multimodal slides. Our approach introduces a novel, language model-centric data processing pipeline that transforms slide information into textual content suitable for retrieval and answer generation. We evaluate several state-of-the-art retrieval-augmented generation frameworks on our processed data, along with several embedding model variants. To assess our system's effectiveness, we generate synthetic question-answer pairs from our refined data to test the accuracy of different retrieval models. We also create additional synthetic question-answer pairs specifically targeting the multimodal table and chart information extracted from the slides. Our findings indicate that combining fine-tuned embedding models and language models with the original retrieval-augmented generation framework achieves the highest accuracy. We also fine-tune Vision Large Language Models (VLLMs) to assess whether an open-source version of our data processing pipeline is feasible. We conclude by outlining next steps to encourage research toward open-source retrieval-augmented generation frameworks for low-resource domains.
Table of Contents
1 Introduction
1.1 Background and Motivation
1.2 Research Objective
1.3 Thesis Statement
2 Background
2.1 Background and trends in NLP and information systems
2.2 Exploring Retrieval-Augmented Generation
2.3 Embedding Models
2.4 Fine-tuning
2.5 Multimodal Data Processing in RAG
2.5.1 Synthetic QA and Data Generation
3 Approach
3.1 Data Processing
3.1.1 Dataset
3.1.2 Slide Data Processing
3.1.3 Additional Headers
3.1.4 Synthetic QA Generation
3.1.5 Evaluation of Synthetic Data
3.2 Embedding Model - BGE-M3
3.2.1 Fine-Tuning
3.3 RAG Frameworks
3.3.1 Frameworks
3.3.2 Storage of Data
3.3.3 Evaluation of Retrieval
3.4 Open Source VLLMs for OCR Processing
3.4.1 Fine-tuning VLLM
3.4.2 Data Preparation
3.4.3 Evaluation of VLLM
4 Experiments
4.1 Evaluation of RAG and Embedding Models
4.2 Evaluation of Table and Chart Data
4.3 Evaluation of VLLM
5 Analysis
5.1 Synthetic Data
5.1.1 Qualitative Analysis
5.1.2 Quantitative Analysis
5.2 RAG Frameworks & Retrieval Analysis
5.2.1 Embedding Models
5.2.2 RAG Frameworks
5.3 Qwen 2.5 VL Fine-tuning Analysis
5.3.1 Image Dimensions
5.3.2 Base Model vs. Finetuned Model Data
5.3.3 LLM Judge Preference - Qualitative Analysis
6 Conclusion
6.0.1 Future Directions
6.0.2 Conclusion
A Appendix
A.0.1 Converting image-to-text prompt:
A.0.2 Correction of results:
Bibliography
About this Honors Thesis
School | |
---|---|
Department | |
Degree | |
Submission | |
Language | |
Research Field | |
Keyword | |
Committee Chair / Thesis Advisor | |
Committee Members | |
Supplemental Files
Title | Date Uploaded |
---|---|
/concern/parent/kd17cv244/file_sets/xk81jm707 | 2025-04-07 20:08:49 -0400 |