Low Resource RAG: From Slide Data Processing to RAG Systems
Chung, Andrew (Spring 2025)
Abstract
To advance the understanding and development of effective retrieval-augmented generation (RAG) systems, we examine their individual components to identify which elements are essential and where different methodologies can improve performance. In collaboration with Hyundai, we develop a RAG system for a low-resource domain, designed to answer questions about automotive safety collision tests using information from multimodal slides. Our approach introduces a novel, language model-centric data processing pipeline that transforms slide information into textual content suitable for retrieval and answer generation. We evaluate several state-of-the-art RAG frameworks, as well as different variations of embedding models, on our processed data. To assess the system's effectiveness, we generate synthetic question-answer pairs from our refined data and use them to test the accuracy of different retrieval models. We also create additional synthetic question-answer pairs that specifically target the multimodal table and chart information extracted from the slides. Our findings indicate that using fine-tuned embedding models and language models with the original RAG framework achieves the highest accuracy. In addition, we fine-tune Vision Large Language Models (VLLMs) to assess whether our data processing pipeline can be built on open-source models. We conclude by outlining next steps to encourage research toward open-source RAG frameworks for low-resource domains.
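The retrieval evaluation described in the abstract (testing retrieval accuracy with synthetic question-answer pairs) can be illustrated with a minimal sketch. This is not the thesis's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the names `hit_rate_at_k`, `chunks`, and `qa_pairs` are hypothetical, and the example texts are invented purely for illustration. It assumes each synthetic question is labeled with the chunk it was generated from, and counts a hit when that chunk appears among the top-k retrieved results.

```python
from collections import Counter
from math import sqrt


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hit_rate_at_k(qa_pairs, chunks, k=1):
    """Fraction of questions whose source chunk is among the top-k retrieved chunks."""
    chunk_vecs = {cid: embed(text) for cid, text in chunks.items()}
    hits = 0
    for question, source_id in qa_pairs:
        q = embed(question)
        ranked = sorted(chunk_vecs, key=lambda cid: cosine(q, chunk_vecs[cid]), reverse=True)
        hits += source_id in ranked[:k]
    return hits / len(qa_pairs)


# Invented example data: chunk id -> processed slide text (hypothetical).
chunks = {
    "slide-1": "frontal offset crash test dummy head injury values",
    "slide-2": "side impact pole test door intrusion measurements",
}
# Each synthetic question is paired with the id of the chunk it was generated from.
qa_pairs = [
    ("what were the head injury values in the frontal crash test", "slide-1"),
    ("how much door intrusion occurred in the side impact test", "slide-2"),
]
print(hit_rate_at_k(qa_pairs, chunks, k=1))
```

Swapping `embed` for a base or fine-tuned embedding model while holding `qa_pairs` fixed is one way such a comparison between retrieval models could be set up.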
Table of Contents
1 Introduction
1.1 Background and Motivation
1.2 Research Objective
1.3 Thesis Statement
2 Background
2.1 Background and trends in NLP and information systems
2.2 Exploring Retrieval-Augmented Generation
2.3 Embedding Models
2.4 Fine-tuning
2.5 Multimodal Data Processing in RAG
2.5.1 Synthetic QA and Data Generation
3 Approach
3.1 Data Processing
3.1.1 Dataset
3.1.2 Slide Data Processing
3.1.3 Additional Headers
3.1.4 Synthetic QA Generation
3.1.5 Evaluation of Synthetic Data
3.2 Embedding Model - BGE-M3
3.2.1 Fine-Tuning
3.3 RAG Frameworks
3.3.1 Frameworks
3.3.2 Storage of Data
3.3.3 Evaluation of Retrieval
3.4 Open Source VLLMs for OCR Processing
3.4.1 Fine-tuning VLLM
3.4.2 Data Preparation
3.4.3 Evaluation of VLLM
4 Experiments
4.1 Evaluation of RAG and Embedding Models
4.2 Evaluation of Table and Chart Data
4.3 Evaluation of VLLM
5 Analysis
5.1 Synthetic Data
5.1.1 Qualitative Analysis
5.1.2 Quantitative Analysis
5.2 RAG Frameworks & Retrieval Analysis
5.2.1 Embedding Models
5.2.2 RAG Frameworks
5.3 Qwen 2.5 VL Fine-tuning Analysis
5.3.1 Image Dimensions
5.3.2 Base Model vs. Finetuned Model Data
5.3.3 LLM Judge Preference - Qualitative Analysis
6 Conclusion
6.0.1 Future Directions
6.0.2 Conclusion
A Appendix
A.0.1 Converting image-to-text prompt:
A.0.2 Correction of results:
Bibliography
About this Honors Thesis
- Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
School | |
---|---|
Department | |
Degree | |
Submission | |
Language | |
Research Field | |
Keyword | |
Committee Chair / Thesis Advisor | |
Committee Members | |
Primary PDF
Thumbnail | Title | Date Uploaded | Actions |
---|---|---|---|
Supplemental Files
Thumbnail | Title | Date Uploaded | Actions |
---|---|---|---|
| /concern/parent/kd17cv244/file_sets/xk81jm707 | 2025-04-07 20:08:49 -0400 | |