When Large Language Models Meet Religious Text
Choi, Jacob (Spring 2024)
Abstract
AI has been expanding rapidly beyond computer science into areas such as healthcare, transportation, and the humanities. The intersection of AI and religion is also growing, but little computational work has been done from an application-based perspective. Current research at this intersection often examines what models have learned, such as religious bias. For work that more directly affects communities, commercial AI-powered tools help users learn about religious texts, but they lack transparency, which may be alarming for some.
To contribute to AI applications in religion from a computational perspective, beyond observing model bias, we perform a case study on the Bible: we create a verse extraction tool using deep learning techniques to showcase the process of building such a tool for religious communities. We first explore a challenge common to those who study the Bible: finding references. We use semantic similarity search and the Hungarian algorithm to identify references, an approach we found infeasible yet impactful. We then introduce six datasets that we use to train a llama-2-7b-chat model to respond to user queries with Bible verses. Additionally, we create two test sets to evaluate models, the first asking fact-based questions and the second asking theological questions. We find that state-of-the-art commercial models still come out on top, with the highest accuracies of 62.5 and 58.5, and we describe next steps to encourage research on application-based tools for religion within computer science.
Table of Contents
1 Introduction
1.1 Background and Motivation
1.2 Research Objective
1.3 Thesis Organization
2 Background
2.1 Background and Trends in NLP for Religious Text Analysis
2.2 Exploring NLP Tasks in Religious Texts
3 Finding References: An Exploration
3.1 Semantic Similarity
3.2 Maximum Weight Matching
3.3 Takeaway
4 Model Training
4.1 What is a language model?
4.2 What is fine-tuning?
4.3 Model Choice
4.4 Training Pipeline
5 Datasets
5.1 Overview of Datasets
5.2 Bible Versions
5.3 Instruction Fine Tuning Format
5.4 Datasets
5.4.1 Dataset 1: Similarity
5.4.2 Dataset 2: Named Entity Recognition
5.4.3 Dataset 3: Version
5.4.4 Dataset 4: Situation
5.4.5 Dataset 5: Single
5.4.6 Dataset 6: References
5.4.7 Combined Dataset
6 Experiments and Results
6.1 Discussion
6.2 Further Exploration
7 Conclusion
7.1 Challenges and Limitations
7.2 Future Work
8 Final Remarks
A Appendix
Bibliography
About this Honors Thesis
School | |
---|---|
Department | |
Degree | |
Submission | |
Language | |
Research Field | |
Keywords | |
Committee Chair / Thesis Advisor | |
Committee Members | |
Primary PDF
Thumbnail | Title | Date Uploaded | Actions |
---|---|---|---|
File download under embargo until 24 May 2025 | 2024-04-28 21:27:26 -0400 | File download under embargo until 24 May 2025 |