When Large Language Models Meet Religious Text (Public)

Choi, Jacob (Spring 2024)

Permanent URL: https://etd.library.emory.edu/concern/etds/j9602216f?locale=zh

Published

Abstract

AI has been expanding quickly beyond computer science into areas such as healthcare, transportation, and the humanities. The intersection of AI and religion is also growing, but little computational work has been done from an application-based perspective. Current research at this intersection often examines what models have learned, such as religious bias. Among work that affects communities more directly, commercial AI-powered tools help users learn about religious texts, but they lack transparency, which may be alarming for some.

To contribute to AI applications in religion from a computational perspective beyond observing model bias, we perform a case study on the Bible: we build a verse extraction tool using deep learning techniques to showcase how such a tool can be created for religious communities. We first explore a challenge common to those who study the Bible, finding references. We used semantic similarity search and the Hungarian algorithm to identify references, an approach we found infeasible yet impactful. We then introduce six datasets used to train a llama-2-7b-chat model to respond to user queries with Bible verses. We also create two test sets to evaluate models, the first asking fact-based questions and the second asking theological questions. We find that state-of-the-art commercial models still come out on top, with the highest accuracies of 62.5 and 58.5, and we describe next steps to encourage research on application-based tools for religion within computer science.

Table of Contents

1 Introduction

1.1 Background and Motivation

1.2 Research Objective

1.3 Thesis Organization

2 Background

2.1 Background and Trends in NLP for Religious Text Analysis

2.2 Exploring NLP Tasks in Religious Texts

3 Finding References: An Exploration

3.1 Semantic Similarity

3.2 Maximum Weight Matching

3.3 Takeaway

4 Model Training

4.1 What is a language model?

4.2 What is fine-tuning?

4.3 Model Choice

4.4 Training Pipeline

5 Datasets

5.1 Overview of Datasets

5.2 Bible Versions

5.3 Instruction Fine Tuning Format

5.4 Datasets

5.4.1 Dataset 1: Similarity

5.4.2 Dataset 2: Named Entity Recognition

5.4.3 Dataset 3: Version

5.4.4 Dataset 4: Situation

5.4.5 Dataset 5: Single

5.4.6 Dataset 6: References

5.4.7 Combined Dataset

6 Experiments and Results

6.1 Discussion

6.2 Further Exploration

7 Conclusion

7.1 Challenges and Limitations

7.2 Future Work

8 Final Remarks

A Appendix

Bibliography

About this Honors Thesis

Rights statement

Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.

School	Emory College
Department	Computer Science
Degree	B.S.
Submission	Honors Thesis
Language	English
Research Field	Computer Science; Artificial Intelligence; Religion, Biblical Studies
Keywords	Large Language Models; Bible; Machine Learning; Religion; AI
Committee Chair / Thesis Advisor	Jinho Choi, Emory University
Committee Members	Helen J. Kim, Emory University; Davide Fossati, Emory University; Hiram Maxim, Emory University

Primary PDF

Title	Date Uploaded
When Large Language Models Meet Religious Text	2024-04-28 21:27:26 -0400
