Automated Assessment of Concrete Language in Clinical High-Risk for Psychosis: A Novel Large Model Approach
Open Access

Dixon, Benjamin (Spring 2025)

Permanent URL: https://etd.library.emory.edu/concern/etds/tb09j728n?locale=en
Published

Abstract

Language disturbances are key indicators of altered thought processes and serve as reliable markers for emerging psychotic disorders, making them a crucial target for early detection. Current diagnostic methods, which rely primarily on behavioral observation and self-report, are limited in their ability to predict conversion to schizophrenia among clinical high-risk (CHR) populations. Research has shown that individuals with schizophrenia tend to have difficulty processing abstract concepts, so examining concreteness in CHR individuals may reveal deficits in abstract language before psychosis onset. Automated tools built on large language models offer a novel approach to quantifying these linguistic features objectively and at scale, potentially advancing our ability to detect early warning signs of psychosis.
Participants: The study includes 225 CHR individuals and 62 matched healthy controls (HC) in a first approach, and 385 CHR individuals and 82 HC in a second approach, drawn from the Accelerating Medicines Partnership® in Schizophrenia (AMP® SCZ) dataset. All participants underwent an open-ended interview at their baseline visit.
Analysis: Interviewee speech is extracted from each interview. Within each sentence, content words (nouns, verbs, adjectives) are identified and sequentially occluded. For each occluded word, Llama-3 generates a “contrast set” of alternative predictions based on the preceding sentences of context, and the concreteness of each alternative is compared with that of the occluded word. A second approach directly prompts the model for a word’s concreteness.
Using both approaches, no significant differences between HC and CHR are found at baseline. However, visual examination of CHR pilot data extending past baseline reveals a bimodal distribution, indicating the possibility of a CHR subset with higher levels of concrete speech. The limitations and areas for improvement of the current method are discussed.
The novel methodological approach leverages Llama-3 to provide a scalable alternative to manual concreteness ratings. By generating and comparing contextually appropriate word alternatives, this approach captures subtle linguistic differences that may characterize early psychosis risk. Future research will explore longitudinal changes in concreteness in CHR, as a subset of individuals may exhibit heightened concrete language use that could serve as a predictive marker. Additionally, investigating linguistic concreteness within specific cognitive contexts could help elucidate the heterogeneity of specific deficits in the psychosis spectrum.
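The occlusion procedure described in the abstract can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in, not the thesis implementation: `stub_contrast_set` replaces the Llama-3 call, `TOY_CONCRETENESS` is a made-up lexicon on a 1 to 5 scale, and scoring a word as its rating minus the mean rating of its alternatives is one assumed way to "compare" concreteness. For brevity the context is the preceding tokens of a single sentence rather than the preceding sentences.

```python
from statistics import mean
from typing import Callable, Dict, List

# Toy concreteness ratings (1 = abstract, 5 = concrete). Illustrative values only.
TOY_CONCRETENESS: Dict[str, float] = {
    "dog": 4.9, "animal": 4.6, "thing": 3.0, "idea": 1.6,
}


def stub_contrast_set(context: List[str]) -> List[str]:
    """Hypothetical stand-in for a language model that proposes
    alternative words for the occluded position given the context."""
    return ["animal", "thing", "idea"]


def relative_concreteness(
    tokens: List[str],
    content_positions: List[int],
    propose: Callable[[List[str]], List[str]],
    norms: Dict[str, float],
) -> Dict[int, float]:
    """Occlude each content word in turn and score it against its contrast set.

    Returns {position: rating(word) - mean rating of alternatives}.
    Positive values mean the spoken word is more concrete than the
    alternatives the model considered plausible in that context.
    """
    scores: Dict[int, float] = {}
    for pos in content_positions:
        target = tokens[pos]
        if target not in norms:
            continue  # no rating available for the occluded word
        # Context is everything before the occluded word (an assumption).
        alternatives = [
            w for w in propose(tokens[:pos]) if w in norms and w != target
        ]
        if not alternatives:
            continue  # nothing to compare against
        scores[pos] = norms[target] - mean(norms[w] for w in alternatives)
    return scores


if __name__ == "__main__":
    sentence = ["the", "dog", "had", "an", "idea"]
    # Positions of content words; a real pipeline would use a POS tagger.
    print(relative_concreteness(sentence, [1, 4], stub_contrast_set, TOY_CONCRETENESS))
```

In this sketch a per-speaker score could be obtained by averaging the per-word values across a transcript, which would then be compared between the CHR and HC groups; the aggregation and the statistical test are left open because the abstract does not specify them.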

Table of Contents

Introduction

Background

1.1 Linguistic disturbances in schizophrenia

1.2 The use of natural language processing tools for speech analysis

1.3 Concreteness / abstractness in psychosis

1.4 Importance of context for concreteness / abstractness

1.5 Using LLMs to make psycholinguistic judgements

1.6 The present study

Method and Materials

2.1 Dataset

2.2 Model

2.3 Interviewee selection

2.4 Transcript processing

2.5 Generating concreteness ratings

2.6 Data analysis plan

2.6.1 Statistical Analysis Framework

2.6.2 Visualization Strategy

Results

3.1 Approach 1

3.2 Approach 2

3.3 Analysis by part-of-speech

Discussion

4.1 Interpreting absence of concreteness differences in prodromal speech

4.2 Symptom progression may influence concreteness

4.3 Limitations and possible improvements

4.3.1 Employment of compensatory strategies by CHR

4.3.2 Insufficient definition of concreteness

4.3.3 Contrast set generation

4.4 Possible manifestations of reduced abstract speech in other linguistic domains

4.4.1 Concreteness in creative reasoning

4.4.2 Concreteness in metaphor

Conclusion

References

Appendix

About this Honors Thesis

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English