Major depressive disorder (MDD) is a highly debilitating condition. Early treatment optimization is crucial for a favorable prognosis, but reliably predicting who is most likely to benefit from which treatment remains a major challenge. One way to address this problem is through a better understanding of the heterogeneity of the disorder. Previous research has identified language use as a potential indicator of individual differences in depression, and recent technological advances permit a more systematic analysis of language for this purpose. In the current study, we demonstrate how large language models (LLMs) can be used to identify subtypes of depression in the early stages of treatment based on people’s natural speech. We introduce a computational technique for determining the relative similarity of two narratives by measuring how using one narrative as context affects an LLM’s ability to predict sentences in the other. The resulting narrative similarities were analyzed using hierarchical clustering, which revealed three major subgroups of depression. Subsequent feature analyses indicated distinguishing semantic and syntactic properties of each cluster and yielded predictions about future remission status. The findings demonstrate how AI models applied to the analysis of people’s natural speech can be used in subtyping and predicting treatment outcomes for depression.
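The similarity measure and clustering step described above can be illustrated with a minimal sketch. It is not the implementation used in the thesis. To stay self-contained, a toy add-one-smoothed unigram scorer (`sentence_log_prob`) stands in for the LLM, and the clustering is a small average-linkage agglomerative routine; the function names, the `vocab_size` constant, the symmetrization of the similarity matrix, and the example narratives are all assumptions made for illustration.

```python
import math
from collections import Counter


def sentence_log_prob(sentence, context, vocab_size=1000):
    """Toy stand-in for an LLM (assumption): add-one-smoothed unigram
    log-probability of `sentence`, with word counts taken from `context`."""
    counts = Counter(context.lower().split())
    total = sum(counts.values())
    return sum(
        math.log((counts[word] + 1) / (total + vocab_size))
        for word in sentence.lower().split()
    )


def similarity(target_sentences, context_narrative):
    """Mean per-sentence gain in log-probability when the other
    narrative is supplied as context, relative to no context."""
    gains = [
        sentence_log_prob(s, context_narrative) - sentence_log_prob(s, "")
        for s in target_sentences
    ]
    return sum(gains) / len(gains)


def dissimilarity_matrix(narratives):
    """Symmetrized, negated similarities (symmetrization is an assumption)."""
    n = len(narratives)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sim[i][j] = similarity(narratives[i], " ".join(narratives[j]))
    return [[-(sim[i][j] + sim[j][i]) / 2 for j in range(n)] for i in range(n)]


def agglomerative(dist, n_clusters):
    """Average-linkage agglomerative clustering on a dissimilarity matrix."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        pairs = (
            (x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        a, b = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[a] += clusters[b]  # merge the closest pair (b > a)
        del clusters[b]
    return clusters


# Hypothetical narratives, each a list of sentences.
narratives = [
    ["i feel tired all day", "i cannot sleep at night"],
    ["i am tired and cannot sleep", "night after night i feel tired"],
    ["my work deadlines worry me", "money troubles worry me"],
    ["work deadlines worry me a lot", "my money troubles grow"],
]
clusters = agglomerative(dissimilarity_matrix(narratives), n_clusters=2)
print(clusters)
```

In a realistic setting, `sentence_log_prob` would be replaced by the log-likelihood that a pretrained language model assigns to the sentence given the context, and the number of clusters would be chosen from the dendrogram rather than fixed in advance.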

About this Master's Thesis

