New Statistical Methods for Analyzing Measurement Bias in Multi-Item Surveys

He, Zeling (Summer 2025)

Permanent URL: https://etd.library.emory.edu/concern/etds/c534fq36j?locale=en

Abstract

Item response theory (IRT) provides a principled framework for modeling multi-item assessments by relating item responses to an underlying latent trait. It is widely used in the development and validation of clinical, psychological, and educational instruments. A key assumption in traditional IRT models is that item parameters are invariant across individuals. When this assumption is violated, so that individuals at the same latent trait level respond differently depending on covariates such as age, race, or education, differential item functioning (DIF) arises. DIF represents systematic measurement bias that threatens the validity of group comparisons and can distort inferences drawn from total scores or latent trait estimates. Detecting and adjusting for DIF is therefore essential for ensuring fairness and interpretability in survey-based research.
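For concreteness, a standard two-parameter logistic (2PL) IRT model for binary items takes the form (a generic textbook formulation; the dissertation's own notation may differ):

    P(Y_{ij} = 1 \mid \theta_i) = \frac{\exp\{a_j(\theta_i - b_j)\}}{1 + \exp\{a_j(\theta_i - b_j)\}},

where \theta_i is person i's latent trait, a_j is item j's discrimination, and b_j its difficulty. Item-parameter invariance means a_j and b_j are the same for every respondent; DIF occurs when these parameters effectively shift with observed covariates.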

This dissertation develops a series of likelihood-based models to detect and correct for DIF in multi-item instruments, motivated by challenges in cognitive and functional assessment. The first topic introduces the Likelihood-based Investigation of DIF (LIDIF) framework for binary responses, which incorporates multiple covariates and captures both uniform and non-uniform DIF without relying on anchor item assumptions. The model enables joint estimation and formal inference through Wald-type tests. Simulation studies demonstrate strong performance in parameter recovery, Type I error control, and power for DIF detection.
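As an illustrative sketch of how covariate effects can encode both kinds of DIF in such a model (a generic parameterization consistent with the abstract, not necessarily the exact LIDIF specification):

    \operatorname{logit} P(Y_{ij} = 1 \mid \theta_i, X_i) = a_j \theta_i + b_j + X_i^{\top}\beta_j + (X_i^{\top}\gamma_j)\,\theta_i,

where X_i collects covariates such as age, race, or education. Here \beta_j \neq 0 shifts the item intercept with covariates (uniform DIF), while \gamma_j \neq 0 lets covariates alter the item's discrimination (non-uniform DIF); a Wald-type test of H_0: \beta_j = \gamma_j = 0 flags item j. Because every item carries its own DIF parameters, identifiability rests on model constraints (see Section 3.2.3) rather than on prespecified anchor items.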

The second topic extends the LIDIF framework to ordinal item responses by incorporating ordered threshold structures consistent with the graded response model. This extension accommodates the ordinal nature of instruments such as the Functional Activities Questionnaire (FAQ), while preserving the interpretability of DIF parameters and computational efficiency. Application to FAQ responses reveals significant DIF related to participant and informant characteristics. 
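The ordinal extension can be sketched with a standard cumulative-logit graded response form (again an assumed generic notation): for item j with response categories k = 0, 1, ..., K_j,

    P(Y_{ij} \ge k \mid \theta_i) = \operatorname{logit}^{-1}(a_j \theta_i - b_{jk}), \qquad b_{j1} \le b_{j2} \le \cdots \le b_{jK_j},

for k = 1, ..., K_j, with P(Y_{ij} \ge 0) = 1. Category probabilities follow as differences of adjacent cumulative probabilities, P(Y_{ij} = k \mid \theta_i) = P(Y_{ij} \ge k \mid \theta_i) - P(Y_{ij} \ge k+1 \mid \theta_i), and covariate effects can enter the thresholds and slopes exactly as in the binary case, preserving the interpretation of uniform and non-uniform DIF parameters.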

The third topic generalizes the model to a longitudinal setting by jointly modeling time-invariant DIF and individual-specific latent trait trajectories. Assuming subject-level linear growth and using Bayesian estimation via Hamiltonian Monte Carlo, the framework yields DIF-adjusted scores that support bias-corrected group comparisons over time. Simulations validate the method’s ability to recover dynamic traits and detect DIF effects, while application to repeated FAQ assessments illustrates how accounting for both measurement bias and individual change leads to smoother and more interpretable representations of functional decline.
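A sketch of the longitudinal latent trait structure described above (an assumed generic form; the dissertation's notation may differ): each subject's trait follows a linear trajectory

    \theta_{it} = \alpha_{0i} + \alpha_{1i} t, \qquad (\alpha_{0i}, \alpha_{1i})^{\top} \sim N(\mu_{\alpha}, \Sigma_{\alpha}),

while the item and DIF parameters are held fixed across visits. The joint posterior over subject-level trajectories and time-invariant DIF effects is sampled with Hamiltonian Monte Carlo, and DIF-adjusted scores are read off from the posterior of \theta_{it}, separating true individual change from covariate-driven measurement bias.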

Table of Contents

1 Introduction

1.1 Background and Motivation

1.2 Importance of the Research Problem

1.3 Overview of the Dissertation Aims

1.4 Dissertation Structure

2 Literature Review

2.1 Introduction

2.2 Item Response Theory (IRT): General Framework

2.3 Differential Item Functioning (DIF) Detection

2.3.1 Non-IRT-Based Approaches

2.3.2 IRT-Based Approaches

2.4 Longitudinal IRT and DIF Models

2.5 Summary and Gaps in the Literature

3 Likelihood-Based Inference for Assessing Measurement Bias in Multi-Item Responses to a Latent Trait

3.1 Motivation

3.2 Method

3.2.1 Model Specification

3.2.2 Estimation Procedure

3.2.3 Identifiability and Inference

3.2.4 Latent Trait Estimation

3.2.5 Handling Missing Responses

3.3 Simulation Study

3.4 Conclusion and Discussion

4 Ordinal Extension and Application to Functional Activities Questionnaire Ratings

4.1 Motivation

4.2 Method

4.2.1 Model Specification for Ordinal Data

4.2.2 Estimation Under Ordered Threshold Constraints

4.3 Simulation Study

4.4 Application to Functional Activities Questionnaire (FAQ)

4.4.1 FAQ Data and Model Setup

4.4.2 DIF Detection and Interpretation

4.5 Conclusion and Discussion

5 Assessment of Measurement Bias in Longitudinal Multi-Item Responses

5.1 Introduction and Motivation

5.2 Method

5.2.1 Model Specification

5.2.2 Challenges with Maximum Likelihood Estimation

5.2.3 Bayesian Estimation

5.3 Simulation Study

5.4 Application to Longitudinal FAQ

5.4.1 Study Sample

5.4.2 Analysis Results

5.5 Conclusion and Discussion

A Appendix for Chapter 3

B Appendix for Chapter 4

C Appendix for Chapter 5

Bibliography

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English