Evaluating Rater-Mediated Assessments with Rasch Measurement Theory and Mokken Scaling

Open Access

Wind, Stefanie Anne (2014)

Permanent URL: https://etd.library.emory.edu/concern/etds/2f75r8811?locale=en
Published

Abstract

Models based on Rasch Measurement Theory (Rasch, 1960/1980) are frequently used to explore the quality of ratings assigned in large-scale rater-mediated educational assessments (Engelhard, 2013; Wolfe, 2009) because they meet the requirements for invariant measurement. In contrast, the utility of nonparametric models that meet the requirements for invariant measurement for monitoring rating quality is unexplored. Because they are less restrictive, nonparametric models may provide useful information to inform the interpretation and use of rater-assigned scores. The purpose of this study is to describe, illustrate, and extend current indices of rating quality with concepts from Mokken scaling. The major methods used to address the guiding questions for this study include a literature review, illustrative data analyses, and the application of parametric and nonparametric models to data from large-scale rater-mediated assessments. Mokken-based analyses are conducted using the mokken package for the R statistical software program (van der Ark, 2013; R Development Core Team, 2013). Rasch-based analyses are conducted using the Facets program (Linacre, 2010).

Major findings suggest that Mokken scale analysis provides diagnostic information that supplements indices of measurement quality based on Rasch measurement theory. Further, findings suggest that parametric and nonparametric indicators of measurement quality provide related, but slightly different, information about measurement quality in the context of rater-mediated assessments. The diagnostic information provided by the Mokken-based indicators illustrated in this study is especially promising for assessment development, including rater training and the development of scoring rubrics. In response to the increased emphasis on the use of evidence to guide policy and practice in education (Cooper, Levin, & Campbell, 2009; Huff, Steinberg, & Matts, 2010; Mislevy, Steinberg, Breyer, Almond, & Johnson, 2002), the use of assessments that require constructed responses (e.g., essays and portfolios) is increasing, as in the next-generation assessments included in the Race to the Top initiative (U.S. Department of Education, 2010). Within the framework of invariant measurement, this study proposes and applies a coherent set of indicators of rating quality based on measurement models with useful properties that can be used in practice to inform the development, interpretation, and use of rater-mediated assessments.

Table of Contents

Chapter One: Introduction

Theoretical Framework

Statement of the Problem

Purpose of the Study

Research Questions

Definitions

Overview of Dissertation

Chapter Two: Review of Literature

What are the major underlying measurement issues related to rater-mediated assessments?

How have these measurement issues been traditionally addressed in previous research?

1. What are the major indices of rater agreement?

2. What are the major indices of rater errors and systematic biases?

3. What are the major indices of rater accuracy?

Summary

Chapter Three: Illustration of Modern Rating Quality Indices based on Rasch Measurement Theory

What is Item Response Theory?

What is Rasch Measurement Theory?

Rasch Measurement Theory for Dichotomous Data

Rasch Measurement Theory for Polytomous Ratings

Using Rasch Measurement Theory to Examine the Quality of Ratings

Model I: Many-Facet Rasch Model for Rater Invariance

Model II: Many-Facet Rasch Model for Rater Accuracy

Summary

Chapter Four: Illustration of Modern Rating Quality Indices based on Mokken Scaling

What is Mokken Scaling?

Mokken Scaling for Dichotomous Data

Mokken Scaling for Polytomous Ratings

Using Mokken Scaling to Examine the Quality of Ratings

Model III: Monotone Homogeneity for Ratings (MH-R) Model

Model IV: Double Monotonicity for Ratings (DM-R) model

Summary

Chapter Five: Examining Rating Scales using Rasch and Mokken Models for Rater-Mediated Assessments

Introduction

Purpose

Research Questions

Procedures

Data Analysis

Results

Conclusions

Chapter Six: Discussion and Conclusions

Research Question 1: What are the major underlying measurement issues related to rating quality?

Research Question 2: How have these measurement issues been traditionally addressed in previous research?

Research Question 3: How has Rasch measurement theory been used to examine the quality of ratings?

Research Question 4: How can Mokken scaling be used to examine the quality of ratings?

Research Question 5: What is the relationship between Rasch- and Mokken-based indices of rating quality?

Limitations

Implications for Research, Theory, Policy, and Practice

References

About this Dissertation

Rights statement
  • Permission granted by the author to include this thesis or dissertation in this repository. All rights reserved by the author. Please contact the author for information regarding the reproduction and use of this thesis or dissertation.
Language
  • English