Evaluating Rater-Mediated Assessments with Rasch Measurement Theory and Mokken Scaling
Wind, Stefanie Anne (2014)
Abstract
Models based on Rasch Measurement Theory (Rasch, 1960/1980) are frequently used to explore the quality of ratings assigned in large-scale rater-mediated educational assessments (Engelhard, 2013; Wolfe, 2009) because they meet the requirements for invariant measurement. In contrast, the utility of nonparametric models that also meet these requirements has not been explored for monitoring rating quality. Because they are less restrictive than parametric models, nonparametric models may provide useful information to inform the interpretation and use of rater-assigned scores. The purpose of this study is to describe, illustrate, and extend current indices of rating quality with concepts from Mokken scaling. The major methods used to address the guiding questions for this study include a literature review, illustrative data analyses, and the application of parametric and nonparametric models to data from large-scale rater-mediated assessments. Mokken-based analyses are conducted using the mokken package for the R statistical software program (van der Ark, 2013; R Development Core Team, 2013). Rasch-based analyses are conducted using the Facets program (Linacre, 2010).
Major findings suggest that Mokken scale analysis provides diagnostic information that supplements indices of measurement quality based on Rasch measurement theory. Further, findings suggest that parametric and nonparametric indicators provide related, but slightly different, information about measurement quality in the context of rater-mediated assessments. The diagnostic information provided by the Mokken-based indicators illustrated in this study is especially promising for assessment development, including rater training and the development of scoring rubrics. In response to the increased emphasis on the use of evidence to guide policy and practice in education (Cooper, Levin, & Campbell, 2009; Huff, Steinberg, & Matts, 2010; Mislevy, Steinberg, Breyer, Almond, & Johnson, 2002), the use of assessments that require constructed responses (e.g., essays and portfolios) is increasing; examples include the next-generation assessments in the Race to the Top initiative (U.S. Department of Education, 2010). Within the framework of invariant measurement, this study proposes and applies a coherent set of indicators of rating quality based on measurement models with useful properties that can be used in practice to inform the development, interpretation, and use of rater-mediated assessments.
Table of Contents
Chapter One: Introduction
Theoretical Framework
Statement of the Problem
Purpose of the Study
Research Questions
Definitions
Overview of Dissertation
Chapter Two: Review of Literature
What are the major underlying measurement issues related to rater-mediated assessments?
How have these measurement issues been traditionally addressed in previous research?
1. What are the major indices of rater agreement?
2. What are the major indices of rater errors and systematic biases?
3. What are the major indices of rater accuracy?
Summary
Chapter Three: Illustration of Modern Rating Quality Indices based on Rasch Measurement Theory
What is Item Response Theory?
What is Rasch Measurement Theory?
Rasch Measurement Theory for Dichotomous Data
Rasch Measurement Theory for Polytomous Ratings
Using Rasch Measurement Theory to Examine the Quality of Ratings
Model I: Many-Facet Rasch Model for Rater Invariance
Model II: Many-Facet Rasch Model for Rater Accuracy
Summary
Chapter Four: Illustration of Modern Rating Quality Indices based on Mokken Scaling
What is Mokken Scaling?
Mokken Scaling for Dichotomous Data
Mokken Scaling for Polytomous Ratings
Using Mokken Scaling to Examine the Quality of Ratings
Model III: Monotone Homogeneity for Ratings (MH-R) Model
Model IV: Double Monotonicity for Ratings (DM-R) model
Summary
Chapter Five: Examining Rating Scales using Rasch and Mokken Models for Rater-Mediated Assessments
Introduction
Purpose
Research Questions
Procedures
Data Analysis
Results
Conclusions
Chapter Six: Discussion and Conclusions
Research Question 1: What are the major underlying measurement issues related to rating quality?
Research Question 2: How have these measurement issues been traditionally addressed in previous research?
Research Question 3: How has Rasch measurement theory been used to examine the quality of ratings?
Research Question 4: How can Mokken scaling be used to examine the quality of ratings?
Research Question 5: What is the relationship between Rasch- and Mokken-based indices of rating quality?
Limitations
Implications for Research, Theory, Policy, and Practice
References
Primary PDF
Title | Date Uploaded |
---|---|
Evaluating Rater-Mediated Assessments with Rasch Measurement Theory and Mokken Scaling | 2018-08-28 14:57:56 -0400 |