Evaluating the Quality of Ratings in Writing Assessment: Rater Agreement, Error, and Accuracy
Wind, Stefanie Anne (2012)
Abstract
The purpose of this study is to examine the congruence among methods used to evaluate the quality of ratings obtained in large-scale performance assessments. Within the context of a large-scale writing assessment, this study focuses on the alignment between operationally used indices of rater agreement, error and systematic bias, and direct measures of accuracy within a traditional and Rasch-based approach. This study uses 365 essays from the Georgia High School Writing Test that were rated by 20 operational raters and by a committee of "expert raters," whose scores were used to compute direct accuracy measures. The Facets computer program (Linacre, 2010) is used to compute all of the indices of rating quality. Major empirical findings suggest that Rasch-based indices of model-data fit for ratings as well as indices of rater agreement from Facets (Linacre, 2010) provide information about raters that is comparable to direct measures of accuracy. Because direct measures of rater accuracy are often not attainable in operational settings, the use of easily obtained approximations of direct accuracy measures holds significant implications for monitoring rating quality in large-scale rater-mediated performance assessments.
Evaluating the quality of ratings in writing assessment:
Rater agreement, error, and accuracy
Bachelor of Arts
Bachelor of Music
Advisor: George Engelhard, Jr.
A thesis submitted to the Faculty of the James T. Laney School of Graduate Studies of Emory University in partial fulfillment of the requirements for the degree of Master of Arts in Educational Studies: Quantitative Methodology
2012
September 12, 2011
Table of Contents
Theoretical Framework
Invariant Measurement
Rasch Measurement Theory
Brunswik's Lens Model
Significance of the Study
Purpose of the Study
Research Questions
Definitions
Review of the Literature
Rater-Mediated Writing Assessment
Indices of Rating Quality in Writing Assessment
Rater Agreement
Selection of an Agreement Coefficient for Rater-Mediated Writing Assessments
Rater Error and Systematic Bias
Rater Error and Systematic Bias within a Traditional Approach
Rater Error and Systematic Bias within a Rasch-based Approach
Interpreting Rater Error and Systematic Bias in Context
Rater Accuracy
Accuracy Measures within a Traditional Approach
Accuracy Measures within a Rasch-based Approach
Interpreting Rater Accuracy in Context
Methodology
Instrument
Participants
Data Analysis
Selected Indices within a Traditional Approach
Selected Indices within a Rasch-based Approach
Results
Summary of Statistical and Psychometric Measures
Calibration of Ratings
Domain Calibrations for Ratings
Rating Scale Category Use
Calibration of Accuracy Scores
Domain Calibrations for Accuracy Ratings
Rating Quality Analyses
Rater Agreement
Rater Error and Systematic Bias
Rater Accuracy
Comparison of Rating Quality Indices
Discussion
Limitations and Delimitations
Conclusions in terms of Research Questions
Research Question One
Research Question Two
Implications
Theory
Research
Policy and Practice
References
List of Tables and Figures
Table 1. Instrument Description
Table 2. Categories of Rating Quality Indices
Table 3. Summary Statistics for Ratings
Table 4. Indices of Rater Error and Systematic Bias
Table 5. Calibration of the Domain Facet
Table 6. Rating Scale Structure
Table 7. Summary Statistics for Accuracy Ratings
Table 8. Indices of Rater Accuracy
Table 9. Calibration of Accuracy Ratings within Domains
Table 10. Correlations Among Traditional Indices
Table 11. Correlations Among Rasch-based Indices
Figure 1. Brunswik's Lens Model for Probabilistic Functionalism
Figure 2. Variable Map for Rating Data
Figure 3. Variable Map for Accuracy Data
Figure 4. Scatter Plots for Traditional Rating Quality Indices
Figure 5. Scatter Plots for Rasch-based Rating Quality Indices
Appendix A. IRB Documentation
About this Master's Thesis
School | |
---|---|
Department | |
Degree | |
Submission | |
Language | |
Research Field | |
Keyword | |
Committee Chair / Thesis Advisor | |
Committee Members | |
Primary PDF
Title | Date Uploaded |
---|---|
Evaluating the Quality of Ratings in Writing Assessment: Rater Agreement, Error, and Accuracy | 2018-08-28 16:10:27 -0400 |