Effects of Rater Training on the Assessment of L2 English Oral Proficiency


The main objective of this study was to examine whether a Rater Identity Development (RID) program would increase interrater reliability and improve the calibration of scores against benchmarks in the assessment of second/foreign language English oral proficiency. Eleven primary school teachers participated as raters. A pretest–intervention (RID)–posttest design was employed, and the data comprised 220 assessments of student performances. Two types of rater-reliability analyses were conducted: first, intraclass correlation coefficients were estimated with a two-way random effects model to indicate the extent to which raters were consistent in their rankings; second, a many-facet Rasch measurement analysis, conducted with FACETS®, explored systematic differences in rater severity/leniency. Results showed improved consistency, presumably as a result of training; at the same time, differences in severity became greater. The results suggest that future rater training may draw on central components of RID, such as core concepts in language assessment, individual feedback, and social moderation work.
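As an illustration of the first analysis type, the following minimal sketch computes single-rater intraclass correlation coefficients from a two-way layout (performances in rows, raters in columns). It is not the authors' code. The abstract does not state which ICC form was reported, so the sketch assumes the standard Shrout and Fleiss mean-square formulas and returns both the absolute-agreement and the consistency variant. The function name `icc_two_way` and the example ratings are hypothetical and are not data from the study.

```python
def icc_two_way(scores):
    """Single-rater ICCs from a complete ratings table.

    scores: list of rows, one row per rated performance,
            one column per rater (no missing values).
    Returns (absolute_agreement, consistency).
    Assumption: Shrout & Fleiss two-way mean-square formulas.
    """
    n = len(scores)          # number of performances
    k = len(scores[0])       # number of raters
    grand = sum(sum(row) for row in scores) / (n * k)

    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for performances, raters, and residual error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Absolute agreement: rater severity differences lower the coefficient.
    agreement = (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
    # Consistency: only agreement in rank ordering matters.
    consistency = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return agreement, consistency


# Hypothetical ratings: 6 performances scored by 4 raters.
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
agreement, consistency = icc_two_way(ratings)
print(round(agreement, 2), round(consistency, 2))
```

In this hypothetical table the raters order the performances similarly but differ in severity, so the consistency coefficient is clearly higher than the absolute-agreement coefficient. This is the kind of pattern that a separate severity/leniency analysis is meant to examine.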

Author Biographies

Pia Sundqvist, University of Oslo and Karlstad University

Associate Professor of English Language Education

Department of Teacher Education and School Research


Erica Sandlund, Karlstad University

Associate Professor of English Linguistics

Gustaf B. Skar, Norwegian University of Science and Technology

Associate Professor of Norwegian Language Education

Department of Education, Norwegian University of Science and Technology

Michael Tengberg, Karlstad University

Professor of Educational Work

Faculty of Arts and Social Sciences, Karlstad University
