Ratings on performance tasks from two scoring sessions of an eighth-grade mathematics examination, developed by the California State Department of Education, were used (a) to study the feasibility of estimating IRT rater severity information within a scoring session, (b) to investigate the variation in rater severity within rating sessions (which we called rater drift), and (c) to examine the r...