Measuring Test Measurement Error: A General Approach

Bibliographic Details
Published in	Journal of Educational and Behavioral Statistics, Vol. 38, no. 6, pp. 629-663
Main Authors	Boyd, Donald; Lankford, Hamilton; Loeb, Susanna; Wyckoff, James
Format	Journal Article
Language	English
Published	Los Angeles, CA SAGE Publications 01.12.2013 American Educational Research Association
Subjects	Academic Achievement Accountability Achievement Gains Achievement Tests Bayesian Statistics Coefficients Correlation Correlations Covariance Education policy Educational Policy Educational Research Educational Testing Effect Size Error of Measurement Error rates Estimating techniques Estimators Generalizability Theory Grade 5 High Stakes Tests International education Language Arts Longitudinal Studies Mathematics Tests Measurement errors Measurement Techniques New York Reading Tests Scores Skill Development Standardized Tests Statistical Analysis Statistical discrepancies Test Reliability Test scores Urban Areas New York United States > US New York City New York high-stakes testing reliability testing effect size correlational analysis generalizability theory longitudinal studies
ISSN	1076-9986 1935-1054
DOI	10.3102/1076998613508584

More Information
Summary:	Test-based accountability, value-added assessments, and much experimental and quasi-experimental research in education rely on achievement tests to measure student skills and knowledge. Yet we know little about fundamental properties of these tests, an important example being the extent of measurement error and its implications for educational policy and practice. While test vendors provide estimates of split-test reliability, these measures do not account for potentially important day-to-day differences in student performance. In this article, we demonstrate a credible, low-cost approach for estimating the overall extent of measurement error that can be applied when students take three or more tests in the subject of interest (e.g., state assessments in consecutive grades). Our method generalizes the test-retest framework by allowing for (a) growth or decay in knowledge and skills between tests, (b) tests being neither parallel nor vertically scaled, and (c) the degree of measurement error varying across tests. The approach maintains relatively unrestrictive, testable assumptions regarding the structure of student achievement growth. Estimation requires only descriptive statistics (e.g., test-score correlations). With student-level data, the extent and pattern of measurement-error heteroscedasticity can also be estimated. In turn, one can compute Bayesian posterior means of achievement and achievement gains given observed scores, estimators whose statistical properties are superior to those of the observed score (score gain). We employ math and English language arts test-score data from New York City to demonstrate these methods and estimate that the overall extent of test measurement error is at least twice as large as that reported by the test vendor.
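The summary notes that estimation needs only test-score correlations and that posterior means can then be computed from observed scores. The sketch below is not the authors' estimator. It illustrates a classical special case under two stated assumptions: measurement errors are uncorrelated across tests, and true achievement follows a first-order (Markov) process, so that the true-score correlation between tests 1 and 3 equals the product of the correlations for tests 1-2 and 2-3. Under those assumptions the reliability of the middle test is r12 * r23 / r13. The second function applies the standard normal-theory shrinkage formula for a posterior mean. The function names and all numbers are hypothetical.

```python
def middle_test_reliability(r12: float, r13: float, r23: float) -> float:
    """Reliability of test 2 from observed correlations among three tests.

    Assumes uncorrelated measurement errors and a first-order (Markov)
    true-score process. Each observed correlation r_ij equals
    sqrt(rel_i * rel_j) times the true-score correlation, so
    r12 * r23 / r13 reduces to rel_2 under these assumptions.
    """
    if r13 <= 0:
        raise ValueError("r13 must be positive for this estimate")
    return r12 * r23 / r13


def posterior_mean(observed: float, mean: float, reliability: float) -> float:
    """Shrink an observed score toward the population mean.

    Assumes normally distributed true scores and errors, in which case
    E[true | observed] = mean + reliability * (observed - mean).
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return mean + reliability * (observed - mean)


if __name__ == "__main__":
    # Hypothetical correlations between scores in three consecutive grades.
    rel2 = middle_test_reliability(r12=0.72, r13=0.648, r23=0.72)
    print(f"estimated reliability of middle test: {rel2:.3f}")

    # Hypothetical scale: population mean 650, observed score 700.
    print(f"posterior mean: {posterior_mean(700.0, 650.0, rel2):.1f}")
```

With only three tests, this special case identifies the reliability of the middle test alone; the summary indicates that the article's approach relaxes such restrictions and also handles error variances that differ across tests.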