Measuring Test Measurement Error: A General Approach
| Published in | Journal of Educational and Behavioral Statistics, Vol. 38, no. 6, pp. 629-663 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | Los Angeles, CA; SAGE Publications; 01.12.2013; American Educational Research Association |
| ISSN | 1076-9986 1935-1054 |
| DOI | 10.3102/1076998613508584 |
| Summary | Test-based accountability, as well as value-added assessments and much experimental and quasi-experimental research in education, rely on achievement tests to measure student skills and knowledge. Yet, we know little regarding fundamental properties of these tests, an important example being the extent of measurement error and its implications for educational policy and practice. While test vendors provide estimates of split-test reliability, these measures do not account for potentially important day-to-day differences in student performance. In this article, we demonstrate a credible, low-cost approach for estimating the overall extent of measurement error that can be applied when students take three or more tests in the subject of interest (e.g., state assessments in consecutive grades). Our method generalizes the test-retest framework by allowing for (a) growth or decay in knowledge and skills between tests, (b) tests being neither parallel nor vertically scaled, and (c) the degree of measurement error varying across tests. The approach maintains relatively unrestrictive, testable assumptions regarding the structure of student achievement growth. Estimation only requires descriptive statistics (e.g., test-score correlations). With student-level data, the extent and pattern of measurement-error heteroscedasticity also can be estimated. In turn, one can compute Bayesian posterior means of achievement and achievement gains given observed scores, estimators having statistical properties superior to those for the observed score (score gain). We employ math and English language arts test-score data from New York City to demonstrate these methods and estimate that the overall extent of test measurement error is at least twice as large as that reported by the test vendor. |
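
The summary notes that estimation needs only test-score correlations from three or more tests, and that posterior means of achievement can then be computed from observed scores. As a rough illustration of that idea, and not the article's own estimator, the sketch below uses the classical three-wave result: if true achievement follows a first-order autoregressive process and measurement errors are independent, the reliability of the middle test equals r12 * r23 / r13. The function names, the simulated data, and the normal-normal shrinkage formula are all assumptions introduced here for illustration.

```python
import numpy as np


def middle_test_reliability(r12, r23, r13):
    """Classical three-wave reliability of test 2 (illustrative, not the article's estimator).

    Assumes true scores follow a first-order autoregressive process and
    measurement errors are independent of each other and of true scores.
    """
    return r12 * r23 / r13


def posterior_mean(score, mean, reliability):
    """Normal-normal posterior mean of true achievement given one observed score.

    Shrinks the observed score toward the population mean in proportion
    to the test's unreliability.
    """
    return mean + reliability * (score - mean)


def simulate_scores(n=200_000, noise_sd=0.6, seed=0):
    """Hypothetical data: three tests with AR(1) true achievement plus noise."""
    rng = np.random.default_rng(seed)
    t1 = rng.normal(size=n)
    t2 = 0.9 * t1 + rng.normal(scale=0.4, size=n)
    t3 = 0.9 * t2 + rng.normal(scale=0.4, size=n)
    true_scores = np.column_stack([t1, t2, t3])
    # Add independent measurement error to every test.
    return true_scores + rng.normal(scale=noise_sd, size=true_scores.shape)


if __name__ == "__main__":
    scores = simulate_scores()
    corr = np.corrcoef(scores, rowvar=False)
    rel2 = middle_test_reliability(corr[0, 1], corr[1, 2], corr[0, 2])
    print("estimated reliability of test 2:", round(rel2, 3))
    print("shrunken score:", posterior_mean(scores[0, 1], scores[:, 1].mean(), rel2))
```

In this simulation the true reliability of the second test is 0.97 / 1.33 (true-score variance over total variance), so the correlation-based estimate can be checked against a known value. The article's approach is described as more general, allowing measurement error to differ across tests and across students.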