Limited-information goodness-of-fit testing of diagnostic classification item response models

Bibliographic Details
Published in	British journal of mathematical & statistical psychology Vol. 69; no. 3; pp. 225 - 252
Main Authors	Hansen, Mark; Cai, Li; Monroe, Scott; Li, Zhen
Format	Journal Article
Language	English
Published	England Blackwell Publishing Ltd 01.11.2016 British Psychological Society
Subjects	Algorithms; Classification; Computer Simulation; Data Interpretation, Statistical; Diagnosis, Computer-Assisted - methods; diagnostic classification models; item response models; Likelihood Functions; limited-information goodness of fit; local item independence; Matrix; Measurement; Models, Statistical; Numerical Analysis, Computer-Assisted; Outcome Assessment (Health Care) - methods; Outcome Assessment (Health Care) - statistics & numerical data; Reproducibility of Results; Sample Size; Sensitivity and Specificity; Simulation
ISSN	0007-1102 2044-8317
DOI	10.1111/bmsp.12074

Summary:	Despite the growing popularity of diagnostic classification models (e.g., Rupp et al., 2010, Diagnostic measurement: theory, methods, and applications, Guilford Press, New York, NY) in educational and psychological measurement, methods for testing their absolute goodness of fit to real data remain relatively underdeveloped. For tests of reasonable length and realistic sample sizes, full-information test statistics such as Pearson's $X^2$ and the likelihood ratio statistic $G^2$ suffer from sparseness in the underlying contingency table from which they are computed. Recently, limited-information fit statistics such as Maydeu-Olivares and Joe's (2006, Psychometrika, 71, 713) $M_2$ have been found to be quite useful in testing the overall goodness of fit of item response theory models. In this study, we applied the $M_2$ statistic to diagnostic classification models. Through a series of simulation studies, we found that $M_2$ was well calibrated across a wide range of diagnostic model structures and was sensitive to certain misspecifications of the item model (e.g., fitting disjunctive models to data generated according to a conjunctive model), errors in the Q-matrix (adding or omitting paths, omitting a latent variable), and violations of local item independence due to unmodelled testlet effects. On the other hand, $M_2$ was largely insensitive to misspecifications in the distribution of higher-order latent dimensions and to the specification of an extraneous attribute. To complement the analyses of overall model goodness of fit using $M_2$, we investigated the utility of the Chen and Thissen (1997, J. Educ. Behav. Stat., 22, 265) local dependence statistic $X^2_{LD}$ for characterizing sources of misfit, an important aspect of model appraisal often overlooked in favour of overall statements. The $X^2_{LD}$ statistic was found to be slightly conservative (with Type I error rates consistently below the nominal level) but still useful in pinpointing the sources of misfit. Patterns of local dependence arising from specific model misspecifications are illustrated. Finally, we used the $M_2$ and $X^2_{LD}$ statistics to evaluate a diagnostic model fit to data from the Trends in International Mathematics and Science Study (TIMSS), drawing upon analyses previously conducted by Lee et al. (2011, IJT, 11, 144).
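The sparseness argument in the summary can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the article: it assumes binary items, and the test length (25 items) and sample size (3000 respondents) are hypothetical values. It compares the number of cells in the full response-pattern table, on which full-information statistics are computed, with the number of univariate and bivariate margins, the low-order summaries that limited-information statistics rely on.

```python
from math import comb


def full_table_cells(n_items: int) -> int:
    """Number of response patterns (cells) for n binary items."""
    return 2 ** n_items


def low_order_margins(n_items: int) -> int:
    """Number of univariate plus bivariate margins for n binary items."""
    return n_items + comb(n_items, 2)


# Hypothetical test length and sample size, chosen for illustration only.
n_items, sample_size = 25, 3000

cells = full_table_cells(n_items)
print(cells)                       # 33554432 possible response patterns
print(sample_size / cells)         # average expected count per cell, far below 1
print(low_order_margins(n_items))  # 325 low-order margins
```

With these assumed values, almost every cell of the full table is empty, whereas each of the 325 margins pools information from all 3000 respondents.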
Bibliography:	istex:05F448094774A3E09E1DE535ACD48EE0388AE1D7
ark:/67375/WNG-T4LSF2HR-W
ArticleID:BMSP12074
National Science Foundation - No. SES-1260746
Institute of Education Sciences - No. R305D140046
Appendix S1. 'FM1_HDINA.txt' specifies a restricted higher-order DINA model with Q-matrix structure that matches the data generation.
Appendix S2. 'FM2_CRUM_for_one_item.txt' specifies a restricted higher-order model with DINA-like form for all items except item 8, which has a CRUM-like response model.
Appendix S3. 'FM3_CRUM_for_all_items.txt' specifies a restricted higher-order model with CRUM-like response model for all items.
Appendix S4. 'FM4_omit_paths.txt' specifies a restricted higher-order DINA model in which the dependence of items 5 and 16 on attribute 1 is ignored.
Appendix S5. 'FM5_add_paths.txt' specifies a restricted higher-order DINA model in which items 3 and 23 are wrongly specified to depend on attribute 1.
Appendix S6. 'FM6_omit_attribute.txt' specifies a restricted higher-order DINA model in which attribute 4 is wrongly omitted (its influence on the item responses is ignored).
Appendix S7. 'FM7_add_attribute.txt' specifies a restricted higher-order DINA model in which an extraneous attribute (x5) is specified.
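For readers unfamiliar with the item models named in the supplementary files and the summary, the following minimal sketch contrasts a conjunctive (DINA-type) item response function with a disjunctive (DINO-type) one. It uses the common textbook guess/slip form, not the authors' higher-order parameterization; the function names and the guess and slip values are hypothetical.

```python
def dina_prob(alpha, q_row, guess, slip):
    """Conjunctive rule: a high success probability requires ALL attributes
    that the Q-matrix row marks as required for the item."""
    has_all = all(a == 1 for a, q in zip(alpha, q_row) if q == 1)
    return 1 - slip if has_all else guess


def dino_prob(alpha, q_row, guess, slip):
    """Disjunctive rule: a high success probability requires AT LEAST ONE
    of the attributes that the Q-matrix row marks for the item."""
    has_any = any(a == 1 for a, q in zip(alpha, q_row) if q == 1)
    return 1 - slip if has_any else guess


# Hypothetical respondent who has mastered attribute 1 but not attribute 2,
# answering an item whose Q-matrix row involves both attributes.
alpha, q_row = (1, 0), (1, 1)
p_conjunctive = dina_prob(alpha, q_row, guess=0.2, slip=0.1)
p_disjunctive = dino_prob(alpha, q_row, guess=0.2, slip=0.1)
print(p_conjunctive, p_disjunctive)
```

Under these assumed values the two rules give very different success probabilities for the same respondent, which is the kind of item-model misspecification the summary reports the overall fit statistic as being sensitive to.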