Comparison of Machine Learning Performance Using Analytic and Holistic Coding Approaches Across Constructed Response Assessments Aligned to a Science Learning Progression

Bibliographic Details
Published in	Journal of Science Education and Technology, Vol. 30, no. 2, pp. 150-167
Main Authors	Jescovitch, Lauren N.; Scott, Emily E.; Cerchiara, Jack A.; Merrill, John; Urban-Lurain, Mark; Doherty, Jennifer H.; Haudek, Kevin C.
Format	Journal Article
Language	English
Published	Dordrecht; Springer Science + Business Media; 01.04.2021; Springer Netherlands; Springer; Springer Nature B.V
Subjects	Algorithms; Artificial Intelligence; Classification; Coding; Complexity; Computational linguistics; Datasets; Education; Educational Technology; Feedback (Response); Holistic Approach; Holistic Evaluation; Language processing; Learning algorithms; Learning Processes; Logical Thinking; Machine learning; Man Machine Systems; Mathematical analysis; Mathematics; Natural language interfaces; Physiological aspects; Physiology; Responses; Science Education; Science Instruction; Scoring; Scoring Rubrics; Training; Undergraduate Students; Automated analysis; Constructed response; Analytic rubrics; Learning progressions; Holistic rubrics
ISSN	1059-0145; 1573-1839
DOI	10.1007/s10956-020-09858-0
Summary:	We systematically compared two coding approaches to generate training datasets for machine learning (ML): (i) a holistic approach based on learning progression levels and (ii) a dichotomous, analytic approach that codes multiple concepts in student reasoning, deconstructed from holistic rubrics. We evaluated four constructed response assessment items for undergraduate physiology, each targeting five levels of a developing flux learning progression in an ion context. Human-coded datasets were used to train two ML models: (i) an ensemble of eight classification algorithms implemented in the Constructed Response Classifier (CRC), and (ii) a single classification algorithm implemented in LightSide Researcher’s Workbench. Human coding agreement on approximately 700 student responses per item was high for both approaches, with Cohen’s kappas ranging from 0.75 to 0.87 on holistic scoring and from 0.78 to 0.89 on analytic composite scoring. ML model performance varied across items and rubric types. For two items, training sets from both coding approaches produced similarly accurate ML models, with differences in Cohen’s kappa between machine and human scores of 0.002 and 0.041. For the other two items, ML models trained on analytically coded responses, with the analytic codes combined into a composite score, performed better than models trained on holistic scores, with increases in Cohen’s kappa of 0.043 and 0.117. These two items used a more complex scenario involving movement of two ions. It may be that analytic coding helps unpack this additional complexity.
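Note:	The summary reports agreement as Cohen's kappa, i.e. observed agreement corrected for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that standard, unweighted statistic, not the authors' code. The function name `cohen_kappa` and the example scores are hypothetical and are not data from the article.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set of responses")
    n = len(rater_a)
    # p_o: fraction of responses on which the two raters gave the same label
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # p_e: chance agreement, from each rater's marginal label frequencies
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical learning progression levels (1-5) for eight responses.
human_scores = [1, 2, 3, 3, 2, 1, 4, 5]
machine_scores = [1, 2, 3, 2, 2, 1, 4, 5]
kappa = cohen_kappa(human_scores, machine_scores)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the raters agree on 7 of 8 responses (p_o = 0.875) and chance agreement is p_e = 14/64, so kappa comes out lower than the raw agreement rate. Learning progression levels are ordinal, so a weighted kappa is another possible choice; the summary does not say which variant was used.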