What are the differences between student and ChatGPT-generated pseudocode? Detecting AI-generated pseudocode in high school programming using explainable machine learning

Bibliographic Details
Published in	Education and information technologies Vol. 30; no. 11; pp. 14853 - 14892
Main Authors	Liu, Zifeng; Xing, Wanli; Jiao, Xinyue; Li, Chenglu; Zhu, Wangda
Format	Journal Article
Language	English
Published	New York Springer US 01.07.2025 Springer Nature B.V
Subjects	Accountability; Accuracy; Algorithms; Artificial Intelligence; Automation; C plus plus; Chatbots; Cheating; Computer Appl. in Social and Behavioral Sciences; Computer Science; Computers and Education; Critical thinking; Datasets; Decision making; Education; Educational Technology; Elementary Secondary Education; Exhibits; Generative artificial intelligence; Higher education; Information Systems Applications (incl. Internet); Integrity; Java; Language Processing; Large language models; Literature Reviews; Machine learning; Natural language processing; Plagiarism; Problem solving; Program Evaluation; Programming Languages; Python; Self Efficacy; Students; Teachers; Thinking Skills; Transparency; User Interfaces and Human Computer Interaction; Online programming education; Generative AI; Pseudocode; Explainable; ChatGPT; Plagiarism detection
ISSN	1360-2357 1573-7608
DOI	10.1007/s10639-025-13385-z

Summary:	The ability of large language models (LLMs) to generate code has raised concerns in computer science education, as students may use tools like ChatGPT for programming assignments. While much research has focused on higher education, especially for languages like Java and Python, little attention has been given to K-12 settings, particularly for pseudocode. This study seeks to bridge this gap by developing explainable machine learning models for detecting pseudocode plagiarism in online programming education. A pseudocode dataset was constructed, comprising 7,838 pseudocode submissions from 2,578 high school students enrolled in an online programming foundations course from 2020 to 2023, along with 6,300 pseudocode samples generated by three versions of ChatGPT. An ensemble model (EM) was then proposed to detect AI-generated pseudocode and was compared with six baseline models. SHapley Additive exPlanations (SHAP) were used to explain how these models differentiate AI-generated pseudocode from student submissions. The results show that students' submissions are more similar to GPT-3 output than to the output of the other two GPT models. The proposed model achieves an accuracy of 98.97%. The differences between AI-generated pseudocode and student submissions lie in several aspects: AI-generated pseudocode often begins with more complex verbs and features shorter sentences. It frequently includes clear numerical or word-based indicators of sequence and tends to incorporate more comments throughout the code. This research provides practical insights for online programming education and contributes to developing educational technologies and methods that strengthen academic integrity in such courses.
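
Illustration: the summary names several surface cues (sentence length, sequence indicators, comments) without saying how they were measured. The sketch below shows one way such cues could be counted for a single submission. It is not the authors' feature pipeline: the function name `surface_features`, the word list `SEQUENCE_WORDS`, and the comment prefixes are all assumptions made for this example, and the verb-complexity cue is left out.

```python
# Hypothetical marker lists; the summary does not specify how features were computed.
SEQUENCE_WORDS = {"first", "second", "third", "then", "next", "finally"}
COMMENT_PREFIXES = ("//", "#")


def surface_features(pseudocode: str) -> dict:
    """Compute illustrative surface features for one pseudocode submission."""
    # Keep only non-empty lines, with surrounding whitespace removed.
    lines = [ln.strip() for ln in pseudocode.splitlines() if ln.strip()]
    if not lines:
        return {"mean_words_per_line": 0.0, "sequence_markers": 0, "comment_lines": 0}

    word_counts = [len(ln.split()) for ln in lines]

    sequence_markers = 0
    for ln in lines:
        # Normalize the first token: "1." -> "1", "Then," -> "then".
        first_token = ln.split()[0].lower().rstrip(".:,)")
        # Count lines that open with a number or an assumed sequence word.
        if first_token.isdigit() or first_token in SEQUENCE_WORDS:
            sequence_markers += 1

    # Count lines that start with an assumed comment prefix.
    comment_lines = sum(ln.startswith(COMMENT_PREFIXES) for ln in lines)

    return {
        "mean_words_per_line": sum(word_counts) / len(lines),
        "sequence_markers": sequence_markers,
        "comment_lines": comment_lines,
    }


sample = "1. Get the number\n2. Then print it\n// done\nFinally stop"
features = surface_features(sample)
```

Counts like these could serve as inputs to a classifier whose predictions are then explained with SHAP, but the feature set actually used in the study may differ.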