A Support Vector Machine based approach for plagiarism detection in Python code submissions in undergraduate settings

Bibliographic Details
Published in: Frontiers in computer science (Lausanne), Vol. 6
Main Authors: Gandhi, Nandini; Gopalan, Kaushik; Prasad, Prajish
Format: Journal Article
Language: English
Published: Frontiers Media S.A, 13.06.2024
ISSN: 2624-9898
DOI: 10.3389/fcomp.2024.1393723

Summary: Mechanisms for plagiarism detection play a crucial role in maintaining academic integrity, both penalizing wrongdoing and serving as a preemptive deterrent for bad behavior. This manuscript proposes a customized plagiarism detection algorithm tailored to detect source code plagiarism in the Python programming language. Our approach combines textual and syntactic techniques, employing a support vector machine (SVM) to combine various indicators of similarity and calculate the resulting similarity scores. The algorithm was trained and tested using a sample of code submissions for 4 coding problems each from 45 volunteers; 15 of these were original submissions while the other 30 were plagiarized samples. The submissions for two of the questions were used for training and those for the other two for testing, using the leave-p-out cross-validation strategy to avoid overfitting. We compare the performance of the proposed method with two widely used tools, MOSS and JPlag, and find that the proposed method results in a small but significant improvement in accuracy compared to JPlag, while significantly outperforming MOSS in flagging plagiarized samples.
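
The summary does not say which similarity indicators the authors use, so the following is only a minimal sketch of the general idea, under stated assumptions. It computes one textual and one syntactic indicator for a pair of Python submissions, and enumerates the leave-2-out splits over four questions. The function names, the choice of `difflib` and `ast`, and the question labels `Q1` to `Q4` are all hypothetical; the SVM that would consume these feature vectors is omitted.

```python
import ast
import difflib
from itertools import combinations


def textual_similarity(code_a, code_b):
    """Character-level similarity ratio in [0, 1] (an assumed indicator)."""
    return difflib.SequenceMatcher(None, code_a, code_b).ratio()


def node_types(code):
    """AST node type names; identifier names and literal values are ignored."""
    return [type(node).__name__ for node in ast.walk(ast.parse(code))]


def syntactic_similarity(code_a, code_b):
    """Similarity ratio of the two AST node-type sequences (an assumed indicator)."""
    matcher = difflib.SequenceMatcher(None, node_types(code_a), node_types(code_b))
    return matcher.ratio()


def feature_vector(code_a, code_b):
    """Indicators that a classifier such as an SVM could take as input."""
    return [textual_similarity(code_a, code_b), syntactic_similarity(code_a, code_b)]


def leave_p_out_splits(items, p):
    """Yield (train, test) tuples for every way of holding out p items."""
    for test in combinations(items, p):
        train = tuple(item for item in items if item not in test)
        yield train, test


# Two hypothetical submissions: the second only renames identifiers.
original = (
    "def total(values):\n"
    "    s = 0\n"
    "    for v in values:\n"
    "        s += v\n"
    "    return s\n"
)
renamed = (
    "def add_all(nums):\n"
    "    acc = 0\n"
    "    for n in nums:\n"
    "        acc += n\n"
    "    return acc\n"
)

features = feature_vector(original, renamed)

# Hypothetical labels for the four coding problems; two are held out per split.
splits = list(leave_p_out_splits(["Q1", "Q2", "Q3", "Q4"], 2))
```

Renaming identifiers lowers the textual indicator but leaves the syntactic one unchanged, which illustrates why combining both kinds of indicator can help. With four questions and two held out, there are six train/test splits.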