A Support Vector Machine based approach for plagiarism detection in Python code submissions in undergraduate settings
| Published in | Frontiers in computer science (Lausanne) Vol. 6 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | Frontiers Media S.A., 13.06.2024 |
| Subjects | |
| ISSN | 2624-9898 |
| DOI | 10.3389/fcomp.2024.1393723 |
| Summary: | Mechanisms for plagiarism detection play a crucial role in maintaining academic integrity, acting both to penalize wrongdoing and to serve as a preemptive deterrent against bad behavior. This manuscript proposes a customized plagiarism detection algorithm tailored to detecting source code plagiarism in the Python programming language. Our approach combines textual and syntactic techniques, employing a support vector machine (SVM) to combine various indicators of similarity and calculate the resulting similarity scores. The algorithm was trained and tested using code submissions for 4 coding problems from each of 45 volunteers; 15 of these were original submissions, while the other 30 were plagiarized samples. The submissions for two of the problems were used for training and those for the other two for testing, using the leave-p-out cross-validation strategy to avoid overfitting. We compare the performance of the proposed method with two widely used tools, MOSS and JPlag, and find that the proposed method yields a small but significant improvement in accuracy over JPlag, while significantly outperforming MOSS in flagging plagiarized samples. |
|---|---|
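The summary describes combining textual and syntactic similarity indicators with an SVM that learns how to weight them. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes three hypothetical indicators (a raw-text ratio, a ratio over identifier-normalized tokens, and a ratio over AST node types, all via `difflib.SequenceMatcher`) and uses a linear SVM trained with the Pegasos sub-gradient method, swapped in here for whatever SVM implementation the paper used. The names `features`, `train_linear_svm`, `score`, `lam`, and `epochs`, as well as the toy training vectors, are illustrative assumptions.

```python
import ast
import difflib
import io
import keyword
import random
import tokenize

# Token types that carry layout rather than content; skipped in the lexical view.
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
         tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def token_stream(src: str) -> list[str]:
    """Lexical view: non-keyword names become 'ID', so renaming variables
    leaves the stream unchanged (an assumed normalization)."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(src).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")
        else:
            out.append(tok.string)
    return out


def ast_stream(src: str) -> list[str]:
    """Syntactic view: AST node type names in ast.walk (breadth-first) order."""
    return [type(node).__name__ for node in ast.walk(ast.parse(src))]


def _ratio(a, b) -> float:
    # autojunk=False so long streams are compared without the junk heuristic.
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()


def features(a: str, b: str) -> list[float]:
    """Hypothetical similarity indicators: raw text, normalized tokens, AST shape."""
    return [_ratio(a, b),
            _ratio(token_stream(a), token_stream(b)),
            _ratio(ast_stream(a), ast_stream(b))]


def train_linear_svm(X, y, lam=0.01, epochs=1000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.
    Labels are +1 (plagiarized) / -1 (original). A constant 1 feature acts as
    the bias; for simplicity it is regularized along with the other weights."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink (regularizer)
            if margin < 1.0:  # hinge loss active: step toward the example
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def score(w, x) -> float:
    """Signed SVM decision value; positive means 'flag as plagiarized'."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


if __name__ == "__main__":
    orig = "def f(x):\n    return x + 1\n"
    renamed = "def g(y):\n    return y + 1\n"
    print(features(orig, renamed))
    # Toy, made-up feature vectors standing in for labelled submission pairs.
    X = [[0.95, 0.97, 0.99], [0.9, 0.92, 0.96], [0.1, 0.2, 0.15], [0.2, 0.15, 0.25]]
    y = [1, 1, -1, -1]
    w = train_linear_svm(X, y)
    print([round(score(w, x), 2) for x in X])
```

In the renaming example, the token and AST ratios stay at 1.0 while the raw-text ratio drops. A learned weighting can exploit exactly this kind of disagreement between indicators. The paper's actual indicators, kernel, and training setup may differ from this sketch.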