Adaptive Prefix Filtering for Accurate Code Clone Detection in Conjunction with Meta-learning

Bibliographic Details
Published in: SN Computer Science, Vol. 5, no. 6, p. 789
Main Authors: Ralhan, Chavi; Malik, Navneet; Agrawal, Prateek; Gupta, Charu; Jatana, Nishtha; Jatain, Divya; Sharma, Geetanjali
Format: Journal Article
Language: English
Published: Singapore, Springer Nature Singapore, 12.08.2024
Springer Nature B.V
ISSN: 2661-8907, 2662-995X
DOI: 10.1007/s42979-024-03140-5

Summary: This research advances code duplication detection by introducing a new testing method and constructing a highly accurate meta-classifier. Evaluating several architectures on a diverse dataset enabled the development of a novel and versatile solution for detecting duplicate code in both Java and Python. The proposed algorithm relies on a set of distinctive features represented by 17 code metrics. It stores the code structure efficiently using Adaptive Prefix Filtering (APF) to identify duplicate code segments. A comprehensive series of experiments was conducted to evaluate the effectiveness of different meta-classifier architectures, with the goal of identifying an optimal 2-layer stacking meta-classifier that yields precise and reliable results. This process produced a novel classifier that demonstrates exceptional accuracy in detecting duplicate code. The algorithm was trained on a dataset of 19,988 data points comprising code metrics from both Java and Python programs. This diverse dataset enabled the model to learn and generalize across both languages, enhancing its versatility and effectiveness in code clone detection. The results show that the proposed model outperformed state-of-the-art models, which supports its suitability as a meta-classifier for cloned code detection.
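The summary does not say how the prefix-filtering step is implemented. As a rough illustration of the general idea only, the sketch below shows the classic fixed-length prefix filter for Jaccard similarity over token sets. It is an assumption-laden stand-in, not the authors' APF method: it works on raw tokens rather than the 17 code metrics, it does not adapt the prefix length, and the names `jaccard`, `prefix_filter_pairs`, and `fragments` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def jaccard(a, b):
    """Jaccard similarity of two token collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def prefix_filter_pairs(token_sets, threshold):
    """Return pairs (i, j), i < j, whose Jaccard similarity is >= threshold.

    Illustrative fixed-length prefix filter: two sets can only reach the
    threshold if their prefixes, taken under one global token order,
    share at least one token. Only such candidates are verified.
    """
    # Global order: rare tokens first, ties broken alphabetically.
    freq = Counter(tok for s in token_sets for tok in set(s))
    ordered = [sorted(set(s), key=lambda t: (freq[t], t)) for s in token_sets]

    index = defaultdict(list)  # token -> ids of fragments with it in their prefix
    results = []
    for i, toks in enumerate(ordered):
        n = len(toks)
        # Prefix length |x| - ceil(t * |x|) + 1. The small epsilon guards
        # against float round-up; it can only lengthen the prefix, which is safe.
        prefix = toks[: n - math.ceil(threshold * n - 1e-9) + 1]

        # Candidates: earlier fragments sharing at least one prefix token.
        candidates = set()
        for tok in prefix:
            candidates.update(index[tok])

        # Verification: compute the exact similarity for candidates only.
        for j in sorted(candidates):
            if jaccard(ordered[j], toks) >= threshold:
                results.append((j, i))

        for tok in prefix:
            index[tok].append(i)
    return results


# Hypothetical token streams for three small code fragments.
fragments = [
    "for i in range n : total += i".split(),
    "for j in range n : total += j".split(),
    "print ( total )".split(),
]
print(prefix_filter_pairs(fragments, 0.7))  # prints [(0, 1)]
```

The first two fragments differ only in the loop variable, so they share 7 of 9 distinct tokens (Jaccard of about 0.78) and are reported at a threshold of 0.7. The third fragment never becomes a candidate, because its prefix shares no token with the others.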