An optimized hybrid deep learning model for code clone detection

Bibliographic Details
Published in International journal of information technology (Singapore. Online) Vol. 17; no. 3; pp. 1589-1595
Main Authors Geetika, Kaur, Navdeep, Kaur, Amandeep
Format Journal Article
Language English
Published Singapore Springer Nature Singapore 01.04.2025
Springer Nature B.V
ISSN 2511-2104
2511-2112
DOI 10.1007/s41870-024-02383-y

Summary: Code clones in a software system are identical or similar pieces of code, typically produced by repeated copy-and-paste programming. As a result, a defect detected in one unit is present in every duplicate, and existing techniques are unable to achieve high accuracy for code clone detection. In this research work, a hybrid deep learning model is proposed that comprises four phases: pre-processing, feature set generation, feature set optimization and clone detection. Particle swarm optimization (PSO) and a genetic algorithm (GA) are used for optimization, along with a convolutional neural network (CNN) and long short-term memory (LSTM) for clone detection. The proposed model is implemented in Python and tested on several datasets in terms of accuracy (%), precision (%) and recall (%). The proposed model is also compared with recent existing studies, and the results show that the hybrid model attains the highest accuracy (94.67%), precision (93.12%) and recall (93.13%) on the BigCloneBench (BCB) dataset. Similarly, the model attains the highest accuracy (93.90%), precision (93.50%) and recall (93.52%) on the Google Code Jam dataset, while on the Java dataset accuracy, precision and recall are 93.78%, 92.67% and 92.66% respectively.
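The summary reports results as accuracy (%), precision (%) and recall (%). As a minimal sketch of how such percentages are conventionally computed for a binary clone / non-clone decision, the Python snippet below counts true and false positives and negatives over labelled code pairs. The function name `clone_detection_metrics` and the toy labels are illustrative assumptions, not taken from the article, and the authors' own evaluation code may differ.

```python
def clone_detection_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) in percent.

    Labels are assumed to be 1 for a clone pair and 0 for a non-clone pair.
    This is the textbook definition, not the article's implementation.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts over all labelled code pairs.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = len(y_true)
    # Guard against empty denominators so the function never divides by zero.
    accuracy = 100.0 * (tp + tn) / total if total else 0.0
    precision = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    recall = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy labels, made up for illustration only (not data from the article).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
accuracy, precision, recall = clone_detection_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f}% precision={precision:.2f}% recall={recall:.2f}%")
```

Precision penalizes non-clone pairs wrongly flagged as clones, while recall penalizes clone pairs that were missed, which is why the summary reports both alongside accuracy.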