An optimized hybrid deep learning model for code clone detection

Bibliographic Details
Published in	International journal of information technology (Singapore. Online) Vol. 17; no. 3; pp. 1589 - 1595
Main Authors	Geetika; Kaur, Navdeep; Kaur, Amandeep
Format	Journal Article
Language	English
Published	Singapore: Springer Nature Singapore, 01.04.2025; Springer Nature B.V.
Subjects	Accuracy; Artificial Intelligence; Artificial neural networks; Cloning; Computer Imaging; Computer Science; Datasets; Deep learning; Genetic algorithms; Image Processing and Computer Vision; Infringement; Machine Learning; Mutation; Optimization algorithms; Original Research; Particle swarm optimization; Pattern Recognition and Graphics; Plagiarism; Recall; Semantics; Software; Software Engineering; Velocity; Vision; CNN; GA; LSTM; PSO; Code clone detection; Feature set optimization
ISSN	2511-2104 2511-2112
DOI	10.1007/s41870-024-02383-y

Summary:	Code clones are identical or similar pieces of code within a software system. They typically arise when code is repeatedly copied and pasted. As a result, a defect detected in one unit is present in every duplicate, and existing techniques are unable to achieve high accuracy in code clone detection. In this research work, a hybrid deep learning model is proposed that comprises four phases: pre-processing, feature set generation, feature set optimization and clone detection. Particle swarm optimization (PSO) and a genetic algorithm (GA) are used for optimization, and a convolutional neural network (CNN) with long short-term memory (LSTM) is used for clone detection. The proposed model is implemented in Python and evaluated on several datasets in terms of accuracy (%), precision (%) and recall (%). Compared with recent existing studies, the proposed hybrid model attains the highest accuracy (94.67%), precision (93.12%) and recall (93.13%) on the BigCloneBench (BCB) dataset. Similarly, it attains the highest accuracy (93.90%), precision (93.50%) and recall (93.52%) on the Google Code Jam dataset, while on the Java dataset accuracy, precision and recall are 93.78%, 92.67% and 92.66%, respectively.
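
The summary reports accuracy, precision and recall as percentages without defining them. The following is a minimal sketch, assuming the standard binary-classification definitions applied to labelled code pairs (1 = clone pair, 0 = non-clone pair). The function name `clone_detection_metrics` and the label encoding are illustrative assumptions and are not taken from the article.

```python
from typing import Dict, Sequence


def clone_detection_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Dict[str, float]:
    """Return accuracy, precision and recall as percentages.

    Assumption: labels are 1 for "clone pair" and 0 for "non-clone pair".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one labelled pair is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # clones found
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-clones rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed clones

    # Precision and recall fall back to 0.0 when their denominator is zero.
    return {
        "accuracy": 100.0 * (tp + tn) / len(pairs),
        "precision": 100.0 * tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": 100.0 * tp / (tp + fn) if (tp + fn) else 0.0,
    }
```

Calling the function with the true and predicted labels of a test set returns a dictionary with the three percentages. How the article averages these figures across datasets or clone types is not stated in this record, so the sketch makes no assumption about it.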