An optimized hybrid deep learning model for code clone detection
| Published in | International journal of information technology (Singapore. Online) Vol. 17; no. 3; pp. 1589 - 1595 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | Singapore: Springer Nature Singapore, 01.04.2025; Springer Nature B.V |
| Subjects | |
| ISSN | 2511-2104 2511-2112 |
| DOI | 10.1007/s41870-024-02383-y |
| Summary: | Code clones in a software system are identical or similar pieces of code, typically produced by copy-and-paste programming. As a result, a defect detected in one unit is present in every duplicate, and existing techniques are unable to achieve high accuracy in code clone detection. In this research work, a hybrid deep learning model is proposed that comprises four phases: pre-processing, feature set generation, feature set optimization and clone detection. Particle swarm optimization (PSO) and a genetic algorithm (GA) are used for optimization, and a convolutional neural network (CNN) and long short-term memory (LSTM) are used for clone detection. The proposed model is implemented in Python and tested on several datasets in terms of accuracy (%), precision (%) and recall (%). The proposed model is also compared with recent existing studies, and the results show that the hybrid model attains the highest accuracy (94.67%), precision (93.12%) and recall (93.13%) on the BigCloneBench (BCB) dataset. Similarly, the model attains the highest accuracy (93.90%), precision (93.50%) and recall (93.52%) on the Google Code Jam dataset, while on the Java dataset accuracy, precision and recall are 93.78%, 92.67% and 92.66% respectively. |
|---|---|
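
The accuracy, precision and recall figures quoted in the summary are standard binary-classification metrics. The following is a minimal sketch of how such percentages are commonly computed, assuming each candidate code pair has a true label and a predicted label (1 = clone, 0 = not a clone). The function name `clone_detection_metrics` and the sample labels are illustrative assumptions and are not taken from the paper.

```python
def clone_detection_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) as percentages for binary labels.

    Illustrative helper only; not the evaluation code used in the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells over all code pairs.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # Guard each ratio against an empty denominator.
    accuracy = 100.0 * (tp + tn) / total if total else 0.0
    precision = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    recall = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical labels for eight code pairs (made up for illustration).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
accuracy, precision, recall = clone_detection_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f}% precision={precision:.2f}% recall={recall:.2f}%")
```

Precision here is the share of predicted clone pairs that are true clones, and recall is the share of true clone pairs that the detector finds.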