GRRLN: Gated Recurrent Residual Learning Networks for code clone detection
| Published in | Journal of software : evolution and process Vol. 36; no. 7 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | Chichester: Wiley Subscription Services, Inc, 01.07.2024 |
| ISSN | 2047-7473 2047-7481 |
| DOI | 10.1002/smr.2649 |
| Summary: | Code clone detection is a critical problem in software development and maintenance. It aims to identify functionally identical or similar code fragments within an application. Existing work formulates code clone detection as a binary classification problem that labels a code pair as a clone or not based on a predefined threshold. In reality, there are various types of code clones, distinguished by how similar the two code fragments are to each other. To investigate how different detection manners affect the clone detection result, we propose Gated Recurrent Residual Learning Networks (GRRLN), a novel neural network model for code clone detection. To train GRRLN, we first represent each code fragment as a statement-level tree sequence derived from the whole abstract syntax tree (AST). Then, a gated recurrent neural network with residual connections is adopted to extract the semantics of the individual statement trees together with their dependency relationships across the input statement sequence. Finally, the output representations of code fragments produced by GRRLN are used for similarity calculation and clone detection. We evaluate GRRLN on two real-world datasets for code clone detection and clone type classification. Experiments show that GRRLN achieves promising results while requiring significantly less time and memory than state-of-the-art methods. Code clone detection is commonly approached as a binary classification task that decides whether a code pair is a clone based on a fixed threshold. However, code clones exhibit varying degrees of similarity, which gives rise to different clone types. To explore the impact of detection manners on clone detection results, we propose Gated Recurrent Residual Learning Networks (GRRLN) for the code clone detection task. The experimental results demonstrate that different detection manners yield different results, even with the same model and dataset. |
|---|---|