GRRLN: Gated Recurrent Residual Learning Networks for code clone detection

Bibliographic Details
Published in	Journal of Software: Evolution and Process, Vol. 36, no. 7
Main Authors	Zhang, Xiangping; Liu, Jianxun; Shi, Min
Format	Journal Article
Language	English
Published	Chichester: Wiley Subscription Services, Inc, 01.07.2024
Subjects	abstract syntax tree; Binary codes; Classification; code clone detection; code representation; Fragments; Learning; Neural networks; Recurrent neural networks; residual network; Semantics; Software development
ISSN	2047-7473 2047-7481
DOI	10.1002/smr.2649

Summary:	Code clone detection is a critical problem in software development and maintenance. It aims to identify functionally identical or similar code fragments within an application. Existing work formulates code clone detection as a binary classification problem that predicts whether a code pair is a clone based on a predefined threshold. In practice, there are several types of code clone, distinguished by how similar the two code fragments are. To investigate how different detection manners affect the clone detection result, we propose Gated Recurrent Residual Learning Networks (GRRLN), a novel neural network model for code clone detection. To train GRRLN, we first represent each code fragment as a statement-level tree sequence derived from its abstract syntax tree (AST). Then, a gated recurrent neural network with residual connections extracts the semantics of the individual statement trees together with their dependency relationships across the input statement sequence. Finally, the code fragment representations output by GRRLN are used for similarity calculation and clone detection. We evaluate GRRLN on two real-world datasets for code clone detection and clone type classification. Experiments show that GRRLN achieves promising results while requiring significantly less time and memory than state-of-the-art methods.
Code clone detection is commonly approached as a binary classification task that determines whether a code pair is a clone based on a fixed threshold. However, code clones exhibit varying degrees of similarity, which gives rise to different clone types. To explore the impact of detection manners on clone detection results, we propose Gated Recurrent Residual Learning Networks for the code clone detection task. The experimental results demonstrate that different detection manners yield different results, even with the same model and dataset.
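The first step in the summary, turning a code fragment into a statement-level tree sequence, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it uses Python's standard `ast` module on Python source, and it assumes that a compound statement (such as a loop) contributes only its header, with nested statements emitted as separate entries. The record does not specify how statement boundaries are drawn, and the names `statement_sequence` and `header_node_types` are hypothetical.

```python
import ast


def statement_sequence(source):
    """Return all statement nodes of `source` in pre-order (source order)."""
    sequence = []

    def visit(node):
        if isinstance(node, ast.stmt):
            sequence.append(node)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return sequence


def header_node_types(node):
    """Node type names of one statement tree, stopping at nested statements.

    Assumption: nested statements belong to their own sequence entry, and
    Load/Store context markers are dropped as noise.
    """
    names = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        if isinstance(child, (ast.stmt, ast.expr_context)):
            continue
        names.extend(header_node_types(child))
    return names


SAMPLE = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

labels = [type(stmt).__name__ for stmt in statement_sequence(SAMPLE)]
print(labels)  # ['FunctionDef', 'Assign', 'For', 'AugAssign', 'Return']

for stmt in statement_sequence(SAMPLE):
    print(header_node_types(stmt))
```

Each entry of the resulting sequence would then be encoded and passed, in order, to the gated recurrent network with residual connections that the summary describes.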