Cross-Domain Deep Code Search with Meta Learning

Bibliographic Details
Published in: 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE), pp. 487-498
Main Authors: Chai, Yitian; Zhang, Hongyu; Shen, Beijun; Gu, Xiaodong
Format: Conference Proceeding
Language: English
Published: ACM, 01.05.2022
ISSN: 1558-1225
DOI: 10.1145/3510003.3510125

Summary: Recently, pre-trained programming language models such as CodeBERT have demonstrated substantial gains in code search. Despite their success, they rely on the availability of large amounts of parallel data to fine-tune the semantic mappings between queries and code. This restricts their practicality in domain-specific languages, where data is relatively scarce and expensive to obtain. In this paper, we propose CDCS, a novel approach for domain-specific code search. CDCS employs a transfer learning framework in which an initial program representation model is pre-trained on a large corpus of common programming languages (such as Java and Python) and is further adapted to domain-specific languages such as Solidity and SQL. Unlike cross-language CodeBERT, which is directly fine-tuned on the target language, CDCS adapts a few-shot meta-learning algorithm called MAML to learn a good initialization of model parameters that can be effectively reused in a domain-specific language. We evaluate the proposed approach on two domain-specific languages, namely Solidity and SQL, with models transferred from two widely used languages (Python and Java). Experimental results show that CDCS significantly outperforms conventional pre-trained code models that are directly fine-tuned on domain-specific languages, and it is particularly effective when data is scarce.
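
The abstract's key mechanism, learning a transferable parameter initialization with MAML before fine-tuning on the scarce target language, can be illustrated with a short sketch. Below is a minimal first-order MAML loop over a contrastive query-code matching objective, written with PyTorch's torch.func API. The BiEncoder toy model, the in-batch contrastive loss, and all names and hyperparameters here are illustrative assumptions, not the authors' implementation (CDCS builds on a pre-trained CodeBERT-style encoder rather than a toy embedding model).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call


class BiEncoder(nn.Module):
    """Toy bi-encoder: embeds query and code token IDs into a shared space."""

    def __init__(self, vocab_size: int = 1000, dim: int = 64):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, dim)  # mean-pooled bag of tokens

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.embed(tokens), dim=-1)


def matching_loss(model, params, queries, codes):
    """In-batch contrastive loss: the i-th query should match the i-th code."""
    q = functional_call(model, params, (queries,))
    c = functional_call(model, params, (codes,))
    logits = q @ c.t()                     # pairwise similarity matrix
    labels = torch.arange(logits.size(0))  # diagonal entries are the positives
    return F.cross_entropy(logits, labels)


def maml_meta_step(model, meta_opt, tasks, inner_lr: float = 1e-2):
    """One first-order MAML meta-update over a batch of tasks.

    Each task is a (support, query) pair of (queries, codes) batches,
    e.g. sampled from the source languages (Java/Python in the paper).
    """
    params = dict(model.named_parameters())
    meta_opt.zero_grad()
    for support, query in tasks:
        # Inner loop: one SGD step on the support set (first-order variant,
        # so second-order terms through the gradients are dropped).
        inner_loss = matching_loss(model, params, *support)
        grads = torch.autograd.grad(inner_loss, list(params.values()))
        fast = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}
        # Outer loop: evaluate the adapted weights on the held-out query set;
        # gradients flow back to the shared initialization in `params`.
        matching_loss(model, fast, *query).backward()
    meta_opt.step()


# Illustrative run with random token IDs standing in for real query/code pairs.
model = BiEncoder()
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = lambda: torch.randint(0, 1000, (8, 16))
tasks = [((batch(), batch()), (batch(), batch())) for _ in range(4)]
maml_meta_step(model, meta_opt, tasks)
```

After meta-training of this kind on the source languages, the resulting initialization would be fine-tuned on the small domain-specific corpus (Solidity or SQL), which is where the abstract reports the largest gains over direct fine-tuning.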