UniGenCoder: Merging SEQ2SEQ and SEQ2TREE Paradigms for Unified Code Generation

Bibliographic Details
Published in: IEEE/ACM International Conference on Software Engineering: New Ideas and Emerging Technologies Results (Online), pp. 71-75
Main Authors: Shao, Liangying; Yan, Yanfu; Poshyvanyk, Denys; Su, Jinsong
Format: Conference Proceeding
Language: English
Published: IEEE, 27.04.2025
ISSN: 2832-7632
DOI: 10.1109/ICSE-NIER66352.2025.00020

Summary: Deep learning-based code generation has transformed the way developers write programs today. Existing approaches to code generation have focused either on the Sequence-to-Sequence paradigm, which generates target code as a sequence of tokens, or the Sequence-to-Tree paradigm, which outputs code as a sequence of actions. While these two paradigms are intuitively complementary, their combination has not been previously explored. By comparing the code generated under these two paradigms, we find that integrating them holds significant potential. In this paper, we propose UniGenCoder for code-related generation tasks, which consists of a shared encoder, a shared decoder with a minimal set of additional parameters to unify the two paradigms, and a selector that dynamically chooses the optimal paradigm for each instance. During model training, we first apply multi-task learning and distillation strategies to facilitate knowledge transfer between the two paradigms, and then leverage contrastive learning to train the selector. Experimental results on text-to-code and code-to-code generation tasks demonstrate the effectiveness of our proposed model. We release our code at https://github.com/DeepLearnXMU/UniGenCoder.
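The summary describes a selector that dynamically chooses between the Seq2Seq (token-sequence) and Seq2Tree (action-sequence) paradigms for each input instance. The following is a minimal, hypothetical sketch of that per-instance selection step; the class names, fields, and scores are illustrative assumptions, not the paper's actual implementation:

```python
# Hypothetical sketch of per-instance paradigm selection, loosely modeled on
# the UniGenCoder abstract. Candidate structure and scoring are assumptions.
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    paradigm: str       # "seq2seq" (token sequence) or "seq2tree" (action sequence)
    output: List[str]   # the decoded tokens or grammar actions
    score: float        # selector's confidence that this paradigm is best here


def select_paradigm(candidates: List[Candidate]) -> Candidate:
    """Return the candidate whose paradigm the selector scores highest."""
    return max(candidates, key=lambda c: c.score)


# Toy instance: the selector prefers the tree-based decoding for this input.
cands = [
    Candidate("seq2seq", ["return", "a", "+", "b"], score=0.42),
    Candidate("seq2tree", ["Expr", "BinOp", "Add", "Name:a", "Name:b"], score=0.58),
]
best = select_paradigm(cands)
print(best.paradigm)  # seq2tree
```

In the paper's design the selector is a learned component trained with contrastive learning over a shared encoder; the hard `max` above merely illustrates the per-instance routing decision it performs at inference time.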