Enhancing queries for code generation with reinforcement learning

Bibliographic Details
Published in: Scientific Reports, Vol. 15, No. 1, Article 37300 (12 pages)
Main Authors: Yuan, Dawei; Liang, Guojun; Li, Tingting; Liu, Suping
Format: Journal Article
Language: English
Published: London: Nature Publishing Group UK, 24.10.2025
ISSN: 2045-2322
DOI: 10.1038/s41598-025-21271-4

More Information
Summary: We present a reinforcement learning framework, RL4QE, that enhances natural language queries to improve DeepSeek code generation. A parametric refiner (Qwen with LoRA) is trained via REINFORCE while the generator remains frozen, using a scalar reward that can combine text similarity (BLEU-4, ROUGE-L, F1, Overlap) with execution signals (unit tests, syntax/timeout penalties). On the DS1000 benchmark (800 train / 200 test), RL4QE improves code similarity by 34.3%. Ablations show that BLEU-4 is the most reliable text reward overall (with F1 competitive at larger scale), and that low-rank LoRA outperforms full fine-tuning on most metrics while being more parameter-efficient. The approach transfers across foundation models (e.g., Qwen1.5/2/2.5 variants), where architecture often matters more than model size. RL4QE is easy to integrate in practice (LoRA applied to the attention projections) and supports reproducibility.
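The abstract gives enough of the training recipe to sketch one REINFORCE update. The Python sketch below is illustrative, not the paper's implementation: the model checkpoints, LoRA rank, reward weighting, and the `passed_tests_fn` hook are assumptions chosen for the example; only the overall structure (trainable LoRA refiner, frozen DeepSeek generator, scalar reward combining BLEU-4 with a unit-test signal) follows the abstract.

```python
# Minimal sketch of one RL4QE-style REINFORCE step (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
import sacrebleu

# Refiner: a Qwen model with LoRA adapters in the attention projections.
# Checkpoint and rank are assumptions, not the paper's settings.
ref_tok = AutoTokenizer.from_pretrained("Qwen/Qwen2-1.5B-Instruct")
refiner = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-1.5B-Instruct")
lora = LoraConfig(r=8, lora_alpha=16,
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
refiner = get_peft_model(refiner, lora)  # only the LoRA weights train

# Generator: DeepSeek, frozen for the whole run.
gen_tok = AutoTokenizer.from_pretrained("deepseek-ai/deepseek-coder-1.3b-instruct")
generator = AutoModelForCausalLM.from_pretrained("deepseek-ai/deepseek-coder-1.3b-instruct")
generator.requires_grad_(False)

opt = torch.optim.AdamW((p for p in refiner.parameters() if p.requires_grad), lr=1e-5)

def reward(code: str, reference: str, tests_passed: bool) -> float:
    """Scalar reward: BLEU-4 text similarity plus a unit-test signal.
    The +1.0 / -0.1 weighting is an illustrative choice."""
    bleu = sacrebleu.sentence_bleu(code, [reference]).score / 100.0
    return bleu + (1.0 if tests_passed else -0.1)

def reinforce_step(query: str, reference: str, passed_tests_fn) -> float:
    # 1) Sample a refined query from the trainable refiner.
    inp = ref_tok(query, return_tensors="pt")
    prompt_len = inp["input_ids"].shape[1]
    sampled = refiner.generate(**inp, do_sample=True, max_new_tokens=64)
    refined_ids = sampled[:, prompt_len:]
    refined = ref_tok.decode(refined_ids[0], skip_special_tokens=True)

    # 2) The frozen generator turns the refined query into code.
    g_in = gen_tok(refined, return_tensors="pt")
    with torch.no_grad():
        out = generator.generate(**g_in, max_new_tokens=256)
    code = gen_tok.decode(out[0][g_in["input_ids"].shape[1]:],
                          skip_special_tokens=True)

    # 3) Scalar reward from text similarity + execution signal.
    r = reward(code, reference, passed_tests_fn(code))

    # 4) REINFORCE: push up the log-prob of the sampled refinement,
    #    weighted by the reward; gradients touch only the LoRA weights.
    logits = refiner(sampled).logits
    logp = torch.log_softmax(logits[:, prompt_len - 1:-1, :], dim=-1)
    tok_logp = logp.gather(-1, refined_ids.unsqueeze(-1)).squeeze(-1).sum()
    loss = -r * tok_logp
    opt.zero_grad(); loss.backward(); opt.step()
    return r
```

Freezing the generator means the reward is the only channel through which code quality reaches the refiner, which is what makes the plain REINFORCE estimator applicable here.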