Enhancing queries for code generation with reinforcement learning
| Published in | Scientific Reports Vol. 15, No. 1, Article 37300 (12 pages) |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | London: Nature Publishing Group UK, 24.10.2025 |
| ISSN | 2045-2322 |
| DOI | 10.1038/s41598-025-21271-4 | 
| Summary: | We present a reinforcement learning framework that enhances natural language queries to improve DeepSeek code generation. A parametric refiner (Qwen with LoRA) is trained via REINFORCE while the generator remains fixed, using a scalar reward that can combine text similarity (BLEU-4, ROUGE-L, F1, Overlap) with execution signals (unit tests, syntax/timeout penalties). On the DS1000 benchmark (800 train / 200 test), RL4QE improves code similarity by 34.3%. Ablations show that BLEU-4 is the most reliable text reward overall (with F1 competitive at larger scale), and that LoRA outperforms full fine-tuning on most metrics while being more parameter-efficient. The approach transfers across foundation models (e.g., Qwen1.5/2/2.5 variants), where architecture often matters more than size. RL4QE is easy to integrate in practice (LoRA on the attention projections) and supports reproducibility. A hedged sketch of the reward and policy update appears after this record. |
|---|---|
| ISSN: | 2045-2322 |
| DOI: | 10.1038/s41598-025-21271-4 |
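The summary describes two mechanisms: a scalar reward that blends text similarity (e.g., BLEU-4) with execution signals (syntax/timeout penalties), and a REINFORCE update applied only to the refiner while the code generator stays frozen. The Python sketch below illustrates that recipe under stated assumptions; every function name, the blending weight `alpha`, and the penalty value are illustrative choices, not the authors' published implementation, and the syntax check stands in for the paper's full unit-test signal.

```python
# Hedged sketch of an RL4QE-style reward and REINFORCE update.
# All names (bleu4, execution_signal, scalar_reward, reinforce_loss,
# alpha, timeout_penalty) are illustrative assumptions.
import math
from collections import Counter

import torch


def ngram_counts(tokens, n):
    """Count the n-grams appearing in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate: str, reference: str) -> float:
    """Plain sentence-level BLEU-4: uniform weights, brevity penalty."""
    c_tok, r_tok = candidate.split(), reference.split()
    if not c_tok:
        return 0.0
    log_prec = 0.0
    for n in range(1, 5):
        cand, ref = ngram_counts(c_tok, n), ngram_counts(r_tok, n)
        overlap = sum(min(cnt, ref[g]) for g, cnt in cand.items())
        total = max(sum(cand.values()), 1)
        log_prec += math.log(max(overlap, 1e-9) / total) / 4
    brevity = min(1.0, math.exp(1 - len(r_tok) / len(c_tok)))
    return brevity * math.exp(log_prec)


def execution_signal(code: str, penalty: float = -0.5) -> float:
    """Cheap execution proxy: reward valid syntax, penalize the rest.
    The paper's unit-test and timeout checks would slot in here."""
    try:
        compile(code, "<generated>", "exec")
        return 1.0
    except SyntaxError:
        return penalty


def scalar_reward(code: str, reference: str, alpha: float = 0.7) -> float:
    """Blend text similarity with an execution signal into one scalar."""
    return alpha * bleu4(code, reference) + (1 - alpha) * execution_signal(code)


def reinforce_loss(logprobs: torch.Tensor, reward: float,
                   baseline: float = 0.0) -> torch.Tensor:
    """REINFORCE objective on the refiner's sampled-query log-probs.
    Only the refiner receives gradients; the generator stays frozen."""
    return -(reward - baseline) * logprobs.sum()


if __name__ == "__main__":
    generated = "import numpy as np\nresult = np.mean(x)"
    reference = "import numpy as np\nresult = np.mean(x)"
    r = scalar_reward(generated, reference)  # ~1.0: exact match, valid syntax

    # Toy per-token log-probs of the sampled refined query.
    logprobs = torch.tensor([-0.2, -1.3, -0.7], requires_grad=True)
    reinforce_loss(logprobs, reward=r, baseline=0.5).backward()
```

In this sketch the baseline subtraction is the standard variance-reduction trick for REINFORCE; in practice a moving average of recent rewards is a common choice, and the frozen generator means only the refiner's (LoRA) parameters appear in the autograd graph.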