GRFuzz: A Deep Reinforcement Learning Approach to Python Library Fuzzing with GRPO

Bibliographic Details
Published in	Proceedings of the IEEE International Conference on Information Reuse and Integration (Online), pp. 13-18
Main Authors	Le-Minh, Viet-Anh; Tran, Hai-Anh; Nguyen, Huy-Hieu; Hoang, Nam-Thang; Tran, Truong X.
Format	Conference Proceeding
Language	English
Published	IEEE 06.08.2025
Subjects	Code coverage; Codes; Data science; Deep reinforcement learning; Fuzzing; GRPO; Libraries; Optimization; Python; Python Libraries; Software; Software reliability; Testing
ISSN	2835-5776
DOI	10.1109/IRI66576.2025.00011

Summary:	In the digital realm, ensuring the security and reliability of systems and software is of paramount importance. Fuzzing has emerged as one of the most effective testing techniques for uncovering vulnerabilities by systematically generating and executing test cases to explore unexpected software behaviors. While numerous studies have explored machine learning-enhanced fuzzing for C libraries, research on applying machine learning techniques to fuzzing Python libraries remains limited. Given Python's widespread adoption in critical applications such as web development, data science, and cybersecurity, improving fuzzing efficiency for Python libraries is crucial for strengthening software security. In this paper, we propose a novel approach to fuzzing Python libraries using deep reinforcement learning (DRL), specifically leveraging the Group Relative Policy Optimization (GRPO) algorithm. Unlike traditional fuzzing methods that rely on random or heuristic-based input generation, our method dynamically learns and prioritizes test cases that maximize code coverage. To further enhance fuzzing effectiveness, we employ a greedy optimization strategy to fine-tune key hyperparameters. Our results demonstrate a 2.27% to 10.24% improvement in code coverage compared to both PPO-based fuzzing and the traditional Pythonfuzz, and a 28.50% to 33.25% reduction in memory consumption compared to PPO-based fuzzing. These results show that GRPO-based fuzzing can improve vulnerability detection while maintaining lower resource consumption.
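
The summary names GRPO but this record does not describe how the paper applies it. As a rough illustration only, the sketch below shows the group-relative advantage step commonly associated with GRPO: rewards for a group of sampled candidates are normalized by that group's own mean and standard deviation, so no separate value network is needed. Treating coverage gain as the reward, along with the function name `group_relative_advantages` and the sample numbers, are assumptions made for this example and are not taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group's mean and std deviation.

    Hypothetical helper: candidates that beat the group average get a
    positive advantage, the rest a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std dev of the group
    # eps avoids division by zero when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Assumed reward signal: newly covered lines for four fuzz inputs
# sampled together as one group (illustrative numbers only).
coverage_gains = [0.0, 2.0, 4.0, 6.0]
advantages = group_relative_advantages(coverage_gains)
```

Under these assumptions, inputs with above-average coverage gain would be reinforced and below-average ones discouraged; a full training loop would feed these advantages into a clipped policy-gradient update, which is omitted here.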