Exploring PPO in G2RL: A Reinforcement Learning-Based Path Planning Approach to Dynamic Environments

Bibliographic Details
Published in: 2025 3rd International Conference on Control and Robot Technology (ICCRT), pp. 58-64
Main Authors: Yalley, Abraham Kojo; Chen, Yang; Fu, Hao
Format: Conference Proceeding
Language: English
Published: IEEE, 16.04.2025
DOI: 10.1109/ICCRT63554.2025.11072787

Summary: Autonomous navigation in dynamic environments presents significant challenges for reinforcement learning (RL)-based robot navigation, including adapting to real-time obstacle dynamics and ensuring reproducibility of results across frameworks. The Globally Guided Reinforcement Learning (G2RL) framework offers a promising hierarchical approach, combining A*-based global path planning with local decision-making via Double Deep Q-Learning (DDQN). However, value-based methods such as DDQN can suffer from instability and suboptimal performance in highly dynamic environments. This paper investigates the feasibility of replacing DDQN with Proximal Policy Optimization (PPO), a policy-gradient method known for its stability and adaptability, within the G2RL framework. Using the original G2RL environment configuration and reward structure, this study compares the performance of PPO and DDQN under identical conditions. Both models were trained on a single random map with 10 dynamic obstacles and tested on the same map with 60 obstacles. The results reveal that while the DDQN implementation failed to replicate the original paper's reported performance, PPO demonstrated robustness under dynamic conditions and showed potential as a viable alternative for hierarchical frameworks. This study highlights the importance of reproducibility in RL research and showcases PPO's adaptability, even though its overall performance requires further optimization for real-world applications.
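
The record contains no implementation details beyond the abstract, but the substitution it studies, PPO's clipped policy-gradient update in place of DDQN's value-based update for the local controller, can be sketched briefly. The PyTorch snippet below is a minimal, hypothetical illustration: the actor-critic network shape, the 5-action local move set, the observation size, and all hyperparameters are assumptions made for illustration, not the authors' configuration.

```python
# Minimal, hypothetical sketch of the PPO clipped-surrogate update that the
# paper investigates as a replacement for DDQN's value-based update.
# Network size, action set, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class LocalPolicy(nn.Module):
    """Tiny actor-critic over a flattened local observation window."""
    def __init__(self, obs_dim: int = 15 * 15 * 4, n_actions: int = 5):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 128), nn.Tanh())
        self.pi = nn.Linear(128, n_actions)   # action logits
        self.v = nn.Linear(128, 1)            # state-value estimate

    def forward(self, obs):
        h = self.body(obs)
        return self.pi(h), self.v(h).squeeze(-1)

def ppo_loss(model, obs, actions, old_log_probs, advantages, returns,
             clip_eps=0.2, vf_coef=0.5, ent_coef=0.01):
    """Clipped surrogate objective plus value-function and entropy terms."""
    logits, values = model(obs)
    dist = torch.distributions.Categorical(logits=logits)
    log_probs = dist.log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)          # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()   # pessimistic bound
    value_loss = (returns - values).pow(2).mean()
    entropy = dist.entropy().mean()
    return policy_loss + vf_coef * value_loss - ent_coef * entropy

# Usage with dummy batch tensors (shapes only; real data would come from
# rollouts in the dynamic grid environment).
model = LocalPolicy()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
obs = torch.randn(32, 15 * 15 * 4)
actions = torch.randint(0, 5, (32,))
with torch.no_grad():
    old_logits, _ = model(obs)
    old_log_probs = torch.distributions.Categorical(logits=old_logits).log_prob(actions)
advantages, returns = torch.randn(32), torch.randn(32)
loss = ppo_loss(model, obs, actions, old_log_probs, advantages, returns)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In a G2RL-style setup, the observation would encode the robot's local field of view together with the globally planned A* guidance, and the advantages and returns would be computed from environment rollouts rather than the random tensors used here.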