3D Spatial Information Compression Based Deep Reinforcement Learning for UAV Path Planning in Unknown Environments


Bibliographic Details
Published in: IEEE Open Journal of Vehicular Technology, Vol. 6, pp. 2662-2676
Main Authors: Wang, Zhipeng; Ng, Soon Xin; El-Hajjar, Mohammed
Format: Journal Article
Language: English
Published: IEEE, 2025
ISSN: 2644-1330
DOI: 10.1109/OJVT.2025.3611507


Summary: In the past decade, unmanned aerial vehicle (UAV) technology has developed rapidly, and the flexibility and low cost of UAVs make them attractive in many applications. Path planning is crucial in most of these applications, and path planning for UAVs in unknown, complex 3D environments has become an urgent challenge. In this paper, we model the unknown 3D environment as a partially observable Markov decision process (POMDP) and derive the Bellman equation without introducing a belief state (BS) distribution. More explicitly, we use an independent emulator to model the environmental observation history and obtain an approximate BS distribution of the state through Monte Carlo simulation in the emulator, which eliminates the need for BS calculation and thereby improves training efficiency and path planning performance. Additionally, we propose a three-dimensional spatial information compression (3DSIC) algorithm for continuous POMDP environments that compresses 3D environmental information into 2D, greatly reducing the search space of path planning algorithms. Simulation results show that our proposed 3D spatial information compression based deep deterministic policy gradient (3DSIC-DDPG) algorithm improves training efficiency by 95.9% compared to the traditional DDPG algorithm in unknown 3D environments. Additionally, combining 3DSIC with the fast recurrent stochastic value gradient (FRSVG) algorithm, a state-of-the-art planning algorithm for UAVs, yields 95% higher efficiency than FRSVG without the 3DSIC algorithm in unknown environments.
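The abstract only names the 3DSIC idea; the actual compression rule is not given in this record. As a purely illustrative, hypothetical sketch of what "compressing 3D environmental information into 2D" could look like, one simple scheme projects a 3D occupancy grid along the altitude axis, recording for each (x, y) column the lowest free altitude so a planner can search in 2D:

```python
import numpy as np

def compress_3d_to_2d(occupancy_3d: np.ndarray) -> np.ndarray:
    """Collapse a 3D occupancy grid indexed (x, y, z) into a 2D map.

    Hypothetical stand-in for the 3DSIC step summarized above: each
    (x, y) cell stores the first free altitude index; fully blocked
    columns store nz (one past the last valid z index).
    """
    nz = occupancy_3d.shape[2]
    free = occupancy_3d == 0            # True where the cell is free space
    has_free = free.any(axis=2)         # columns with at least one free cell
    lowest_free = free.argmax(axis=2)   # first free z index per column
    return np.where(has_free, lowest_free, nz)

# Tiny example: a 2x2 world, 3 altitude levels, one column fully blocked.
grid = np.zeros((2, 2, 3))
grid[1, 1, :] = 1          # column (1, 1) is obstructed at every altitude
grid[0, 0, 0] = 1          # column (0, 0) is free only from z = 1 upward
print(compress_3d_to_2d(grid))
```

The resulting 2D map halves the planner's state dimensionality; whether the paper's 3DSIC uses this projection or a learned compression cannot be determined from this record alone.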