Adaptive Impact-Time-Control Cooperative Guidance Law for UAVs Under Time-Varying Velocity Based on Reinforcement Learning

Bibliographic Details
Published in	Drones (Basel) Vol. 9; no. 4; p. 262
Main Authors	Liu, Zhenyu; Lei, Gang; Xian, Yong; Ren, Leliang; Li, Shaopeng; Zhang, Daqiao
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 01.04.2025
Subjects	Accuracy; Algorithms; Constraints; Control theory; Cooperative control; cooperative guidance; Deep learning; Energy consumption; field-of-view constraints; Flying-machines; Guidance (motion); High speed; Impact velocity; impact-time-control guidance; Laws, regulations and rules; multiple high-speed UAVs; Neural networks; Numerical analysis; Optimization; reinforcement learning; Robustness; Velocity
ISSN	2504-446X
DOI	10.3390/drones9040262

Summary:	This study proposes an adaptive impact-time-control cooperative guidance law for high-speed UAVs with time-varying velocity, based on deep reinforcement learning and subject to field-of-view (FOV) constraints. First, a reinforcement learning framework for the high-speed UAV guidance problem is established. The optimization objective is to maximize the impact velocity, while the impact-time, dive-attack, and FOV constraints are considered simultaneously. The time-to-go estimation method is improved so that it can be applied to high-speed UAVs with time-varying velocity. Then, to improve the applicability and robustness of the agent, environmental uncertainties, including aerodynamic parameter errors, observation noise, and random target maneuvers, are incorporated into the training process. Furthermore, inspired by the RL2 algorithm, a recurrent layer is introduced into both the policy and value networks, so that the agent can automatically adapt to different mission scenarios by updating the hidden states of the recurrent layer. In addition, a compound reward function is designed to train the agent to satisfy the impact-time-control and dive-attack requirements simultaneously. Finally, the effectiveness and robustness of the proposed guidance law are validated through numerical simulations conducted across a wide range of scenarios.