Learning-Based 6-DOF Control for Autonomous Proximity Operations Under Motion Constraints

Bibliographic Details
Published in	IEEE Transactions on Aerospace and Electronic Systems, Vol. 57, no. 6, pp. 4097–4109
Main Authors	Hu, Qinglei; Yang, Haoyang; Dong, Hongyang; Zhao, Xiaowei
Format	Journal Article
Language	English
Published	New York: IEEE, 01.12.2021; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Aerodynamics; Approximate optimal control; constrained 6-DOF control; Constraint modelling; Cost function; Degrees of freedom; Distance learning; Errors; Motion stability; Nonlinear dynamics; Optimal control; Proximity; Quaternions; Real-time systems; Reinforcement learning; reinforcement learning (RL); Space vehicles; Spacecraft; spacecraft proximity operations; Stability analysis; Task analysis
ISSN	0018-9251; 1557-9603
DOI	10.1109/TAES.2021.3094628

More Information
Summary:	This article proposes a reinforcement learning (RL)-based six-degree-of-freedom (6-DOF) control scheme for the final-phase proximity operations of spacecraft. The main novelty of the proposed method lies in two aspects: 1) the closed-loop performance can be improved in real time through the RL technique, achieving online approximate optimal control subject to the full 6-DOF nonlinear dynamics of the spacecraft; 2) nontrivial motion constraints of proximity operations are considered and strictly obeyed throughout the control process. As a stepping stone, the dual-quaternion formalism is employed to characterize the 6-DOF dynamics model and the motion constraints. Then, an RL-based control scheme is developed under the dual-quaternion algebraic framework to approximate the optimal control solution subject to a cost function and a Hamilton–Jacobi–Bellman equation. In addition, a specially designed barrier function is embedded in the reward function to avoid motion constraint violations. A Lyapunov-based stability analysis guarantees the ultimate boundedness of the state errors and of the neural network (NN) weight estimation errors. We also show that a PD-like controller under the dual-quaternion formulation can be employed as the initial control policy to trigger the online learning process; its boundedness is proved by a special Lyapunov strictification method. Simulation results of prototypical spacecraft missions with proximity operations are provided to illustrate the effectiveness of the proposed method.
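The summary names two ingredients that a short sketch can make concrete: a dual-quaternion encoding of the 6-DOF pose, and a barrier term added to the cost so that the reward (its negative) penalizes approaching a constraint boundary. The Python below is a minimal, generic sketch under stated assumptions, not the authors' formulation. It assumes quaternions stored as [w, x, y, z], the common pose convention dual = 0.5 * t ⊗ q, and a motion constraint modeled as an approach cone (line-of-sight corridor) handled by a plain logarithmic barrier. The helper names (pose_to_dual_quat, cone_barrier, running_cost) and the weight mu are hypothetical. The paper's specially designed barrier function and its NN-based approximation of the Hamilton–Jacobi–Bellman solution are not reproduced here.

```python
import numpy as np

def quat_mul(p, q):
    """Hamilton product p ⊗ q for quaternions stored as [w, x, y, z]."""
    pw, pv = p[0], p[1:]
    qw, qv = q[0], q[1:]
    w = pw * qw - pv @ qv
    v = pw * qv + qw * pv + np.cross(pv, qv)
    return np.concatenate(([w], v))

def quat_conj(q):
    """Quaternion conjugate: negate the vector part."""
    return q * np.array([1.0, -1.0, -1.0, -1.0])

def pose_to_dual_quat(q_rot, t):
    """Unit dual quaternion (real, dual) for a unit rotation q_rot and translation t.
    Assumed convention: dual = 0.5 * t ⊗ q_rot, with t written in the reference frame."""
    t_q = np.concatenate(([0.0], t))
    return q_rot, 0.5 * quat_mul(t_q, q_rot)

def dual_quat_to_translation(q_real, q_dual):
    """Inverse of the convention above: t = 2 * q_dual ⊗ conj(q_real)."""
    return 2.0 * quat_mul(q_dual, quat_conj(q_real))[1:]

def cone_barrier(r, axis, half_angle):
    """Hypothetical log barrier for the constraint angle(r, axis) < half_angle.
    axis is assumed to be a unit vector. Returns 0 on the axis, a positive value
    elsewhere inside the cone, and +inf on or outside the cone surface."""
    cos_a = np.cos(half_angle)
    h = (r @ axis) / np.linalg.norm(r) - cos_a  # > 0 strictly inside the cone
    if h <= 0.0:
        return np.inf
    return -np.log(h / (1.0 - cos_a))

def running_cost(x, u, r, Q, R, axis, half_angle, mu=1.0):
    """Quadratic state/control cost plus a weighted barrier term (reward = -cost)."""
    return x @ Q @ x + u @ R @ u + mu * cone_barrier(r, axis, half_angle)
```

For example, a 45° rotation about z, q = (cos π/8, 0, 0, sin π/8), combined with t = (1, −2, 0.5) round-trips through pose_to_dual_quat and dual_quat_to_translation. The barrier is zero for a relative position on the cone axis, positive elsewhere inside the cone, and infinite outside it, so a policy that maximizes the negated cost is pushed away from the corridor boundary.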