Learning-Based 6-DOF Control for Autonomous Proximity Operations Under Motion Constraints

Bibliographic Details
Published in IEEE transactions on aerospace and electronic systems Vol. 57; no. 6; pp. 4097-4109
Main Authors Hu, Qinglei; Yang, Haoyang; Dong, Hongyang; Zhao, Xiaowei
Format Journal Article
Language English
Published New York IEEE 01.12.2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
ISSN 0018-9251, 1557-9603
DOI 10.1109/TAES.2021.3094628

Summary: This article proposes a reinforcement learning (RL)-based six-degree-of-freedom (6-DOF) control scheme for the final-phase proximity operations of spacecraft. The novelty of the proposed method lies in two aspects: 1) closed-loop performance is improved in real time through the RL technique, achieving online approximate optimal control subject to the full 6-DOF nonlinear dynamics of the spacecraft; 2) nontrivial motion constraints of proximity operations are considered and strictly obeyed throughout the control process. As a stepping stone, the dual-quaternion formalism is employed to characterize the 6-DOF dynamics model and the motion constraints. An RL-based control scheme is then developed under the dual-quaternion algebraic framework to approximate the optimal control solution subject to a cost function and a Hamilton–Jacobi–Bellman equation. In addition, a specially designed barrier function is embedded in the reward function to avoid motion constraint violations. A Lyapunov-based stability analysis guarantees the ultimate boundedness of the state errors and the neural network (NN) weight estimation errors. The article also shows that a PD-like controller under the dual-quaternion formulation can serve as the initial control policy to trigger the online learning process; its boundedness is proved by a special Lyapunov strictification method. Simulation results for prototypical spacecraft missions with proximity operations illustrate the effectiveness of the proposed method.
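
Illustrative note: the summary relies on dual quaternions to describe rotation and translation in a single object. The sketch below shows only that standard representation; it is not the article's controller, cost function, or barrier function. The convention (quaternions as (w, x, y, z), dual part equal to 0.5 * t * q with the translation t expressed in the reference frame) and all function names are assumptions chosen for illustration.

```python
import math

def qmul(a, b):
    """Hamilton product of two quaternions stored as (w, x, y, z)."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def qconj(q):
    """Quaternion conjugate (inverse for unit quaternions)."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def pose_to_dq(q, t):
    """Build a unit dual quaternion (real, dual) from a unit rotation
    quaternion q and a translation t (assumed convention: dual = 0.5 * t * q)."""
    tq = qmul((0.0, *t), q)
    return tuple(q), tuple(0.5 * c for c in tq)

def dq_mul(a, b):
    """Dual quaternion product: (ar + e*ad)(br + e*bd), using e*e = 0."""
    ar, ad = a
    br, bd = b
    real = qmul(ar, br)
    dual = tuple(x + y for x, y in zip(qmul(ar, bd), qmul(ad, br)))
    return real, dual

def dq_translation(dq):
    """Recover the translation: t = 2 * dual * conj(real)."""
    real, dual = dq
    _, x, y, z = qmul(dual, qconj(real))
    return (2.0 * x, 2.0 * y, 2.0 * z)

if __name__ == "__main__":
    s = math.sqrt(0.5)
    # Pose A: rotate 90 degrees about z, then sit at (1, 0, 0).
    pose_a = pose_to_dq((s, 0.0, 0.0, s), (1.0, 0.0, 0.0))
    # Pose B (relative to A): no rotation, offset (1, 0, 0) in A's axes.
    pose_b = pose_to_dq((1.0, 0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
    print(dq_translation(dq_mul(pose_a, pose_b)))
```

Composing the two poses with a single product carries both attitude and position, which is the property that makes the formalism convenient for coupled 6-DOF models.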