Learning-Based 6-DOF Control for Autonomous Proximity Operations Under Motion Constraints
This article proposes areinforcement learning (RL)-based six-degree-of-freedom (6-DOF) control scheme for the final-phase proximity operations of spacecraft. The main novelty of the proposed method are from two aspects: 1) The closed-loop performance can be improved in real-time through the RL techn...
Saved in:
| Published in | IEEE transactions on aerospace and electronic systems Vol. 57; no. 6; pp. 4097 - 4109 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published |
New York
IEEE
01.12.2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| Subjects | |
| Online Access | Get full text |
| ISSN | 0018-9251 1557-9603 |
| DOI | 10.1109/TAES.2021.3094628 |
Cover
| Summary: | This article proposes areinforcement learning (RL)-based six-degree-of-freedom (6-DOF) control scheme for the final-phase proximity operations of spacecraft. The main novelty of the proposed method are from two aspects: 1) The closed-loop performance can be improved in real-time through the RL technique, achieving an online approximate optimal control subject to the full 6-DOF nonlinear dynamics of spacecraft; 2) nontrivial motion constraints of proximity operations are considered and strictly obeyed during the whole control process. As a stepping stone, the dual-quaternion formalism is employed to characterize the 6-DOF dynamics model and motion constraints. Then, an RL-based control scheme is developed under the dual-quaternion algebraic framework to approximate the optimal control solution subject to a cost function and a Hamilton–Jacobi–Bellman equation. In addition, a specially designed barrier function is embedded in the reward function to avoid motion constraint violations. The Lyapunov-based stability analysis guarantees the ultimate boundedness of state errors and the weight of NN estimation errors. Besides, we also show that a PD-like controller under dual-quaternion formulation can be employed as the initial control policy to trigger the online learning process. The boundedness of it is proved by a special Lyapunov strictification method. Simulation results of prototypical spacecraft missions with proximity operations are provided to illustrate the effectiveness of the proposed method. |
|---|---|
| Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ISSN: | 0018-9251 1557-9603 |
| DOI: | 10.1109/TAES.2021.3094628 |