Hierarchical Reinforcement Learning Based on Planning Operators

Bibliographic Details
Published in: IEEE International Conference on Automation Science and Engineering (CASE), pp. 2006 - 2012
Main Authors: Zhang, Jing; Dean, Emmanuel; Ramirez-Amaro, Karinne
Format: Conference Proceeding
Language: English
Published: IEEE, 28.08.2024
ISSN: 2161-8089, 2161-8070
DOI: 10.1109/CASE59546.2024.10711595

Summary: Learning long-horizon manipulation tasks, such as stacking, presents a longstanding challenge in the field of robotic manipulation, particularly when using Reinforcement Learning (RL) methods. RL algorithms focus on learning a policy for executing the entire task instead of learning the correct sequence of actions required to achieve complex goals. While RL aims to find a sequence of actions that maximises the total reward of the task, the main challenge arises when there are infinite possibilities of chaining these actions (e.g. reach, grasp, etc.) to achieve the same task (stacking). In these cases, RL methods may struggle to find the optimal policy. This paper introduces a novel framework that integrates operator concepts from the symbolic planning domain with hierarchical RL methods. We propose to change the way complex tasks are trained by learning independent policies for the actions defined by high-level operators instead of learning a single policy for the complete complex task. Our contribution integrates planning operators (e.g. preconditions and effects) as part of the hierarchical RL algorithm based on the Scheduled Auxiliary Control (SAC-X) method. We developed a dual-purpose high-level operator, which can be used both in holistic planning and as an independent, reusable policy. Our approach offers a flexible solution for long-horizon tasks, e.g. stacking and inserting a cube. The experimental results show that our proposed method achieved an average success rate of 97.2% for learning and executing the whole stacking task. Furthermore, we obtain high success rates when learning independent policies, e.g. reach (98.9%), lift (99.7%), and move (97.4%). The training time is also reduced by 68% when using our proposed approach.
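
To make the operator idea concrete, below is a minimal Python sketch of how a high-level planning operator with symbolic preconditions and effects could wrap an independently learned sub-policy, and how such operators could be chained to reach a stacking goal. Everything here (the Operator dataclass, greedy_schedule, the predicate names, and the stub policies) is an illustrative assumption; it is not the authors' code and does not reflect the actual SAC-X implementation.

# Illustrative sketch only (hypothetical names, not from the paper):
# a high-level operator couples symbolic preconditions/effects with a
# low-level learned sub-policy, so it can serve both planning and control.
from dataclasses import dataclass
from typing import Callable, Dict, Set, List

State = Dict[str, bool]  # symbolic predicates, e.g. {"holding_cube": True}

@dataclass
class Operator:
    name: str
    preconditions: Set[str]          # predicates that must hold before execution
    effects: Dict[str, bool]         # predicates asserted after successful execution
    policy: Callable[[State], str]   # stand-in for the learned sub-policy

    def applicable(self, state: State) -> bool:
        return all(state.get(p, False) for p in self.preconditions)

    def execute(self, state: State) -> State:
        # A real system would roll out the RL sub-policy here; this sketch
        # only applies the symbolic effects to advance the plan.
        assert self.applicable(state), f"preconditions of {self.name} not met"
        self.policy(state)
        return {**state, **self.effects}

# Hypothetical operators for a cube-stacking task.
reach = Operator("reach", set(), {"at_cube": True}, lambda s: "reach_cmd")
grasp = Operator("grasp", {"at_cube"}, {"holding_cube": True}, lambda s: "grasp_cmd")
lift  = Operator("lift", {"holding_cube"}, {"cube_lifted": True}, lambda s: "lift_cmd")
move  = Operator("move", {"cube_lifted"}, {"above_target": True}, lambda s: "move_cmd")
place = Operator("place", {"above_target", "holding_cube"},
                 {"stacked": True, "holding_cube": False}, lambda s: "place_cmd")

def greedy_schedule(state: State, operators: List[Operator], goal: str) -> List[str]:
    """Chain operators by checking preconditions and effects until the goal holds."""
    plan: List[str] = []
    while not state.get(goal, False):
        op = next(o for o in operators if o.applicable(state) and o.name not in plan)
        state = op.execute(state)
        plan.append(op.name)
    return plan

if __name__ == "__main__":
    # Prints ['reach', 'grasp', 'lift', 'move', 'place'] for the stacking goal.
    print(greedy_schedule({}, [reach, grasp, lift, move, place], "stacked"))

In this reading, the preconditions and effects give the planner a symbolic interface to each learned skill, which is why the same operator can be reused across tasks such as stacking and inserting.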