Reinforcement Learning With Sequences of Motion Primitives for Robust Manipulation

Bibliographic Details
Published in	IEEE Transactions on Robotics, vol. 28, no. 6, pp. 1360-1370
Main Authors	Stulp, F., Theodorou, E. A., Schaal, S.
Format	Journal Article
Language	English
Published	New York, NY: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2012
Subjects	Adaptive systems; Algorithms; Applied sciences; Artificial intelligence; Computer science; Control theory; Systems; Exact sciences and technology; Grasping; Gripping; Integrals; Learning; Learning algorithm; Learning and adaptive systems; Learning systems; Manipulation; Manipulation planning; Manipulators; Motion control; Multidimensional analysis; Optimization; Path integral; Planning; Policy; Reinforcement learning; Robotics; Robustness; Task scheduling
ISSN	1552-3098, 1941-0468
DOI	10.1109/TRO.2012.2210294

More Information
Summary:	Physical contact events often allow a natural decomposition of manipulation tasks into action phases and subgoals. Within the motion primitive paradigm, each action phase corresponds to a motion primitive, and the subgoals correspond to the goal parameters of these primitives. Current state-of-the-art reinforcement learning algorithms can efficiently and robustly optimize the parameters of motion primitives in very high-dimensional problems. These algorithms often consider only shape parameters, which determine the trajectory between the start and end points of the movement. In manipulation, however, it is also crucial to optimize the goal parameters, which represent the subgoals between the motion primitives. We therefore extend the policy improvement with path integrals (PI²) algorithm to simultaneously optimize shape and goal parameters. Applying simultaneous shape and goal learning to sequences of motion primitives leads to the novel algorithm PI²Seq. We use our methods to address a fundamental challenge in manipulation: improving the robustness of everyday pick-and-place tasks.