Humanoids Learning to Walk: A Natural CPG-Actor-Critic Architecture

Bibliographic Details
Published in	Frontiers in Neurorobotics Vol. 7; no. 5; p. 5
Main Authors	Li, Cai; Lowe, Robert; Ziemke, Tom
Format	Journal Article
Language	English
Published	Switzerland: Frontiers Research Foundation; Frontiers Media S.A., 01.01.2013
Subjects	actor-critic; Adaptation; Algorithms; Central pattern generator; central pattern generators; Cognition; Computer science; cpg-actor-critic; Dopamine; DST; Dynamical systems; dynamical systems theory; embodied cognition; Exploration; humanoid walking; Learning; Locomotion; Morphology; Neuroscience; Neurosciences; Reinforcement; reinforcement learning; Robotics; Sensorimotor system; Technology; value system; Walking
ISSN	1662-5218
DOI	10.3389/fnbot.2013.00005

Summary:	The identification of learning mechanisms for locomotion has been the subject of much research for some time, but many challenges remain. Dynamic systems theory (DST) offers a novel approach to humanoid learning through environmental interaction. Reinforcement learning (RL) has offered a promising method to adaptively link the dynamic system to the environment it interacts with via a reward-based value system. In this paper, we propose a model that integrates these perspectives and apply it to a humanoid (NAO) robot learning to walk, an ability that emerges from the robot's value-based interaction with its environment. In the model, a simplified central pattern generator (CPG) architecture inspired by neuroscientific research and DST is integrated with an actor-critic approach to RL (cpg-actor-critic). In the cpg-actor-critic architecture, learning based on least-squares temporal differences (LSTD) converges quickly to the optimal solution by using natural gradient learning and by balancing exploration and exploitation. Furthermore, rather than using a traditional (designer-specified) reward, it uses a dynamic value function as a stability indicator that adapts to the environment. The results obtained are analyzed using a novel DST-based embodied cognition approach. From this perspective, learning to walk is a process of integrating levels of sensorimotor activity and value.
Bibliography:	Edited by: Jeffrey L. Krichmar, University of California, Irvine, USA. Reviewed by: Mehdi Khamassi, CNRS, France; Poramate Manoonpong, Georg-August-Universität Göttingen, Germany; Calogero M. Oddo, Scuola Superiore Sant’Anna, Italy.
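The summary describes a CPG whose parameters are tuned by a natural actor-critic learner with a least-squares critic. The Python sketch below illustrates that combination on a toy problem; it is not the authors' implementation. Assumptions (all hypothetical): the CPG is a single sinusoidal oscillator with two parameters (amplitude, frequency); the reward is a trajectory-matching stand-in, not the paper's adaptive value-based stability indicator; each episode is one step long, so an LSTD-Q critic with discount 0 reduces to ordinary least squares on the compatible features; exploration uses a fixed Gaussian width `sigma`. The names `TARGET_AMPLITUDE`, `cpg_trajectory`, `reward`, `solve`, and `natural_actor_critic` are illustrative only.

```python
import math
import random

# ASSUMPTION: hypothetical target gait for one joint (not taken from the paper).
TARGET_AMPLITUDE = 0.5   # rad
TARGET_FREQUENCY = 1.0   # Hz
DT, STEPS = 0.02, 100    # 2 s rollout sampled at 50 Hz


def cpg_trajectory(amplitude, frequency):
    """Toy CPG: one sinusoidal phase oscillator driving a single joint."""
    return [amplitude * math.sin(2.0 * math.pi * frequency * k * DT) for k in range(STEPS)]


def reward(params):
    """ASSUMPTION: stand-in reward, the negative mean squared error to the target gait."""
    q = cpg_trajectory(*params)
    target = cpg_trajectory(TARGET_AMPLITUDE, TARGET_FREQUENCY)
    return -sum((a - b) ** 2 for a, b in zip(q, target)) / STEPS


def solve(matrix, vector):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(vector)
    m = [row[:] + [v] for row, v in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def natural_actor_critic(mu, sigma=0.1, alpha=10.0, iterations=80, samples=30, seed=0):
    """Gaussian policy over CPG parameters; least-squares critic on compatible features."""
    rng = random.Random(seed)
    mu = list(mu)
    for _ in range(iterations):
        xtx = [[0.0] * 3 for _ in range(3)]
        xty = [0.0] * 3
        for _ in range(samples):
            eps = [rng.gauss(0.0, 1.0) for _ in mu]
            action = [m + sigma * e for m, e in zip(mu, eps)]  # exploration around mu
            # Compatible features psi = grad_mu log pi(a) = (a - mu) / sigma^2, plus a bias.
            features = [e / sigma for e in eps] + [1.0]
            r = reward(action)
            for i in range(3):
                xty[i] += features[i] * r
                for j in range(3):
                    xtx[i][j] += features[i] * features[j]
        # One-step episodes (discount 0): the LSTD-Q equations become these normal equations.
        w = solve(xtx, xty)
        # w[:2] estimates the natural gradient F^-1 grad J; w[2] acts as a value baseline.
        mu = [m + alpha * wi for m, wi in zip(mu, w[:2])]
    return mu


start = [0.2, 0.8]  # ASSUMPTION: initial amplitude (rad) and frequency (Hz)
learned = natural_actor_critic(start)
print("learned CPG parameters:", learned)
print("reward before/after:", reward(start), reward(learned))
```

With a fixed-width Gaussian policy the Fisher information is I/σ², so the least-squares weights equal σ² times the vanilla gradient, which is exactly the natural gradient. In this sketch `sigma` controls exploration and `alpha * sigma**2` sets the effective step size.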