A Study of Reinforcement Learning in a New Multiagent Domain
| Published in | Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Vol. 2, pp. 154-161 |
|---|---|
| Main Authors | , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | Washington, DC, USA: IEEE Computer Society; IEEE, 09.12.2008 |
| Series | ACM Conferences |
| Subjects | |
| ISBN | 9780769534961 0769534961 |
| DOI | 10.1109/WIIAT.2008.114 |
| Summary: | RoboCup Keepaway is one of the most challenging multiagent systems (MAS): a team of keepers tries to keep the ball away from a team of takers. Most current work concentrates on learning for the keepers rather than for the takers, which is also a great challenge for the application of reinforcement learning (RL). In this paper, we propose a task named takeaway for the takers and study their learning. We first employ an initial learning algorithm called Update on Steps (UoS) for the takers and demonstrate that it has two main faults: action oscillation and reliance on the designer's experience. We then present a novel RL algorithm called Dynamic CMAC Advantage Learning (DCMAC-AL). It uses advantage($\lambda$) learning to compute the value function and a CMAC to generalize over the state space, and it creates new features based on the Bellman error to improve the precision of the CMAC. Empirical results show that takers using DCMAC-AL learn efficiently. |
|---|---|
| ISBN: | 9780769534961 0769534961 |
| DOI: | 10.1109/WIIAT.2008.114 |
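The abstract's mention of "CMAC to generalize state space" refers to tile coding: several overlapping, offset grids discretize a continuous state, and the value at a state is the sum of one weight per grid. The sketch below is a generic illustration of that idea only; the tiling layout, dimensions, and learning rate are hypothetical and are not taken from the paper, whose details this record does not include.

```python
import numpy as np

class CMAC:
    """Minimal tile-coding (CMAC) function approximator.

    All parameters here (number of tilings, tiles per dimension,
    step size) are illustrative assumptions, not the paper's values.
    """

    def __init__(self, n_tilings=8, tiles_per_dim=10, dims=2, lr=0.1, seed=0):
        self.n_tilings = n_tilings
        self.tiles = tiles_per_dim
        self.lr = lr / n_tilings  # spread the step size across tilings
        rng = np.random.default_rng(seed)
        # Each tiling is shifted by a fixed offset within one tile width,
        # so nearby states share some but not all active tiles.
        self.offsets = rng.uniform(0, 1.0 / tiles_per_dim, (n_tilings, dims))
        self.weights = np.zeros((n_tilings,) + (tiles_per_dim,) * dims)

    def _active_tiles(self, x):
        # x is assumed normalized to [0, 1] in every dimension.
        for t in range(self.n_tilings):
            idx = np.floor((x + self.offsets[t]) * self.tiles).astype(int)
            idx = np.clip(idx, 0, self.tiles - 1)
            yield (t,) + tuple(idx)

    def value(self, x):
        # Predicted value: sum of one active weight per tiling.
        return sum(self.weights[i] for i in self._active_tiles(x))

    def update(self, x, target):
        # Move each active weight toward its share of the error.
        err = target - self.value(x)
        for i in self._active_tiles(x):
            self.weights[i] += self.lr * err
```

Repeated updates pull the value at a state toward a target, and, because neighboring states activate overlapping tiles, the learned value generalizes locally; in an RL setting such as the paper's, the target would be a temporal-difference (here, advantage-learning) backup rather than a fixed number.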