Reward Function Design Method for Long Episode Pursuit Tasks Under Polar Coordinate in Multi-Agent Reinforcement Learning

Bibliographic Details
Published in: Shanghai Jiao Tong Da Xue Xue Bao (Journal of Shanghai Jiao Tong University), Vol. 29, No. 4, pp. 646-655
Main Authors: Dong, Yubo; Cui, Tao; Zhou, Yufan; Song, Xun; Zhu, Yue; Dong, Peng
Format: Journal Article
Language: English
Published: Shanghai: Shanghai Jiaotong University Press; Springer Nature B.V., 01.08.2024
ISSN: 1007-1172; 1995-8188
DOI: 10.1007/s12204-024-2713-4

Summary: Multi-agent reinforcement learning has recently been applied to pursuit problems. However, these tasks involve a large number of time steps per training episode, so training often struggles to converge, leaving rewards low and agents unable to learn effective strategies. This paper proposes a deep reinforcement learning (DRL) training method that uses an ensemble segmented multi-reward function design to address this convergence problem. The ensemble reward function combines the advantages of two reward functions, which improves agent training over long episodes. We also eliminate the non-monotonic behavior that the trigonometric functions in the traditional 2D polar-coordinate observation representation introduce into the reward function. Experimental results demonstrate that this method outperforms the traditional single-reward-function mechanism in the pursuit scenario, raising the agents' policy scores on the task. These ideas offer a solution to the convergence challenges DRL models face in long-episode pursuit problems, leading to improved model training performance.
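
This record does not reproduce the paper's reward formulas, so the sketch below is only a hedged illustration of the two ideas the abstract names: a reward segmented by pursuer-evader distance, and a polar-coordinate bearing error wrapped to [-pi, pi] so the angular term is monotonic in angular separation (rather than feeding raw trigonometric features into the reward). All function names, segment thresholds, and coefficients here are hypothetical, not the authors' values.

    import numpy as np

    def wrapped_angle(delta):
        """Wrap an angle difference to [-pi, pi]. Its magnitude is then a
        monotonic measure of angular separation, unlike raw sin/cos
        features, which rise and fall as the angle grows."""
        return (delta + np.pi) % (2 * np.pi) - np.pi

    def segmented_pursuit_reward(r, theta_err, r_capture=1.0, r_near=5.0):
        """Hypothetical segmented multi-reward for a pursuer observing the
        evader at distance r and bearing error theta_err (2D polar
        coordinates).

        Far segment:  dense distance shaping keeps a learning signal
                      alive across a long episode.
        Near segment: adds a heading-alignment term so the pursuer turns
                      toward the evader as capture becomes feasible.
        Capture:      a large sparse terminal bonus.
        """
        angle_term = -abs(wrapped_angle(theta_err)) / np.pi  # in [-1, 0]
        if r <= r_capture:                # capture segment: terminal bonus
            return 10.0
        elif r <= r_near:                 # near segment: distance + alignment
            return -0.1 * r + 0.5 * angle_term
        else:                             # far segment: pure distance shaping
            return -0.1 * r

Under this reading, the segmentation is what makes long episodes tractable: the dense far-segment shaping supplies a gradient while capture is still many steps away, and the sparse capture bonus preserves the true objective.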