Reward Function Design Method for Long Episode Pursuit Tasks Under Polar Coordinate in Multi-Agent Reinforcement Learning
| Published in | Shanghai Jiao Tong Da Xue Xue Bao (Journal of Shanghai Jiao Tong University (Science)), Vol. 29, No. 4, pp. 646-655 |
|---|---|
| Main Authors | , , , , , |
| Format | Journal Article |
| Language | English |
| Published | Shanghai: Shanghai Jiaotong University Press; Springer Nature B.V., 01.08.2024 |
| ISSN | 1007-1172; 1995-8188 |
| DOI | 10.1007/s12204-024-2713-4 |
| Summary | Multi-agent reinforcement learning has recently been applied to pursuit problems. However, it suffers from a large number of time steps per training episode, and therefore often struggles to converge, resulting in low rewards and agents that fail to learn effective strategies. This paper proposes a deep reinforcement learning (DRL) training method that employs an ensemble segmented multi-reward function design to address this convergence problem. The ensemble reward function combines the advantages of two reward functions, which enhances the training of agents over long episodes. We also eliminate the non-monotonic behavior that the trigonometric functions in the traditional 2D polar coordinate observation representation introduce into the reward function. Experimental results demonstrate that this method outperforms the traditional single-reward-function mechanism in the pursuit scenario by improving the agents' policy scores on the task. These ideas offer a solution to the convergence challenges faced by DRL models in long episode pursuit problems, leading to improved model training performance. |
|---|---|
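
The summary only outlines the method at a high level. As a rough, hypothetical illustration of the two ideas it names (an ensemble of dense and sparse reward segments for long episodes, and a polar-coordinate bearing term that stays monotonic rather than passing through a trigonometric function), the Python sketch below is one minimal reading of the abstract, not the authors' implementation; all names, weights, and thresholds (`ensemble_segmented_reward`, `capture_radius`, the 0.1 alignment weight) are assumptions.

```python
import numpy as np

def polar_observation(pursuer_xy, evader_xy, pursuer_heading):
    # Relative observation in 2D polar coordinates: distance and bearing.
    dx, dy = np.asarray(evader_xy, dtype=float) - np.asarray(pursuer_xy, dtype=float)
    distance = float(np.hypot(dx, dy))
    # Wrap the bearing to [-pi, pi). Feeding the wrapped angle itself into the
    # reward (rather than cos/sin of it) keeps a term like -|bearing| monotone
    # in the heading error, which is one reading of the abstract's claim that
    # trigonometric functions make the reward non-monotonic.
    bearing = (np.arctan2(dy, dx) - pursuer_heading + np.pi) % (2.0 * np.pi) - np.pi
    return distance, bearing

def ensemble_segmented_reward(distance, bearing, prev_distance,
                              capture_radius=1.0, capture_bonus=10.0):
    # Dense segment: per-step progress toward the evader plus heading
    # alignment, providing a learning signal throughout a long episode.
    dense = (prev_distance - distance) - 0.1 * abs(bearing) / np.pi
    # Sparse segment: a large terminal bonus awarded only on capture, so the
    # true objective still dominates the shaping terms at episode end.
    sparse = capture_bonus if distance < capture_radius else 0.0
    return dense + sparse

if __name__ == "__main__":
    d, b = polar_observation((0.0, 0.0), (3.0, 4.0), pursuer_heading=0.0)
    prev_d = 5.5  # distance at the previous step
    print(ensemble_segmented_reward(d, b, prev_d))
```

The design choice sketched here is that the dense segment rewards incremental progress at every step while the sparse segment anchors the final objective, which is one common way to keep returns informative over long pursuit episodes.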