Exp3.P-based Autonomous Decision Algorithm against Non-stationary Opponents with Partially Known Policies
| Published in | IEEE Transactions on Games, pp. 1-18 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | IEEE, 2025 |
| Subjects | |
| ISSN | 2475-1502, 2475-1510 |
| DOI | 10.1109/TG.2025.3579719 |
| Summary: | This paper considers multi-agent games in which the opponents can change their policies and their policy sets are partially known. Our goal is to generate an effective policy such that our agent obtains a higher reward while guaranteeing bounded regret. For such games against non-stationary opponents with partially known policies, the Exp3.P-based Autonomous Decision (EAD) algorithm is put forward, which consists of three steps. First, we learn an embedding of the opponent policy via a conditional encoder-decoder and employ conditional RL to generate the targeted policy. Second, we estimate the opponent policy through online Bayesian belief updates. Finally, we select between the adversarial policy and the targeted policy via a multi-armed bandit algorithm. Theoretical analysis of the EAD algorithm is provided: we give a lower bound on the expected reward when using the targeted policy and prove that the EAD algorithm has bounded regret. Experimental results on Kuhn poker, Grid-world Predator-Prey, and Grid world show the effectiveness of the proposed EAD algorithm. |
|---|---|
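The summary's second and third steps can be sketched concretely. Below is a minimal, illustrative Python sketch of an online Bayesian belief update over a set of known opponent policies, and of Exp3.P-style bandit selection among candidate policies. All function names, parameter values, and the reward model are assumptions for illustration, not the paper's implementation.

```python
import math
import random

def bayes_update(belief, likelihoods):
    """One online Bayesian belief update over known opponent policies.

    belief[k]      -- prior probability that the opponent uses policy k
    likelihoods[k] -- probability of the observed opponent action under policy k
    """
    posterior = [b * l for b, l in zip(belief, likelihoods)]
    z = sum(posterior)
    # If every known policy assigns zero probability, keep the prior.
    return [p / z for p in posterior] if z > 0 else belief

def exp3p(reward_fns, T, gamma=0.1, alpha=2.0, seed=0):
    """Exp3.P-style selection among K candidate policies (arms).

    reward_fns[k](rng) returns an observed reward in [0, 1].
    Returns the number of times each arm was pulled over T rounds.
    """
    rng = random.Random(seed)
    K = len(reward_fns)
    w = [1.0] * K            # simplified uniform initialization
    pulls = [0] * K
    for _ in range(T):
        total = sum(w)
        # Mix the exponential weights with uniform exploration.
        p = [(1 - gamma) * wi / total + gamma / K for wi in w]
        i = rng.choices(range(K), weights=p)[0]
        pulls[i] += 1
        x = reward_fns[i](rng)
        for j in range(K):
            # Importance-weighted reward estimate plus the Exp3.P
            # high-probability bonus term alpha / (p_j * sqrt(K * T)).
            xhat = (x / p[j] if j == i else 0.0) + alpha / (p[j] * math.sqrt(K * T))
            w[j] *= math.exp(gamma / (3 * K) * xhat)
        m = max(w)            # rescale weights to avoid overflow
        w = [wi / m for wi in w]
    return pulls
```

In the paper's setting, the bandit arms would correspond to the adversarial policy and the targeted policy generated by conditional RL; here the reward functions are generic stand-ins.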