Enhanced LSTM‐DQN algorithm for a two‐player zero‐sum game in three‐dimensional space


Bibliographic Details
Published in: IET Control Theory & Applications, Vol. 18, no. 18, pp. 2798-2812
Main Authors: Lu, Bo; Ru, Le; Lv, Maolong; Hu, Shiguang; Zhang, Hongguo; Zhao, Zilong
Format: Journal Article
Language: English
Published: 01.12.2024
ISSN: 1751-8644, 1751-8652
DOI: 10.1049/cth2.12677

Summary: To tackle the challenges presented by the two‐player zero‐sum game (TZSG) in three‐dimensional space, this study introduces an enhanced deep Q‐network (DQN) algorithm that utilizes a long short‐term memory (LSTM) network. The primary objective of this algorithm is to better exploit the temporal correlation of the TZSG in three‐dimensional space. Additionally, it incorporates the hindsight experience replay (HER) mechanism to improve the learning efficiency of the network and mitigate the “sparse reward” issue that arises from prolonged training of agents in solving the TZSG in three‐dimensional space. Furthermore, this method enhances the convergence and stability of the overall solution. A training environment centred around an airborne agent and its mutual pursuit interaction scenario was designed to verify the proposed approach's effectiveness. The algorithm training and comparison results show that the LSTM‐DQN‐HER algorithm outperforms similar algorithms in solving the TZSG in three‐dimensional space. In conclusion, this paper presents an improved DQN algorithm that is based on LSTM and incorporates the HER mechanism to address the challenges posed by the TZSG in three‐dimensional space. The proposed algorithm enhances the solution's temporal correlation, learning efficiency, convergence, and stability. The simulation results confirm its superior performance in solving the TZSG in three‐dimensional space. The LSTM‐DQN‐HER algorithm is developed by modelling the two‐player zero‐sum game problem in three‐dimensional space as an MDP and a POMDP, and its effectiveness is verified through training and adversarial simulation of the agent.
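To illustrate how hindsight experience replay can ease the sparse-reward problem mentioned in the summary, the following is a minimal Python sketch of goal relabelling. It is not the authors' implementation: the discretised 3-D positions, the 0/-1 reward convention, the "final state as hindsight goal" strategy, and all names (`Transition`, `sparse_reward`, `her_relabel`) are assumptions made for illustration only.

```python
from typing import List, NamedTuple, Tuple

Position = Tuple[int, int, int]  # hypothetical discretised 3-D cell


class Transition(NamedTuple):
    state: Position
    action: int
    next_state: Position
    goal: Position
    reward: float


def sparse_reward(achieved: Position, goal: Position) -> float:
    """Assumed sparse reward: 0 when the goal cell is reached, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0


def her_relabel(episode: List[Transition]) -> List[Transition]:
    """Copy the episode, replacing the goal with the final achieved state.

    This is the 'final' relabelling strategy: the state the agent actually
    ended in is treated as if it had been the goal all along, so at least
    the last relabelled transition carries a non-penalty reward.
    """
    if not episode:
        return []
    hindsight_goal = episode[-1].next_state
    return [
        t._replace(goal=hindsight_goal,
                   reward=sparse_reward(t.next_state, hindsight_goal))
        for t in episode
    ]


# A failed three-step episode: the agent never reaches the intended goal,
# so every original transition receives the penalty reward.
goal = (5, 5, 5)
path = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1)]
episode = [
    Transition(s, 0, s_next, goal, sparse_reward(s_next, goal))
    for s, s_next in zip(path, path[1:])
]

# The replay buffer stores both the original and the relabelled transitions.
replay_buffer = episode + her_relabel(episode)
```

In this sketch the original episode contributes only penalty rewards, while the relabelled copy ends with a successful transition, which gives a Q-learning update something informative to learn from even when the real goal was missed.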