An Off-COMA Algorithm for Multi-UCAV Intelligent Combat Decision-Making

Bibliographic Details
Published in: 2022 4th International Conference on Data-driven Optimization of Complex Systems (DOCS), pp. 1 - 6
Main Authors: Shi, Zhengkang; Wang, Jingcheng; Wang, Hongyuan
Format: Conference Proceeding
Language: English
Published: IEEE, 28.10.2022
DOI: 10.1109/DOCS55193.2022.9967776

Summary: The Unmanned Combat Aerial Vehicle (UCAV) plays an important role in modern military warfare, and its level of intelligent decision-making urgently needs to be improved. In this paper, a simplified multi-UCAV combat environment is established and modeled as a multi-agent Markov game. The multi-UCAV combat problem involves many difficulties, including strong randomness and complexity, sparse rewards, and the lack of strong opponents for training. To address these problems, an algorithm called Off Counterfactual Multi-Agent (Off-COMA) is proposed. This algorithm extends the COMA algorithm to an off-policy version that can reuse historical data for training, which improves data utilization. In addition, the proposed Off-COMA algorithm exploits an improved prioritized experience replay method to deal with sparse rewards. The paper also presents an asymmetric policy replay self-play method, which provides a guarantee that the algorithm can generate a powerful policy. Finally, comparisons with several classical multi-agent reinforcement learning algorithms verify the superiority of the Off-COMA algorithm in solving the multi-UCAV combat problem.
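The record above only summarizes the method at a high level, so the following Python snippet is a minimal, hypothetical sketch of two ingredients named in the abstract: a COMA-style counterfactual advantage and a prioritized replay buffer for off-policy data reuse. All names here (e.g. `counterfactual_advantage`, `PrioritizedReplayBuffer`) are illustrative assumptions, not the authors' implementation, and the sketch follows the standard COMA and proportional prioritized-replay formulations rather than the paper's specific "improved" variants.

```python
# Illustrative sketch only: COMA-style counterfactual advantage plus a
# proportional prioritized replay buffer, assumed from the abstract's
# description of Off-COMA. Not the authors' code.
import numpy as np


def counterfactual_advantage(q_values, policy_probs, action):
    """COMA-style advantage: Q(s, a) minus a counterfactual baseline that
    marginalizes agent i's action while the other agents' actions stay fixed.

    q_values:     Q(s, (a', a_-i)) for every candidate action a' of agent i
    policy_probs: agent i's current policy distribution over its actions
    action:       index of the action agent i actually took
    """
    baseline = float(np.dot(policy_probs, q_values))  # E_{a'~pi_i}[Q(s, (a', a_-i))]
    return q_values[action] - baseline


class PrioritizedReplayBuffer:
    """Minimal proportional prioritized replay (Schaul et al. style).

    Transitions with larger priority (e.g. TD error, or a bonus for the rare
    rewarded trajectories in a sparse-reward setting) are sampled more often;
    this is one plausible reading of the "improved prioritized experience
    replay" mentioned in the abstract.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.data = []
        self.priorities = []

    def add(self, transition, priority=1.0):
        if len(self.data) >= self.capacity:   # drop the oldest entry when full
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(priority ** self.alpha)

    def sample(self, batch_size, beta=0.4):
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights = weights / weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, new_priorities):
        for i, p in zip(idx, new_priorities):
            self.priorities[i] = float(p) ** self.alpha
```

In an actual off-policy training loop, transitions generated by older behavior policies would additionally need some off-policy correction (e.g. an importance-sampling or truncated-ratio term) before the counterfactual advantage enters the policy-gradient update; the exact correction used by Off-COMA, like its asymmetric policy replay self-play scheme, is not detailed in this record.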