Deep Reinforcement Learning Based Routing for Non-Cooperative Multi-Flow Games in Dynamic AANETs

Bibliographic Details
Published in: IEEE Transactions on Vehicular Technology, Vol. 73, no. 12, pp. 19495-19510
Main Authors: He, Huasen; Sun, Kaixuan; Chen, Shuangwu; Jiang, Xiaofeng; Zhu, Rangang; Yang, Jian
Format: Journal Article
Language: English
Published: New York, The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2024
ISSN: 0018-9545, 1939-9359
DOI: 10.1109/TVT.2024.3440949

Summary: Aeronautical Ad hoc Networks (AANETs) have been identified as pivotal constituents of the Next Generation Wireless Communication Network (NGWCN) because they can provide global coverage and low-latency network services. However, in contrast to terrestrial networks, large-scale AANETs are highly dynamic, which makes it considerably harder to synchronize the global network state needed for computing routing paths. To suppress synchronization overhead, we make routing decisions based on a partially observable network state. Specifically, we formulate the multi-flow routing problem as a non-cooperative Multi-Player Partially Observable Markov Decision Process (MP-POMDP) game, in which each flow acts as a player that aims to maximize its own transmission bandwidth while deliberately avoiding conflicts with bandwidth already occupied by other flows. To tackle the high-dimensional state space of the proposed MP-POMDP game, we employ Deep Reinforcement Learning (DRL) to develop a novel Distributed Game based Multi-flow Routing (DGMR) algorithm that uses a parallel multi-agent scheme. In DGMR, each flow is equipped with an agent for route selection; the agent moves along the routing path and uses the most recently observed states to make the next-hop routing decision. Moreover, to provide fixed-size inputs for the neural networks, a Pareto-based Optimal Neighbor Selection (PONS) algorithm, built on Pareto optimality theory, is proposed to select a fixed number of neighbors from the variable-size neighbor sets of aircraft. The selected neighbors are close to the destination and have sufficient available bandwidth, which guarantees high-quality routing decisions. The experimental results show that DGMR is highly scalable and achieves up to ten times the bandwidth utilization of the benchmark algorithms.
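The summary does not spell out how PONS works internally, so the following is only a minimal sketch of the general idea of Pareto-based neighbor filtering, not the authors' algorithm. It assumes two objectives taken from the summary (distance to the destination, smaller is better; available bandwidth, larger is better). The names `Neighbor`, `dominates`, and `select_neighbors`, the front-peeling strategy, the tie-break rule, and the `None` padding are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Neighbor:
    # Hypothetical record for one candidate next hop.
    node_id: str
    dist_to_dest_km: float  # smaller is better
    avail_bw_mbps: float    # larger is better


def dominates(a: Neighbor, b: Neighbor) -> bool:
    """True if a Pareto-dominates b: no worse on both objectives, strictly better on one."""
    no_worse = (a.dist_to_dest_km <= b.dist_to_dest_km
                and a.avail_bw_mbps >= b.avail_bw_mbps)
    strictly_better = (a.dist_to_dest_km < b.dist_to_dest_km
                       or a.avail_bw_mbps > b.avail_bw_mbps)
    return no_worse and strictly_better


def select_neighbors(neighbors: List[Neighbor], k: int) -> List[Optional[Neighbor]]:
    """Return exactly k slots by peeling successive Pareto fronts (assumed strategy)."""
    remaining = list(neighbors)
    selected: List[Optional[Neighbor]] = []
    while remaining and len(selected) < k:
        # Current front: candidates not dominated by any other remaining candidate.
        front = [n for n in remaining
                 if not any(dominates(m, n) for m in remaining)]
        # Assumed tie-break inside a front: closer to the destination first.
        front.sort(key=lambda n: (n.dist_to_dest_km, -n.avail_bw_mbps))
        selected.extend(front[: k - len(selected)])
        remaining = [n for n in remaining if n not in front]
    # Pad with None so the output always has k entries (fixed-size input).
    selected.extend([None] * (k - len(selected)))
    return selected


candidates = [
    Neighbor("A", 100.0, 50.0),
    Neighbor("B", 200.0, 80.0),
    Neighbor("C", 150.0, 40.0),  # dominated by A
    Neighbor("D", 300.0, 20.0),  # dominated by A, B, and C
]
top3 = select_neighbors(candidates, 3)
print([n.node_id for n in top3])  # prints ['A', 'B', 'C']
```

Padding to a fixed length is one simple way to keep the neural network input size constant when an aircraft has fewer than `k` neighbors; how the actual algorithm handles that case is not stated in the summary.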