Deep Reinforcement Learning Based Routing for Non-Cooperative Multi-Flow Games in Dynamic AANETs

Bibliographic Details
Published in	IEEE Transactions on Vehicular Technology, Vol. 73, no. 12, pp. 19495-19510
Main Authors	He, Huasen; Sun, Kaixuan; Chen, Shuangwu; Jiang, Xiaofeng; Zhu, Rangang; Yang, Jian
Format	Journal Article
Language	English
Published	New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.12.2024
Subjects	Ad hoc networks; Aeronautical ad hoc network (AANET); Aircraft; Algorithms; Bandwidth; Bandwidths; Channel allocation; Decisions; Deep learning; Game theory; Games; Heuristic algorithms; Machine learning; Markov processes; Multiagent systems; Network latency; Neural networks; non-cooperative game; Optimization; Pareto optimum; partially observable Markov decision process (POMDP); Routing; Routing (telecommunications); routing algorithm; Synchronism; Vehicle dynamics; Wireless communications; Wireless networks
ISSN	0018-9545 (print); 1939-9359 (electronic)
DOI	10.1109/TVT.2024.3440949

Summary:	Aeronautical Ad hoc Networks (AANETs) have been identified as pivotal constituents of the Next Generation Wireless Communication Network (NGWCN), owing to their ability to provide global coverage and low-latency network services. However, unlike terrestrial networks, large-scale AANETs are highly dynamic, which makes synchronizing the global network state needed to compute routing paths considerably challenging. To suppress this synchronization overhead, we make routing decisions based on a partially observable network state. Specifically, we formulate the multi-flow routing problem as a non-cooperative Multi-Player Partially Observable Markov Decision Process (MP-POMDP) game, in which each flow acts as a player that aims to maximize its own transmission bandwidth while avoiding conflicts with bandwidth already occupied by other flows. To tackle the high-dimensional state space of the proposed MP-POMDP game, we employ Deep Reinforcement Learning (DRL) to develop a novel Distributed Game based Multi-flow Routing (DGMR) algorithm that uses a parallel multi-agent scheme. In DGMR, each flow is equipped with a routing agent that moves along the routing path and uses the most recently observed states to make the next-hop routing decision. Moreover, to provide fixed-size inputs for the neural networks, a Pareto-based Optimal Neighbor Selection (PONS) algorithm, grounded in Pareto optimality theory, is proposed to select a fixed number of neighbors from each aircraft's variable-size neighbor set. The selected neighbors are close to the destination and have sufficient available bandwidth, which ensures high-quality routing decisions. The experimental results show that DGMR is highly scalable and achieves up to ten times the bandwidth utilization of the benchmark algorithms.
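The idea behind the PONS step (reducing a variable-size neighbor set to a fixed number of Pareto-preferred candidates) can be illustrated with a minimal Python sketch. It is not the authors' PONS algorithm, whose details are not given in this record. It assumes two criteria per neighbor: distance to the destination, which is minimized, and available bandwidth, which is maximized. The sketch peels off successive Pareto fronts until k neighbors are collected and breaks ties inside a front by distance. It then pads short lists with zero rows plus a validity flag so that a neural network always receives a k×3 input. The `Neighbor` class, its field names and units, `dominates`, `pareto_layers`, `select_neighbors`, `to_fixed_input`, the tie-break rule, and the padding scheme are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Neighbor:
    """Hypothetical per-neighbor observation (field names and units are assumptions)."""
    node_id: str
    dist_to_dest: float  # distance from this neighbor to the destination, km (lower is better)
    avail_bw: float      # available bandwidth on the link to this neighbor, Mbps (higher is better)


def dominates(a: Neighbor, b: Neighbor) -> bool:
    """True if a is no worse than b on both criteria and strictly better on at least one."""
    no_worse = a.dist_to_dest <= b.dist_to_dest and a.avail_bw >= b.avail_bw
    strictly_better = a.dist_to_dest < b.dist_to_dest or a.avail_bw > b.avail_bw
    return no_worse and strictly_better


def pareto_layers(neighbors: List[Neighbor]) -> List[List[Neighbor]]:
    """Split neighbors into successive non-dominated fronts; front 0 is the Pareto-optimal set."""
    remaining = list(neighbors)
    layers: List[List[Neighbor]] = []
    while remaining:
        front = [n for n in remaining
                 if not any(dominates(m, n) for m in remaining if m is not n)]
        layers.append(front)
        front_ids = {id(n) for n in front}  # remove by identity, not field equality
        remaining = [n for n in remaining if id(n) not in front_ids]
    return layers


def select_neighbors(neighbors: List[Neighbor], k: int) -> List[Neighbor]:
    """Fill up to k slots front by front, starting with the Pareto-optimal front."""
    selected: List[Neighbor] = []
    for front in pareto_layers(neighbors):
        # Hypothetical tie-break inside a front: closer to destination first, then more bandwidth.
        ranked = sorted(front, key=lambda n: (n.dist_to_dest, -n.avail_bw))
        selected.extend(ranked[: k - len(selected)])
        if len(selected) >= k:
            break
    return selected


def to_fixed_input(selected: List[Neighbor], k: int) -> List[List[float]]:
    """Return exactly k rows of [dist_to_dest, avail_bw, valid]; empty slots are zero-padded."""
    rows = [[n.dist_to_dest, n.avail_bw, 1.0] for n in selected[:k]]
    rows.extend([0.0, 0.0, 0.0] for _ in range(k - len(rows)))
    return rows


if __name__ == "__main__":
    sample = [
        Neighbor("A", 120.0, 40.0),
        Neighbor("B", 80.0, 10.0),
        Neighbor("C", 150.0, 60.0),
        Neighbor("D", 130.0, 30.0),  # dominated by A
        Neighbor("E", 200.0, 5.0),   # dominated by A, B, C, and D
    ]
    picked = select_neighbors(sample, 4)
    print([n.node_id for n in picked])  # ['B', 'A', 'C', 'D']
    print(to_fixed_input(picked, 6))
```

In this sketch, peeling successive fronts, rather than keeping only the first one, yields exactly k candidates whenever the aircraft has at least k neighbors. The validity flag lets the network tell a padded slot apart from a real neighbor. The quadratic dominance check is assumed to be acceptable because a single aircraft's neighbor set is small.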