Deep Reinforcement Learning for UAV-Assisted Spectrum Sharing Under Partial Observability

Bibliographic Details
Published in	IEEE Vehicular Technology Conference pp. 1 - 6
Main Authors	Zhang, Sigen, Wang, Zhe, Gao, Guanyu, Li, Jun, Zhang, Jie, Yin, Ziyan
Format	Conference Proceeding
Language	English
Published	IEEE 10.10.2023
Subjects	Autonomous aerial vehicles; Correlation; Deep reinforcement learning; Dynamic spectrum sharing; Energy efficiency; Markov processes; Partially observable Markov decision process; Reinforcement learning; Simulation; Trajectory; Unmanned aerial vehicle
ISSN	2577-2465
DOI	10.1109/VTC2023-Fall60731.2023.10333853

Summary:	This paper proposes a dynamic spectrum sharing scheme for an unmanned aerial vehicle (UAV)-assisted cognitive radio network. The UAV serves as a secondary base station that provides communication services to multiple secondary users (SUs) by adaptively exploiting the spatio-temporal spectrum opportunities of multiple device-to-device primary users (PUs), where each PU's spectrum occupancy follows a two-state Markov process. We jointly optimize the UAV's trajectory and user association to maximize its expected cumulative energy efficiency subject to the PUs' interference constraint. We formulate this problem as a partially observable Markov decision process (POMDP) in which the UAV can observe only the spectrum occupancy status of adjacent PUs. Because the PUs' spectrum occupancy statistics are unknown, we propose a model-free reinforcement learning algorithm, the partially observable double deep Q-network (PO-DDQN), to obtain a near-optimal spectrum sharing policy. Simulation results show that the proposed algorithm outperforms the baseline policy gradient (PG) algorithm in both convergence speed and the UAV's energy efficiency. Additionally, spectrum utilization efficiency improves further when the UAV has a wider observation radius or when the PUs' spectrum occupancy exhibits stronger temporal correlation.
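The summary names three technical ingredients: a two-state Markov model of each PU's channel occupancy, a partial observation limited to nearby PUs, and a double-DQN learner. The Python sketch below illustrates these generic building blocks under stated assumptions; it is not the authors' PO-DDQN. All function names, the transition probabilities p_bb (busy to busy) and p_ii (idle to idle), the circular 2-D observation radius, and the belief update for unobserved PUs are hypothetical choices made for illustration, since the summary does not describe how PO-DDQN represents PUs outside the observation radius. The target rule shown is the standard Double DQN one (the online network selects the next action, the target network evaluates it).

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def step_occupancy(busy: bool, p_bb: float, p_ii: float, rng: random.Random) -> bool:
    """Advance one PU's two-state Markov chain by one slot (hypothetical params).

    p_bb = P(busy -> busy), p_ii = P(idle -> idle). Returns the next state.
    """
    if busy:
        return rng.random() < p_bb
    return rng.random() >= p_ii  # idle -> busy with probability 1 - p_ii


def stationary_busy(p_bb: float, p_ii: float) -> float:
    """Long-run fraction of slots in which the PU occupies its channel."""
    return (1.0 - p_ii) / ((1.0 - p_bb) + (1.0 - p_ii))


def correlation(p_bb: float, p_ii: float) -> float:
    """Second eigenvalue of the 2x2 transition matrix; values near 1 mean
    strong positive temporal correlation (states persist for many slots)."""
    return p_bb + p_ii - 1.0


def belief_busy(last_seen_busy: bool, slots_ago: int, p_bb: float, p_ii: float) -> float:
    """Assumed belief update: P(PU busy now | state seen `slots_ago` slots ago).

    For a two-state chain this is pi + (1{busy} - pi) * lambda**k.
    """
    pi = stationary_busy(p_bb, p_ii)
    lam = correlation(p_bb, p_ii)
    return pi + ((1.0 if last_seen_busy else 0.0) - pi) * lam ** slots_ago


def observe(uav_pos: Tuple[float, float],
            pu_positions: Sequence[Tuple[float, float]],
            occupancy: Sequence[bool],
            radius: float) -> List[Optional[bool]]:
    """Partial observation: occupancy of PUs within `radius`, None otherwise."""
    return [busy if math.dist(uav_pos, pos) <= radius else None
            for pos, busy in zip(pu_positions, occupancy)]


def double_dqn_target(reward: float, q_online_next: Sequence[float],
                      q_target_next: Sequence[float], gamma: float,
                      done: bool) -> float:
    """Double DQN target: online net picks argmax action, target net scores it."""
    if done:
        return reward
    a_star = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[a_star]


if __name__ == "__main__":
    rng = random.Random(0)
    p_bb, p_ii = 0.9, 0.8  # hypothetical transition probabilities
    busy, busy_slots, n = False, 0, 100_000
    for _ in range(n):
        busy = step_occupancy(busy, p_bb, p_ii, rng)
        busy_slots += busy
    print(f"empirical busy fraction {busy_slots / n:.3f}, "
          f"stationary {stationary_busy(p_bb, p_ii):.3f}")
    print(observe((0.0, 0.0), [(1.0, 0.0), (5.0, 0.0)], [True, False], radius=2.0))
    print(double_dqn_target(1.0, [0.1, 0.5, 0.2], [3.0, 2.0, 4.0], 0.9, done=False))
```

The belief function suggests one plausible intuition for the summary's last finding: when the correlation lambda is closer to 1, what the UAV last saw about a PU stays informative for more slots, and a wider radius simply means fewer PUs need to be guessed at all. This reading is illustrative, not a claim about how PO-DDQN works internally.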