Deep Reinforcement Learning for UAV-Assisted Spectrum Sharing Under Partial Observability

Bibliographic Details
Published in: IEEE Vehicular Technology Conference, pp. 1-6
Main Authors: Zhang, Sigen; Wang, Zhe; Gao, Guanyu; Li, Jun; Zhang, Jie; Yin, Ziyan
Format: Conference Proceeding
Language: English
Published: IEEE, 10.10.2023
ISSN: 2577-2465
DOI: 10.1109/VTC2023-Fall60731.2023.10333853

More Information
Summary: This paper proposes a dynamic spectrum sharing scheme for an unmanned aerial vehicle (UAV) assisted cognitive radio network. The UAV acts as a secondary base station that serves multiple secondary users (SUs) by adaptively exploiting the spatio-temporal spectrum opportunities of multiple device-to-device (D2D) primary users (PUs), where each PU's spectrum occupancy follows a two-state Markov process. We jointly optimize the UAV's trajectory and user association to maximize the UAV's expected cumulative energy efficiency, subject to the interference constraints of the PUs. We formulate this problem as a partially observable Markov decision process (POMDP) in which the UAV can observe the spectrum occupancy status of only the adjacent PUs. Because the PUs' spectrum occupancy statistics are unknown, we propose a model-free reinforcement learning algorithm, the partially observable double deep Q network (PO-DDQN), to obtain a near-optimal spectrum sharing policy. Simulation results show that the proposed algorithm outperforms the baseline policy gradient (PG) algorithm in both convergence speed and the UAV's energy efficiency. In addition, spectrum utilization efficiency can be further improved when the UAV has a wider observation radius or when the PUs' spectrum occupancy exhibits stronger temporal correlation.
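The summary combines three technical pieces: a two-state Markov occupancy model per PU, a partial observation limited to nearby PUs, and a double DQN learner. The following is a minimal Python sketch of those pieces, assuming hypothetical transition probabilities p01 (idle to busy) and p10 (busy to idle), a simple Euclidean observation radius, and illustrative function names that do not come from the paper. It is not the authors' PO-DDQN: there is no neural network, replay buffer, or trajectory/user-association action space. The probabilities live only on the simulator side, since the learner is model-free and never sees them. For a stationary two-state chain, the lag-k autocorrelation of the occupancy is (1 - p01 - p10)^k, which is one way to quantify the "temporal correlation" mentioned above.

```python
import math
import random

# Hypothetical sketch: names and parameter values are illustrative, not from the paper.
IDLE, BUSY, UNKNOWN = 0, 1, -1


def step_occupancy(state, p01, p10, rng):
    """Advance one PU's two-state Markov chain by one slot.
    p01 = P(idle -> busy), p10 = P(busy -> idle) (assumed, simulator-side only)."""
    if state == IDLE:
        return BUSY if rng.random() < p01 else IDLE
    return IDLE if rng.random() < p10 else BUSY


def stationary_busy_prob(p01, p10):
    """Long-run fraction of slots in which the PU is busy."""
    return p01 / (p01 + p10)


def lag1_correlation(p01, p10):
    """Second eigenvalue of the transition matrix; closer to 1 = stronger temporal correlation."""
    return 1.0 - p01 - p10


def partial_observation(uav_xy, pu_xys, states, radius):
    """UAV sees occupancy only for PUs within `radius`; the rest are masked as UNKNOWN."""
    obs = []
    for (x, y), s in zip(pu_xys, states):
        d = math.hypot(x - uav_xy[0], y - uav_xy[1])
        obs.append(s if d <= radius else UNKNOWN)
    return obs


def double_dqn_target(reward, q_online_next, q_target_next, gamma, done):
    """Double DQN target: the online net selects the next action, the target net evaluates it."""
    if done:
        return reward
    a_star = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[a_star]


if __name__ == "__main__":
    rng = random.Random(0)
    p01, p10 = 0.1, 0.2  # assumed values for illustration
    pus = [(0.0, 0.0), (50.0, 0.0), (200.0, 0.0)]  # hypothetical PU positions (m)
    states = [IDLE, BUSY, IDLE]
    for _ in range(5):
        states = [step_occupancy(s, p01, p10, rng) for s in states]
    print(partial_observation((0.0, 0.0), pus, states, radius=100.0))
    print(stationary_busy_prob(p01, p10), lag1_correlation(p01, p10))
```

Decoupling action selection (online network) from action evaluation (target network) is what distinguishes double DQN from plain DQN and reduces overestimation of Q-values. Widening `radius` turns more UNKNOWN entries into observed states, which mirrors the summary's observation-radius effect.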