Attention Enhanced Multi-Agent Reinforcement Learning for Cooperative Spectrum Sensing in Cognitive Radio Networks
| Published in | IEEE Transactions on Vehicular Technology, Vol. 73, no. 7, pp. 10464-10477 |
|---|---|
| Main Authors | , , , , , , |
| Format | Journal Article |
| Language | English |
| Published | New York: IEEE, 01.07.2024; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| ISSN | 0018-9545 1939-9359 |
| DOI | 10.1109/TVT.2024.3384393 |
| Summary: | Cooperative spectrum sensing (CSS) has been widely studied as a way to improve spatial and temporal spectrum-sharing efficiency in cognitive radio networks (CRNs), where secondary users (SUs) opportunistically reuse channels licensed to primary users (PUs) for transmission by sensing spectrum holes. By cooperating with each other, SUs gain global awareness of channel states without sweeping the whole frequency band. Because the channel occupancy of PUs changes dynamically, accurate sensing and swift information sharing are crucial for CRNs. The paper proposes a multi-agent deep reinforcement learning (DRL) based CSS method that helps SUs efficiently find a vacant channel by cooperating with their partners. 1) Two partner selection algorithms are proposed, named Reliable Partner CSS and Adaptive Partner CSS. The former selects partners based on the historical sensing accuracy of SUs, while the latter considers both the reliability and the geographical distribution of SUs to further improve sensing accuracy. 2) Multi-agent deep deterministic policy gradient (MADDPG) is adopted to cope with dynamically varying channel conditions and the high-dimensional solution space. With 'centralized training and decentralized execution', each SU learns to interact with the environment and to select a vacant channel for transmission from its partial observation, which greatly reduces the communication overhead of cooperative spectrum sensing. 3) Numerical simulations demonstrate the convergence and effectiveness of the proposed algorithms. With either Reliable Partner CSS or Adaptive Partner CSS, sensing accuracy is greatly improved compared with other non-cooperative or centralized learning approaches. In addition, an attention mechanism is introduced into MADDPG for Adaptive Partner CSS to reveal the behavior of SUs through visualization of the attention weights, which helps to partially address the 'black box' issue of DRL. |
|---|---|
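
The summary says that Reliable Partner CSS selects partners based on the historical sensing accuracy of SUs. The Python sketch below is one possible reading of that idea, not the paper's algorithm. The sliding-window size, the 0.5 prior for SUs with no history, the top-k rule, and the majority-vote fusion are all assumptions made for illustration, and every name (`ReliabilityTracker`, `select_partners`, `fuse_by_majority`) is hypothetical.

```python
from collections import deque


class ReliabilityTracker:
    """Tracks each SU's sensing accuracy over a sliding window (assumed design)."""

    def __init__(self, num_sus, window=50):
        # One bounded history of correct/incorrect outcomes per SU.
        self.history = [deque(maxlen=window) for _ in range(num_sus)]

    def record(self, su, sensed_busy, actually_busy):
        # Store True when the SU's report matched the real channel state.
        self.history[su].append(sensed_busy == actually_busy)

    def accuracy(self, su):
        h = self.history[su]
        # Assumption: an SU with no history gets a neutral prior of 0.5.
        return sum(h) / len(h) if h else 0.5


def select_partners(tracker, me, k):
    """Return the k other SUs with the highest historical accuracy."""
    candidates = [s for s in range(len(tracker.history)) if s != me]
    # Sort by descending accuracy; break ties by SU index for determinism.
    candidates.sort(key=lambda s: (-tracker.accuracy(s), s))
    return candidates[:k]


def fuse_by_majority(reports):
    """Declare the channel busy only if a strict majority of reports say busy."""
    return sum(reports) * 2 > len(reports)


# Toy history: SU 1 is always right, SU 2 is right half the time,
# SU 3 is always wrong, and SU 0 (the selecting SU) has no history.
tracker = ReliabilityTracker(num_sus=4)
for t in range(10):
    truth = (t % 2 == 0)
    tracker.record(1, truth, truth)
    tracker.record(2, truth if t < 5 else not truth, truth)
    tracker.record(3, not truth, truth)

partners = select_partners(tracker, me=0, k=2)
print(partners)
```

In this toy run SU 0 picks SUs 1 and 2 and skips the consistently wrong SU 3. Adaptive Partner CSS, as described in the summary, would also take the geographical distribution of SUs into account; the summary does not say how, so that part is left out of the sketch.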