Sum Throughput Maximization Scheme for NOMA-Enabled D2D Groups Using Deep Reinforcement Learning in 5G and Beyond Networks

Bibliographic Details
Published in: IEEE Sensors Journal, Vol. 23, No. 13, p. 1
Main Authors: Khan, Mohammad Aftab Alam; Kaidi, Hazilah Mad; Ahmad, Norulhusna; Rehman, Masood Ur
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.07.2023
ISSN: 1530-437X, 1558-1748
DOI: 10.1109/JSEN.2023.3276799

More Information
Summary: Device-to-Device (D2D) communication underlaying a cellular network is a promising approach for improving spectrum efficiency. However, in this setting, D2D transmissions generate cross-channel and co-channel interference to cellular and other D2D users, which makes spectrum allocation a demanding technical challenge. In addition, massive connectivity is another issue in 5G and beyond networks that needs to be addressed. To overcome this problem, non-orthogonal multiple access (NOMA) is integrated with the D2D groups (DGs). In this paper, our target is to maximize the sum throughput of the overall network while maintaining the signal-to-interference-plus-noise ratio (SINR) of the cellular and D2D users. To achieve this target, a spectrum distribution framework based on multi-agent deep reinforcement learning (MADRL), termed the deep deterministic policy gradient (DDPG), is proposed, in which agents share global historical states, actions, and policies during centralized training. Furthermore, the proximal online policy scheme (POPS) is used to decrease the computational complexity of training; it employs a clipped surrogate technique to simplify and reduce the complexity of the policy update at the training stage. The simulation results demonstrate that the proposed POPS scheme attains 16.67%, 24.98%, and 59.09% higher performance than DDPG, deep dueling, and deep Q-network (DQN), respectively.
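As a rough illustration of the optimization target stated in the summary, the sum-throughput problem can be sketched in the usual form (the notation below is assumed for illustration and is not taken from the paper):

\max \; \sum_{k \in \mathcal{C} \cup \mathcal{D}} \log_2\!\left(1 + \mathrm{SINR}_k\right) \quad \text{s.t.} \quad \mathrm{SINR}_c \ge \gamma_c \;\; \forall c \in \mathcal{C}, \qquad \mathrm{SINR}_d \ge \gamma_d \;\; \forall d \in \mathcal{D},

where \mathcal{C} and \mathcal{D} denote the cellular users and the NOMA-enabled D2D group receivers, and \gamma_c, \gamma_d are assumed minimum SINR thresholds. Similarly, the clipped surrogate technique attributed to POPS presumably follows the standard proximal-policy form

L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\left(r_t(\theta)\,\hat{A}_t,\; \mathrm{clip}\!\left(r_t(\theta),\, 1-\epsilon,\, 1+\epsilon\right)\hat{A}_t\right)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)},

in which the probability ratio r_t(\theta) is clipped to [1-\epsilon, 1+\epsilon] so that each policy update stays close to the previous policy, which is what limits the cost and instability of each training step.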