Collaborative Double DQN-DDPG Framework for User Clustering and Power Allocation in NOMA Systems
| Published in | 2025 4th International Joint Conference on Information and Communication Engineering (JCICE), pp. 1 - 5 |
|---|---|
| Main Authors | , , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 25.07.2025 |
| DOI | 10.1109/JCICE66205.2025.11182061 |
| Summary: | In non-orthogonal multiple access (NOMA) systems, collaborative decision-making is challenging because of the mixed discrete-continuous action space, while resource allocation is a key determinant of system performance. Existing studies often rely on single-agent deep reinforcement learning (DRL) algorithms that handle either discrete or continuous actions alone, limiting the potential for synergistic optimization across both domains. To address this, we propose a collaborative optimization framework that combines the double deep Q-network (Double DQN) and deep deterministic policy gradient (DDPG) algorithms. The joint optimization problem of resource allocation and decision-making is formulated as a partially observable Markov decision process (POMDP) subject to quality-of-service (QoS) constraints that safeguard user experience. The formulation accounts for imperfect channel state information (CSI) at the base station, a condition common in real-world deployments, as well as user-fairness constraints and maximum transmit-power limits. At the algorithmic level, a dual-agent collaborative architecture is constructed: Double DQN handles the discrete user-clustering decisions, with a prioritized experience replay (PER) mechanism to improve clustering effectiveness, while DDPG performs continuous power allocation, incorporating target-network clipping to keep the power constraints satisfied. The two agents achieve cross-modal optimization through a shared state. Simulation results show that, compared with both single-agent DRL algorithms and conventional baselines, the proposed joint method achieves a higher system sum rate. |
|---|---|
| DOI: | 10.1109/JCICE66205.2025.11182061 |
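The abstract only outlines the dual-agent idea. The sketch below is a minimal, illustrative decision step, not the authors' implementation: a Double-DQN-style Q-network picks a discrete clustering action and a DDPG-style actor maps the same shared state to continuous per-user powers. All dimensions (`N_USERS`, `N_CLUSTERINGS`, `STATE_DIM`), the power budget `P_MAX`, and the sum-normalization used to enforce the power limit are assumptions for the example; the paper itself uses target-network clipping for that constraint.

```python
# Illustrative sketch of a shared-state, dual-agent decision step (assumed setup,
# not the paper's code): Double DQN for discrete user clustering, DDPG actor for
# continuous power allocation.
import torch
import torch.nn as nn

N_USERS = 4               # users to cluster (assumed)
N_CLUSTERINGS = 3         # candidate clustering patterns (assumed)
STATE_DIM = 2 * N_USERS   # e.g. estimated channel gains + QoS indicators (assumed)
P_MAX = 1.0               # total transmit-power budget (assumed)

class QNetwork(nn.Module):
    """Q-network over clustering actions; Double DQN keeps an online and a target copy."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_CLUSTERINGS))
    def forward(self, s):
        return self.net(s)

class Actor(nn.Module):
    """DDPG actor: shared state + chosen clustering -> per-user transmit powers."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + N_CLUSTERINGS, 64), nn.ReLU(),
                                 nn.Linear(64, N_USERS), nn.Sigmoid())
    def forward(self, s, clustering_onehot):
        raw = self.net(torch.cat([s, clustering_onehot], dim=-1))
        # Simple sum-normalization so total power stays within P_MAX
        # (stands in for the paper's target-network clipping).
        return P_MAX * raw / raw.sum(dim=-1, keepdim=True).clamp(min=1e-8)

q_online, q_target = QNetwork(), QNetwork()   # target net would drive bootstrapped targets
actor = Actor()

state = torch.randn(1, STATE_DIM)             # shared (partially observed) state

# Discrete step: online Q-network selects the clustering pattern.
with torch.no_grad():
    clustering = q_online(state).argmax(dim=-1)
onehot = nn.functional.one_hot(clustering, N_CLUSTERINGS).float()

# Continuous step: DDPG actor allocates power given the shared state and the clustering.
with torch.no_grad():
    powers = actor(state, onehot)

print("clustering index:", clustering.item())
print("per-user powers:", powers.squeeze(0).tolist(), "sum:", powers.sum().item())
```

In a full training loop the Q-network would be updated from a prioritized experience replay buffer and the actor/critic from the DDPG objective, both reading the same shared state as described in the abstract.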