Collaborative Double DQN-DDPG Framework for User Clustering and Power Allocation in NOMA Systems
| Published in | 2025 4th International Joint Conference on Information and Communication Engineering (JCICE), pp. 1 - 5 |
|---|---|
| Main Authors | , , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 25.07.2025 |
| DOI | 10.1109/JCICE66205.2025.11182061 |
| Summary: | In non-orthogonal multiple access (NOMA) systems, collaborative decision-making is challenging because of the mixed discrete-continuous action space, while resource allocation is a key determinant of system performance. Existing studies often rely on single-agent deep reinforcement learning (DRL) algorithms that handle either discrete or continuous actions alone, limiting the potential for synergistic optimization across both domains. To address this, we propose a collaborative optimization framework that combines the double deep Q-network (Double DQN) and deep deterministic policy gradient (DDPG) algorithms. The joint optimization problem of resource allocation and decision-making is formulated as a partially observable Markov decision process (POMDP) subject to quality-of-service (QoS) constraints that safeguard user experience. The formulation accounts for imperfect channel state information (CSI) at the base station, a condition common in real-world deployments, as well as user-fairness constraints and maximum transmit-power limits. At the algorithmic level, a dual-agent collaborative architecture is constructed: Double DQN handles the discrete user-clustering decisions, with a prioritized experience replay (PER) mechanism to improve clustering effectiveness, while DDPG performs continuous power allocation, incorporating target-network clipping to keep the power constraints satisfied. The two agents achieve cross-modal optimization through a shared state. Simulation results show that, compared with both single-agent DRL algorithms and conventional baselines, the proposed joint method achieves a higher system sum rate. |
|---|---|
| DOI: | 10.1109/JCICE66205.2025.11182061 |
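The abstract only outlines the dual-agent idea. The sketch below is a minimal, illustrative decision step, not the authors' implementation: a Double-DQN-style Q-network picks a discrete clustering action and a DDPG-style actor maps the same shared state to continuous per-user powers. All dimensions (`N_USERS`, `N_CLUSTERINGS`, `STATE_DIM`), the power budget `P_MAX`, and the sum-normalization used to enforce the power limit are assumptions for the example; the paper itself uses target-network clipping for that constraint.

```python
# Illustrative sketch of a shared-state, dual-agent decision step (assumed setup,
# not the paper's code): Double DQN for discrete user clustering, DDPG actor for
# continuous power allocation.
import torch
import torch.nn as nn

N_USERS = 4               # users to cluster (assumed)
N_CLUSTERINGS = 3         # candidate clustering patterns (assumed)
STATE_DIM = 2 * N_USERS   # e.g. estimated channel gains + QoS indicators (assumed)
P_MAX = 1.0               # total transmit-power budget (assumed)

class QNetwork(nn.Module):
    """Q-network over clustering actions; Double DQN keeps an online and a target copy."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_CLUSTERINGS))
    def forward(self, s):
        return self.net(s)

class Actor(nn.Module):
    """DDPG actor: shared state + chosen clustering -> per-user transmit powers."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(STATE_DIM + N_CLUSTERINGS, 64), nn.ReLU(),
                                 nn.Linear(64, N_USERS), nn.Sigmoid())
    def forward(self, s, clustering_onehot):
        raw = self.net(torch.cat([s, clustering_onehot], dim=-1))
        # Simple sum-normalization so total power stays within P_MAX
        # (stands in for the paper's target-network clipping).
        return P_MAX * raw / raw.sum(dim=-1, keepdim=True).clamp(min=1e-8)

q_online, q_target = QNetwork(), QNetwork()   # target net would drive bootstrapped targets
actor = Actor()

state = torch.randn(1, STATE_DIM)             # shared (partially observed) state

# Discrete step: online Q-network selects the clustering pattern.
with torch.no_grad():
    clustering = q_online(state).argmax(dim=-1)
onehot = nn.functional.one_hot(clustering, N_CLUSTERINGS).float()

# Continuous step: DDPG actor allocates power given the shared state and the clustering.
with torch.no_grad():
    powers = actor(state, onehot)

print("clustering index:", clustering.item())
print("per-user powers:", powers.squeeze(0).tolist(), "sum:", powers.sum().item())
```

In a full training loop the Q-network would be updated from a prioritized experience replay buffer and the actor/critic from the DDPG objective, both reading the same shared state as described in the abstract.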