Double Deep Q-Network for Power Allocation in Cloud Radio Access Network
| Published in | 2020 IEEE 3rd International Conference on Computer and Communication Engineering Technology (CCET), pp. 272-277 |
|---|---|
| Main Authors | |
| Format | Conference Proceeding | 
| Language | English | 
| Published | IEEE, 01.08.2020 |
| Subjects | |
| DOI | 10.1109/CCET50901.2020.9213138 | 
| Summary: | Cloud radio access network (CRAN) facilitates resource allocation (RA) by isolating remote radio heads (RRHs) from baseband units (BBUs). Traditional RA algorithms save energy by dynamically turning RRHs on/off and allocating power in each time slot. However, when the energy switching cost is considered, the on/off decisions in adjacent time slots become correlated and cannot be solved directly. Fortunately, deep reinforcement learning (DRL) can effectively model such a problem, which motivates us to minimize the total power consumption subject to constraints on per-RRH transmit power and user rates. Our starting point is the deep Q-network (DQN), a combination of a neural network and Q-learning. In each time slot, DQN turns on/off the RRH yielding the largest Q-value (known as the action value) before solving a power minimization problem for the active RRHs. However, DQN suffers from a Q-value overestimation issue, which stems from using the same network to choose the best action and to compute the target Q-value of taking that action at the next state. To further increase the CRAN power savings, we propose a Double DQN-based framework that decouples action selection from target Q-value generation. Simulation results indicate that the Double DQN-based RA method outperforms the DQN-based RA algorithm in terms of total power consumption. |
|---|---|
| DOI: | 10.1109/CCET50901.2020.9213138 |
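
The decoupling described in the summary above can be made concrete. Below is a minimal NumPy sketch of the two target computations, assuming hypothetical linear Q-functions over a small discrete action set (e.g., which RRH to switch on/off); the per-RRH power-minimization subproblem and the CRAN state design from the paper are not modeled here.

```python
import numpy as np

def dqn_target(q_target, next_state, reward, gamma):
    # Standard DQN target: the same (target) network both selects
    # and evaluates the best next action, which biases the estimate upward.
    return reward + gamma * np.max(q_target(next_state))

def double_dqn_target(q_online, q_target, next_state, reward, gamma):
    # Double DQN target: the online network selects the action,
    # the target network evaluates it (selection and evaluation decoupled).
    best_action = int(np.argmax(q_online(next_state)))
    return reward + gamma * q_target(next_state)[best_action]

# Toy setup (assumed, not from the paper): 4 actions standing in for
# hypothetical RRH on/off choices, 8 state features.
rng = np.random.default_rng(0)
W_online = rng.normal(size=(4, 8))
W_target = rng.normal(size=(4, 8))
q_online = lambda s: W_online @ s
q_target = lambda s: W_target @ s

s_next = rng.normal(size=8)
reward, gamma = -1.0, 0.99  # e.g., negated power consumption as reward (assumption)
print("DQN target:       ", dqn_target(q_target, s_next, reward, gamma))
print("Double DQN target:", double_dqn_target(q_online, q_target, s_next, reward, gamma))
```

Because `np.max(q_target(s'))` evaluates the same network's own greedy choice, noise in the target network inflates the DQN estimate; the Double DQN form only inflates the target when both networks happen to err on the same action, which is the mechanism behind the power savings reported in the abstract.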