A Model-Based Reinforcement Learning Algorithm for Routing in Energy Harvesting Mobile Ad-Hoc Networks
| Published in | Wireless Personal Communications, Vol. 95, no. 3, pp. 3119-3139 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | New York: Springer US (Springer Nature B.V.), 01.08.2017 |
| ISSN | 0929-6212; 1572-834X |
| DOI | 10.1007/s11277-017-3987-8 |
| Summary: | Dynamic topology, the lack of a fixed infrastructure, and limited energy in mobile ad-hoc networks (MANETs) give rise to a challenging operational environment. MANET routing protocols should account for dynamic network changes (e.g., link qualities and nodes' residual energy) in such circumstances and be able to adapt to these changes to handle traffic flows efficiently. In this paper, we assume an energy-harvesting MANET in which the nodes have recharging capability, so their residual energy levels change randomly over time. We present a bi-objective intelligent routing protocol that aims at minimizing an expected long-run cost function composed of the end-to-end delay and the path energy cost. We formulate the routing problem as a Markov decision process that captures both the link-state dynamics due to node mobility and the energy-state dynamics due to the nodes' rechargeable energy sources. We propose a multi-agent reinforcement learning-based algorithm to approximate the optimal routing policy in the absence of a priori knowledge of the system statistics. The proposed algorithm is built on the principles of model-based RL. More specifically, we model each node's cost function by deriving an expression for the expected value of the end-to-end costs. The transition probabilities are estimated online using a tabular maximum-likelihood method. Simulation results show that our model-based scheme outperforms its model-free counterpart and performs close to standard value iteration, which assumes perfect statistics. |
|---|---|
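The abstract does not spell out the bi-objective long-run cost. One common form such an objective takes is sketched below; the trade-off weight β and discount factor γ are assumptions of this sketch, not the paper's notation:

```latex
% Illustrative only: per-hop cost c_t combines delay d_t and energy
% cost e_t via an assumed trade-off weight beta; the routing policy
% pi minimizes the expected discounted long-run cost J.
\[
  c_t = d_t + \beta\, e_t, \qquad
  J(\pi) = \mathbb{E}_\pi\!\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c_t\Big], \qquad
  \pi^{\ast} = \arg\min_{\pi} J(\pi).
\]
```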
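Likewise, as a rough sketch of the model-based ingredients the abstract names (online tabular maximum-likelihood estimation of transition probabilities, with value iteration on the estimated model), the Python fragment below uses placeholder state/action spaces, costs, and discount factor; none of the identifiers come from the paper:

```python
import numpy as np

# Minimal sketch of model-based tabular RL: transitions are estimated
# online by maximum likelihood (empirical counts), then a policy is
# obtained by value iteration on the estimated model.
# S, A, costs, and gamma are illustrative placeholders.

S, A = 6, 3          # states (e.g., link/energy levels), actions (next hops)
gamma = 0.95         # discount factor (assumed)

counts = np.ones((S, A, S))   # transition counts (smoothed uniform prior)
cost_sum = np.zeros((S, A))   # running sum of observed per-hop costs
cost_n = np.ones((S, A))      # visit counts for cost averaging

def update_model(s, a, c, s_next):
    """Tabular maximum-likelihood update from one observed transition."""
    counts[s, a, s_next] += 1
    cost_sum[s, a] += c
    cost_n[s, a] += 1

def value_iteration(tol=1e-6):
    """Value iteration on the current maximum-likelihood model."""
    P = counts / counts.sum(axis=2, keepdims=True)   # estimated P(s'|s,a)
    C = cost_sum / cost_n                            # estimated c(s,a)
    V = np.zeros(S)
    while True:
        Q = C + gamma * P @ V                        # Bellman backup, Q(s,a)
        V_new = Q.min(axis=1)                        # costs are minimized
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmin(axis=1), V_new           # greedy policy, values
        V = V_new
```

Feeding observed transitions through `update_model` and re-running `value_iteration` periodically mirrors a certainty-equivalence loop; running value iteration on the true transition matrix instead gives the perfect-statistics baseline the abstract compares against.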