UAV Path Planning for Wireless Data Harvesting: A Deep Reinforcement Learning Approach

Bibliographic Details
Published in	arXiv.org
Main Authors	Bayerlein, Harald; Theile, Mirco; Caccamo, Marco; Gesbert, David
Format	Paper; Journal Article
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 26.10.2020
Subjects	Communication networks; Computer architecture; Computer Science - Information Theory; Computer Science - Learning; Computer Science - Robotics; Data collection; Deep learning; Distributed sensor systems; Flight time; Internet of Things; Mathematics - Information Theory; Multilayers; Obstacle avoidance; Parameters; Statistics - Machine Learning; Trajectory planning; Unmanned aerial vehicles; Urban environments
ISSN	2331-8422
DOI	10.48550/arxiv.2007.00544

Summary:	Autonomous deployment of unmanned aerial vehicles (UAVs) supporting next-generation communication networks requires efficient trajectory planning methods. We propose a new end-to-end reinforcement learning (RL) approach to UAV-enabled data collection from Internet of Things (IoT) devices in an urban environment. An autonomous drone is tasked with gathering data from distributed sensor nodes subject to limited flying time and obstacle avoidance. Previous approaches, both learning-based and non-learning-based, must perform expensive recomputations or relearn a behavior when important scenario parameters change, such as the number of sensors, sensor positions, or maximum flying time. In contrast, we train a double deep Q-network (DDQN) with combined experience replay to learn a UAV control policy that generalizes over changing scenario parameters. By exploiting a multi-layer map of the environment fed through convolutional network layers to the agent, we show that our proposed network architecture enables the agent to make movement decisions for a variety of scenario parameters that balance the data collection goal with flight time efficiency and safety constraints. We also illustrate that a map centered on the UAV's position yields considerable advantages in learning efficiency over a non-centered map.
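The summary names two techniques that a short sketch may make more concrete: centering the multi-layer map on the UAV's position, and the double deep Q-network target. The code below is only an illustration under stated assumptions and is not taken from the paper. The function names `center_map` and `double_q_target`, the (height, width, layers) array layout, the padding value, and the discount factor are all hypothetical choices.

```python
import numpy as np


def center_map(env_map, uav_pos, pad_value=0.0):
    """Return a padded copy of env_map with the UAV cell at the center.

    Assumes env_map has shape (height, width, layers) and uav_pos is a
    (row, col) index into it. The output has shape
    (2 * height - 1, 2 * width - 1, layers), so every possible UAV
    position fits. The padding value is an assumption of this sketch.
    """
    h, w, c = env_map.shape
    row, col = uav_pos
    centered = np.full((2 * h - 1, 2 * w - 1, c), pad_value, dtype=env_map.dtype)
    top = (h - 1) - row    # shift that moves the UAV row to index h - 1
    left = (w - 1) - col   # shift that moves the UAV column to index w - 1
    centered[top:top + h, left:left + w, :] = env_map
    return centered


def double_q_target(reward, done, q_online_next, q_target_next, gamma=0.95):
    """Standard double DQN target for a batch of transitions.

    The online network selects the next action and the target network
    evaluates it. q_online_next and q_target_next have shape
    (batch, num_actions); reward and done have shape (batch,).
    """
    best_action = np.argmax(q_online_next, axis=-1)
    next_value = np.take_along_axis(
        q_target_next, best_action[..., None], axis=-1
    )[..., 0]
    return reward + gamma * (1.0 - done) * next_value


# Small usage example with made-up numbers.
example_map = np.zeros((3, 4, 2))
example_map[1, 2, 0] = 1.0  # mark the UAV cell in layer 0
centered = center_map(example_map, (1, 2))

targets = double_q_target(
    reward=np.array([1.0, 0.0]),
    done=np.array([0.0, 1.0]),
    q_online_next=np.array([[0.1, 0.9], [0.5, 0.2]]),
    q_target_next=np.array([[2.0, 4.0], [3.0, 1.0]]),
    gamma=0.5,
)
```

In this sketch the centered map always places the UAV at index (height - 1, width - 1), so the convolutional layers would see the surroundings relative to the agent rather than in absolute coordinates. Whether the paper pads, crops, or scales its map in the same way is not stated in this record.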