MACSQ: Massively Accelerated DeepQ Learning on GPUs Using On-the-fly State Construction

Bibliographic Details
Published in: Parallel and Distributed Computing, Applications and Technologies, Vol. 13148, pp. 383-395
Main Authors: Köster, Marcel; Groß, Julian; Krüger, Antonio
Format: Book Chapter
Language: English
Published: Springer International Publishing AG, Switzerland, 2022
Series: Lecture Notes in Computer Science
ISBN: 9783030967710; 3030967719
ISSN: 0302-9743; 1611-3349
DOI: 10.1007/978-3-030-96772-7_35

More Information
Summary: The current trend of using artificial neural networks to solve computationally intensive problems is omnipresent. In this setting, DeepQ learning is a common choice for agent-based problems. DeepQ combines the concept of Q-Learning with (deep) neural networks to learn different Q-values/matrices based on environmental conditions. Unfortunately, DeepQ learning requires hundreds of thousands of iterations/Q-samples that must be generated and learned for large-scale problems. Gathering data sets for such challenging tasks is extremely time-consuming and requires large data-storage containers. Consequently, a common solution is the automatic generation of input samples for agent-based DeepQ networks. However, the usual workflow creates the samples separately from the training process, either in a (set of) pre-processing step(s) or interleaved with training. This requires the input Q-samples to be materialized so that they can be fed into the training step of the attached neural network. In this paper, we propose a new GPU-focussed method for on-the-fly generation of training samples that is tightly coupled with the training process itself. This allows us to skip materializing the samples altogether (e.g., dumping them to disk), as they are (re)constructed when needed. Our method significantly outperforms the usual workflows that generate the input samples on the CPU, both in runtime performance and in memory/storage consumption.
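The record contains no code; the sketch below is a minimal conceptual illustration (in JAX, and not the authors' implementation) of the core idea described in the summary: training batches are (re)constructed on the accelerator from compact seeds immediately before each training step, so the samples never have to be materialized on disk or in a stored data set. The names construct_state, STATE_DIM, training_step, and the seed-based construction scheme are illustrative assumptions.

    import jax
    import jax.numpy as jnp

    STATE_DIM = 64  # assumed state size, purely for illustration

    def construct_state(seed):
        # Rebuild one state vector deterministically from a compact integer seed,
        # directly on the accelerator, instead of loading a stored sample.
        key = jax.random.PRNGKey(seed)
        return jax.random.normal(key, (STATE_DIM,))

    # Vectorize and compile the construction so a whole batch is generated
    # on the GPU right before it is consumed by the training step.
    construct_batch = jax.jit(jax.vmap(construct_state))

    def training_step(params, seeds):
        states = construct_batch(seeds)  # states are never written to storage
        # q_values = q_network.apply(params, states)   # placeholder DeepQ update
        # params = optimizer_update(params, q_values)  # hypothetical, not shown
        return params

    params = None
    for step in range(10):
        # Only compact per-sample seeds are kept; the full states are
        # (re)constructed on demand inside each training iteration.
        seeds = jnp.arange(step * 256, (step + 1) * 256)
        params = training_step(params, seeds)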
Bibliography: This work has been developed in the project APPaM (01IW20006), which is partly funded by the German Ministry of Education and Research (BMBF).