Efficient Online Learning Algorithms Based on LSTM Neural Networks

Bibliographic Details
Published in	IEEE Transactions on Neural Networks and Learning Systems Vol. 29; no. 8; pp. 3772-3783
Main Authors	Ergen, Tolga, Kozat, Suleyman Serdar
Format	Journal Article
Language	English
Published	United States: IEEE, 01.08.2018; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Algorithms Architecture Complexity theory Computational modeling Computer applications Computer architecture Control methods Data models Datasets Distance learning Error detection Extended Kalman filter Filtration Gated recurrent unit (GRU) Internet Kalman filtering Learning algorithms long short term memory (LSTM) Long short-term memory Machine learning Neural networks Online instruction online learning Parameter estimation particle filtering (PF) Recurrent neural networks regression stochastic gradient descent (SGD) Stochasticity Training
ISSN	2162-237X 2162-2388
DOI	10.1109/TNNLS.2017.2741598

Summary:	We investigate online nonlinear regression and introduce novel regression structures based on long short-term memory (LSTM) networks. For the introduced structures, we also provide highly efficient and effective online training methods. To train these novel LSTM-based structures, we put the underlying architecture in a state-space form and introduce highly efficient and effective particle filtering (PF)-based updates. We also provide stochastic gradient descent (SGD) and extended Kalman filter (EKF)-based updates. Our PF-based training method guarantees convergence to the optimal parameter estimation in the mean square error sense, provided that we have a sufficient number of particles and satisfy certain technical conditions. More importantly, we achieve this performance with a computational complexity on the order of that of first-order gradient-based methods by controlling the number of particles. Since our approach is generic, we also introduce a gated recurrent unit (GRU)-based approach by directly replacing the LSTM architecture with the GRU architecture, and we demonstrate the superiority of our LSTM-based approach in the sequential prediction task on different real-life data sets. In addition, the experimental results illustrate significant performance improvements achieved by the introduced algorithms with respect to conventional methods over several different benchmark real-life data sets.
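The summary describes casting an LSTM regressor in state-space form and training it with particle-filter updates. The sketch below illustrates that general idea under simplifying assumptions that are ours, not the paper's: a one-unit LSTM with scalar input, a random-walk transition on the parameter vector, Gaussian observation noise, a bootstrap (transition-prior) proposal, and multinomial resampling when the effective sample size drops below half the particle count. The names `ParticleFilterLSTM`, `lstm_step`, `walk_std`, and `obs_std` are hypothetical and do not come from the authors' code.

```python
import math
import random

# Hypothetical parameter layout for a single-unit LSTM regressor:
# (input weight, recurrent weight, bias) for each of 4 gates, plus output weight v.
N_PARAMS = 13


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def lstm_step(theta, x, h, c):
    """One step of a one-unit LSTM; returns (prediction, new h, new c)."""
    wi, ui, bi, wf, uf, bf, wo, uo, bo, wg, ug, bg, v = theta
    i = sigmoid(wi * x + ui * h + bi)    # input gate
    f = sigmoid(wf * x + uf * h + bf)    # forget gate
    o = sigmoid(wo * x + uo * h + bo)    # output gate
    g = math.tanh(wg * x + ug * h + bg)  # candidate cell value
    c_new = f * c + i * g
    h_new = o * math.tanh(c_new)
    return v * h_new, h_new, c_new


class ParticleFilterLSTM:
    """Bootstrap particle filter over LSTM parameters (illustrative sketch).

    Hidden state per particle: parameter vector theta (random walk) plus (h, c).
    Assumed observation model: d_t = v * h_t + Gaussian noise with std obs_std.
    """

    def __init__(self, n_particles=100, param_std=1.0, walk_std=0.01,
                 obs_std=0.1, seed=0):
        self.rng = random.Random(seed)
        self.n = n_particles
        self.walk_std = walk_std
        self.obs_std = obs_std
        self.thetas = [[self.rng.gauss(0.0, param_std) for _ in range(N_PARAMS)]
                       for _ in range(n_particles)]
        self.h = [0.0] * n_particles
        self.c = [0.0] * n_particles
        self.log_w = [0.0] * n_particles  # uniform initial weights, log domain
        self._preds = [0.0] * n_particles

    def _normalized_weights(self):
        m = max(self.log_w)
        w = [math.exp(lw - m) for lw in self.log_w]
        s = sum(w)
        return [wi / s for wi in w]

    def predict(self, x):
        """Propagate every particle one step; return the weighted-mean prediction."""
        w = self._normalized_weights()
        for j in range(self.n):
            # Random-walk transition of the parameters (the tracked "state").
            self.thetas[j] = [p + self.rng.gauss(0.0, self.walk_std)
                              for p in self.thetas[j]]
            self._preds[j], self.h[j], self.c[j] = lstm_step(
                self.thetas[j], x, self.h[j], self.c[j])
        return sum(wj * yj for wj, yj in zip(w, self._preds))

    def update(self, d):
        """Reweight particles by the Gaussian likelihood of the observed target d."""
        var2 = 2.0 * self.obs_std ** 2
        self.log_w = [lw - (d - y) ** 2 / var2
                      for lw, y in zip(self.log_w, self._preds)]
        m = max(self.log_w)
        self.log_w = [lw - m for lw in self.log_w]  # keep log weights bounded
        w = self._normalized_weights()
        ess = 1.0 / sum(wi * wi for wi in w)  # effective sample size
        if ess < self.n / 2:
            # Multinomial resampling: copy chosen particles' parameters and LSTM state.
            idx = self.rng.choices(range(self.n), weights=w, k=self.n)
            self.thetas = [list(self.thetas[k]) for k in idx]
            self.h = [self.h[k] for k in idx]
            self.c = [self.c[k] for k in idx]
            self.log_w = [0.0] * self.n


if __name__ == "__main__":
    # Toy task (an assumption, not from the paper): predict the next sinusoid sample.
    pf = ParticleFilterLSTM(n_particles=200, seed=1)
    sq_err = []
    for t in range(600):
        x_t = math.sin(0.3 * t)
        d_t = math.sin(0.3 * (t + 1))
        y_hat = pf.predict(x_t)
        pf.update(d_t)
        sq_err.append((d_t - y_hat) ** 2)
    print("MSE over last 200 steps:", sum(sq_err[-200:]) / 200)
```

In this sketch each `predict`/`update` pair costs roughly O(n_particles × N_PARAMS), so the per-step cost grows linearly with the particle count; that count is the knob the summary refers to when it compares the PF method's complexity with first-order gradient methods. The sinusoid task is only a toy and does not reproduce the paper's experiments or its convergence guarantee.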