Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates

Bibliographic Details
Published in: Journal of Complexity, Vol. 57, p. 101438
Main Authors: Jentzen, Arnulf; von Wurstemberger, Philippe
Format: Journal Article
Language: English
Published: Elsevier Inc., 01.04.2020
ISSN: 0885-064X, 1090-2708
DOI: 10.1016/j.jco.2019.101438

More Information
Summary: The stochastic gradient descent (SGD) optimization algorithm is one of the central tools used to approximate solutions of stochastic optimization problems arising in machine learning and, in particular, deep learning applications. It is therefore important to analyze the convergence behavior of SGD. In this article we consider a simple quadratic stochastic optimization problem and establish for every γ, ν ∈ (0,∞) essentially matching lower and upper bounds for the mean square error of the associated SGD process with learning rates (γ n^{−ν})_{n∈ℕ}. This allows us to precisely quantify the mean square convergence rate of the SGD method in dependence on the choice of the learning rates.
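
To make the abstract's setting concrete, below is a minimal, self-contained Python sketch (not code from the article) of SGD with polynomially decaying learning rates γ·n^(−ν) applied to an illustrative quadratic stochastic optimization problem. The specific objective F(θ) = E[(θ − X)²]/2 with X ~ N(μ, 1), the function name sgd_quadratic_mse, and all parameter values are assumptions chosen for illustration, not necessarily the exact formulation studied by the authors.

    import numpy as np

    def sgd_quadratic_mse(gamma, nu, n_steps, n_runs=2000, seed=0):
        """Estimate the mean square error of SGD on an illustrative
        quadratic stochastic optimization problem.

        Assumed objective (illustrative only): F(theta) = E[(theta - X)^2] / 2
        with X ~ N(mu, 1), whose unique minimizer is theta* = mu.
        The SGD update with learning rates gamma * n^(-nu) reads
            theta_{n+1} = theta_n - gamma * n^(-nu) * (theta_n - X_{n+1}).
        """
        rng = np.random.default_rng(seed)
        mu = 1.0                      # minimizer of the illustrative objective
        theta = np.zeros(n_runs)      # independent SGD trajectories, theta_1 = 0
        for n in range(1, n_steps + 1):
            x = rng.normal(mu, 1.0, size=n_runs)       # i.i.d. samples X_{n+1}
            theta -= gamma * n ** (-nu) * (theta - x)  # stochastic gradient step
        return np.mean((theta - mu) ** 2)              # Monte Carlo MSE estimate

    if __name__ == "__main__":
        # Compare a slowly decaying and a fast decaying learning-rate schedule.
        for nu in (0.5, 2.0):
            mse = sgd_quadratic_mse(gamma=1.0, nu=nu, n_steps=10_000)
            print(f"nu = {nu}: estimated MSE after 10_000 steps = {mse:.3e}")

Running the sketch with a slowly decaying schedule (e.g. ν = 0.5) and a fast decaying one (e.g. ν = 2) gives a rough empirical feel for how the mean square error depends on the decay exponent, which is the dependence the article quantifies with essentially matching lower and upper bounds.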