Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates

Bibliographic Details
Published in: Journal of Complexity, Vol. 57, p. 101438
Main Authors: Jentzen, Arnulf; von Wurstemberger, Philippe
Format: Journal Article
Language: English
Published: Elsevier Inc., 01.04.2020
ISSN: 0885-064X, 1090-2708
DOI: 10.1016/j.jco.2019.101438

More Information
Summary: The stochastic gradient descent (SGD) optimization algorithm is one of the central tools used to approximate solutions of stochastic optimization problems arising in machine learning and, in particular, deep learning applications. It is therefore important to analyze the convergence behavior of SGD. In this article we consider a simple quadratic stochastic optimization problem and establish for every γ, ν ∈ (0,∞) essentially matching lower and upper bounds for the mean square error of the associated SGD process with learning rates (γ n^{−ν})_{n∈ℕ}. This allows us to precisely quantify the mean square convergence rate of the SGD method in dependence on the choice of the learning rates.
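
To make the abstract's setting concrete, below is a minimal, self-contained Python sketch (not code from the article) of SGD with polynomially decaying learning rates γ·n^(−ν) applied to an illustrative quadratic stochastic optimization problem. The specific objective F(θ) = E[(θ − X)²]/2 with X ~ N(μ, 1), the function name sgd_quadratic_mse, and all parameter values are assumptions chosen for illustration, not necessarily the exact formulation studied by the authors.

    import numpy as np

    def sgd_quadratic_mse(gamma, nu, n_steps, n_runs=2000, seed=0):
        """Estimate the mean square error of SGD on an illustrative
        quadratic stochastic optimization problem.

        Assumed objective (illustrative only): F(theta) = E[(theta - X)^2] / 2
        with X ~ N(mu, 1), whose unique minimizer is theta* = mu.
        The SGD update with learning rates gamma * n^(-nu) reads
            theta_{n+1} = theta_n - gamma * n^(-nu) * (theta_n - X_{n+1}).
        """
        rng = np.random.default_rng(seed)
        mu = 1.0                      # minimizer of the illustrative objective
        theta = np.zeros(n_runs)      # independent SGD trajectories, theta_1 = 0
        for n in range(1, n_steps + 1):
            x = rng.normal(mu, 1.0, size=n_runs)       # i.i.d. samples X_{n+1}
            theta -= gamma * n ** (-nu) * (theta - x)  # stochastic gradient step
        return np.mean((theta - mu) ** 2)              # Monte Carlo MSE estimate

    if __name__ == "__main__":
        # Compare a slowly decaying and a fast decaying learning-rate schedule.
        for nu in (0.5, 2.0):
            mse = sgd_quadratic_mse(gamma=1.0, nu=nu, n_steps=10_000)
            print(f"nu = {nu}: estimated MSE after 10_000 steps = {mse:.3e}")

Running the sketch with a slowly decaying schedule (e.g. ν = 0.5) and a fast decaying one (e.g. ν = 2) gives a rough empirical feel for how the mean square error depends on the decay exponent, which is the dependence the article quantifies with essentially matching lower and upper bounds.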