Lower error bounds for the stochastic gradient descent optimization algorithm: Sharp convergence rates for slowly and fast decaying learning rates
The stochastic gradient descent (SGD) optimization algorithm is one of the central tools used to approximate solutions of stochastic optimization problems arising in machine learning and, in particular, deep learning applications. It is therefore important to analyze the convergence behavior of SGD. In this article we consider a simple quadratic stochastic optimization problem and establish for every γ,ν∈(0,∞) essentially matching lower and upper bounds for the mean square error of the associated SGD process with learning rates (γn^{−ν})_{n∈ℕ}. This allows us to precisely quantify the mean square convergence rate of the SGD method in dependence on the choice of the learning rates.
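For a quick numerical feel for the setting described in the abstract, the sketch below runs SGD with learning rates γn^{−ν} on a toy quadratic objective and estimates the mean square error by Monte Carlo simulation. The objective f(θ) = ½E[(θ − Z)²] with Z ~ N(θ*, 1), the function name `sgd_mse`, and all parameter values are illustrative assumptions and are not taken from the paper; they are meant only to make the roles of γ and ν concrete.

```python
import numpy as np

def sgd_mse(gamma, nu, n_steps, n_runs=2_000, seed=0):
    """Monte Carlo estimate of the mean square error E[(Theta_n - theta_star)^2]
    of SGD with learning rates gamma * n**(-nu).

    Toy objective (an assumption, not necessarily the paper's exact model problem):
        f(theta) = 0.5 * E[(theta - Z)^2],  Z ~ N(theta_star, 1),
    whose unique minimizer is theta_star and whose stochastic gradient at
    step n is (theta - Z_n).
    """
    rng = np.random.default_rng(seed)
    theta_star = 1.0
    theta = np.zeros(n_runs)                            # Theta_0 = 0 in every Monte Carlo run
    for n in range(1, n_steps + 1):
        z = rng.normal(theta_star, 1.0, size=n_runs)    # i.i.d. samples Z_n
        theta -= gamma * n ** (-nu) * (theta - z)       # SGD step with rate gamma * n^(-nu)
    return np.mean((theta - theta_star) ** 2)           # empirical mean square error

# Example: compare a slowly decaying and a fast decaying learning rate.
for nu in (0.5, 1.0):
    mses = [sgd_mse(gamma=1.0, nu=nu, n_steps=n) for n in (10**2, 10**3, 10**4)]
    print(f"nu = {nu}: estimated MSE at n = 1e2, 1e3, 1e4 ->", mses)
```

Plotting the printed estimates against n on a log-log scale gives an empirical convergence rate for each choice of ν, which is the quantity the paper's matching lower and upper bounds characterize exactly.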
| Published in | Journal of Complexity, Vol. 57, p. 101438 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Inc, 01.04.2020 |
| ISSN | 0885-064X, 1090-2708 |
| DOI | 10.1016/j.jco.2019.101438 |