Archetypal landscapes for deep neural networks


Bibliographic Details
Published in	Proceedings of the National Academy of Sciences (PNAS), Vol. 117, no. 36, pp. 21857-21864
Main Authors	Verpoort, Philipp C.; Lee, Alpha A.; Wales, David J.
Format	Journal Article
Language	English
Published	United States: National Academy of Sciences, 8 September 2020
Subjects	Artificial neural networks; Computational Geometry; Computer Science; Learning algorithms; Machine learning; Minima; Neural networks; Optimization; Physical Sciences; deep learning; energy landscapes; statistical mechanics
ISSN	0027-8424 (print); 1091-6490 (online)
DOI	10.1073/pnas.1919995117

Summary:	The predictive capabilities of deep neural networks (DNNs) continue to evolve to increasingly impressive levels. However, it is still unclear how training procedures for DNNs succeed in finding parameters that produce good results for such high-dimensional and nonconvex loss functions. In particular, we wish to understand why simple optimization schemes, such as stochastic gradient descent, do not end up trapped in local minima with high loss values that would not yield useful predictions. We explain the optimizability of DNNs by characterizing the local minima and transition states of the loss-function landscape (LFL) along with their connectivity. We show that the LFL of a DNN in the shallow network or data-abundant limit is funneled, and thus easy to optimize. Crucially, in the opposite low-data/deep limit, although the number of minima increases, the landscape is characterized by many minima with similar loss values separated by low barriers. This organization is different from the hierarchical landscapes of structural glass formers and explains why minimization procedures commonly employed by the machine-learning community can navigate the LFL successfully and reach low-lying solutions.
Bibliography:	PMCID: PMC7486703. Author contributions: P.C.V., A.A.L., and D.J.W. designed the study and interpreted the results; P.C.V. and D.J.W. performed the numerical studies; and P.C.V., A.A.L., and D.J.W. wrote the paper. Edited by David L. Donoho, Stanford University, Stanford, CA, and approved July 7, 2020 (received for review November 15, 2019).