A Fast Adaptive Online Gradient Descent Algorithm in Over-Parameterized Neural Networks
| Published in | Neural Processing Letters, Vol. 55, no. 4, pp. 4641-4659 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published | New York: Springer US, 01.08.2023 (Springer Nature B.V) |
| ISSN | 1370-4621 1573-773X |
| DOI | 10.1007/s11063-022-11057-4 |
| Summary: | In recent years, deep learning has dramatically improved the state of the art in many practical applications. However, this utility is highly dependent on the fine-tuning of hyperparameters, including the learning rate, batch size, and network initialization. Although many first-order adaptive gradient algorithms (e.g., Adam, AdaGrad) have been proposed to adjust the learning rate, they are vulnerable to the initial learning rate and the network structure when training over-parameterized models, especially in the dynamic online setting. Therefore, the main challenge of using deep learning in practice is how to reduce the cost of tuning hyperparameters. To address this problem, we integrate the adaptive strategy of Radhakrishnan et al. and the acceleration strategy of Ghadimi et al. to propose a fast adaptive online gradient algorithm, FAOGD. The adaptive strategy we adopt adjusts the learning rate using only the historical gradients and the training loss value, while the acceleration strategy is heavy-ball momentum, used to accelerate the training of deep models. The proposed FAOGD has the merit that no hyperparameters related to the learning rate need to be tuned, which saves much unnecessary computational overhead. It is also shown that FAOGD attains a regret bound of O(√T), matching Adam and AdaGrad with empirically tuned learning rates. Simulation results on over-parameterized neural networks clearly show that FAOGD outperforms existing algorithms. Furthermore, FAOGD is also robust to the network structure and batch size. |
|---|---|
| ISSN: | 1370-4621 1573-773X |
| DOI: | 10.1007/s11063-022-11057-4 |
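To make the update described in the summary concrete, here is a minimal Python sketch of a heavy-ball momentum step whose step size is driven only by the current loss value and the accumulated squared gradient norms. The specific step-size formula is a hypothetical stand-in for the adaptive rule of Radhakrishnan et al. used in FAOGD, not the paper's exact update, and all function and variable names are illustrative.

```python
import numpy as np

def faogd_like_step(x, v, grad, loss, grad_norm_sq_sum, beta=0.9, eps=1e-8):
    """One heavy-ball update with a loss/history-driven step size.

    NOTE: illustrative sketch only. The step size below (current loss divided
    by the accumulated squared gradient norms) is a stand-in for the adaptive
    rule of Radhakrishnan et al.; the published FAOGD formula may differ.
    """
    grad_norm_sq_sum += np.dot(grad, grad)          # historical gradient information
    eta = loss / (grad_norm_sq_sum + eps)           # step size from loss value + history
    v = beta * v - eta * grad                       # heavy-ball momentum (Ghadimi et al. style)
    x = x + v
    return x, v, grad_norm_sq_sum

# Toy online least-squares stream to exercise the update.
rng = np.random.default_rng(0)
d = 10
x, v, acc = np.zeros(d), np.zeros(d), 0.0
for t in range(1000):
    a = rng.normal(size=d)
    y = a @ np.ones(d)                  # "true" model is the all-ones vector
    loss = 0.5 * (a @ x - y) ** 2       # per-round loss f_t(x_t)
    grad = (a @ x - y) * a              # gradient of f_t at x_t
    x, v, acc = faogd_like_step(x, v, grad, loss, acc)
```

Apart from the momentum coefficient `beta`, no learning-rate hyperparameter appears in this sketch, which mirrors the tuning-free property that the summary attributes to FAOGD.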