DiffMoment: an adaptive optimization technique for convolutional neural network

Bibliographic Details
Published in: Applied Intelligence (Dordrecht, Netherlands), Vol. 53, No. 13, pp. 16844-16858
Main Authors: Bhakta, Shubhankar; Nandi, Utpal; Si, Tapas; Ghosal, Sudipta Kr; Changdar, Chiranjit; Pal, Rajat Kumar
Format: Journal Article
Language: English
Published: New York: Springer US, 01.07.2023 (Springer Nature B.V.)
ISSN: 0924-669X, 1573-7497
DOI: 10.1007/s10489-022-04382-7

Summary: Stochastic Gradient Descent (SGD) is a very popular basic optimizer applied in the learning algorithms of deep neural networks. However, it uses a fixed step size for every epoch without considering gradient behaviour to determine the step size. Improved SGD optimizers such as AdaGrad, Adam, AdaDelta, RAdam, and RMSProp make step sizes adaptive in every epoch. However, these optimizers depend on the square roots of exponential moving averages (EMA) of squared previous gradients, momentums, or both, and cannot exploit local changes in gradients or momentums. To reduce these limitations, a novel optimizer is presented in this paper in which the step size is adjusted for each parameter based on the changing information between the 1st and the 2nd moment estimates (i.e., diffMoment). The experimental results show that diffMoment offers better performance than the AdaGrad, Adam, AdaDelta, RAdam, and RMSProp optimizers. It is also noticed that diffMoment performs uniformly better for training Convolutional Neural Networks (CNNs) with different activation functions.
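
This record gives only the abstract, not the paper's update equations, so the following is a minimal Python/NumPy sketch of what a diffMoment-style step could look like as the abstract describes it: keep Adam's two EMA moment estimates and additionally modulate each parameter's step size by the local change in those estimates between consecutive iterations. The function name diffmoment_step, the state dictionary, and the exact modulation formula are illustrative assumptions, not the authors' published rule.

import numpy as np

def diffmoment_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Initialize optimizer state on the first call.
    if "m" not in state:
        state["m"] = np.zeros_like(param)
        state["v"] = np.zeros_like(param)
        state["t"] = 0
    state["t"] += 1
    t = state["t"]
    m_prev, v_prev = state["m"], state["v"]

    # Adam-style EMAs: 1st moment (gradient) and 2nd moment (squared gradient).
    m = beta1 * m_prev + (1 - beta1) * grad
    v = beta2 * v_prev + (1 - beta2) * grad ** 2

    # Local change in the moment estimates between consecutive steps --
    # the "changing information" the abstract refers to (hypothetical form).
    dm = np.abs(m - m_prev)
    dv = np.abs(v - v_prev)

    # Bias-corrected estimates, as in Adam.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)

    # Hypothetical per-parameter modulation: a larger local change in the
    # moments yields a larger step for that parameter.
    scale = 1.0 + dm / (np.sqrt(dv) + eps)

    state["m"], state["v"] = m, v
    return param - lr * scale * m_hat / (np.sqrt(v_hat) + eps)

Usage, assuming the gradient comes from some forward/backward pass:

state = {}
w = np.array([1.0, -2.0])
g = np.array([0.1, -0.3])          # gradient for this step
w = diffmoment_step(w, g, state)   # one optimization step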