A new approach for the vanishing gradient problem on sigmoid activation

Bibliographic Details
Published in: Progress in Artificial Intelligence, Vol. 9, No. 4, pp. 351–360
Main Authors: Roodschild, Matías; Gotay Sardiñas, Jorge; Will, Adrián
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg (Springer Nature B.V.), 01.12.2020
ISSN: 2192-6352, 2192-6360
DOI: 10.1007/s13748-020-00218-y

Summary: The vanishing gradient problem (VGP) is an important issue at training time in multilayer neural networks trained with the backpropagation algorithm, and it is worse when sigmoid transfer functions are used in networks with many hidden layers. However, the sigmoid function remains important in several architectures, such as recurrent neural networks and autoencoders, where the VGP may also appear. In this article, we propose a modification of the backpropagation algorithm for training sigmoid neurons. It consists of adding a small constant to the calculation of the sigmoid's derivative, so that the proposed training direction differs slightly from the gradient while the original sigmoid function is kept in the network. Our results suggest that the modified derivative reaches the same accuracy in fewer training steps on most datasets. Moreover, due to the VGP, backpropagation with the original derivative does not converge with sigmoid functions on more than five hidden layers, whereas the modification allows backpropagation to train two extra hidden layers in feedforward neural networks.
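As a rough illustration of the idea described in the summary (not the authors' implementation), the Python sketch below keeps the standard sigmoid in the forward pass and adds a small constant to the derivative used in the backward pass, so the update direction departs slightly from the true gradient. The function names, the eps value of 0.01, the single-layer setting, and the squared-error loss are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    # Standard logistic sigmoid; the forward pass is left unchanged.
    return 1.0 / (1.0 + np.exp(-x))

def modified_sigmoid_derivative(a, eps=0.01):
    # a = sigmoid(x). Adding a small constant eps keeps the backward
    # signal from vanishing when a saturates near 0 or 1.
    # eps = 0 recovers plain backpropagation; 0.01 is an illustrative guess.
    return a * (1.0 - a) + eps

def train_step(W, b, x, y, lr=0.1, eps=0.01):
    # One gradient step for a single sigmoid layer with squared-error loss.
    a = sigmoid(W @ x + b)                                 # forward: original sigmoid
    delta = (a - y) * modified_sigmoid_derivative(a, eps)  # backward: modified derivative
    W -= lr * np.outer(delta, x)                           # weight update
    b -= lr * delta                                        # bias update
    return W, b

# Minimal usage example with random data.
rng = np.random.default_rng(0)
W, b = rng.normal(size=(1, 2)), np.zeros(1)
W, b = train_step(W, b, np.array([0.5, -1.0]), np.array([1.0]))
```

In deep saturated layers the true derivative a * (1 - a) approaches zero, so the eps term is what keeps some gradient signal flowing; this matches the summary's claim that the modification lets backpropagation train a few additional hidden layers.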