Power-law initialization algorithm for convolutional neural networks

Well-honed CNN architectures trained with massive labeled images datasets are the state-of-the-art solution in many fields. In this paper, the weights of five commonly used pre-trained models are carefully analyzed for extracting their numerical characteristics and spatial distribution law. The gene...

Full description

Saved in:
Bibliographic Details
Published inNeural computing & applications Vol. 35; no. 30; pp. 22431 - 22447
Main Authors Jiang, Kaiwen, Liu, Jian, Xing, Tongtong, Li, Shujing, Wu, Shunyao, Shao, Fengjing, Sun, Rencheng
Format Journal Article
LanguageEnglish
Published London Springer London 01.10.2023
Springer Nature B.V
Subjects
Online AccessGet full text
ISSN0941-0643
1433-3058
DOI10.1007/s00521-023-08881-7

Cover

More Information
Summary:Well-honed CNN architectures trained with massive labeled images datasets are the state-of-the-art solution in many fields. In this paper, the weights of five commonly used pre-trained models are carefully analyzed for extracting their numerical characteristics and spatial distribution law. The general characteristics are: (1) the weights of a single convolutional layer conform to the distribution of symmetric power law. (2) the power exponent at the center of its convolutional kernel is relatively large, and the power exponent decreases radially from the center. (3) the value range of power exponents between layers is continuous from - 0.5 to - 3.5 . Based on these founding, a weight initialization method is proposed in order to speed up the convergence and improve the performance of CNN models. The proposed weight initialization method is compared with several commonly used methods. Extensive experiments show that it can improve the convergence speed of the CNN models, and the model accuracy is improved by 1–3%.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:0941-0643
1433-3058
DOI:10.1007/s00521-023-08881-7