Power-law initialization algorithm for convolutional neural networks
Well-honed CNN architectures trained with massive labeled images datasets are the state-of-the-art solution in many fields. In this paper, the weights of five commonly used pre-trained models are carefully analyzed for extracting their numerical characteristics and spatial distribution law. The gene...
Saved in:
| Published in | Neural computing & applications Vol. 35; no. 30; pp. 22431 - 22447 |
|---|---|
| Main Authors | , , , , , , |
| Format | Journal Article |
| Language | English |
| Published |
London
Springer London
01.10.2023
Springer Nature B.V |
| Subjects | |
| Online Access | Get full text |
| ISSN | 0941-0643 1433-3058 |
| DOI | 10.1007/s00521-023-08881-7 |
Cover
| Summary: | Well-honed CNN architectures trained with massive labeled images datasets are the state-of-the-art solution in many fields. In this paper, the weights of five commonly used pre-trained models are carefully analyzed for extracting their numerical characteristics and spatial distribution law. The general characteristics are: (1) the weights of a single convolutional layer conform to the distribution of symmetric power law. (2) the power exponent at the center of its convolutional kernel is relatively large, and the power exponent decreases radially from the center. (3) the value range of power exponents between layers is continuous from
-
0.5
to
-
3.5
. Based on these founding, a weight initialization method is proposed in order to speed up the convergence and improve the performance of CNN models. The proposed weight initialization method is compared with several commonly used methods. Extensive experiments show that it can improve the convergence speed of the CNN models, and the model accuracy is improved by 1–3%. |
|---|---|
| Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ISSN: | 0941-0643 1433-3058 |
| DOI: | 10.1007/s00521-023-08881-7 |