Image Classification Network Compression Technique Based on Learning Temperature-Knowledge Distillation
| Published in | 2024 20th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) pp. 1 - 7 |
|---|---|
| Main Authors | , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 27.07.2024 |
| DOI | 10.1109/ICNC-FSKD64080.2024.10702237 |
| Summary: | Knowledge distillation is an important network compression technique that produces a small student network by learning the knowledge of a large teacher network. However, previous research on knowledge distillation has two problems: first, the capacity mismatch between teacher and student networks is ignored; second, the distillation temperature is set as a fixed hyperparameter, which limits the performance of the student network. To address these problems, a Learning Temperature-Knowledge Distillation (LT-KD) technique is proposed. Drawing on the idea that a teacher should adapt its teaching to the student's gradual progress, LT-KD matches the teacher to each student by making the distillation temperature learnable: a non-parametric Gradient Reversal Layer (GRL) adjusts the temperature during training and thereby controls the learning difficulty of the network. Experiments show that inserting LT-KD into existing knowledge distillation algorithms significantly improves performance on the CIFAR-100 and ImageNet datasets. For example, LT-KD can compress ResNet110 with 13.73M parameters into ResNet32 with 1.9M parameters, while the accuracy of ResNet32 reaches 74.2%. |
|---|---|
| DOI: | 10.1109/ICNC-FSKD64080.2024.10702237 |
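
The abstract describes a learnable distillation temperature driven through a gradient reversal layer. The paper's exact formulation is not given in this record, so the following is only a minimal PyTorch-style sketch of that general idea: a standard Hinton-style KD loss with a temperature parameter routed through a GRL, so that a single backward pass updates the student to minimize the loss while the temperature is pushed in the loss-maximizing (harder) direction. The class names, the softplus parameterisation, the `+ 1.0` offset, and the `lambd` scaling factor are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) the gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient sign so the parameter is updated adversarially.
        return -ctx.lambd * grad_output, None


class LearnableTemperature(nn.Module):
    """Scalar temperature whose gradient passes through a GRL (illustrative parameterisation)."""

    def __init__(self, init_t: float = 4.0, lambd: float = 1.0):
        super().__init__()
        self.raw_t = nn.Parameter(torch.tensor(init_t))  # unconstrained parameter
        self.lambd = lambd

    def forward(self) -> torch.Tensor:
        # softplus + 1 keeps the temperature strictly above 1 (an assumption for stability).
        return F.softplus(GradientReversal.apply(self.raw_t, self.lambd)) + 1.0


def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            temperature: torch.Tensor) -> torch.Tensor:
    """Standard KL-divergence distillation loss between temperature-softened distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```

A sketch of how it might be used: put both the student parameters and the temperature module into one optimizer; because of the GRL, minimizing `kd_loss` trains the student normally while the temperature is simultaneously driven toward values that make distillation harder, which is one way to realize the "gradual progress" behaviour the abstract describes.

```python
# Hypothetical training step (student, teacher, images, labels are assumed to exist).
temp_module = LearnableTemperature()
optimizer = torch.optim.SGD(
    list(student.parameters()) + list(temp_module.parameters()), lr=0.05, momentum=0.9
)

with torch.no_grad():
    teacher_logits = teacher(images)
student_logits = student(images)

loss = F.cross_entropy(student_logits, labels) + kd_loss(
    student_logits, teacher_logits, temp_module()
)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```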