Image Classification Network Compression Technique Based on Learning Temperature-Knowledge Distillation
| Published in | 2024 20th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD) pp. 1 - 7 |
|---|---|
| Main Authors | , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 27.07.2024 |
| DOI | 10.1109/ICNC-FSKD64080.2024.10702237 |
| Summary: | Knowledge distillation is an important network compression technique that produces a small student network by learning the knowledge of a large teacher network. However, previous research on knowledge distillation has two problems: first, the capacity mismatch between teacher and student networks is ignored; second, the distillation temperature is set as a fixed hyperparameter, which limits the performance of the student network. To address these problems, a Learning Temperature-Knowledge Distillation (LT-KD) technique is proposed. Drawing on the idea that a teacher should adapt its teaching to the student's gradual progress, LT-KD matches the teacher to each student by making the distillation temperature learnable: a non-parametric Gradient Reversal Layer (GRL) adjusts the temperature during training and thereby controls the learning difficulty of the network. Experiments show that inserting LT-KD into existing knowledge distillation algorithms significantly improves performance on the CIFAR-100 and ImageNet datasets. For example, LT-KD can compress ResNet110 with 13.73M parameters into ResNet32 with 1.9M parameters, while the accuracy of ResNet32 reaches 74.2%. |
|---|---|
| DOI: | 10.1109/ICNC-FSKD64080.2024.10702237 |
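
The abstract describes a learnable distillation temperature driven through a gradient reversal layer. The paper's exact formulation is not given in this record, so the following is only a minimal PyTorch-style sketch of that general idea: a standard Hinton-style KD loss with a temperature parameter routed through a GRL, so that a single backward pass updates the student to minimize the loss while the temperature is pushed in the loss-maximizing (harder) direction. The class names, the softplus parameterisation, the `+ 1.0` offset, and the `lambd` scaling factor are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) the gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient sign so the parameter is updated adversarially.
        return -ctx.lambd * grad_output, None


class LearnableTemperature(nn.Module):
    """Scalar temperature whose gradient passes through a GRL (illustrative parameterisation)."""

    def __init__(self, init_t: float = 4.0, lambd: float = 1.0):
        super().__init__()
        self.raw_t = nn.Parameter(torch.tensor(init_t))  # unconstrained parameter
        self.lambd = lambd

    def forward(self) -> torch.Tensor:
        # softplus + 1 keeps the temperature strictly above 1 (an assumption for stability).
        return F.softplus(GradientReversal.apply(self.raw_t, self.lambd)) + 1.0


def kd_loss(student_logits: torch.Tensor,
            teacher_logits: torch.Tensor,
            temperature: torch.Tensor) -> torch.Tensor:
    """Standard KL-divergence distillation loss between temperature-softened distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2
```

A sketch of how it might be used: put both the student parameters and the temperature module into one optimizer; because of the GRL, minimizing `kd_loss` trains the student normally while the temperature is simultaneously driven toward values that make distillation harder, which is one way to realize the "gradual progress" behaviour the abstract describes.

```python
# Hypothetical training step (student, teacher, images, labels are assumed to exist).
temp_module = LearnableTemperature()
optimizer = torch.optim.SGD(
    list(student.parameters()) + list(temp_module.parameters()), lr=0.05, momentum=0.9
)

with torch.no_grad():
    teacher_logits = teacher(images)
student_logits = student(images)

loss = F.cross_entropy(student_logits, labels) + kd_loss(
    student_logits, teacher_logits, temp_module()
)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```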