WhiteDwarf: A Holistic Co-Design Approach to Ultra-Compact Neural Inference Acceleration
| Published in | IEEE Access Vol. 13; pp. 86509-86527 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Piscataway: IEEE, 2025 (The Institute of Electrical and Electronics Engineers, Inc.) |
| Subjects | |
| Online Access | |
| ISSN | 2169-3536 |
| DOI | 10.1109/ACCESS.2025.3568916 |
| Summary: | Recent work on the Strong Lottery Ticket Hypothesis has upgraded the role of sparsity, commonly exploited by algorithms and sparse architectures as an advantageous by-product of training overparameterized models, to the main driver of neural training, opening new co-design opportunities for efficient neural execution. WhiteDwarf is a holistic ultra-compact approach that first uses a triple model compression algorithm to reduce CNN and MLP models to under 10% of the original off-chip size and then uses a triple unstructured-sparsity exploitation architecture for under 1% on-chip size. Folded, Signed, and Multi-coated supermasks are combined into a novel scalar supermask that unifies pruning, quantization, and training. These supermasks offer straightforward compression and an opportunity for a sparse, mixed-precision, multiplier-less datapath that does not require complex control. Activation precision is set to FP8 with large models in mind, while INT2-4 weights suffice for sparse and accurate supermask models. The 1K-PE array features parallel, hierarchically synchronous, non-zero gathering, and bit-decomposed computation for exploiting unstructured sparsity down to the bit level. For example, a ResNet-50 is compressed to 5.3% of the original model size while maintaining 74.7% accuracy on ImageNet. The fabricated 40-nm CMOS chip, aimed at high inference accuracy and power efficiency, achieves 12.24 TFLOPS/W at 99% weight sparsity. |
|---|---|
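The supermask setting the summary refers to (Strong Lottery Ticket Hypothesis: weights stay frozen at their random initialization, and only a binary mask over them is learned) can be sketched minimally. The function name, the top-k-by-|score| pruning rule, and all shapes below are illustrative assumptions, not WhiteDwarf's actual Folded/Signed/Multi-coated supermask scheme:

```python
import numpy as np

def supermask_forward(x, w, scores, sparsity):
    """Forward pass through a supermask layer.

    The weights `w` are fixed random values and are never trained;
    only `scores` would be learned. At inference, the top
    (1 - sparsity) fraction of weights ranked by |score| is kept,
    and the rest are zeroed out.
    """
    k = max(1, int(round((1.0 - sparsity) * scores.size)))  # weights to keep
    # Threshold at the k-th largest |score|; everything below is pruned.
    thresh = np.sort(np.abs(scores).ravel())[-k]
    mask = (np.abs(scores) >= thresh).astype(w.dtype)
    return x @ (w * mask)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)).astype(np.float32)       # activations
w = rng.standard_normal((16, 8)).astype(np.float32)       # frozen random weights
scores = rng.standard_normal((16, 8)).astype(np.float32)  # trained mask scores

# At 99% sparsity only ~1% of the 128 weights survive the mask.
y = supermask_forward(x, w, scores, sparsity=0.99)
```

Because the surviving weights are few and can be quantized aggressively (the paper uses INT2-4), the masked matrix is cheap to store and compute, which is what motivates the sparse, multiplier-less datapath described above.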