WhiteDwarf: A Holistic Co-Design Approach to Ultra-Compact Neural Inference Acceleration

Bibliographic Details
Published in: IEEE Access, Vol. 13, pp. 86509-86527
Main Authors: Garcia-Arias, Angel Lopez; Okoshi, Yasuyuki; Yu, Jaehoon; Suzuki, Junnosuke; Otsuka, Hikari; Kawamura, Kazushi; Van Chu, Thiem; Fujiki, Daichi; Motomura, Masato
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2025
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2025.3568916

Summary: Recent work on the Strong Lottery Ticket Hypothesis has elevated sparsity, commonly exploited by algorithms and sparse architectures as an advantageous by-product of training overparameterized models, to the main driver of neural training, opening new co-design opportunities for efficient neural execution. WhiteDwarf is a holistic ultra-compact approach that first uses a triple model compression algorithm to reduce CNN and MLP models to under 10% of their original off-chip size, and then uses a triple unstructured-sparsity exploitation architecture for under 1% on-chip size. Folded, Signed, and Multi-coated supermasks are combined into a novel scalar supermask that integrates pruning, quantization, and training. These supermasks offer straightforward compression and enable a sparse, mixed-precision, multiplier-less datapath that does not require complex control. Activation precision is set to FP8 with large models in mind, while INT2-4 weights suffice for sparse and accurate supermask models. The 1K-PE array features parallel, hierarchically synchronous, non-zero-gathering, bit-decomposed computation that exploits unstructured sparsity down to the bit level. For example, ResNet-50 is compressed to 5.3% of its original model size while maintaining 74.7% accuracy on ImageNet. The fabricated 40-nm CMOS chip, aimed at high inference accuracy and power efficiency, achieves 12.24 TFLOPS/W at 99% weight sparsity.
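
For readers unfamiliar with the supermask mechanism the summary refers to, the following is a minimal PyTorch sketch of strong-lottery-ticket-style training, assuming the generic edge-popup recipe rather than the paper's Folded, Signed, Multi-coated, or scalar supermask variants: weights stay frozen at random initialization, and a per-weight score is learned and binarized into a top-k mask through a straight-through estimator. The names SupermaskLinear, TopKMask, and sparsity are illustrative, not taken from the paper.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TopKMask(torch.autograd.Function):
        """Binarize scores into a keep/prune mask; gradients pass straight through."""
        @staticmethod
        def forward(ctx, scores, sparsity):
            k = max(1, int((1 - sparsity) * scores.numel()))   # weights to keep
            threshold = scores.flatten().kthvalue(scores.numel() - k + 1).values
            return (scores >= threshold).float()

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output, None   # straight-through estimator

    class SupermaskLinear(nn.Linear):
        """Linear layer with frozen random weights and a learned supermask."""
        def __init__(self, in_features, out_features, sparsity=0.99):
            super().__init__(in_features, out_features, bias=False)
            self.weight.requires_grad = False    # weights are never trained
            self.scores = nn.Parameter(0.01 * torch.randn_like(self.weight))
            self.sparsity = sparsity

        def forward(self, x):
            mask = TopKMask.apply(self.scores, self.sparsity)
            return F.linear(x, self.weight * mask)

    layer = SupermaskLinear(256, 128, sparsity=0.99)
    y = layer(torch.randn(4, 256))   # only layer.scores receives gradients

Because a network trained this way is fully described by the mask plus the random seed, storing it reduces to roughly one bit (or a few bits, for multi-coated or quantized variants) per surviving weight, which is the compression angle the abstract points to.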
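
The bit-decomposed, non-zero-gathering computation can likewise be illustrated in isolation. Below is a small NumPy sketch, assuming unsigned low-bit integer weights and ignoring signed supermasks, FP8 arithmetic, and the chip's PE-array scheduling: each weight is split into bit planes, only the set bits are gathered, and every product reduces to a shifted addition, so no multiplier is needed. bit_decomposed_dot and weight_bits are hypothetical names.

    import numpy as np

    def bit_decomposed_dot(activations, weights, weight_bits=3):
        """Multiplier-less dot product over bit planes of low-bit integer weights."""
        acc = 0.0
        for b in range(weight_bits):
            plane = (weights >> b) & 1                 # bit plane b of every weight
            nz = np.nonzero(plane)[0]                  # non-zero gathering: skip 0 bits
            acc += activations[nz].sum() * (1 << b)    # power-of-two scale = shift-add
        return acc

    acts = np.array([0.5, -1.25, 2.0, 0.75], dtype=np.float32)
    wts = np.array([0, 3, 0, 5], dtype=np.int64)       # mostly zero in practice
    assert np.isclose(bit_decomposed_dot(acts, wts), float(acts @ wts))

At 99% weight sparsity most bit planes are nearly empty, so gathering only the set bits is where an architecture of this kind saves both cycles and energy.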