WhiteDwarf: A Holistic Co-Design Approach to Ultra-Compact Neural Inference Acceleration
| Published in | IEEE Access Vol. 13; pp. 86509-86527 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Piscataway: IEEE, 2025 (The Institute of Electrical and Electronics Engineers, Inc.) |
| Subjects | |
| Online Access | |
| ISSN | 2169-3536 |
| DOI | 10.1109/ACCESS.2025.3568916 |
| Summary: | Recent work on the Strong Lottery Ticket Hypothesis has upgraded the role of sparsity, commonly exploited by algorithms and sparse architectures as an advantageous by-product of training overparameterized models, to the main driver of neural training, opening new co-design opportunities for efficient neural execution. WhiteDwarf is a holistic ultra-compact approach that first uses a triple model compression algorithm to reduce CNN and MLP models to under 10% of the original off-chip size and then uses a triple unstructured-sparsity exploitation architecture for under 1% on-chip size. Folded, Signed, and Multi-coated supermasks are combined into a novel scalar supermask that unifies pruning, quantization, and training. These supermasks offer straightforward compression and an opportunity for a sparse, mixed-precision, multiplier-less datapath that does not require complex control. Activation precision is set to FP8 with large models in mind, while INT2-4 weights suffice for sparse and accurate supermask models. The 1K-PE array features parallel, hierarchically synchronous, non-zero gathering, and bit-decomposed computation for exploiting unstructured sparsity down to the bit level. For example, a ResNet-50 is compressed to 5.3% of the original model size while maintaining 74.7% accuracy on ImageNet. The fabricated 40-nm CMOS chip, aimed at high inference accuracy and power efficiency, achieves 12.24 TFLOPS/W at 99% weight sparsity. |
|---|---|
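The supermask setting the summary refers to (Strong Lottery Ticket Hypothesis: weights stay frozen at their random initialization, and only a binary mask over them is learned) can be sketched minimally. The function name, the top-k-by-|score| pruning rule, and all shapes below are illustrative assumptions, not WhiteDwarf's actual Folded/Signed/Multi-coated supermask scheme:

```python
import numpy as np

def supermask_forward(x, w, scores, sparsity):
    """Forward pass through a supermask layer.

    The weights `w` are fixed random values and are never trained;
    only `scores` would be learned. At inference, the top
    (1 - sparsity) fraction of weights ranked by |score| is kept,
    and the rest are zeroed out.
    """
    k = max(1, int(round((1.0 - sparsity) * scores.size)))  # weights to keep
    # Threshold at the k-th largest |score|; everything below is pruned.
    thresh = np.sort(np.abs(scores).ravel())[-k]
    mask = (np.abs(scores) >= thresh).astype(w.dtype)
    return x @ (w * mask)

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16)).astype(np.float32)       # activations
w = rng.standard_normal((16, 8)).astype(np.float32)       # frozen random weights
scores = rng.standard_normal((16, 8)).astype(np.float32)  # trained mask scores

# At 99% sparsity only ~1% of the 128 weights survive the mask.
y = supermask_forward(x, w, scores, sparsity=0.99)
```

Because the surviving weights are few and can be quantized aggressively (the paper uses INT2-4), the masked matrix is cheap to store and compute, which is what motivates the sparse, multiplier-less datapath described above.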