A Multiply-Less Approximate SRAM Compute-In-Memory Macro for Neural-Network Inference

Bibliographic Details
Published in	IEEE Journal of Solid-State Circuits, Vol. 60, no. 2, pp. 695-706
Main Authors	Diao, Haikang; He, Yifan; Li, Xuan; Tang, Chen; Jia, Wenbin; Yue, Jinshan; Luo, Haoyang; Song, Jiahao; Li, Xueqing; Yang, Huazhong; Jia, Hongyang; Liu, Yongpan; Wang, Yuan; Tang, Xiyuan
Format	Journal Article
Language	English
Published	New York: IEEE, 01.02.2025; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Adders; Circuits; Common Information Model (computing); Compute-in-memory (CIM); Computer architecture; Costs; dynamic logic; In-memory computing; Kernel; Logic; multiply-less approximate; neural network (NN) inference; Neural networks; Static random access memory
ISSN	0018-9200 1558-173X
DOI	10.1109/JSSC.2024.3433417

Summary:	Compute-in-memory (CIM) is promising for reducing data-movement energy and providing high bandwidth for matrix-vector multiplies (MVMs). However, existing work still faces challenges such as the digital logic overhead of multiply-add operations (OPs) and structural sparsity. This article presents a 2-to-8-b scalable, approximate, digital SRAM-based CIM macro co-designed with a multiply-less neural network (NN) approach. It incorporates dynamic-logic-based approximate circuits that save logic area and energy by eliminating multiplications. A prototype fabricated in 28-nm CMOS technology achieves a peak multiply-accumulate (MAC)-level energy efficiency of 102 TOPS/W for 8-b operations. The NN model deployment flow is demonstrated on CIFAR-10 and ImageNet classification with ResNet-20-style and ResNet-50-style multiply-less models, respectively, achieving accuracies of 91.74% and 74.8% with 8-b weights and activations.
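
The peak efficiency quoted in the summary (102 TOPS/W for 8-b operations) can be restated as energy per operation, which is often easier to compare across designs. The snippet below is a minimal sketch of that unit conversion only; the function name is hypothetical, and this record does not state how the article counts an operation (for example, whether one MAC counts as one or two OPs), so the result is per counted operation.

```python
def energy_per_op_femtojoules(tops_per_watt: float) -> float:
    """Convert an efficiency in TOPS/W to energy per operation in fJ.

    Hypothetical helper for illustration; not taken from the article.
    """
    ops_per_joule = tops_per_watt * 1e12  # 1 TOPS/W = 1e12 operations per joule
    joules_per_op = 1.0 / ops_per_joule
    return joules_per_op * 1e15  # 1 J = 1e15 fJ


# Peak figure quoted in the summary: 102 TOPS/W at 8-b precision.
peak_fj = energy_per_op_femtojoules(102.0)
print(f"{peak_fj:.2f} fJ per operation")  # prints "9.80 fJ per operation"
```

This is a peak value: the figure applies only to the operating point at which the 102 TOPS/W was reported.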