Efficient SNN multi-cores MAC array acceleration on SpiNNaker 2
| Published in | Frontiers in Neuroscience Vol. 17; p. 1223262 |
|---|---|
| Main Authors | , , , , , , , , |
| Format | Journal Article |
| Language | English |
| Published | Lausanne: Frontiers Research Foundation, 07.08.2023 (Frontiers Media S.A.) |
| Subjects | |
| ISSN | 1662-4548, 1662-453X |
| DOI | 10.3389/fnins.2023.1223262 |
| Summary: | The potentially low energy consumption of spiking neural networks (SNNs) has attracted the attention of the AI community. Processing SNNs on CPUs alone, however, inevitably leads to long execution times for large models and massive datasets. This study introduces the MAC array, a parallel architecture on each processing element (PE) of SpiNNaker 2, into the computational process of SNN inference. Building on earlier single-core optimization algorithms, we investigate parallel acceleration algorithms that cooperate with multi-core MAC arrays. The proposed Echelon Reorder model-information densification algorithm, combined with adapted multi-core two-stage-splitting and authorization deployment strategies, achieves efficient spatio-temporal load balancing and optimization. We evaluate performance by benchmarking a wide range of constructed SNN models to study how strongly different factors influence it. We also benchmark two realistic SNN models (a gesture-recognition model from a real-world application and a balanced random cortex-like network from neuroscience) on the neuromorphic multi-core hardware SpiNNaker 2. The echelon optimization algorithm with mixed processors reduces the memory footprint to 74.28% and 85.78% of the original MAC calculation on these two models, respectively. The execution time of the echelon algorithms, using only MAC or mixed processors, is ≤ 24.56% of the serial ARM baseline. Accelerating SNN inference with the algorithms in this study is essentially a general sparse matrix-matrix multiplication (SpGEMM) problem (see the sketch below). This article explicitly extends the application field of SpGEMM to SNNs, developing novel SpGEMM optimization algorithms that fit SNN characteristics and the MAC array. |
|---|---|
| Bibliography: | Edited by: Lei Deng, Tsinghua University, China. Reviewed by: Zhuo Zou, Fudan University, China; Steffen Albrecht, The University of Auckland, New Zealand; Arindam Basu, City University of Hong Kong, Hong Kong SAR, China |
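To make the SpGEMM framing in the summary concrete, the following minimal Python sketch (an illustrative assumption, not the paper's implementation; the neuron counts, densities, and LIF parameters are invented for the example) treats a block of spike events as a sparse binary matrix and obtains the synaptic input for all post-neurons with a single sparse matrix-matrix product, followed by a simple leaky integrate-and-fire update.

```python
# Illustrative sketch only: SNN inference viewed as SpGEMM.
# Spike events over a block of timesteps form a sparse binary matrix; the
# synaptic input is its product with the (also sparse) weight matrix.
import numpy as np
from scipy.sparse import random as sparse_random

T, N_PRE, N_POST = 16, 256, 128   # timesteps per block, pre/post neuron counts (arbitrary)

# Sparse spike raster: spikes[t, i] = 1 if pre-neuron i fired at timestep t (~5% activity).
spikes = sparse_random(T, N_PRE, density=0.05, format="csr",
                       data_rvs=lambda n: np.ones(n), random_state=0)

# Sparse synaptic weight matrix weights[i, j] (~10% connectivity, random strengths).
weights = sparse_random(N_PRE, N_POST, density=0.10, format="csr", random_state=1)

# SpGEMM: synaptic input currents for every timestep and post-neuron at once.
syn_input = spikes @ weights          # sparse result of shape (T, N_POST)

# Simple LIF-style accumulation over the block (dense membrane state).
v = np.zeros(N_POST)
v_thresh, v_decay = 1.0, 0.9
out_spikes = np.zeros((T, N_POST), dtype=bool)
dense_input = syn_input.toarray()
for t in range(T):
    v = v_decay * v + dense_input[t]
    out_spikes[t] = v >= v_thresh
    v[out_spikes[t]] = 0.0            # reset neurons that fired

print("input spikes:", spikes.nnz, "output spikes:", int(out_spikes.sum()))
```

Because both the spike raster and the connectivity are sparse, the product touches only non-zero entries; this is the kind of irregular sparse workload that the paper's densification and multi-core splitting strategies aim to balance across MAC arrays.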