Efficient SNN multi-cores MAC array acceleration on SpiNNaker 2

Bibliographic Details
Published in: Frontiers in Neuroscience, Vol. 17, p. 1223262
Main Authors: Huang, Jiaxin; Kelber, Florian; Vogginger, Bernhard; Liu, Chen; Kreutz, Felix; Gerhards, Pascal; Scholz, Daniel; Knobloch, Klaus; Mayr, Christian G.
Format: Journal Article
Language: English
Published: Lausanne: Frontiers Research Foundation / Frontiers Media S.A., 07.08.2023
ISSN: 1662-4548 (print), 1662-453X (online)
DOI: 10.3389/fnins.2023.1223262


More Information
Summary: The potential for low-energy operation of spiking neural networks (SNNs) has attracted the attention of the AI community. CPU-only SNN processing inevitably incurs long execution times for large models and massive datasets. This study introduces the MAC array, a parallel architecture on each processing element (PE) of SpiNNaker 2, into the computational process of SNN inference. Building on prior single-core optimization algorithms, we investigate parallel acceleration algorithms that collaborate with multi-core MAC arrays. The proposed Echelon Reorder model-information densification algorithm, together with the adapted multi-core two-stage splitting and authorization deployment strategies, achieves efficient spatio-temporal load balancing and optimization performance. We evaluate performance by benchmarking a wide range of constructed SNN models to study the influence of different factors. We also benchmark two actual SNN models (a gesture-recognition model from a real-world application and a balanced random cortex-like network from neuroscience) on the neuromorphic multi-core hardware SpiNNaker 2. On these two models, the echelon optimization algorithm with mixed processors reduces the memory footprint to 74.28% and 85.78% of the original MAC calculation, respectively. The execution time of the echelon algorithms using only MAC or mixed processors is ≤ 24.56% of the serial ARM baseline. Accelerating SNN inference with the algorithms in this study is essentially a general sparse matrix-matrix multiplication (SpGEMM) problem. This article explicitly extends the SpGEMM problem to SNNs, developing novel SpGEMM optimization algorithms that fit the characteristics of SNNs and the MAC array.
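The abstract's framing of SNN inference as SpGEMM can be illustrated with a minimal sketch: the per-timestep spike activity forms a sparse binary matrix, so multiplying it by the weight matrix reduces to accumulating only the weight rows of neurons that fired. This is an illustrative reconstruction, not the paper's algorithm; the names (`spgemm_spikes`, `spike_rows`, `W`) are assumptions.

```python
# Illustrative SpGEMM view of SNN inference (not the paper's implementation).
# The spike matrix is binary and sparse, so instead of a dense product S @ W
# we accumulate only the weight rows selected by active (spiking) neurons.

def spgemm_spikes(spike_rows, W, n_out):
    """spike_rows: per-timestep lists of presynaptic neuron indices that fired.
    W: dense weight matrix as a list of rows (one row per presynaptic neuron).
    Returns the postsynaptic input current per timestep."""
    out = []
    for active in spike_rows:
        acc = [0.0] * n_out
        for pre in active:          # visit only the nonzero spike entries
            row = W[pre]
            for j in range(n_out):
                acc[j] += row[j]
        out.append(acc)
    return out

# Example: 3 presynaptic neurons, 2 postsynaptic neurons, 2 timesteps.
W = [[0.5, -0.2],
     [0.1,  0.4],
     [0.2,  0.2]]
currents = spgemm_spikes([[0, 2], [1]], W, n_out=2)
print(currents)  # [[0.7, 0.0], [0.1, 0.4]]
```

The work per timestep scales with the number of spikes rather than the number of neurons, which is the sparsity the paper's MAC-array algorithms exploit.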
Reviewed by: Zhuo Zou, Fudan University, China; Steffen Albrecht, The University of Auckland, New Zealand; Arindam Basu, City University of Hong Kong, Hong Kong SAR, China
Edited by: Lei Deng, Tsinghua University, China