A hardware acceleration for surface EMG non-negative matrix factorization

Bibliographic Details
Published in: 2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 168-174
Main Authors: Cerina, Luca; Cancian, Pierandrea; Franco, Giuseppe; Santambrogio, Marco Domenico
Format: Conference Proceeding
Language: English
Published: IEEE, 01.05.2017
DOI: 10.1109/IPDPSW.2017.66

Summary: To date, many studies aim to understand how the Central Nervous System (CNS) translates neural pulses into muscle motor tasks through the analysis of surface EMG (sEMG) recordings. One of the most prominent methods applies Non-Negative Matrix Factorization (NMF) to data recorded from sEMG electrodes to extract coordinated motor patterns, the so-called muscle synergies, which are hypothesized to reflect a modular control of the muscles by the CNS. NMF could lead to novel applications in muscular rehabilitation or advanced prosthesis motor control. However, its use is still restricted to research laboratories, because the complexity of the NMF algorithm does not allow an efficient implementation on embedded systems such as portable or wearable devices. This paper presents an FPGA-based hardware acceleration of an NMF algorithm, the Projected Gradient Alternating Least Squares (ALS-PG), on a Xilinx Zynq-7000 System-on-Chip (SoC) development board. In particular, we propose a case study for the pre-processing of sEMG in a hand prosthesis control system, where the NMF is subject to real-time processing constraints. Preliminary hardware optimizations produced an overall speedup of 2.92x (23.9x considering only the accelerated sections) compared to the software implementation executed on the board's ARM core. The real-time constraint could not be met because the entire process is 6.14x slower than a standard CPU implementation; nevertheless, the overall system showed a promising 1.65x increase in power efficiency.
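
For readers unfamiliar with the factorization named in the summary, the following is a minimal software sketch of NMF with alternating projected-gradient updates, written with NumPy. It is not the authors' implementation or their FPGA design: the function names, the fixed step size (used here in place of a line search), the iteration counts, and the synthetic data are all assumptions made for illustration. The matrix V stands in for rectified sEMG envelopes (channels x samples), W for the synergy weights, and H for their activations.

```python
import numpy as np


def nmf_projected_gradient(V, rank, outer_iters=200, inner_iters=5, seed=0):
    """Approximate a non-negative matrix V (channels x samples) as W @ H.

    W (channels x rank) and H (rank x samples) are kept non-negative by
    projecting onto the non-negative orthant after every gradient step.
    """
    rng = np.random.default_rng(seed)
    n_channels, n_samples = V.shape
    W = rng.random((n_channels, rank))
    H = rng.random((rank, n_samples))
    eps = 1e-12  # guards against division by zero in the step size

    for _ in range(outer_iters):
        # Update H with W fixed: gradient of 0.5 * ||V - W H||_F^2 w.r.t. H.
        WtW, WtV = W.T @ W, W.T @ V
        step = 1.0 / (np.trace(WtW) + eps)  # trace bounds the largest eigenvalue
        for _ in range(inner_iters):
            H = np.maximum(H - step * (WtW @ H - WtV), 0.0)

        # Update W with H fixed, using the symmetric rule.
        HHt, VHt = H @ H.T, V @ H.T
        step = 1.0 / (np.trace(HHt) + eps)
        for _ in range(inner_iters):
            W = np.maximum(W - step * (W @ HHt - VHt), 0.0)

    return W, H


def relative_error(V, W, H):
    """Frobenius-norm reconstruction error, relative to the norm of V."""
    return np.linalg.norm(V - W @ H) / np.linalg.norm(V)


# Synthetic stand-in for 8 sEMG channels driven by 3 synergies (assumed sizes).
demo_rng = np.random.default_rng(1)
V_demo = demo_rng.random((8, 3)) @ demo_rng.random((3, 500))
W_demo, H_demo = nmf_projected_gradient(V_demo, rank=3)
print("relative reconstruction error:", relative_error(V_demo, W_demo, H_demo))
```

The repeated matrix products inside the two inner loops dominate the cost of a sketch like this, which gives a sense of why such an algorithm is a candidate for hardware acceleration on an embedded platform.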