Massively Parallel Signal Processing Using the Graphics Processing Unit for Real-Time Brain-Computer Interface Feature Extraction

Bibliographic Details
Published in: Frontiers in Neuroengineering, Vol. 2, p. 11
Main Author: Wilson, J. Adam
Format: Journal Article
Language: English
Published: Switzerland: Frontiers Research Foundation, 2009
ISSN: 1662-6443
DOI: 10.3389/neuro.16.011.2009

More Information
Summary: The clock speeds of modern computer processors have nearly plateaued over the past five years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck: it may not be possible to process all of the data recorded from an electrode array with a high channel count and bandwidth, such as an electrocorticographic grid or other implantable system. Therefore, in this study a method was developed for using the processing capabilities of a graphics card (graphics processing unit, GPU) for real-time neural signal processing in a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which can run many operations in parallel, potentially greatly increasing the speed of existing algorithms. A BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density of every channel using an autoregressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and their speed was compared with both the current implementation and a multi-threaded central processing unit (CPU) implementation. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels of 250 ms of data in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
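
For context, the spatial-filtering step described in the summary is a dense matrix-matrix product: a filter matrix W (output channels x input channels) applied to a block of recorded data X (input channels x samples). A minimal sketch of offloading this product to the GPU with cuBLAS follows; the dimensions, buffer names, and the choice of cuBLAS over a hand-written kernel are illustrative assumptions, not the paper's actual code.

/* Hypothetical sketch: the spatial filter as C = W * X on the GPU via
 * cuBLAS. All names and sizes are illustrative; column-major layout, as
 * cuBLAS expects. Compile with: nvcc spatial_filter.cu -lcublas */
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    const int channels = 1000;  /* input channels, as in the benchmark     */
    const int samples  = 300;   /* ~250 ms of data (sampling rate assumed) */
    const int filters  = 1000;  /* spatially filtered output channels      */

    /* Host buffers; contents would come from the filter design and the
     * amplifier, and are zeroed here just so the program runs. */
    float *hW = (float*)calloc((size_t)filters * channels, sizeof(float));
    float *hX = (float*)calloc((size_t)channels * samples, sizeof(float));
    float *hC = (float*)malloc((size_t)filters * samples * sizeof(float));

    float *dW, *dX, *dC;
    cudaMalloc(&dW, (size_t)filters * channels * sizeof(float));
    cudaMalloc(&dX, (size_t)channels * samples * sizeof(float));
    cudaMalloc(&dC, (size_t)filters * samples * sizeof(float));
    cudaMemcpy(dW, hW, (size_t)filters * channels * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(dX, hX, (size_t)channels * samples * sizeof(float),
               cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const float alpha = 1.0f, beta = 0.0f;
    /* C (filters x samples) = W (filters x channels) * X (channels x samples) */
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                filters, samples, channels,
                &alpha, dW, filters, dX, channels, &beta, dC, filters);

    cudaMemcpy(hC, dC, (size_t)filters * samples * sizeof(float),
               cudaMemcpyDeviceToHost);
    printf("filtered block: %d x %d\n", filters, samples);

    cublasDestroy(handle);
    cudaFree(dW); cudaFree(dX); cudaFree(dC);
    free(hW); free(hX); free(hC);
    return 0;
}

Because every output sample depends on the same small filter matrix, this step maps almost perfectly onto the GPU's parallel multiply-accumulate hardware, consistent with the large speedup reported above.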
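The second GPU step, the autoregressive (AR) spectral estimate, parallelizes naturally across channels: once AR coefficients a_1..a_p and the driving-noise variance sigma^2 have been fit for a channel (e.g., by Burg's method), the power spectral density at normalized frequency f is P(f) = sigma^2 / |1 + sum_{k=1..p} a_k * exp(-j*2*pi*f*k)|^2. The sketch below evaluates this expression with one thread per (channel, frequency-bin) pair; the coefficient-fitting step is omitted, and the model order, grid shape, and kernel name are assumptions for illustration.

/* Hypothetical sketch: evaluating per-channel AR power spectra on the GPU.
 * One thread computes one (channel, frequency-bin) pair from already
 * fitted AR coefficients; fitting (e.g., Burg's method) is not shown. */
#include <cuda_runtime.h>
#include <stdio.h>

#define ORDER 16                 /* assumed AR model order */
#define PI    3.14159265358979f

__global__ void arPsdKernel(const float *coeffs,  /* channels x ORDER */
                            const float *sigma2,  /* channels         */
                            float *psd,           /* channels x nBins */
                            int nChannels, int nBins)
{
    int ch  = blockIdx.y;
    int bin = blockIdx.x * blockDim.x + threadIdx.x;
    if (ch >= nChannels || bin >= nBins) return;

    float f = 0.5f * (float)bin / (float)(nBins - 1); /* 0 .. Nyquist */

    /* Accumulate 1 + sum_k a_k * exp(-j*2*pi*f*k) in real/imag parts. */
    float re = 1.0f, im = 0.0f;
    for (int k = 1; k <= ORDER; ++k) {
        float a = coeffs[ch * ORDER + (k - 1)];
        float w = -2.0f * PI * f * (float)k;
        re += a * cosf(w);
        im += a * sinf(w);
    }
    psd[ch * nBins + bin] = sigma2[ch] / (re * re + im * im);
}

int main(void) {
    const int nChannels = 1000, nBins = 128;
    float *dA, *dS, *dP;
    cudaMalloc(&dA, (size_t)nChannels * ORDER * sizeof(float));
    cudaMalloc(&dS, (size_t)nChannels * sizeof(float));
    cudaMalloc(&dP, (size_t)nChannels * nBins * sizeof(float));
    cudaMemset(dA, 0, (size_t)nChannels * ORDER * sizeof(float)); /* placeholder */
    cudaMemset(dS, 0, (size_t)nChannels * sizeof(float));         /* placeholder */

    dim3 block(128);
    dim3 grid((nBins + block.x - 1) / block.x, nChannels);
    arPsdKernel<<<grid, block>>>(dA, dS, dP, nChannels, nBins);
    cudaDeviceSynchronize();
    printf("computed %d x %d PSD bins\n", nChannels, nBins);

    cudaFree(dA); cudaFree(dS); cudaFree(dP);
    return 0;
}

Since each of the 1000 channels contributes an independent spectrum, the workload decomposes into hundreds of thousands of independent threads, exactly the regime where a GPU outperforms a multi-threaded CPU implementation.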
Edited by: Michele Giugliano, Ecole Polytechnique Fédérale de Lausanne, Switzerland; University of Antwerp, Belgium
Reviewed by: Eleni Vasilaki, University of Sheffield, UK; EPFL, Switzerland; Stephan Theiss, University of Düsseldorf, Germany