An Optimised CNN Hardware Accelerator Applicable to IoT End Nodes for Disruptive Healthcare

Bibliographic Details
Published in	IoT, Vol. 5, no. 4, pp. 901-921
Main Authors	Ghani, Arfan; Aina, Akinyemi; Hwang See, Chan
Format	Journal Article
Language	English
Published	Montreal MDPI AG 01.12.2024
Subjects	Accuracy; applied artificial intelligence; Classification; CNN; computer vision; Datasets; Deep learning; Energy efficiency; Field programmable gate arrays; integer-based architecture; Internet of Things; IoT for healthcare; Machine learning; Optimization; Software
ISSN	2624-831X
DOI	10.3390/iot5040041

More Information
Summary:	In the evolving landscape of computer vision, the integration of machine learning algorithms with cutting-edge hardware platforms is increasingly pivotal, especially in the context of disruptive healthcare systems. This study introduces an optimized implementation of a Convolutional Neural Network (CNN) on the Basys3 FPGA, designed specifically to accelerate the classification of cytotoxicity in human kidney cells. To address the challenges posed by constrained dataset sizes, compute-intensive AI algorithms, and hardware limitations, the approach presented in this paper leverages efficient image augmentation and pre-processing techniques to enhance both prediction accuracy and training efficiency. The CNN, quantized to 8-bit precision and tailored to the FPGA’s resource constraints, accelerates training by a factor of three while consuming only 1.33% of the power of a traditional software-based CNN running on an NVIDIA K80 GPU. The network architecture, composed of seven layers with excessive hyperparameters, processes downscaled grayscale images, achieving notable gains in speed and energy efficiency. Our methodology emphasizes parallel processing, data type optimization, and reduced logic space usage through 8-bit integer operations. We conducted extensive image pre-processing, including histogram equalization and artefact removal, to maximize feature extraction from the augmented dataset. Achieving an accuracy of approximately 91% on unseen images, this FPGA-implemented CNN demonstrates the potential for rapid, low-power medical diagnostics within a broader IoT ecosystem where data could be assessed online. This work underscores the feasibility of deploying resource-efficient AI models in environments where traditional high-performance computing resources are unavailable, as is typical in healthcare settings, and contributes to advanced computer vision techniques in embedded systems.
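
The summary mentions 8-bit quantization and integer operations but gives no detail of the scheme used. As a rough illustration only, the sketch below shows one common approach (symmetric per-tensor quantization, which is an assumption here and not necessarily the authors' method): floats are mapped to integers in [-127, 127], the multiply-accumulate runs entirely on integers, and a single rescale recovers an approximate floating-point result. The function names `quantize_int8` and `int8_dot` and the sample values are hypothetical.

```python
def quantize_int8(values):
    """Symmetric quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed 8-bit range.
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def int8_dot(q_a, scale_a, q_b, scale_b):
    """Integer multiply-accumulate, rescaled to float once at the end."""
    acc = sum(x * y for x, y in zip(q_a, q_b))  # integer-only arithmetic
    return acc * scale_a * scale_b


# Hypothetical filter weights and normalized pixel values.
weights = [0.5, -0.25, 0.125, 1.0]
pixels = [0.2, 0.4, 0.6, 0.8]

q_w, s_w = quantize_int8(weights)
q_p, s_p = quantize_int8(pixels)

approx = int8_dot(q_w, s_w, q_p, s_p)
exact = sum(w * p for w, p in zip(weights, pixels))
print(f"exact={exact:.4f} approx={approx:.4f}")
```

On hardware such as an FPGA, the integer accumulation is the part that maps to compact logic; the final rescale can often be folded into a later stage.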