A YOLO v3-tiny FPGA Architecture using a Reconfigurable Hardware Accelerator for Real-time Region of Interest Detection

Bibliographic Details
Published in	Proceedings (Euromicro Conference on Digital Systems Design) pp. 84 - 92
Main Authors	Herrmann, Viktor; Knapheide, Justin; Steinert, Fritjof; Stabernack, Benno
Format	Conference Proceeding
Language	English
Published	IEEE 01.08.2022
Subjects	Classification algorithms; Computational modeling; Convolutional neural networks; FPGA; Hardware; Object detection; Real-time systems; Throughput; Video sequences; YOLO
ISSN	2771-2508
DOI	10.1109/DSD57027.2022.00021

Abstract	With recent advances in machine learning, neural networks and deep-learning algorithms have become a prevalent subject in computer vision. For tasks such as object classification and detection in particular, Convolutional Neural Networks (CNNs) have surpassed the traditional approaches. Beyond these tasks, CNNs have recently also found their way into other applications. For example, the parametrization of video encoding algorithms, as used in our example, is quite a new application domain. The high recognition rate of CNNs makes them particularly suitable for finding Regions of Interest (ROIs) in video sequences, which can be used to adapt the data rate of the compressed video stream accordingly. On the downside, these CNNs require an immense amount of processing power and memory bandwidth. Object detection networks such as You Only Look Once (YOLO) try to balance processing speed and accuracy but still rely on power-hungry GPUs to meet real-time requirements. Specialized hardware such as Field Programmable Gate Array (FPGA) implementations has been shown to strongly reduce this problem while still providing sufficient computational power. In this paper we propose a flexible architecture for object detection hardware acceleration based on the YOLO v3-tiny model. The reconfigurable accelerator comprises a high-throughput convolution engine, custom blocks for all additional CNN operations, and a programmable control unit to manage on-chip execution. The model can be deployed without significant changes, based on 32-bit floating point values and without further methods that would reduce the model accuracy. Experimental results show a high capability of the design to accelerate the object detection task, with a processing time of 27.5 ms per frame. It is thus real-time-capable for 30 FPS applications at a frequency of 200 MHz.
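The real-time claim at the end of the abstract can be checked with simple arithmetic. The sketch below is not taken from the paper: it assumes that the quoted 27.5 ms covers the complete per-frame processing and that the accelerator clock stays at a constant 200 MHz. All function names are hypothetical and chosen only for illustration.

```python
# Figures quoted in the abstract.
FRAME_TIME_MS = 27.5   # reported processing time per frame
CLOCK_MHZ = 200.0      # reported operating frequency
TARGET_FPS = 30.0      # real-time target named in the abstract


def throughput_fps(frame_time_ms: float) -> float:
    """Frames per second sustained when each frame takes frame_time_ms."""
    return 1000.0 / frame_time_ms


def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame at the given frame rate."""
    return 1000.0 / target_fps


def cycles_per_frame(frame_time_ms: float, clock_mhz: float) -> int:
    """Clock cycles spent on one frame (assumes a constant clock).

    Milliseconds times megahertz gives thousands of cycles.
    """
    return round(frame_time_ms * clock_mhz * 1000.0)


def meets_real_time(frame_time_ms: float, target_fps: float) -> bool:
    """True if one frame is finished within the frame period."""
    return frame_time_ms <= frame_budget_ms(target_fps)


if __name__ == "__main__":
    budget = frame_budget_ms(TARGET_FPS)
    print(f"throughput : {throughput_fps(FRAME_TIME_MS):.1f} frames/s")
    print(f"budget     : {budget:.1f} ms per frame at {TARGET_FPS:.0f} FPS")
    print(f"headroom   : {budget - FRAME_TIME_MS:.1f} ms per frame")
    print(f"cycles     : {cycles_per_frame(FRAME_TIME_MS, CLOCK_MHZ):,} per frame")
    print(f"real-time  : {meets_real_time(FRAME_TIME_MS, TARGET_FPS)}")
```

Under these assumptions, 27.5 ms per frame corresponds to roughly 36 frames per second, which fits inside the 33.3 ms frame period of a 30 FPS stream, and to 5.5 million clock cycles per frame at 200 MHz.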
Author details	1. Herrmann, Viktor (viktor.herrmann@hhi.fraunhofer.de); 2. Knapheide, Justin (justin.knapheide@hhi.fraunhofer.de); 3. Steinert, Fritjof (fritjof.steinert@hhi-extern.fraunhofer.de); 4. Stabernack, Benno (benno.stabernack@hhi.fraunhofer.de)
Affiliation	Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany (all four authors)
CODEN	IEEPAD
Discipline	Applied Sciences
EISBN	9781665474047 1665474041
URI	https://ieeexplore.ieee.org/document/9996954