A YOLO v3-tiny FPGA Architecture using a Reconfigurable Hardware Accelerator for Real-time Region of Interest Detection

Bibliographic Details
Published in Proceedings (Euromicro Conference on Digital Systems Design) pp. 84 - 92
Main Authors Herrmann, Viktor, Knapheide, Justin, Steinert, Fritjof, Stabernack, Benno
Format Conference Proceeding
Language English
Published IEEE 01.08.2022
ISSN 2771-2508
DOI 10.1109/DSD57027.2022.00021

Abstract With the recent advances in machine learning, neural networks and deep-learning algorithms have become a prevalent subject in computer vision. Especially for tasks like object classification and detection, Convolutional Neural Networks (CNNs) have surpassed the previous traditional approaches. Beyond these tasks, CNNs have recently been applied in other domains. For example, the parametrization of video encoding algorithms, as used in our example, is quite a new application domain. The high recognition rate of CNNs makes them particularly suitable for finding Regions of Interest (ROIs) in video sequences, which can be used to adapt the data rate of the compressed video stream accordingly. On the downside, CNNs require an immense amount of processing power and memory bandwidth. Object detection networks such as You Only Look Once (YOLO) try to balance processing speed and accuracy but still rely on power-hungry GPUs to meet real-time requirements. Specialized hardware such as Field Programmable Gate Array (FPGA) implementations has been shown to strongly reduce this problem while still providing sufficient computational power. In this paper we propose a flexible architecture for object detection hardware acceleration based on the YOLO v3-tiny model. The reconfigurable accelerator comprises a high-throughput convolution engine, custom blocks for all additional CNN operations, and a programmable control unit to manage on-chip execution. The model can be deployed without significant changes, based on 32-bit floating-point values and without further methods that would reduce the model accuracy. Experimental results show a high capability of the design to accelerate the object detection task, with a processing time of 27.5 ms per frame. It is thus real-time-capable for 30 FPS applications at a frequency of 200 MHz.
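The real-time claim in the abstract can be checked with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: it uses the three figures quoted in the abstract (27.5 ms per frame, 30 FPS target, 200 MHz) and assumes, as a simplification, that frames are processed one after another so that throughput is the reciprocal of the per-frame time. The function names are hypothetical.

```python
# Figures quoted in the abstract.
FRAME_TIME_MS = 27.5   # reported processing time per frame
TARGET_FPS = 30        # real-time target stated in the abstract
CLOCK_HZ = 200e6       # reported operating frequency (200 MHz)


def achievable_fps(frame_time_ms: float) -> float:
    """Frames per second, assuming frames are processed back to back."""
    return 1000.0 / frame_time_ms


def frame_budget_ms(target_fps: float) -> float:
    """Maximum time per frame that still sustains the target frame rate."""
    return 1000.0 / target_fps


def cycles_per_frame(frame_time_ms: float, clock_hz: float) -> float:
    """Clock cycles that elapse during one frame at the given frequency."""
    return frame_time_ms * 1e-3 * clock_hz


def meets_target(frame_time_ms: float, target_fps: float) -> bool:
    """True if the per-frame time fits inside the per-frame budget."""
    return frame_time_ms <= frame_budget_ms(target_fps)


if __name__ == "__main__":
    print(f"achievable FPS:   {achievable_fps(FRAME_TIME_MS):.2f}")
    print(f"budget per frame: {frame_budget_ms(TARGET_FPS):.2f} ms")
    print(f"cycles per frame: {cycles_per_frame(FRAME_TIME_MS, CLOCK_HZ):.0f}")
    print(f"meets 30 FPS:     {meets_target(FRAME_TIME_MS, TARGET_FPS)}")
```

Under these assumptions, 27.5 ms per frame corresponds to roughly 36 frames per second, which fits inside the 33.3 ms budget of a 30 FPS stream, and one frame spans about 5.5 million clock cycles at 200 MHz.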
Author_xml – sequence: 1
  givenname: Viktor
  surname: Herrmann
  fullname: Herrmann, Viktor
  email: viktor.herrmann@hhi.fraunhofer.de
  organization: Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany
– sequence: 2
  givenname: Justin
  surname: Knapheide
  fullname: Knapheide, Justin
  email: justin.knapheide@hhi.fraunhofer.de
  organization: Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany
– sequence: 3
  givenname: Fritjof
  surname: Steinert
  fullname: Steinert, Fritjof
  email: fritjof.steinert@hhi-extern.fraunhofer.de
  organization: Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany
– sequence: 4
  givenname: Benno
  surname: Stabernack
  fullname: Stabernack, Benno
  email: benno.stabernack@hhi.fraunhofer.de
  organization: Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany
CODEN IEEPAD
ContentType Conference Proceeding
Discipline Applied Sciences
EISBN 9781665474047
1665474041
EISSN 2771-2508
EndPage 92
ExternalDocumentID 9996954
Genre orig-research
IsPeerReviewed false
IsScholarly true
Language English
PageCount 9
ParticipantIDs ieee_primary_9996954
PublicationCentury 2000
PublicationDate 2022-Aug.
PublicationDateYYYYMMDD 2022-08-01
PublicationDate_xml – month: 08
  year: 2022
  text: 2022-Aug.
PublicationDecade 2020
PublicationTitle Proceedings (Euromicro Conference on Digital Systems Design)
PublicationTitleAbbrev DSD
PublicationYear 2022
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 84
SubjectTerms Classification algorithms
Computational modeling
Convolutional neural networks
FPGA
Hardware
Object detection
Real-time systems
Throughput
Video sequences
YOLO
Title A YOLO v3-tiny FPGA Architecture using a Reconfigurable Hardware Accelerator for Real-time Region of Interest Detection
URI https://ieeexplore.ieee.org/document/9996954