A YOLO v3-tiny FPGA Architecture using a Reconfigurable Hardware Accelerator for Real-time Region of Interest Detection
With the recent advances in the fields of machine learning, neural networks and deep-learning algorithms have become a prevalent subject of computer vision. Especially for tasks like object classification and detection, Convolutional Neural Networks (CNNs) have surpassed the previous traditional a...
| Published in | Proceedings (Euromicro Conference on Digital Systems Design) pp. 84-92 |
|---|---|
| Main Authors | Herrmann, Viktor; Knapheide, Justin; Steinert, Fritjof; Stabernack, Benno |
| Format | Conference Proceeding | 
| Language | English | 
| Published | IEEE, 01.08.2022 |
| Subjects | |
| Online Access | Get full text | 
| ISSN | 2771-2508 | 
| DOI | 10.1109/DSD57027.2022.00021 | 
| Abstract | With the recent advances in machine learning, neural networks and deep-learning algorithms have become a prevalent subject in computer vision. Especially for tasks like object classification and detection, Convolutional Neural Networks (CNNs) have surpassed previous traditional approaches. Beyond these tasks, CNNs have recently appeared in other application domains as well; one quite new example, used in this work, is the parametrization of video encoding algorithms. The high recognition rate of CNNs makes them particularly suitable for finding Regions of Interest (ROIs) in video sequences, which can be used to adapt the data rate of the compressed video stream accordingly. On the downside, CNNs require an immense amount of processing power and memory bandwidth. Object detection networks such as You Only Look Once (YOLO) try to balance processing speed and accuracy but still rely on power-hungry GPUs to meet real-time requirements. Specialized hardware such as Field Programmable Gate Array (FPGA) implementations has proved to strongly reduce this problem while still providing sufficient computational power. In this paper we propose a flexible architecture for object detection hardware acceleration based on the YOLO v3-tiny model. The reconfigurable accelerator comprises a high-throughput convolution engine, custom blocks for all additional CNN operations, and a programmable control unit to manage on-chip execution. The model can be deployed without significant changes, based on 32-bit floating-point values and without further methods that would reduce the model accuracy. Experimental results show a high capability of the design to accelerate the object detection task, with a processing time of 27.5 ms per frame. It is thus real-time-capable for 30 FPS applications at a frequency of 200 MHz. |
|---|---|
| Author | Steinert, Fritjof; Stabernack, Benno; Knapheide, Justin; Herrmann, Viktor |
    
| Author_xml | 1. Herrmann, Viktor (viktor.herrmann@hhi.fraunhofer.de), Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany; 2. Knapheide, Justin (justin.knapheide@hhi.fraunhofer.de), Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany; 3. Steinert, Fritjof (fritjof.steinert@hhi-extern.fraunhofer.de), Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany; 4. Stabernack, Benno (benno.stabernack@hhi.fraunhofer.de), Fraunhofer Institute for Telecommunications, Heinrich Hertz Institute, Berlin, Germany |
    
| CODEN | IEEPAD | 
    
| ContentType | Conference Proceeding | 
    
| DBID | 6IE 6IL CBEJK RIE RIL  | 
    
| DOI | 10.1109/DSD57027.2022.00021 | 
    
| DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Xplore POP ALL IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP All) 1998-Present  | 
    
| Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher  | 
    
| DeliveryMethod | fulltext_linktorsrc | 
    
| Discipline | Applied Sciences | 
    
| EISBN | 9781665474047 1665474041  | 
    
| EISSN | 2771-2508 | 
    
| EndPage | 92 | 
    
| ExternalDocumentID | 9996954 | 
    
| Genre | orig-research | 
    
| IEDL.DBID | RIE | 
    
| IngestDate | Wed Aug 27 02:14:38 EDT 2025 | 
    
| IsPeerReviewed | false | 
    
| IsScholarly | true | 
    
| Language | English | 
    
| LinkModel | DirectLink | 
    
| PageCount | 9 | 
    
| ParticipantIDs | ieee_primary_9996954 | 
    
| PublicationCentury | 2000 | 
    
| PublicationDate | 2022-Aug. | 
    
| PublicationDateYYYYMMDD | 2022-08-01 | 
    
| PublicationDate_xml | – month: 08 year: 2022 text: 2022-Aug.  | 
    
| PublicationDecade | 2020 | 
    
| PublicationTitle | Proceedings (Euromicro Conference on Digital Systems Design) | 
    
| PublicationTitleAbbrev | DSD | 
    
| PublicationYear | 2022 | 
    
| Publisher | IEEE | 
    
| Publisher_xml | – name: IEEE | 
    
| SourceID | ieee | 
    
| SourceType | Publisher | 
    
| StartPage | 84 | 
    
| SubjectTerms | Classification algorithms; Computational modeling; Convolutional neural networks; FPGA; Hardware; Object detection; Real-time systems; Throughput; Video sequences; YOLO |
    
| Title | A YOLO v3-tiny FPGA Architecture using a Reconfigurable Hardware Accelerator for Real-time Region of Interest Detection | 
    
| URI | https://ieeexplore.ieee.org/document/9996954 | 
    
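
The real-time claim in the abstract rests on simple arithmetic: 27.5 ms per frame has to fit inside the frame period of a 30 FPS stream. The sketch below checks this using only the three figures quoted in the abstract. The function names are illustrative, and the cycle count assumes that frames are processed strictly one after another with the accelerator clocked at 200 MHz throughout, which the record does not confirm.

```python
# Figures quoted in the abstract.
FRAME_TIME_MS = 27.5       # processing time per frame
CLOCK_HZ = 200_000_000     # accelerator clock frequency
TARGET_FPS = 30            # real-time target

def frames_per_second(frame_time_ms: float) -> float:
    """Throughput if frames are processed back to back (assumption)."""
    return 1000.0 / frame_time_ms

def frame_budget_ms(fps: float) -> float:
    """Time available per frame at a given frame rate."""
    return 1000.0 / fps

def cycles_per_frame(frame_time_ms: float, clock_hz: int) -> float:
    """Clock cycles elapsed while one frame is processed (assumption:
    the clock runs at clock_hz for the whole frame time)."""
    return frame_time_ms * 1e-3 * clock_hz

achieved_fps = frames_per_second(FRAME_TIME_MS)
budget_ms = frame_budget_ms(TARGET_FPS)
cycles = cycles_per_frame(FRAME_TIME_MS, CLOCK_HZ)
meets_real_time = FRAME_TIME_MS <= budget_ms

print(f"achieved throughput: {achieved_fps:.1f} FPS")
print(f"budget at {TARGET_FPS} FPS: {budget_ms:.1f} ms per frame")
print(f"cycles per frame: {cycles:,.0f}")
print(f"meets real-time target: {meets_real_time}")
```

Under these assumptions the design runs at roughly 36 FPS against a budget of about 33.3 ms per frame, leaving a margin of nearly 6 ms, and one frame corresponds to about 5.5 million clock cycles.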