OPTIMIZATIONS OF DEEP LEARNING OBJECTS DETECTION MODELS FOR INFERENCE ACCELERATION ON GENERAL-PURPOSE AND HARDWARE-ACCELERATED SINGLE-BOARD PLATFORMS

Bibliographic Details
Published in: Електроніка та інформаційні технології [Electronics and Information Technologies], Vol. 29; no. 29; pp. 57-68
Main Authors: Myroniuk, Dmytro; Blagitko, Bohdan
Format: Journal Article
Language: English
Published: Ivan Franko National University of Lviv, 01.03.2025
ISSN: 2224-087X
2224-0888
DOI: 10.30970/eli.29.6

Summary: Background. Modern approaches to deep learning object detection models are described and prepared. Deep learning frameworks for model training and inference, TensorFlow and TensorFlow Lite, are used as the basis, and the concepts of deep learning model optimization are analyzed. Materials and Methods. Quantized int8 models are used as the baseline for estimating the effectiveness of the optimizations. The delegation approach, which substitutes software- or hardware-optimized variants of neural network operations, is prepared to speed up inference on the target devices. Devices with limited performance resources, and microcontrollers without floating-point units, use the base-optimization case of a model with int8 weights. The quantization types available in the TensorFlow Lite framework are explained in detail. Benchmarks for modern single-board devices are prepared, and the correlation between the optimization approach used, the type of single-board platform, and model inference speed is analyzed. Results and Discussion. All tested models are pretrained on the MS COCO dataset (80 classes). All models were prepared for the experiment with 8-bit full integer quantization, and the output TFLite models were generated using TensorFlow Object Detection API Docker images and Python 3.11. The test samples are taken from the MS COCO validation dataset archive. The model input is a 640x640 RGB image. Recognition times for 640x640 RGB images were compared on the Raspberry Pi 5, Raspberry Pi 4, and Jetson Nano 2GB. Only the Raspberry Pi 5 achieved real-time execution (at most 100 ms per frame, i.e., at least 10 fps), as it has more CPU performance than the other devices. Conclusion. Real-time execution was confirmed by using the reference models with a reduced input size (320x320 RGB). For the Jetson Nano target, standard TensorFlow Model Zoo models compiled with the TensorRT compiler were used as the NPU-optimized case.
With this approach, real-time execution (at most 100 ms per frame, i.e., at least 10 fps) is reached for most models and target devices, and the approach is also suitable for less powerful devices with ARM Cortex-A processors.
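As a rough illustration of the 8-bit full integer quantization discussed in the summary, the standard affine mapping q = round(x / scale) + zero_point can be sketched in plain Python. The helper names and the sample weight values below are hypothetical, chosen only to show the arithmetic, not taken from the article:

```python
def quantize_int8(values, lo, hi):
    """Affine int8 quantization: map floats in [lo, hi] onto [-128, 127]."""
    scale = (hi - lo) / 255.0              # float value of one int8 step
    zero_point = round(-128 - lo / scale)  # int8 code that represents 0.0
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point

def dequantize(codes, scale, zero_point):
    """Recover approximate float values from int8 codes."""
    return [(q - zero_point) * scale for q in codes]

# Hypothetical weight values spanning the range [-1.0, 1.0]
weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
codes, scale, zp = quantize_int8(weights, -1.0, 1.0)
approx = dequantize(codes, scale, zp)
```

Each value is recovered to within one quantization step (scale), which is why int8 models trade a small accuracy loss for a 4x smaller footprint than float32 and faster integer arithmetic on CPUs without floating-point units.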
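The real-time criterion above (at most 100 ms per frame) can be checked with a minimal timing harness of the kind used for such benchmarks; run_inference below is a hypothetical stand-in for an actual TFLite interpreter invocation, not code from the article:

```python
import time

REAL_TIME_BUDGET_S = 0.100  # 100 ms per frame, i.e. at least 10 fps

def run_inference(frame):
    # Hypothetical stand-in for interpreter.invoke() on a real model.
    time.sleep(0.001)
    return []

def mean_latency(frames, warmup=2):
    """Average per-frame latency in seconds, after untimed warm-up runs."""
    for f in frames[:warmup]:
        run_inference(f)           # warm caches / lazy initialization
    start = time.perf_counter()
    for f in frames:
        run_inference(f)
    return (time.perf_counter() - start) / len(frames)

latency = mean_latency([None] * 10)
fps = 1.0 / latency
is_real_time = latency <= REAL_TIME_BUDGET_S
```

Averaging over many frames and discarding warm-up iterations matters on single-board devices, where the first few invocations include one-off costs such as memory allocation and delegate initialization.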
ISSN:2224-087X
2224-0888
2224-0888
DOI:10.30970/eli.29.6