Feasibility Analysis of Machine Learning Optimization on GPU-based Low-cost Edges
| Published in | 2021 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/IOP/SCI), pp. 89-96 |
|---|---|
| Main Authors | , , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.10.2021 |
| DOI | 10.1109/SWC50871.2021.00022 |
| Summary: | Many AI algorithms have been deployed on edge devices because edge computing reduces latency, saves network bandwidth, and protects data privacy. Whether edge devices can run AI algorithms at all is an important challenge, given their low-power and low-cost characteristics. This paper therefore analyzes the performance of optimization techniques by running YOLOv3, a representative object detection algorithm widely used as a benchmark in AI scenarios, on a typical GPU-based low-cost edge device, the NVIDIA Jetson Nano. We compared the latency, memory, and power consumption of three deep learning frameworks: TensorFlow, PyTorch, and TensorRT. We then squeezed out further performance using multiple optimization techniques on TensorRT, including model quantization, model parallelization, and image scaling. The running speed of YOLOv3 increases from 3.9 FPS to 13.1 FPS on the NVIDIA Jetson Nano, showing that a resource-limited edge device can run compute-intensive AI applications in real time. Moreover, we summarize nine observations and five insights to guide the selection and design of optimization techniques, and we verify that these rules generalize to the NVIDIA Jetson Xavier NX. We also provide a series of suggestions to help developers choose appropriate methods for deploying AI algorithms on edge devices. |
|---|---|
| DOI: | 10.1109/SWC50871.2021.00022 |
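The summary names model quantization on TensorRT as one of the applied optimization techniques. As an illustration only (the paper's code is not part of this record), below is a minimal sketch of building an FP16 TensorRT engine from an ONNX export of YOLOv3, roughly matching the TensorRT 7/8-era Python API shipped with JetPack on the Jetson Nano; the file name `yolov3.onnx`, the helper name `build_fp16_engine`, and the workspace size are assumptions, not details taken from the paper.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_fp16_engine(onnx_path="yolov3.onnx"):
    """Build a TensorRT engine with FP16 quantization enabled (illustrative sketch)."""
    builder = trt.Builder(TRT_LOGGER)
    # An explicit-batch network is required when parsing ONNX models.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("failed to parse ONNX model")

    config = builder.create_builder_config()
    # 256 MiB workspace: an assumed, modest value for the Nano's shared CPU/GPU memory.
    config.max_workspace_size = 1 << 28
    if builder.platform_has_fast_fp16:
        # Enable half-precision (FP16) quantization when the GPU supports it.
        config.set_flag(trt.BuilderFlag.FP16)

    return builder.build_engine(network, config)

if __name__ == "__main__":
    engine = build_fp16_engine()
    print("engine built:", engine is not None)
```

FP16 is the practical quantization level on the Jetson Nano's Maxwell-class GPU; INT8 would additionally require a calibration dataset and hardware INT8 support. Image scaling, another technique named in the summary, would typically be reflected in the input resolution baked into the ONNX export rather than in this build step.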