Feasibility Analysis of Machine Learning Optimization on GPU-based Low-cost Edges
| Published in | 2021 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/IOP/SCI), pp. 89-96 |
|---|---|
| Main Authors | , , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.10.2021 |
| DOI | 10.1109/SWC50871.2021.00022 |
| Summary: | Many AI algorithms have been deployed on edge devices because edge computing reduces latency, saves network bandwidth, and protects data privacy. Whether edge devices can run AI algorithms at all is an important challenge, given their low-power and low-cost characteristics. This paper therefore analyzes the performance of optimization techniques by running YOLOv3, a representative object detection algorithm widely used as a benchmark in AI scenarios, on a typical GPU-based low-cost edge device, the NVIDIA Jetson Nano. We compared the latency, memory, and power consumption of three deep learning frameworks: TensorFlow, PyTorch, and TensorRT. We then squeezed out further performance using multiple optimization techniques on TensorRT, including model quantization, model parallelization, and image scaling. The running speed of YOLOv3 increases from 3.9 FPS to 13.1 FPS on the NVIDIA Jetson Nano, showing that a resource-limited edge device can run compute-intensive AI applications in real time. Moreover, we summarize nine observations and five insights to guide the selection and design of optimization techniques, and we verify that these rules generalize to the NVIDIA Jetson Xavier NX. We also provide a series of suggestions to help developers choose appropriate methods for deploying AI algorithms on edge devices. |
|---|---|
| DOI: | 10.1109/SWC50871.2021.00022 |
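The summary names model quantization on TensorRT as one of the applied optimization techniques. As an illustration only (the paper's code is not part of this record), below is a minimal sketch of building an FP16 TensorRT engine from an ONNX export of YOLOv3, roughly matching the TensorRT 7/8-era Python API shipped with JetPack on the Jetson Nano; the file name `yolov3.onnx`, the helper name `build_fp16_engine`, and the workspace size are assumptions, not details taken from the paper.

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_fp16_engine(onnx_path="yolov3.onnx"):
    """Build a TensorRT engine with FP16 quantization enabled (illustrative sketch)."""
    builder = trt.Builder(TRT_LOGGER)
    # An explicit-batch network is required when parsing ONNX models.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("failed to parse ONNX model")

    config = builder.create_builder_config()
    # 256 MiB workspace: an assumed, modest value for the Nano's shared CPU/GPU memory.
    config.max_workspace_size = 1 << 28
    if builder.platform_has_fast_fp16:
        # Enable half-precision (FP16) quantization when the GPU supports it.
        config.set_flag(trt.BuilderFlag.FP16)

    return builder.build_engine(network, config)

if __name__ == "__main__":
    engine = build_fp16_engine()
    print("engine built:", engine is not None)
```

FP16 is the practical quantization level on the Jetson Nano's Maxwell-class GPU; INT8 would additionally require a calibration dataset and hardware INT8 support. Image scaling, another technique named in the summary, would typically be reflected in the input resolution baked into the ONNX export rather than in this build step.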