Real-time optical flow processing on embedded GPU: an hardware-aware algorithm to implementation strategy

Bibliographic Details
Published in	Journal of real-time image processing, Vol. 19, no. 2, pp. 317–329
Main Authors	Seznec, Mickaël; Gac, Nicolas; Orieux, François; Naik, Alvin Sashala
Format	Journal Article
Language	English
Published	Berlin/Heidelberg: Springer Berlin Heidelberg, 01.04.2022 (Springer Nature B.V; Springer Verlag)
Subjects	Accuracy; Algorithms; Computer Graphics; Computer Science; Computer vision; Embedded systems; Engineering Sciences; Field programmable gate arrays; Graphics processing units; Hardware; Image Processing and Computer Vision; Impact analysis; Industrial applications; Linear algebra; Methods; Microprocessors; Multimedia Information Systems; Numerical analysis; Optical flow (image analysis); Optimization; Original Research Paper; Pattern Recognition; Real time; Signal and Image processing; Signal, Image and Speech Processing; Solvers; GPU optimization; Linear solvers; Algorithm design; Image processing; Optical flow
ISSN	1861-8200; 1861-8219
DOI	10.1007/s11554-021-01187-8

Summary:	Determining the optical flow of a video is a compute-intensive task essential for computer vision. To achieve this processing in real time, the whole algorithm deployment chain must be designed for efficiency first. The development is usually divided into two parts: first, designing an algorithm that meets precision constraints; then, implementing and optimizing its execution on the targeted platform. We argue that unifying those operations enhances performance on the embedded processor. This paper is based on an industrial use case of computer vision. The objective is to determine dense optical flow in real time on an embedded GPU platform: the Nvidia AGX Xavier. The CLG (combined local–global) optical flow method, initially chosen, is analyzed to understand the convergence speed of its underlying optimization problem. The Jacobi solver is selected for implementation because of its parallel nature. The whole multi-level processing is then ported to the GPU, using several specific optimization strategies. In particular, we use the roofline model to analyze the impact of fusing the solver’s iterations. As a result, with a 30 W power budget, our implementation runs at 60 FPS on 640 × 512 images with four-level processing. We hope this example provides feedback on the issues that arise when porting a method to a parallel platform and serves as a basis for further implementations of computer vision algorithms on specialized hardware.
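The roofline reasoning mentioned in the summary can be illustrated with a minimal sketch. It is not the authors' analysis: the function names and every number below are hypothetical placeholders, not AGX Xavier specifications or figures from the paper. The roofline model bounds attainable throughput by the smaller of the compute peak and memory bandwidth times arithmetic intensity. The sketch assumes, in an idealized way, that fusing k solver iterations into one pass over memory multiplies the arithmetic intensity by k, ignoring the halo reads and redundant computation that fusion needs in practice.

```python
def attainable_gflops(peak_gflops, bandwidth_gbs, intensity):
    """Roofline bound: min(compute roof, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def fused_intensity(flops_per_pixel_iter, bytes_per_pixel_pass, fused_iters):
    """Arithmetic intensity (FLOP/byte) when `fused_iters` iterations
    share a single read/write pass over memory.

    Idealized assumption: memory traffic per pass stays constant,
    so halo overhead from fusion is ignored.
    """
    return fused_iters * flops_per_pixel_iter / bytes_per_pixel_pass


# Hypothetical device and kernel parameters, chosen only for illustration.
PEAK_GFLOPS = 1000.0        # assumed compute roof
BANDWIDTH_GBS = 100.0       # assumed memory bandwidth
FLOPS_PER_PIXEL_ITER = 20.0   # assumed work per pixel per iteration
BYTES_PER_PIXEL_PASS = 40.0   # assumed traffic per pixel per memory pass

if __name__ == "__main__":
    for k in (1, 2, 4, 8, 16, 32):
        ai = fused_intensity(FLOPS_PER_PIXEL_ITER, BYTES_PER_PIXEL_PASS, k)
        bound = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GBS, ai)
        regime = "memory-bound" if bound < PEAK_GFLOPS else "compute-bound"
        print(f"fused iterations={k:2d}  intensity={ai:5.2f} FLOP/B  "
              f"bound={bound:7.1f} GFLOP/s  ({regime})")
```

With these placeholder values the kernel stays memory-bound until enough iterations are fused to pass the ridge point (peak divided by bandwidth), after which further fusion no longer raises the bound.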