Implementation of a motion estimation algorithm for Intel FPGAs using OpenCL

Bibliographic Details
Published in: The Journal of Supercomputing, Vol. 79, no. 9, pp. 9866-9888
Main Authors: de Castro, Manuel; Osorio, Roberto R.; Vilariño, David L.; Gonzalez-Escribano, Arturo; Llanos, Diego R.
Format: Journal Article
Language: English
Published: New York: Springer US, 01.06.2023
Springer Nature B.V.
ISSN: 0920-8542, 1573-0484
DOI: 10.1007/s11227-023-05051-3

Summary: Motion Estimation is one of the main tasks behind any video encoder. It is a computationally costly task; therefore, it is usually delegated to specific or reconfigurable hardware, such as FPGAs. Over the years, multiple FPGA implementations have been developed, mainly using hardware description languages such as Verilog or VHDL. Since programming with hardware description languages is a complex task, it is desirable to use higher-level languages to develop FPGA applications. The aim of this work is to evaluate OpenCL, in terms of expressiveness, as a tool for developing this kind of FPGA application. To do so, we present and evaluate a parallel implementation of the Block Matching Motion Estimation process using OpenCL for Intel FPGAs, usable and tested on an Intel Stratix 10 FPGA. The implementation efficiently processes Full HD frames completely inside the FPGA. In this work, we show the resource utilization when synthesizing the code on an Intel Stratix 10 FPGA, as well as a performance comparison with multiple CPU implementations with varying levels of optimization and vectorization capabilities. We also compare the proposed OpenCL implementation, in terms of resource utilization and performance, with estimations obtained from an equivalent VHDL implementation.
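
For readers unfamiliar with the Block Matching Motion Estimation process named in the summary, the following is a minimal sequential sketch in plain C. It is not the authors' OpenCL implementation, and the record does not state which matching criterion or search strategy the paper uses. The sketch assumes an exhaustive (full) search with a sum of absolute differences (SAD) cost over 8-bit luma frames. The names MotionVector, block_sad, and full_search, and all parameters, are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type: best displacement found and its SAD cost. */
typedef struct {
    int dx;
    int dy;
    unsigned sad;
} MotionVector;

/* Sum of absolute differences between the block at (cx, cy) in the current
 * frame and the block at (rx, ry) in the reference frame. Both frames are
 * assumed to be 8-bit, row-major, with the same stride. */
static unsigned block_sad(const uint8_t *cur, const uint8_t *ref, int stride,
                          int cx, int cy, int rx, int ry, int block)
{
    unsigned sad = 0;
    for (int y = 0; y < block; ++y) {
        for (int x = 0; x < block; ++x) {
            int a = cur[(cy + y) * stride + (cx + x)];
            int b = ref[(ry + y) * stride + (rx + x)];
            sad += (unsigned)abs(a - b);
        }
    }
    return sad;
}

/* Exhaustive search: try every displacement in [-range, +range] on both
 * axes, skip candidates that fall outside the reference frame, and keep
 * the first candidate with the lowest SAD. */
MotionVector full_search(const uint8_t *cur, const uint8_t *ref,
                         int width, int height,
                         int bx, int by, int block, int range)
{
    assert(block > 0 && range >= 0);
    MotionVector best = {0, 0, UINT_MAX};
    for (int dy = -range; dy <= range; ++dy) {
        for (int dx = -range; dx <= range; ++dx) {
            int rx = bx + dx;
            int ry = by + dy;
            if (rx < 0 || ry < 0 || rx + block > width || ry + block > height)
                continue;
            unsigned sad = block_sad(cur, ref, width, bx, by, rx, ry, block);
            if (sad < best.sad) {
                best.dx = dx;
                best.dy = dy;
                best.sad = sad;
            }
        }
    }
    return best;
}
```

Each candidate displacement is evaluated independently of the others, which is what makes this kind of search a natural target for parallel hardware. How the paper actually maps the work onto the FPGA is described in the full text, not in this record.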