NAS Parallel Benchmarks with Python: a performance and programming effort analysis focusing on GPUs

Bibliographic Details
Published in: The Journal of Supercomputing, Vol. 79, No. 8, pp. 8890-8911
Main Authors: Di Domenico, Daniel; Lima, João V. F.; Cavalheiro, Gerson G. H.
Format: Journal Article
Language: English
Published: New York: Springer US, 01.05.2023 (Springer Nature B.V.)
ISSN: 0920-8542, 1573-0484
DOI: 10.1007/s11227-022-04932-3


More Information
Summary: Compiled low-level languages, such as C/C++ and Fortran, have been the usual programming tools for implementing applications that exploit GPU devices. As a counterpoint to that trend, this paper presents a performance and programming effort analysis with Python, an interpreted, high-level language, which was used to develop the kernels and applications of the NAS Parallel Benchmarks (NPB) targeting GPUs. We used the Numba environment to enable CUDA support in Python, a tool that allows GPU programs to be written in pure Python code. Our experimental results showed that the Python applications reached performance similar to C++ programs using CUDA, and better than C++ using OpenACC, for most NPB benchmarks. Furthermore, the Python codes demanded fewer GPU-framework-related operations than CUDA, mainly because Python needs fewer statements to manage memory allocations and data transfers. Even so, our Python implementations required more such operations than the OpenACC ones.