NAS Parallel Benchmarks with Python: a performance and programming effort analysis focusing on GPUs
Compiled low-level languages, such as C/C++ and Fortran, have been employed as programming tools to implement applications to explore GPU devices. As a counterpoint to that trend, this paper presents a performance and programming effort analysis with Python, an interpreted and high-level language, w...
Saved in:
| Published in | The Journal of supercomputing Vol. 79; no. 8; pp. 8890 - 8911 |
|---|---|
| Main Authors | , , |
| Format | Journal Article |
| Language | English |
| Published |
New York
Springer US
01.05.2023
Springer Nature B.V |
| Subjects | |
| Online Access | Get full text |
| ISSN | 0920-8542 1573-0484 1573-0484 |
| DOI | 10.1007/s11227-022-04932-3 |
Cover
| Summary: | Compiled low-level languages, such as C/C++ and Fortran, have been employed as programming tools to implement applications to explore GPU devices. As a counterpoint to that trend, this paper presents a performance and programming effort analysis with Python, an interpreted and high-level language, which was applied to develop the kernels and applications of NAS Parallel Benchmarks targeting GPUs. We used Numba environment to enable CUDA support in Python, a tool that allows us to implement the GPU programs with pure Python code. Our experimental results showed that Python applications reached a performance similar to C++ programs employing CUDA and better than C++ using OpenACC for most NPB benchmarks. Furthermore, Python codes demanded less operations related to the GPU framework than CUDA, mainly because Python needs a lower number of statements to manage memory allocations and data transfers. Despite that, our Python implementations required more operations than OpenACC ones. |
|---|---|
| Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
| ISSN: | 0920-8542 1573-0484 1573-0484 |
| DOI: | 10.1007/s11227-022-04932-3 |