d2o: a distributed data object for parallel high-performance computing in Python
| Published in | Journal of Big Data, Vol. 3, no. 1, pp. 1-34 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | Cham: Springer International Publishing, 15.09.2016; Springer Nature B.V |
| ISSN | 2196-1115 |
| DOI | 10.1186/s40537-016-0052-5 |
| Summary: | We introduce d2o, a Python module for cluster-distributed multi-dimensional numerical arrays. It acts as a layer of abstraction between the algorithm code and the data-distribution logic. The main goal is to achieve usability without losing numerical performance and scalability. d2o’s global interface is similar to the one of a numpy.ndarray, whereas the cluster node’s local data is directly accessible for use in customized high-performance modules. d2o is written in pure Python which makes it portable and easy to use and modify. Expensive operations are carried out by dedicated external libraries like numpy and mpi4py. The performance of d2o is on a par with numpy for serial applications and scales well when moving to an MPI cluster. d2o is open-source software available under the GNU General Public License v3 (GPL-3) at https://gitlab.mpcdf.mpg.de/ift/D2O. |
|---|---|
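
The summary describes an object with a global, array-like interface on top of node-local data. The sketch below illustrates that general idea only and is not d2o's API: the names `block_slices` and `ToyDistributedArray` are hypothetical, the array is one-dimensional, and the "nodes" are simulated inside a single process with plain Python lists rather than MPI ranks holding numpy arrays.

```python
# Illustrative sketch only, NOT d2o's API. block_slices and
# ToyDistributedArray are hypothetical names, and the "nodes" are
# simulated in one process instead of being MPI ranks.

def block_slices(length, n_nodes):
    """Split range(length) into n_nodes contiguous, near-equal slices."""
    base, extra = divmod(length, n_nodes)
    slices, start = [], 0
    for rank in range(n_nodes):
        # The first `extra` nodes each take one additional element.
        stop = start + base + (1 if rank < extra else 0)
        slices.append(slice(start, stop))
        start = stop
    return slices


class ToyDistributedArray:
    """A 1-D array stored as one chunk per simulated node."""

    def __init__(self, data, n_nodes):
        data = list(data)
        self.slices = block_slices(len(data), n_nodes)
        # Each simulated node holds only its own chunk of the global data.
        self.local_data = [data[s] for s in self.slices]

    def __len__(self):
        return sum(len(chunk) for chunk in self.local_data)

    def __getitem__(self, index):
        # Global integer indexing: locate the node that owns the index.
        if index < 0:
            index += len(self)
        for s, chunk in zip(self.slices, self.local_data):
            if s.start <= index < s.stop:
                return chunk[index - s.start]
        raise IndexError(index)

    def sum(self):
        # Every node reduces its own chunk, then the partial sums are
        # combined (an MPI version would use a collective reduction here).
        return sum(sum(chunk) for chunk in self.local_data)


arr = ToyDistributedArray(range(10), n_nodes=3)
print([len(chunk) for chunk in arr.local_data])  # chunk sizes per node
print(arr[4], arr.sum())                          # global view of the data
```

Algorithm code in this sketch only touches the global methods (`__getitem__`, `sum`), while `local_data` stays available for code that wants to work on one node's chunk directly.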