Performance Measurements of the 3D FFT on the Blue Gene/L Supercomputer
| Published in | Euro-Par 2005 Parallel Processing pp. 795 - 803 |
|---|---|
| Main Authors | , , , , |
| Format | Book Chapter, Conference Proceeding |
| Language | English |
| Published | Berlin, Heidelberg: Springer Berlin Heidelberg (Springer), 2005 |
| Series | Lecture Notes in Computer Science |
| ISBN | 3540287000 9783540287001 |
| ISSN | 0302-9743 1611-3349 |
| DOI | 10.1007/11549468_87 |
| Summary: | This paper presents performance characteristics of a communications-intensive kernel, the complex data 3D FFT, running on the Blue Gene/L architecture. Two implementations of the volumetric FFT algorithm were characterized, one built on the MPI library using an optimized collective all-to-all operation [2] and another built on a low-level System Programming Interface (SPI) of the Blue Gene/L Advanced Diagnostics Environment (BG/L ADE) [17]. We compare the current results to those obtained using a reference MPI implementation (MPICH2 ported to BG/L with unoptimized collectives) and to a port of version 2.1.5 of the FFTW library [14]. Performance experiments on the Blue Gene/L prototype indicate that both of our implementations scale well, and the current MPI-based implementation shows a speedup of 730 on 2048 nodes for 3D FFTs of size 128 × 128 × 128. Moreover, the volumetric FFT outperforms the FFTW port by a factor of 8 for a 128 × 128 × 128 complex FFT on 2048 nodes. |
|---|---|
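
To put the figures quoted in the summary in perspective, the following is a rough worked calculation rather than a result taken from the paper. It assumes the speedup of 730 is measured relative to a single node and that the 128 × 128 × 128 grid is divided evenly among the nodes; the summary states neither. The symbols $S$, $P$, $E$, and $N$ are introduced here for illustration only.

$$
E = \frac{S}{P} = \frac{730}{2048} \approx 0.356,
\qquad
\frac{N^3}{P} = \frac{128^3}{2048} = \frac{2^{21}}{2^{11}} = 1024
$$

Under these assumptions the parallel efficiency is roughly 36%, and each node holds only about 1024 complex grid points, which is consistent with the summary's description of the kernel as communications-intensive.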