Performance Measurements of the 3D FFT on the Blue Gene/L Supercomputer

Bibliographic Details
Published in: Euro-Par 2005 Parallel Processing, pp. 795-803
Main Authors: Eleftheriou, Maria; Fitch, Blake; Rayshubskiy, Aleksandr; Ward, T. J. Christopher; Germain, Robert
Format: Book Chapter, Conference Proceeding
Language: English
Published: Berlin, Heidelberg: Springer Berlin Heidelberg; Springer, 2005
Series: Lecture Notes in Computer Science
ISBN: 3540287000, 9783540287001
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/11549468_87

More Information
Summary: This paper presents performance characteristics of a communications-intensive kernel, the complex data 3D FFT, running on the Blue Gene/L architecture. Two implementations of the volumetric FFT algorithm were characterized, one built on the MPI library using an optimized collective all-to-all operation [2] and another built on a low-level System Programming Interface (SPI) of the Blue Gene/L Advanced Diagnostics Environment (BG/L ADE) [17]. We compare the current results to those obtained using a reference MPI implementation (MPICH2 ported to BG/L with unoptimized collectives) and to a port of version 2.1.5 of the FFTW library [14]. Performance experiments on the Blue Gene/L prototype indicate that both of our implementations scale well and that the current MPI-based implementation shows a speedup of 730 on 2048 nodes for 3D FFTs of size 128 × 128 × 128. Moreover, the volumetric FFT outperforms the FFTW port by a factor of 8 for a 128 × 128 × 128 complex FFT on 2048 nodes.