A Framework for Lattice QCD Calculations on GPUs

Computing platforms equipped with accelerators like GPUs have proven to provide great computational power. However, exploiting such platforms for existing scientific applications is not a trivial task. Current GPU programming frameworks such as CUDA C/C++ require low-level programming from the devel...

Full description

Saved in:
Bibliographic Details
Published inProceedings - IEEE International Parallel and Distributed Processing Symposium pp. 1073 - 1082
Main Authors Winter, F. T., Clark, M. A., Edwards, R. G., Joo, B.
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.05.2014
Subjects
Online AccessGet full text
ISBN1479937991
9781479937998
ISSN1530-2075
DOI10.1109/IPDPS.2014.112

Cover

More Information
Summary:Computing platforms equipped with accelerators like GPUs have proven to provide great computational power. However, exploiting such platforms for existing scientific applications is not a trivial task. Current GPU programming frameworks such as CUDA C/C++ require low-level programming from the developer in order to achieve high performance code. As a result porting of applications to GPUs is typically limited to time-dominant algorithms and routines, leaving the remainder not accelerated which can open a serious Amdahl's law issue. The Lattice QCD application Chroma allows us to explore a different porting strategy. The layered structure of the software architecture logically separates the data-parallel from the application layer. The QCD Data-Parallel software layer provides data types and expressions with stencil-like operations suitable for lattice field theory. Chroma implements algorithms in terms of this high-level interface. Thus by porting the low-level layer one effectively ports the whole application layer in one swing. The QDP-JIT/PTX library, our reimplementation of the low-level layer, provides a framework for Lattice QCD calculations for the CUDA architecture. The complete software interface is supported and thus applications can be run unaltered on GPU-based parallel computers. This reimplementation was possible due to the availability of a JIT compiler which translates an assembly language (PTX) to GPU code. The existing expression templates enabled us to employ compile-time computations in order to build code generators and to automate the memory management for CUDA. Our implementation has allowed us to deploy the full Chroma gauge-generation program on large scale GPU-based machines such as Titan and Blue Waters and accelerate the calculation by more than an order of magnitude.
ISBN:1479937991
9781479937998
ISSN:1530-2075
DOI:10.1109/IPDPS.2014.112