Lattice QCD with domain decomposition on Intel® Xeon Phi™ co-processors

Bibliographic Details
Published in	Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 69-80
Main Authors	Heybrock, Simon; Joó, Bálint; Kalamkar, Dhiraj D.; Smelyanskiy, Mikhail; Vaidyanathan, Karthikeyan; Wettig, Tilo; Dubey, Pradeep
Format	Conference Proceeding
Language	English
Published	Piscataway, NJ, USA: IEEE Press; IEEE, 16.11.2014
Series	ACM Conferences
Subjects	and very la; Applied computing > Physical sciences and engineering > Physics; Computing methodologies > Symbolic and algebraic manipulation > Symbolic and algebraic algorithms > Linear algebra algorithms; Domain decomposition; G.1.3 [Numerical Analysis]: Numerical Linear Algebra; Sparse; Gold; Intel® Xeon Phi coprocessor; Jacobian matrices; Lattice QCD; Categories and subject descriptors: D.3.4 [Programming Languages]: Processors; Optimization; Lattices; Layout; Linear systems; Mathematics of computing > Mathematical analysis > Numerical analysis > Computations on matrices; Mathematics of computing > Mathematical software; Prefetching; Software and its engineering > Software notations and tools > Compilers; structured; Vectors; Xeon Phi; Intel coprocessor
ISBN	1479955000 9781479955008
ISSN	2167-4329
DOI	10.1109/SC.2014.11

Summary:	The gap between the cost of moving data and the cost of computing continues to grow, making it ever harder to design iterative solvers on extreme-scale architectures. This problem can be alleviated by alternative algorithms that reduce the amount of data movement. We investigate this in the context of Lattice Quantum Chromodynamics and implement such an alternative solver algorithm, based on domain decomposition, on Intel® Xeon Phi™ co-processor (KNC) clusters. We demonstrate close-to-linear on-chip scaling to all 60 cores of the KNC. With a mix of single- and half-precision the domain-decomposition method sustains 400-500 Gflop/s per chip. Compared to an optimized KNC implementation of a standard solver [1], our full multi-node domain-decomposition solver strong-scales to more nodes and reduces the time-to-solution by a factor of 5.