A Heterogeneous Parallel LU Factorization Algorithm Based on a Basic Column Block Uniform Allocation Strategy

Bibliographic Details
Published in	Mathematical Problems in Engineering Vol. 2019; no. 2019; pp. 1–12
Main Authors	Wu, Rongteng, Xie, Xiaohong
Format	Journal Article
Language	English
Published	Cairo, Egypt: Hindawi Publishing Corporation, 01.01.2019; Hindawi; John Wiley & Sons, Inc.
Subjects	Algorithms Architectural engineering Computation Computer science Computers Decomposition Design Engineering Factorization Graphics processing units Libraries Linear algebra Mathematical analysis Matrices (mathematics) Matrix algebra Optimization Optimization techniques Processors Researchers Software Software development Supercomputers Synchronism
ISSN	1024-123X 1026-7077 1563-5147
DOI	10.1155/2019/3720450

Summary:	Most supercomputers now ship with both CPUs and GPUs. Because GPUs offer powerful parallel computing capability, heterogeneous computing architectures pose new challenges for system software development and application design. CPUs and GPUs differ significantly in architecture and programming model, so conventional optimization techniques for CPUs may not work well on a heterogeneous system with multiple CPUs and multiple GPUs. We present a parallel LU factorization algorithm for heterogeneous architectures. According to the relative performance of the processors in the system, a given matrix is partitioned into basic column blocks of different sizes. A static task allocation strategy then distributes these basic column blocks uniformly to the corresponding processors. Idle time is minimized by optimizing the size and the number of the basic column blocks. A right-looking look-ahead technique is also used on systems in which each GPU is paired with one CPU core, to reduce wait time. Experiments evaluate the synchronization and load balancing, communication cost, and scalability of the heterogeneous parallel LU factorization on different systems, and compare it with the related matrix algebra algorithm on a heterogeneous system configured with multiple GPUs and CPUs.
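The performance-based partitioning and static allocation described in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes, beyond what this record states, that each processor's basic column block width is proportional to a relative speed estimate, that one "round" of blocks (one block per processor) spans a fixed number of columns `unit`, and that rounds repeat cyclically across the matrix. The function names `basic_block_widths` and `allocate_columns` are hypothetical.

```python
from typing import List, Tuple


def basic_block_widths(speeds: List[float], unit: int) -> List[int]:
    """Width of each processor's basic column block, proportional to its speed.

    Assumption: one round of blocks (one per processor) spans `unit` columns.
    Largest-remainder rounding makes the widths sum exactly to `unit`.
    """
    if unit < 1 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need unit >= 1 and positive speeds")
    total = sum(speeds)
    raw = [unit * s / total for s in speeds]
    widths = [int(r) for r in raw]
    # Give leftover columns to processors with the largest fractional parts.
    leftover = unit - sum(widths)
    order = sorted(range(len(speeds)), key=lambda i: raw[i] - widths[i], reverse=True)
    for i in order[:leftover]:
        widths[i] += 1
    return widths


def allocate_columns(n: int, speeds: List[float], unit: int) -> List[Tuple[int, int, int]]:
    """Statically assign columns [0, n) as (start, end, processor) blocks.

    Rounds of basic column blocks repeat from left to right, so every
    processor owns blocks spread uniformly across the whole matrix.
    """
    widths = basic_block_widths(speeds, unit)
    blocks: List[Tuple[int, int, int]] = []
    start = 0
    while start < n:
        for p, w in enumerate(widths):
            if start >= n:
                break
            if w == 0:
                continue  # a very slow processor may receive no columns
            end = min(start + w, n)
            blocks.append((start, end, p))
            start = end
    return blocks


if __name__ == "__main__":
    # Hypothetical: processor 0 is a CPU core, processor 1 a GPU assumed 3x faster.
    for start, end, proc in allocate_columns(20, [1.0, 3.0], 8):
        print(f"columns {start}:{end} -> processor {proc}")
```

With `speeds=[1.0, 3.0]` and `unit=8`, each round gives the CPU core 2 columns and the GPU 6, and a 20-column matrix is split into six blocks alternating between the two processors. Spreading blocks cyclically, rather than giving each processor one contiguous slab, keeps every processor holding part of the shrinking trailing submatrix in a right-looking factorization.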