Dynamic task scheduling for linear algebra algorithms on distributed-memory multicore systems
| Published in | Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, pp. 1-11 |
|---|---|
| Main Authors | , , |
| Format | Conference Proceeding |
| Language | English |
| Published | New York, NY, USA: ACM, 14 November 2009 |
| Series | ACM Conferences |
| Subjects | Computing methodologies > Symbolic and algebraic manipulation > Symbolic and algebraic algorithms > Linear algebra algorithms; Software and its engineering > Software creation and management > Software verification and validation > Operational analysis; Software and its engineering > Software organization and properties > Contextual software domains > Operating systems > Process management > Scheduling; Theory of computation > Design and analysis of algorithms > Approximation algorithms analysis > Scheduling algorithms |
| ISBN | 1605587443 9781605587448 |
| ISSN | 2167-4329 |
| DOI | 10.1145/1654059.1654079 |
| Summary | This paper presents a dynamic task scheduling approach to executing dense linear algebra algorithms on multicore systems, both shared-memory and distributed-memory. A task-based library replaces existing linear algebra subroutines such as PBLAS and transparently provides the same interface and computational function as the ScaLAPACK library. Linear algebra programs are written with the task-based library and executed by a dynamic runtime system, whose design focuses mainly on performance scalability. We propose a distributed algorithm that resolves data dependences without process cooperation. We have implemented the runtime system and applied it to three linear algebra algorithms: Cholesky, LU, and QR factorizations. Our experiments on shared-memory machines (16 and 32 cores) and distributed-memory machines (1024 cores) demonstrate that the runtime system achieves good scalability. Furthermore, we provide an analysis that shows why the tiled algorithms are scalable and gives their expected execution time. |
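
To make the idea in the summary concrete, the following is a minimal single-process sketch of how a tiled Cholesky factorization might be expressed as tasks whose dependences are derived from the tiles they read and write, and then run in whatever order the dependences allow. It is not the paper's runtime system or its distributed algorithm. The function names (`cholesky_tasks`, `build_dependences`, `run_dynamic`) are hypothetical, the kernel labels POTRF, TRSM, SYRK and GEMM are the conventional LAPACK/BLAS names rather than anything stated in the record, and tasks are only ordered, not numerically executed.

```python
from collections import defaultdict, deque


def cholesky_tasks(nt):
    """List (name, reads, writes) for a right-looking tiled Cholesky
    on the lower triangle of an nt x nt grid of tiles.
    Every written tile is treated as read-modify-write."""
    tasks = []
    for k in range(nt):
        tasks.append((f"POTRF({k})", [], [(k, k)]))
        for i in range(k + 1, nt):
            tasks.append((f"TRSM({i},{k})", [(k, k)], [(i, k)]))
        for i in range(k + 1, nt):
            tasks.append((f"SYRK({i},{k})", [(i, k)], [(i, i)]))
            for j in range(k + 1, i):
                tasks.append((f"GEMM({i},{j},{k})", [(i, k), (j, k)], [(i, j)]))
    return tasks


def build_dependences(tasks):
    """Derive task dependences from tile accesses in program order.
    deps[t] is the set of task indices that must finish before task t."""
    last_writer = {}
    readers = defaultdict(list)
    deps = [set() for _ in tasks]
    for t, (_, reads, writes) in enumerate(tasks):
        for tile in reads + writes:
            if tile in last_writer:
                deps[t].add(last_writer[tile])  # read-after-write, write-after-write
        for tile in writes:
            deps[t].update(readers[tile])  # write-after-read
        for tile in reads:
            readers[tile].append(t)
        for tile in writes:
            last_writer[tile] = t
            readers[tile] = []
    return deps


def run_dynamic(tasks, deps):
    """Release each task as soon as all of its predecessors have completed.
    Returns the order in which tasks were dispatched."""
    remaining = [len(d) for d in deps]
    successors = defaultdict(list)
    for t, d in enumerate(deps):
        for p in d:
            successors[p].append(t)
    ready = deque(t for t, n in enumerate(remaining) if n == 0)
    order = []
    while ready:
        t = ready.popleft()  # a real runtime would hand this to an idle core
        order.append(t)
        for s in successors[t]:
            remaining[s] -= 1
            if remaining[s] == 0:
                ready.append(s)
    return order


if __name__ == "__main__":
    tasks = cholesky_tasks(3)
    deps = build_dependences(tasks)
    for t in run_dynamic(tasks, deps):
        print(tasks[t][0])
```

In this sketch the ready queue is the only scheduling state: any task in it could be dispatched to any idle worker, which is the property a dynamic runtime exploits. How such dependence tracking is distributed across processes is the subject of the paper itself and is not reproduced here.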