Scalability Evaluation of a Polymorphic Register File: A CG Case Study
Published in | Architecture of Computing Systems - ARCS 2011, pp. 13-25 |
---|---|
Main Authors | , , , , |
Format | Book Chapter Publication |
Language | English |
Published | Berlin, Heidelberg: Springer Berlin Heidelberg, 2011 |
Series | Lecture Notes in Computer Science |
ISBN | 3642191363 9783642191367 |
ISSN | 0302-9743 1611-3349 |
DOI | 10.1007/978-3-642-19137-4_2 |
Summary | We evaluate the scalability of a Polymorphic Register File using the Conjugate Gradient method as a case study. We focus on a heterogeneous multi-processor architecture, taking into consideration critical parameters such as cache bandwidth and memory latency. We compare the performance of 256 Polymorphic Register File-augmented workers against a single Cell PowerPC Processor Unit (PPU). In such a scenario, simulation results suggest that for the Sparse Matrix Vector Multiplication kernel, absolute speedups of up to 200 times can be obtained. Moreover, when an equal number of workers in the range 1-256 is employed, our design is between 1.7 and 4.2 times faster than a Cell PPU-based system. Furthermore, we study the impact of memory latency and cache bandwidth on the sustainable speedups of the system considered. Our tests suggest that a 128-worker configuration requires the caches to deliver 1638.4 GB/sec in order to preserve 80% of its peak speedup. |