Modeling memory concurrency for multi-socket multi-core systems

Bibliographic Details
Published in	2010 IEEE International Symposium on Performance Analysis of Systems and Software, pp. 66-75
Main Authors	Mandal, Anirban; Fowler, Rob; Porterfield, Allan
Format	Conference Proceeding
Language	English
Published	IEEE 01.03.2010
Subjects	Bandwidth; Concurrent computing; Delay; Multicore processing; Parallel processing; Pervasive computing; Random access memory; Semiconductor device measurement
ISBN	1424460239 9781424460236
DOI	10.1109/ISPASS.2010.5452064

More Information
Summary:	Multi-core computers are ubiquitous and multi-socket versions dominate as nodes in compute clusters. Given the high level of parallelism inherent in processor chips, the ability of memory systems to serve a large number of concurrent memory access operations is becoming a critical performance problem. The most common model of memory performance uses just two numbers, peak bandwidth and typical access latency. We introduce concurrency as an explicit parameter of the measurement and modeling processes to characterize more accurately the complexity of memory behavior of multi-socket, multi-core systems. We present a detailed experimental multi-socket, multi-core memory study based on the PCHASE benchmark, which can vary memory loads by controlling the number of concurrent memory references per thread. The make-up and structure of the memory have a major impact on achievable bandwidth. Three discrete bottlenecks were observed at different levels of the hardware architecture: limits on the number of references outstanding per core; limits to the memory requests serviced by a single memory controller; and limits on the global memory concurrency. We use these results to build a memory performance model that ties concurrency, latency and bandwidth together to create a more accurate model of overall performance. We show that current commodity memory sub-systems cannot handle the load offered by high-end processor chips.
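
Illustrative sketch: the summary describes a model that ties concurrency, latency and bandwidth together, and names three concurrency bottlenecks (per core, per memory controller, and global). One common way to relate the three quantities is Little's Law, where sustained bandwidth is roughly the number of outstanding references times the bytes per reference, divided by the access latency. The Python sketch below works through that arithmetic. It is not the authors' model: the function names, the 64-byte line size, the one-thread-per-core simplification, the "take the minimum of the three limits" rule, and every number in the usage lines are assumptions chosen for illustration only.

```python
CACHE_LINE_BYTES = 64  # assumed cache-line size; not taken from the paper


def effective_concurrency(threads, refs_per_thread, per_core_limit,
                          controllers, per_controller_limit, global_limit):
    """Outstanding references the memory system actually sustains.

    Hypothetical rule: each thread (assumed to run on its own core) is
    capped by the per-core limit, and the total is then capped by the
    combined memory-controller limit and by a global limit.
    """
    offered = threads * min(refs_per_thread, per_core_limit)
    return min(offered, controllers * per_controller_limit, global_limit)


def sustained_bandwidth_gbs(outstanding_refs, latency_ns,
                            line_bytes=CACHE_LINE_BYTES):
    """Little's Law: bandwidth = concurrency * bytes per reference / latency.

    Bytes per nanosecond is numerically equal to GB/s (10**9 bytes/s).
    """
    if latency_ns <= 0:
        raise ValueError("latency_ns must be positive")
    return outstanding_refs * line_bytes / latency_ns


def required_concurrency(bandwidth_gbs, latency_ns,
                         line_bytes=CACHE_LINE_BYTES):
    """Outstanding references needed to sustain a target bandwidth."""
    if line_bytes <= 0:
        raise ValueError("line_bytes must be positive")
    return bandwidth_gbs * latency_ns / line_bytes


if __name__ == "__main__":
    # Made-up example values, not measurements from the paper.
    refs = effective_concurrency(threads=8, refs_per_thread=16,
                                 per_core_limit=10, controllers=2,
                                 per_controller_limit=32, global_limit=48)
    print("effective outstanding references:", refs)
    print("sustained bandwidth (GB/s):", sustained_bandwidth_gbs(refs, 100))
    print("references needed for 25.6 GB/s:", required_concurrency(25.6, 100))
```

In this sketch, raising the number of references per thread stops helping once any one of the three limits is reached, which is the kind of behavior the summary attributes to the measured systems.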