Efficient Parallel Implementation of the BH Algorithm on Heterogeneous Platforms

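This paper studies the architectural characteristics of heterogeneous platforms that combine multi-core CPUs with many-core accelerators or coprocessors, and implements a parallel version of the Barnes-Hut (BH) algorithm for the N-body problem using a hybrid MPI and OpenMP programming model. Orthogonal recursive bisection (ORB) is used to balance the load across processes, and the program is further optimized for parallel execution and accelerated on MIC. After optimization and acceleration, the program runs more than 3.4 times as fast as the original version, and MIC acceleration alone raises performance to 1.7 times that of the unaccelerated code. The program scales well: with more than 100 million particles, it scales to 32 nodes with 4480 cores in total (640 CPU cores and 3840 MIC cores).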
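The ORB load-balancing step named in the abstract can be illustrated with a minimal, single-process sketch. It is not the authors' MPI implementation: the function name `orb_partition`, the choice to cut along the axis of largest extent, and the use of per-particle weights as a work estimate are all assumptions made for illustration, since the record does not say how the paper defines work or chooses cut planes.

```python
from typing import List, Sequence


def orb_partition(points: Sequence[Sequence[float]],
                  weights: Sequence[float],
                  n_parts: int) -> List[List[int]]:
    """Split particle indices into n_parts groups of roughly equal total weight.

    Hypothetical helper: recursively bisects the particle set with
    axis-aligned cuts, which is the basic idea of ORB.
    """
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")

    def split(idx: List[int], parts: int) -> List[List[int]]:
        if parts == 1:
            return [idx]
        if not idx:
            # Nothing left to divide: hand out empty groups.
            return [[] for _ in range(parts)]

        dims = len(points[idx[0]])

        def extent(d: int) -> float:
            vals = [points[i][d] for i in idx]
            return max(vals) - min(vals)

        # Assumption: cut perpendicular to the axis with the largest extent.
        axis = max(range(dims), key=extent)
        order = sorted(idx, key=lambda i: points[i][axis])

        # The left side should carry left_parts / parts of the total weight.
        left_parts = parts // 2
        target = sum(weights[i] for i in order) * left_parts / parts

        # Advance the cut while the left side stays within its share.
        acc, cut = 0.0, 0
        while cut < len(order) and acc + weights[order[cut]] <= target:
            acc += weights[order[cut]]
            cut += 1

        return (split(order[:cut], left_parts)
                + split(order[cut:], parts - left_parts))

    return split(list(range(len(points))), n_parts)


if __name__ == "__main__":
    # Eight particles on a line; the first and last are assumed costlier.
    pts = [(float(i), 0.0, 0.0) for i in range(8)]
    cost = [4.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 2.0]
    for rank, group in enumerate(orb_partition(pts, cost, 2)):
        print(rank, group, sum(cost[i] for i in group))
```

In a distributed setting each resulting group would be assigned to one MPI process. Weighting particles by estimated cost rather than by count is a common design choice for tree codes, because particles in dense regions tend to require more interactions.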

Bibliographic Details
Published in 计算机应用研究 (Application Research of Computers) Vol. 33; no. 8; pp. 2255-2259
Main Authors Li Chanyi (李婵怡), Wang Wu (王武), Feng Yangde (冯仰德), Xie Li (谢力)
Format Journal Article
Language Chinese
Published University of Chinese Academy of Sciences, Beijing 100049; Supercomputing Center, Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, 2016
ISSN 1001-3695
DOI 10.3969/j.issn.1001-3695.2016.08.003

Bibliography: 51-1196/TP
This paper studies the architectural characteristics of heterogeneous platforms that combine multi-core CPUs with many-core accelerators or coprocessors, and presents a parallel implementation of the Barnes-Hut (BH) algorithm for the N-body problem using a hybrid MPI and OpenMP programming model. Orthogonal recursive bisection (ORB) is used to balance the load across processes; the code is then carefully optimized for multi-core CPUs and accelerated on MIC. Test results show that the optimized and accelerated code runs more than 3.4 times as fast as the original version, and that MIC acceleration yields a 1.7x speedup over running on multi-core CPUs alone. The code also scales well: with 100 million particles, it runs on a 32-node cluster with 4480 cores (640 CPU cores and 3840 MIC cores).
Li Chanyi, Wang Wu, Feng Yangde, Xie Li (1. Supercomputing Center, Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China; 2. University of Chinese Academy of Sciences, Beijing 100049, China)
N-body problem; BH algorithm; heteroge