Solving Electromagnetic Scattering Problems With Tens of Billions of Unknowns Using GPU Accelerated Massively Parallel MLFMA

Bibliographic Details
Published in	IEEE Transactions on Antennas and Propagation, Vol. 70, no. 7, pp. 5672-5682
Main Authors	He, Wei-Jia; Yang, Zeng; Huang, Xiao-Wei; Wang, Wu; Yang, Ming-Lin; Sheng, Xin-Qing
Format	Journal Article
Language	English
Published	New York: IEEE, 01.07.2022; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Algorithms; Central processing units; Computational efficiency; Computational modeling; Compute unified device architecture (CUDA); Computer architecture; Computing time; CPUs; Electromagnetic scattering; extremely large-scale problems; Graphics processing units; Instruction sets; Memory management; Message passing; message-passing interface (MPI) parallelization; multilevel fast multipole algorithm (MLFMA); Multipoles; Octrees; OpenMP; Optimization; Random access memory; scattering problems
ISSN	0018-926X 1558-2221
DOI	10.1109/TAP.2022.3161520

Summary:	In this article, a massively parallel approach of the multilevel fast multipole algorithm (PMLFMA) on a heterogeneous graphics processing unit (GPU) platform, denoted GPU-PMLFMA, is presented for solving extremely large electromagnetic scattering problems involving tens of billions of unknowns. In this approach, a flexible and efficient ternary partitioning scheme is first employed to partition the MLFMA octree among message-passing interface (MPI) processes. Then, the computationally intensive parts of the PMLFMA on each MPI process, such as matrix filling, aggregation, and disaggregation, are accelerated on the GPU. Different parallelization strategies consistent with the ternary parallel MLFMA approach are designed for the GPU to ensure high computational throughput. A special memory usage strategy is designed to improve computational efficiency and facilitate data reuse. A CPU/GPU asynchronous computing pattern is designed, with OpenMP and compute unified device architecture (CUDA) accelerating the CPU and GPU execution parts, respectively, so that their computation times overlap. GPU architecture-based optimization strategies are implemented to further improve the computational efficiency. Numerical results demonstrate that the proposed GPU-PMLFMA achieves a speedup of more than three times over the eight-threaded conventional PMLFMA. Solutions of scattering by electrically large and complicated objects with sizes of about 24 000 wavelengths and over 41.8 billion unknowns are presented.