Advanced Parallelism of DGTD Method With Local Time Stepping Based on Novel MPI + MPI Unified Parallel Algorithm

Bibliographic Details
Published in	IEEE Transactions on Antennas and Propagation, Vol. 70, no. 5, pp. 3916-3921
Main Authors	Ban, Zhen Guo; Shi, Yan; Wang, Peng
Format	Journal Article
Language	English
Published	New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.05.2022
Subjects	Algorithms; Basis functions; Central Processing Unit; Central processing units; Communication; CPUs; Data transmission; Discontinuous Galerkin time-domain method (DGTD); Eigenvalues and eigenfunctions; Electromagnetics; Estimation; Finite element analysis; Message passing; message passing interface (MPI); Method of moments; MPI + MPI; MPI shared memory; Time domain analysis; time step estimation; Windows (computer programs)
ISSN	0018-926X 1558-2221
DOI	10.1109/TAP.2021.3137455

Summary:	This communication presents a novel message passing interface (MPI) parallel algorithm for the nodal discontinuous Galerkin time-domain (NDGTD) method. A unified MPI + MPI technique is introduced for extreme parallelism on large-scale computer clusters. Data are exchanged between CPU nodes using MPI persistent nonblocking two-sided communication, while processors within the same node access each other's data directly through MPI shared memory windows; together these form a two-layered parallel architecture that minimizes communication. To further accelerate the solution of multiscale problems, the local time stepping (LTS) technique is employed in the NDGTD method, and a fast time step estimation method is presented. By strongly overlapping communication with computation, the proposed MPI + MPI scheme overcomes the loss of parallel efficiency that the pure MPI technique suffers when LTS is combined with a large number of CPU cores. Up to 94% parallel efficiency is achieved on 6400 CPU cores with an average load of about 1700 finite elements per core, and time step estimation is accelerated 18-fold with fourth-order basis functions. Three practical, complex examples demonstrate the good performance of the proposed method.
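
The summary does not say how elements are grouped for local time stepping, so the following is only a minimal sketch of one common arrangement, not the authors' scheme. It assumes power-of-two time-step classes: level k advances with step dt_min * 2**k, and each element is placed in the largest level whose step does not exceed its own stable step. The function names `assign_lts_levels` and `update_counts` are hypothetical and chosen for illustration.

```python
def assign_lts_levels(local_dts, max_level=None):
    """Group elements into power-of-two time-step classes (assumed scheme).

    local_dts: per-element stable time steps (positive floats).
    Returns (dt_min, levels), where element i advances with dt_min * 2**levels[i].
    """
    if not local_dts or min(local_dts) <= 0:
        raise ValueError("local_dts must be non-empty and positive")
    dt_min = min(local_dts)
    levels = []
    for dt in local_dts:
        level = 0
        # Raise the level while the next coarser step still fits within dt.
        while dt_min * 2 ** (level + 1) <= dt:
            if max_level is not None and level >= max_level:
                break
            level += 1
        levels.append(level)
    return dt_min, levels


def update_counts(levels):
    """Element updates needed to advance one coarsest step.

    Returns (lts_updates, global_updates): with LTS an element at level k is
    updated 2**(top - k) times; with a single global step every element is
    updated 2**top times.
    """
    top = max(levels)
    lts_updates = sum(2 ** (top - lv) for lv in levels)
    global_updates = len(levels) * 2 ** top
    return lts_updates, global_updates


if __name__ == "__main__":
    dt_min, levels = assign_lts_levels([1.0, 2.5, 4.0, 9.0])
    print(dt_min, levels, update_counts(levels))
```

For stable steps of 1.0, 2.5, 4.0 and 9.0, the sketch assigns levels 0, 1, 2 and 3, so one coarsest step costs 15 element updates instead of the 32 needed with a single global step. The uneven work per level is also what makes load balance and communication overlap harder under LTS, which is the scenario the summary describes.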