Advanced Parallelism of DGTD Method With Local Time Stepping Based on Novel MPI + MPI Unified Parallel Algorithm
| Published in | IEEE Transactions on Antennas and Propagation, vol. 70, no. 5, pp. 3916-3921 |
|---|---|
| Main Authors | , , | 
| Format | Journal Article | 
| Language | English | 
| Published | New York: IEEE, 01.05.2022. The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
| ISSN | 0018-926X, 1558-2221 |
| DOI | 10.1109/TAP.2021.3137455 | 
| Summary | In this communication, a novel message passing interface (MPI) parallel algorithm is developed for the nodal discontinuous Galerkin time-domain (NDGTD) method. A unified MPI + MPI technique is introduced for extreme parallelism on large-scale computer clusters. Data are transmitted between CPU nodes using MPI persistent nonblocking two-sided communication, while processors within the same node exchange data directly through MPI shared memory windows; together these form a two-layered parallel architecture that minimizes communication. To further accelerate the solution of multiscale problems, the local time stepping (LTS) technique is employed in the NDGTD method, and a fast time step estimation method is presented. With a high degree of overlap between information transmission and data calculation, the proposed MPI + MPI scheme overcomes the loss of parallel efficiency that the pure MPI technique suffers when LTS is used on a large number of CPU cores. Up to 94% parallel efficiency on 6400 CPU cores is achieved with an average load of about 1700 finite elements per core, and an 18-fold acceleration of the time step estimation is obtained with the fourth-order basis function. Three practical complex examples demonstrate the good performance of the proposed method. |
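
The summary states that local time stepping (LTS) accelerates multiscale problems but does not describe the scheme itself. As a rough illustration only, the sketch below assumes a generic power-of-two LTS grouping, in which each element advances with the largest step `dt_min * 2**k` that does not exceed its own stable step. The function names, the `max_levels` cap, and the update-count estimate are hypothetical and are not taken from the article.

```python
import math


def assign_lts_levels(local_dts, max_levels=8):
    """Group elements into power-of-two time-step classes (assumed scheme).

    An element at level k advances with step dt_min * 2**k, which never
    exceeds that element's own stable time step.
    """
    if not local_dts or min(local_dts) <= 0.0:
        raise ValueError("local_dts must contain positive time steps")
    dt_min = min(local_dts)
    levels = []
    for dt in local_dts:
        # Largest k with dt_min * 2**k <= dt, capped at max_levels - 1.
        k = int(math.floor(math.log2(dt / dt_min)))
        levels.append(min(k, max_levels - 1))
    return dt_min, levels


def update_count_ratio(levels):
    """Element updates with one global step divided by updates with LTS.

    Both counts cover one step of the coarsest level. This ignores
    interface and communication overhead, so it is an upper bound on
    the gain rather than a predicted speedup.
    """
    top = max(levels)
    global_updates = len(levels) * 2 ** top
    lts_updates = sum(2 ** (top - k) for k in levels)
    return global_updates / lts_updates


if __name__ == "__main__":
    # Hypothetical stable time steps for five elements (arbitrary units).
    dt_min, levels = assign_lts_levels([1.0, 1.0, 2.5, 4.0, 9.0])
    print(dt_min, levels, update_count_ratio(levels))
```

In this toy mesh the two smallest elements fall into level 0 and the largest into level 3, so only the small elements take the finest step. The more elements sit in coarse levels, the larger the reduction in element updates relative to a single global time step.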