Algorithms and Architectures for Parallel Processing 21st International Conference, ICA3PP 2021, Virtual Event, December 3-5, 2021, Proceedings, Part II

The three-volume set LNCS 13155, 13156, and 13157 constitutes the refereed proceedings of the 21st International Conference on Algorithms and Architectures for Parallel Processing, ICA3PP 2021, which was held online during December 3-5, 2021. The total of 145 full papers included in these proceeding...

Bibliographic Details
Main Authors	Lai, Yongxuan; Wang, Tian; Jiang, Min; Xu, Guangquan; Liang, Wei; Castiglione, Aniello
Format	eBook
Language	English
Published	Cham: Springer International Publishing AG, 2022
Edition	1
Series	Lecture notes in computer science. Theoretical computer science and general issues
Subjects	Computer algorithms; Computer architecture; Computer architecture-Congresses; Parallel processing (Electronic computers)
ISBN	9783030953874; 3030953874

Table of Contents:

Intro -- Preface -- Organization -- Contents - Part II -- Software Systems and Efficient Algorithms -- The Design and Realization of a Novel Intelligent Drying Rack System Based on STM32 -- 1 Introduction -- 2 Related Work -- 3 System Design -- 3.1 System Framework -- 3.2 System Function Module -- 3.3 Principle of Control -- 3.4 System Flow -- 4 System Hardware Design -- 4.1 Main Control Chip -- 4.2 Raindrop Sensor Module -- 4.3 Photoresistor Module -- 4.4 Wind Speed Sensor Module -- 4.5 Infrared Module -- 4.6 Button Module -- 4.7 Speech Recognition Module -- 4.8 WIFI Module -- 5 Software Design -- 6 Performance Evaluation -- 6.1 Environment Settings -- 6.2 Physical Test -- 6.3 Sample Test -- 6.4 Performance Evaluation -- 7 Conclusion -- References -- Efficient Estimation of Time-Dependent Shortest Paths Based on Shortcuts -- 1 Introduction -- 2 Related Work -- 3 Problem Definition -- 3.1 Time-Dependent Road Network -- 3.2 Partitions of Road Network -- 4 Graph Partitioning and Shortcuts Construction -- 4.1 Bidirectional Partitioning -- 4.2 Add Shortcuts -- 5 Time-Dependent A* with Shortcuts -- 6 Avoid Detours -- 6.1 Hop on Directionally -- 6.2 Hop Off Early -- 7 Experiments -- 7.1 Compared Algorithms -- 7.2 Result Analysis -- 8 Conclusion -- References -- Multi-level PWB and PWC for Reducing TLB Miss Overheads on GPUs -- 1 Introduction -- 2 Background -- 2.1 GPU Architecture and Execution Model -- 2.2 Virtual Memory Support in GPU -- 2.3 GPU Memory Management Unit -- 2.4 Summary -- 3 The Multi-level PWB and PWC -- 3.1 The Multi-level PWB -- 3.2 The Multi-level PWC -- 3.3 The WT -- 4 MMU Mechanism -- 4.1 Process 1: Filling Requests -- 4.2 Process 2: Processing Requests -- 4.3 Process 3: Updating Information -- 4.4 Process 4: Rearrangement -- 5 Performance Analysis -- 5.1 Experiment Setting -- 5.2 Area Overheads -- 5.3 Performance Improvement
5.4 Impact of Large Page -- 6 Related Works -- 7 Conclusion -- References -- Hybrid GA-SVR: An Effective Way to Predict Short-Term Traffic Flow -- 1 Introduction -- 2 Methodology -- 2.1 Support Vector Regression and Genetic Algorithm -- 2.2 GA-SVR for Traffic Flow Forecasting -- 3 Experiments -- 3.1 Data Description -- 3.2 Evaluation Metrics -- 3.3 Experimental Results -- 4 Conclusion -- References -- Parallel and Distributed Algorithms and Applications -- MobiTrack: Mobile Crowdsensing-Based Object Tracking with Min-Region and Max-Utility -- 1 Introduction -- 2 Related Work -- 2.1 Trajectory Prediction of Object -- 2.2 Task Assignment in MCS -- 3 System Model and Problem Statement -- 3.1 System Model -- 3.2 Problem Statement -- 4 Minimum Region Tracking And Maximum Utility Assignment -- 4.1 Offline Object Movement Prediction -- 4.2 Online Tracking Task Assignment -- 5 Evaluation -- 5.1 Dataset Overview -- 5.2 Simulation Setup -- 6 Experimental Results and Analysis -- 6.1 Effectiveness of Movement Prediction -- 6.2 Performance of Task Assignment Strategies -- 7 Conclusion and Future Work -- References -- Faulty Processor Identification for a Multiprocessor System Under the PMC Model Using a Binary Grey Wolf Optimizer -- 1 Introduction -- 2 Preparation Knowledge -- 2.1 PMC Model -- 2.2 Grey Wolf Optimizer -- 3 Details of the Proposed Algorithm -- 3.1 Population Initialization -- 3.2 Fitness Function -- 3.3 New Competitive Mechanism -- 3.4 Convergence Strategy -- 3.5 Mutation Operator -- 3.6 The Time Complexity of BGWOFD -- 4 Experiment -- 4.1 Influence of Parameters on the Performance -- 4.2 Empirical Comparison -- 5 Conclusion -- References -- Fast On-Road Object Detector on ROS-Based Mobile Robot -- 1 Introduction -- 2 Proposed System -- 3 Object Detector -- 3.1 Basic Algorithm -- 3.2 Optimization for SSD -- 4 Evaluation -- 4.1 Detection Performance
4.2 System Experiment -- 5 Conclusions -- References -- A Lightweight Asynchronous I/O System for Non-volatile Memory -- 1 Introduction -- 2 Background and Motivation -- 2.1 PM File Systems -- 2.2 IO_uring -- 2.3 Motivation -- 3 Design -- 3.1 Overview -- 3.2 A Highly-Efficient Kernel-Level Thread Pool -- 3.3 A Novel Memory Allocation for I/O Buffer -- 3.4 A Self-adaptive I/O Request Submission Strategy -- 4 Implementation -- 5 Evaluation -- 5.1 Experimental Setup -- 5.2 Bandwidth and Latency -- 5.3 IOPS -- 5.4 Overhead Distribution -- 6 Other Related Work -- 7 Conclusion -- References -- The Case for Disjoint Job Mapping on High-Radix Networked Parallel Computers -- 1 Introduction -- 2 Background Information and Related Works -- 2.1 Co-packaged Optical Switch -- 2.2 High-Radix Network Topology -- 2.3 Job Mapping on Interconnection Networks -- 3 Job Mapping on Ultra High-Radix Networks -- 4 Evaluation -- 4.1 Network Configurations -- 4.2 Workloads -- 4.3 Metrics for Comparison -- 4.4 Results -- 5 Conclusion -- References -- FastCache: A Client-Side Cache with Variable-Position Merging Schema in Network Storage System -- 1 Introduction -- 2 Background and Motivation -- 2.1 Background -- 2.2 Motivation -- 3 FastCache Design -- 3.1 Overview -- 3.2 API Interceptor -- 3.3 Memory Cache Entry Manager -- 3.4 Cache Flush Controller -- 4 Implementation -- 5 Evaluation -- 5.1 Experiment Setup -- 5.2 Performance Improvement by FastCache -- 5.3 Comparison with Existing Client-Side Cache Algorithm -- 5.4 Related Work -- 6 Conclusion -- References -- An Efficient Parallelization Model for Sparse Non-negative Matrix Factorization Using cuSPARSE Library on Multi-GPU Platform -- 1 Introduction -- 2 Related Work -- 3 General Concepts -- 3.1 Non-negative Matrix Factorization -- 3.2 Sparse Non-negative Matrix Factorization
4 Proposed Optimization Methods for SNMF Technique -- 4.1 Single GPU Computation Model -- 4.2 Multi-GPUs Computation Model -- 5 Experimental Study and Analysis -- 5.1 Environment -- 5.2 Performance Evaluation on Single GPU Platform -- 5.3 Performance Evaluation on Multi-GPU Platform -- 6 Conclusion and Future Work -- References -- HaDPA: A Data-Partition Algorithm for Data Parallel Applications on Heterogeneous HPC Platforms -- 1 Introduction -- 2 Related Work -- 3 HaDPA: Heterogeneous Aware Data-Partition Algorithm -- 3.1 Framework of HaDPA -- 3.2 Computation Characterization Module -- 3.3 Communication Characterization Module -- 3.4 Solving Hetergeneous-Aware Data-Partition Problem Based on the Computation and Communication Module -- 4 Using HaDPA for Data Partition -- 5 Experimental Analysis of HaDPA -- 5.1 Experimental Platform and Applications -- 5.2 Verification of HaDPA's Overall Performance Optimization -- 6 Conclusion -- References -- A NUMA-Aware Parallel Truss Decomposition Algorithm for Large Scale Graphs -- 1 Introduction -- 2 Background and Motivation -- 2.1 Preliminaries -- 2.2 Related Work -- 2.3 Motivation -- 3 NUMA-Aware Parallel Truss Decomposition -- 3.1 Overview -- 3.2 Data Structure -- 3.3 Support Initialization -- 3.4 Iterative Edge Peeling -- 3.5 Heuristic Partition of k -- 3.6 Algorithm Analysis -- 4 Implementation -- 5 Evaluation -- 5.1 Experimental Setup -- 5.2 Evaluation of Comparing with PKT -- 5.3 Evaluation of Scalability -- 5.4 Evaluation of Heuristic k Partition -- 5.5 Evaluation of Triangle Enumeration Optimization -- 6 Conclusion -- References -- Large-Scale Parallel Alignment Algorithm for SMRT Reads -- 1 Introduction -- 2 Parallel of rHAT -- 2.1 Sequence Distribution -- 2.2 MPI Version of rHAT -- 2.3 OpenCL Version of PrHAT -- 3 Results -- 3.1 Performance of PrHAT -- 3.2 Accelerate PrHAT with GPU -- 4 Conclusion
Appendix -- References -- Square Fractional Repetition Codes for Distributed Storage Systems -- 1 Introduction -- 2 Background and Related Work -- 2.1 Fractional Repetition Codes -- 2.2 Related Work -- 3 Square Fractional Repetition Codes -- 3.1 An Illustrative Example -- 3.2 Code Construction -- 3.3 Supported File Size -- 4 Conclusion -- References -- An Anti-forensic Method Based on RS Coding and Distributed Storage -- 1 Introduction -- 1.1 Our Contributions -- 1.2 Related Works -- 1.3 Paper Organization -- 2 Preliminaries -- 2.1 File Header Signatures -- 2.2 Reed-Solomon Codes -- 3 System Architecture and Design Goals -- 3.1 System Architecture -- 3.2 Design Goals -- 4 System Design -- 4.1 A New File Signature -- 4.2 CSM Scheme -- 5 Analysis -- 5.1 Security and Theoretical Analysis -- 5.2 Performance Analysis -- 6 Conclusion -- References -- Data Science -- Predicting Consumers' Coupon-usage in E-commerce with Capsule Network -- 1 Introduction -- 2 Related Work -- 3 Problem Statement and Preliminaries -- 4 Data Processing and Feature Engineering -- 4.1 Data Processing -- 4.2 Feature Engineering -- 5 Coupon Usage Prediction with Capsule Network -- 5.1 Prediction with Explicit Feature Capsule Network (ECapsNet) -- 5.2 Prediction with Implicit Feature Capsule Network (ICapsNet) -- 5.3 Sample Imbalance Processing -- 5.4 Coupon Usage Prediction -- 6 Experiments -- 6.1 Parameter Sensitivity -- 6.2 Comparative Experiments -- 6.3 Ablation Study -- 7 Conclusion and Future Work -- References -- A High-Availability K-modes Clustering Method Based on Differential Privacy -- 1 Introduction -- 2 Related Work -- 3 Preliminaries -- 4 Our Proposed Scheme -- 4.1 Overview -- 4.2 Program Description -- 4.3 Privacy Analysis -- 5 Conclusion -- References -- A Strategy-based Optimization Algorithm to Design Codes for DNA Data Storage System -- 1 Introduction
2 DNA Code Constraints