Parallel design for error-resilient entropy coding algorithm on GPU
| Published in | Journal of parallel and distributed computing Vol. 73; no. 4; pp. 411 - 419 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | Amsterdam, Elsevier Inc, 01.04.2013, Elsevier |
| Subjects | |
| ISSN | 0743-7315 1096-0848 |
| DOI | 10.1016/j.jpdc.2012.12.008 |
| Summary: | The error-resilient entropy coding (EREC) algorithm is an effective, low-cost method for combating error propagation in many compression methods that use variable-length coding (VLC). However, the main drawback of the EREC is its high complexity. To overcome this disadvantage, a parallel EREC is implemented on a graphics processing unit (GPU) using NVIDIA CUDA technology. The original EREC offers only fine-grained parallelism at each stage, which brings additional communication overhead. To make the parallel EREC efficient, we propose the partitioning EREC (P-EREC) algorithm, which splits the variable-length blocks into groups and codes every group separately using the EREC. Each GPU thread processes one group, which makes the EREC coarse-grained parallel. In addition, several optimization strategies are discussed for obtaining higher performance on the GPU. When the variable-length data blocks are divided into 128 groups (256 groups, resp.), experimental results show that the parallel P-EREC achieves a 32× to 123× (54× to 350×, resp.) speedup over the original C code of the EREC compiled with the O2 optimization option. Even higher speedups can be obtained with more groups. Compared to the EREC, the P-EREC not only achieves a good speedup but also slightly improves the resilience of the VLC bit-stream against burst or random errors.
► We attempt to optimize the performance of the EREC by parallel processing. ► A parallel EREC is implemented on a graphics processing unit (GPU). ► We propose the partitioning EREC (P-EREC) algorithm. ► We implement the parallel P-EREC on a GPU and make full use of optimization techniques. ► The parallel P-EREC gains a 32× to 123× speedup compared with the original C code. |
|---|---|
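
The partitioning idea in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`slot_sizes`, `erec_pack`, `p_erec_pack`) are hypothetical, the offset sequence is assumed to be the simple one in which block i searches slot (i + stage) mod N, slot lengths are assumed to be as equal as possible, and spilled bits are simply appended to the slot. The plain loop over groups stands in for the one-GPU-thread-per-group mapping the record describes; because groups share no state, they could be processed independently.

```python
def slot_sizes(total_bits, n_slots):
    """Split total_bits into n_slots slot lengths that differ by at most one bit."""
    base, extra = divmod(total_bits, n_slots)
    return [base + (1 if i < extra else 0) for i in range(n_slots)]


def erec_pack(blocks):
    """Pack variable-length bit strings into near-equal slots (EREC-style sketch).

    blocks: list of strings made of '0'/'1'. Returns one bit string per slot.
    """
    n = len(blocks)
    if n == 0:
        return []
    sizes = slot_sizes(sum(len(b) for b in blocks), n)
    slots = [""] * n
    remaining = list(blocks)
    # Stage 0 places block i in its own slot; later stages spill leftovers
    # into slot (i + stage) mod n. Within a stage the mapping is one-to-one,
    # so no two blocks compete for the same slot.
    for stage in range(n):
        for i in range(n):
            j = (i + stage) % n
            free = sizes[j] - len(slots[j])
            if free > 0 and remaining[i]:
                slots[j] += remaining[i][:free]
                remaining[i] = remaining[i][free:]
    assert not any(remaining)  # every bit has found a slot after n stages
    return slots


def p_erec_pack(blocks, n_groups):
    """Partition blocks into contiguous groups and pack each group separately.

    Assumes n_groups >= 1. Each group is independent of the others, which is
    what would allow one worker (a GPU thread in the record) per group.
    """
    size = -(-len(blocks) // n_groups)  # ceiling division
    groups = [blocks[k:k + size] for k in range(0, len(blocks), size)]
    return [erec_pack(g) for g in groups]
```

For example, `p_erec_pack(blocks, 2)` returns one list of slots per group. Since each call to `erec_pack` only runs as many stages as its group has blocks, smaller groups mean fewer search stages per worker, which is consistent with the summary's remark that more groups can yield higher speedup.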