Blocking optimized SIMD tree search on modern processors
Published in | Journal of Shanghai University Vol. 15; no. 5; pp. 437-444 |
---|---|
Main Author | ZHANG Zhuo |
Format | Journal Article |
Language | English |
Published | Heidelberg: Shanghai University Press, 01.10.2011 |
Affiliation | School of Computer Engineering and Science, Shanghai University, Shanghai 200072, P.R. China |
ISSN | 1007-6417, 1863-236X |
DOI | 10.1007/s11741-011-0765-2 |
Summary: | Tree search is a widely used fundamental algorithm. Modern processors provide tremendous computing power by integrating multiple cores, each with a vector processing unit. This paper reviews studies on exploiting the single instruction multiple data (SIMD) capability of processors to improve tree search performance, and proposes several improvements to previously reported SIMD tree search algorithms. Building on a blocked tree structure, it proposes blocking for memory alignment and dynamic blocking prefetch to reduce memory-access overhead. Furthermore, search branch unwinding, a form of non-linear loop unrolling, shows that the number of branches in a SIMD search algorithm can exceed the data width of the SIMD instructions. Experiments suggest that the blocking-optimized SIMD tree search algorithm achieves 1.6 times the response speed of the unoptimized algorithm. |
---|---|
Keywords: | single instruction multiple data (SIMD), tree search, binary search, streaming SIMD extensions (SSE), Cell broadband engine (BE) |
Authors: | ZHANG Zhuo, LU Yu-fan, SHEN Wen-feng, XU Wei-min, ZHENG Yan-heng (School of Computer Engineering and Science, Shanghai University, Shanghai 200072, P.R. China) |
CN: | 31-1735/N |
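The summary describes two ideas: SIMD tree search over a blocked tree, and branch unwinding, in which a node has more branches than the SIMD register has lanes. The sketch below is not the authors' code. It is a scalar Python model under assumed parameters: `LANES`, `VECS_PER_NODE`, `lane_mask`, `build` and `contains` are hypothetical names. It lays out a k-ary search tree breadth-first, with one fixed-size key block per node. `lane_mask` stands in for an SSE vector compare such as `_mm_cmpgt_epi32` followed by `_mm_movemask_ps`. With two 4-lane compares per node, each node has 9 children.

```python
import operator
from typing import Callable, List

LANES = 4            # assumed SIMD width: four 32-bit keys per 128-bit SSE register
VECS_PER_NODE = 2    # hypothetical: two vectors per node (an "unwound" node)
KEYS = LANES * VECS_PER_NODE
FANOUT = KEYS + 1    # 9 branches per node, more than the 4 SIMD lanes


def lane_mask(block: List[int], key: int, cmp: Callable[[int, int], bool]) -> int:
    """Scalar model of a vector compare followed by movemask:
    bit j is set when cmp(key, block[j]) holds."""
    mask = 0
    for j, sep in enumerate(block):
        if cmp(key, sep):
            mask |= 1 << j
    return mask


def build(sorted_keys: List[int], levels: int) -> List[List[int]]:
    """Lay out FANOUT**levels - 1 sorted keys as a complete FANOUT-ary tree.

    Nodes are stored breadth-first, one fixed-size block per node; node i
    holds KEYS ascending separators and its children are nodes
    FANOUT*i + 1 ... FANOUT*i + FANOUT.
    """
    n_nodes = (FANOUT ** levels - 1) // (FANOUT - 1)
    if len(sorted_keys) != n_nodes * KEYS:
        raise ValueError("need exactly FANOUT**levels - 1 keys")
    tree = [[0] * KEYS for _ in range(n_nodes)]
    it = iter(sorted_keys)

    def fill(i: int) -> None:
        # In-order traversal hands out the keys in sorted order.
        if i >= n_nodes:
            return
        for j in range(KEYS):
            fill(FANOUT * i + 1 + j)   # subtree left of separator j
            tree[i][j] = next(it)
        fill(FANOUT * i + FANOUT)      # rightmost subtree

    fill(0)
    return tree


def contains(tree: List[List[int]], key: int) -> bool:
    """Descend one node per level; the child index is the number of separators < key."""
    i = 0
    while i < len(tree):
        node = tree[i]
        below, equal = 0, 0
        for v in range(VECS_PER_NODE):  # one modeled "vector compare" per LANES keys
            block = node[v * LANES:(v + 1) * LANES]
            below += bin(lane_mask(block, key, operator.gt)).count("1")
            equal |= lane_mask(block, key, operator.eq)
        if equal:
            return True
        i = FANOUT * i + 1 + below
    return False


if __name__ == "__main__":
    levels = 3
    tree = build([2 * i for i in range(FANOUT ** levels - 1)], levels)  # 728 even keys
    print(len(tree), contains(tree, 1000), contains(tree, 1001))  # 91 True False
```

The key design choice is how the child is selected. The search counts the separators smaller than the key rather than branching on each comparison, so one vector compare per block chooses among several children at once. In a C implementation each block would typically also be allocated on a 16- or 64-byte boundary so that it can be fetched with aligned vector loads. Memory alignment and the paper's dynamic blocking prefetch are not modeled in this sketch.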
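The following rough calculation is illustrative and does not come from the paper. It shows why letting the number of branches exceed the SIMD width can pay off. A complete search tree with fan-out $F$ and depth $d$ holds $F^d - 1$ keys, so a search over $n$ keys visits the following number of nodes:

```latex
d(n, F) = \left\lceil \log_F (n + 1) \right\rceil, \qquad
d(10^6, 2) = 20, \quad d(10^6, 5) = 9, \quad d(10^6, 9) = 7
```

Here $F = 5$ corresponds to four separator keys per node, which is one 128-bit vector of 32-bit keys. $F = 9$ corresponds to eight keys, or two vectors. Each level saved removes one dependent memory access from the search path, although every node visit then does more comparison work.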