Blocking optimized SIMD tree search on modern processors

Tree search is a widely used fundamental algorithm. Modern processors provide tremendous computing power by integrating multiple cores, each with a vector processing unit. This paper reviews some studies on exploiting the single instruction multiple data (SIMD) capability of processors to improve the perf...

Bibliographic Details
Published in: Journal of Shanghai University, Vol. 15, no. 5, pp. 437-444
Main Authors: ZHANG Zhuo (张倬), LU Yu-fan (陆宇凡), SHEN Wen-feng (沈文枫), XU Wei-min (徐炜民), ZHENG Yan-heng (郑衍衡)
Format: Journal Article
Language: English
Published: Heidelberg, Shanghai University Press, 01.10.2011
Affiliation: School of Computer Engineering and Science, Shanghai University, Shanghai 200072, P. R. China
ISSN: 1007-6417
1863-236X
DOI: 10.1007/s11741-011-0765-2

Summary: Tree search is a widely used fundamental algorithm. Modern processors provide tremendous computing power by integrating multiple cores, each with a vector processing unit. This paper reviews studies on exploiting the single instruction multiple data (SIMD) capability of processors to improve the performance of tree search, and proposes several improvements to previously reported SIMD tree search algorithms. Building on a blocked tree structure, blocking for memory alignment and dynamic blocking prefetch are proposed to reduce the overhead of memory access. Furthermore, search branch unwinding, a form of non-linear loop unrolling, shows that the number of branches in the SIMD search algorithm can exceed the data width of the SIMD instructions. The experiments suggest that the blocking-optimized SIMD tree search algorithm can achieve a response speed 1.6 times faster than the unoptimized algorithm.
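Illustration (not taken from the paper): the summary does not give the authors' data layout or code, so the following is only a minimal sketch of the general idea of a blocked tree search, under stated assumptions. It assumes sorted, distinct numeric keys, a block size K = 4 (four 32-bit keys would fill one 128-bit SSE register), and an implicit (K+1)-ary tree stored in one flat list so that every node is a contiguous K-key block. The names `build_blocked_tree`, `count_less`, `search`, `K`, and `PAD` are hypothetical. The scalar helper `count_less` stands in for what would be a single vector comparison followed by a mask population count on real SIMD hardware; memory alignment and prefetching are not modeled.

```python
import math

K = 4           # keys per block (assumption: four 32-bit keys per 128-bit register)
PAD = math.inf  # sentinel for unused slots; compares greater than any real key


def build_blocked_tree(sorted_keys):
    """Lay out sorted, distinct keys as an implicit (K+1)-ary tree of K-key blocks.

    Node i stores its keys at flat[i*K : (i+1)*K]; its children are the
    nodes numbered i*(K+1)+1 through i*(K+1)+K+1.
    """
    n_nodes = max(1, -(-len(sorted_keys) // K))  # ceil(n / K), at least one block
    flat = [PAD] * (n_nodes * K)
    keys = iter(sorted_keys)

    def fill(node):
        # In-order walk: child 0, key 0, child 1, key 1, ..., key K-1, child K.
        if node >= n_nodes:
            return
        first_child = node * (K + 1) + 1
        for slot in range(K):
            fill(first_child + slot)
            flat[node * K + slot] = next(keys, PAD)
        fill(first_child + K)

    fill(0)
    return flat


def count_less(block, query):
    # Scalar stand-in for one SIMD compare of the whole block against the
    # query, followed by counting the set bits of the resulting mask.
    return sum(key < query for key in block)


def search(flat, query):
    """Return True if query is stored in the blocked tree."""
    n_nodes = len(flat) // K
    node = 0
    while node < n_nodes:
        block = flat[node * K:(node + 1) * K]
        c = count_less(block, query)      # index of the first key >= query
        if c < K and block[c] == query:
            return True
        node = node * (K + 1) + 1 + c     # descend into child c
    return False
```

In this sketch each step of the descent inspects one whole block and picks one of K + 1 children, so a lookup touches roughly log base (K + 1) of n blocks instead of log base 2 of n individual nodes. For example, `search(build_blocked_tree([1, 2, 3, 4, 5, 6]), 3)` walks from the root block to its first child block and finds the key there.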
Bibliography: single instruction multiple data (SIMD), tree search, binary search, streaming SIMD extensions (SSE), Cell broadband engine (BE)
ZHANG Zhuo, LU Yu-fan, SHEN Wen-feng, XU Wei-min, ZHENG Yan-heng (School of Computer Engineering and Science, Shanghai University, Shanghai 200072, P. R. China)
31-1735/N