GPU Based N-Gram String Matching Algorithm with Score Table Approach for String Searching in Many Documents
| Published in | Journal of the Institution of Engineers (India). Series B, Electrical Engineering, Electronics and telecommunication engineering, Computer engineering Vol. 98; no. 5; pp. 467-476 |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | New Delhi: Springer India, 01.10.2017 (Springer Nature B.V) |
| Subjects | |
| ISSN | 2250-2106, 2250-2114 |
| DOI | 10.1007/s40031-017-0295-3 |
| Summary: | String searching in documents has become a tedious task with the evolution of Big Data. Large data sets demand high-performance search algorithms in areas such as text mining and information retrieval. GPUs are increasingly used for general-purpose computing, so it is of great interest to exploit a GPU’s many threads to provide a high-performance search algorithm. This paper proposes an optimized approach to the N-gram model for string search across many lengthy documents, together with its GPU implementation. The algorithm uses GPGPUs to search strings in many documents, employing character-level N-gram matching with a parallel Score Table approach and search implemented with the CUDA API. The Score Table stores the frequency of each N-gram in a document; this makes the search independent of the document’s length and gives faster access to the frequency values, thus reducing search complexity. The GPU’s many threads are exploited to pre-process the trigrams of a document in parallel for Score Table creation and to search a large number of documents in parallel, speeding up the whole search process even for large pattern sizes. Experiments were carried out on many documents of varied length, with search strings drawn from the standard Lorem Ipsum text, on NVIDIA’s GeForce GT 540M GPU with 96 cores. Results show that the parallel approach to Score Table creation and searching gives a good speedup over the same approach executed serially. |
|---|---|
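
The summary describes the Score Table only at a high level. The following is a minimal Python sketch of the idea under stated assumptions, not the authors' CUDA implementation. It assumes a 27-symbol alphabet (lowercase letters plus space), a flat, directly indexed array of 27^3 trigram counts per document, and a hypothetical score equal to the minimum frequency over the pattern's trigrams. The names `build_score_table`, `score`, and `search` are illustrative. A `ThreadPoolExecutor` with one task per document stands in for the per-document GPU threads.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: text is normalized to lowercase letters plus space (27 symbols).
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
SYMBOL_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}
K = len(ALPHABET)  # alphabet size (27)
N = 3              # trigrams, as in the paper's pre-processing step


def normalize(text: str) -> str:
    """Lowercase the text and map every character outside ALPHABET to a space."""
    return "".join(ch if ch in SYMBOL_INDEX else " " for ch in text.lower())


def trigram_index(text: str, pos: int) -> int:
    """Base-K index of the N-gram starting at pos; each position is independent."""
    idx = 0
    for ch in text[pos:pos + N]:
        idx = idx * K + SYMBOL_INDEX[ch]
    return idx


def build_score_table(document: str) -> list[int]:
    """Hypothetical Score Table: frequency of every possible trigram, in a flat array."""
    text = normalize(document)
    table = [0] * (K ** N)
    # Serial loop; on a GPU each position could be one thread doing an atomic add.
    for pos in range(len(text) - N + 1):
        table[trigram_index(text, pos)] += 1
    return table


def score(table: list[int], pattern: str) -> int:
    """Hypothetical score: minimum frequency over the pattern's trigrams.

    Zero means the (normalized) pattern cannot occur in the document. The cost
    is one table lookup per pattern trigram, independent of document length.
    """
    text = normalize(pattern)
    if len(text) < N:
        return 0
    return min(table[trigram_index(text, pos)] for pos in range(len(text) - N + 1))


def search(documents: list[str], pattern: str, workers: int = 4) -> list[int]:
    """Build one Score Table per document, then score the pattern against each.

    One task per document mirrors one GPU thread per document; CPython threads
    show the decomposition only and give no real speedup for CPU-bound work.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        tables = list(pool.map(build_score_table, documents))
        return list(pool.map(lambda t: score(t, pattern), tables))


if __name__ == "__main__":
    docs = ["Lorem ipsum dolor sit amet", "consectetur adipiscing elit", "lorem lorem ipsum"]
    print(search(docs, "lorem"))  # → [1, 0, 2]
```

Building a table costs one pass over the document. Scoring a pattern of length m touches only its m - 2 trigrams, so query cost does not grow with document length, which is the property the summary attributes to the Score Table. Each occurrence of the pattern contributes at least one count to every one of its trigrams, so a nonzero score is necessary but not sufficient for an exact match. Candidate documents would still need verification if exact positions matter.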