String Matching in Hardware Using the FM-Index

String matching is a ubiquitous problem that arises in a wide range of applications in computing, e.g., packet routing, intrusion detection, web querying, and genome analysis. Due to its importance, dozens of algorithms and several data structures have been developed over the years. A recent breakth...

Full description

Saved in:
Bibliographic Details
Published in2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines pp. 218 - 225
Main Authors Fernandez, E, Najjar, W, Lonardi, S
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.05.2011
Subjects
Online AccessGet full text
ISBN9781612842776
1612842771
DOI10.1109/FCCM.2011.55

Cover

More Information
Summary:String matching is a ubiquitous problem that arises in a wide range of applications in computing, e.g., packet routing, intrusion detection, web querying, and genome analysis. Due to its importance, dozens of algorithms and several data structures have been developed over the years. A recent breakthrough in this field is the FM-index, a data structure that synergistically combines the Burrows-Wheeler transform and the suffix array. In software, the FM-index allows searching (exact and approximate) in times comparable to the fastest known indices for large texts (suffix trees and suffix arrays), but has the additional advantage of being more space-efficient than those approaches. In this paper, we describe the first FPGA-based hardware implementation of the FM-index for exact pattern matching. We report experimental results on the problem of mapping short DNA sequences to a reference genome. We show that the throughput of the FM-index is significantly higher than the naive (brute force) approach. Like the Bowtie software tool, the FM-index can abandon early the hardware matching. It outperforms Bowtie by two orders of magnitude.
ISBN:9781612842776
1612842771
DOI:10.1109/FCCM.2011.55