A Survey on Regular Expression Matching for Deep Packet Inspection: Applications, Algorithms, and Hardware Platforms

Bibliographic Details
Published in: IEEE Communications Surveys and Tutorials, Vol. 18, no. 4, pp. 2991-3029
Main Authors: Chengcheng Xu, Shuhui Chen, Jinshu Su, S. M. Yiu, Lucas C. K. Hui
Format: Journal Article
Language: English
Published: IEEE, 01.01.2016
ISSN: 2373-745X
DOI: 10.1109/COMST.2016.2566669

Summary: Deep packet inspection (DPI) is widely used in content-aware network applications such as network intrusion detection systems, traffic billing, load balancing, and government surveillance. Pattern matching is a core and critical step in DPI: it checks the payload of each packet for known signatures (patterns) in order to identify packets with certain characteristics (e.g., malicious packets that carry viruses or worms). Regular expressions are the major tool for signature description because of their powerful and flexible expressive ability. However, this flexibility also brings great challenges for efficient implementation in practice. Despite hundreds to thousands of empirical proposals, wire-speed matching for large-scale regular expressions remains a big challenge. The gap between matching throughput and link speed is widening as network link speeds and pattern scale keep increasing. This survey begins with a full-scale application background of DPI and a technical background of regular expression matching in order to provide a global view and essential knowledge for readers. We then analyze the challenges in regular expression matching that originate from the state explosion of the finite state automata used for matching. The nature of state explosion is analyzed in detail, and the state-of-the-art solutions are grouped into two categories: methods to relieve state expansion and methods to avoid state explosion. Suggestions are also provided for building compact and efficient automata in different scenarios. Furthermore, proposals that employ parallel platforms to accelerate the matching process, including field-programmable gate arrays, GPUs, general multi-processors, and ternary content addressable memory, are introduced and thoroughly discussed. We also provide guidelines for efficient deployment on each of these platforms.