130 likes | 140 Views
A Time- and Memory- Efficient String Matching Algorithm for Intrusion Detection Systems. Author : Tzu-Fang Sheu ,Nen-Fu Huang and Hsiao-Ping Lee Publisher : IEEE Globecom , 2006 Presenter : Tsung-Lin Hsieh Date : 2012/05/16. Outline . Introduction Related Work
E N D
A Time- and Memory- Efficient String Matching Algorithm for Intrusion Detection Systems Author: Tzu-Fang Sheu ,Nen-Fu Huang and Hsiao-Ping Lee Publisher: IEEE Globecom , 2006 Presenter: Tsung-Lin Hsieh Date: 2012/05/16
Outline • Introduction • Related Work • A Cost-Effective String Matching Algorithm • Results
Introduction • The Aho-Corasick algorithm (AC) had the best worst-case computational time complexity( compared to other algorithms [1] ~ [7]). But is also needs a lot of memory. • Tuck et al modified the AC with a compressed data structure, which reduced the memory size, but also increased the processing time[10].
Introduction • In this paper, we will propose a practical multiple-pattern matching algorithm that has better worst-case performance as well as smaller required memory. • The proposed novel scheme is based on the property of Chinese Remainder Theorem and contributes modifications to the AC.
Related Work • Aho-Corasick Algorithm each node – 1028 bytes -> too much memory
Related Work • Aho-Corasick Algorithm with Bitmap • It can reduce the memory to only 44 bytes/per state • But it needs doing “popcount” to calculate the offset of the starting pointer -> too much time
Proposed Method • Assume that we can find a simple function so that the input symbols {h,s,i} can be mapped to {0,1,2} • Assume that there is a magic number X
Proposed Method • For example, assume we have three valid symbols {h, s, i} that have paths to the child state as shown in Figure 3. Assign three prime numbers {2, 3, 5} for {h, s, i} respectively. • We want to find X % 2 = 0 ,X % 3 = 1 ,X % 5 = 2. • According to CRT -> X = 22 • Using “s” for test ,prime represent “s” is 3. 22 % 3 = 1 -> when input is “s” ,visit child 1.
Proposed Method 52 bytes • The Multiple-Pattern Matching with a Magic Number • Input “ish”
Results • ACO-100 means penalty of external access is 100 cycles
Results • As the required time and memory are usually trade-off, to compare the overall costs of these three algorithms, we define an evaluation function C: C = CM × CT. (CM: total memory, CT: average time)