Memory Efficient Regular Expression Search Using State Merging


[Figure: incoming packets are checked against patterns such as FTP.OPEN.*, www.spyware, and Host=.*HTTP and separated into safe packets and malicious packets.]

Context

- Regular expression matching is a critical operation in networking
  - Intrusion detection
  - Context-based billing
  - Peer-to-peer traffic detection and prioritization
  - Application-level filtering
- Challenge: perform regular expression matching at line rate
  - Processing time
  - Memory requirement (occupancy and bandwidth)

Background

- Two algorithmic solutions
  - Nondeterministic finite automata (NFAs)
    - High time complexity
    - Compact representation
  - Deterministic finite automata (DFAs)
    - Low time complexity
    - Potentially exponential number of states with respect to NFAs
- Multiple implementation approaches
  - FPGA [Sidhu FCCM 2001, Clark 2003]
  - Software [Paxson 1998, Roesch 1999, Tuck 2004]
  - Custom hardware [Kumar 2006]
- Problem: given a DFA, how to represent it compactly without violating the processing time bound
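The NFA/DFA trade-off above can be illustrated with a textbook example that is not taken from the slides: the language "the k-th symbol from the end is a" over the alphabet {a, b}. A minimal sketch, assuming a hand-written NFA with k + 1 states; the function names are hypothetical. The NFA tracks a set of active states per input symbol, while the equivalent DFA built by subset construction needs 2^k states.

```python
def nfa_step(states, ch, k):
    """Advance every active NFA state on one input symbol."""
    nxt = set()
    for s in states:
        if s == 0:
            nxt.add(0)          # state 0 loops on every symbol
            if ch == "a":
                nxt.add(1)      # guess that this 'a' is k-th from the end
        elif s < k:
            nxt.add(s + 1)      # count the symbols that follow the guess
        # state k is accepting and has no outgoing transitions
    return frozenset(nxt)


def count_dfa_states(k):
    """Subset construction: count the reachable sets of NFA states."""
    start = frozenset({0})
    seen = {start}
    work = [start]
    while work:
        cur = work.pop()
        for ch in "ab":
            nxt = nfa_step(cur, ch, k)
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return len(seen)


for k in (1, 3, 5):
    print(k + 1, "NFA states ->", count_dfa_states(k), "DFA states")
```

Each DFA state here is one set of NFA states, so the DFA does a single lookup per symbol at the price of a state count that doubles with every increment of k.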

In this paper

- New method to compact a DFA called state merging
  - Data structure to support state merging
  - Algorithm to perform state merging
- Evaluation on real security rule-sets (from Bro and Snort NIDS)

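State Merging: the idea

pattern: ((a[b-e][g-i])|(f[g-h]j))k+

[Figure: the DFA for the pattern (states 0 to 6) next to the DFA obtained by merging states 3 and 4 into state 3_4. Transitions entering 3_4 carry a label ([b-e].0 from state 1, [g-h].1 from state 2), and transitions leaving 3_4 are conditioned on that label ([g-i]/0, j/1, a/0,1, f/0,1).]

Input text: acjk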

- common outgoing transitions are compressed
- input labels keep 1-step history information
- outgoing conditional transitions ensure functional equivalence
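The three points above can be sketched as a small simulator for the DFA with merged state 3_4. This is a reconstruction from the transition labels on the slide, not the authors' code. It assumes that any character without a listed transition returns to state 0, and that a match is reported as soon as state 6 is reached. The names `TRANSITIONS`, `matches`, and `use_labels` are hypothetical.

```python
ACCEPTING = {"6"}


def _edges(chars, target, cond=None, out_label=None):
    """Expand a character class into one entry per character."""
    return {c: (cond, target, out_label) for c in chars}


# TRANSITIONS[state][char] = (labels under which the edge is valid, or None
#                             if unconditional; target state; label attached
#                             to the input, or None)
TRANSITIONS = {
    "0": {},
    "1": _edges("bcde", "3_4", out_label=0),   # [b-e].0
    "2": _edges("gh", "3_4", out_label=1),     # [g-h].1
    "3_4": {**_edges("ghi", "5", cond={0}),    # [g-i]/0
            **_edges("j", "5", cond={1})},     # j/1
    "5": _edges("k", "6"),
    "6": _edges("k", "6"),
}
for table in TRANSITIONS.values():
    # 'a' and 'f' restart a match from every state; unconditional here,
    # which is equivalent to the slide's a/0,1 and f/0,1
    table.update(_edges("a", "1"))
    table.update(_edges("f", "2"))


def matches(text, use_labels=True):
    """Return True if the pattern occurs anywhere in text."""
    state, label = "0", None
    for ch in text:
        cond, target, out_label = TRANSITIONS[state].get(ch, (None, "0", None))
        if use_labels and cond is not None and label not in cond:
            target, out_label = "0", None   # condition fails: assumed default
        state, label = target, out_label
        if state in ACCEPTING:
            return True
    return False


print(matches("acjk"))                    # False: j needs label 1, c gave 0
print(matches("acjk", use_labels=False))  # True: false match without labels
print(matches("acgk"), matches("fhjk"))   # True True
```

Running the slide's input text acjk with `use_labels=False` shows why the labels are needed: without the 1-step history, the merged state cannot tell whether it was entered from state 1 or state 2.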

State Merging – selecting the states

pattern: ((a[b-e][g-i])|(f[g-h]j))k+

[Figure: the DFA for the pattern (states 0 to 6) and its space reduction graph over the same states; a second copy of the graph contains the merged node 3_4.]

- bold edge has weight 3
- remaining edges have weight 2

State Merging – selecting the states (cont’d)

[Figure: the DFA with merged state 3_4 and its space reduction graph, followed by the DFA after states 1 and 2 are also merged into 1_2. Transition labels in the final DFA include a.0, f.1, [b-e].0/0, [g-h].1/1, [g-i]/0, j/1, and k.]

States 1 and 2 now have one more target in common: the merged state 3_4.

State merging can create new merging opportunities.
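A minimal sketch of how merging states 3 and 4 raises the number of targets shared by states 1 and 2. The target sets are read off the pattern ((a[b-e][g-i])|(f[g-h]j))k+ and leave out default transitions, which is an assumption. The helper names `merge` and `common` are hypothetical, and counting shared targets is only one plausible reading of the edge weights in the space reduction graph.

```python
# Non-default target states of states 1 to 4 (assumed from the pattern)
TARGETS = {
    "1": {"1", "2", "3"},
    "2": {"1", "2", "4"},
    "3": {"1", "2", "5"},
    "4": {"1", "2", "5"},
}


def merge(targets, s, t):
    """Return a new target map in which states s and t become one state."""
    merged = f"{s}_{t}"

    def rename(x):
        return merged if x in (s, t) else x

    out = {}
    for state, outs in targets.items():
        out.setdefault(rename(state), set()).update(rename(x) for x in outs)
    return out


def common(targets, s, t):
    """Number of target states that s and t share."""
    return len(targets[s] & targets[t])


print(common(TARGETS, "1", "2"))      # 2: only states 1 and 2 are shared
after = merge(TARGETS, "3", "4")
print(common(after, "1", "2"))        # 3: 3_4 is now shared as well
```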

State Merging – selecting the states (cont’d)

- Key point: Labels can be reused
- State merging stops when label overhead exceeds potential saving
- Old and new DFA are functionally equivalent
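The stopping rule can be sketched with a deliberately simplified cost model. This model is an assumption made for illustration, not the accounting used in the paper. It supposes that a merge saves one 32-bit transition-table entry for each target the two states share, and costs log2(labels) extra bits on every affected pointer entry. The function names are hypothetical.

```python
TABLE_ENTRY_BITS = 32   # width of one transition-table entry


def index_bits(n):
    """Bits needed to address n distinct items (0 when n == 1)."""
    return (n - 1).bit_length()


def merge_gain_bits(shared_targets, pointer_entries, n_labels):
    """Saving minus label overhead under the simplified model above."""
    saving = TABLE_ENTRY_BITS * shared_targets
    overhead = pointer_entries * index_bits(n_labels)
    return saving - overhead


def should_merge(shared_targets, pointer_entries, n_labels):
    return merge_gain_bits(shared_targets, pointer_entries, n_labels) > 0


# States 3 and 4 of the example: 3 shared targets, 5 + 3 pointer entries,
# 2 labels
print(merge_gain_bits(3, 8, 2))    # 88 bits saved, so the merge pays off
print(should_merge(1, 40, 2))      # False: label overhead exceeds the saving
```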

A data structure to support state merging

pattern: ((a[b-e][g-i])|(f[g-h]j))k+

[Figure: per-state data structure for the DFA. Each state has a 256-bit bitmap, a pointer indirection array with one entry per 1 in the bitmap, and a transition table with one 32-bit entry per distinct target. A pointer entry takes log2(distinct targets) bits, or log2(distinct targets) + log2(labels) bits when it also carries a label. The transition table is marked as the potential saving through state merging.]

- Bitmap:
  - No replication of frequent transitions
- Pointer indirection:
  - No pointer replication within a state
  - Character-transition target decoupling
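A minimal sketch of the bitmap and pointer-indirection layout for state 1 of the example DFA, under one reading of the slide's figure. It assumes that a bitmap bit is set only for characters with a non-default transition, that the default target is state 0, and that the pointer for a character is found by counting the 1 bits below it. The names `encode_state` and `lookup` are hypothetical.

```python
# State 1 of the example: a -> 1, [b-e] -> 3, f -> 2, everything else -> 0
STATE_1 = {"a": 1, "b": 3, "c": 3, "d": 3, "e": 3, "f": 2}


def encode_state(transitions, default=0):
    """Build (bitmap, pointer array, transition table) for one state."""
    bitmap = 0        # 256-bit bitmap held in a Python int
    pointers = []     # one entry per 1 bit, indexing into the table
    table = []        # one entry per distinct non-default target
    for code in range(256):
        target = transitions.get(chr(code), default)
        if target == default:
            continue                  # frequent transition: not stored
        bitmap |= 1 << code
        if target not in table:
            table.append(target)
        pointers.append(table.index(target))
    return bitmap, pointers, table


def lookup(encoded, ch, default=0):
    """Next state for character ch."""
    bitmap, pointers, table = encoded
    code = ord(ch)
    if not (bitmap >> code) & 1:
        return default
    rank = bin(bitmap & ((1 << code) - 1)).count("1")   # 1 bits below ch
    return table[pointers[rank]]


bitmap, pointers, table = encode_state(STATE_1)
print(pointers)   # [0, 1, 1, 1, 1, 2]
print(table)      # [1, 3, 2]
print(lookup((bitmap, pointers, table), "c"))   # 3
```

The four characters b to e share one table entry through their pointers, and every character that leads back to state 0 costs only its bitmap bit.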


Data structure after state merging

[Figure: the DFA after merging (states 0, 1_2, 3_4, 5, 6) and the layout of a merged state, showing Bitmap 0 and Bitmap 1, two pointer indirection + label arrays, and one combined transition table.]

- Saving: combined transition table
- Overhead: labels

Summary

- Regular expression matching: critical operation in many networking applications
- Two classical solutions: NFAs and DFAs
  - NFAs are slow; DFAs are fast but can need an impractical amount of memory
- In this paper, we present a new method to compact a DFA called state merging
  - Data structure and fast algorithm to support state merging
  - Evaluation on real security rule-sets (from Bro and Snort NIDS)
    - 1000x reduction in number of transitions
    - 20x reduction in number of states
    - 25x memory reduction

Experimental evaluation

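[Figure: a merged state S1,2 with conditional transitions labeled ck/0 and cn/1; the other states shown are Sy, Sw, and Sz.]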

