Pattern-Based DFA for Memory-Efficient Multiple Regular Expression Matching

Presenter: Junchen Jiang (Tsinghua University)

Yang Xu (Polytechnic Institute of NYU)

Tian Pan (Tsinghua University)

Bin Liu (Tsinghua University)

Email: [email protected]

- Background
- Regular expression
- DFA space explosion

- Problem statement & Idea of pattern grouping
- Pattern-Based DFA
- Grouping algorithms
- Results
- Summary

- RegEx (pattern) matching is now widely used
- Network Intrusion Detection Systems (SNORT)
- L7-filter: protocol identification
- Example: ^220[\x09-\x0d -~]*ftp
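The example pattern above can be tried directly. This is a minimal sketch that uses Python's `re` module as a stand-in matcher; the helper name `looks_like_ftp` and the sample payloads are made up for illustration and do not come from L7-filter.

```python
import re

# The slide's example pattern, compiled over bytes because it is meant
# to be applied to raw packet payload.
FTP_BANNER = re.compile(rb"^220[\x09-\x0d -~]*ftp")


def looks_like_ftp(payload: bytes) -> bool:
    """Return True if the payload starts with a 220 reply that mentions 'ftp'."""
    return FTP_BANNER.search(payload) is not None


# Hypothetical payloads, for illustration only.
print(looks_like_ftp(b"220 Welcome to the ftp service\r\n"))
print(looks_like_ftp(b"221 Goodbye ftp"))
```

The character class accepts tab through carriage return plus all printable ASCII, so any such run of characters may sit between the `220` reply code and the literal `ftp`.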

- Common Technique
- Deterministic Finite Automaton (DFA)

- Challenges
- High memory requirement
- Low processing speed

- Space Problem – DFA state explosion
- Exponential worst-case space complexity
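The exponential worst case can be reproduced with a textbook example that is not taken from the slides: the pattern `(a|b)*a(a|b){n}` has an NFA with n + 2 states, but subset construction yields 2^(n+1) DFA states. The sketch below only counts the reachable DFA states; the function names are illustrative.

```python
from collections import deque


def nfa_kth_from_end(n):
    """NFA for (a|b)*a(a|b){n}: states 0..n+1, state n+1 is accepting."""
    delta = {(0, "a"): {0, 1}, (0, "b"): {0}}
    for i in range(1, n + 1):
        delta[(i, "a")] = {i + 1}
        delta[(i, "b")] = {i + 1}
    return delta


def count_dfa_states(delta, start=0, alphabet="ab"):
    """Run subset construction and return the number of reachable DFA states."""
    start_set = frozenset({start})
    seen = {start_set}
    queue = deque([start_set])
    while queue:
        current = queue.popleft()
        for symbol in alphabet:
            # The DFA successor is the union of all NFA successors.
            nxt = frozenset(
                t for s in current for t in delta.get((s, symbol), ())
            )
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


for n in range(5):
    print(n + 2, "NFA states ->", count_dfa_states(nfa_kth_from_end(n)), "DFA states")
```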

- Solution – Pattern Grouping
- Example

[Figure: Before grouping, patterns P1 to P5 are compiled into one big DFA held in slow memory. After partitioning the patterns into two groups, two smaller DFAs fit in fast memories.]

- Minimize the number of groups (speed) while greatly reducing DFA size (space)
- Regex Set A
- For general-purpose processor architecture
- Sequentially process all groups stored in one shared memory

- For multi-parallel processor architecture
- One parallel processor per group, each group stored in its own memory

- Challenge
- Quantify the influence of each pattern!

- Traditional approach – group patterns with little interaction together.
- Patterns p and q interact iff the DFA built from p and q together is larger than the DFA of p and the DFA of q combined.
- In our evaluation, only 23.6% of pattern pairs in L7-filter and only about 5% of pattern pairs have no interaction!

- Add new information to the DFA structure so that we can quantify the influence of each pattern on the final DFA.
- Based on the new DFA structure, design more refined grouping algorithms.

- Why is the traditional DFA insufficient?
- Observation: No information about individual patterns is preserved in the resulting DFA (renumbered or not)

- Pattern-based DFA (P-DFA)
- Objective: Store information about each pattern in the states

- Construction

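[Figure: Traditional DFA construction combines P1, P2, P3 into one NFA and converts it into one DFA. P-DFA construction builds a separate NFA and DFA for each of P1, P2, P3 and then combines the three DFAs into a P-DFA. The resulting DFA and P-DFA are equivalent.]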

- Each state in P-DFA contains some sub-states, each of which is derived from one RegEx pattern.
- Example: state 0,3,6 (sub-state 0: P1, 3: P2, 6: P3)
- Stored in Pattern-Based Structure (PBS)
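One way to picture a state made of sub-states is a product construction over per-pattern DFAs, where each combined state is a tuple holding one sub-state per pattern. The sketch below assumes complete per-pattern DFAs are already available; the two toy DFAs and the name `build_pdfa` are hypothetical and are not the P1, P2, P3 of the slides, whose construction details are not given here.

```python
from collections import deque


def build_pdfa(dfas, alphabet):
    """Combine per-pattern DFAs into one automaton whose states are tuples.

    dfas: list of (start_state, transitions), where
          transitions[(state, symbol)] gives the next state.
    Returns (start_tuple, transitions_over_tuples, set_of_reachable_tuples).
    """
    start = tuple(s for s, _ in dfas)
    transitions = {}
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for symbol in alphabet:
            # Advance every sub-state in its own DFA.
            nxt = tuple(
                table[(sub, symbol)] for sub, (_, table) in zip(state, dfas)
            )
            transitions[(state, symbol)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return start, transitions, seen


# Toy DFAs over {a, b} with disjoint state numbers, as on the slide.
# d1: state 1 means "last symbol was a"; d2: state 4 means "last symbol was b".
d1 = (0, {(0, "a"): 1, (0, "b"): 0, (1, "a"): 1, (1, "b"): 0})
d2 = (3, {(3, "a"): 3, (3, "b"): 4, (4, "a"): 3, (4, "b"): 4})

start, table, states = build_pdfa([d1, d2], "ab")
print(sorted(states))
```

Because every combined state keeps one entry per pattern, the contribution of each pattern stays visible in the state itself.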

[Figure: DFA of P1 (states 0, 1, 2), DFA of P2 (states 3, 4, 5), DFA of P3 (states 6, 7, 8), and the P-DFA of P1, P2, P3 with states 0,3,6; 1,3,7; 1,3,8; 2,3,6; 0,4,6; 1,4,6; 1,4,7; 1,4,8; 2,4,6; 0,5,6; 1,5,6.]

- Adding a pattern to a P-DFA is trivial
- Removing one pattern
- remove its sub-states + merge identical states

- We can predict the size of the P-DFA when any pattern is removed.
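The removal step can be checked against the state labels of the slide's example. This is a minimal sketch that handles states only, not transitions: it drops one sub-state position from every state and merges the states that become identical. The name `remove_pattern` is illustrative.

```python
# P-DFA states of P1, P2, P3 as labelled in the slide's example figure,
# one sub-state per pattern: (P1 sub-state, P2 sub-state, P3 sub-state).
PDFA_STATES = [
    (1, 3, 8), (0, 3, 6), (1, 3, 7), (2, 3, 6),
    (0, 4, 6), (1, 4, 6), (2, 4, 6),
    (0, 5, 6), (1, 5, 6), (1, 4, 8), (1, 4, 7),
]


def remove_pattern(states, index):
    """Drop one sub-state position from every state and merge identical results."""
    return sorted({s[:index] + s[index + 1:] for s in states})


remaining = remove_pattern(PDFA_STATES, 0)  # remove P1
print(len(PDFA_STATES), "states ->", len(remaining), "states")
print(remaining)
```

For this example the 11 states shrink to the 7 states of the P-DFA of P2, P3, and the size is known without rebuilding the automaton from the patterns.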

[Figure: Remove P1. Remove all red numbers from the states of the P-DFA of P1, P2, P3 and merge identical states. The result is the P-DFA of P2, P3 with states 3,6; 3,7; 3,8; 4,6; 4,7; 4,8; 5,6.]

- General Scheme of pattern grouping using P-DFA.
- Core idea: Get a P-DFA of all patterns first, then greedily subtract the pattern that maximizes the decrease of the size of P-DFA.
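A single greedy step can be sketched on the state labels of the slide's P1, P2, P3 example. The sketch assumes that the P-DFA size is the number of distinct states left after a pattern's sub-states are dropped; the function names are illustrative and the paper's full algorithm may weigh candidates differently.

```python
PATTERNS = ("P1", "P2", "P3")

# P-DFA states from the slide's example, one sub-state per pattern.
STATES = [
    (1, 3, 8), (0, 3, 6), (1, 3, 7), (2, 3, 6),
    (0, 4, 6), (1, 4, 6), (2, 4, 6),
    (0, 5, 6), (1, 5, 6), (1, 4, 8), (1, 4, 7),
]


def size_after_removal(states, index):
    """Number of distinct states once the sub-states of one pattern are dropped."""
    return len({s[:index] + s[index + 1:] for s in states})


def pick_pattern_to_subtract(states, names):
    """Greedy choice: the pattern whose removal leaves the smallest P-DFA."""
    best = min(range(len(names)), key=lambda i: size_after_removal(states, i))
    return names[best], size_after_removal(states, best)


for i, name in enumerate(PATTERNS):
    print("without", name, "->", size_after_removal(STATES, i), "states")
print("subtract first:", pick_pattern_to_subtract(STATES, PATTERNS))
```

Minimizing the remaining size is the same as maximizing the decrease, since the starting size is fixed for all candidates.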

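[Figure: Greedy pattern grouping algorithm. Software operation (combine, delete): RegEx Pattern #1 … RegEx Pattern #k are each compiled into a DFA with its PBS, combined into a P-DFA, and grouped into P-DFA #1 … P-DFA #t. Hardware implementation (matching): the DFAs of the resulting groups.]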

- General processor architecture (Group1)
- Generate the complete P-DFA
- Repeat: split the currently largest group into two smaller groups
- Until the sum of all groups’ sizes is smaller than the given limit L.
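The loop above can be sketched as follows. Two parts are assumptions made only for illustration: each pattern is represented by the state count of its own DFA and a group's size is modelled by the worst-case product of those counts, and a split extracts the single pattern whose removal shrinks the largest group most. The slides do not specify either detail.

```python
from math import prod


def pdfa_size(group):
    """Stand-in size model (assumption): product of per-pattern DFA sizes."""
    return prod(group) if group else 0


def split_largest(groups):
    """Split the largest group in two; return False if it cannot be split."""
    largest = max(groups, key=pdfa_size)
    if len(largest) < 2:
        return False
    # Greedy choice: extract the pattern whose removal shrinks the group most.
    victim = min(
        range(len(largest)),
        key=lambda i: pdfa_size(largest[:i] + largest[i + 1:]),
    )
    groups.remove(largest)
    groups.append(largest[:victim] + largest[victim + 1:])
    groups.append([largest[victim]])
    return True


def group_for_shared_memory(patterns, limit):
    """Start from one group and split until the total size is below the limit."""
    groups = [list(patterns)]
    while sum(pdfa_size(g) for g in groups) >= limit:
        if not split_largest(groups):
            break  # the largest group is a single pattern; stop
    return groups


# Hypothetical per-pattern DFA sizes and memory limit.
print(group_for_shared_memory([2, 3, 4, 5], limit=40))
```

With a real P-DFA, `pdfa_size` would be replaced by the size predicted from the pattern-based structure rather than by this product bound.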

- Multi-parallel processor architecture (Group2)
- For any group
- If the size of its P-DFA is larger than the limit L, then
- extract a pattern from the group so that the size of its P-DFA moves closer to the limit L

- Evaluation data set: 300 RegEx patterns randomly selected from Snort’s web PCRE ruleset
- General processor architecture

- Multi-parallel processor architecture

- RegEx pattern matching is challenging
- Carefully grouping RegEx patterns eases memory inflation

- We present P-DFA, a new method to construct DFA
- Quantify the influence of each pattern
- Store information about each pattern in the state

- Experiments show that our approach reduces the number of groups by almost half compared with the traditional method.

Questions?

Email: [email protected]