Pattern-Based DFA for Memory-Efficient Multiple Regular Expression Matching


Presenter: Junchen Jiang (Tsinghua University)

Yang Xu (Polytechnic Institute of NYU)

Tian Pan (Tsinghua University)

Bin Liu (Tsinghua University)

Email: [email protected]

Outline
  • Background
    • Regular expression
    • DFA space explosion
  • Problem statement & Idea of pattern grouping
  • Pattern-Based DFA
  • Grouping algorithms
  • Results
  • Summary
Background (cont.)
  • RegEx (pattern) matching is widely used in:
    • Network intrusion detection systems (Snort)
    • L7-filter: protocol identification
    • Example: ^220[\x09-\x0d -~]*ftp
  • Common technique
    • Deterministic Finite Automaton (DFA)
  • Challenges
    • High memory requirement
    • Low processing speed
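To make the L7-filter example concrete, the sketch below checks the FTP signature against a few hypothetical server banners using Python's `re` module. It only illustrates what the pattern accepts: `re` is a backtracking engine, not the DFA-based matcher discussed in these slides, and the function name `looks_like_ftp` and the sample payloads are assumptions for illustration.

```python
import re

# L7-filter FTP signature from the slide. After the anchored "220" reply code,
# [\x09-\x0d -~]* allows any run of whitespace control characters (tab to CR)
# or printable ASCII (space to tilde) before the literal "ftp".
FTP_SIGNATURE = re.compile(rb"^220[\x09-\x0d -~]*ftp")


def looks_like_ftp(payload: bytes) -> bool:
    """Return True if the start of a flow's payload matches the FTP signature."""
    return FTP_SIGNATURE.search(payload) is not None


if __name__ == "__main__":
    for banner in (b"220 welcome to the ftp server\r\n",  # matches
                   b"230 login ok, ftp",                  # wrong reply code
                   b"220\x00ftp"):                        # \x00 is not in the class
        print(banner, looks_like_ftp(banner))
```

A DFA for the same signature would consume each payload byte with a single table lookup, which is why DFAs are the common technique despite their memory cost.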
Background (cont.)
  • Space Problem – DFA state explosion
    • Exponential worst-case space complexity
  • Solution – Pattern Grouping
    • Example

[Figure: one big DFA for patterns P1 to P5 must be kept in slow memory; after partitioning the patterns into two groups, each group's smaller DFA fits in fast memory.]
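A rough way to see why grouping helps is the standard product-construction bound. The notation is introduced here, not taken from the slides: if $D_i$ is the DFA of pattern $P_i$, the combined DFA of a pattern set $G$ has at most the product of the component sizes, while a partition into groups $G_1, \dots, G_t$ costs only the sum of the group DFAs.

```latex
\[
|D_G| \;\le\; \prod_{i \in G} |D_i|,
\qquad
M(G_1,\dots,G_t) \;=\; \sum_{j=1}^{t} |D_{G_j}| \;\le\; \sum_{j=1}^{t} \prod_{i \in G_j} |D_i| .
\]
```

As a hypothetical worst case, five fully independent patterns with 3-state DFAs may need up to $3^5 = 243$ states in one DFA, whereas groups of three and two patterns need at most $3^3 + 3^2 = 27 + 9 = 36$ states. Real patterns usually fall somewhere below these bounds, depending on how they interact.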

Problem Statement & Idea (cont.)
  • Minimize the number of groups (speed) while greatly reducing the total DFA size (space)
  • Regex Set A
  • For a general-purpose processor architecture
    • One processor sequentially processes all groups, which are stored in one shared memory
  • For a multi-parallel processor architecture
    • One parallel processor per group, each group stored in its own memory
  • Challenge
    • Quantify the influence of each pattern!
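The two architectures impose different constraints, which can be sketched as a simple cost model. The function names, the memory limit `limit`, and the assumption that a sequential processor performs one DFA lookup per group per input byte are hypothetical simplifications, not details from the slides.

```python
from typing import Sequence


def fits_general_purpose(group_sizes: Sequence[int], limit: int) -> bool:
    """All groups share one memory, so their DFAs must fit together."""
    return sum(group_sizes) <= limit


def fits_multi_parallel(group_sizes: Sequence[int], limit: int) -> bool:
    """Each group gets its own processor and its own memory of capacity `limit`."""
    return all(size <= limit for size in group_sizes)


def lookups_per_byte_sequential(group_sizes: Sequence[int]) -> int:
    """A single processor must advance every group's DFA on each input byte."""
    return len(group_sizes)


if __name__ == "__main__":
    sizes = [400, 300, 250]  # hypothetical DFA sizes (states) of three groups
    print(fits_general_purpose(sizes, limit=1000))  # 950 <= 1000, so True
    print(fits_multi_parallel(sizes, limit=350))    # 400 > 350, so False
    print(lookups_per_byte_sequential(sizes))       # one lookup per group: 3
```

Under this model, fewer groups means fewer lookups per byte on a general-purpose processor and fewer engines on a multi-parallel one, which is why the group count stands in for speed.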
Problem Statement & Idea (cont.)
    • Traditional approach: group together patterns that have little interaction.
    • Patterns p and q interact iff the combined DFA of p and q is larger than the size of the DFA of p plus the size of the DFA of q.
    • In our evaluation, only 23.6% of the pattern pairs in L7-filter, and only about 5% of the pattern pairs in the other ruleset, have no interaction!
  • Interaction between patterns is therefore not an accurate measure for grouping: most pairs interact, so it hardly distinguishes good groupings from bad ones.
  • Our contribution
    • Extend the DFA structure with per-pattern information, so that the influence of each pattern on the final DFA can be quantified.
    • Based on this new DFA structure, design more refined grouping algorithms.
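The interaction test can be tried on toy patterns. The sketch below assumes each pattern is an unanchored literal string (`.*word`) over a hypothetical four-letter alphabet, builds its DFA with the classic string-matching (KMP-style) automaton, and measures a combined DFA as the reachable part of the product automaton without minimization. These modeling choices and all names are assumptions for illustration, not the paper's construction.

```python
from collections import deque

ALPHABET = "abcd"  # hypothetical toy alphabet


def contains_dfa(word, alphabet=ALPHABET):
    """KMP-style DFA for the unanchored literal pattern .*word.

    State q is the length of the longest prefix of `word` that ends the input
    read so far; state len(word) is accepting and absorbing.
    """
    n = len(word)
    delta = {}
    for q in range(n + 1):
        for c in alphabet:
            if q == n:  # already matched: stay in the accepting state
                delta[(q, c)] = n
                continue
            s, k = word[:q] + c, q + 1
            while k > 0 and not s.endswith(word[:k]):
                k -= 1
            delta[(q, c)] = k
    return delta


def combined_size(dfas, alphabet=ALPHABET):
    """Number of reachable states of the product of the given DFAs."""
    start = tuple(0 for _ in dfas)
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        for c in alphabet:
            nxt = tuple(d[(q, c)] for d, q in zip(dfas, state))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


def interact(p, q):
    """Slide's definition: p and q interact iff |DFA(p,q)| > |DFA(p)| + |DFA(q)|."""
    dp, dq = contains_dfa(p), contains_dfa(q)
    return combined_size([dp, dq]) > combined_size([dp]) + combined_size([dq])


if __name__ == "__main__":
    print(combined_size([contains_dfa("ab"), contains_dfa("cd")]))  # 8, versus 3 + 3
    print(interact("ab", "cd"), interact("a", "b"))                 # True False
```

Even `ab` and `cd`, which share no symbols, interact under this definition (8 > 3 + 3), while `a` and `b` do not (4 is not greater than 2 + 2). A yes/no test of this kind says nothing about how much each pattern contributes, which is the gap the P-DFA is meant to fill.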
Problem Statement & Idea (cont.)
  • Why is the traditional DFA insufficient?
    • Observation: the resulting DFA (whether renumbered or not) preserves no information about individual patterns
  • Pattern-based DFA (P-DFA)
    • Objective: store information about each pattern in the states
Pattern-Based DFA (P-DFA) (cont.)
  • Construction

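[Figure: Traditional DFA: patterns P1, P2, P3 are compiled into one NFA and then into one DFA. P-DFA: each of P1, P2, P3 is compiled into its own NFA and its own DFA, and these DFAs are combined into a P-DFA. The resulting DFA and P-DFA are equivalent.]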
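One way to formalize the construction, assuming each pattern $P_i$ has its own DFA $(Q_i, \Sigma, \delta_i, s_i, F_i)$: a P-DFA state is a tuple of sub-states, one per pattern, and every transition advances all sub-states on the same input symbol. This notation is ours, not the slides'.

```latex
\[
S_0 = (s_1, \dots, s_k), \qquad
\delta\bigl((q_1, \dots, q_k), c\bigr) = \bigl(\delta_1(q_1, c), \dots, \delta_k(q_k, c)\bigr),
\]
\[
P_i \text{ has matched in state } (q_1, \dots, q_k) \iff q_i \in F_i .
\]
```

Only the tuples reachable from $S_0$ become P-DFA states, so the P-DFA is never larger than $\prod_i |Q_i|$, and each state still records, per pattern, where that pattern's own DFA would be.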

Pattern-Based DFA (P-DFA) (cont.)
  • Each state in a P-DFA contains several sub-states, each derived from one RegEx pattern.
    • Example: state (0,3,6) has sub-state 0 from P1, 3 from P2, and 6 from P3
    • The sub-states are stored in a Pattern-Based Structure (PBS)

[Figure: DFAs of P1 (states 0, 1, 2), P2 (states 3, 4, 5) and P3 (states 6, 7, 8), and the P-DFA of P1, P2, P3, whose 11 states are tuples of sub-states such as (0,3,6), (1,3,7), (2,3,6) and (1,4,8).]
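The sketch below builds a P-DFA from per-pattern DFAs and keeps a PBS that maps each state id to its tuple of sub-states, then uses the PBS to report which patterns matched. It is an illustration under assumptions: patterns are hypothetical unanchored literal strings, sub-states are numbered per pattern from 0 (the slide numbers them globally, e.g. 0 to 2 for P1 and 3 to 5 for P2), and `build_pdfa` and `matched_patterns` are names introduced here.

```python
from collections import deque


def contains_dfa(word, alphabet):
    """Hypothetical per-pattern DFA for the unanchored literal pattern .*word.

    Returns (delta, accept); state len(word) is accepting and absorbing.
    """
    n = len(word)
    delta = {}
    for q in range(n + 1):
        for c in alphabet:
            if q == n:
                delta[(q, c)] = n
                continue
            s, k = word[:q] + c, q + 1
            while k > 0 and not s.endswith(word[:k]):
                k -= 1
            delta[(q, c)] = k
    return delta, n


def build_pdfa(dfas, alphabet):
    """Combine per-pattern DFAs into a P-DFA.

    Returns (trans, pbs): trans maps (state id, symbol) -> state id, and the
    pattern-based structure pbs maps each state id to its tuple of sub-states.
    """
    start = tuple(0 for _ in dfas)
    pbs, ids = [start], {start: 0}
    trans = {}
    queue = deque([start])
    while queue:
        sub = queue.popleft()
        for c in alphabet:
            nxt = tuple(delta[(q, c)] for (delta, _), q in zip(dfas, sub))
            if nxt not in ids:
                ids[nxt] = len(pbs)
                pbs.append(nxt)
                queue.append(nxt)
            trans[(ids[sub], c)] = ids[nxt]
    return trans, pbs


def matched_patterns(trans, pbs, dfas, text):
    """Run the P-DFA on text; the PBS tells which patterns' sub-states accept."""
    state = 0
    for c in text:
        state = trans[(state, c)]
    return [i for i, ((_, accept), q) in enumerate(zip(dfas, pbs[state]))
            if q == accept]


if __name__ == "__main__":
    alphabet = "abc"
    dfas = [contains_dfa(w, alphabet) for w in ("ab", "bc", "ca")]
    trans, pbs = build_pdfa(dfas, alphabet)
    print(len(pbs), pbs[0])
    print(matched_patterns(trans, pbs, dfas, "abc"))  # [0, 1]: "ab" and "bc" occur
    print(matched_patterns(trans, pbs, dfas, "cab"))  # [0, 2]: "ca" and "ab" occur
```

Matching only needs `trans`; the PBS is extra per-state bookkeeping that a plain DFA throws away, and it is what makes per-pattern analysis possible.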

Pattern-Based DFA (P-DFA) (cont.)
  • Adding a pattern to a P-DFA is trivial
  • Removing one pattern
    • Remove its sub-states, then merge identical states
  • Hence we can predict the size of the P-DFA after any one pattern is removed.

[Figure: Remove P1. Deleting P1's sub-states (shown in red) from the 11-state P-DFA of P1, P2, P3 and merging identical states (e.g. (0,3,6) and (2,3,6) both become (3,6)) yields the 7-state P-DFA of P2, P3.]
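Under the product view of a P-DFA assumed in this sketch (states are the reachable tuples of sub-states, without minimization), removing a pattern amounts to deleting one position from every tuple and merging duplicates. Every reachable tuple is the vector of states that the individual DFAs reach on the same input string, so dropping one position yields exactly the states of the P-DFA built without that pattern, and the predicted size is exact. The helper names and literal-string DFAs are hypothetical.

```python
from collections import deque


def contains_dfa(word, alphabet):
    """Hypothetical DFA for the unanchored literal pattern .*word."""
    n = len(word)
    delta = {}
    for q in range(n + 1):
        for c in alphabet:
            if q == n:  # accepting state is absorbing
                delta[(q, c)] = n
                continue
            s, k = word[:q] + c, q + 1
            while k > 0 and not s.endswith(word[:k]):
                k -= 1
            delta[(q, c)] = k
    return delta


def pdfa_states(dfas, alphabet):
    """Set of reachable sub-state tuples, i.e. the P-DFA's states."""
    start = tuple(0 for _ in dfas)
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        for c in alphabet:
            nxt = tuple(d[(q, c)] for d, q in zip(dfas, state))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def remove_pattern(states, i):
    """Drop sub-state i from every state, merging states that become identical."""
    return {s[:i] + s[i + 1:] for s in states}


if __name__ == "__main__":
    alphabet = "abcd"
    words = ["ab", "bcd", "da"]
    dfas = [contains_dfa(w, alphabet) for w in words]
    states = pdfa_states(dfas, alphabet)
    for i, w in enumerate(words):
        predicted = len(remove_pattern(states, i))
        rebuilt = len(pdfa_states(dfas[:i] + dfas[i + 1:], alphabet))
        print(f"remove {w!r}: {len(states)} -> {predicted} states (rebuilt: {rebuilt})")
```

Applied to the two tuples (0,3,6) and (2,3,6) from the figure, `remove_pattern(..., 0)` returns the single merged state (3,6).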

Grouping Algorithms
  • General scheme of pattern grouping using P-DFA
  • Core idea: first build the P-DFA of all patterns, then greedily subtract the pattern whose removal decreases the size of the P-DFA the most.

[Figure: Greedy pattern grouping algorithm. Software operations (combine, delete) work on P-DFAs, each made of a DFA plus its PBS, turning RegEx patterns #1 to #k into P-DFAs #1 to #t; the hardware implementation (matching) uses the DFA part.]
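The core idea can be sketched as a greedy loop over a P-DFA that is built only once: candidate removals are scored by projecting the current states, as in the removal example. This is a hedged illustration; the stopping rule (a state budget `limit`), the tie-breaking of `min`, the literal-string toy DFAs, and the name `greedy_split` are assumptions, not details from the slides.

```python
from collections import deque


def contains_dfa(word, alphabet):
    """Hypothetical DFA for the unanchored literal pattern .*word."""
    n = len(word)
    delta = {}
    for q in range(n + 1):
        for c in alphabet:
            if q == n:
                delta[(q, c)] = n
                continue
            s, k = word[:q] + c, q + 1
            while k > 0 and not s.endswith(word[:k]):
                k -= 1
            delta[(q, c)] = k
    return delta


def pdfa_states(dfas, alphabet):
    """Set of reachable sub-state tuples, i.e. the P-DFA's states."""
    start = tuple(0 for _ in dfas)
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        for c in alphabet:
            nxt = tuple(d[(q, c)] for d, q in zip(dfas, state))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def greedy_split(dfas, alphabet, limit):
    """Greedily subtract patterns until the P-DFA has at most `limit` states.

    Each step removes the pattern whose removal shrinks the P-DFA the most.
    Returns (kept pattern indices, removed pattern indices, final size).
    """
    kept = list(range(len(dfas)))
    states = pdfa_states(dfas, alphabet)  # built once for all patterns
    removed = []
    while len(states) > limit and len(kept) > 1:
        # Predicted size after dropping position j, by projection.
        best = min(range(len(kept)),
                   key=lambda j: len({s[:j] + s[j + 1:] for s in states}))
        states = {s[:best] + s[best + 1:] for s in states}
        removed.append(kept.pop(best))
    return kept, removed, len(states)


if __name__ == "__main__":
    alphabet = "abcd"
    words = ["ab", "cd", "abc", "dcb", "bad"]
    dfas = [contains_dfa(w, alphabet) for w in words]
    kept, removed, size = greedy_split(dfas, alphabet, limit=20)
    print("kept:", [words[i] for i in kept], "size", size)
    print("moved out:", [words[i] for i in removed])
```

The removed patterns form a second group; projecting the existing state set avoids rebuilding the automaton for every candidate.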

Grouping Algorithms
  • General-purpose processor architecture (Group1)
    • Generate the complete P-DFA
    • Repeat: split the currently largest group into two smaller groups
    • Until the total size of all groups is smaller than the given limit L
  • Multi-parallel processor architecture (Group2)
    • For each group
      • If the size of its P-DFA is larger than the limit L, then
        • Extract a pattern from the group so that the size of its P-DFA moves closer to L
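Both algorithms can be sketched on top of the same greedy extraction step. This is a hypothetical illustration, not the authors' implementation: the helper `shrink` removes, at each step, the pattern whose removal leaves a P-DFA size closest to a target; Group1 uses half the current size as its split target (the slides do not say how a group is split) and splits the largest group that still has more than one pattern; Group2 uses the limit L as the target and re-processes the extracted patterns as a new group. Single-pattern groups are accepted even if they exceed the limit.

```python
from collections import deque


def contains_dfa(word, alphabet):
    """Hypothetical DFA for the unanchored literal pattern .*word."""
    n, delta = len(word), {}
    for q in range(n + 1):
        for c in alphabet:
            if q == n:
                delta[(q, c)] = n
                continue
            s, k = word[:q] + c, q + 1
            while k > 0 and not s.endswith(word[:k]):
                k -= 1
            delta[(q, c)] = k
    return delta


def pdfa_states(dfas, alphabet):
    """Set of reachable sub-state tuples, i.e. the P-DFA's states."""
    start = tuple(0 for _ in dfas)
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        for c in alphabet:
            nxt = tuple(d[(q, c)] for d, q in zip(dfas, state))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def pdfa_size(group, dfas, alphabet):
    return len(pdfa_states([dfas[i] for i in group], alphabet))


def shrink(group, dfas, alphabet, target):
    """Extract patterns until the group's P-DFA has <= target states (or 1 pattern)."""
    kept, moved = list(group), []
    states = pdfa_states([dfas[i] for i in kept], alphabet)
    while len(states) > target and len(kept) > 1:
        best = min(range(len(kept)),
                   key=lambda j: abs(len({s[:j] + s[j + 1:] for s in states}) - target))
        states = {s[:best] + s[best + 1:] for s in states}
        moved.append(kept.pop(best))
    return kept, moved


def group1(dfas, alphabet, limit):
    """General-purpose: split the largest splittable group until total size <= limit."""
    groups = [list(range(len(dfas)))]
    while sum(pdfa_size(g, dfas, alphabet) for g in groups) > limit:
        splittable = [g for g in groups if len(g) > 1]
        if not splittable:
            break  # only single-pattern groups remain; the limit cannot be met
        largest = max(splittable, key=lambda g: pdfa_size(g, dfas, alphabet))
        groups.remove(largest)
        groups += shrink(largest, dfas, alphabet, pdfa_size(largest, dfas, alphabet) // 2)
    return groups


def group2(dfas, alphabet, limit):
    """Multi-parallel: every group's P-DFA must fit in its own memory of size limit."""
    pending, done = [list(range(len(dfas)))], []
    while pending:
        kept, moved = shrink(pending.pop(), dfas, alphabet, limit)
        done.append(kept)
        if moved:
            pending.append(moved)  # extracted patterns form a new group
    return done


if __name__ == "__main__":
    alphabet = "abcd"
    words = ["ab", "cd", "abc", "dcb", "bad", "ca"]
    dfas = [contains_dfa(w, alphabet) for w in words]
    print("Group1:", [[words[i] for i in g] for g in group1(dfas, alphabet, limit=60)])
    print("Group2:", [[words[i] for i in g] for g in group2(dfas, alphabet, limit=25)])
```

Group1 targets one shared memory budget, so it stops as soon as the summed sizes fit; Group2 instead enforces the limit on every group separately.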
Experimental Result (cont.)
  • Evaluation set: 300 RegEx patterns randomly selected from Snort’s web pcre ruleset
  • General processor architecture
Experimental Result (cont.)
  • Multi-parallel processor architecture
Summary
  • RegEx pattern matching is challenging
    • Carefully grouping RegEx patterns eases memory inflation
  • We present P-DFA, a new method to construct DFAs that can
    • Quantify the influence of each pattern
    • Store information about each pattern in its states
  • Experiments show that our approach reduces the number of groups by almost half compared with the traditional method.