Pattern-Based DFA for Memory-Efficient Multiple Regular Expression Matching

Presenter: Junchen Jiang (Tsinghua University)

Yang Xu (Polytechnic Institute of NYU)

Tian Pan (Tsinghua University)

Bin Liu (Tsinghua University)

Email: [email protected]


Outline

  • Background

    • Regular expression

    • DFA space explosion

  • Problem statement & Idea of pattern grouping

  • Pattern-Based DFA

  • Grouping algorithms

  • Results

  • Summary


Background (cont.)

  • RegEx (pattern) matching is now widely used

    • Network Intrusion Detection Systems (SNORT)

    • L7-filter: protocol identification

    • Example: ^220[\x09-\x0d -~]*ftp

  • Common Technique

    • Deterministic Finite Automaton (DFA)

  • Challenges

    • High memory requirement

    • Low processing speed
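As a small illustration of the example pattern above, the following sketch checks it against a few payloads with Python's `re` module. The sample payloads, the helper name `looks_like_ftp`, and the case-insensitive flag are assumptions made for this example and do not come from the slides.

```python
import re

# The example pattern from the slide, compiled as a bytes pattern because
# network payloads are raw bytes. IGNORECASE is an assumption made here.
FTP_BANNER = re.compile(rb"^220[\x09-\x0d -~]*ftp", re.IGNORECASE)


def looks_like_ftp(payload: bytes) -> bool:
    """Return True if the payload starts like an FTP server greeting."""
    return FTP_BANNER.search(payload) is not None


# Hypothetical payloads, invented for this example.
samples = [
    b"220 Welcome to the example FTP service\r\n",
    b"220 mail.example.org ESMTP ready\r\n",
    b"HTTP/1.1 200 OK\r\n",
]
for payload in samples:
    print(payload[:30], looks_like_ftp(payload))
```

Only the first payload matches: the second starts with 220 but never contains "ftp", and the third does not start with 220.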


Background (cont.)

  • Space Problem – DFA state explosion

    • Exponential worst-case space complexity

  • Solution – Pattern Grouping

    • Example

[Figure: patterns P1 to P5 compiled into one big DFA (stored in slow memory) versus two smaller DFAs (stored in fast memories) after partitioning the patterns into two groups.]
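The effect of grouping can be reproduced at toy scale. The sketch below is an illustration under stated assumptions, not the authors' construction: it uses six hypothetical patterns of the form `x.*y` over disjoint letters, models each as a hand-written 3-state DFA, and counts the reachable states of the product automaton as a stand-in for the size of the combined DFA.

```python
def dot_star_dfa(x, y, alphabet):
    """Toy 3-state DFA for 'x.*y': 0 = nothing seen, 1 = x seen, 2 = matched."""
    delta = {}
    for ch in alphabet:
        delta[0, ch] = 1 if ch == x else 0
        delta[1, ch] = 2 if ch == y else 1
        delta[2, ch] = 2  # once matched, stay matched
    return delta


def combined_size(dfas, alphabet):
    """Count the reachable states of the product of the given DFAs."""
    start = (0,) * len(dfas)
    seen, todo = {start}, [start]
    while todo:
        state = todo.pop()
        for ch in alphabet:
            nxt = tuple(d[q, ch] for d, q in zip(dfas, state))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)


ALPHABET = "abcdefghijkl"
# Hypothetical patterns a.*b, c.*d, ..., k.*l (not taken from the slides).
PAIRS = [("a", "b"), ("c", "d"), ("e", "f"), ("g", "h"), ("i", "j"), ("k", "l")]
dfas = [dot_star_dfa(x, y, ALPHABET) for x, y in PAIRS]

one_big = combined_size(dfas, ALPHABET)
two_groups = combined_size(dfas[:3], ALPHABET) + combined_size(dfas[3:], ALPHABET)
print(one_big)     # 729 states (3**6) for one combined DFA
print(two_groups)  # 54 states (27 + 27) for two groups of three patterns
```

Because the patterns use disjoint letters, every combination of per-pattern states is reachable, so the combined size grows as 3 to the power of the number of patterns while the sum of the individual sizes grows only linearly.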



Problem Statement & Idea (cont.)

  • Minimize the number of groups (speed) while greatly reducing the DFA size (space)

  • Regex Set A

  • For a general-purpose processor architecture

    • All groups are stored in one shared memory and processed sequentially

  • For a multi-parallel processor architecture

    • Each group is stored in its own memory and processed by its own processor

  • Challenge

    • Quantify the influence of each pattern!


Problem Statement & Idea (cont.)

  • Traditional approach: group together patterns that have little interaction with each other.

  • Patterns p and q interact iff the DFA built for p and q together is larger than the sum of the sizes of the DFA of p and the DFA of q.

  • In our evaluation, only 23.6% of the pattern pairs in L7-filter and only about 5% of the pattern pairs have no interaction!

  • Interaction between patterns is therefore not an accurate measure for grouping patterns!

  • Our contribution

    • Add a new specification to the DFA structure, by which we can quantify the influence of each pattern on the final DFA.

    • Based on the new DFA structure, give more refined grouping algorithms.
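The interaction test defined above can be sketched in a few lines. This is a minimal illustration, assuming that "size" means the number of reachable states of the product automaton (not a minimized DFA); the helper names `literal`, `dot_star`, `dfa_size`, and `interact`, and the four toy patterns, are hypothetical.

```python
def literal(word):
    """Step function of a DFA that scans for `word`; state = matched prefix length."""
    def step(q, ch):
        s = word[:q] + ch
        k = min(len(word), len(s))
        while not s.endswith(word[:k]):  # longest prefix of word that ends s
            k -= 1
        return k
    return step


def dot_star(x, y):
    """Step function of a toy 3-state DFA for 'x.*y'."""
    def step(q, ch):
        if q == 0:
            return 1 if ch == x else 0
        if q == 1:
            return 2 if ch == y else 1
        return 2
    return step


def dfa_size(steps, alphabet):
    """Number of reachable states when the given DFAs run side by side."""
    start = (0,) * len(steps)
    seen, todo = {start}, [start]
    while todo:
        state = todo.pop()
        for ch in alphabet:
            nxt = tuple(step(q, ch) for step, q in zip(steps, state))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)


def interact(p, q, alphabet):
    """p and q interact iff their joint DFA is larger than the two DFAs summed."""
    return dfa_size([p, q], alphabet) > dfa_size([p], alphabet) + dfa_size([q], alphabet)


ALPHABET = "abcd"
print(interact(literal("ab"), literal("cd"), ALPHABET))            # False: 5 vs 3 + 3
print(interact(dot_star("a", "b"), dot_star("c", "d"), ALPHABET))  # True: 9 vs 3 + 3
```

The test only gives a yes/no answer per pair, which is the limitation the slide points out: it does not say how much each pattern contributes to the size of the final DFA.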


Problem Statement & Idea (cont.)

  • Why is the traditional DFA insufficient?

    • Observation: no information about individual patterns is preserved in the resulting DFA (whether its states are renumbered or not)

  • Pattern-based DFA (P-DFA)

    • Objective: store information about each pattern in the states



Pattern-Based DFA (P-DFA) (cont.)

  • Construction

[Figure: construction of a traditional DFA and of a P-DFA from patterns P1, P2, P3 via NFAs and DFAs; the two results are marked as equivalent.]


Pattern-Based DFA (P-DFA) (cont.)

  • Each state in a P-DFA contains several sub-states, each of which is derived from one RegEx pattern.

    • Example: state 0,3,6 (sub-state 0 from P1, 3 from P2, 6 from P3)

    • The sub-states are stored in the Pattern-Based Structure (PBS)

[Figure: the DFA of P1 (states 0 to 2), the DFA of P2 (states 3 to 5), the DFA of P3 (states 6 to 8), and the P-DFA of P1, P2, P3, whose states are labeled with sub-state triples: 0,3,6; 1,3,7; 2,3,6; 1,3,8; 0,4,6; 1,4,6; 2,4,6; 0,5,6; 1,5,6; 1,4,7; 1,4,8.]
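One way to picture a state that carries one sub-state per pattern is a product construction that keeps the tuple of per-pattern states instead of discarding it. The sketch below is an assumption-laden illustration, not the authors' implementation: the three literal patterns `ab`, `xy`, `ac` are hypothetical (the slide's actual patterns are not recoverable from the transcript), and `build_pdfa` and the list `pbs` are names invented here.

```python
def literal(word):
    """Step function of a DFA that scans for `word`; state = matched prefix length."""
    def step(q, ch):
        s = word[:q] + ch
        k = min(len(word), len(s))
        while not s.endswith(word[:k]):  # longest prefix of word that ends s
            k -= 1
        return k
    return step


def build_pdfa(steps, alphabet):
    """Return (transitions, pbs).

    transitions[i][ch] is the id of the next state; pbs[i] is the tuple of
    per-pattern sub-states that state i stands for.
    """
    start = (0,) * len(steps)
    ids = {start: 0}
    pbs = [start]
    transitions = []
    i = 0
    while i < len(pbs):  # pbs grows while new states are discovered
        row = {}
        for ch in alphabet:
            nxt = tuple(step(q, ch) for step, q in zip(steps, pbs[i]))
            if nxt not in ids:
                ids[nxt] = len(pbs)
                pbs.append(nxt)
            row[ch] = ids[nxt]
        transitions.append(row)
        i += 1
    return transitions, pbs


ALPHABET = "abcxy"
PATTERNS = [literal("ab"), literal("xy"), literal("ac")]  # hypothetical P1, P2, P3
OFFSETS = (0, 3, 6)  # number sub-states globally, as in the slide: P1 0-2, P2 3-5, P3 6-8

transitions, pbs = build_pdfa(PATTERNS, ALPHABET)
for state_id, sub_states in enumerate(pbs):
    label = ",".join(str(q + off) for q, off in zip(sub_states, OFFSETS))
    print(state_id, label)
```

For these three literals the construction yields 6 states, starting with the state labeled 0,3,6. Matching only needs `transitions`; the tuples in `pbs` are extra bookkeeping that later makes pattern removal cheap.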


Pattern-Based DFA (P-DFA) (cont.)

  • Adding a pattern to a P-DFA is trivial

  • Removing a pattern

    • Remove its sub-states, then merge identical states

  • We can therefore predict the size of the P-DFA when any pattern is removed.

[Figure: removing P1 from the P-DFA of P1, P2, P3. All P1 sub-states (the red numbers) are removed and identical states are merged, which yields the P-DFA of P2, P3 with states 3,6; 3,7; 3,8; 4,6; 4,7; 4,8; 5,6.]
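The removal step can be sketched as a projection: drop one component from every sub-state tuple and count the distinct tuples that remain. The code below is a minimal sketch under assumptions, with hypothetical patterns `a.*b`, `xy`, and `xz` and invented helper names; it compares the predicted sizes with the sizes obtained by rebuilding from scratch.

```python
def literal(word):
    """Step function of a DFA that scans for `word`; state = matched prefix length."""
    def step(q, ch):
        s = word[:q] + ch
        k = min(len(word), len(s))
        while not s.endswith(word[:k]):
            k -= 1
        return k
    return step


def dot_star(x, y):
    """Step function of a toy 3-state DFA for 'x.*y'."""
    def step(q, ch):
        if q == 0:
            return 1 if ch == x else 0
        if q == 1:
            return 2 if ch == y else 1
        return 2
    return step


def reachable(steps, alphabet):
    """Set of sub-state tuples reachable when the DFAs run side by side."""
    start = (0,) * len(steps)
    seen, todo = {start}, [start]
    while todo:
        state = todo.pop()
        for ch in alphabet:
            nxt = tuple(step(q, ch) for step, q in zip(steps, state))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def size_without(states, j):
    """Predicted size after removing pattern j: drop its sub-state, merge duplicates."""
    return len({s[:j] + s[j + 1:] for s in states})


ALPHABET = "abxyz"
PATTERNS = [dot_star("a", "b"), literal("xy"), literal("xz")]  # hypothetical patterns

states = reachable(PATTERNS, ALPHABET)
predicted = [size_without(states, j) for j in range(len(PATTERNS))]
rebuilt = [len(reachable(PATTERNS[:j] + PATTERNS[j + 1:], ALPHABET))
           for j in range(len(PATTERNS))]
print(len(states))  # 12
print(predicted)    # [4, 9, 9]
print(rebuilt)      # [4, 9, 9], the same sizes obtained the slow way
```

In this toy set, removing the `a.*b` pattern shrinks the automaton from 12 to 4 states, while removing either literal only shrinks it to 9, which is the kind of per-pattern influence the P-DFA is meant to expose.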



Grouping Algorithms

  • General scheme of pattern grouping using P-DFA

  • Core idea: first build a P-DFA of all patterns, then greedily remove the pattern whose removal most decreases the size of the P-DFA.

[Figure: greedy pattern grouping algorithm. RegEx patterns #1 to #k are grouped into P-DFA #1 to P-DFA #t. Each P-DFA consists of a DFA and a PBS: the DFA is used by the hardware implementation (matching), the PBS by software operations (combine, delete).]


Grouping Algorithms

  • General processor architecture (Group1)

    • Generate the complete P-DFA

    • Repeat: split the currently largest group into two smaller groups

    • Until the sum of all groups' sizes is smaller than the given limit L

  • Multi-parallel processor architecture (Group2)

    • For any group

      • If the size of its P-DFA is larger than the limit L, then

        • Extract a pattern from the group so that the size of the P-DFA comes closer to the limit L
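The Group2 idea can be sketched as a loop over sub-state tuples. This is a hedged reading of the slide, not the authors' algorithm: it assumes that "closer to the limit" means the smallest absolute difference between the projected size and L, that extracted patterns are collected into a new group which is then processed the same way, and that a single pattern exceeding L stays in a group of its own. The function `group_with_limit` and the six toy patterns are hypothetical.

```python
def dot_star(x, y):
    """Step function of a toy 3-state DFA for 'x.*y'."""
    def step(q, ch):
        if q == 0:
            return 1 if ch == x else 0
        if q == 1:
            return 2 if ch == y else 1
        return 2
    return step


def reachable(steps, alphabet):
    """Set of sub-state tuples reachable when the DFAs run side by side."""
    start = (0,) * len(steps)
    seen, todo = {start}, [start]
    while todo:
        state = todo.pop()
        for ch in alphabet:
            nxt = tuple(step(q, ch) for step, q in zip(steps, state))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return seen


def group_with_limit(steps, alphabet, limit):
    """Split pattern indices into groups whose state count does not exceed limit."""
    groups, pending = [], list(range(len(steps)))
    while pending:
        members, extracted = pending, []
        states = reachable([steps[i] for i in members], alphabet)
        while len(members) > 1 and len(states) > limit:
            # Project out each member in turn; no automaton is rebuilt here.
            projections = [{s[:j] + s[j + 1:] for s in states}
                           for j in range(len(members))]
            j = min(range(len(members)),
                    key=lambda k: abs(len(projections[k]) - limit))
            extracted.append(members[j])
            members = members[:j] + members[j + 1:]
            states = projections[j]
        groups.append(members)
        pending = extracted  # assumption: extracted patterns form the next group
    return groups


ALPHABET = "abcdefghijkl"
PAIRS = [("a", "b"), ("c", "d"), ("e", "f"), ("g", "h"), ("i", "j"), ("k", "l")]
PATTERNS = [dot_star(x, y) for x, y in PAIRS]  # hypothetical patterns

groups = group_with_limit(PATTERNS, ALPHABET, limit=27)
print(groups)  # [[3, 4, 5], [0, 1, 2]]
```

Each candidate removal is evaluated by projecting the existing tuples, so the expensive automaton construction runs only once per group rather than once per candidate.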



Experimental Result (cont.)

  • Evaluation data set: 300 RegEx patterns randomly selected from Snort's web PCRE rule set

  • General processor architecture


Experimental Result (cont.)

  • Multi-parallel processor architecture


Summary

  • RegEx pattern matching is challenging

    • RegEx patterns must be grouped carefully to ease memory inflation

  • We present P-DFA, a new method to construct DFAs

    • Quantifies the influence of each pattern

    • Stores information about each pattern in the states

  • Experiments show that our approach reduces the number of groups by almost half compared with the traditional method.


Questions?

Email: [email protected]

