Fast Submatch Extraction using OBDDs

Liu Yang (1), Pratyusa Manadhata (2), William Horne (2), Prasad Rao (2), Vinod Ganapathy (1)

(1) Rutgers University

(2) HP Laboratories


Applications of Regular Expressions

[Figure: signatures and network traffic are the inputs of the NIDS, which outputs alerts.]

Network intrusion detection systems (NIDS) employ regular expressions to represent attack signatures.


Applications of Regular Expressions (cont.)

[Figure labels: web security compliance, email security compliance, connectors (rule set), SIEM.]

Security information and event management (SIEM) systems employ regular expressions to normalize event logs generated by hardware connectors and software systems.


Submatch Extraction

Rule set: username=(.*), hostname=(.*)

Input: username=Bob, hostname=Foo

Submatch extraction: $1 = Bob, $2 = Foo
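The example above can be reproduced with any regular expression engine that supports capturing groups. The sketch below uses Python's standard `re` module, which is a backtracking engine and not the Submatch-OBDD technique of this talk; the helper name `extract_submatches` is made up for illustration.

```python
import re

# The rule and the input line from the slide.
RULE = re.compile(r"username=(.*), hostname=(.*)")
LINE = "username=Bob, hostname=Foo"

def extract_submatches(rule, text):
    """Return the captured groups as a tuple, or None if the rule does not match."""
    match = rule.fullmatch(text)
    return match.groups() if match else None

print(extract_submatches(RULE, LINE))  # ('Bob', 'Foo')
```

Here `$1` and `$2` correspond to the first and second element of the returned tuple.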


Signature Matching

  • Non-deterministic finite automata (NFAs)

    • Space efficient, time inefficient

  • Deterministic finite automata (DFAs)

    • Time efficient, but prone to state blow-up

  • Recursive backtracking

    • Fast in general

    • Vulnerable to algorithmic complexity attacks
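To make the last point concrete, here is a toy recursive backtracking matcher with a call counter. The pattern `(a|aa)*c` and the function name are assumptions chosen for illustration; they do not come from the slides, and real engines such as PCRE are far more sophisticated. The point is only that an input without the final `c` forces the matcher to try every way of splitting the run of `a`s.

```python
def count_backtracking_steps(text):
    """Match (a|aa)*c against all of `text` by naive backtracking.

    Returns (matched, number of recursive calls).
    """
    calls = 0

    def match_from(pos):
        nonlocal calls
        calls += 1
        # The pattern can finish only on a single trailing 'c'.
        if text[pos:] == "c":
            return True
        # Alternative 1: one iteration of the star consumes "a".
        if text.startswith("a", pos) and match_from(pos + 1):
            return True
        # Alternative 2: one iteration of the star consumes "aa".
        if text.startswith("aa", pos) and match_from(pos + 2):
            return True
        return False

    return match_from(0), calls

for n in (10, 20):
    print(n, count_backtracking_steps("a" * n))
```

Doubling the length of the non-matching input multiplies the number of calls by far more than two, which is what an attacker exploits.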


Motivation: Time/Space Tradeoff

[Figure: time vs. space plot positioning NFA (non-deterministic finite automaton), backtracking, DFA (deterministic finite automaton), our approach, and the ideal point.]


Our Contributions

  • A novel way of annotating capturing groups: tagged NFAs

  • Design of a novel technique for submatch extraction (called Submatch-OBDD)

    • Extending Thompson’s algorithm

    • Using Boolean functions to represent tagged-NFAs

    • Using ordered binary decision diagrams (OBDDs) to improve time efficiency

  • Evaluation and comparison with RE2 and PCRE

Note: RE2 is a hybrid approach, using a mix of DFA/NFA, while PCRE uses recursive backtracking.


Solution Overview

RegExps with capturing groups → tagged NFAs → Boolean representations → OBDD representations


NFA Representation of RegExps

E = a*aa

NFA of regexp “a*aa”

Transition table T(x,i,y)
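The table itself is not preserved in this transcript. A minimal sketch of T(x,i,y) as a set of triples follows; the state numbering (1 is the start state, 3 the accept state) is an assumption borrowed from the later match-test slides, and `successors` is a hypothetical helper.

```python
# T(x, i, y): the NFA moves from state x to state y on input symbol i.
T = {
    (1, "a", 1),  # loop implementing a*
    (1, "a", 2),  # first of the two trailing a's
    (2, "a", 3),  # second trailing a, reaching the accept state
}

def successors(state, symbol):
    """All states reachable from `state` on `symbol`."""
    return {y for (x, i, y) in T if x == state and i == symbol}

print(successors(1, "a"))  # {1, 2}
```

State 1 has two successors on the same symbol, which is exactly what makes the automaton non-deterministic.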


Submatch Tagging: tagged NFAs

E = (a*)aa

Tag(E) = (a*)_t1 aa

[Figure: tagged NFA of “(a*)aa” with submatch tag t1; transitions inside the capturing group are labeled a / t1.]

Extended transition table T(x,i,y,t) of the tagged NFA
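As a sketch, the extended table can be written as a set of 4-tuples in which the last component is the set of submatch tags output by the transition. The concrete rows are taken from the Boolean formula shown later in the talk; the helper names are hypothetical.

```python
# T(x, i, y, t): from state x to state y on symbol i, outputting the tag set t.
T_TAGGED = {
    (1, "a", 1, frozenset({"t1"})),  # inside the capturing group (a*)
    (1, "a", 2, frozenset()),        # outside any group: no tag
    (2, "a", 3, frozenset()),
}

def drop_tags(tagged):
    """Project the extended table back onto the plain table T(x, i, y)."""
    return {(x, i, y) for (x, i, y, _tags) in tagged}

def tagged_transitions(tagged):
    """Transitions that output at least one submatch tag."""
    return {(x, i, y) for (x, i, y, tags) in tagged if tags}

print(sorted(tagged_transitions(T_TAGGED)))  # [(1, 'a', 1)]
```

Dropping the tag column gives back the untagged NFA, so tagging does not change which strings are accepted.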


Match Test

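RegExp = (a*)aa; Input: aaaa

[Figure: tagged NFA with states 1, 2, and 3 (accept); every transition is on symbol a, and the loop on state 1 outputs the tag {t1}.]

Frontier before the input and after each symbol a:

{1} → {1,2} → {1,2,3} → {1,2,3} → {1,2,3}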
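The frontier sequence can be recomputed with a few lines of set arithmetic. This is a plain-Python sketch of the frontier-based match test, assuming start state 1 and accept state 3; it does not use OBDDs.

```python
# Tagged transitions of (a*)aa: (state, symbol, next state, tags).
T = [
    (1, "a", 1, {"t1"}),
    (1, "a", 2, set()),
    (2, "a", 3, set()),
]
START, ACCEPT = {1}, {3}

def frontiers(text):
    """Frontier before any input, then after each input symbol."""
    frontier = set(START)
    trace = [frontier]
    for symbol in text:
        frontier = {y for (x, i, y, _tags) in T if x in frontier and i == symbol}
        trace.append(frontier)
    return trace

def matches(text):
    """The input matches if the final frontier contains an accept state."""
    return bool(frontiers(text)[-1] & ACCEPT)

print(frontiers("aaaa"))  # [{1}, {1, 2}, {1, 2, 3}, {1, 2, 3}, {1, 2, 3}]
print(matches("aaaa"))    # True
```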


Submatch Extraction

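[Figure: the tagged NFA of (a*)aa on input aaaa, traversed from the accept state back to the start state; the first two symbols are consumed on the loop tagged {t1}.]

Frontier: {1} → {1,2} → {1,2,3} → {1,2,3} → {1,2,3}

$1 = aa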

Any path from an accept state to a start state generates a valid assignment of submatches.
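A small sketch of this claim: enumerate every path through the tagged NFA that consumes the whole input and ends in an accept state, and read the submatch off the tags along the path. For brevity the enumeration below runs forwards and is exponential in general, whereas the talk traverses backwards over the stored frontiers; the function names are hypothetical.

```python
T = [
    (1, "a", 1, frozenset({"t1"})),
    (1, "a", 2, frozenset()),
    (2, "a", 3, frozenset()),
]
START, ACCEPT = {1}, {3}

def accepting_paths(text):
    """Yield (state path, tags per symbol) for every accepting run on `text`."""
    def extend(path, tags):
        pos = len(tags)
        if pos == len(text):
            if path[-1] in ACCEPT:
                yield path, tags
            return
        for (x, i, y, t) in T:
            if x == path[-1] and i == text[pos]:
                yield from extend(path + [y], tags + [t])
    for start in sorted(START):
        yield from extend([start], [])

def submatch(text, tags, tag):
    """Characters consumed on transitions that output `tag`."""
    return "".join(ch for ch, t in zip(text, tags) if tag in t)

for path, tags in accepting_paths("aaaa"):
    print(path, "$1 =", submatch("aaaa", tags, "t1"))  # [1, 1, 1, 2, 3] $1 = aa
```

For this regular expression and input there is exactly one accepting path, so the assignment $1 = aa is unique.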


Complexity of Tagged NFAs

Match test:

Submatch extraction:

n – size of tagged NFA

l – length of input string

Can we make the operations faster?


Submatch-OBDD

  • Representing tagged NFAs using Boolean functions

    • Updating frontiers in one-step using a single Boolean formula

  • Using OBDDs to manipulate Boolean functions


Transitions as Boolean Functions

RegExp: (a*)aa

T(x,i,y,t) = (1 ∧ a ∧ 1 ∧ t1) ∨ (1 ∧ a ∧ 2 ∧ {}) ∨ (2 ∧ a ∧ 3 ∧ {})
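Read as a characteristic function, T returns true exactly for the tuples that are transitions of the tagged NFA. A direct Python transcription, as a sketch in which states are integers and the tag component is a set:

```python
def T(x, i, y, t):
    """Characteristic function of the tagged transition relation of (a*)aa."""
    return (
        (x == 1 and i == "a" and y == 1 and t == {"t1"})
        or (x == 1 and i == "a" and y == 2 and t == set())
        or (x == 2 and i == "a" and y == 3 and t == set())
    )

print(T(1, "a", 1, {"t1"}))  # True: the tagged loop on state 1
print(T(1, "a", 1, set()))   # False: the loop always outputs t1
```

Each conjunction in the formula corresponds to one row of the extended transition table.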


Match Test using Boolean Functions

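Input: aaaa. In each step, (current states) ∧ (input symbol) ∧ (transition table) yields the intermediate transitions, which determine the next states.

Start states {1}:

{1} ∧ a ∧ T(x,i,y,t) = (1 ∧ a ∧ 1 ∧ t1) ∨ (1 ∧ a ∧ 2 ∧ {})

Next states: {1,2}

Current states {1,2}:

{1,2} ∧ a ∧ T(x,i,y,t) = (1 ∧ a ∧ 1 ∧ t1) ∨ (1 ∧ a ∧ 2 ∧ {}) ∨ (2 ∧ a ∧ 3 ∧ {})

Next states: {1,2,3}

Current states {1,2,3}:

{1,2,3} ∧ a ∧ T(x,i,y,t) = (1 ∧ a ∧ 1 ∧ t1) ∨ (1 ∧ a ∧ 2 ∧ {}) ∨ (2 ∧ a ∧ 3 ∧ {})

Accept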
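The steps above can be sketched by representing each Boolean function by its on-set, that is, the set of tuples for which it is true. Conjunction then becomes filtering and existential quantification becomes projection. This is an illustration of the idea only; an OBDD package performs the same operations symbolically. All names are hypothetical.

```python
# On-set of T(x, i, y, t) for (a*)aa.
T = {
    (1, "a", 1, frozenset({"t1"})),
    (1, "a", 2, frozenset()),
    (2, "a", 3, frozenset()),
}
START, ACCEPT = {1}, {3}

def intermediate(frontier, symbol):
    """frontier(x) AND symbol(i) AND T(x, i, y, t)."""
    return {(x, i, y, t) for (x, i, y, t) in T if x in frontier and i == symbol}

def next_frontier(inter):
    """Quantify away x, i, and t, keeping only the target states y."""
    return {y for (_x, _i, y, _t) in inter}

def match_test(text):
    """Return (accepted, intermediate transitions stored for each symbol)."""
    frontier, stored = set(START), []
    for symbol in text:
        inter = intermediate(frontier, symbol)
        stored.append(inter)
        frontier = next_frontier(inter)
    return bool(frontier & ACCEPT), stored

accepted, stored = match_test("aaaa")
print(accepted, [len(step) for step in stored])  # True [2, 3, 3, 3]
```

The stored intermediate transitions are what the backward pass of submatch extraction consumes.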


Submatch Extraction using Boolean Functions

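Start from the last input symbol and go backwards (input: aaaa).

Symbol 4 (the last input symbol), accept state 3:

a ∧ 3 ∧ intermediate transitions [4], where intermediate transitions [4] = (1 ∧ a ∧ 1 ∧ t1) ∨ (1 ∧ a ∧ 2 ∧ {}) ∨ (2 ∧ a ∧ 3 ∧ {})

Result: 2 ∧ a ∧ 3 ∧ {}. Previous state of 3 is 2. No output submatch tag.

Rename the previous state as the current state and continue.

Symbol 3, current state 2:

a ∧ 2 ∧ intermediate transitions [3], where intermediate transitions [3] = (1 ∧ a ∧ 1 ∧ t1) ∨ (1 ∧ a ∧ 2 ∧ {}) ∨ (2 ∧ a ∧ 3 ∧ {})

Result: 1 ∧ a ∧ 2 ∧ {}. Previous state of 2 is 1. No output submatch tag.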


Submatch Extraction using Boolean Functions

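Symbol 2, current state 1:

a ∧ 1 ∧ intermediate transitions [2], where intermediate transitions [2] = (1 ∧ a ∧ 1 ∧ t1) ∨ (1 ∧ a ∧ 2 ∧ {}) ∨ (2 ∧ a ∧ 3 ∧ {})

Result: 1 ∧ a ∧ 1 ∧ t1. Previous state of 1 is 1. Output submatch tag t1.

Symbol 1, current state 1:

a ∧ 1 ∧ intermediate transitions [1], where intermediate transitions [1] = (1 ∧ a ∧ 1 ∧ t1) ∨ (1 ∧ a ∧ 2 ∧ {})

Result: 1 ∧ a ∧ 1 ∧ t1. Previous state of 1 is 1. Output submatch tag t1.

The first two symbols of aaaa are tagged t1, so $1 = aa.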
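The backward walk can be sketched as follows. The forward pass stores the intermediate transitions of each step; the backward pass starts from an accept state and, at each step, picks one stored transition that enters the current state. Choosing the candidate with the lowest source state is an arbitrary tie-break assumed here, and the on-set representation stands in for the OBDD operations.

```python
T = {
    (1, "a", 1, frozenset({"t1"})),
    (1, "a", 2, frozenset()),
    (2, "a", 3, frozenset()),
}
START, ACCEPT = {1}, {3}

def forward(text):
    """Return the final frontier and the intermediate transitions per symbol."""
    frontier, stored = set(START), []
    for symbol in text:
        inter = {(x, i, y, t) for (x, i, y, t) in T if x in frontier and i == symbol}
        stored.append(inter)
        frontier = {y for (_x, _i, y, _t) in inter}
    return frontier, stored

def extract_tags(text):
    """Tag set output for each input position, or None if `text` is rejected."""
    frontier, stored = forward(text)
    accepting = sorted(frontier & ACCEPT)
    if not accepting:
        return None
    current, tags = accepting[0], []
    for symbol, inter in zip(reversed(text), reversed(stored)):
        # symbol AND current state AND intermediate transitions of this step.
        candidates = [tr for tr in inter if tr[1] == symbol and tr[2] == current]
        x, _i, _y, t = min(candidates, key=lambda tr: tr[0])
        tags.append(t)
        current = x  # rename the previous state as the current state
    return tags[::-1]

tags = extract_tags("aaaa")
print("".join(ch for ch, t in zip("aaaa", tags) if "t1" in t))  # aa
```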


More Formal: Match Test

Finding new frontiers after processing an input symbol:

Next frontiers =

Checking acceptance:
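The formulas of this slide are not preserved in the transcript. The following is a reconstruction that is consistent with the worked example ({1} ∧ a ∧ T(x,i,y,t)); it is an assumption, not necessarily the authors' notation. F denotes the current frontier, I_σ the encoding of the current input symbol σ, A the set of accept states, and M the intermediate transitions.

```latex
\begin{align*}
M(x,i,y,t) &= F(x) \land I_\sigma(i) \land T(x,i,y,t)
  && \text{(intermediate transitions)} \\
F'(x) &= \mathrm{Rename}_{y \to x}\bigl(\exists x\, \exists i\, \exists t.\; M(x,i,y,t)\bigr)
  && \text{(next frontier)} \\
\text{accept} &\iff F(x) \land A(x) \not\equiv \mathit{false}
  && \text{(checked after the last symbol)}
\end{align*}
```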


More Formal: Submatch Extraction

A backward traversal: start from the last input symbol and work towards the first.

Submatch extraction: the submatch for tag ti is the last consecutive sequence of input characters that are assigned ti.
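A sketch of this rule, given the tag set assigned to each input character (the function name and the list-of-sets representation are assumptions for illustration):

```python
def last_tagged_run(text, tags_per_char, tag):
    """Return the last maximal run of consecutive characters assigned `tag`."""
    end = len(text)
    # Skip trailing characters that are not assigned the tag.
    while end > 0 and tag not in tags_per_char[end - 1]:
        end -= 1
    start = end
    # Extend the run to the left while the tag is present.
    while start > 0 and tag in tags_per_char[start - 1]:
        start -= 1
    return text[start:end]

print(last_tagged_run("aaaa", [{"t1"}, {"t1"}, set(), set()], "t1"))  # aa
```

If a tag is assigned to several separate runs, only the last run is reported; if the tag is never assigned, the result is the empty string.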


Submatch-OBDD

  • Representation of tagged NFAs, match test, and submatch extraction using OBDDs

  • OBDD representations for

    • Transitions with submatch tags

    • Intermediate transitions

    • Submatch tags

    • Set of start states

    • Set of accept states

    • Set of frontiers

    • Input symbols
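To give a feel for the first item, here is a toy reduced ordered BDD that stores the transition relation of (a*)aa. It is a teaching sketch, not CUDD and not the authors' implementation. The bit encoding is an assumption: two bits per state, one bit for the symbol (a = 1), one bit for the presence of tag t1, with variable order x1, x0, i, y1, y0, t.

```python
import operator

class BDD:
    """Toy reduced ordered BDD. Node ids are ints; 0 and 1 are the terminals."""

    def __init__(self):
        self.nodes = [None, None]  # id -> (var, low, high)
        self.unique = {}           # (var, low, high) -> id, shares equal subgraphs
        self.memo = {}             # (op, u, v) -> id, caches apply results

    def mk(self, var, low, high):
        if low == high:            # both branches agree: the test is redundant
            return low
        key = (var, low, high)
        if key not in self.unique:
            self.nodes.append(key)
            self.unique[key] = len(self.nodes) - 1
        return self.unique[key]

    def literal(self, var, value):
        """BDD of the constraint var == value."""
        return self.mk(var, 0, 1) if value else self.mk(var, 1, 0)

    def _split(self, u, var):
        """Cofactors of u for var (u itself if u does not test var at its root)."""
        if u > 1 and self.nodes[u][0] == var:
            return self.nodes[u][1], self.nodes[u][2]
        return u, u

    def apply(self, op, u, v):
        """Combine two BDDs with a binary Boolean operator."""
        if u < 2 and v < 2:
            return int(op(u, v))
        key = (op, u, v)
        if key not in self.memo:
            top = min(self.nodes[w][0] for w in (u, v) if w > 1)
            u0, u1 = self._split(u, top)
            v0, v1 = self._split(v, top)
            self.memo[key] = self.mk(top, self.apply(op, u0, v0), self.apply(op, u1, v1))
        return self.memo[key]

    def cube(self, bits):
        """Conjunction of literals: bits[k] is the required value of variable k."""
        result = 1
        for var, value in enumerate(bits):
            result = self.apply(operator.and_, result, self.literal(var, value))
        return result

    def evaluate(self, u, bits):
        """Follow one root-to-terminal path for the given variable assignment."""
        while u > 1:
            var, low, high = self.nodes[u]
            u = high if bits[var] else low
        return bool(u)

def encode(x, i, y, tagged):
    """Bit vector (x1, x0, i, y1, y0, t) for one transition."""
    return ((x >> 1) & 1, x & 1, int(i == "a"), (y >> 1) & 1, y & 1, int(tagged))

bdd = BDD()
TRANSITIONS = [(1, "a", 1, True), (1, "a", 2, False), (2, "a", 3, False)]
T = 0
for transition in TRANSITIONS:
    T = bdd.apply(operator.or_, T, bdd.cube(encode(*transition)))

print(bdd.evaluate(T, encode(1, "a", 1, True)))   # True
print(bdd.evaluate(T, encode(1, "a", 1, False)))  # False
```

Frontiers, accept states, and input symbols can be encoded as BDDs over the same variables in the same way; a production tool would rely on a package such as CUDD for conjunction, quantification, and renaming.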


Implementation

Toolchain in C++, interfacing with CUDD*

[Figure: pipeline. RegExps → RE2TNFA → tagged NFAs → TNFA2OBDD → OBDDs → PATTERNMATCH. PATTERNMATCH also takes input strings / network traffic and reports either "No match" or "Matched at reg#" together with the submatches $1 = …, $2 = ….]

*CUDD is a package for manipulation of Binary Decision Diagrams


Feasibility Study

  • Data sets

    • Snort-2009

      • RegExps: 115 regexps with capturing groups from HTTP rules

      • Traces

        • 1.2GB department network traffic (average packet size 126 bytes)

        • 1.3GB Twitter traffic (average packet size 1202 bytes)

        • 1MB synthetic trace (average string length 311 bytes)

    • Snort-2012

      • RegExps: 403 regexps with capturing groups from HTTP rules

      • Traces

        • 1.2GB department network traffic (average packet size 126 bytes)

        • 1.3GB Twitter traffic (average packet size 1202 bytes)

        • 1MB synthetic trace (average string length 689 bytes)

    • Firewall-504

      • RegExps: 504 patterns from a commercial firewall F

      • Trace: 87MB of firewall logs (average line size 87 bytes)


Experimental Setup

  • Platform: Intel Core2 Duo E7500, Linux-2.6.3, 2GB RAM

  • Two pattern-matching configurations

    • Conf. S

      • Patterns compiled individually

      • Compiled patterns matched sequentially against input traces

    • Conf. C

      • Patterns combined with UNION and compiled

      • Combined pattern matched against input traces
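The two configurations can be illustrated with Python's `re` module. The three patterns below are hypothetical stand-ins, since the Snort and firewall rule sets are not reproduced in the slides, and the group names `r0`, `r1`, … are an assumption used to recover which pattern matched.

```python
import re

# Hypothetical rule set for illustration only.
PATTERNS = [r"username=(\w+)", r"hostname=(\w+)", r"port=(\d+)"]

# Conf. S: compile each pattern individually and try them sequentially.
compiled_individually = [re.compile(p) for p in PATTERNS]

def match_sequential(line):
    """Return (pattern index, submatches) of the first pattern that matches."""
    for index, rule in enumerate(compiled_individually):
        m = rule.search(line)
        if m:
            return index, m.groups()
    return None

# Conf. C: combine all patterns with UNION (|) and compile once.
combined = re.compile("|".join(f"(?P<r{k}>{p})" for k, p in enumerate(PATTERNS)))

def match_combined(line):
    """Return (pattern index, text matched by that pattern) for the combined pattern."""
    m = combined.search(line)
    if not m:
        return None
    for index in range(len(PATTERNS)):
        if m.group(f"r{index}") is not None:
            return index, m.group(f"r{index}")
    return None

print(match_sequential("hostname=Foo port=80"))  # (1, ('Foo',))
print(match_combined("hostname=Foo port=80"))    # (1, 'hostname=Foo')
```

In Conf. S the cost grows with the number of patterns because every pattern scans the input; in Conf. C the input is scanned once against a single, larger automaton.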


Performance

Execution time (cycles/byte) and memory consumption (MB) of RE2, PCRE, and Submatch-OBDD for the Snort-2009 data set


Performance

Execution time (cycles/byte) and memory consumption (MB) of RE2, PCRE, and Submatch-OBDD for the Snort-2012 data set


Performance

Execution time (cycles/byte) and memory consumption (MB) of RE2, PCRE, and Submatch-OBDD for the Firewall-504 data set


Related Work

  • NFA-OBDD [Yang et al., RAID’10; Chasaki and Wolf, ANCS’10]

  • RE2 [Cox, code.google.com/p/re2]

  • PCRE [www.pcre.org]

  • TNFA [Laurikari et al., SPIRE’00]

  • MDFA [Yu et al., ANCS’06]

  • Hybrid FA [Becchi and Crowley, CoNEXT’07]

  • XFA [Smith et al., Oakland’08]

  • More – see paper for details


Conclusion

  • A novel way of annotating capturing groups

  • Submatch-OBDD: a novel technique for submatch extraction using OBDDs

  • Feasibility study

    • Submatch-OBDD achieves ideal performance when patterns are combined

    • Faster than RE2 and PCRE when patterns are combined

