Advanced Algorithms for Fast and Scalable Deep Packet Inspection

Sailesh Kumar, Jonathan Turner, John Williams

Why Regular Expression Acceleration?
  • RegExes are now widely used
    • Network intrusion detection systems (NIDS)
    • Layer 7 switches, load balancing
    • Firewalls, filtering, authentication and monitoring
    • Content-based traffic management and routing
  • RegEx matching is expensive
    • Space: large amounts of memory
    • Bandwidth: requires one or more state traversals per byte
  • RegEx matching is a performance bottleneck
    • In enterprise switches from Cisco and others
    • In many security appliances
      • They use DFAs, 1+ GB of memory, and still achieve only sub-gigabit throughput
    • We need to accelerate RegEx matching!
Can we do better?
  • Well studied in compiler literature
    • What’s different in Networking?
    • Can we do better?
  • Construction time versus execution time (grep)
    • Traditionally, (construction + execution) time is the metric
    • In networking context, execution time is critical
    • Also, there may be thousands of patterns
  • DFAs are fast: one transition lookup per input byte (see the sketch below)
    • But they can have an exponentially large number of states
    • Algorithms exist to minimize the number of states
    • Even so: 1) low performance and 2) gigabytes of memory
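To make the DFA baseline concrete, here is a minimal sketch of table-driven matching: one next-state lookup per input byte, paid for with a full 256-entry row per state, which is where the gigabytes come from once there are thousands of patterns. The two-state example table and function names are illustrative, not taken from the talk's datasets.

```python
# Minimal sketch of table-driven DFA matching: one lookup per input byte.
ALPHABET = 256  # one next-state entry per possible byte value

def dfa_match(dfa, accepting, data, start=0):
    """dfa[state][byte] gives the next state; returns True once an
    accepting state is reached while scanning `data`."""
    state = start
    for byte in data:
        state = dfa[state][byte]      # exactly one table lookup per input byte
        if state in accepting:
            return True
    return False

# Toy 2-state DFA that accepts any input containing the byte 'a'.
dfa = [
    [1 if b == ord('a') else 0 for b in range(ALPHABET)],  # state 0
    [1] * ALPHABET,                                        # state 1 (sticky accept)
]
print(dfa_match(dfa, {1}, b"xyzabc"))  # True
```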

[Figure: an example DFA over the alphabet {a, b, c, d} with five states (1–5) and all of its labeled transitions]

Delayed Input DFA (D2FA), SIGCOMM’06
  • Many transitions
    • 256 transitions per state
    • 50+ distinct transitions per state (real-world datasets)
    • Need 50+ words per state
  • Reduce the number of transitions in a DFA

The example DFA above implements three rules, a+, b+c and c*d+, and has 4 distinct transitions per state. Looking at state pairs, there are many common transitions. How can we remove them? (A sketch of counting shared transitions follows below.)
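The saving that default transitions exploit is exactly this overlap between state pairs. Below is a small illustrative sketch, with a made-up dictionary layout, of counting the transitions two DFA states share, i.e. the entries a default edge between them could eliminate.

```python
# Illustrative sketch (not the paper's code) of the observation driving
# default transitions: many DFA state pairs share most of their outgoing
# transitions, and a default edge from u to v lets u drop every
# transition it has in common with v.

def shared_transitions(trans_u, trans_v):
    """trans_x: dict mapping input byte -> next state for one DFA state."""
    return sum(1 for c, nxt in trans_u.items() if trans_v.get(c) == nxt)

# Two toy states that agree on inputs 'c' and 'd' but differ on 'a' and 'b'.
u = {ord('a'): 2, ord('b'): 3, ord('c'): 5, ord('d'): 4}
v = {ord('a'): 1, ord('b'): 1, ord('c'): 5, ord('d'): 4}
print(shared_transitions(u, v))  # 2 -> a default edge u -> v saves 2 entries
```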


[Figure: the example DFA redrawn alongside its D2FA]

Delayed Input DFA (D2FA), SIGCOMM'06 (continued)

The D2FA is an alternative representation of the same three-rule DFA: fewer transitions per state, and therefore less memory.


[Figure: the DFA (left) and the corresponding D2FA (right); the heavy edges in the D2FA are default transitions]

D2FA Operation

Heavy edges are called default transitions.

Take a default transition whenever a labeled transition for the current input character is missing. (A traversal sketch follows below.)
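A minimal sketch of that traversal rule, assuming a simple dictionary-based layout (labeled transitions per state plus one default pointer); it also shows why a D2FA may need several memory accesses for a single input byte.

```python
# Hedged sketch of D2FA traversal; the data layout is an assumption.
# labeled[s] maps byte -> next state for state s's labeled transitions,
# default[s] is s's default target (the root's labeled table is assumed
# complete, so the walk always terminates).

def d2fa_step(labeled, default, state, byte):
    # Walk default edges, without consuming the byte, until some state on
    # the default path has a labeled transition for it. Each hop is an
    # extra memory access -- the cost CD2FAs remove.
    while byte not in labeled[state]:
        state = default[state]
    return labeled[state][byte]

def d2fa_match(labeled, default, accepting, data, start=0):
    state = start
    for byte in data:
        state = d2fa_step(labeled, default, state, byte)
        if state in accepting:
            return True
    return False
```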

D2FA versus DFA
  • D2FAs are compact but require multiple memory accesses
    • Up to 20x more memory accesses
    • Not desirable in off-chip memory architectures
  • Can D2FAs match the performance of DFAs?
    • YES!
    • Content-Addressed D2FAs (CD2FAs)
  • CD2FAs require only one memory access per byte
    • They match the performance of a DFA in a cacheless system
    • In systems with a data cache, CD2FAs are 2-3x faster
  • CD2FAs are 10x more compact than DFAs

[Figure: a default-path chain V → U → R; R has labeled transitions for all characters, U has labeled transitions on c, d with content label (cd,R), and V has labeled transitions on a, b with content label (ab,cd,R)]

Introduction to CD2FA
  • How to avoid the multiple memory accesses of D2FAs?
    • Avoid the lookup needed to decide whether the default path must be taken
    • Avoid the default path traversal itself
  • Solution: assign a content label to each state; the label contains:
    • The characters for which the state has labeled transitions
    • Information about all of its default states
    • The characters for which its default states have labeled transitions

Content labels determine memory locations: find node R at location R, node U at hash(c,d,R), and node V at hash(a,b,hash(c,d,R)). (A sketch of this address computation follows below.)
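A hedged sketch of the address computation implied by these labels: a state's location is a hash of its content label, which folds in the label of its default state. The concrete hash (CRC32) and the byte packing are placeholders, not the talk's choices.

```python
# Hedged sketch of content-addressed lookup: a state's memory address is a
# hash of its content label, and the label folds in the labels of its
# default-path ancestors, e.g. V lives at hash(a,b,hash(c,d,R)).

import zlib

def label_hash(chars, default_hash):
    """chars: the bytes for which the state has labeled transitions;
    default_hash: the hash/location of its default state (or a root id)."""
    packed = bytes(sorted(chars)) + default_hash.to_bytes(4, "big")
    return zlib.crc32(packed)          # stand-in for a real hash function

R = 0x52                   # location of root state R (illustrative)
U = label_hash(b"cd", R)   # node U is found at hash(c,d,R)
V = label_hash(b"ab", U)   # node V is found at hash(a,b,hash(c,d,R))
print(hex(U), hex(V))
```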

Introduction to CD2FA (continued)

[Figure: two default-path chains, V → U → R and X → Y → Z; U has content label (cd,R), V has (ab,cd,R) and is stored at hash(a,b,hash(c,d,R)); Y has (lm,Z), X has (pq,lm,Z) and is stored at hash(p,q,hash(l,m,Z))]

Example lookup: the current state is V (label = ab,cd,R), located at hash(a,b,hash(c,d,R)). On the next input character the machine moves directly to state X (label = pq,lm,Z), whose address hash(p,q,hash(l,m,Z)) is computed from that label. (A sketch of one such lookup step follows below.)
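Building on the address-computation sketch above, here is a simplified, hypothetical rendering of one lookup step. The record layout (the fetched node maps each input byte straight to the next state's content label) is my assumption, not something the slide states; the point it illustrates is the slide's: one memory access per input byte, with no default-path walking.

```python
# Simplified, hypothetical sketch of one CD2FA lookup step.
# Assumption: memory[addr] is the node record for the current state and
# maps each input byte to the content label of the next state.

def cd2fa_step(memory, hash_label, addr, byte):
    node = memory[addr]            # the single memory read for this byte
    next_label = node[byte]        # e.g. the label (pq, lm, Z) of state X
    return hash_label(next_label)  # next address, e.g. hash(p,q,hash(l,m,Z))
```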

Construction of CD2FA
  • We seek to keep the content labels small
  • Twin objectives:
    • Ensure that states have few labeled transitions
    • Ensure that default paths are as short as possible
  • The D2FA construction heuristic, based on a maximum-weight spanning tree, creates long default paths
    • Limiting default path length => less space-efficient D2FAs
  • We propose a new heuristic, called CRO, to construct D2FAs
    • It runs in 3 phases: Construction, Reduction and Optimization
    • With a default path bound of 2 edges, the CRO algorithm constructs up to 10x more space-efficient D2FAs
    • CD2FAs are then constructed from these D2FAs (a generic sketch of bounding default-path length appears below)
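The following is not the authors' CRO heuristic; it is only a generic greedy sketch of the idea of bounding default-path length while choosing default edges, with made-up names, ordering, and thresholds, to make the trade-off concrete.

```python
# Generic greedy sketch (not CRO): each state may point its default edge
# at an earlier state that shares many transitions, provided the
# resulting default path stays within `bound` edges.

def pick_defaults(trans, bound=2):
    """trans[s]: dict byte -> next state for DFA state s."""
    states = list(trans)
    depth = {s: 0 for s in states}           # current default-path length of s
    default_edge = {s: None for s in states}

    def shared(u, v):
        return sum(1 for c, n in trans[u].items() if trans[v].get(c) == n)

    for i, s in enumerate(states):
        # Only earlier states are candidates, so default edges never form a
        # cycle and each candidate's depth is already final.
        cands = [(shared(s, t), t) for t in states[:i] if depth[t] + 1 <= bound]
        if not cands:
            continue
        gain, target = max(cands, key=lambda x: x[0])
        if gain > 1:                          # a default edge must pay for itself
            default_edge[s], depth[s] = target, depth[target] + 1
    return default_edge
```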
Memory Mapping in CD2FA

[Figure: states stored at addresses given by hashes of their content labels, e.g. hash(c,d,R), hash(a,b,hash(c,d,R)), hash(p,q,hash(l,m,Z))]

So far we have assumed that hashing is collision free; in practice, two content labels can hash to the same memory location, causing a collision.

Collision-free Memory Mapping

[Figure: four states with content labels (abc,…), (pqr,…), (lmn,…), (def,…) being placed into 4 memory locations; alternative orderings of a label's characters, e.g. hash(mln,…) or hash(edf,…), give alternative candidate locations, yet choices can still collide]

We need a systematic approach to assign every state a unique memory location.

Bipartite Graph Matching
  • Bipartite graph
    • Left nodes are state content labels
    • Right nodes are memory locations
    • Map state labels to unique memory locations
    • One edge for each candidate location of a content label
    • This is a perfect matching problem
  • With n left and n right nodes
    • We need O(log n) random edges per node
    • n = 1M implies we need ~20 edges per node
  • With slight memory over-provisioning
    • We can uniquely map state labels with far fewer edges
  • In our experiments, we found a perfect matching without any memory over-provisioning (a matching sketch follows below)
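A sketch of the matching step using Kuhn's augmenting-path algorithm (the slide names the problem, not a particular algorithm): each content label comes with a few candidate memory locations, e.g. from several hash choices, and we search for an assignment that gives every label its own location. The toy candidate lists are made up.

```python
# Kuhn's augmenting-path matching: assign each content label one of its
# candidate memory slots so that no two labels share a slot.

def perfect_matching(candidates, num_slots):
    """candidates[label] = list of candidate memory slots for that label."""
    owner = [None] * num_slots               # slot -> label currently assigned

    def try_assign(label, visited):
        for slot in candidates[label]:
            if slot in visited:
                continue
            visited.add(slot)
            # Take a free slot, or evict and re-seat its current owner.
            if owner[slot] is None or try_assign(owner[slot], visited):
                owner[slot] = label
                return True
        return False

    return all(try_assign(label, set()) for label in candidates)

# Toy example: 3 labels, 3 slots, 2 hash choices each.
cands = {"ab,cd,R": [0, 1], "pq,lm,Z": [1, 2], "cd,R": [0, 2]}
print(perfect_matching(cands, 3))   # True -> every label gets a unique slot
```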
Throughput Results

[Chart: with a 4 KB data cache, CD2FAs achieve about 3x the throughput of DFAs]

Conclusion
  • We have proposed CD2FAs
  • They match or surpass DFAs in throughput
  • They use 10x less memory than a table-compressed DFA
  • A novel randomized memory-mapping algorithm based on maximum matching in a bipartite graph
    • Zero space overhead
    • Zero bandwidth overhead
  • Thank you. Questions?