Advanced Algorithms for Fast and Scalable Deep Packet Inspection
Sailesh Kumar, Jonathan Turner, John Williams
Why Regular Expression Acceleration?
• RegEx are now widely used
  • Network intrusion detection systems (NIDS)
  • Layer 7 switches, load balancing
  • Firewalls, filtering, authentication and monitoring
  • Content-based traffic management and routing
• RegEx matching is expensive
  • Space: large amount of memory
  • Bandwidth: requires 1+ state traversals per byte
• RegEx matching is a performance bottleneck
  • In enterprise switches from Cisco, etc.
  • In many security appliances
  • These use DFAs with 1+ GB of memory and still reach only sub-gigabit throughput
• Need to accelerate RegEx!
Can we do better?
• RegEx matching is well studied in the compiler literature
• What’s different in networking?
  • Construction time versus execution time (grep)
  • Traditionally, (construction + execution) time is the metric
  • In the networking context, execution time is critical
  • Also, there may be thousands of patterns
• DFAs are fast
  • But they can have an exponentially large number of states
  • Algorithms exist to minimize the number of states
  • Still 1) low performance and 2) gigabytes of memory
Delayed Input DFA (D2FA), SIGCOMM’06
[Figure: five-state DFA for the three rules a+, b+c, c*d+, with 4 transitions per state]
• Many transitions
  • 256 transitions per state
  • 50+ distinct transitions per state (real-world datasets)
  • Need 50+ words per state
• Goal: reduce the number of transitions in a DFA
• Look at state pairs: there are many common transitions. How can we remove them?
Delayed Input DFA (D2FA), SIGCOMM’06: alternative representation
[Figure: the DFA for the three rules a+, b+c, c*d+ (4 transitions per state) shown together with its alternative representation]
• Fewer transitions, less memory
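The idea of removing transitions that two states have in common can be sketched in a few lines of Python. The three-state transition table below is a hypothetical example over the alphabet {a, b, c, d}, not the automaton from the figure, and the function names are illustrative only.

```python
def common_transitions(dfa, s, t):
    """Count the input characters on which states s and t go to the same next state."""
    return sum(1 for ch, nxt in dfa[s].items() if dfa[t].get(ch) == nxt)


def labeled_transitions(dfa, state, default):
    """Transitions `state` must still store if `default` supplies the shared ones."""
    return {ch: nxt for ch, nxt in dfa[state].items() if dfa[default].get(ch) != nxt}


# Hypothetical DFA: every state stores one transition per character.
dfa = {
    1: {"a": 2, "b": 3, "c": 1, "d": 1},
    2: {"a": 2, "b": 3, "c": 1, "d": 1},
    3: {"a": 2, "b": 3, "c": 3, "d": 1},
}

# States 1 and 3 agree on a, b and d, so state 3 only needs to keep "c".
print(common_transitions(dfa, 1, 3))
print(labeled_transitions(dfa, 3, 1))
```

In this toy table, pointing states 2 and 3 at state 1 leaves 4 + 0 + 1 = 5 stored transitions instead of 12.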
D2FA Operation
[Figure: the DFA and the corresponding D2FA]
• Heavy edges are called default transitions
• Take the default transition whenever a labeled transition is missing
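A minimal sketch of this lookup rule, assuming a hypothetical three-state D2FA in which state 1 is the root and keeps a labeled transition for every character (so the loop always terminates). The tables and function names are illustrative, not taken from the talk.

```python
def d2fa_step(labeled, default, state, ch):
    """Consume one character; return (next_state, number_of_states_visited)."""
    visited = 1
    while ch not in labeled[state]:
        # Labeled transition missing: follow the default transition.
        # No input is consumed on this hop, but it costs another memory access.
        state = default[state]
        visited += 1
    return labeled[state][ch], visited


def run(labeled, default, start, text):
    """Feed `text` through the D2FA; return (final_state, total_states_visited)."""
    state, total = start, 0
    for ch in text:
        state, visited = d2fa_step(labeled, default, state, ch)
        total += visited
    return state, total


# Hypothetical D2FA over {a, b, c, d}.
labeled = {
    1: {"a": 2, "b": 3, "c": 1, "d": 1},  # root: all transitions are labeled
    2: {},                                # everything is inherited from state 1
    3: {"c": 3},                          # only "c" differs from state 1
}
default = {2: 1, 3: 1}

print(run(labeled, default, 1, "abca"))  # (2, 6): 4 characters, 6 state visits
```

The extra visits on default hops are the additional memory accesses that a plain D2FA pays compared with a DFA.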
D2FA versus DFA
• D2FAs are compact but require multiple memory accesses per input byte
  • Up to 20x more memory accesses
  • Not desirable in an off-chip architecture
• Can D2FAs match the performance of DFAs?
  • Yes!
• Content Addressed D2FAs (CD2FA)
  • CD2FAs require only one memory access per byte
  • They match the performance of a DFA in a cacheless system
  • In systems with a data cache, CD2FAs are 2-3x faster
  • CD2FAs are 10x more compact than DFAs
Introduction to CD2FA
• How to avoid the multiple memory accesses of D2FAs?
  • Avoid the lookup that decides whether the default path needs to be taken
  • Avoid the default path traversal
• Solution: assign a label to each state. A label contains:
  • Characters for which the state has labeled transitions
  • Information about all of its default states
  • Characters for which its default states have labeled transitions
[Figure: default path V → U → R with content labels. R has label R and transitions on all characters; U has label cd,R and transitions on c and d; V has label ab,cd,R and transitions on a and b]
• Find node R at location R
• Find node U at hash(c,d,R)
• Find node V at hash(a,b,hash(c,d,R))
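The addressing scheme can be sketched as follows. This is an illustration under several assumptions: a label is modeled as a list of character groups from the state up to (but excluding) the root, the root sits at a fixed hypothetical location `ROOT_LOCATION`, and `zlib.crc32` stands in for whatever hash function the authors use. It also ignores collisions.

```python
import zlib

TABLE_SIZE = 1024   # hypothetical number of memory locations
ROOT_LOCATION = 0   # hypothetical fixed location of the root state R


def address(levels, root=ROOT_LOCATION):
    """Nested hash of a label, e.g. ["ab", "cd"] -> hash(a,b,hash(c,d,R))."""
    addr = root
    for chars in reversed(levels):          # fold from the root outwards
        addr = zlib.crc32(f"{chars}|{addr}".encode()) % TABLE_SIZE
    return addr


def location_to_read(levels, ch, root=ROOT_LOCATION):
    """Pick the single location that holds the transition for `ch`.

    The label alone tells us which state on the default path has a labeled
    transition on `ch`, so no memory access is needed to walk that path.
    """
    for depth, chars in enumerate(levels):
        if ch in chars:
            return address(levels[depth:], root)
    return root                              # only the root handles `ch`


label_V = ["ab", "cd"]                       # V: ab,cd,R
print(location_to_read(label_V, "a") == address(["ab", "cd"]))  # V itself
print(location_to_read(label_V, "c") == address(["cd"]))        # default state U
print(location_to_read(label_V, "x") == ROOT_LOCATION)          # root R
```

Because the decision is made from the label, each input character costs one memory read regardless of how deep in the default path the matching transition lives.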
Introduction to CD2FA (example)
[Figure: two default trees, rooted at R and Z. Current state V (label = ab,cd,R), found at hash(a,b,hash(c,d,R)), moves to state X (label = pq,lm,Z), found at hash(p,q,hash(l,m,Z)); the address hash(c,d,R) and the input characters a and d are also shown]
Construction of CD2FA
• We seek to keep the content labels small
• Twin objectives:
  • Ensure that states have few labeled transitions
  • Ensure that default paths are as short as possible
• The D2FA construction heuristic based on a maximum weight spanning tree creates long default paths
  • Limiting default path length => less space-efficient D2FAs
• We propose a new heuristic, called CRO, to construct D2FAs
  • Runs in 3 phases: Construction, Reduction and Optimization
  • With a default path bound of 2 edges, the CRO algorithm constructs up to 10x more space-efficient D2FAs
• CD2FAs are constructed from these D2FAs
Memory Mapping in CD2FA
[Figure: the default trees rooted at R and Z, with states placed at hash(a,b,hash(c,d,R)), hash(c,d,R) and hash(p,q,hash(l,m,Z)); a collision between hashed locations is marked]
• So far we have assumed that hashing is collision-free
• In practice, collisions can occur
Collision-free Memory Mapping
[Figure: four states with content labels (a b c, …), (p q r, …), (l m n, …) and (d e f, …) to be placed in 4 memory locations; candidate addresses shown include hash(abc, …), hash(pqr, …), hash(lmn, …), hash(mln, …), hash(def, …) and hash(edf, …)]
• We need a systematic approach
Bipartite Graph Matching
• Bipartite graph
  • Left nodes are state content labels
  • Right nodes are memory locations
  • Map state labels to unique memory locations
  • An edge for every choice of content label
• Perfect matching problem
  • With n left and n right nodes
  • Need O(log n) random edges per node
  • For n = 1M, we need ~20 edges per node
• If we provide slight memory over-provisioning
  • We can uniquely map state labels with far fewer edges
• In our experiments, we found a perfect matching without memory over-provisioning
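The mapping step can be illustrated with a standard augmenting-path bipartite matching (Kuhn's algorithm). This is a generic textbook technique used here as a stand-in, not necessarily the authors' algorithm. The helper `candidate_slots` assumes that each ordering of a label's characters hashes to one candidate location, with `zlib.crc32` as a hypothetical hash; the explicit candidate lists in the example are made up. For scale, log2(10^6) ≈ 19.9, which is consistent with the ~20 edges per node quoted for n = 1M.

```python
import zlib
from itertools import permutations


def candidate_slots(label, n_slots):
    """Candidate locations for a label: one per ordering of its characters."""
    return sorted({zlib.crc32("".join(p).encode()) % n_slots
                   for p in permutations(label)})


def match_labels(candidates):
    """Assign each label a unique slot from its candidate list.

    `candidates` maps label -> list of slots. Returns {label: slot},
    or None if some label cannot be placed without a collision.
    """
    owner = {}                                # slot -> label currently placed there

    def place(label, seen):
        for slot in candidates[label]:
            if slot in seen:
                continue
            seen.add(slot)
            # Take a free slot, or evict the owner if it can move elsewhere.
            if slot not in owner or place(owner[slot], seen):
                owner[slot] = label
                return True
        return False

    for label in candidates:
        if not place(label, set()):
            return None
    return {label: slot for slot, label in owner.items()}


# Made-up candidate lists for four labels and four memory locations.
example = {"abc": [0, 1], "pqr": [0], "lmn": [1, 2], "def": [2, 3]}
print(match_labels(example))
```

In the example, "abc" first takes slot 0 and is later moved to slot 1 so that "pqr", which has no other choice, can be placed. More candidate edges per label make such rearrangements more likely to succeed.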
Throughput Results
[Figure: throughput plot, annotated "3x faster" and "4KB cache"]
Conclusion
• We have proposed CD2FAs
  • Match or surpass a DFA in throughput
  • 10x less memory than a table-compressed DFA
• Novel randomized memory mapping algorithm based on maximum matching in a bipartite graph
  • Zero space overhead
  • Zero bandwidth overhead
• Thank you. Questions?