Advanced Algorithms for Fast and Scalable Deep Packet Inspection

Sailesh Kumar, Jonathan Turner, John Williams

Why Regular Expression Acceleration?
• Regular expressions (RegEx) are now widely used
  • Network intrusion detection systems (NIDS)
  • Layer 7 switches, load balancing
  • Firewalls, filtering, authentication and monitoring
  • Content-based traffic management and routing
• RegEx matching is expensive
  • Space: large amount of memory
  • Bandwidth: requires 1+ state traversals per byte
• RegEx matching is a performance bottleneck
  • In enterprise switches from Cisco, etc.
  • In many security appliances
  • These use DFAs with 1+ GB of memory and still reach only sub-gigabit throughput
• Need to accelerate RegEx!

Can we do better?
• Well studied in compiler literature
  • What’s different in networking?
  • Can we do better?
• Construction time versus execution time (grep)
  • Traditionally, (construction + execution) time is the metric
  • In the networking context, execution time is critical
  • Also, there may be thousands of patterns
• DFAs are fast
  • But can have an exponentially large number of states
  • Algorithms exist to minimize the number of states
  • Still 1) low performance and 2) gigabytes of memory

Delayed Input DFA (D2FA), SIGCOMM’06
• Many transitions
  • 256 transitions per state
  • 50+ distinct transitions per state (real-world datasets)
  • Need 50+ words per state
• Reduce the number of transitions in a DFA
[Figure: DFA with states 1 to 5 over the characters a, b, c, d for the three rules a+, b+c, c*d+; 4 transitions per state.]
Look at state pairs: there are many common transitions. How to remove them?

Delayed Input DFA (D2FA), SIGCOMM’06 (continued)
[Figure: the same DFA for the three rules a+, b+c, c*d+ (4 transitions per state), shown next to an alternative representation.]
• Alternative representation: fewer transitions, less memory

D2FA Operation
• Heavy edges are called default transitions
• Take a default transition whenever a labeled transition is missing
[Figure: the DFA and the equivalent D2FA side by side, states 1 to 5.]
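The default-transition rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `TOY_DFA` and `TOY_DEFAULT` are a hypothetical 5-state automaton and default assignment (not the automaton in the figure), and `build_d2fa` and `d2fa_step` are names invented here. A labeled transition is dropped whenever the default state already has the same transition, and a lookup follows default transitions (without consuming input) until a labeled transition is found.

```python
# Hypothetical 5-state DFA over {a, b, c, d}; NOT the automaton from the slide.
TOY_DFA = {
    1: {"a": 2, "b": 3, "c": 1, "d": 4},
    2: {"a": 2, "b": 3, "c": 1, "d": 4},
    3: {"a": 2, "b": 3, "c": 5, "d": 4},
    4: {"a": 2, "b": 3, "c": 5, "d": 4},
    5: {"a": 2, "b": 3, "c": 1, "d": 4},
}
# Assumed default-transition tree: state -> default state (state 1 is the root).
TOY_DEFAULT = {2: 1, 3: 1, 4: 3, 5: 1}


def build_d2fa(dfa, default):
    """Keep only the transitions that differ from the default state's."""
    labeled = {}
    for state, row in dfa.items():
        parent = default.get(state)
        if parent is None:
            labeled[state] = dict(row)  # the root keeps every transition
        else:
            labeled[state] = {
                ch: nxt for ch, nxt in row.items() if dfa[parent][ch] != nxt
            }
    return labeled


def d2fa_step(labeled, default, state, ch):
    """Return (next_state, default_hops) for one input character."""
    hops = 0
    while ch not in labeled[state]:
        state = default[state]  # default transition: no input consumed
        hops += 1
    return labeled[state][ch], hops


LABELED = build_d2fa(TOY_DFA, TOY_DEFAULT)
total = sum(len(row) for row in LABELED.values())
print(f"labeled transitions: {total} (the DFA stores 20)")
print(d2fa_step(LABELED, TOY_DEFAULT, 4, "a"))
```

Each hop along the default path is an extra memory access, which is the cost the later slides set out to remove.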

D2FA versus DFA
• D2FAs are compact but require multiple memory accesses
  • Up to 20x more memory accesses
  • Not desirable in an off-chip architecture
• Can D2FAs match the performance of DFAs?
  • YES!
  • Content Addressed D2FAs (CD2FAs)
• CD2FAs require only one memory access per byte
  • Match the performance of a DFA in a cacheless system
  • In systems with a data cache, CD2FAs are 2-3x faster
• CD2FAs are 10x more compact than DFAs

Introduction to CD2FA
• How to avoid the multiple memory accesses of D2FAs?
  • Avoid the lookup that decides whether the default path needs to be taken
  • Avoid default path traversal
• Solution: assign a label to each state; the label contains:
  • Characters for which the state has labeled transitions
  • Information about all of its default states
  • Characters for which its default states have labeled transitions
[Figure: state R (label R, transitions on all characters); state U (label cd,R) with labeled transitions on c and d; state V (label ab,cd,R) with labeled transitions on a and b.]
Content labels:
• Find node R at location R
• Find node U at hash(c,d,R)
• Find node V at hash(a,b,hash(c,d,R))

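Introduction to CD2FA (example)
[Figure: states R (all), U (label cd,R; transitions c, d), V (label ab,cd,R; transitions a, b) and states Z (all), Y (label lm,Z; transitions l, m), X (label pq,lm,Z; transitions p, q). Memory locations shown: hash(c,d,R), hash(a,b,hash(c,d,R)), hash(p,q,hash(l,m,Z)). Input characters shown: a, d.]
Current state: V (label = ab,cd,R) → X (label = pq,lm,Z)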
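The content-label idea can be sketched as follows. This is an illustrative model only: the nested-tuple label encoding, the use of SHA-256 as the hash, and the helper names `address`, `owner` and `step` are assumptions made for this example, and every transition target in `MEMORY` other than V → X is invented. The point it shows is that the current label alone tells which state on the default path owns the transition, so exactly one memory row is read per input character.

```python
import hashlib

# A label is either a root name (str) or a pair (chars, default_label).
# This encoding is an assumption for illustration.
U = ("cd", "R")   # label "cd,R"
V = ("ab", U)     # label "ab,cd,R"
Y = ("lm", "Z")   # label "lm,Z"
X = ("pq", Y)     # label "pq,lm,Z"


def address(label):
    """Memory location of a state, derived from its content label."""
    if isinstance(label, str):
        return label  # roots sit at a fixed, known location
    chars, parent = label
    key = f"{chars}|{address(parent)}"
    return hashlib.sha256(key.encode()).hexdigest()[:8]


def owner(label, ch):
    """Walk the label (not memory) to find which state stores ch."""
    while not isinstance(label, str):
        chars, parent = label
        if ch in chars:
            return label
        label = parent
    return label  # fell through to the root


def step(memory, label, ch):
    """One memory access: read the owning state's row, get the next label."""
    return memory[address(owner(label, ch))][ch]


# Hypothetical transition rows; each entry stores the next state's label.
MEMORY = {
    "R": {ch: "R" for ch in "abcdlmpq"},
    address(U): {"c": U, "d": Y},
    address(V): {"a": X, "b": V},
}

print(step(MEMORY, V, "a") == X)        # served from V's own row
print(owner(V, "d") == U)               # 'd' is owned by default state U
```

Because `owner` inspects only the label held in a register, the default path is never traversed in memory.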

Construction of CD2FA
• We seek to keep the content labels small
• Twin objectives:
  • Ensure that states have few labeled transitions
  • Ensure that default paths are as short as possible
• The D2FA construction heuristic based on a maximum weight spanning tree creates long default paths
  • Limiting default path length => less space-efficient D2FAs
• Proposed a new heuristic called CRO to construct D2FAs
  • Runs in 3 phases: Construction, Reduction and Optimization
  • With a default path bound of 2 edges, the CRO algorithm constructs up to 10x more space-efficient D2FAs
• CD2FAs are constructed from these D2FAs
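The slides do not describe the three CRO phases, so the sketch below is not CRO. It is a naive Prim-style greedy, written only to illustrate the trade-off named above: each state picks the default state with which it shares the most transitions, subject to a bound on default path length. The toy DFA and the names `shared` and `bounded_defaults` are hypothetical.

```python
# Hypothetical 5-state DFA over {a, b, c, d}; for illustration only.
TOY_DFA = {
    1: {"a": 2, "b": 3, "c": 1, "d": 4},
    2: {"a": 2, "b": 3, "c": 1, "d": 4},
    3: {"a": 2, "b": 3, "c": 5, "d": 4},
    4: {"a": 2, "b": 3, "c": 5, "d": 4},
    5: {"a": 2, "b": 3, "c": 1, "d": 4},
}


def shared(dfa, u, v):
    """Edge weight: number of characters on which u and v agree."""
    return sum(1 for ch in dfa[u] if dfa[u][ch] == dfa[v][ch])


def bounded_defaults(dfa, root, bound=2):
    """Grow a default tree greedily; no default path exceeds `bound` edges."""
    depth = {root: 0}
    default = {}
    pending = [s for s in dfa if s != root]
    while pending:
        # Best (state, parent) pair among parents that still have room below.
        s, p = max(
            ((s, p) for s in pending for p in depth if depth[p] < bound),
            key=lambda sp: shared(dfa, sp[0], sp[1]),
        )
        default[s] = p
        depth[s] = depth[p] + 1
        pending.remove(s)
    return default, depth


default, depth = bounded_defaults(TOY_DFA, root=1, bound=2)
print(default, max(depth.values()))
```

With `bound=1` every state must default directly to the root, which gives up the saving that state 4 would get by defaulting to state 3; that is the space cost of shorter default paths.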

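Memory Mapping in CD2FA
[Figure: states R (all), U (label cd,R), V (label ab,cd,R) and Z (all), Y (label lm,R), X (label pq,lm,R) mapped to memory locations hash(c,d,R), hash(a,b,hash(c,d,R)) and hash(p,q,hash(l,m,Z)); one location is marked COLLISION.]
• We have assumed that hashing is collision-free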

Collision-free Memory Mapping
[Figure: four states with content labels (abc, …), (pqr, …), (lmn, …), (def, …) and 4 memory locations; candidate addresses shown include hash(abc, …), hash(pqr, …), hash(lmn, …), hash(mln, …), hash(def, …), hash(edf, …).]
• We need a systematic approach

Bipartite Graph Matching
• Bipartite graph
  • Left nodes are state content labels
  • Right nodes are memory locations
  • Map state labels to unique memory locations
  • An edge for every choice of content label
• Perfect matching problem
  • With n left and n right nodes
  • Need O(log n) random edges per node
  • n = 1M implies we need ~20 edges per node (log2 of 10^6 is about 20)
• If we provide slight memory over-provisioning
  • We can uniquely map state labels with far fewer edges
• In our experiments, we found a perfect matching without memory over-provisioning
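The mapping step can be sketched with a standard augmenting-path bipartite matching (Kuhn's algorithm). This is an illustration under assumptions, not the authors' algorithm: it assumes that the "choices of content label" are different orderings of a label's characters, each hashing to a different candidate location, and the names `candidate_slots` and `match_labels`, as well as the use of SHA-256, are invented for this example.

```python
import hashlib
from itertools import permutations


def candidate_slots(chars, parent, n_slots):
    """Assumed edge set: one candidate slot per ordering of the label's chars."""
    slots = []
    for perm in permutations(chars):
        key = "".join(perm) + "|" + parent
        digest = hashlib.sha256(key.encode()).digest()
        slot = int.from_bytes(digest[:4], "big") % n_slots
        if slot not in slots:
            slots.append(slot)
    return slots


def match_labels(candidates):
    """Map every label to a unique slot, or return None if impossible.

    candidates: dict label -> list of allowed slots (the bipartite edges).
    """
    slot_owner = {}  # slot -> label currently placed there

    def try_place(label, seen):
        for slot in candidates[label]:
            if slot in seen:
                continue
            seen.add(slot)
            # Take a free slot, or evict the occupant if it can move elsewhere.
            if slot not in slot_owner or try_place(slot_owner[slot], seen):
                slot_owner[slot] = label
                return True
        return False

    for label in candidates:
        if not try_place(label, set()):
            return None
    return {label: slot for slot, label in slot_owner.items()}


# Four labels, four slots, hand-picked edges so the outcome is easy to follow.
edges = {"abc": [0, 1], "pqr": [0], "lmn": [1, 2], "def": [2, 3]}
print(match_labels(edges))
```

In the example, "abc" first takes slot 0 and is later moved to slot 1 so that "pqr", which has only one candidate, can be placed. Having several candidate locations per label is what makes such a rearrangement possible.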

Memory Reduction Results

Throughput Results
[Figure: throughput results; annotations: "3x faster", "4 KB cache".]

Conclusion
• We have proposed CD2FAs
  • Match or surpass a DFA in throughput
  • 10x less memory than a table-compressed DFA
• Novel randomized memory mapping algorithm based on maximum matching in a bipartite graph
  • Zero space overhead
  • Zero bandwidth overhead
• Thank you. Questions?
