## Efficient Regular Expression Evaluation: Theory to Practice

Michela Becchi and Patrick Crowley


**Efficient Regular Expression Evaluation: Theory to Practice**

Michela Becchi and Patrick Crowley, ANCS'08

**Motivation**

- The size and complexity of rule sets have increased in recent years
  - Snort, as of November 2007:
    - 8,536 rules, 5,549 Perl Compatible Regular Expressions
    - 99% with character ranges (`[c1-ck]`, `\s`, `\w`, …)
    - 16.3% with dot-star terms (`.*`, `[^c1..ck]*`)
    - 44% with counting constraints (`.{n,m}`, `[^c1..ck]{n,m}`)
- Several proposals to accelerate regular expression matching
  - FPGA
  - Memory-centric architectures

**Objectives**

- Can we combine distinct algorithmic techniques into a single proposal that also scales to large data sets?
- Can we apply techniques intended for memory-centric architectures to FPGAs as well?
- Provide a tool that allows anybody to implement a high-throughput DPI system on the architecture of choice

**Target Architectures**

[Figure: a regex-matching engine can target memory-centric architectures (general-purpose processors, network processors, FPGA/ASIC + memory) or FPGA logic, plotted against available parallelism.]

**Challenges**

[Figure: DFAs are mapped onto memory-centric architectures (general-purpose processors, network processors, FPGA/ASIC + memory); NFAs are mapped onto FPGA logic.]

- FPGA logic: logic cell utilization, clock frequency
- Memory-centric architectures: memory space, memory bandwidth

**D2FA: default transition compression**

- Observations:
  - A DFA state is a set of |Σ| next-state pointers
  - There is substantial transition redundancy across states
- Idea:
  - Differential state representation through non-consuming default transitions

[Figure: s1 (a→s3, b→s4, c→s5) and s2 (a→s3, b→s4, c→s6) share their transitions on a and b; with a default transition s2→s1, s2 stores only c→s6. In general, each state stores only the transitions that differ from those of the next state on its default path.]

**D2FA algorithms**

- Problem: set default transitions so as to
  - maximize memory compression
  - minimize memory bandwidth overhead
- [Kumar et al., SIGCOMM'06]
  - Bound dp_MAX on the maximum default path length
  - O(dp_MAX + 1) memory accesses per input character
  - Better compression for higher dp_MAX
- [Becchi et al., ANCS'07]
  - Only backward-directed default transitions (skipping k levels)
  - Amortized memory bandwidth O(((k+1)/k)·N) on N input characters
  - Depth-first traversal → compression can be performed at DFA creation

| | [Kumar et al., SIGCOMM'06] | [Becchi et al., ANCS'07] |
|---|---|---|
| Memory bandwidth | O((dp_MAX + 1)·N) | O(((k+1)/k)·N) |
| Time complexity | O(n² log n) | O(n²) |
| Space complexity | O(n²) | O(n) |

Compression with k = 1 ≈ compression with dp_MAX = ∞.

**DFA alphabet reduction**

- Effective for:
  - ignore-case regular expressions
  - character ranges
  - characters that are never used

[Figure: an example DFA with transitions on ranges such as [a-z], [A-Z], [0-9] and [1-2] is equivalent to a DFA over a reduced alphabet plus an alphabet translation table that maps each input character to its class.]

**Multiple-stride DFAs**

- [Brodie et al., ISCA 2006]
- Idea:
  - Process *stride* input characters at a time
- Observations:
  - Mechanism applied only to small DFAs (1-2 regexes)
  - No distinct handling of accepting states

[Figure: a DFA and the equivalent DFA with stride 2, whose transitions are labeled with character pairs such as aa and ab.]

**Multiple stride + alphabet reduction**

- Stride s → alphabet Σ^s
  - Σ = 8-bit (extended ASCII) alphabet → |Σ²| = 256² = 65,536; |Σ⁴| = 256⁴ ≈ 4.29 × 10⁹
- The effective alphabet is much smaller
  - Character grouping: [a-cef]a, [b-f]b
- Alphabet reduction may be necessary to make stride doubling feasible on large DFAs

[Figure: a DFA and its stride-2 counterpart (2-DFA).]

**Multiple stride + default transitions**

*Compression*

- Default transitions eliminate transition redundancy
- In multiple-stride DFAs:
  - the number of states does not change substantially
  - the number of transitions per state grows exponentially with the stride (|Σ|^s)
  - the fraction of distinct to total transitions decreases
- Increased potential for compression!
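The D2FA slides above can be illustrated with a minimal Python sketch. It assumes a hypothetical representation (each state maps symbols to next states, plus an optional default pointer) and takes the default transitions as given: choosing them is the optimization problem the D2FA algorithms address, and this is not the authors' implementation. `compress` keeps only the transitions that differ from the default target; `step` follows non-consuming default transitions until a labeled one matches, counting the states visited.

```python
from typing import Dict, Optional, Tuple

# Hypothetical representation: state -> {input symbol -> next state}
Transitions = Dict[int, Dict[str, int]]

def compress(full: Transitions, default: Dict[int, Optional[int]]) -> Transitions:
    """Keep only the transitions that differ from those of the state's default target.

    `default` maps each state to its default-transition target (None for states
    that keep all their transitions). The default graph must be acyclic.
    """
    labeled: Transitions = {}
    for s, row in full.items():
        d = default.get(s)
        if d is None:
            labeled[s] = dict(row)  # root of a default tree: store everything
        else:
            labeled[s] = {c: t for c, t in row.items() if full[d].get(c) != t}
    return labeled

def step(labeled: Transitions, default: Dict[int, Optional[int]],
         s: int, c: str) -> Tuple[int, int]:
    """Consume one symbol; return (next state, number of states visited).

    Default transitions do not consume input: follow them until a state
    with a labeled transition on `c` is found.
    """
    visited = 1
    while c not in labeled[s]:
        d = default.get(s)
        if d is None:
            raise KeyError(f"no transition on {c!r} from state {s}")
        s = d
        visited += 1
    return labeled[s][c], visited

# The two states from the D2FA figure: s1 and s2 differ only on 'c'.
example_dfa: Transitions = {
    1: {"a": 3, "b": 4, "c": 5},
    2: {"a": 3, "b": 4, "c": 6},
}
example_default: Dict[int, Optional[int]] = {1: None, 2: 1}  # default s2 -> s1
example_labeled = compress(example_dfa, example_default)
print(example_labeled[2])                              # {'c': 6}
print(step(example_labeled, example_default, 2, "a"))  # (3, 2)
```

The visited count is the per-character memory-access cost: bounding the default path length (dp_MAX) or allowing only backward-directed default transitions is what keeps it small.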
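The alphabet-reduction and multiple-stride slides can be sketched together as below, under stated assumptions. `reduce_alphabet` merges symbols whose transition column is identical in every state, a simple exact equivalence that stands in for the authors' clustering-based algorithm; `stride2` builds a table in which one lookup consumes a pair of symbol classes; `kept_fraction` measures how many transitions survive if every state defaults to a single root state, a naive, hypothetical default choice used only to show the compression trend. The example DFA and all function names are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple

Delta = Dict[int, Dict[str, int]]    # hypothetical: state -> {symbol -> next state}
Reduced = Dict[int, Dict[int, int]]  # state -> {symbol class -> next state}

def reduce_alphabet(delta: Delta, alphabet: str) -> Tuple[Dict[str, int], Reduced]:
    """Merge symbols whose transition column is identical in every state.

    Returns the alphabet translation table (symbol -> class id) and the
    transition table indexed by class id.
    """
    states = sorted(delta)
    class_of_column: Dict[Tuple[int, ...], int] = {}
    table: Dict[str, int] = {}
    for ch in alphabet:
        column = tuple(delta[s][ch] for s in states)
        table[ch] = class_of_column.setdefault(column, len(class_of_column))
    reduced = {s: {table[ch]: delta[s][ch] for ch in alphabet} for s in states}
    return table, reduced

def stride2(reduced: Reduced) -> Dict[int, Dict[Tuple[int, int], int]]:
    """Stride-2 table: one lookup consumes a pair of symbol classes."""
    return {
        s: {(x, y): reduced[reduced[s][x]][y] for x, y in product(row, repeat=2)}
        for s, row in reduced.items()
    }

def run(delta: Delta, s: int, text: str) -> int:
    """Reference stride-1 execution on raw symbols."""
    for ch in text:
        s = delta[s][ch]
    return s

def run_stride2(table: Dict[str, int], d2, s: int, text: str) -> int:
    """Stride-2 execution; assumes len(text) is even."""
    for i in range(0, len(text), 2):
        s = d2[s][(table[text[i]], table[text[i + 1]])]
    return s

def kept_fraction(rows, root: int = 0) -> float:
    """Fraction of transitions stored if every non-root state defaults to `root`."""
    total = sum(len(r) for r in rows.values())
    kept = len(rows[root])  # the root stores all its transitions
    for s, r in rows.items():
        if s != root:
            kept += sum(1 for k, t in r.items() if rows[root][k] != t)
    return kept / total

# Unanchored DFA for "ab" over {a, b, c, d}; state 2 is accepting.
ab_dfa: Delta = {
    0: {"a": 1, "b": 0, "c": 0, "d": 0},
    1: {"a": 1, "b": 2, "c": 0, "d": 0},
    2: {"a": 1, "b": 0, "c": 0, "d": 0},
}
ab_table, ab_reduced = reduce_alphabet(ab_dfa, "abcd")
ab_d2 = stride2(ab_reduced)
print(ab_table)       # {'a': 0, 'b': 1, 'c': 2, 'd': 2}
print(len(ab_d2[0]))  # 9 pair classes instead of 4 * 4 = 16
print(round(kept_fraction(ab_reduced), 3), round(kept_fraction(ab_d2), 3))  # 0.444 0.333
```

The fraction of transitions that must be stored drops from 4/9 at stride 1 to 1/3 at stride 2, matching the trend on the slide. The sketch does not handle accepting states: on input `cabd`, the stride-1 DFA passes through accepting state 2 after `b`, mid-pair, but `run_stride2` only observes states at pair boundaries and misses the match; with the naive defaults, state 2 also keeps no labeled transitions of its own. These are the accepting-state issues the slides raise.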
*Accepting state handling*

- Duplicated states have the same outgoing transitions as the original states but a different depth
- A default transition removes all outgoing transitions from the new accepting states

[Figure: a DFA and its 2-DFA.]

**Multiple stride + default transitions (cont'd)**

- Problem:
  - For large Σ and stride, the uncompressed DFA may be infeasible
  - Out of memory when generating a 2K-node, stride-4 DFA on a Linux machine with 4 GB of memory
- Solution:
  - Perform default transition compression during DFA creation
  - Use the [Becchi et al., ANCS'07] compression algorithm
  - In the situation above, only 10% of the memory is used

**Putting everything together…**

Data sets: 1-22 regexes, DFAs with 48-1,940 states.

- Stride 1: DFA → alphabet reduction (|Σ| = 25-44) → default transition compression → compressed DFA
  - 96.3-98.5% of transitions removed; avg 1-2 labeled transitions per state
- Stride 2: DFA → stride-2 transformation → alphabet reduction (|Σ| = 53-470) → default transition compression → compressed stride-2 DFA
  - 97.9-99.5% of transitions removed; avg 3-5 labeled transitions per state
- Same memory bandwidth requirement
- Initial size = 40X-80X final size

**NFA**

Example rule set:

- `ab+cd`
- `ab+ce`
- `ab+c.*f`
- `b[d-f]a`
- `bdc`

**Multiple stride + alphabet reduction (NFA)**

- Stride doubling:
  - avoid creating new states
  - keep multiple transitions on the same symbol separate
- Alphabet reduction:
  - clustering-based algorithm as for DFAs, but sets of target states are compared

**FPGA implementation**

- One-hot encoding [Sidhu & Prasanna]
- Quine-McCluskey-like minimization scheme + logic reduction schemes
  - e.g., the decoding condition (c1 = b OR c1 = B) AND NOT (c2 = a OR c2 = A)

**FPGA Results – logic utilization**

[Figure: logic utilization for three data sets (#s: number of states; |Σ1|, |Σ2|: alphabet size at stride 1 and 2): #s = 7,864, |Σ1| = 64, |Σ2| = 2,206; #s = 2,086, |Σ1| = 78, |Σ2| = 1,969; #s = 2,147, |Σ1| = 68, |Σ2| = 1,640.]

- Utilization:
  - 8-46% on the XC5VLX50 device (7,400 slices)
  - The XC5VLX330 device has 51,840 slices

**ASIC – projected results**

- Regex partitioning into multiple DFAs
- Content addressing with 64-bit words:
  - 98% of states compressed with stride 1
  - 82% of states compressed with stride 2
- Throughput with SRAM @ 500 MHz:
  - 2-4 Gbps for stride 1
  - 4-8 Gbps for stride 2
- Alternative representation: decoders in ASIC or instruction memory

**Conclusion**

- Algorithm:
  - Combination of default transition compression, alphabet reduction and stride multiplication on potentially large DFAs
  - Extension of alphabet reduction and stride multiplication to NFAs
- FPGA implementation:
  - One-hot encoding with incremental improvement schemes
  - Logic minimization scheme for alphabet reduction and decoding
- Additional aspects:
  - Multiple-flow handling: FPGA vs. memory-centric architectures
  - Design improvements tailored to specific architectures and data sets:
    - Clustering into smaller NFAs and DFAs to allow smaller alphabets with larger strides

**Thank you!**

- Questions? http://regex.wustl.edu