
Efficient Regular Expression Evaluation: Theory to Practice Michela Becchi and Patrick Crowley


Presentation Transcript


  1. Efficient Regular Expression Evaluation: Theory to Practice. Michela Becchi and Patrick Crowley. ANCS'08

  2. Motivation • Size and complexity of rule-sets have increased in recent years • Snort, as of November 2007: • 8,536 rules, 5,549 Perl Compatible Regular Expressions • 99% with character ranges ([c1-ck], \s, \w, …) • 16.3% with dot-star terms (.*, [^c1..ck]*) • 44% with counting constraints (.{n,m}, [^c1..ck]{n,m}) • Several proposals to accelerate regular expression matching • FPGA • Memory-centric architectures

  3. Objectives • Can distinct algorithmic techniques be converged into a single proposal that also works on large data-sets? • Can techniques intended for memory-centric architectures also be applied on FPGAs? • Goal: provide a tool that allows anybody to implement a high-throughput DPI system on the architecture of their choice

  4. Target Architectures • Regex-matching engine mapped onto two families of targets (figure axis: available parallelism): • FPGA logic • Memory-centric architectures: general-purpose processors, network processors, FPGA/ASIC + memory

  5. Challenges • NFAs on FPGA logic: • Logic cell utilization • Clock frequency • DFAs on memory-centric architectures (general-purpose processors, network processors, FPGA/ASIC + memory): • Memory space • Memory bandwidth

  6. D2FA: default transition compression • Observations: • A DFA state stores a set of |∑| next-state pointers • There is large transition redundancy across states • Idea: differential state representation through the use of non-consuming default transitions • (Figure: s1 and s2 both have a→s3 and b→s4; after compression s2 keeps only c→s6 plus a default transition to s1, and chained default transitions form a default path)
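A minimal sketch of how a D2FA consumes one input character (a hypothetical Python representation, not the authors' implementation): labeled transitions consume the character, while default transitions are followed without consuming input.

```python
# Hypothetical D2FA representation: each state keeps a sparse table of
# labeled transitions plus at most one non-consuming default transition.
class D2faState:
    def __init__(self, labeled, default=None):
        self.labeled = labeled    # char -> next state id (sparse)
        self.default = default    # state id reached without consuming input

def next_state(states, s, ch):
    """Consume one input character: follow default transitions (which
    consume no input) until a state with a labeled transition on ch
    is found, then take that labeled transition."""
    while ch not in states[s].labeled:
        s = states[s].default     # hop along the default path
    return states[s].labeled[ch]
```

In the spirit of the figure, s2 would keep only its c-transition plus a default transition to s1, which stores the a- and b-transitions the two states share.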

  7. D2FA algorithms • Problem: set default transitions so as to • Maximize memory compression • Minimize memory bandwidth overhead • [Kumar et al., SIGCOMM'06] • Bound dpMAX on max default path length • O(dpMAX+1) memory accesses per input char • Better compression for higher dpMAX • Memory bandwidth = O((dpMAX+1)N) on N input chars • Time complexity = O(n² log n), space complexity = O(n²) • [Becchi et al., ANCS'07] • Only backward-directed default transitions (skipping k levels) • Amortized memory bandwidth O(((k+1)/k)N) on N input chars • Depth-first traversal → compression at DFA creation • Time complexity = O(n²), space complexity = O(n) • Compression w/ k=1 ≈ compression w/ dpMAX=∞

  8. DFA alphabet reduction • Idea: map symbols that behave identically in every state to a single symbol class, and translate input through an alphabet translation table • Effective for: • Ignore-case regex • Character ranges • Never-used characters
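The translation table can be built by grouping symbols whose transition columns are identical across all states; a hypothetical sketch (simpler than the paper's clustering algorithm, but the same equivalence idea):

```python
def build_translation_table(delta, states, alphabet):
    """Map each symbol to a class id; two symbols share a class iff they
    lead to the same next state from every state (identical columns in
    the transition table)."""
    groups = {}
    for c in alphabet:
        column = tuple(delta[s][c] for s in states)
        groups.setdefault(column, []).append(c)
    table = {}
    for class_id, chars in enumerate(groups.values()):
        for c in chars:
            table[c] = class_id
    return table
```

For an ignore-case regex, 'a' and 'A' land in the same class, so each state's transition table shrinks from |∑| entries to one per class.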

  9. Multiple-stride DFAs • [Brodie et al., ISCA 2006] • Idea: process stride input chars at a time (e.g., a stride-2 DFA consumes a pair of characters per transition) • Observations: • Mechanism previously used only on small DFAs (1-2 regex) • No distinct accepting-state handling
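A sketch of stride doubling (hypothetical helper, ignoring the accepting-state duplication discussed on later slides): the stride-2 transition on a symbol pair is just the composition of two stride-1 transitions.

```python
from itertools import product

def double_stride(delta, states, alphabet):
    """Build a stride-2 transition table: delta2[s][(c1, c2)] is the
    state reached from s after consuming c1 then c2 in the original DFA."""
    delta2 = {s: {} for s in states}
    for s in states:
        for c1, c2 in product(alphabet, repeat=2):
            delta2[s][(c1, c2)] = delta[delta[s][c1]][c2]
    return delta2
```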

  10. Multiple stride + alphabet reduction • Stride s → alphabet ∑^s • ∑ = ASCII alphabet → |∑²| = 256² = 65,536; |∑⁴| = 256⁴ ≈ 4,294M • The effective alphabet is much smaller • Char-pair grouping, e.g.: [a-cef]a, [b-f]b • Alphabet reduction may be necessary to make stride doubling feasible on large DFAs
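Combining the two ideas, the reduced stride-2 alphabet groups symbol pairs with identical transition columns. A toy sketch (hypothetical) showing that the 9 raw pairs over a 3-symbol alphabet collapse to far fewer classes when two symbols behave identically:

```python
from itertools import product

def stride2_symbol_classes(delta, states, alphabet):
    """Group stride-2 symbol pairs (c1, c2) whose two-step transition
    columns are identical across all states; each group is one effective
    symbol of the reduced stride-2 alphabet."""
    groups = {}
    for c1, c2 in product(alphabet, repeat=2):
        column = tuple(delta[delta[s][c1]][c2] for s in states)
        groups.setdefault(column, []).append((c1, c2))
    return list(groups.values())
```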

  11. Multiple stride + default transitions • Compression: • Default transitions eliminate transition redundancy • In multiple-stride DFAs: • # of states does not substantially change • # of transitions per state increases exponentially with the stride • The fraction of distinct/total transitions decreases • Increased potential for compression! • Accepting-state handling: • Duplicated states have the same outgoing transitions as the original states, but different depth • A default transition will remove all outgoing transitions from the new accepting states

  12. Multiple stride + default transitions (cont'd) • Problem: • For large ∑ and stride, building the uncompressed DFA may be infeasible • Out of memory when generating a 2K-state, stride-4 DFA on a Linux machine w/ 4GB memory • Solution: • Perform default transition compression during DFA creation • Use the [Becchi et al., ANCS'07] compression algorithm • In the situation above, only 10% of the memory is used

  13. Putting everything together… • Input: 1-22 regex → DFA with 48-1,940 states • Stride-1 path: DFA → alphabet reduction (|∑|=25-44) → default transition compression (avg 1-2 labeled tx/state, 96.3-98.5% of transitions removed) → compressed DFA • Stride-2 path: DFA → stride-2 transformation → stride-2 DFA → alphabet reduction (|∑|=53-470) → default transition compression (avg 3-5 labeled tx/state, 97.9-99.5% of transitions removed) → compressed stride-2 DFA • Same memory bandwidth requirement • Initial size = 40X-80X final size

  14. NFA • Example rule-set: • ab+cd • ab+ce • ab+c.*f • b[d-f]a • bdc
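The five example patterns share prefixes (ab+c…) and must be matched simultaneously against the input. A quick sketch using Python's re module, just to illustrate the matching semantics of this rule-set (not the paper's NFA construction):

```python
import re

# The example rule-set from the slide.
PATTERNS = ["ab+cd", "ab+ce", "ab+c.*f", "b[d-f]a", "bdc"]
COMPILED = [re.compile(p) for p in PATTERNS]

def matching_rules(text):
    """Return the indices of all rules whose pattern occurs in text."""
    return [i for i, rx in enumerate(COMPILED) if rx.search(text)]
```

A combined NFA exploits the shared ab+c prefix once, instead of scanning with each pattern separately as this sketch does.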

  15. NFA: multiple stride + alphabet reduction • Stride doubling: • Avoid new state creation • Keep multiple transitions on the same symbol separated • Alphabet reduction: • Clustering-based algorithm as for DFAs, but sets of target states are compared

  16. FPGA implementation • One-hot encoding [Sidhu & Prasanna] • Quine-McCluskey-like minimization scheme + logic reduction schemes • Example of a reducible character check: (c1=b OR c1=B) AND NOT (c2=a OR c2=A)
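One-hot encoding keeps one flip-flop per NFA state and updates all of them in parallel each cycle. A bit-parallel Python sketch of the same update rule (hypothetical software analogue of the hardware scheme, with the start state kept always active for search semantics):

```python
# Software analogue of one-hot NFA evaluation: one bit per NFA state;
# each input character updates all state bits in parallel.
def step(active, ch, transitions):
    """transitions: list of (src_bit, char_set, dst_bit) triples."""
    nxt = 0
    for src, chars, dst in transitions:
        if active & (1 << src) and ch in chars:
            nxt |= 1 << dst
    return nxt

def search(text, transitions, accept_bit, start_bit=0):
    """Report whether the accepting state ever becomes active."""
    active = 1 << start_bit
    for ch in text:
        # Re-assert the start bit each cycle so matches can begin anywhere.
        active = step(active, ch, transitions) | (1 << start_bit)
        if active & (1 << accept_bit):
            return True
    return False
```

For b[d-f]a, the per-state next-state logic reduces to character checks like (c=d OR c=e OR c=f), which is exactly what the Quine-McCluskey-style minimization shrinks.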

  17. FPGA Results - throughput

  18. FPGA Results – logic utilization • Data-sets (#states, |∑1|, |∑2|): • 7,864 states, |∑1|=64, |∑2|=2,206 • 2,086 states, |∑1|=78, |∑2|=1,969 • 2,147 states, |∑1|=68, |∑2|=1,640 • Utilization: • 8-46% on a XC5VLX50 device (7,400 slices) • A XC5VLX330 device has 51,840 slices

  19. ASIC – projected results • Regex partitioning into multiple DFAs • Alternative representation: decoders in ASIC or instruction memory • Content addressing w/ 64-bit words: • 98% of states compressed w/ stride 1 • 82% of states compressed w/ stride 2 • Throughput (SRAM @ 500 MHz): • 2-4 Gbps for stride 1 • 4-8 Gbps for stride 2

  20. Conclusion • Algorithm: • Combination of default transition compression, alphabet reduction and stride multiplying on potentially large DFAs • Extension of alphabet reduction and stride multiplying to NFAs • FPGA implementation: • Use of one-hot encoding w/ incremental improvement schemes • Logic minimization scheme for alphabet reduction & decoding • Additional aspects: • Multiple-flow handling: FPGA vs. memory-centric architectures • Design improvements tailored to specific architectures and data-sets: • Clustering into smaller NFAs and DFAs to allow smaller alphabets w/ larger strides

  21. Thank you! • Questions? http://regex.wustl.edu
