Efficient Regular Expression Evaluation: Theory to Practice. Michela Becchi and Patrick Crowley.
Presentation Transcript

  1. Efficient Regular Expression Evaluation: Theory to Practice. Michela Becchi and Patrick Crowley. ANCS'08

  2. Motivation • Size and complexity of rule sets have increased in recent years • Snort, as of November 2007: • 8,536 rules, 5,549 Perl Compatible Regular Expressions • 99% with character ranges ([c1-ck], \s, \w, ...) • 16.3% with dot-star terms (.*, [^c1..ck]*) • 44% with counting constraints (.{n,m}, [^c1..ck]{n,m}) • Several proposals to accelerate regular expression matching • FPGA • Memory-centric architectures

  3. Objectives • Can distinct algorithmic techniques be combined into a single proposal that also handles large data sets? • Can techniques intended for memory-centric architectures also be applied on FPGAs? • Goal: provide a tool that allows anybody to implement a high-throughput DPI system on the architecture of their choice

  4. Target Architectures • [Figure: the regex-matching engine mapped onto memory-centric architectures (general-purpose processors, network processors, FPGA/ASIC + memory) and onto FPGA logic, ordered by available parallelism]

  5. Challenges • DFAs → memory-centric architectures (general-purpose processors, network processors, FPGA/ASIC + memory): • Memory space • Memory bandwidth • NFAs → FPGA logic: • Logic cell utilization • Clock frequency

  6. D2FA: default transition compression • Observations: • A DFA state is a set of |∑| next-state pointers • Transition redundancy across states • Idea: • Differential state representation through use of non-consuming default transitions • [Figure: two states s1 and s2 with identical transitions on a, b, c to s3, s4, s5; a default transition from s2 to s1 replaces the redundant ones, and chains of default transitions form a default path]
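The lookup loop behind default transitions can be sketched in a few lines; this is a minimal software model, assuming each state stores only its distinct labeled transitions in a dict plus one default pointer (the names labeled, default, and d2fa_match are illustrative, not from the paper):

```python
# Minimal D2FA traversal sketch. Each state keeps a dict of its distinct
# labeled transitions; every other character follows the default pointer,
# which consumes no input (hence the inner while loop).

def d2fa_step(labeled, default, state, ch):
    """Walk the default path until a state with a labeled transition
    on ch is reached, then take that transition."""
    while ch not in labeled[state]:
        state = default[state]  # non-consuming default transition
    return labeled[state][ch]

def d2fa_match(labeled, default, accepting, start, text):
    """Return the input positions at which an accepting state is reached."""
    state, hits = start, []
    for i, ch in enumerate(text):
        state = d2fa_step(labeled, default, state, ch)
        if state in accepting:
            hits.append(i)
    return hits

# Toy D2FA for the pattern "ab" over {a, b}: states 1 and 2 store only the
# transitions that differ from state 0 and take a default transition to it.
labeled = {0: {'a': 1, 'b': 0}, 1: {'b': 2}, 2: {}}
default = {0: None, 1: 0, 2: 0}
print(d2fa_match(labeled, default, {2}, 0, "abab"))  # [1, 3]
```

With a bounded default-path length, the inner while loop runs at most dpMAX times per input character, which matches the O(dpMAX+1) memory accesses per char stated on the next slide.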

  7. D2FA algorithms • Problem: set default transitions so as to • Maximize memory compression • Minimize memory bandwidth overhead • [Kumar et al., SIGCOMM'06] • Bound dpMAX on maximum default-path length • O(dpMAX+1) memory accesses per input char • Better compression for higher dpMAX • Memory bandwidth = O((dpMAX+1)·N), time complexity = O(n²·log n), space complexity = O(n²) • [Becchi et al., ANCS'07] • Only backward-directed default transitions (each skipping at least k levels) • Amortized memory bandwidth O(((k+1)/k)·N) on N input chars • Depth-first traversal → compression at DFA creation • Memory bandwidth = O(((k+1)/k)·N), time complexity = O(n²), space complexity = O(n) • Compression w/ k=1 ≈ compression w/ dpMAX=∞

  8. DFA alphabet reduction • Characters that are treated identically by every state are mapped to a single symbol class through an alphabet translation table • Effective for: • Ignore-case regexes • Character ranges • Never-used characters • [Figure: a DFA over the original alphabet, the equivalent DFA over the reduced alphabet, and the alphabet translation table]
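One way to compute the translation table is to group characters whose transition-table columns are identical; a sketch under that assumption (clustering on exact column equality, the simplest correct grouping; function and variable names are illustrative):

```python
# Sketch of alphabet reduction: two characters can share a symbol class
# if every state sends them to the same next state, i.e. their columns
# in the transition table are identical.
# delta[state][char] -> next state.

def build_translation_table(delta, alphabet):
    """Return a char -> class-id map (the alphabet translation table)."""
    classes = {}  # column signature -> class id
    table = {}
    for ch in alphabet:
        signature = tuple(delta[s].get(ch) for s in sorted(delta))
        if signature not in classes:
            classes[signature] = len(classes)
        table[ch] = classes[signature]
    return table

# Ignore-case example: 'a'/'A' and 'b'/'B' are indistinguishable.
delta = {0: {'a': 1, 'A': 1, 'b': 0, 'B': 0},
         1: {'a': 1, 'A': 1, 'b': 0, 'B': 0}}
print(build_translation_table(delta, "aAbB"))  # {'a': 0, 'A': 0, 'b': 1, 'B': 1}
```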

  9. Multiple-stride DFAs • [Brodie et al., ISCA 2006] • Idea: • Process stride input chars at a time (a stride-2 DFA consumes two chars per transition, e.g., on labels aa, ab) • Observations: • Mechanism previously applied only to small DFAs (1-2 regexes) • No distinct handling of accepting states • [Figure: a DFA and the equivalent DFA w/ stride 2]
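Stride doubling itself is just transition composition; a sketch (omitting the accepting-state duplication a full construction needs, as the later slides note):

```python
# Stride-doubling sketch: the stride-2 DFA moves on character pairs by
# composing the original transition function with itself.
# Accepting states hit after the first char of a pair would need duplicated
# states in a full construction; this sketch omits that handling.

def double_stride(delta, alphabet):
    """delta[state][char] -> next; returns delta2[state][(c1, c2)] -> next."""
    delta2 = {}
    for s in delta:
        delta2[s] = {}
        for c1 in alphabet:
            mid = delta[s][c1]
            for c2 in alphabet:
                delta2[s][(c1, c2)] = delta[mid][c2]
    return delta2

# DFA for "ab" over {a, b}; state 2 is accepting.
delta = {0: {'a': 1, 'b': 0}, 1: {'a': 1, 'b': 2}, 2: {'a': 1, 'b': 0}}
print(double_stride(delta, "ab")[0][('a', 'b')])  # 2: "ab" matched in one step
```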

  10. Multiple stride + alphabet reduction • Stride s → alphabet ∑^s • ∑ = ASCII alphabet → |∑^2| = 256² = 65,536; |∑^4| = 256⁴ ≈ 4,294M • Effective alphabet is much smaller in practice • Char grouping: e.g., [a-cef]a, [b-f]b • Alphabet reduction may be necessary to make stride doubling feasible on large DFAs • [Figure: a DFA and the corresponding 2-DFA]
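The order of operations matters: reducing the alphabet first keeps the stride-2 alphabet at |classes|² rather than 256². A sketch of a lookup under that scheme (table and delta_c are illustrative names for the translation table and the class-level transition function):

```python
# Sketch: apply the alphabet translation table per input character, then
# take a stride-2 step over the reduced alphabet. The stride-2 transition
# table is indexed by pairs of class ids, not pairs of raw bytes.

def reduced_stride2_lookup(delta_c, table, state, c1, c2):
    """One stride-2 step: translate both chars to classes, then compose."""
    mid = delta_c[state][table[c1]]
    return delta_c[mid][table[c2]]

# Ignore-case "ab": classes 0 = {a, A}, 1 = {b, B}; state 2 is accepting.
table = {'a': 0, 'A': 0, 'b': 1, 'B': 1}
delta_c = {0: {0: 1, 1: 0}, 1: {0: 1, 1: 2}, 2: {0: 1, 1: 0}}
print(reduced_stride2_lookup(delta_c, table, 0, 'a', 'B'))  # 2
```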

  11. Multiple stride + default transitions • Compression: • Default transitions eliminate transition redundancy • In multiple-stride DFAs: • # of states does not substantially change • # of transitions per state grows exponentially with the stride • Fraction of distinct/total transitions decreases • Increased potential for compression! • Accepting state handling: • Duplicated states have the same outgoing transitions as the original states but different depth • A default transition can therefore remove all outgoing transitions from the new accepting states • [Figure: a DFA and the corresponding 2-DFA]
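The growing redundancy can be made concrete with a crude proxy: the fraction of distinct next-states among all transitions per state (an illustration only, not the exact redundancy measure the D2FA construction uses):

```python
# Illustration of why stride doubling raises compression potential: as each
# state gains exponentially more transitions, the fraction of transitions
# pointing to distinct next states shrinks, so more become removable.

def distinct_fraction(delta):
    """Ratio of per-state distinct targets to total transitions."""
    total = sum(len(row) for row in delta.values())
    distinct = sum(len(set(row.values())) for row in delta.values())
    return distinct / total

# Stride-1 DFA for "ab" vs. its hand-computed stride-2 version.
delta1 = {0: {'a': 1, 'b': 0}, 1: {'a': 1, 'b': 2}, 2: {'a': 1, 'b': 0}}
row2 = {('a', 'a'): 1, ('a', 'b'): 2, ('b', 'a'): 1, ('b', 'b'): 0}
delta2 = {0: dict(row2), 1: dict(row2), 2: dict(row2)}
print(distinct_fraction(delta1), distinct_fraction(delta2))  # 1.0 0.75
```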

  12. Multiple stride + default transitions (cont'd) • Problem: • For large ∑ and stride, the uncompressed DFA may be infeasible • Out of memory when generating a 2K-state, stride-4 DFA on a Linux machine w/ 4GB memory • Solution: • Perform default transition compression during DFA creation • Use the compression algorithm of [Becchi et al., ANCS'07] • In the situation above, only 10% of the memory is used

  13. Putting everything together • Input: 1-22 regexes, DFA w/ 48-1,940 states • DFA → alphabet reduction (|∑| = 25-44) → default transition compression (avg 1-2 labeled tx/state, 96.3-98.5% transitions removed) → compressed DFA • Compressed DFA → stride-2 transformation → alphabet reduction (|∑| = 53-470) → default transition compression (avg 3-5 labeled tx/state, 97.9-99.5% transitions removed) → compressed stride-2 DFA • Same memory bandwidth requirement • Initial size = 40X-80X final size

  14. NFA • Example rule set: • ab+cd • ab+ce • ab+c.*f • b[d-f]a • bdc • [Figure: combined NFA for the five patterns]
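Patterns like these are typically run as one combined NFA with a set-of-active-states simulation; a minimal sketch for an anchored match of just "ab+c" (the shared prefix of the first three rules), with hand-built transitions rather than a real regex compiler:

```python
# Set-of-states NFA simulation: the machine tracks every state reachable on
# the input so far, so combined NFAs stay linear in total pattern size even
# when the equivalent DFA would blow up.

def nfa_match(trans, accepting, start, text):
    """trans[(state, char)] -> set of next states; anchored match."""
    active = {start}
    for ch in text:
        nxt = set()
        for s in active:
            nxt |= trans.get((s, ch), set())
        active = nxt
    return bool(active & accepting)

# Hand-built NFA for "ab+c": 0 -a-> 1 -b-> 2 (-b-> 2) -c-> 3 (accepting).
trans = {(0, 'a'): {1}, (1, 'b'): {2}, (2, 'b'): {2}, (2, 'c'): {3}}
print(nfa_match(trans, {3}, 0, "abbc"))  # True
```

A real engine would also keep the start state active on every character to match anywhere in the stream, not only at position 0.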

  15. Multiple stride + alphabet reduction (NFA) • Stride doubling: • Avoid new state creation • Keep multiple transitions on the same symbol separate • Alphabet reduction: • Clustering-based algorithm as for the DFA, but sets of target states are compared

  16. FPGA implementation • One-hot encoding [Sidhu & Prasanna] • Quine-McCluskey-like minimization scheme + logic reduction schemes • Example of character-matching logic: (c1=b OR c1=B) AND NOT (c2=a OR c2=A)
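A software model of the one-hot update rule may make the mapping clearer: one flip-flop per NFA state, and on each clock (input character) a flip-flop's next value is the OR over its incoming edges of (source bit AND character-class match). The edge-list format here is illustrative:

```python
# One-hot NFA model: bits[i] mirrors the flip-flop for state i. Each edge
# (src, dst, pred) contributes (bits[src] AND pred(ch)) to next_bits[dst],
# which is what the FPGA evaluates for all states in parallel each cycle.

def one_hot_step(bits, edges, ch):
    nxt = [False] * len(bits)
    for src, dst, pred in edges:
        if bits[src] and pred(ch):
            nxt[dst] = True
    return nxt

# Ignore-case "ab", matched anywhere: the start flip-flop is held high via
# a self-loop, as in the one-hot construction.
edges = [(0, 0, lambda c: True),
         (0, 1, lambda c: c in 'aA'),
         (1, 2, lambda c: c in 'bB')]
bits = [True, False, False]
for ch in "xAb":
    bits = one_hot_step(bits, edges, ch)
print(bits[2])  # True: "Ab" was seen
```

The character predicates are exactly where the minimized decode logic, e.g. (c1=b OR c1=B), lives in hardware.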

  17. FPGA Results - throughput

  18. FPGA Results – logic utilization • Data sets: #states = 7,864 (|∑1| = 64, |∑2| = 2,206); #states = 2,086 (|∑1| = 78, |∑2| = 1,969); #states = 2,147 (|∑1| = 68, |∑2| = 1,640) • Utilization: • 8-46% on an XC5VLX50 device (7,400 slices) • An XC5VLX330 device has 51,840 slices

  19. ASIC – projected results • Regex partitioning into multiple DFAs • Content addressing w/ 64-bit words: • 98% of states compressed w/ stride 1 • 82% of states compressed w/ stride 2 • Throughput w/ SRAM @ 500 MHz: • 2-4 Gbps for stride 1 • 4-8 Gbps for stride 2 • Alternative representation: decoders in ASIC or instruction memory

  20. Conclusion • Algorithm: • Combination of default transition compression, alphabet reduction, and stride multiplying on potentially large DFAs • Extension of alphabet reduction and stride multiplying to NFAs • FPGA implementation: • Use of one-hot encoding w/ incremental improvement schemes • Logic minimization scheme for alphabet reduction & decoding • Additional aspects: • Multiple flow handling: FPGA vs. memory-centric architectures • Design improvements tailored to specific architectures and data sets: • Clustering into smaller NFAs and DFAs to allow smaller alphabets w/ larger strides

  21. Thank you! • Questions? http://regex.wustl.edu