SSA: A Power and Memory Efficient Scheme to Multi-Match Packet Classification

Fang Yu¹, T. V. Lakshman², Marti Austin Motoyama¹, Randy H. Katz¹ • ¹EECS Department, UC Berkeley • ²Bell Laboratories, Lucent Technologies

Outline • Introduction to multi-match classification • Multi-match classification using TCAM • May consume a large amount of TCAM memory • May consume high power • Set Splitting Algorithm (SSA) • A memory and power efficient scheme for multi-match classification • Simulation results • Conclusions

Packet Classification • Single-match classification • Assumption: every filter is associated with a priority • Only the highest-priority match matters • E.g., longest prefix match • Multi-match classification • Report all matching results • No priority among filters • Intrusion detection systems: identify all related rules • Also required by accounting applications
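A minimal Python sketch contrasting the two modes on a hypothetical list of destination-prefix filters, where prefix length plays the role of priority (longest prefix match). The filter list and function names are illustrative assumptions; only the standard `ipaddress` module is used.

```python
import ipaddress

# Hypothetical filter set: destination-address prefixes.
FILTERS = ["10.0.0.0/8", "10.1.0.0/16", "10.1.2.0/24", "192.168.0.0/16"]

def matching(addr, filters):
    """All filters whose prefix contains the address."""
    ip = ipaddress.ip_address(addr)
    return [f for f in filters if ip in ipaddress.ip_network(f)]

def single_match(addr, filters):
    """Single-match: only the highest-priority (here, longest) matching prefix."""
    hits = matching(addr, filters)
    return max(hits, key=lambda f: ipaddress.ip_network(f).prefixlen) if hits else None

def multi_match(addr, filters):
    """Multi-match: every matching filter, with no priority among them."""
    return matching(addr, filters)

print(single_match("10.1.2.3", FILTERS))  # 10.1.2.0/24
print(multi_match("10.1.2.3", FILTERS))   # ['10.0.0.0/8', '10.1.0.0/16', '10.1.2.0/24']
```

An intrusion detection system needs the second answer: all three rules may apply to the same packet.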

Ternary-CAM (TCAM) • Fully associative memory: compares the input string with all entries in parallel • When several entries match, reports the index of the first match • Each cell takes one of three logic states: ‘0’, ‘1’, or ‘?’ (don’t care)
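A software emulation of the TCAM behaviour just described, for illustration only: hypothetical entries are strings over '0', '1' and '?', a stored '?' matches either key bit, and a sequential scan stands in for the parallel compare followed by a priority encoder. `all_matches` shows what multi-match classification needs, which the hardware does not return directly.

```python
def ternary_match(entry, key):
    """A stored '?' (don't care) matches either key bit; '0'/'1' must match exactly."""
    return len(entry) == len(key) and all(e in ("?", k) for e, k in zip(entry, key))

def tcam_first_match(entries, key):
    """What a TCAM reports: the index of the first (lowest) matching entry."""
    for i, entry in enumerate(entries):
        if ternary_match(entry, key):
            return i
    return None

def all_matches(entries, key):
    """What multi-match classification needs: every matching index."""
    return [i for i, entry in enumerate(entries) if ternary_match(entry, key)]

entries = ["10??", "1???", "0?1?", "????"]  # hypothetical 4-bit entries
print(tcam_first_match(entries, "1011"))  # 0
print(all_matches(entries, "1011"))       # [0, 1, 3]
```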

Challenges of Multi-match Classification using TCAM • Memory efficient • 9–18 Mbit TCAMs are priced at $200–$300 • Power efficient • Easy update • High speed • TCAM is fast (e.g., 4 ns per lookup), but it returns only the first match • We want all matching results within a few cycles • What if the TCAM returned a bit vector of all matching results? • Processing the bit vector can take a long time if the vector is long • Inefficient, since the vector is sparse in most cases

Previous Solution: Geometric Intersection-based Solution [Hot Interconnects ’04] • Add extra intersection filters to the TCAM • High speed • Returns all matching results within one cycle • Memory efficient • Creates ~10N intersection filters for the Snort rule set • May create O(N^F) intersection filters in the worst case (F = number of header fields) • Energy efficient • Easily updatable
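A small sketch of the geometric intersection idea, assuming filters are hypothetical per-field inclusive ranges rather than real ternary TCAM entries: every non-empty intersection of a filter subset becomes an extra entry tagged with that subset, and entries for larger subsets are placed first, so a single first-match lookup names every matching filter. The brute-force closure is only meant for small examples; its entry count is exactly what grows on real rule sets.

```python
def intersect(a, b):
    """Intersect two regions (tuples of inclusive (lo, hi) ranges); None if empty."""
    out = []
    for (lo1, hi1), (lo2, hi2) in zip(a, b):
        lo, hi = max(lo1, lo2), min(hi1, hi2)
        if lo > hi:
            return None
        out.append((lo, hi))
    return tuple(out)

def build_entries(filters):
    """Return TCAM entries as (set of filter ids, region), most specific first."""
    regions = {frozenset([i]): f for i, f in enumerate(filters)}
    frontier = dict(regions)
    while frontier:  # grow intersections one filter at a time
        new = {}
        for ids, reg in frontier.items():
            for j, f in enumerate(filters):
                if j in ids:
                    continue
                r = intersect(reg, f)
                key = ids | {j}
                if r is not None and key not in regions and key not in new:
                    new[key] = r
        regions.update(new)
        frontier = new
    # An entry for a larger filter set must precede the entries for its subsets.
    return sorted(regions.items(), key=lambda kv: -len(kv[0]))

def lookup(entries, packet):
    """Emulated first-match search: the first hit names all matching filters."""
    for ids, reg in entries:
        if all(lo <= v <= hi for v, (lo, hi) in zip(packet, reg)):
            return sorted(ids)
    return []

filters = [((0, 10), (0, 10)), ((5, 15), (5, 15)), ((8, 20), (0, 3))]  # hypothetical
entries = build_entries(filters)
print(len(entries) - len(filters))                     # 2 extra intersection entries
print(lookup(entries, (9, 9)), lookup(entries, (9, 2)))  # [0, 1] [0, 2]
```

Ordering by subset size suffices: every entry a packet hits is tagged with a subset of that packet's full match set, and the full set has its own entry, which is the largest.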

Previous Solution: MUD [Sigcomm’05] • Encode the index of each entry and store the encoded value in a discriminator field of that TCAM entry • First search the TCAM with the discriminator field of the key set to all don’t-cares • After finding a match at index j, search again with the discriminator field restricted to values ‘> j’
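A minimal software sketch of the MUD search loop, under assumptions that are not from the slide: each entry is reduced to a hypothetical single-field range, the TCAM's parallel search is emulated by a scan returning the lowest matching index, and the condition ‘> j’ is expressed as a short list of ternary discriminator patterns (value, care mask) covering the indices above j, tried in increasing order. The counter shows how lookups accumulate as matches are found; it is not claimed to reproduce the exact lookup formula on the next slide.

```python
def first_match(entries, key, value, care):
    """Emulated TCAM search: lowest index whose range holds key and whose
    index agrees with `value` on the discriminator bits selected by `care`."""
    for i, (lo, hi) in enumerate(entries):
        if lo <= key <= hi and (i & care) == (value & care):
            return i
    return None

def greater_than_patterns(j, d):
    """Ternary (value, care) patterns covering all d-bit indices > j, smallest range first."""
    patterns = []
    for b in range(d):  # LSB first gives ascending, disjoint index ranges
        if not (j >> b) & 1:
            care = ((1 << d) - 1) & ~((1 << b) - 1)         # compare bits b..d-1
            value = ((j >> (b + 1)) << (b + 1)) | (1 << b)  # j's high bits, then 1 at bit b
            patterns.append((value, care))
    return patterns

def mud_all_matches(entries, key):
    """Return (all matching indices, number of TCAM lookups used)."""
    d = max(1, (len(entries) - 1).bit_length())  # discriminator width in bits
    matches, lookups = [], 0
    patterns = [(0, 0)]  # first search: discriminator all don't-cares
    while patterns:
        hit = None
        for value, care in patterns:
            lookups += 1
            hit = first_match(entries, key, value, care)
            if hit is not None:
                break
        if hit is None:
            break
        matches.append(hit)
        patterns = greater_than_patterns(hit, d)
    return matches, lookups

entries = [(0, 100), (50, 60), (200, 300), (55, 58), (0, 1000)]  # hypothetical filters
print(mud_all_matches(entries, 56))  # ([0, 1, 3, 4], 6)
```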

Previous Solution: MUD (Cont.) • High speed • 1+d+(k−2)(d−1) = O(dk) TCAM lookups to get k matching results • d is the base-2 logarithm of the number of TCAM entries (d = log2 N) • Reduced to 1+d(k−1)/r with DIRPE, where r is smaller than d • Memory efficient • Energy efficient • All TCAM entries are accessed on every lookup → high power consumption • Easily updatable • Our Goal: Find a memory- and power-efficient solution

Observation • Split the filters into two sets to reduce intersections • Report the union of the results from all sets • Intersections of filters from different sets need not be stored • Fewer entries in TCAM → lower power consumption • But more TCAM accesses per packet • [Figure: filters F1 … FN, with regions labeled “Matching F1”, “Matching FN”, “Matching F1 and FN”. Original (one set): N filters + O(N²) intersections, 1 TCAM lookup. Two sets: N filters + 1 intersection, 2 TCAM lookups.]
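To make the trade-off concrete, here is a hypothetical layout (not necessarily the one in the slide's figure): five horizontal stripes that each cross five vertical stripes, on two header fields. Kept in one set, every horizontal/vertical pair needs an intersection entry; split into a horizontal set and a vertical set, none is needed, at the cost of a second lookup. Only pairwise intersections are counted, which is exact here because no three stripes share a point.

```python
def overlaps(a, b):
    """True if two filters (tuples of inclusive (lo, hi) ranges) share a point."""
    return all(max(l1, l2) <= min(h1, h2) for (l1, h1), (l2, h2) in zip(a, b))

def pairwise_intersections(filters):
    """Number of filter pairs whose intersection would need its own TCAM entry."""
    n = len(filters)
    return sum(overlaps(filters[i], filters[j]) for i in range(n) for j in range(i + 1, n))

# Hypothetical 2-field layout: 5 horizontal and 5 vertical stripes.
horizontal = [((0, 99), (10 * i, 10 * i + 5)) for i in range(5)]
vertical = [((10 * i, 10 * i + 5), (0, 99)) for i in range(5)]

one_set = pairwise_intersections(horizontal + vertical)                           # 1 lookup
two_sets = pairwise_intersections(horizontal) + pairwise_intersections(vertical)  # 2 lookups
print(one_set, two_sets)  # 25 0
```

With N/2 stripes in each direction the single-set count is (N/2)², matching the O(N²) growth on the slide.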

Problem Definition • Given a set of filters F = {F1, F2, …, FN} • The filters create a set of intersections I = {I1, I2, …, IM} • E.g., I1 = intersection of (F1, F5, F6) • How should the filters be divided into several sets? • Residual intersection set I′: intersections among filters in the same set • Constraint: N + |I′| < TCAM size • Objective: minimize the number of sets (= TCAM accesses per packet) • This is an NP-hard problem!
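The residual set I′ and the size constraint can be checked mechanically for any candidate split. A minimal sketch, where intersections are tuples of filter ids and `set_of` (a hypothetical name) maps each filter to the set it was placed in; the example data are made up.

```python
def residual_intersections(intersections, set_of):
    """I′: intersections whose filters all landed in the same set (they stay in TCAM)."""
    return [I for I in intersections if len({set_of[f] for f in I}) == 1]

def split_fits(num_filters, intersections, set_of, tcam_size):
    """Check the constraint N + |I′| < TCAM size."""
    return num_filters + len(residual_intersections(intersections, set_of)) < tcam_size

# Hypothetical example: six filters placed in two sets (0 and 1).
intersections = [(1, 5, 6), (1, 2), (3, 4), (2, 3)]
set_of = {1: 0, 2: 1, 3: 1, 4: 1, 5: 0, 6: 0}
print(residual_intersections(intersections, set_of))        # [(1, 5, 6), (3, 4), (2, 3)]
print(split_fits(6, intersections, set_of, tcam_size=10))   # True
print(len(set(set_of.values())))                            # 2 sets, so 2 TCAM accesses
```

Checking a given split is cheap; finding the split with the fewest sets that still fits is the NP-hard part.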

Split Rules into Two Sets • Still NP-hard (known as maximum set splitting or maximum hypergraph cut) • Best known approximation algorithms • Achieve a performance ratio of 0.72 relative to the optimum • Require quadratic programming, which is slow when the number of filters is large • Our SSA algorithm • Removes at least half of the intersections • O(NM) complexity, where N is the total number of filters and M is the total number of intersections

Maximum Satisfiability Problem • A set of literals {F1, ¬F1, F2, ¬F2, …, FN, ¬FN} • A set of clauses, each a subset of the literals • E.g., C1 = {F1, F5, F6} • A clause is satisfied if at least one of its literals is true • Goal: find a truth assignment to F1, …, FN that satisfies the maximum number of clauses

Johnson’s Algorithm for the Maximum Satisfiability Problem • Assign each clause c a weight 2^(−|c|) • E.g., the weight of C1 = {F1, F5, F6} is 2^(−3) • Let Fi be any variable that has not yet been assigned a value • If the total weight of the clauses containing Fi is higher than that of the clauses containing ¬Fi • Assign Fi true and remove all clauses containing Fi (they are satisfied) • Multiply the weight of every clause containing ¬Fi by 2 • Otherwise • Assign Fi false and remove all clauses containing ¬Fi • Multiply the weight of every clause containing Fi by 2
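A minimal Python sketch of the greedy procedure above. The encoding is an assumption: variables are numbered 1..n, a clause is a list of non-zero integers (+i for Fi, −i for ¬Fi), and exact `Fraction` weights avoid floating-point ties. Ties go to false, matching the “Otherwise” branch.

```python
from fractions import Fraction

def johnson_maxsat(num_vars, clauses):
    """Johnson's greedy MAX-SAT algorithm; returns {variable: bool}."""
    weight = [Fraction(1, 2 ** len(c)) for c in clauses]  # initial weight 2^-|c|
    live = set(range(len(clauses)))                        # clauses not yet satisfied
    assign = {}
    for v in range(1, num_vars + 1):
        w_pos = sum((weight[i] for i in live if v in clauses[i]), Fraction(0))
        w_neg = sum((weight[i] for i in live if -v in clauses[i]), Fraction(0))
        assign[v] = w_pos > w_neg                          # ties go to False
        true_lit = v if assign[v] else -v
        for i in list(live):
            if true_lit in clauses[i]:
                live.remove(i)                             # satisfied: drop the clause
            elif -true_lit in clauses[i]:
                weight[i] *= 2                             # one fewer literal can satisfy it
    return assign

def count_satisfied(clauses, assign):
    return sum(any(assign[abs(l)] == (l > 0) for l in c) for c in clauses)

clauses = [[1, 2], [-1, 2], [1, -2], [-1, -2], [1, 3], [-2, -3]]
assign = johnson_maxsat(3, clauses)
print(assign, count_satisfied(clauses, assign))  # {1: True, 2: False, 3: False} 5
```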

Johnson’s Theorem • If every clause has at least k literals • Johnson’s algorithm satisfies at least a (2^k − 1)/2^k fraction of all clauses • E.g., for k = 2, at least ¾ of the clauses are satisfied • It has been proved that (2^k − 1)/2^k is the best approximation bound for k > 2
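The bound follows from a short weight argument, sketched here assuming no clause contains the same variable twice. Let m be the number of clauses, k the minimum clause length, and W_t the total weight of the still-unsatisfied clauses after t variables have been assigned. At each step the clauses removed weigh at least as much as the clauses whose weight is doubled (doubling adds exactly their current weight), so W_t never increases. A clause left unsatisfied at the end has had all |c| of its literals falsified, so its weight is 2^(−|c|) · 2^|c| = 1.

```latex
\begin{aligned}
W_0 &= \sum_{c} 2^{-|c|} \;\le\; m\,2^{-k},\\
W_t &= W_{t-1} - w_t^{\mathrm{removed}} + w_t^{\mathrm{doubled}} \;\le\; W_{t-1}
      \quad\text{since } w_t^{\mathrm{removed}} \ge w_t^{\mathrm{doubled}},\\
\#\mathrm{unsatisfied} &= W_N \;\le\; W_0 \;\le\; m\,2^{-k}
\quad\Longrightarrow\quad
\#\mathrm{satisfied} \;\ge\; m\,\frac{2^k-1}{2^k}.
\end{aligned}
```

For k = 2 the right-hand side is 3m/4.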

Filter Set Split Algorithm (SSA) • Convert the set splitting problem into a maximum satisfiability problem • Each filter corresponds to a variable • For each intersection (e.g., I1 = intersection of F1, F5, and F6), add two clauses • C = {F1, F5, F6} and C′ = {¬F1, ¬F5, ¬F6} • The total number of clauses is 2M, where M is the number of intersections • Run Johnson’s algorithm and assign each filter Fi either true (put it in set one) or false (put it in set two)
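Putting the reduction together with Johnson’s greedy step gives a compact sketch of a two-way split. The greedy loop from the Johnson’s Algorithm slide is inlined so the block runs on its own; `ssa_split` is an illustrative name, not the authors’ code, and the intersection list is a made-up example.

```python
from fractions import Fraction

def ssa_split(num_filters, intersections):
    """Two-way split of filters 1..num_filters via Johnson's algorithm.

    intersections: tuples of distinct filter ids, each naming >= 2 filters.
    Returns {filter_id: True for set one, False for set two}.
    """
    clauses = []
    for inter in intersections:
        clauses.append(list(inter))           # C : some filter of it is in set one
        clauses.append([-f for f in inter])   # C': some filter of it is in set two
    weight = [Fraction(1, 2 ** len(c)) for c in clauses]
    live = set(range(len(clauses)))
    in_set_one = {}
    for v in range(1, num_filters + 1):
        w_pos = sum((weight[i] for i in live if v in clauses[i]), Fraction(0))
        w_neg = sum((weight[i] for i in live if -v in clauses[i]), Fraction(0))
        in_set_one[v] = w_pos > w_neg
        true_lit = v if in_set_one[v] else -v
        for i in list(live):
            if true_lit in clauses[i]:
                live.remove(i)
            elif -true_lit in clauses[i]:
                weight[i] *= 2
    return in_set_one

intersections = [(1, 5, 6), (1, 2), (3, 4), (2, 3), (4, 5), (2, 6)]  # hypothetical
split = ssa_split(6, intersections)
residual = [I for I in intersections if len({split[f] for f in I}) == 1]
print(split, len(residual), "of", len(intersections), "intersections remain")
```

An intersection stays in TCAM exactly when all of its filters receive the same value, i.e. when one of its two clauses is unsatisfied.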

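Filter Set Split Algorithm (SSA) (cont.) • Every intersection involves at least two filters, so every clause has at least 2 literals and Johnson’s theorem applies with k = 2 • At least ¾ of the clauses are satisfied → at least 2M × ¾ = 1.5M satisfied clauses • Each intersection has at least one of its two clauses satisfied (if all its filters are false, C′ holds; otherwise C holds), so if x intersections have both clauses satisfied, M + x ≥ 1.5M → x ≥ 0.5M • Suppose both C = {F1, F5, F6} and C′ = {¬F1, ¬F5, ¬F6} are satisfied for the intersection of F1, F5, and F6 • Then at least one of F1, F5, F6 is true and at least one is false • So F1, F5, F6 are split into different sets, and this intersection need not be stored in TCAM • At least 50% of the intersections are removed!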

Review of the SSA Scheme • High speed • Deterministic lookup rate: e.g., if the filters are split into two sets, only 2 TCAM lookups per packet are needed • The sets are logically independent → lookups can be parallelized • Memory efficient • Guarantees the removal of at least 50% of the intersections each time a filter set is split into two • Energy efficient • Low memory requirement • Each filter is accessed only once per packet • Easily updatable • A new filter is inserted into the set where it creates the fewest intersections
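One way to realize the update rule, as a hypothetical sketch: score each set by how many of its filters overlap the new filter and insert it where that score is smallest. Counting pairwise overlaps is an assumed proxy for “intersections created”; an exact count would also consider overlaps with intersection entries already stored. Filters are modeled as tuples of inclusive (lo, hi) ranges, one per header field.

```python
def overlaps(a, b):
    """True if two filters (tuples of inclusive (lo, hi) ranges) share a point."""
    return all(max(l1, l2) <= min(h1, h2) for (l1, h1), (l2, h2) in zip(a, b))

def insert_filter(sets, new_filter):
    """Append new_filter to the set where it overlaps the fewest filters; return its index."""
    costs = [sum(overlaps(new_filter, f) for f in s) for s in sets]
    best = min(range(len(sets)), key=costs.__getitem__)
    sets[best].append(new_filter)
    return best

sets = [
    [((0, 99), (10 * i, 10 * i + 5)) for i in range(5)],  # set one: horizontal stripes
    [((10 * i, 10 * i + 5), (0, 99)) for i in range(5)],  # set two: vertical stripes
]
print(insert_filter(sets, ((0, 99), (60, 65))))  # 0: crosses no horizontal stripe
```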

Simulation Setup • Tests on the Snort rule header sets • Compare SSA with two TCAM-based solutions: • MUD • Geometric Intersection-based solution • Compare SSA with two representative software-based solutions: • HiCuts • EGT-PC • Evaluation metrics • Memory consumption • Lookup rate • Power consumption • Update cost

Memory Usage • [Figure: total number of extra intersection filters in TCAM] • [Figure: total number of TCAM entries used]

Classification Speed • MUD • One packet may match up to 12 unique filters and require up to 20 TCAM lookups • Common packets, such as HTTP packets, match 4 unique filters and may require 5–9 TCAM lookups; a Napster packet requires 9–15 TCAM lookups • Geometric Intersection-based solution • 1 TCAM lookup per packet • SSA-2 • 2 TCAM lookups per packet • SSA-4 • 4 TCAM lookups per packet • With 4 ns per TCAM lookup and an average packet size of 402.7 bytes, SSA-4 achieves a 201.35 Gbps classification rate • In the worst case, when every packet is 40 bytes, SSA-4 still achieves 20 Gbps
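The throughput figures follow from simple arithmetic. A sketch, assuming the 4 ns per-lookup TCAM latency quoted on the challenges slide and that the four SSA-4 lookups run back to back; searching the sets in parallel would raise the rate further.

```python
def classification_rate_gbps(avg_packet_bytes, lookups_per_packet, lookup_ns=4.0):
    """Line rate sustained when each packet costs lookups_per_packet sequential TCAM lookups."""
    ns_per_packet = lookups_per_packet * lookup_ns
    return avg_packet_bytes * 8 / ns_per_packet  # bits per ns == Gbit/s

print(round(classification_rate_gbps(402.7, 4), 2))  # 201.35
print(classification_rate_gbps(40, 4))               # 20.0
```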

Update Cost • [Figure: update cost in terms of newly inserted filters]

Power Consumption • The energy used by a TCAM grows linearly with • The number of entries searched in parallel • The number of TCAM accesses per packet • Metric: total number of TCAM entries accessed per packet
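The metric can be computed per scheme. A sketch under stated assumptions: each MUD lookup searches the whole TCAM, the geometric solution searches all filters and intersection filters once, and SSA keeps each set in a separately searchable TCAM block so that every entry is touched once per packet. The sizes below are hypothetical, chosen only to show the arithmetic; they are not the measured results.

```python
def mud_entries_accessed(total_entries, lookups_per_packet):
    # Every MUD lookup searches the whole TCAM.
    return total_entries * lookups_per_packet

def geometric_entries_accessed(num_filters, num_intersections):
    # One lookup over all filters plus all intersection filters.
    return num_filters + num_intersections

def ssa_entries_accessed(set_sizes):
    # Assumes each set sits in its own TCAM block and only that block is
    # searched by its lookup, so every entry is touched once per packet.
    return sum(set_sizes)

# Hypothetical sizes, for illustration only (not measured data).
print(mud_entries_accessed(1000, 9))            # 9000
print(geometric_entries_accessed(1000, 10000))  # 11000
print(ssa_entries_accessed([520, 510]))         # 1030
```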

Conclusions • SSA is a memory- and power-efficient solution to the multi-match classification problem • O(NM) complexity • Guaranteed to remove at least 50% of the intersections each time the filter set is split • Compared with MUD • Uses a similar amount of TCAM memory • Yields a 75% to 95% reduction in power consumption • Compared with the Geometric Intersection-based Solution • Uses 90% less TCAM memory and power • Requires one additional TCAM lookup per packet
