Data plane algorithms in routers

Data plane algorithms in routers From prefix lookup to deep packet inspection Cristian Estan, University of Wisconsin-Madison

What is the data plane? • The part of the router handling the traffic • Data plane algorithms applied to every packet • Successive packets typically treated independently • Example: deciding on which link to send a packet • Throughput defined as number of packets or bytes handled per second is very important • “Line speed” – keeping up with the rate at which traffic can be transmitted over the wire or fiber • Example: 10Gbps router has 32 ns to handle 40 byte packet • Memory usage limited by technology and costs • Can afford at most tens of megabits of fast on-chip memory DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

A generic data plane problem • Router has many directives composed of a guard, and an associated action (all guards distinct) • There is a simple procedure for testing how well a guard matches a packet • For each packet, find the guard that matches “best” and take the associated action • Example – routing table lookup: • Each guard is an IP prefix (between 0 and 32 bits) • Matching procedure: is the guard a prefix of the 32 bit destination IP address • “Best” defined as longest matching prefix DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

The rules of the game • Matching against all guards in sequence is too slow • We build a data structure that captures the semantics of all guards and use it for matching • Primary metrics • How fast the matching algorithm is • How much memory the data structure needs • Time to build data structure also has some importance • We can cheat (but we won’t today) by: • Using binary or ternary content-addressable memories • Using other forms of hardware support DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

Measuring “algorithm complexity” • Execution cost measured in number of memory accesses to read data structure • Actual data manipulation operations typically very simple • On some platforms we can read wide words • Worst case performance most important • Worst case defined with respect to input, not guards • Caching has been proven ineffective for many settings • Using algorithms with good amortized complexity, but bad worst case requires large buffers DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

Overview • Longest matching prefix • Trie-based algorithms • Uni-bit and multi-bit tries (fixed stride and variable stride) • Leaf pushing • Bitmap compression of multi-bit trie nodes • Tree bitmap representation for multi-bit trie nodes • Binary search on ranges • Binary search on prefix lengths • Classification on multiple fields • Signature matching DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

Longest matching prefix • Used in routing table lookup (a.k.a. forwarding) for finding the link on which to send a packet • Guard: a bit string of 0 to w bits called IP prefix • Action: a single byte interface identifier • Input: a w-bit string representing the destination IP address of the packet (w is 32 for IPv4,128 for IPv6) • Output: the interface associated with the longest guard matching the input • Size of problem: hundreds of thousands of prefixes DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

Controlled prefix expansion with stride 3 Leaf pushing Multi-bit trie with fixed stride Routing table Multi-bit trie with variable stride Leaf pushing reduces memory usage but increases update time Uni-bit trie Given a maximum trie height h and a routing table of size n dynamic programming algorithm computes optimal variable stride trie in O(nw2h) DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

Controlled prefix expansion with stride 3 Leaf pushing Multi-bit trie with fixed stride Routing table Input Multi-bit trie with variable stride Leaf pushing reduces memory usage but increases update time Longest matching prefix Uni-bit trie Given a maximum trie height h and a routing table of size n dynamic programming algorithm computes optimal variable stride trie in O(nw2h) DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

Lulea bitmap compression Bitmap supporting fast counting Compressed node When the compression bitmaps are large it is expensive to count bits during lookup. The bitmap is divided into chunks and a pre-computed auxiliary array stores the number of bits set before each chunk. The lookup algorithm needs to count only bits set within one chunk. Repeating entries are stored only once in the compressed array. An auxiliary bitmap is needed to find the right entry in the compressed node. It stores a 0 for positions that do not differ from the previous one. Input Representing node as tree bitmap Pointers to children and prefixes are stored in separate structures. Prefixes of all lengths are stored, thus leaf pushing is not needed and update is fast. Bitmaps have 1s corresponding to entries that are not empty. Longest matching prefix 13+0=13 DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

Binary search on ranges • Divide w-bit address space into maximal continuous ranges covered by same prefix • Build array or balanced (binary) search tree with boundaries of ranges • At lookup time perform O(log(n)) search • Not better than multi-bit tries with compression, but it is not covered by patents DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

Binary search on prefix lengths • Core idea: for each prefix length represented in the routing table, have a hash table with the prefixes • Can find longest matching prefix after looking up in each hash table the prefix of the address with corresponding length • Binary search on prefix lengths is faster • Simple but wrong algorithm: if you find prefix at length x store it as best match and look for longer matching prefixes, otherwise look for shorter prefixes • Problem: what if there is both a shorter and a longer prefix, but no prefix at length x? • Solution: insert marker at length x when there are longer prefixes. Must store with marker longest matching shorter prefix. Markers lead to moderate increase in memory usage. • Promising algorithm for IPv6 (w=128) DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

Papers on longest matching prefix • G. Varghese “Network algorithmics an interdisciplinary approach to designing fast networked devices”, chapter 11, Morgan Kaufmann 2005 • V. Srinivasan, G. Varghese “Faster IP lookups using controlled prefix expansion”, ACM Trans. on Comp. Sys., Feb. 1999 • M. Degermark, A. Brodnik, S. Carlsson, S. Pink “Small forwarding tables for fast routing lookups”, ACM SIGCOMM, 1997 • W. Eatherton, Z. Dittia, G. Varghese “Tree Bitmap : Hardware / Software IP Lookups with Incremental Updates”, http://www-cse.ucsd.edu/~varghese/PAPERS/willpaper.pdf • B. Lampson, V. Srinivasan, G. Varghese “IP lookups using multiway and multicolumn search”, IEEE Infocom, 1998 • M. Waldvogel, G. Varghese, J. Turner, B. Plattner, “Scalable high-speed IP lookups”, ACM Trans. on Comp. Sys., Nov. 2001 DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

Overview • Longest matching prefix • Classification on multiple fields • Solution for two-dimensional case: grid of tries • Bit vector linear search • Cross-producting • Decision tree approaches • Signature matching DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

Packet classification problem • Required for security, recognizing packets with quality of service requirements • Guard: prefixes or ranges for k header fields • Typically source and destination prefix, source and destination port range, and exact value or * for protocol • All fields must match for rule to apply • Action: drop, forward, map to a certain traffic class • Input: a tuple with the values of the k header fields • Output: the action associated with the first rule that matches the packet (rules are strictly ordered) • Size of problem: thousands of classification rules DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

Example of classification rule set External time server TO Router that filters traffic Mail gateway M Internet Net Internal time server TI Secondary name server S DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

A geometric view of packet classification R3 R1 R1 • In theory number of regions defined can be much larger than number of rules • Any algorithm that guarantees O(n) space for all rule sets of size n needs O(log(n)k-1) time for classification R3 Destination address space R2 R2 Source address space Source address space DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

The two dimensional case: source and destination IP addresses • For each destination prefix in rule set, link to corresponding node in destination IP trie a trie with source prefixes of rules using this destination prefix • Matching algorithm must use backtracking to visit all source tries • Grid of tries: by pre-computing “switch pointers” in destination tries and propagating some information about more general rules, matching may proceed without backtracking • Memory used proportional to number of rules • Matching time O(w) with constant depending on stride • Extended grid of tries handles 5 fields and has good run time and memory in practice DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

R8 • Bit vector approaches do linear search through rule set • For each field we pre-compute a structure (e.g. trie) to find most specific prefix or range distinguished by rule set • For each rule, a single bit represents whether a given most specific prefix matches rule or not • We associate with each range a bitmap of size n encoding which of the rules may match a packet in that prefix or range • Classification algorithm first computes for each field of the packet the most specific prefix/range it belongs to • By then AND-ing together the k bitmaps of size n we find matching rules • Works well for hardware solutions that allow wide memory reads • Scales poorly to large rule sets DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

3*120+3*30+4*6+1*3+1=478 Cross-producting performs longest prefix matching separately for all fields and combines the results in a single step by looking up the matching rule in a pre-computed table explicitly listing the first matching rule for each element of the cross-product. The size of this table is the product of the numbers of recognized prefixes/ranges for the individual fields. Due to its memory requirements this method is not feasible. … 4*4*5*2*3=480 Equivalenced cross-producting (a.k.a. recursive flow classification or RFC) combines the results of the per-field longest matching prefix operations two by two. The pairs of values are grouped in equivalence classes and in general there are much fewer equivalence classes than pairs of values. This leads to significant memory savings as compared to simple cross-producting. This algorithm provides fast packet classification, but compared to other algorithms, the memory requirements are relatively large (but feasible in some settings). 16 entries, 8 distinct classes Dest IP Src IP Dest Port Src Port Proto DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007 Final result

Decision tree approaches • At each node of the tree test a bit in a field or perform a range test • Large fan-out leads to shallow trees and fast classification • Leaves contain a few rules traversed linearly • Interior nodes may contain rules that match also • Tests may look at bits from multiple fields • A rule may appear in multiple nodes of the decision tree – this can lead to increased memory usage • Tree built using heuristics that pick fields to compare on that divide remaining rules relatively evenly among descendants • Fast and compact on rule sets used today DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

Papers on packet classification • G. Varghese “Network algorithmics …”, chapter 12 • V. Srinivasan, G. Varghese, S. Suri, M. Waldvogel, “Fast and Scalable Layer Four Switching”, ACM SIGCOMM, Sep. 1998 • F. Baboescu, S. Singh, G. Varghese, “Packet classification for core routers: Is there an alternative to CAMs?”, IEEE Infocom, 2003 • P. Gupta, N. McKeown, “Packet classification on multiple fields”, ACM SIGCOMM 1999 • T. Woo, “A modular approach to packet classification: Algorithms and results”, IEEE Infocom, 2000 • S. Singh, F. Baboescu, G. Varghese, “Packet classification using multidimensional cutting”, SIGCOMM, 2003 DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

Overview • Longest matching prefix • Classification on multiple fields • Signature matching • String matching • Regular expression matching w/ DFAs and D2FAs DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

Signature matching • Used in intrusion prevention/detection, application classification, load balancing • Guard: a byte string or a regular expression • Action: drop packet, log alert, set priority, direct to specific server • Input: byte string from the payload of packet(s) • Hence the name “deep packet inspection” • Output: the positions at which various signatures match or the identifier of the “highest priority” signature that matches • Size of problem: hundreds of signatures per protocol DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

String matching • Most widely used early form of deep packet inspection, but the more expressive regular expressions have superceded strings by now • Still used as pre-filter to more expensive matching operations by popular open source IDS/IPS Snort • Matching multiple strings a well-studied problem • A. Aho, M. Corasick. “Efficient string matching: An aid to bib- liographic search”, Communications of the ACM, June 1975 • Many hardware-based solutions published in last decade • Matching time independent of number of strings, memory requirements proportional to sum of their sizes DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

Regular expression matching • Deterministic and non-deterministic finite automata (DFAs and NFAs) can match regular expressions • NFAs more compact but require backtracking or keeping track of sets of states during matching • Both representations used in hardware and software solutions, but only DFA based solutions can guarantee throughput in software • DFAs have a state space explosion problem • From DFAs recognizing individual signatures we can build a DFA that recognizes entire signature set in a single pass • Size of combined DFA much larger than sum of sizes for DFAs recognizing individual signatures • Multiple combined DFAs are used to match signature set DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

S. Kumar, S. Dharmapurikar, F. Yu, P. Crowley, J. Turner, “Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection”, ACM SIGCOMM, September 2006 Delayed Input DFA (D2FA) Deterministic finite automaton (DFA) State 0 State 1 State 2 State 3 State 0 State 1 State 2 State 3 Default transitions … … Input If the “current state” variable meets an acceptance condition (e.g. whether the state identifier is larger than a given threshold), the automaton raises an alert. Crt. state D2FAs build on the observation that for many pairs of states, the transition tables are very similar and it is enough to store the differences. The lookup algorithm may need to follow multiple default transitions until it finds a state that explicitly stores a pointer to the next state it needs to transition to. Since this is a throughput concern, the algorithm for constructing D2FAs allows the user to set a limit on the length of the maximum default path. The memory columns report the ratio between the number of transitions used by the D2FA and the corresponding DFA. DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

Conclusions • Networking devices implement more and more complex data plane processing to better control traffic • The algorithms and data structures used have big performance impact • Often set of rules to be matched against has specific structure • Algorithms exploiting this structure may give good performance even if it is impossible to find an algorithm that gives good performance on all possible rule sets DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

That’s all folks! DIMACS Tutorial on Algorithms for Next Generation Networks August 6-8 2007

Data plane algorithms in routers

Data plane algorithms in routers

Presentation Transcript

Routers

The IP Data Plane: Packets and Routers

Data Plane Summary

Data Plane Measurements

The Data Plane

BGP Data-Plane Benchmarking

Nvo3 Data Plane Requirements

Space Data Routers for Exploiting Space Data

BGP Data-Plane Benchmarking

BGP Data-Plane Benchmarking Applicable to Modern Routers

Data Grid Plane

Customizing Data-plane Processing in Edge Routers

Data Plane Verification

Approximation algorithms for TSP with neighborhoods in the plane

Data Plane Measurements

ROUTERS

Handout # 9: Packet Forwarding in Data Plane

Data Plane Algorithms in Network Processing Systems