
Network Processing Systems Design & Implementation Principles II - Chapter 3 Outline

This chapter outlines implementation principles in network processing system design, including examples such as TCAM updating and cautionary questions to consider. It covers principles related to avoiding waste, shifting computation, leveraging off-system components, and more.





Presentation Transcript


  1. ECE 526 – Network Processing Systems Design System Implementation Principles II Varghese Chapter 3

  2. Outline • Review Principles 1-7 • Implementation principles • Reflect on what we learned • Example: TCAM updating • Cautionary Questions

  3. Review • P1: Avoid Obvious Waste • Example: copy a packet pointer instead of the packet • P2: Shift Computation in Time • precompute (table lookup); • evaluate lazily (network forensics); • share expenses (batch processing) • P3: Relax Subsystem Requirements • Trade certainty for time (random sampling); • Trade accuracy for time (hashing, Bloom filters); • Shift computation in space (fast path/slow path)
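The table-lookup flavor of P2 (precompute) can be illustrated with a minimal sketch; the popcount task and all names here are illustrative assumptions, not an example from the book. A 256-entry table is built once, and each 32-bit word is then handled with four lookups instead of a 32-iteration loop at packet-processing time.

```python
# Sketch of P2 (shift computation in time): build a 256-entry table
# once, so counting set bits in a 32-bit word costs only four table
# lookups at run time instead of looping over all 32 bits.
POPCOUNT8 = [bin(i).count("1") for i in range(256)]

def popcount32(x: int) -> int:
    """Count set bits in a 32-bit word via the precomputed table."""
    return (POPCOUNT8[x & 0xFF]
            + POPCOUNT8[(x >> 8) & 0xFF]
            + POPCOUNT8[(x >> 16) & 0xFF]
            + POPCOUNT8[(x >> 24) & 0xFF])
```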

  4. Review • P4: Leverage Off-System Components • Examples: onboard address recognition & filtering, caches • P5: Add Hardware to Improve Performance • Use memory interleaving and pipelining (= parallelism); • Use wide-word parallelism (saves memory accesses); • Combine SRAM and DRAM (keep the low-order bits of each counter in SRAM when maintaining a large number of counters) • P6: Replace Inefficient General Routines with Efficient Specialized Ones • Example: NAT using forward and reverse lookup tables • P7: Avoid Unnecessary Generality • Examples: RISC, microengines
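The SRAM/DRAM counter combination under P5 can be sketched as below. The class name, the 8-bit SRAM width, and the flush-on-overflow policy are assumptions for illustration, not the book's exact scheme: only the low-order bits of each counter live in fast SRAM, and the wide, slow DRAM counters are touched only on the rare overflow.

```python
# Sketch of P5's SRAM/DRAM split: each counter keeps only its
# low-order bits in fast SRAM and flushes them to a wide DRAM
# counter when they are about to overflow (the rare slow path).
SRAM_BITS = 8
SRAM_MAX = (1 << SRAM_BITS) - 1

class HybridCounters:
    def __init__(self, n: int):
        self.sram = [0] * n   # small, fast counters (low-order bits)
        self.dram = [0] * n   # large, slow counters (high-order part)

    def increment(self, i: int) -> None:
        if self.sram[i] == SRAM_MAX:
            # Rare slow path: flush the saturated SRAM bits to DRAM.
            self.dram[i] += self.sram[i] + 1
            self.sram[i] = 0
        else:
            # Common fast path: one SRAM access only.
            self.sram[i] += 1

    def read(self, i: int) -> int:
        return self.dram[i] + self.sram[i]
```

A DRAM access is needed only once every 2^8 increments, which is what makes a large array of such counters affordable.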

  5. P8: Don't be tied to reference implementations • Key Concept: • Implementations are sometimes given (e.g. by manufacturers) as a way to make the specification of an interface precise, or show how to use a device • These do not necessarily show the right way to think about the problem—they are chosen for conceptual clarity! • Examples: • Using parallel packet classification instead of sequential demultiplexing in TCP/IP protocols

  6. P9: Pass hints across interfaces • Key Concept: if the caller knows something the callee will have to compute, pass it (or something that makes it easier to compute) as an argument! • "hint" = something that makes the recipient's life easier, but may not be correct • "tip" = a hint that is guaranteed to be correct • Caveat: the callee must either trust the caller or verify the hint (ideally both) • Example: • Active messages: the message carries the address of its interrupt handler for fast dispatching

  7. P10: Pass hints in protocol headers • Key Concept: if the sender knows something the receiver will have to compute, pass it in the header • Example: • Tag switching: the packet carries extra information besides the destination address to enable fast lookup

  8. P11: Optimize the Expected Case • Key Concept: if 80% of the cases can be handled similarly, optimize for those cases • P11a: Use Caches • A form of using state to improve performance • Example: • TCP input "header prediction" • If an incoming packet is in order and does what is expected, it can be processed in a small number of instructions
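The spirit of header prediction can be sketched with a toy in-order receiver; the names and structure are hypothetical and far simpler than real TCP. The in-order segment (the expected case) takes a short fast path, while anything out of order falls back to a generic buffering slow path.

```python
# Sketch of P11 (optimize the expected case): a toy receiver that
# fast-paths in-order segments and slow-paths everything else.
class Receiver:
    def __init__(self):
        self.expected_seq = 0
        self.data = bytearray()
        self.out_of_order = {}          # seq -> payload (slow-path buffer)

    def deliver(self, seq: int, payload: bytes) -> None:
        if seq == self.expected_seq:    # fast path: segment "predicted"
            self.data += payload
            self.expected_seq += len(payload)
            # Drain any buffered segments that are now in order.
            while self.expected_seq in self.out_of_order:
                chunk = self.out_of_order.pop(self.expected_seq)
                self.data += chunk
                self.expected_seq += len(chunk)
        else:                           # slow path: buffer for later
            self.out_of_order[seq] = payload
```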

  9. P12: Add or Exploit State to Gain Speed • Key Concept: Remember things to make it easier to compute them later • P12a: Compute incrementally • Here the idea is to "accumulate" as you go, rather than computing all-at-once at the end • Example: • Incremental computation of IP checksum
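The incremental checksum example can be sketched as follows, in the spirit of RFC 1624's update formula HC' = ~(~HC + ~m + m'): when a single 16-bit header field changes (say, the TTL word), the checksum is patched in constant time instead of being re-summed over the whole header. The helper names and the sample header words are illustrative.

```python
# Sketch of P12a (compute incrementally): patch the one's-complement
# Internet checksum when one 16-bit field changes, rather than
# recomputing it over every header word.
def ones_complement_sum(words):
    s = 0
    for w in words:
        s += w
        s = (s & 0xFFFF) + (s >> 16)   # fold carries back in
    return s

def checksum(words):
    """Full (from-scratch) Internet checksum over 16-bit words."""
    return ~ones_complement_sum(words) & 0xFFFF

def incremental_checksum(old_cksum, old_word, new_word):
    """Incremental update: HC' = ~(~HC + ~m + m')."""
    s = (~old_cksum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    s = (s & 0xFFFF) + (s >> 16)
    s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF
```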

  10. P13: Optimize Degrees of Freedom • Key Concept: be aware of the variables under one's control and the evaluation criteria used to determine good performance • Example: memory-based string matching • a state machine over 8-bit ASCII characters has 256 (2^8) possible transitions from each state; • the bit-split algorithm uses 8 machines, each checking a single bit, so the total number of possible transitions per character is only 16 (2^1 × 8)

  11. P14: Use special techniques for finite universes (e.g. small integers) • Key Concept: when the domain of a function is small, techniques like bucket sorting, bitmaps, etc. become feasible • Example: • bucket sorting for NAT table lookup • the NAT table is very sparse • each bucket is accessed by hashing • Bucket sort: • partition an array into a finite number of buckets • sort each bucket individually
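A minimal bucket sort over a finite universe might look like the sketch below; the 16-bit port-number universe and the bucket count are illustrative assumptions. The array is partitioned by value range, and each small bucket is sorted individually, exploiting the bounded domain.

```python
# Sketch of P14: bucket sort over a finite universe of small integers
# (here, values in [0, 65536), e.g. port numbers).
def bucket_sort(values, n_buckets=256, universe=65536):
    width = universe // n_buckets
    buckets = [[] for _ in range(n_buckets)]
    for v in values:
        buckets[v // width].append(v)   # partition by value range
    out = []
    for b in buckets:
        out.extend(sorted(b))           # sort each (small) bucket
    return out
```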

  12. P15: Use algorithmic techniques to create efficient data structures • Key Concept: once P1-P14 have been applied, think about how to build an ingenious data structure that exploits what you know • Example: • IP forwarding lookups • PATRICIA trees were the first such data structure • a special trie in which each edge is labeled with a sequence of characters • many more efficient approaches followed

  13. TCAM • Ternary: 0, 1 and * (wildcard) • A TCAM entry stores a fixed-length key and an associated action • TCAM lookup: compare the query against all keys in parallel and output (in one cycle) the lowest memory location whose key matches the input • IP forwarding uses longest-prefix matching • DIP 010001 matches both 010001* and 01* • Using a TCAM for IP forwarding therefore requires that all longer prefixes be placed before any shorter ones
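A software model of this lookup can be sketched as below; the (value, mask) entry encoding and the 6-bit sample table are assumptions for illustration. Mask bits of 0 act as wildcards, and because the first (lowest-indexed) match wins, longer prefixes are stored before shorter ones, exactly as the slide requires.

```python
# Software sketch of a TCAM lookup: entries are (value, mask, action);
# a 0 bit in the mask is a wildcard. The lowest-indexed match wins,
# so longer prefixes must be placed first for longest-prefix matching.
def tcam_lookup(entries, key):
    for idx, (value, mask, action) in enumerate(entries):
        if (key & mask) == (value & mask):   # all hardware rows compare in parallel
            return idx, action
    return None, None

# Hypothetical 6-bit forwarding table, longest prefixes first.
table = [
    (0b010001, 0b111111, "P1"),   # exact prefix 010001 (length 6)
    (0b010000, 0b110000, "P2"),   # prefix 01*          (length 2)
]
```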

  14. IP Lookup • All prefixes of the same length are grouped together • the shortest prefix, 0*, sits at the highest memory address • A packet with DIP 110001 matches the prefixes of both P3 and P5 • P5 is chosen by the longest-prefix-match rule

  15. Routing Table Update • Prefix 11* with action P1 needs to be inserted into the routing table • Naïve approach: create space in the group of length-2 prefixes by pushing up one position all prefixes of length 2 and longer • A core routing table has 100,000 entries → 100,000 memory accesses

  16. Routing Table Update • P13: understand and exploit degrees of freedom -- we can add 11* at any position within the length-2 group; it is not required to sit right after 10*. • We can insert it at the boundary between group 2 and group 3.

  17. Clever Routing Table Updating • Because order within a group is free, inserting only moves one boundary entry per longer-prefix group • for a prefix of length i, the maximum number of memory accesses is 32 − i
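The clever update can be sketched as a simulation; the dict-of-groups representation, the function name, and the write counting are modeling assumptions rather than a hardware-accurate layout. Groups are kept contiguous, longest prefixes first, and inserting a length-i prefix moves at most one boundary entry per nonempty longer-length group (plus the final write of the new entry), rather than shifting the whole table.

```python
# Sketch of the clever TCAM update: prefixes are grouped by length,
# longest first. Since order *within* a group is irrelevant, the hole
# created by the free space is walked down by moving each longer
# group's last entry to its front -- one write per group.
def insert_prefix(groups, length, entry):
    """groups: dict mapping prefix length -> list of entries.
    Returns the number of simulated memory writes."""
    writes = 0
    for l in range(32, length, -1):            # walk longer groups only
        if groups.get(l):
            groups[l].insert(0, groups[l].pop())  # boundary entry moves: 1 write
            writes += 1
    groups.setdefault(length, []).append(entry)   # write into the final hole
    return writes + 1
```

Only the nonempty groups longer than the new prefix cost a write, so the bound of roughly 32 − i accesses from the slide holds regardless of table size.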

  18. Cautionary Questions • Q1: Is improvement really needed? • Q2: Is this really the bottleneck? • Q3: What impact will the change have on the rest of the system? • Q4: Does back-of-envelope analysis indicate significant improvement? • Q5: Is it worth adding custom hardware? • Q6: Can a protocol change be avoided? • Q7: Do prototypes confirm the initial promise? • Q8: Will performance gains be lost if the environment changes?

  19. Summary • P1-P5: System-oriented Principles • These recognize/leverage the fact that a system is made up of components • Basic idea: move the problem to somebody else’s subsystem • P6-P10: Improve efficiency without destroying modularity • “Pushing the envelope” of module specifications • Basic engineering: system should satisfy spec but not do more • P11-P15: Local optimization techniques • Speeding up a key routine • Apply these after you have looked at the big picture

  20. Reminder
