1 / 57

SPP V2 Router Design

SPP V2 Router Design. John DeHart and Mike Wilson. Revision History. 3 June 2008 Initial release, presentation 25 June 2008 Updates on feedback from presentation. SPP Versions. SPP Version 0: What we used for SIGCOMM Paper SPP Version 1:

deanne
Download Presentation

SPP V2 Router Design

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. SPP V2 RouterDesign John DeHart and Mike Wilson

  2. Revision History • 3 June 2008 • Initial release, presentation • 25 June 2008 • Updates on feedback from presentation

  3. SPP Versions • SPP Version 0: • What we used for SIGCOMM Paper • SPP Version 1: • Bare minimum we would need to release something to PlanetLab Users • SPP Version 2: • What we would REALLY like to release to PlanetLab users.

  4. Objectives for SPP-NPE version 2 • Deal with constraints imposed by switch • can send to only one NPU; can receive from only one NPU • split processing across NPUs • parsing, lookup on one; queueing on other • Provide more resources for slice-specific processing • Decouple QM schedulers from links • collection of largely independent schedulers • may use several to send to the same link • e.g. separate rate classes (1-10M, 10-100M, 100-100M) • optionally adjust scheduler rates dynamically • Provide support for multicast • requires addition of next-hop IP address after queueing • Enable single slice to operate at 10 Gb/s • Support “slow” code options • Use separate rate classes to limit rate to slow code options • LCI QMs for Parse, NPUB QMs for HdrFmt

  5. SPP Version 2 System Architecture Default Data Path GPE Blade GPE Blade LC Ingress Decap Parse Lookup AddShim NPUA 1 10Gb/s OR 10 1Gb/s SPI Switch SPI Switch Switch Blade RTM FIC FIC Copy QM HdrFormat LC Egress NPUB NPE 7010 Blade LC 7010 Blade

  6. SPP Version 2 System Architecture Fast-Path Data GPE Blade GPE Blade LC Ingress Decap Parse Lookup AddShim NPUA 1 10Gb/s OR 10 1Gb/s SPI Switch SPI Switch Switch Blade RTM FIC FIC Copy QM HdrFormat LC Egress NPUB NPE 7010 Blade LC 7010 Blade

  7. SPP Version 2 System Architecture Exception Data PathLocal Delivery GPE Blade GPE Blade LC Ingress Decap Parse Lookup AddShim NPUA 1 10Gb/s OR 10 1Gb/s SPI Switch SPI Switch Switch Blade RTM FIC FIC Copy QM HdrFormat LC Egress NPUB NPE 7010 Blade LC 7010 Blade

  8. NPE Version 2 Block Diagram NPUA SRAM SRAM GPE RxA (2 ME) TxA (2 ME) Decap(1 ME) Parse (8 ME) LookupA (1 ME) AddShim (1 ME) StatsA (1 ME) SRAM TCAM SPI Switch SPI Switch Switch Blade StatsB (1 ME) SRAM/3 SRAM/0 TxB (2 ME) HdrFmt/ SubEncap (4 MEs) QueueManager (4 MEs) LookupB&Copy (2 ME) RxB (2 ME) SRAM Scr2NN/Freelist (1 ME) SRAM NPUB

  9. NPE Version 2 Block Diagram NPUA SRAM SRAM GPE RxA (2 ME) TxA (2 ME) Decap(1 ME) Parse (8 ME) LookupA (1 ME) AddShim (1 ME) StatsA (1 ME) SRAM TCAM SPI Switch SPI Switch Switch Blade StatsB (1 ME) SRAM SRAM TxB (2 ME) HdrFmt/ SubEncap (4 MEs) QueueManager (4 MEs) LookupB&Copy (2 ME) RxB (2 ME) SRAM Scr2NN/Freelist (1 ME) SRAM NPUB

  10. PlanetLab NPE Input Frame from LC • Ethernet Header: • DstAddr: MAC address of NPE • SrcAddr: MAC address of LC • VLAN: One VLAN per MR (MR == Slice) • Only use lower 12 bits of Vlan Tag • IP Header: • Dst Addr: IP address of this node • How many IP Addresses can a NODE have? • Src Addr: IP address of previous hop • Protocol: UDP • UDP Header: • Dst Port: Identifies input tunnel • Src Port: with IP Src Addr identifies sending entity DstAddr (6B) Ethernet Header SrcAddr (6B) Type=802.1Q (2B) VLAN (2B) Type=IP (2B) Ver/HLen/Tos/Len (4B) ID/Flags/FragOff (4B) TTL (1B) Protocol = UDP (1B) Hdr Cksum (2B) Src Addr (4B) IP Header Dst Addr (4B) IP Options (0-40B) Src Port (2B) UDP Header Dst Port (2B) UDP length (2B) UDP checksum (2B) UDP Payload (MN Packet) PAD (nB) Ethernet Trailer CRC (4B) Indicates 8-Byte Boundaries Assuming no IP Options

  11. V 1 Rsv (3b) Intf (4b) Buffer Handle(24b) NPE Version 2 Block Diagram NPUA SRAM SRAM GPE RxA (2 ME) TxA (2 ME) Decap(1 ME) Parse (8 ME) LookupA (1 ME) AddShim (1 ME) StatsA (1 ME) SRAM TCAM SPI Switch SPI Switch Switch Blade StatsB (1 ME) SRAM Eth. Frame Len (16b) Reserved (12b) Port (4b) SRAM TxB (2 ME) HdrFmt/ SubEncap (4 MEs) QueueManager (4 MEs) LookupB&Copy (2 ME) RxB (2 ME) SRAM Scr2NN/Freelist (1 ME) SRAM NPUB

  12. RxA • No change from V1

  13. V 1 V 1 Rsv (3b) Rsv (3b) Intf (4b) Intf (4b) Buffer Handle(24b) Buffer Handle(24b) NPE Version 2 Block Diagram NPUA SRAM SRAM GPE RxA (2 ME) TxA (2 ME) Decap(1 ME) Parse (8 ME) LookupA (1 ME) AddShim (1 ME) StatsA (1 ME) SRAM TCAM SPI Switch SPI Switch Switch Blade StatsB (1 ME) SRAM Eth. Frame Len (16b) Reserved (12b) Port (4b) MN Frm Length(16b) MN Frm Offset (16b) SRAM Slice ID (VLAN) (16b) Rx UDP DPort (16b) TxB (2 ME) HdrFmt/ SubEncap (4 MEs) Rx IP SAddr (32b) QueueManager (4 MEs) LookupB&Copy (2 ME) RxB (2 ME) Rx UDP SPort (16b) Reserved (12b) Code (4b) Slice Data Ptr (32b) SRAM Scr2NN/Freelist (1 ME) SRAM NPUB

  14. Decap • Inputs: • Packet from RxA • Outputs: • Meta-frame (handle, offset and length) • Slice ID (VLAN tag) • …or is this lower 12b of VLAN tag and lower 4b of RX DA in? • Metainterface (Rx Saddr, Rx Sport, Rx Dport) • Code Option (4b, only 16 available) • Slice data pointer • Initialization: • VLAN table • Functionality: • Read VLAN tag from DRAM, determine correct code option. • Validate packet. Drop invalid, unmatched packets. • IP Options for NPE dropped in LC, should never arrive here! • Enqueue valid packets to SRAM ring. • Update stats • Status: • Change dl_sink from NN to SRAM. • No longer need to update buffer descriptor. • …except for min-sized packets, which RxA does not update fully (pkt len)

  15. VLAN table • code_option = 0 implies invalid slice • “on switch” for a slice in the data plane • SD data is currently only counters • 64B slice data • Only use lower 12b of VLAN tag (4096 VLANs) • Only changes from V1: • No longer need all data on NPUA, drop HF data, per-slice buffer limits

  16. V 1 V 1 Rsv (3b) Rsv (3b) Intf (4b) Intf (4b) Buffer Handle(24b) Buffer Handle(24b) NPE Version 2 Block Diagram NPUA SRAM SRAM GPE RxA (2 ME) TxA (2 ME) Decap(1 ME) Parse (8 ME) LookupA (1 ME) AddShim (1 ME) StatsA (1 ME) SRAM TCAM SPI Switch SPI Switch Switch Blade StatsB (1 ME) SRAM MN Frm Length (16b) MN Frm Offset (16b) MN Frm Length(16b) MN Frm Offset (16b) SRAM Lookup Key[143-112] Type(1b)/Slice ID(15b)/Rx UDP DPort (16b) Slice ID (VLAN) (16b) Rx UDP DPort (16b) Lookup Key[111-80] DA (32b) Rx IP SAddr (32b) TxB (2 ME) HdrFmt/ SubEncap (4 MEs) QueueManager (4 MEs) LookupB&Copy (2 ME) RxB (2 ME) Lookup Key[ 79-48] SA (32b) Rx UDP SPort (16b) Reserved (12b) Code (4b) Lookup Key[ 47-16] Ports (32b) Code (4b) Exception Bits (12b) Lookup Key Proto/TCP_Flags [15- 0] (16b) Slice Data Ptr (32b) SRAM Scr2NN/Freelist (1 ME) SRAM NPUB

  17. Parse • Inputs: • Meta-frame (handle, offset and length) • Slice ID (VLAN tag) • Tunnel ID (Rx Saddr, Rx Sport, Rx Dport) • Code Option (4b, only 16 available) • Slice data pointer • Outputs: • Meta-frame (handle, offset and length) • Lookup key (Includes slice ID, Rx UDP dport) • Change to include lower 4b of RX DA in; shave VLAN bits for the SliceID. • Code Option (4b, only 16 available) • Exception bits (MN-specific) • Initialization: • Slice Data • Functionality: • Slice-specific processing: • Parse meta-frame. • Extract lookup key. • Raise any relevant exceptions. • Can pass slice data to HdrFmt in bytes 16..30 of packet. (0..15 are reserved for AddShim) • Substrate processing: • Add substrate-specific information to lookup key (32b: Lookup type, Slice ID, Rx UDP dport) • Status: • Change to multi-ME synchronization • Read, write to SRAM rings • No longer need all V1 outputs; some have been removed and the rest compacted.(This change is optional, but may remove a memory access) • Slice data pointer, Rx UDP sport, Rx UDP Saddr

  18. Rsvd(16b) MN Frm Length (16b) Stats Index (16b) MN Frm Offset (16b) V 1 V 1 Rsv (3b) Rsv (3b) Intf (4b) Intf (4b) Buffer Handle(24b) Buffer Handle(24b) NPE Version 2 Block Diagram NPUA SRAM SRAM GPE RxA (2 ME) TxA (2 ME) Decap(1 ME) Parse (8 ME) LookupA (1 ME) AddShim (1 ME) StatsA (1 ME) SRAM TCAM SPI Switch SPI Switch Switch Blade StatsB (1 ME) SRAM MN Frm Length (16b) MN Frm Offset (16b) SRAM Slice ID (VLAN) (16b) Exception Bits (12b) Code (4b) Lookup Key[143-112] Type(1b)/Slice ID(15b)/Rx UDP DPort (16b) Result Index (32b) Lookup Key[111-80] DA (32b) TxB (2 ME) HdrFmt/ SubEncap (4 MEs) QueueManager (4 MEs) LookupB&Copy (2 ME) RxB (2 ME) Lookup Key[ 79-48] SA (32b) Lookup Key[ 47-16] Ports (32b) Code (4b) Exception Bits (12b) Lookup Key Proto/TCP_Flags [15- 0] (16b) SRAM Scr2NN/Freelist (1 ME) SRAM NPUB

  19. LookupA • Inputs: • Meta-frame (handle, offset and length) • Lookup key (Includes slice ID, Rx UDP dport) • Slice ID (VLAN tag) • Code Option (4b, only 16 available) • Outputs: • Meta-frame (handle, offset and length) • Lookup Result (Index into SRAM table on NPUB) • 32b is overkill; some of these bits are reserved. • Slice ID (VLAN tag) • Code Option (4b, only 16 available) • Exception bits (from Parse) • Stats Index (from TCAM) • Initialization: • Filters set in TCAM by control • Functionality: • Look up key in TCAM • On miss, drop the packet • Status: • Local Delivery is now handled at LookupB in SRAM table • Lookup result is now just a 32b index • No longer need all V1 input/outputs; some have been removed and the rest compacted.(This change is optional)

  20. Rsvd(16b) MN Frm Length (16b) Stats Index (16b) MN Frm Offset (16b) V 1 V 1 Rsv (3b) Rsv (3b) Intf (4b) Intf (4b) Buffer Handle(24b) Buffer Handle(24b) NPE Version 2 Block Diagram NPUA SRAM SRAM GPE RxA (2 ME) TxA (2 ME) Decap(1 ME) Parse (8 ME) LookupA (1 ME) AddShim (1 ME) StatsA (1 ME) SRAM TCAM SPI Switch SPI Switch Switch Blade StatsB (1 ME) SRAM SRAM Slice ID (VLAN) (16b) Exception Bits (12b) Code (4b) Result Index (32b) TxB (2 ME) HdrFmt/ SubEncap (4 MEs) QueueManager (4 MEs) LookupB&Copy (2 ME) RxB (2 ME) SRAM Scr2NN/Freelist (1 ME) SRAM NPUB

  21. AddShim • Inputs: • Meta-frame (handle, offset and length) • Lookup Result (Index into SRAM table on NPUB) • Slice ID (VLAN tag) • Code Option (4b, only 16 available) • Exception bits (from Parse) • Stats Index (from TCAM) • Outputs: • Shim Packet (buffer handle) • Buffer descriptor contains updated offset and length, if needed • Initialization: • None. • Functionality: • Prepend shim header to preserve packet annotations across NPU’s • Overwrite the existing ethernet header (Up to 18B) with: • Slice ID (16b) • Code Option (4b) • Exception Bits (12b) • MN Frame Offset (16b) • Result Index (32b) • Stats Index (16b) [This is the same on NPUA, NPUB] • 32B for opaque slice data. • Proper memory alignment required • This is written by Parse, not AddShim! • Status: • New. Stub version is written.

  22. V 1 Rsv (3b) Intf (4b) Buffer Handle(24b) NPE Version 2 Block Diagram NPUA SRAM SRAM GPE RxA (2 ME) TxA (2 ME) Decap(1 ME) Parse (8 ME) LookupA (1 ME) AddShim (1 ME) StatsA (1 ME) SRAM TCAM SPI Switch SPI Switch Switch Blade StatsB (1 ME) SRAM SRAM TxB (2 ME) HdrFmt/ SubEncap (4 MEs) QueueManager (4 MEs) LookupB&Copy (2 ME) RxB (2 ME) SRAM Scr2NN/Freelist (1 ME) SRAM NPUB

  23. TxA • Sends shim packet to NPUB. • Unmodified 10 Gbps Tx 2×ME.

  24. SPP Version2 NPUA to NPUB Frame • SHIM (16B) • Slice ID (16b) • Code Option (4b) • Exception Bits (12b) • Result Index (32b) • Stats Index (16b) • Offset of MN Packet (16b) • Memory Alignment Padding (4B) • IP Header, UDP Header may be overwritten by: • opaque slice data, written in Parse SHIM (16B) Type=IP (2B) Ver/HLen/Tos/Len (4B) ID/Flags/FragOff (4B) TTL (1B) Protocol = UDP (1B) Hdr Cksum (2B) Src Addr (4B) IP Header Dst Addr (4B) IP Options (0-40B) Src Port (2B) UDP Header Dst Port (2B) UDP length (2B) UDP checksum (2B) UDP Payload (MN Packet) PAD (nB) Ethernet Trailer CRC (4B) Indicates 8-Byte Boundaries Assuming no IP Options

  25. Reserved (8b) Buffer Handle(24b) Eth. Frame Len (16b) Reserved (12b) Port (4b) NPE Version 2 Block Diagram NPUA SRAM SRAM GPE RxA (2 ME) TxA (2 ME) Decap(1 ME) Parse (8 ME) LookupA (1 ME) AddShim (1 ME) StatsA (1 ME) SRAM TCAM SPI Switch SPI Switch Switch Blade StatsB (1 ME) SRAM SRAM TxB (2 ME) HdrFmt/ SubEncap (4 MEs) QueueManager (4 MEs) LookupB&Copy (2 ME) RxB (2 ME) SRAM Scr2NN/Freelist (1 ME) SRAM NPUB

  26. RxB • No change from V1

  27. Reserved (8b) Reserved (8b) Buffer Handle(24b) Buffer Handle(24b) Eth. Frame Len (16b) Reserved (12b) Port (4b) QM 2b Sch 3b PerSchedQID (15b) NPE Version 2 Block Diagram NPUA SRAM SRAM GPE RxA (2 ME) TxA (2 ME) Decap(1 ME) Parse (8 ME) LookupA (1 ME) AddShim (1 ME) Reserved (12b) StatsA (1 ME) SRAM TCAM Frame Length (16b) Stats Index (16b) SPI Switch SPI Switch Switch Blade StatsB (1 ME) SRAM SRAM TxB (2 ME) HdrFmt/ SubEncap (4 MEs) QueueManager (4 MEs) LookupB&Copy (2 ME) RxB (2 ME) SRAM Scr2NN/Freelist (1 ME) SRAM NPUB

  28. LookupB/Copy • Inputs: • Shim packet (buffer handle, frame length) • Outputs: • Packet (buffer handle, frame length) • QueueID (QM, Scheduler, Queue ID) • Stats Index • Initialization: • ResultTable • Functionality (Overview) • Copy shim header into buffer descriptor • Look up routing information from result index • If multicast, make the copies • Enqueue to correct QM (from ResultTable)

  29. LookupB/Copy – Code Sketch if not currently processing mcast packet read packet from SRAM ring extract shim load ResultTable value fill buffer descriptor if unicast if per-slice packet limit permits update per-slice packet count write to SRAM ring for correct QM. (By qmschedID in result table value). else drop buffer else start mcast processing if per-slice packet limit permits update per-slice packet count fetch first header buffer descriptor if payload length ≠ 0 write ref count into payload descriptor else drop payload buffer else drop buffer finish mcast processing else (Currently processing buffer, have empty header buffer handle) fill header buffer descriptor only chain if payload buffer is not empty if still making copies fetch next header buffer descriptor else finish mcast processing write current header buffer handle to SRAM ring for correct QM. (By qmschedID). signal next ME

  30. Per Sched Entry: IP SAddr (32b) Eth DA (48b) ResultTable – Unicast • Data needed to enqueue, rewrite packet: • QID • QMID, SchedID, QID (20b) (Lookup Result) • Src MI: • IP Saddr (32b) (Per SchedID Table) • UDP Sport (16b) (Lookup Result) • Tunnel Next Hop • IP DAddr (32b) (Lookup Result) • IP DPort (16b)(Lookup Result) • Chassis Addressing • Ethernet Dst MAC (48b) (Per SchedID Table) • Slice Specific Lookup Result Data (?) (Lookup Result) • Ethernet Src MAC • Should be constant across all pkts. Results Entry: QID (20b) IP DAddr (32b) UDP DPort (16b) UDP SPort (16b) HFIndex (16b)

  31. Per Sched Entry: IP SAddr (32b) Eth DA (48b) ResultTable – Multicast • Fanout gives the number of copies (0..15) • Data needed per copy on NPUB: • QID • QMID, SchedID, QID (20b) (Lookup Result) • Src MI: • IP Saddr (32b) (Per SchedID Table) • UDP Sport (16b) (Lookup Result) • Tunnel Next Hop • IP DAddr (32b) (Lookup Result) • IP DPort (16b)(Lookup Result) • Chassis Addressing • Ethernet Dst MAC (48b) (Per SchedID Table) • Slice Specific Lookup Result Data (?) (Lookup Result) • Ethernet Src MAC • Should be constant across all pkts. • Support Multicast but optimize for Unicast Results Entry: Fanout (4b) QID (20b) IP DAddr (32b) ×16 UDP DPort (16b) UDP SPort (16b) HFIndex (16b)

  32. Reserved (8b) Buffer Handle(24b) Reserved (12b) Frame Length (16b) Stats Index (16b) QM 2b Sch 3b PerSchedQID (15b) V 1 Rsv (3b) Intf (4b) Buffer Handle(24b) NPE Version 2 Block Diagram NPUA SRAM SRAM GPE RxA (2 ME) TxA (2 ME) Decap(1 ME) Parse (8 ME) LookupA (1 ME) AddShim (1 ME) StatsA (1 ME) SRAM TCAM SPI Switch SPI Switch Switch Blade StatsB (1 ME) SRAM SRAM TxB (2 ME) HdrFmt/ SubEncap (4 MEs) QueueManager (4 MEs) LookupB&Copy (2 ME) RxB (2 ME) SRAM Scr2NN/Freelist (1 ME) SRAM NPUB

  33. QM • No change from V1 • Incorporates recent change to limit queues by #pkts • Some changes in how control allocates bandwidth • Need to ensure that slow HdrFmt blocks can’t tie up the system • Currently looking at worst-case engineering • (everyone runs at slowest block speed)

  34. V 1 V 1 Rsv (3b) Rsv (3b) Intf (4b) Intf (4b) Buffer Handle(24b) Buffer Handle(24b) NPE Version 2 Block Diagram NPUA SRAM SRAM GPE RxA (2 ME) TxA (2 ME) Decap(1 ME) Parse (8 ME) LookupA (1 ME) AddShim (1 ME) StatsA (1 ME) SRAM TCAM SPI Switch SPI Switch Switch Blade StatsB (1 ME) SRAM SRAM TxB (2 ME) HdrFmt/ SubEncap (4 MEs) QueueManager (4 MEs) LookupB&Copy (2 ME) RxB (2 ME) SRAM Scr2NN/Freelist (1 ME) SRAM NPUB

  35. HdrFmt / SubEncap • Inputs: • Buffer Handle • Remaining inputs from Buffer Descriptor: • Multicast or Unicast (from buffer_next) • Frame length, offset • HFIndex (index into HFTable, a slice-specific table) • QMSchedID (for per-sched lookup in ResultTable) • Outputs: • Packet (buffer handle) • Buffer descriptor contains updated offset and length • Initialization: • HFTable, containing slice-specific data. For IPv4, this contains next-hop information (for both multicast and unicast traffic). • Functionality: • Substrate level: • read buffer descriptor and pass frame offset, length, HFIndex, mcast/ucast to slice-specific HdrFmt • Slice level: arbitrary processing. • For IPv4, this writes the next-hop information. • Returns new offset, length of frame. • Substrate level: • Encapsulate for output tunnel (from ResultTable) • Status: • Significant re-write at substrate level • Slice-specific code should change very little (add multicast support)

  36. V 1 V 1 Rsv (3b) Rsv (3b) Intf (4b) Intf (4b) Buffer Handle(24b) Buffer Handle(24b) NPE Version 2 Block Diagram NPUA SRAM SRAM GPE RxA (2 ME) TxA (2 ME) Decap(1 ME) Parse (8 ME) LookupA (1 ME) AddShim (1 ME) StatsA (1 ME) SRAM TCAM SPI Switch SPI Switch Switch Blade StatsB (1 ME) SRAM SRAM TxB (2 ME) HdrFmt/ SubEncap (4 MEs) QueueManager (4 MEs) LookupB&Copy (2 ME) RxB (2 ME) SRAM Scr2NN/Freelist (1 ME) SRAM NPUB

  37. Scr2NN/FreelistMgr • Inputs: • Buffer Handle (possibly chained) • Outputs: • Buffer Handle (possibly chained) • Initialization: • None • Functionality: • Combines Freelist Manager with Scr2NN glue • FM: Read from scratch ring. Free buffers, correctly handling chained buffers and reference counts. • Scr2NN: Read from Scratch, write to NN. • Status: • Both blocks exist, but combining them is not straight-forward. • Open question: how should we prioritize among these tasks? The author should ensure that no deadlock is possible. (TxB writes to FM; if FM ring is full, TxB stalls. If Scr2NN is writing to TxB, it stalls. Gridlock.)

  38. V 1 Rsv (3b) Intf (4b) Buffer Handle(24b) NPE Version 2 Block Diagram NPUA SRAM SRAM GPE RxA (2 ME) TxA (2 ME) Decap(1 ME) Parse (8 ME) LookupA (1 ME) AddShim (1 ME) StatsA (1 ME) SRAM TCAM SPI Switch SPI Switch Switch Blade StatsB (1 ME) SRAM SRAM TxB (2 ME) HdrFmt/ SubEncap (4 MEs) QueueManager (4 MEs) LookupB&Copy (2 ME) RxB (2 ME) SRAM Scr2NN/Freelist (1 ME) SRAM NPUB

  39. TxB • Must support chained buffers • Multicast uses header buffers and payload buffers • Headers are slice-specific; we can’t rely on known, static lengths as we did in ONL. • Sends header from one buffer, payload from chained buffer. • Can TX do this? Comments in the code seem to imply that chained (non-SOP) buffers must start at offset 0. Our payloads usually won’t. • According to DZar, this will probably take some TX modification, but there’s no reason why it won’t work. Might have a performance penalty, of course….

  40. Offset (16b) SPP V2 SideB SRAM Buffer Descriptor Buffer_Next (32b) LW0 Buffer_Size (16b) LW1 Packet_Size (16b) Free_list 0000 (4b) Reserved (4b) Ref_Cnt (8b) LW2 Stats Index (16b) Reserved (4b) Slice ID(xsid)(12b) LW3 HFIndex (16b) MR Exception Bits (16b) LW4 ResultIndex (32b) LW5 MR Bits (optional) (32b) LW6 Packet_Next (32b) • HFIndex is an index into the HFTable. For IPv4, this provides Next Hop information. • ResultIndex is used to get tunnel header info from the ResultTable LW7

  41. Design Questions • Small hole for abuse in HdrFmt • QM rate limits on payload length • HdrFmt (after QM) can vastly increase packet length • Should the LookupB table give the padding size for each entry? Enforced in SubEncap? • ANSWER: No, we will resort to our control of HdrFmt to force it to behave. (We write all of the code options right now.) • What are the best places to update stats on NPUB? • ANSWER: Post-Q only • Is there any remaining reason that NPUB would need the source tunnel information? • ANSWER: No. If a code option needs it, put it into opaque slice data. • Still working out remaining data areas.

  42. Extra Slides • The rest of the slides are old or for extra information

  43. Questions/Issues • 4/28/08: • How many code options? • Limit of 16? • To handle slow Code Options: • LCI Queues would control traffic to Fast/Slow Parse Code • Classes of code options defined by how long their Parse code takes. • Scheduler assigned to a class of code option. • NPE Queues would control traffic to Fast/Slow HF Code • LCE Queues control the output rate to Interfaces. • Multicast Problems: • Impact of multicast traffic overloading Lookup/Copy and becoming a bottleneck. • Rx on SideB, can it use SRAM output ring? • All our other 10G Rx’s have NN output ring. • Option for HF to send out additional pkts? • How to pass MR and substrate hdrs to TxB? • Through Ring or through Hdr Buffer associated with Hdr Buffer descriptor. • If the latter then what are the constraints in Tx for buffer chaining?

  44. Meeting Notes • 1/15/08: • QM: Add Pkt count to Queue Params, change limit from QLen to PktCount • Add Per Slice Pkt limit to NPUA and NPUB • Limit Fanout to 16 • MCast: Control will allocate all 16 entries for a multicast result entry, result entry will be typed as multicast or unicast and will not transition from one to the other. • What happens to pkts in queues when there is a route change that sends that flow’s pkts to a different interface and queue? Pkt ordering problems?

  45. NPE Version 2 Block Diagram slice#, resultIndx, etc, passed in shim Lookup produces resultIndx, statsIndx SRAM SRAM NPUA GPE Decap, Parse, LookupA, AddShim (8 MEs) RxA (2 ME) TxA (2 ME) Stats (1 ME) SRAM TCAM SPI Switch SPI Switch Switch Blade Stats (1 ME) flow control? SRAM TxB (2 ME) HdrFmt (4 MEs) QueueManager (4 MEs) LookupB&Copy (2 ME) RxB (2 ME) for unicast, resultIndx replaced by QiD; allowing output side to skip lookup SRAM NPUB SRAM Lookup on <slice#, resultIndx>yields fanout, list of QiDs;copy to queues, adding copy#;(slice#, resultIndx remain in packet buffer) use slice# to select slice to format packet; use resultIndx to get next-hop

  46. Questions/Issues • Where are exit and entry points for packets sent to and from the GPE for exception processing? • Parse (NPUA) and LookupA (NPUA) are where most exceptions are generated: • IP Options • No Route • Etc. • HdrFormat (NPUB) is where we do ethernet header processing • What needs to be in the SHIM going from NPUA to NPUB? • ResultIndex (32b) • Exception Bits (12b) • StatsIndex (16b) • Slice# (12b) • ??? • Will we support multi-copy in a way similar to the ONL Router? • How big can the fanout be? • How many QIDs need to be stored with the LookupB Result? • Is there some encoding for the QIDs that can take into account support for multicast and the copy#? For example: • Multicast QID(20b) • Multicast (1b): 1 • Copy# (4b) • PerMulticast QID(15b): One PerMulticast QID allocated for each Multicast • Unicast QID(20b) • Unicast (1b): 0 • QID (19b) • Are there timing/synchronization issues with adding, deleting or changing lookup entries between the two NPUs databases? • Do we need flow control between TxA and RxB?

  47. NPE Version 2 Block Diagram • NPUA: • RxA:Same as Version 0 • TxA: New 10Gb/s • Decap: Same as Version 0 • Parse: Same as Version 0 • New code options? • LookupA: Results will be different from Version 0 • AddSim: New SRAM SRAM NPUA GPE Decap, Parse, LookupA, AddShim (8 MEs) RxA (2 ME) TxA (2 ME) Stats (1 ME) SRAM TCAM SPI Switch SPI Switch Switch Blade Stats (1 ME) flow control? SRAM TxB (2 ME) HdrFmt (4 MEs) QueueManager (4 MEs) LookupB&Copy (2 ME) RxB (2 ME) SRAM NPUB SRAM

  48. NPE Version 2 Block Diagram • NPUB: • RxB:Same as Version 0 • TxB: New 10Gb/s • with L2 Header coming in on input ring? • LookupB: New • Copy: New, may be able to use some code from ONL Copy • QM: New, decoupled from Links • HF: New, may use some code from Version 0 SRAM SRAM NPUA GPE Decap, Parse, LookupA, AddShim (8 MEs) RxA (2 ME) TxA (2 ME) Stats (1 ME) SRAM TCAM SPI Switch SPI Switch Switch Blade Stats (1 ME) flow control? SRAM TxB (2 ME) HdrFmt (4 MEs) QueueManager (4 MEs) LookupB&Copy (2 ME) RxB (2 ME) SRAM NPUB SRAM

  49. NPE Version 2 Block Diagram NPUA Sram2NN (1 ME) SRAM SRAM GPE Decap, Parse, LookupA, AddShim (8 MEs) RxA (2 ME) TxA (2 ME) StatsA (1 ME) FreeList MgrA (1 ME) SRAM TCAM SPI Switch SPI Switch Switch Blade flow control? FreeList MgrB (1 ME) StatsB (1 ME) SRAM SRAM TxB (2 ME) HdrFmt (4 MEs) QueueManager (4 MEs) LookupB&Copy (2 ME) RxB (2 ME) NPUB has 17 MEs currently spec’ed SRAM Scr2NN (1 ME) SRAM NPUB

  50. SPP V2: MR Specific Code • Where does the MR Specific Code reside in V2: • Parse • HdrFormat • What about LookupA and LookupB? • Lookup is a “service” provided to the MRs by the Substrate. • No MR specific code needed in LookupA or LookupB • What about SideA AddShim? • The Exception bits that go in the shim are MR Specific but they should be passed to AddShim and it will write them into the Shim. • No MR Specific code needed in AddShim. • What about SideB Copy? • Is there anything MR specific about setting up multiple copies of a packet? • There shouldn’t be. We will have the Copy block allocate a new hdr buffer descriptor and link it to the existing data buffer descriptor and take care of reference counts. • The actual building of the new header(s) for the copies will be left to HF. • No MR Specific code needed in Copy.

More Related