
An NP-Based Router for the Open Network Lab



  1. An NP-Based Router for the Open Network Lab John DeHart

  2. Schedule • April 10: Header Format (Mike) • April 17: Parse, Lookup and Copy (Jing and John) • April 24: Stats and FreelistMgr (Dave and John) • May 1: Mux (Mart) • May 8: XScale (Charlie) • May 15: Plugins (Charlie and Shakir)

  3. Svn with remote server • To connect to our current svnserve configuration from a remote machine outside of WU, you will need to use an SSH tunnel.

  4. Svn with remote server

  5. Svn with remote server • From Cygwin, use this command: • svn checkout svn://localhost:7071/techX

  6. Notes from 3/27/07 • Schedule? • Bank1: check SRAM access bw with QM and Rings • QID: Use 1-5 for port numbers in top 3 bits so that a DG qid of 0 will not result in a full QID=0 • NSP: Should we allow users to assign traffic to a DG queue • Do we need the Drop bit or is it sufficient to have a copy_vector of 0 to indicate a drop? • Define how/where sampling filters are implemented and how the sampling happens. • QIDs and Plugins are still fuzzy. • Should there be a restricted set of QIDs that the Plugins can allocate that the RLI and XScale are not allowed to allocate? • Can we put Output Port in data going to the XScale? • Buffer Offset: Should it point to the start of the ethernet header or start of the IP pkt? • Length fields in the buffer descriptor are they ethernet frame length or IP packet length? • Add Buffer Chaining slides.

  7. JST: Objectives for ONL Router • Reproduce approximately same functionality as current hardware router • routes, filters (including sampling filters), stats, plugins • Extensions • multicast, explicit-congestion marking • Use each NPU as separate 5 port router • each responsible for half the external ports • xScale on each NPU implements CP functions • access to control variables, memory-resident statistics • updating of routes, filters • interaction with plugins through shared memory • simple message buffer interface for request/response

  8. JST: Unicast, ARP and Multicast • Each port has Ethernet header with fixed source MAC address – several cases for destination MAC address • Case 1 – unicast packet with destination on attached subnet • requires ARP to map dAdr to MAC address • ARP cache holds mappings – issue ARP request on cache miss • Case 2 – other unicast packets • lookup must provide next-hop IP address • then use ARP to obtain MAC address, as in case 1 • Case 3 – Multicast packet • lookup specifies copy-vector and QiD • destination MAC address formed from IP multicast address • Could avoid ARP in some cases • e.g. point-to-point link • but little advantage, since ARP mechanism required anyway • Do we learn MAC Addresses from received pkts?

  9. JST: Proposed Approach • Lookup does separate route lookup and filter lookup • at most one match for route, up to two for filter (primary, aux) • combine route lookup with ARP cache lookup • xScale adds routes for multi-access subnets, based on ARP • Route lookup • for unicast, stored keys are (rcv port)+(dAdr prefix) • lookup key is (rcv port)+(dAdr) • result includes Port/Plugin, QiD, next-hop IP or MAC address, valid next-hop bit • for multicast, stored keys are (rcv port)+(dAdr)+(sAdr prefix) • lookup key is (rcv port)+(dAdr)+(sAdr) • result includes 10 bit copy vector, QiD • Filter lookup • stored key is IP 5-tuple + TCP flags – arbitrary bit masks allowed • lookup key is IP 5-tuple + flags if applicable • result includes Port/Plugin or copy vector, QiD, next-hop IP or MAC address, valid next-hop bit, primary-aux bit, priority • Destination MAC address passed through QM • via being written in the buffer descriptor? • Do we have 48 bits to spare? • Yes, we actually have 14 free bytes. Enough for a full (non-vlan) ethernet header.

  10. JST: Lookup Processing • On receiving unicast packet, do route & filter lookups • if MAC address returned by route (or higher priority primary filter) is valid, queue the packet and continue • else, pass packet to xScale, marking it as no-MAC • leave it to xScale to generate ARP request, handle reply, insert route and re-inject packet into data path • On receiving multicast packet, do route & filter lookups • take higher priority result from route lookup or primary filter • format MAC multicast address • copy to queues specified by copy vector • if matching auxiliary filter, filter supplies MAC address

  11. ONL NP Router (Jon’s Original) [Block diagram: Rx (2 MEs) → Mux (1 ME) → Parse, Lookup, Copy (3 MEs) → QueueManager (1 ME) → HdrFmt (1 ME) → Tx (2 MEs), with xScale, Stats (1 ME), five Plugins, TCAM, SRAM, and large SRAM rings] • Each output has common set of QiDs • Multicast copies use same QiD for all outputs • QiD ignored for plugin copies

  12. ONL NP Router (JDD) [Block diagram: Rx (2 MEs) → Mux (1 ME) → Parse, Lookup, Copy (3 MEs) → QM (1 ME) → HdrFmt (1 ME) → Tx (1 ME), plus Stats (1 ME), FreeList Mgr (1 ME), Plugins 0-4 chained by NN rings, xScale, TCAM with associated-data ZBT-SRAM, scratch rings, NN rings, and 64KW SRAM rings; blocks annotated Mostly Unchanged, Needs Some Mod. (Tx, QM, Parse, Plugin, XScale), and Needs A Lot Of Mod. (QM, Copy, Plugins)]

  13. Project Assignments • XScale daemons, etc: Charlie • With Design and Policy help from Fred and Ken • PLC (Parse, Lookup and Copy): Jing and JohnD • With consulting from Brandon • QM: Dave and JohnD • Rx: Dave • Tx: Dave • Stats: Dave • Header Format: Mike • Mux: Mart? • Freelist_Mgr: JohnD • Plugin Framework: Charlie and Shakir • With consulting from Ken • Dispatch loop and utilities: All • Dl_sink_to_Stats, dl_sink_to_freelist_mgr • These should take in a signal and not wait • Documentation: Ken • With help from All • Test cases and test pkt generation: Brandon

  14. Project Level Stuff • Upgrade to IXA SDK 4.3.1 • Techx/Development/IXP_SDK_4.3/{cd1,cd2,4-3-1_update} • Project Files • We’re working on them right now. • C vs. uc • Probably any new blocks should be written in C • Existing code (Rx, Tx, QM, Stats) can remain as uc. • Freelist Mgr might go either way. • Stubs: • Do we need them this time around? • SRAM rings: • We need to understand the implications of using them. • No way to pre-test for empty/full? • Subversion • Do we want to take this opportunity to upgrade? • Current version: • Cygwin (my laptop): 1.3.0-1 • Linux (bang.arl.wustl.edu): 1.3.2 • Available: • Cygwin: 1.4.2-1 • subversion.tigris.org: 1.4.3

  15. Hardware • Promentum™ ATCA-7010 (NP Blade): • Two Intel IXP2850 NPs • 1.4 GHz Core • 700 MHz Xscale • Each NPU has: • 3x256MB RDRAM, 533 MHz • 3 Channels • Address space is striped across all three. • 4 QDR II SRAM Channels • Channels 1, 2 and 3 populated with 8MB each running at 200 MHz • 16KB of Scratch Memory • 16 Microengines • Instruction Store: 8K 40-bit wide instructions • Local Memory: 640 32-bit words • TCAM: Network Search Engine (NSE) on SRAM channel 0 • Each NPU has a separate LA-1 Interface • Part Number: IDT75K72234 • 18Mb TCAM • Rear Transition Module (RTM) • Connects via ATCA Zone 3 • 10 1GE Physical Interfaces • Supports Fiber or Copper interfaces using SFP modules.

  16. Hardware [Photos: ATCA Chassis, NP Blade, RTM]

  17. NP Blades

  18. ONL Router Architecture [Diagram: 7010 Blade with NPUA and NPUB, each with a 5x1Gb/s SPI interface to the RTM] • Each NPU is one 5-port Router • ONL Chassis has no switch Blade • 1Gb/s Links on RTM connect to external ONL switch(es)

  19. Performance • What is our performance target? • To hit 5 Gb/s rate: • Minimum Ethernet frame: 76B • 64B frame + 12B InterFrame Spacing • 5 Gb/sec * 1B/8b * packet/76B = 8.22 Mpkt/sec • IXP ME processing: • 1.4 GHz clock rate • 1.4 Gcycle/sec * 1 sec/8.22 Mpkt = 170.3 cycles per packet • compute budget: (MEs*170) • 1 ME: 170 cycles • 2 ME: 340 cycles • 3 ME: 510 cycles • 4 ME: 680 cycles • latency budget: (threads*170) • 1 ME: 8 threads: 1360 cycles • 2 ME: 16 threads: 2720 cycles • 3 ME: 24 threads: 4080 cycles • 4 ME: 32 threads: 5440 cycles
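The budget arithmetic above can be checked with a short calculation; a minimal sketch in C (the 5 Gb/s, 76B, and 1.4 GHz figures are from the slide; the function name is illustrative):

```c
#include <assert.h>

/* Cycle budget per packet: ME clock rate divided by the worst-case
 * packet arrival rate (line rate in bytes/s over minimum frame size). */
double cycles_per_pkt(double line_rate_bps, double frame_bytes, double clock_hz)
{
    double pkts_per_sec = line_rate_bps / 8.0 / frame_bytes; /* ~8.22 Mpkt/s */
    return clock_hz / pkts_per_sec;                          /* ~170 cycles  */
}
```

The per-block compute budget then scales as MEs × 170 cycles, and the latency budget as total threads × 170 cycles (8 threads per ME).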

  20. ONL NP Router [Block diagram: Rx (2 MEs) → Mux (1 ME) → Parse, Lookup, Copy (3 MEs) → QM (1 ME) → HdrFmt (1 ME) → Tx (1 ME), with Stats (1 ME), FreeList Mgr (1 ME), Plugins 0-4 on NN rings, xScale, TCAM with associated-data ZBT-SRAM, scratch rings, NN rings, and 64KW SRAM rings]

  21. Inter Block Rings • Scratch Rings (sizes in 32b Words: 128, 256, 512, 1024) • XScale → MUX • 3 Words per pkt • 256 Word Ring • 256/3 pkts • PLC → XScale • 3 Words per pkt • 256 Word Ring • 256/3 pkts • MUX → PLC • 3 Words per pkt • 256 Word Ring • 256/3 pkts • → QM • 3 Words per pkt • 1024 Word Ring • 1024/3 Pkts • HF → TX • 5 Words per pkt • 256 Word Ring • 256/5 pkts • → Stats • 1 Word per pkt • 256 Word Ring • 256 pkts • → Freelist Mgr • 1 Word per pkt • 256 Word Ring • 256 pkts • Total Scratch Size: 4KW (16KB) • Total Used in Rings: 2.5 KW

  22. Inter Block Rings • SRAM Rings (sizes in 32b KW: 0.5, 1, 2, 4, 8, 16, 32, 64) • RX → MUX • 2 Words per pkt • 64KW Ring • 32K Pkts • PLC → Plugins (5 of them) • 3 Words per pkt • 64KW Rings • ~21K Pkts • Plugins → MUX (1 serving all plugins) • 3 Words per pkt • 64KW Ring • ~21K Pkts • NN Rings (128 32b words) • QM → HF • 1 Word per pkt • 128 Pkts • Plugin N → Plugin N+1 (for N=1 to N=4) • Words per pkt is plugin dependent
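The per-ring packet capacities quoted above follow from dividing the ring size in 32-bit words by the words each packet consumes; a small helper (name illustrative) makes the arithmetic explicit:

```c
/* Packet capacity of an inter-block ring: ring size in 32-bit words
 * divided by the number of words each packet occupies on the ring. */
unsigned ring_pkts(unsigned ring_words, unsigned words_per_pkt)
{
    return ring_words / words_per_pkt;
}
```

For example, the 64KW Rx → MUX ring at 2 words per packet holds 32K packets, and a 64KW ring at 3 words per packet holds ~21K.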

  23. ONL SRAM Buffer Descriptor • Problem: • With the use of Filters, Plugins and recycling back around for reclassification, we can end up with an arbitrary number of copies of one packet in the system at a time. • Each copy of a packet could end up going to an output port and need a different MAC DAddr from all the other packets • Having one Buffer Descriptor per packet regardless of the number of copies will not be sufficient. • Solution: • When there are multiple copies of the packet in the system, each copy will need a separate Header buffer descriptor which will contain the MAC DAddr for that copy. • When the Copy block needs to send only one copy of a packet to the QM, it will read the current reference count, and if this copy is the ONLY copy in the system, it will not prepend a Header buffer descriptor. • SRAM buffer descriptors are the scarce resource and we want to optimize their use. • Therefore: We do NOT want to always prepend a header buffer descriptor • Otherwise, Copy will prepend a Header buffer descriptor to each copy going to the QM. • Copy does NOT need to prepend a Header buffer descriptor to copies going to plugins • Copy does NOT need to prepend a Header buffer descriptor to a copy going to the XScale • The Header buffer descriptors will come from the same pool (freelist 0) as the PacketPayload buffer descriptors. • There is no advantage to associating these Header buffer descriptors with small DRAM buffers. • DRAM is not the scarce resource • SRAM buffer descriptors are the scarce resource. • We want to avoid a descriptor coming in to PLC for reclassification with the Header buffer descriptor chained in front of the payload buffer descriptor. • Plugins and XScale should append a Header Buffer descriptor when they are sending something that has copies and is going directly to the QM, or to Mux and PLC for PassThrough.
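The prepend rules above reduce to a small decision; a hedged sketch (the type and function names are illustrative, not from the actual Copy block code):

```c
typedef enum { DEST_QM, DEST_PLUGIN, DEST_XSCALE } copy_dest_t;

/* Returns nonzero if the Copy block must prepend a separate Header buffer
 * descriptor for this copy: only copies headed to the QM need one, and
 * only when more than one copy of the packet exists in the system. */
int needs_header_desc(copy_dest_t dest, unsigned ref_cnt)
{
    if (dest != DEST_QM)
        return 0;            /* plugin and XScale copies: never prepend   */
    return ref_cnt > 1;      /* sole copy: reuse the payload descriptor   */
}
```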

  24. ONL SRAM Buffer Descriptor • LW0: Buffer_Next (32b) • LW1: Buffer_Size (16b), Offset (16b) • LW2: Packet_Size (16b), Free_list 0000 (4b), Reserved (4b), Ref_Cnt (8b) • LW3: MAC DAddr_47_32 (16b), Stats Index (16b) • LW4: MAC DAddr_31_00 (32b) • LW5: EtherType (16b), Reserved (16b) • LW6: Reserved (32b) • LW7: Packet_Next (32b) • Writer annotations from the slide: Ref_Cnt written as 1 by Rx, added to by Copy, decremented by Freelist Mgr; other fields written by Freelist Mgr, by Rx, by Copy, by Rx and Plugins, and by QM

  25. ONL DRAM Buffer and SRAM Buffer Descriptor [Diagram: SRAM buffer descriptor (LW0-LW7 as on the previous slide) pointing at a DRAM buffer: 0x000 Empty, 0x180 Ethernet Hdr, 0x18E IP Packet, up to 0x800] • SRAM Buffer Descriptor Fields: • Buffer_Next: ptr to next buffer in a multi-buffer packet • Buffer_Size: number of bytes in the associated DRAM buffer • Packet_Size: total number of bytes in the pkt • QM (dequeue) uses this to decrement qlength • Offset: byte offset into DRAM buffer where packet (ethernet frame) starts. From RX: • 0x180: Constant offset to start of Ethernet Hdr • 0x18E: Constant offset to start of IP/ARP/etc hdr • However, Plugins can do ANYTHING so we cannot depend on the constant offsets. • The following slides will, however, assume that nothing funny has happened. • Freelist: Id of freelist that this buffer came from and should be returned to when it is freed • Ref_Cnt: Number of copies of this buffer currently in the system • MAC_DAddr: Ethernet MAC Destination Address that should be used for this packet • Stats Index: Index into statistics counters that should be used for this packet • EtherType: Ethernet Type field that should be used for this packet • Packet_Next: ptr to next packet in the queue when this packet is queued by the QM
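The eight-longword layout above maps onto a 32-byte C struct; a sketch assuming the LW ordering shown on the slides (the 4-bit Free_list and 4-bit Reserved fields are folded into one byte, and sub-word field order within a longword is an assumption):

```c
#include <stdint.h>

/* 32-byte ONL SRAM buffer descriptor, one 32-bit longword (LW) per row. */
struct onl_buf_desc {
    uint32_t buffer_next;   /* LW0: ptr to next buffer of this packet      */
    uint16_t buffer_size;   /* LW1: bytes in the associated DRAM buffer    */
    uint16_t offset;        /*      byte offset of frame start in buffer   */
    uint16_t packet_size;   /* LW2: total bytes in the packet              */
    uint8_t  freelist_rsv;  /*      Free_list (4b) + Reserved (4b)         */
    uint8_t  ref_cnt;       /*      copies of this buffer in the system    */
    uint16_t mac_daddr_hi;  /* LW3: MAC DAddr[47:32]                       */
    uint16_t stats_index;   /*      statistics counter index               */
    uint32_t mac_daddr_lo;  /* LW4: MAC DAddr[31:0]                        */
    uint16_t ethertype;     /* LW5: Ethernet Type for this packet          */
    uint16_t reserved16;    /*      Reserved (16b)                         */
    uint32_t reserved32;    /* LW6: Reserved (32b)                         */
    uint32_t packet_next;   /* LW7: next packet in queue (used by QM)      */
};
```

All members are naturally aligned, so the struct occupies exactly the 32 bytes the slide budgets per descriptor.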

  26. ONL DRAM Buffer and SRAM Buffer Descriptor [Diagram: descriptor and DRAM buffer: 0x000 Empty, 0x180 Ethernet Hdr, 0x18E IP Packet, up to 0x800] • Normal Unicast case: • One copy of packet being sent to one output port • SRAM Buffer Descriptor Fields: • Buffer_Next: NULL • Buffer_Size: IP_Pkt_Length • Packet_Size: IP_Pkt_Length • Offset: 0x18E • Freelist: 0 • Ref_Cnt: 1 • MAC_DAddr: <result of lookup> • Stats Index: <from lookup result> • EtherType: 0x0800 (IP) • Packet_Next: <as used by QM>

  27. ONL DRAM Buffer and SRAM Buffer Descriptor [Diagram: Header Buf Descriptor chained to Payload Buf Descriptor; the Header descriptor's DRAM buffer is Empty, while the Payload descriptor's DRAM buffer holds the Ethernet Hdr at 0x180 and the IP Packet at 0x18E] • Multi-copy case: • >1 copy of packet in system • This copy going from Copy to QM to go out on an output port

  28. ONL DRAM Buffer and SRAM Buffer Descriptor [Diagram: Header Buf Descriptor chained to Payload Buf Descriptor] • Multi-copy case (continued): • >1 copy of packet in system • This copy going from Copy to QM to go out on an output port • Header Buf Descriptor: • SRAM Buffer Descriptor Fields: • Buffer_Next: ptr to payload buf desc • Buffer_Size: 0 (Don’t Care) • Packet_Size: IP_Pkt_Length • Offset: 0 (Don’t Care) • Freelist: 0 • Ref_Cnt: 1 • MAC_DAddr: <result of lookup> • Stats Index: <from lookup result> • Different copies of the same packet may actually have different Stats Indices • EtherType: 0x0800 (IP) • Packet_Next: <as used by QM>

  29. ONL DRAM Buffer and SRAM Buffer Descriptor [Diagram: Header Buf Descriptor chained to Payload Buf Descriptor] • Multi-copy case (continued): • >1 copy of packet in system • This copy going from Copy to QM to go out on an output port • Payload Buf Descriptor: • SRAM Buffer Descriptor Fields: • Buffer_Next: NULL • Buffer_Size: IP_Pkt_Length • Packet_Size: IP_Pkt_Length • Offset: 0x18E • Freelist: 0 • Ref_Cnt: <number of copies currently in system> • MAC_DAddr: <don’t care> • Stats Index: <should not be used> • EtherType: <don’t care> • Packet_Next: <should not be used>

  30. ONL SRAM Buffer Descriptor • Rx writes: • Buffer_size ← ethernet frame length • Packet_size ← ethernet frame length • Offset ← 0x180 • Freelist ← 0 • Mux Block writes: • Buffer_size ← (frame length from Rx) - 14 • Packet_size ← (frame length from Rx) - 14 • Offset ← 0x18E • Freelist ← 0 • Ref_cnt ← 1 • Copy Block initializes a newly allocated Hdr desc: • Buffer_Next ← ptr to original payload buffer • Buffer_size ← 0 (don’t care, no one should be using this field) • Packet_size ← IP Pkt Length (should be length from input ring) • Offset ← 0 (don’t care, no one should be using this field) • Freelist ← 0 • Ref_cnt ← 1 • Stats_Index ← from lookup result • MAC DAddr ← from lookup result (or calculated for Mcast) • EtherType ← 0x0800 (IP) • If Copy is making copies then we must have done a classification, so it must have been an IP packet • Packet_Next ← 0 • The QM will now be using the IP Pkt length for its qlength increments and decrements.

  31. SRAM Usage • What will be using SRAM? • Buffer descriptors • Current MR supports 229,376 buffers • 32 Bytes per SRAM buffer descriptor • 7 MBytes • Queue Descriptors • Current MR supports 65536 queues • 16 Bytes per Queue Descriptor • 1 MByte • Queue Parameters • 16 Bytes per Queue Params (actually only 12 used in SRAM) • 1 MByte • QM Scheduling structure: • Current MR supports 13109 batch buffers per QM ME • 44 Bytes per batch buffer • 576796 Bytes • QM Port Rates • 4 Bytes per port • Plugin “scratch” memory • How much per plugin? • Large inter-block rings • Rx → Mux • PLC → Plugins • Plugins → Mux • Stats/Counters • Currently 64K sets, 16 bytes per set: 1 MByte • Lookup Results

  32. SRAM Bank Allocation • SRAM Banks: • Bank0: • 4 MB total, 2MB per NPU • Same interface/bus as TCAM • Bank1-3 • 8 MB each • Criteria for how SRAM banks should be allocated? • Size: • SRAM Bandwidth: • How many SRAM accesses per packet are needed for the various SRAM uses? • QM needs buffer desc and queue desc in same bank

  33. Proposed SRAM Bank Allocation • SRAM Bank 0: • TCAM • Lookup Results • SRAM Bank 1 (2.5MB/8MB): • QM Queue Params (1MB) • QM Scheduling Struct (0.5 MB) • QM Port Rates (20B) • Large Inter-Block Rings (1MB) • SRAM Rings are of sizes (in Words): 0.5K, 1K, 2K, 4K, 8K, 16K, 32K, 64K • Rx → Mux (2 Words per pkt): 64KW (32K pkts): 128KB • PLC → Plugins (3 Words per pkt): 64KW each (21K Pkts each): 640KB • Plugins → Mux (3 Words per pkt): 64KW (21K Pkts): 256KB • SRAM Bank 2 (8MB/8MB): • Buffer Descriptors (7MB) • Queue Descriptors (1MB) • SRAM Bank 3 (6MB/8MB): • Stats Counters (1MB) • Global Registers (256 * 4B) • Plugin “scratch” memory (5MB, 1MB per plugin)

  34. Queues and QIDs • Assigned Queues vs. Datagram Queues • A flow or set of flows can be assigned to a specific Queue by assigning a specific QID to its/their filter(s) and/or route(s) • A flow can be assigned to use a Datagram queue by assigning QID=0 to its filter(s) and/or route(s) • There are 64 datagram queues • If it sees a lookup result with QID=0, the PLC block will calculate the datagram QID for the result based on the following hash function: • DG QID = SA[9:8] SA[6:5] DA[6:5] • Concatenate IP src addr bits 9 and 8, IP src addr bits 6 and 5, IP dst addr bits 6 and 5 • Who/What assigns QIDs to flows? • The ONL User can assign QIDs to flows or sets of flows using the RLI • The XScale daemon can assign QIDs to flows on behalf of the User/RLI if so requested: • User indicates that they want an assigned QID but they want the system to pick it for them and report it back to them. • The ONL User indicates that they want to use a datagram queue and the data path (Copy block) calculates the QID using a defined hash function • Using the same QID for all copies of a multicast does not work • The QM does not partition QIDs across ports • We cannot assume that the User will partition the QIDs so we will have to enforce a partitioning.
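The datagram hash above writes directly as bit selects; a minimal sketch (function name illustrative):

```c
#include <stdint.h>

/* DG QID = SA[9:8] ++ SA[6:5] ++ DA[6:5]: a 6-bit value selecting one of
 * the 64 datagram queues from the source and destination IP addresses. */
uint16_t dg_qid(uint32_t saddr, uint32_t daddr)
{
    return (uint16_t)((((saddr >> 8) & 0x3) << 4) |  /* SA bits 9:8 */
                      (((saddr >> 5) & 0x3) << 2) |  /* SA bits 6:5 */
                      ((daddr >> 5) & 0x3));         /* DA bits 6:5 */
}
```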

  35. Queues and QIDs (continued) • Proposed partitioning of QIDs: • QID[15:13]: Port Number 0-4 (numbered 1-5) • Copy block will add these bits • QID[12: 0] : per port queues • 8128 Reserved queues per port • 64 datagram queues per port • yyy1 0000 00xx xxxx: Datagram queues for port <yyy> • QIDs 64-8191: per port Reserved Queues • QIDs 0-63: per port Datagram Queues • With this partitioning, only 13 bits of the QID should be made available to the ONL User.
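The proposed partitioning can be sketched as bit packing. Note the slide gives two readings for datagram queues (the "yyy1 0000 00xx xxxx" pattern with bit 12 set, and "QIDs 0-63 per port"); this sketch follows the simpler "QIDs 0-63" reading, and the names are illustrative:

```c
#include <stdint.h>

/* QID[15:13] = port number + 1 (ports 0-4 stored as 1-5, added by the
 * Copy block); QID[12:0] = per-port queue number. */
uint16_t make_qid(unsigned port, uint16_t per_port_qid)
{
    return (uint16_t)((((port + 1) & 0x7) << 13) | (per_port_qid & 0x1FFF));
}

/* Per-port queues 0-63 are the datagram queues (assumed reading). */
int is_datagram_qid(uint16_t qid)
{
    return (qid & 0x1FFF) < 64;
}
```

Only the low 13 bits would be exposed to the ONL User, matching the slide's conclusion.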

  36. Lookups • How will lookups be structured? • Three Databases: • Route Lookup: Containing Unicast and Multicast Entries • Unicast: • Port: Can be wildcarded • Longest Prefix Match on DAddr • Routes should be sorted in the DB with longest prefixes first. • Multicast • Port: Can be wildcarded? • Exact Match on DAddr • Longest Prefix Match on SAddr • Routes should be sorted in the DB with longest prefixes first. • Primary Filter • Filters should be sorted in the DB with higher priority filters first • Auxiliary Filter • Filters should be sorted in the DB with higher priority filters first • Priority between Primary Filter and Route Lookup • A priority will be stored with each Primary Filter • A priority will be assigned to RLs (all routes have same priority) • PF priority and RL priority compared after result is retrieved. • One of them will be selected based on this priority comparison. • Auxiliary Filters: • If matched, cause a copy of packet to be sent out according to the Aux Filter’s result.

  37. Route Lookup • Route Lookup Key (72b) • Port (3b): Can be a wildcard (for Unicast, probably not for Multicast) • Value of 111b in Port field can be used to denote a packet that originated from the XScale • Value of 110b in Port field can be used to denote a packet that originated from a Plugin • Ports numbered 0-4 • PluginTag (5b): Can be a wildcard (for Unicast, probably not for Multicast) • Plugins numbered 0-4 • DAddr (32b) • Prefixed for Unicast • Exact Match for Multicast • SAddr (32b) • Unicast entries always have this and its mask set to 0 • Prefixed for Multicast • Route Lookup: Result (96b) • Unicast/Multicast Fields (determined by IP_MCast_Valid bit (1:MCast, 0:Unicast) (13b) • IP_MCast Valid (1b) • MulticastFields (12b) • Plugin/Port Selection Bit (1b): • 0: Send pkt to both Port and Plugin. Does it get the MCast CopyVector? • 1: Send pkt to all Plugin bits set, include MCast CopyVector in data going to plugins • MCast CopyVector (11b) • One bit for each of the 5 ports and 5 plugins and one bit for the XScale; to drop a MCast, set MCast CopyVector to all 0’s • UnicastFields (8b) • Drop Bit (1b) • 0: handle normally • 1: Drop Unicast pkt • Plugin/Port Selection Bit (1b): • 0: Send packet to port indicated by Unicast Output Port field • 1: Send packet to plugin indicated by Unicast Output Plugin field. Unicast Output Port, QID, Stats Index, and NH fields also get sent to plugin • Unicast Output Port (3b): Port or XScale • 0: Port0, 1: Port1, 2: Port2, 3: Port3, 4: Port4 • Unicast Output Plugin (3b): • 0: Plugin0, 1: Plugin1, 2: Plugin2, 3: Plugin3, 4: Plugin4 • 5: XScale (treated like a plugin) • QID (16b) • Stats Index (16b) • NH_IP/NH_MAC (48b): At most one of NH_IP or NH_MAC should be valid • Valid Bits (3b): At most one of the following three bits should be set • IP_MCast Valid (1b) (Also included above) • NH_IP_Valid (1b) • NH_MAC_Valid (1b)
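For illustration, the 72-bit key above can be packed into 32-bit words for presentation to the TCAM; the field-to-word layout here is an assumption (the slides fix only the field widths), and the names are illustrative:

```c
#include <stdint.h>

/* Pack Port (3b), PluginTag (5b), DAddr (32b), SAddr (32b) into 3 words. */
void pack_route_key(unsigned port, unsigned plugin_tag,
                    uint32_t daddr, uint32_t saddr, uint32_t key[3])
{
    key[0] = (uint32_t)(((port & 0x7) << 5) | (plugin_tag & 0x1F));
    key[1] = daddr;  /* prefix match for unicast, exact for multicast  */
    key[2] = saddr;  /* zeroed for unicast, prefix match for multicast */
}
```

A packet injected by the XScale would carry port 111b in the 3-bit field, per the slide.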

  38. Lookup Key and Results Formats • 140 Bit Key (RL uses Port, Plugin Tag, DAddr and SAddr; PF and AF use all fields): • P (3b), P Tag (5b), IP DAddr (32b), IP SAddr (32b), Proto (8b), DPort (16b), SPort (16b), TCP Flags (12b), Exceptions (16b) • 32 Bit Result in TCAM Assoc. Data SRAM (per DB): • D (1b), H (1b), MH (1b), Prio (8b, PF only; Res for RL and AF), Address (21b) • TCAM Ctrl Bits: D: Done, H: HIT, MH: Multi-Hit • 96 Bit Result in QDR SRAM Bank0: • RL: V (4b), UCast/MCast (12b), QID (16b), Stats Index (16b), NH_MAC (48b) or Res (16b) + NH_IP (32b) • PF: V (4b), UCast/MCast (12b), QID (16b), Stats Index (16b), NH_MAC (48b) or Res (16b) + NH_IP (32b) • AF: V (4b), SB (2b), Res (2b), UniCast (8b), QID (16b), Stats Index (16b), NH_MAC (48b) or Res (16b) + NH_IP (32b) • V (4b): Entry Valid (1b), NH IP Valid (1b), NH MAC Valid (1b), IP MC Valid (1b) • UCast/MCast (12b), interpreted per IP MC Valid: • If IP MC Valid = 1: PPS (1b), Multicast Copy Vector (11b) • If IP MC Valid = 0: Reserved (4b), D (1b), PPS (1b), UCast Out Port (3b), UCast Out Plugin (3b)

  39. Route Lookup • Format of the UCast/MCast fields (16b) in Ring data going to XScale and Plugins: • Multicast (IP MCV = 1): Reserved (3b) [15:13], IP MCV (1b) [12], PPS (1b) [11], Multicast Copy Vector (11b) [10:0] • Unicast (IP MCV = 0): Reserved (3b) [15:13], IP MCV (1b) [12], Reserved (4b) [11:8], D (1b) [7], PPS (1b) [6], UCast Out Port (3b) [5:3], UCast Out Plugin (3b) [2:0]

  40. Primary Filter • Primary Filter Lookup Key (140b) • Port (3b): Can be a wildcard (for Unicast, probably not for Multicast) • Value of 111b in Port field to denote coming from the XScale • Ports numbered 0-4 • PluginTag (5b): Can be a wildcard (for Unicast, probably not for Multicast) • Plugins numbered 0-4 • DAddr (32b) • SAddr (32b) • Protocol (8b) • DPort (16b) • SPort (16b) • TCP Flags (12b) • Exception Bits (16b): Allow for directing of packets based on defined exceptions • Primary Filter Result (104b) • Unicast/Multicast Fields (determined by IP_MCast_Valid bit (1:MCast, 0:Unicast) (13b) • IP_MCast Valid (1b) • MulticastFields (12b) • Plugin/Port Selection Bit (1b): • 0: Send pkt to ports and plugins indicated by MCast Copy Vector. • 1: Send pkt to plugin(s) indicated by MCast Copy Vector but not ports, and send Plugin(s) the MulticastFields bits • MCast CopyVector (11b) • One bit for each of the 5 ports and 5 plugins and one bit for the XScale; to drop a MCast, set MCast CopyVector to all 0’s • UnicastFields (8b) • Drop Bit (1b) • 0: handle normally • 1: Drop pkt • Plugin/Port Selection Bit (1b): • 0: Send packet to port indicated by Unicast Output Port field • 1: Send packet to plugin indicated by Unicast Output Plugin field. Unicast Output Port, QID, Stats Index, and NH fields also get sent to plugin • Unicast Output Port (3b): Port or XScale • 0: Port0, 1: Port1, 2: Port2, 3: Port3, 4: Port4 • Unicast Output Plugin (3b): • 0: Plugin0, 1: Plugin1, 2: Plugin2, 3: Plugin3, 4: Plugin4 • 5: XScale (treated like a plugin) • QID (16b) • Stats Index (16b) • NH IP(32b)/MAC(48b) (48b): At most one of NH_IP or NH_MAC should be valid • Valid Bits (3b): At most one of the following three bits should be set • IP_MCast Valid (1b) (also included above) • NH IP Valid (1b) • NH MAC Valid (1b) • Priority (8b)

  41. Auxiliary Filter • Auxiliary Filter Lookup Key (140b) • Port (3b): Can be a wildcard (for Unicast, probably not for Multicast) • Value of 111b in Port field to denote coming from the XScale • Ports numbered 0-4 • PluginTag (5b): Can be a wildcard (for Unicast, probably not for Multicast) • Plugins numbered 0-4 • DAddr (32b) • SAddr (32b) • Protocol (8b) • DPort (16b) • SPort (16b) • TCP Flags (12b) • Exception Bits (16b) • Allow for directing of packets based on defined exceptions • Can be wildcarded. • Auxiliary Filter Lookup Result (92b) • Unicast Fields (7b): (No Multicast fields) • Plugin/Port Selection Bit (1b): • 0: Send packet to port indicated by Unicast Output Port field • 1: Send packet to plugin indicated by Unicast Output Plugin field. Unicast Output Port, QID, Stats Index, and NH fields also get sent to plugin • Unicast Output Port (3b): Port or XScale • 0: Port0, 1: Port1, 2: Port2, 3: Port3, 4: Port4 • Unicast Output Plugin (3b): • 0: Plugin0, 1: Plugin1, 2: Plugin2, 3: Plugin3, 4: Plugin4 • 5: XScale • QID (16b) • Stats Index (16b) • NH IP(32b)/MAC(48b) (48b): At most one of NH_IP or NH_MAC should be valid • Valid Bits (3b): At most one of the following three bits should be set • NH IP Valid (1b) • NH MAC Valid (1b) • IP_MCast Valid (1b): Should always be 0 for AF Result • Sampling bits (2b): For Aux Filters only • 00: “Sample All” • 01: Use Random Number generator 1 • 10: Use Random Number generator 2 • 11: Use Random Number generator 3

  42. TCAM Operations for Lookups • Five TCAM Operations of interest: • Lookup (Direct) • 1 DB, 1 Result • Multi-Hit Lookup (MHL) (Direct) • 1 DB, <= 8 Results • Simultaneous Multi-Database Lookup (SMDL) (Direct) • 2 DB, 1 Result Each • DBs must be consecutive! • Care must be given when assigning segments to DBs that use this operation. There must be a clean separation of even and odd DBs and segments. • Multi-Database Lookup (MDL) (Indirect) • <= 8 DB, 1 Result Each • Simultaneous Multi-Database Lookup (SMDL) (Indirect) • 2 DB, 1 Result Each • Functionally same as Direct version but key presentation and DB selection are different. • DBs need not be consecutive. • Care must be given when assigning segments to DBs that use this operation. There must be a clean separation of even and odd DBs and segments.

  43. Lookups: Proposed Design • Use SRAM Bank 0 (2 MB per NPU) for all Results • B0 Byte Address Range: 0x000000 – 0x3FFFFF • 22 bits • B0 Word Address Range: 0x000000 – 0x3FFFFC • 20 bits • Two trailing 0’s • Use 32-bit Associated Data SRAM result for Address of actual Result: • Done: 1b • Hit: 1b • MHit: 1b • Priority: 8b • Present for Primary Filters, for RL and Aux Filters should be 0 • SRAM B0 Word Address: 21b • 1 spare bit • Use Multi-Database Lookup (MDL) Indirect for searching all 3 DBs • Order of fields in Key is important. • Each thread will need one TCAM context • Route DB: • Lookup Size: 68b (3 32b words transferred across QDR intf) • Core Size: 72b • AD Result Size: 32b • SRAM B0 Result Size: 78b (3 Words) • Primary DB: • Lookup Size: 136b (5 32b words transferred across QDR intf) • Core Size: 144b • AD Result Size: 32b • SRAM B0 Result Size: 82b (3 Words) • Priority not included in SRAM B0 result because it is in AD result
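The 32-bit associated-data word above can be unpacked with shifts and masks; a sketch assuming bit positions (the slide fixes only the widths: Done, Hit, MHit, 8-bit Priority, a Bank 0 word address, and a spare bit, so the exact positions, and the use of a 20-bit address matching the stated word-address range, are assumptions):

```c
#include <stdint.h>

struct ad_result {
    unsigned done, hit, mhit;  /* TCAM control bits                        */
    unsigned priority;         /* meaningful for Primary Filter results    */
    uint32_t word_addr;        /* word address of the result in SRAM Bank 0 */
};

/* Assumed packing, MSB first: Done | Hit | MHit | Priority(8) | spare | addr(20). */
struct ad_result unpack_ad(uint32_t ad)
{
    struct ad_result r;
    r.done      = (ad >> 31) & 1;
    r.hit       = (ad >> 30) & 1;
    r.mhit      = (ad >> 29) & 1;
    r.priority  = (ad >> 21) & 0xFF;
    r.word_addr = ad & 0xFFFFF;
    return r;
}
```

The word address is then shifted left two bits to form the byte address into Bank 0, matching the two trailing zeroes noted above.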

  44. Block Interfaces • The next set of slides show the block interfaces • These slides are still very much a work in progress

  45. ONL NP Router [Block diagram: Rx (2 MEs) → Mux (1 ME) → Parse, Lookup, Copy (3 MEs) → QM (1 ME) → HdrFmt (1 ME) → Tx (1 ME), with Stats (1 ME), FreeList Mgr (1 ME), Plugins 0-4, xScale, and TCAM, shown again as context for the interface slides that follow]

  46. ONL NP Router [Block diagram with the Rx → Mux ring highlighted] • Ring data format (2 words): • Buf Handle (32b) • Eth. Frame Len (16b), Reserved (12b), InPort (4b)

  47. ONL NP Router [Block diagram with the Mux → PLC ring highlighted] • Ring data format (3 words): • Rsv (4b), Out Port (4b), Buffer Handle (24b) • L3 (IP, ARP, …) Pkt Length (16b), QID (16b) • Stats Index (16b), Flags (8b), In Port (3b), Plugin Tag (5b) • Flags: • Src (2b): 00: Rx, 01: XScale, 10: Plugin, 11: Undefined • PT (1b): PassThrough (1) / Classify (0) • Reserved (5b)

  48. ONL NP Router [Block diagram with the ring into the QM highlighted] • Ring data format: • Rsv (4b), Out Port (4b), Buffer Handle (24b) • QID (16b), L3 (IP, ARP, …) Pkt Length (16b) • QM will not do any Stats Operations so it does not need the Stats Index.

  49. ONL NP Router [Block diagram with the QM → HdrFmt NN ring highlighted] • Ring data format (1 word): • V (1b), Rsv (3b), Port (4b), Buffer Handle (24b)

  50. ONL NP Router [Block diagram with the HdrFmt → Tx ring highlighted] • Ring data format (5 words): • V (1b), Rsv (3b), Port (4b), Buffer Handle (24b) • Ethernet DA[47-16] (32b) • Ethernet DA[15-0] (16b), Ethernet SA[47-32] (16b) • Ethernet SA[31-0] (32b) • Ethernet Type (16b), Reserved (16b)
