Lecture on High Performance Processor Architecture ( CS05162 )


Presentation Transcript


  1. Lecture on High Performance Processor Architecture (CS05162) • TLP Architecture Case Study: Network Processors • An Hong, han@ustc.edu.cn • Fall 2007 • University of Science and Technology of China, Department of Computer Science and Technology

  2. Outline • NP Overview • What is an NP • NP Functions, Objectives, Evolution, Speeds • Network Processor Applications, Workloads, and Benchmarks • Categorization: control and data planes • Characteristics • Requirements • Benchmarks • NP Architecture Modeling and Simulation CS of USTC AN Hong

  3. Outline • NP Architecture Case Study • Overview of current products • Special-Purpose Hardware Comparison • Pipelining Model Architecture • Multiprocessing Model Architecture • NP Architecture Characteristics and Core Technologies • Key characteristics of the NP architecture • Architectural approaches • ISA • Parallelism • Memory • Programming Model

  4. NP Overview

  5. NP Overview • What: a Network Processor (NP) is a programmable device that has been designed and highly optimized to perform networking functions. • NP functions: specialized for network applications • Pattern matching (address lookup, bit-wise matching) • Data manipulation (TTL, CRC, SAR) • Queue and buffer management (QoS, rate, priority, ToS) • Statistics gathering • NP objectives • Replace expensive ASICs when building network devices • Provide platform solutions through programmability • Extend product lifetime through software updates

  6. NP Overview • NP evolution • GPP (General-Purpose Processor) • Programmable, but not optimized for networking applications • ASIC (Application-Specific Integrated Circuit) • High processing capacity, but high design complexity, long development time, and little flexibility • NP (Network Processor) • ASIC's performance + GPP's flexibility • Cheaper than GPP • ~30 companies offering network processors; 350 design wins [Figure: flexibility versus performance for GPP-based, NP-based, and ASIC-based designs]

  7. History of Packet Processing • The Classic Router • Centralized CPU router architecture [Figure labels: Router interface card (PHY, MAC/framer, Fabric I/O); Backplane/bus; Host processor card (Fabric I/O, Memory, RISC CPU as host processor and packet processor)]

  8. History of Packet Processing • Emergence of Fast and Slow Path Processing • Distributed CPU router architecture [Figure labels: Router interface card; Host processor card; Backplane/bus; RISC CPU as host processor (label appears twice); Fabric I/O; MAC/framer; PHY; Memory]

  9. History of Packet Processing • Hybridization of Routers and Switches • Layer-2 switch based on distributed packet processing using ASICs [Figure labels: Switch interface card; Host processor card; Backplane/bus; ASIC as host processor; RISC CPU as host processor; Fabric I/O; MAC/framer; PHY; Memory]

  10. NP Overview • Why can't GPPs keep up? • Moore's law cannot keep pace with the network processing speed requirement! • NP speeds (line rate, time available per packet) • 1994-1996: OC-3 (155 Mbps) • 1997-1999: OC-12 (622 Mbps, 640 ns) • 2000-2001: OC-48 (2.5 Gbps, 160 ns) • 2002-2003: OC-192 (10 Gbps, 40 ns) • 2003-2005: OC-768 (40 Gbps, 10 ns)
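
The per-packet times listed with the line rates are consistent with a packet of roughly 50 bytes (400 bits) on the wire. That packet size is an inference from the numbers, not something the slide states, so the following is only a sketch under that assumption. It shows how the time budget shrinks by a factor of four with each SONET generation.

```python
# Nominal SONET line rates in bits per second.
LINE_RATES_BPS = {
    "OC-3": 155e6,
    "OC-12": 622e6,
    "OC-48": 2.5e9,
    "OC-192": 10e9,
    "OC-768": 40e9,
}

# Assumption (not stated on the slide): about 50 bytes per packet,
# which reproduces the 160 ns / 40 ns / 10 ns figures.
PACKET_BYTES = 50


def packet_time_ns(rate_bps, packet_bytes=PACKET_BYTES):
    """Time one packet occupies the link, in nanoseconds."""
    return packet_bytes * 8 / rate_bps * 1e9


if __name__ == "__main__":
    for name, rate in LINE_RATES_BPS.items():
        print(f"{name:7s} {packet_time_ns(rate):8.1f} ns per packet")
```

Everything the processor does for a packet (lookup, header edit, queuing) has to fit inside this window, or be overlapped across parallel engines.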

  11. NP Overview • Why are ASICs not the answer? • Four factors prevent ASIC-centered designs • IP-based protocols are still evolving • Layer-2 protocols are in a greater degree of flux than ever • Increasing packet processing complexity • Time-to-market pressures • NPs address these needs • Time to market (TTM) • Time in market (TIM) • Expanded functionality • Leverage third-party development of applications

  12. NP Overview • Where do NPs fit in a system? • A networking device can be broken down into four overall functions: • Host processing • PHY (physical) layer processing • Switching • Packet processing, which comprises: • Framing • Parsing/Classification • Modification • Encryption/compression • Queuing

  13. NP Overview • Packet processing architecture [Figure labels: Host processing (slow path and/or control functions); PHY layer; Packet processing (Framing, Classification, Modification, Encryption/compression, Queuing); Switching]

  14. Components of a Generic Router

  15. NP Overview • NP in a router application [Figure labels: Line card (Line interface, conditioning, framing; NP; Memories, CAMs, special functions); Host control processor; Switch; Other line cards]

  16. Packet Processing in an IP router 1. Accept packet arriving on an incoming link. 2. Lookup packet destination address in the forwarding table to identify outgoing port(s). 3. Edit packet header: e.g., decrement TTL, update header checksum. 4. Send packet to the outgoing port(s). 5. Buffer packet in the queue. 6. Transmit packet onto outgoing link.
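
Step 3 of the list above (decrement TTL, update header checksum) can be sketched as follows. This is a minimal illustration on a raw 20-byte IPv4 header, not the lecture's code; `ipv4_checksum`, `forward_edit`, and `make_header` are hypothetical helper names. It recomputes the full one's-complement checksum for clarity, whereas a real data path would typically use an incremental update.

```python
import struct


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the 16-bit header words, complemented."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def forward_edit(header: bytes) -> bytes:
    """Decrement the TTL (byte 8) and refresh the checksum (bytes 10-11)."""
    hdr = bytearray(header)
    if hdr[8] <= 1:
        raise ValueError("TTL expired; the packet would be dropped")
    hdr[8] -= 1
    hdr[10:12] = b"\x00\x00"                # checksum field counts as zero
    hdr[10:12] = struct.pack("!H", ipv4_checksum(bytes(hdr)))
    return bytes(hdr)


def make_header(ttl, src, dst):
    """Build a minimal option-less IPv4 header (illustrative values only)."""
    hdr = bytearray(struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0,
                                ttl, 6, 0, bytes(src), bytes(dst)))
    hdr[10:12] = struct.pack("!H", ipv4_checksum(bytes(hdr)))
    return bytes(hdr)


if __name__ == "__main__":
    before = make_header(64, (10, 0, 0, 1), (65, 1, 2, 3))
    after = forward_edit(before)
    print("TTL", before[8], "->", after[8])
```

A header with a valid checksum sums to zero under `ipv4_checksum`, which gives a quick way to verify the edit.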

  17. Another View of an IP Router [Figure labels: Control plane (Routing Protocols, Routing Table); Datapath, per-packet processing (Forwarding Table, Switching)]

  18. Packet Forwarding Engine • Forwarding table (Dest-network -> Port): 65.0.0.0/8 -> 3; 128.9.0.0/16 -> 1; 149.12.0.0/19 -> 7 [Figure: a packet (header, payload) enters the router; the destination address from the header is looked up in the routing lookup data structure, which returns the outgoing port]

  19. Size of the Forwarding Table [Figure: number of prefixes by year, 1995 to 2000, growing by about 10,000 per year] Source: http://www.telstra.net/ops/bgptable.html

  20. Lookup Rate Required • Performance that applications require of a network processor (average packet size set to a typical value of 64 bytes)

  21. Performance Estimation • 10 Gbps Core Router • Function: transport packets at OC-192 • Each processor runs at 200 MHz = 200 MIPS • Assumption: 1 MIPS of processing goes with 1 Mbit/s of I/O and 1 MByte of memory • Estimation: • Number of processors (#uP) = 10,000 MIPS / 200 MIPS = 50 !!! • Memory: 10 GBytes !!! • Solutions: • Coprocessors: • IP forwarding, classification, CRC and checksum • Multithreading • Memory hierarchy
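
The estimate above is plain arithmetic under the slide's rule of thumb. The sketch below spells it out; the constant names are illustrative, and the only inputs are the numbers given on the slide.

```python
# Inputs taken from the slide.
LINE_RATE_MBPS = 10_000      # OC-192, about 10 Gbit/s of I/O
CPU_MIPS = 200               # one processor: 200 MHz, taken as 200 MIPS

# Rule of thumb from the slide: 1 MIPS per 1 Mbit/s of I/O,
# and 1 MByte of memory per MIPS.
MIPS_PER_MBPS = 1
MBYTES_PER_MIPS = 1

required_mips = LINE_RATE_MBPS * MIPS_PER_MBPS          # 10,000 MIPS
processors = required_mips // CPU_MIPS                  # 50 processors
memory_gbytes = required_mips * MBYTES_PER_MIPS / 1000  # 10 GBytes

if __name__ == "__main__":
    print(f"required compute: {required_mips} MIPS")
    print(f"processors at {CPU_MIPS} MIPS each: {processors}")
    print(f"memory: {memory_gbytes:.0f} GBytes")
```

Both results are impractical for a single line card, which is why the slide turns to coprocessors, multithreading, and a memory hierarchy.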

  22. NP Design Challenges • Shared with GPPs and ASICs • External memory bandwidth • Power dissipation • Pin limitations • Packaging • Verification • NP-specific • Line speed • Real-time, link-rate processing • Application complexity • Applications that operate on individual packet headers (e.g., routing and forwarding) • Applications that operate principally on individual packet payloads (e.g., transcoding) • Applications that operate across multiple packets within a single flow (e.g., certain encryption algorithms) or across multiple flows (e.g., QoS and traffic shaping). A "flow" is considered to be a single source-destination session.

  23. NP Design Challenges • Other NP-specific challenges • Port density • High level of device integration (on-chip interfaces and controllers for external memories, switch fabrics, co-processors, network interfaces, etc.) • Management of critical shared resources in a chip-multiprocessor environment (e.g., shared program state, memory interfaces) • Compiler and software design for high-performance, real-time, parallel, and heterogeneous systems • Real-time system verification

  24. NP Design Techniques • Application-specific Architectures • Extending the RISC instruction set • Use of customized on-chip or off-chip hardware assists • Parallelism • Thread-level parallelism • Instruction-level parallelism • Microarchitectures • Multiple processors • Pipelined processors

  25. NP Applications, Workloads, and Benchmarks

  26. Application • Need to understand applications before understanding "application-specific" devices • Kernels • Control processing: encompasses a large number of different tasks that usually do not need to be performed at wire speed • Pattern matching: header parsing • Packet classification: identification of the packet type and attributes • Lookup: finding a specific entry in a table based on a key • Data manipulation: modifies the packet header • Field computation: checksum, CRC, time-to-live field decrement, data encryption • Queue management: scheduling and storage of incoming and outgoing packet data units

  27. Application Categorization • NP Applications • Carrier-class metro and core • Multi-service edge and access network • Enterprise and Ethernet edge • Storage Networks • Network Security • NP Application systems • Routers • Switches • Firewalls • ...

  28. Application Categorization • Tasks and services • Routing table lookup • Determine the next hop for incoming packets • Packet Classification • Classify packets using header fields against a set of rules • URL-based Switching • Distribute HTTP requests based on URLs • Transcoding • Encryption/Decryption, intrusion detection, firewall, access control checking, denial-of-service

  29. Application Categorization: Processing Tasks [Figure: protocol-stack diagram with layers Policy Applications, Network Management, Signaling, Topology Management, Queuing/Scheduling, Data Transformation, Classification, Data Parsing, Media Access Control, and Physical Layer, grouped into a control plane and a data plane] • Control plane: all tasks required for control and management of the NPU, for example: • Table maintenance (classification tables, routing tables, QoS tables, ...) • Port state • Timing and signaling to all components: PEs, switch fabric, queues, ... • Queuing/Scheduling: traffic management (queuing, scheduling, and policing) • Data Transformation: transformation of packet data between layers (protocols) • Classification: identify packets against criteria (flow, QoS, ...) • Data Parsing: parse packet headers to extract protocol information • Media Access Control: low-level protocol implementation (Ethernet, ATM, ...)

  30. Application Categorization • Control-plane tasks • Less time-critical • Control and management of device operation • Table maintenance, port states, etc. • Data-plane tasks • Operations occurring in real time on the "packet path" • Core device operations • Receive, process, and transmit packets

  31. Data Plane Tasks • Media Access Control • Low-level protocol implementation • Ethernet, SONET framing, ATM cell processing, etc. • Data Parsing • Parsing cell or packet headers for address or protocol information • Classification • Identify packets against criteria (filtering/forwarding decision, QoS, accounting, etc.) • Data Transformation • Transformation of packet data between protocols • Traffic Management • Queuing, scheduling, and policing packet data

  32. Data Plane operations -- examples • Priority-based QoS mechanism • Supports different levels of QoS for each output port • Contains a QoS policy table that prioritizes packets • Ingress operations • Applies the QoS policy to the received packet • Gets the packet priority from its header content • Places the packet in the appropriate output queue • Egress operations • Identifies and schedules the highest-priority packet for transmission • Transmits the identified packet on the output port • Security • Encryption/Decryption, intrusion detection, access control checking, denial-of-service
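
The ingress and egress operations of the priority-based QoS mechanism can be sketched with one queue per priority level on an output port. This is a simplified strict-priority model, assuming the priority is derived from a single header field; `PriorityPort`, the `"tos"` key, and the policy values are hypothetical names chosen for illustration.

```python
from collections import deque


class PriorityPort:
    """One output port with a FIFO queue per priority (0 = highest)."""

    def __init__(self, levels=4):
        self.queues = [deque() for _ in range(levels)]

    def ingress(self, packet, policy):
        # Look up the packet's priority in the QoS policy table using a
        # header field; unknown values fall into the lowest priority.
        prio = policy.get(packet["tos"], len(self.queues) - 1)
        self.queues[prio].append(packet)

    def egress(self):
        # Strict priority: always serve the highest non-empty queue.
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None


if __name__ == "__main__":
    policy = {46: 0, 0: 3}          # hypothetical header value -> priority
    port = PriorityPort()
    port.ingress({"id": "bulk", "tos": 0}, policy)
    port.ingress({"id": "voice", "tos": 46}, policy)
    print(port.egress()["id"], port.egress()["id"])
```

Strict priority can starve low-priority queues; weighted schedulers are a common alternative, but they are outside what the slide describes.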

  33. Data Plane operations -- examples • Monitoring • Capturing usage patterns, time information • Load Balancing • Distribution of traffic among servers according to the server load, content, and client credentials

  34. Protocol Processing Characteristics • Protocol processing requires intensive memory operations; memory speed determines system performance. • Protocol processing requires powerful bit manipulation. • Layer 2-4 protocols require error-detection computation, e.g., CRC and checksum. • Multiple services (multiple protocols) coexist.

  35. Packet Application Characteristics • Packet coverage: – Header only, or Header + Payload • Packet parsing: – Is the data location known/static? • QoS • Classification and Queuing • States are maintained between packets – Stateful analysis

  36. IPv4 Routing table lookup • Routers determine the next hop and forward packets [Figure: packets (P) passing through a router that connects A, B, and C]

  37. Routing Table Lookup is a Search-Intensive Task • The search operation is not an exact match • Direct lookup needs 4G entries (32-bit IP address) • Longest prefix match • Tries • Hash tables • Balanced trees

  38. Trie-based Routing Table Lookup • A trie block keeps pointers to route entries and to other trie blocks • Destination IP address bits are examined group by group (4 bits at a time) [Figure: a trie block with 16 slots (0 to 15), each holding an rt_ptr and a trie_ptr; rt_ptr fields point to route entries (Next hop 1 to Next hop 4), trie_ptr fields point to further trie blocks]

  39. Example • Packet destination IP address = 0x13fe2233 (0001,0011,1111,1110,…) [Figure: the address is walked through trie blocks (slots 0 to 15, each with rt_ptr and trie_ptr) to reach route entries Next hop 1 to Next hop 4]
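
A trie lookup that consumes the destination address 4 bits at a time can be sketched as below. The field names `rt_ptr` and `trie_ptr` follow the slide; everything else is an assumption made for illustration: prefixes whose length is not a multiple of 4 are handled by expanding them over the covered slots, the routes reuse the forwarding table shown earlier in the lecture, and the 19.0.0.0/8 route (so that 0x13fe2233 matches something) is hypothetical.

```python
STRIDE = 4                    # bits examined per trie block
FANOUT = 1 << STRIDE          # 16 slots per block


class TrieBlock:
    def __init__(self):
        self.rt_ptr = [None] * FANOUT    # (prefix_len, next_hop) or None
        self.trie_ptr = [None] * FANOUT  # child TrieBlock or None


def insert(root, prefix, length, next_hop):
    """Add a route; prefix is a 32-bit int, length is in bits."""
    block, depth = root, 0
    while length - depth > STRIDE:       # descend through full nibbles
        idx = (prefix >> (32 - depth - STRIDE)) & (FANOUT - 1)
        if block.trie_ptr[idx] is None:
            block.trie_ptr[idx] = TrieBlock()
        block = block.trie_ptr[idx]
        depth += STRIDE
    rem = length - depth                 # bits left in the last block
    span = 1 << (STRIDE - rem)           # slots covered after expansion
    base = ((prefix >> (32 - depth - STRIDE)) & (FANOUT - 1)) & ~(span - 1)
    for idx in range(base, base + span):
        old = block.rt_ptr[idx]
        if old is None or old[0] <= length:   # keep the longer prefix
            block.rt_ptr[idx] = (length, next_hop)


def lookup(root, addr):
    """Longest-prefix match; returns the next hop or None."""
    block, best, shift = root, None, 32 - STRIDE
    while block is not None and shift >= 0:
        idx = (addr >> shift) & (FANOUT - 1)
        if block.rt_ptr[idx] is not None:
            best = block.rt_ptr[idx][1]  # deeper blocks mean longer matches
        block = block.trie_ptr[idx]
        shift -= STRIDE
    return best


def ip(a, b, c, d):
    return (a << 24) | (b << 16) | (c << 8) | d


root = TrieBlock()
insert(root, ip(65, 0, 0, 0), 8, 3)
insert(root, ip(128, 9, 0, 0), 16, 1)
insert(root, ip(149, 12, 0, 0), 19, 7)
insert(root, ip(19, 0, 0, 0), 8, 2)      # hypothetical route for the example

if __name__ == "__main__":
    print(lookup(root, 0x13FE2233))
```

With a 4-bit stride a 32-bit address needs at most eight memory accesses, compared with up to 32 for a one-bit-per-level trie, at the cost of larger blocks.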

  40. IP Lookups using Multi-way Multi-column Search • Illustration of the idea with 6-bit addresses • Prefixes: 1*, 101*, 10101* • Binary search does not work with variable-length strings: • The search can end up far away from the matching prefix • Multiple addresses that match different prefixes end up in the same region • Encoding prefixes as ranges • 1* -> [100000,111111] • 101* -> [101000,101111] • 10101* -> [101010,101011] • Sorted range endpoints, marked L (low) or H (high): 100000 (L), 101000 (L), 101010 (L), 101011 (H), 101111 (H), 111111 (H) • Example addresses A: 101011, 101110, 111110 • Each address matches the narrowest enclosing range containing A

  41. Multi-way multi-column search • P1: 1* -> [100000,111111] • P2: 101* -> [101000,101111] • P3: 10101* -> [101010,101011] • Any region in the binary search between two consecutive numbers corresponds to a unique prefix • Precomputed prefix per endpoint, for an address greater than (>) or equal to (=) the endpoint:
      Endpoint   >    =
      100000     p1   p1
      101000     p2   p2
      101010     p3   p3
      101011     p2   p3
      101111     p1   p2
      111111     -    p1
  • 3-way tree for 8 keys [Figure: root node holding k3 and k6; leaf nodes holding k1 k2, k4 k5, and k7 k8; each key carries an info pointer] • Multi-column search is used for wide addresses such as IPv6, split into W/M words [Figure: rows of words (A C E; A D C; A M W; B M W; B N X; B N Y) searched column by column in probes 1, 2, 3]
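
The range encoding and the precomputed ">" and "=" columns can be reproduced with a short sketch on the 6-bit example. It uses a plain binary search over the sorted endpoints rather than the multi-way tree; the function names are illustrative and not taken from the lecture.

```python
from bisect import bisect_right

W = 6  # address width in bits, as in the 6-bit illustration


def prefix_range(bits):
    """Pad a prefix with 0s and 1s to get its [low, high] range."""
    return int(bits.ljust(W, "0"), 2), int(bits.ljust(W, "1"), 2)


def build(prefixes):
    """Return sorted endpoints plus the '=' and '>' prefix per endpoint."""
    ranges = {p: prefix_range(p) for p in prefixes}
    points = sorted({e for r in ranges.values() for e in r})

    def longest(candidates):
        return max(candidates, key=len, default=None)

    eq, gt = [], []
    for i, pt in enumerate(points):
        # '=': narrowest range containing the endpoint itself.
        eq.append(longest(p for p, (lo, hi) in ranges.items()
                          if lo <= pt <= hi))
        # '>': narrowest range covering the gap up to the next endpoint.
        if i + 1 < len(points):
            nxt = points[i + 1]
            gt.append(longest(p for p, (lo, hi) in ranges.items()
                              if lo <= pt and hi >= nxt))
        else:
            gt.append(None)
    return points, eq, gt


def lookup(table, addr):
    points, eq, gt = table
    i = bisect_right(points, addr) - 1   # last endpoint <= addr
    if i < 0:
        return None
    return eq[i] if addr == points[i] else gt[i]


table = build(["1", "101", "10101"])

if __name__ == "__main__":
    for a in ("101011", "101110", "111110"):
        print(a, "->", lookup(table, int(a, 2)))
```

Because every prefix boundary is an endpoint, the gap between two consecutive endpoints lies entirely inside or entirely outside each range, which is why one precomputed entry per gap is enough.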

  42. Packet forwarding tasks • Header parsing • This consists of pattern matching of bits in the header field • Packet classification • Identification of the packet type (e.g., IP, MPLS, ATM) and attributes (e.g., quality-of-service requirement, encryption type) • Lookup • Consists of looking up data based on a key. It is mostly used in conjunction with pattern matching to find a specific entry in a table • Computation • This varies widely by application. Examples include checksum, CRC, time-to-live field decrement, and data encryption

  43. Packet forwarding tasks • Data manipulation • Any function that modifies the packet header • Queue management • Scheduling and storage of incoming and outgoing packet data units • Control processing • Encompasses a large number of different tasks that usually do not need to be performed at wire speed. These are usually performed on a standard RISC processor linked to the NPU.

  44. Packet Classification • Routers are required to distinguish packets for • Flow identification • Fair sharing of bandwidth • QoS • Security • Accounting, billing • etc. • Packets are classified by rules • Source IP, destination IP, source port number, destination port number, etc. • Classification Algorithm Metrics • Search speed • Storage cost • Scalability • Updates • etc.

  45. Classification: Hierarchical tries • Extension of the one-dimensional radix trie • Construct the trie recursively: • Construct the F1-trie on the set of prefixes {Rj1} • For each prefix p in the F1-trie, recursively construct a (d-1)-dimensional hierarchical trie on the rules {Rj : Rj1 = p} Pankaj Gupta and Nick McKeown, "Algorithms for Packet Classification", IEEE Network Special Issue, March/April 2001, vol. 15, no. 2, pp. 24-32.

  46. Classification: Bitmap-intersection • The set of rules S that a packet matches is the intersection of d sets Si, where Si is the set of rules that match the packet in the i-th dimension alone. Pankaj Gupta and Nick McKeown, "Algorithms for Packet Classification", IEEE Network Special Issue, March/April 2001, vol. 15, no. 2, pp. 24-32.
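
The intersection of the per-dimension sets Si can be sketched with one bitmap per elementary interval in each dimension, where bit j is set if rule j matches. The rule set below (a source-address range and a destination-port range per rule) is hypothetical, and the class and function names are illustrative; rules are assumed to be ordered by priority, with rule 0 highest.

```python
from bisect import bisect_right


class Dimension:
    """Precomputed rule bitmaps for one header field."""

    def __init__(self, ranges):
        # ranges[j] = (lo, hi), inclusive, for rule j in this dimension.
        # Cut points split the field into elementary intervals.
        self.starts = sorted({lo for lo, _ in ranges} |
                             {hi + 1 for _, hi in ranges})
        self.bitmaps = []
        for start in self.starts:
            bitmap = 0
            for j, (lo, hi) in enumerate(ranges):
                if lo <= start <= hi:      # rule j covers this interval
                    bitmap |= 1 << j
            self.bitmaps.append(bitmap)

    def match(self, value):
        """Bitmap S_i of rules matching value in this dimension alone."""
        i = bisect_right(self.starts, value) - 1
        return self.bitmaps[i] if i >= 0 else 0


def classify(dims, packet, n_rules):
    """AND the per-dimension bitmaps; return the best rule index or None."""
    result = (1 << n_rules) - 1
    for dim, value in zip(dims, packet):
        result &= dim.match(value)
    if result == 0:
        return None
    return (result & -result).bit_length() - 1   # lowest set bit


# Hypothetical rules: (source range, destination-port range).
RULES = [
    ((10, 20), (80, 80)),        # rule 0
    ((0, 255), (0, 1023)),       # rule 1
    ((0, 255), (0, 65535)),      # rule 2
]
DIMS = [Dimension([r[d] for r in RULES]) for d in range(2)]

if __name__ == "__main__":
    print(classify(DIMS, (15, 80), len(RULES)))
```

The per-dimension lookups are independent, so hardware can perform them in parallel and combine the results with a wide AND; the cost is bitmap storage that grows with the number of rules.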

  47. URL-based switching • Increases efficiency • Tasks • Traverse the packet data (request) for each arriving packet and classify it: • Contains '.jpg' -> to image server • Contains 'cgi-bin/' -> to application server [Figure: a request for www.yahoo.com arrives from the Internet at a switch in front of an image server, an application server, and an HTML server; the packet (IP, TCP, application data) carries "GET /cgi-bin/form HTTP/1.1 Host: www.yahoo.com…"] Source: Network Processor Tutorial in Micro 34 - Mangione-Smith & Memik
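
The classification rule on this slide amounts to inspecting the URL in the HTTP request line. A minimal sketch follows; the server labels and the `pick_server` name are hypothetical, and a real switch would also have to reassemble requests that span several TCP segments, which this ignores.

```python
def pick_server(http_request: bytes) -> str:
    """Choose a back-end server from the URL in the request line."""
    request_line = http_request.split(b"\r\n", 1)[0]
    parts = request_line.decode("ascii", errors="replace").split()
    url = parts[1] if len(parts) >= 2 else ""
    if "cgi-bin/" in url:
        return "application-server"
    if ".jpg" in url:
        return "image-server"
    return "html-server"            # default for everything else


if __name__ == "__main__":
    request = b"GET /cgi-bin/form HTTP/1.1\r\nHost: www.yahoo.com\r\n\r\n"
    print(pick_server(request))
```

Unlike routing lookup, this task touches the payload rather than only the headers, so its cost grows with the request length.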

  48. Transcoders • Two important requirements • If the receiver is not capable of interpreting the stored data (multimedia transcoders) • Wireless receivers, hand-held devices, etc. • Compression for bandwidth and storage efficiency [Figure labels: Corporate Network; MPEG encoder; Video-on-demand server; Transcoder; Internet; Media Player] Source: Network Processor Tutorial in Micro 34 - Mangione-Smith & Memik

  49. NP Workloads and Benchmarks • Available: • NPBench • 10 applications • CommBench[4] • 8 applications • http://ccrc.wustl.edu/~wolf/cb/ • NetBench[3] • 10 applications • http://cares.icsl.ucla.edu/NetBench • MiBench[2] • EEMBC • http://www.eembc.org/benchmark • MediaBench • Transcoders • Some communications applications

  50. The Three Main Benchmarks
