
February 2005

A Virtual-Machine Approach to Creating Complex NPU Applications in the Blink of an Eye. February 2005. Empowering Network Processors. Very-high-level packet processing language Virtual machine abstracting NPU details Built-in functionality for deep packet processing.

thimba

Presentation Transcript


  1. A Virtual-Machine Approach to Creating Complex NPU Applications in the Blink of an Eye (February 2005)

  2. Empowering Network Processors
     • Very-high-level packet processing language
     • Virtual machine abstracting NPU details
     • Built-in functionality for deep packet processing

     • Premise: network processors will be a core building block of next-generation networking equipment
       • Programmability
       • Versatility
       • Performance

     But a major obstacle is the difficulty of programming network processors, and the code is specific to an architecture and model.

     Value proposition:
     • Faster time to market
     • Lower development and lifetime costs
     • Scalable to new silicon
     • Portable to different architectures
     • Enable a larger community to use NPUs

     [Diagram labels: PPL application; Virtual machine; Virtualized packet processor (IP Fabrics); Application in NPU microcode; Network processor (Intel and others); ASICs; Merchant silicon; General-purpose processors]

  3. The NPU Catch
     What makes NPUs so powerful as solutions to networking-systems design is also what makes their software development a significant challenge
     • Application software needs to manage interconnect, memory overlap, caching, etc.
     • C programs still very low level, highly machine dependent

     • Massive register resources (in IXP2800, 15,452 software-visible registers not counting local mem and CAMs; 25,948 counting these)
     • Parallelism (microengines, hardware threads)
     • Multiple memory types
     • Small program memory space
     • Cipher units, hash, CAMs, rings, signals, etc.
     • No OS underneath
     • Pages of programming architecture quirks, errata

     [Figure: microengine (ME) block diagram. Labels include SRAM, Local_CSR, CAP, MSF, and PCI instructions, "run-of-the-mill instructions", and "Access to the transfer and local CSR registers of any other ME".]

  4. Virtual Machine Approach to NPU Software

         Group1: Policy PATTERNS DATABASE($idslist) FIND(Rr1,0,Fuf)
         Intruderset: Policy ASSOCIATE NUMBER(10000) SEARCHKEYS(IP_SOURCE) TIMEOUT(10000)
         Intruders: Policy RECALL SEARCHKEYS(IP_SOURCE) LINKED(Intruderset)
         Secure1: Policy CRYPTO TRANSFORM(3DES,SHA) TIMEOUT(3000) TUNNEL(10.0.42.32)
         Diversion: Policy PACKET INSERT(PREP,header_size,0)
         Rule EQ(TCP_SYN,1) EQ(TCP_RST,1) DROP                  # Protocol anomaly
         Rule EQ(TCP_SYN,1) EQ(TCP_FIN,1) DROP                  # Another protocol anomaly
         Rule EQ(IP_SOURCE,MYIPADDR) DROP                       # Source spoofed packet
         Rule EQ(IP_SOURCE,public) APPLY(Intruders)
         Rule EQ(ac,0) DROP                                     # Previously detected intruder
         Rule NE(IP_DEST,MYIPADDR) EQ(ICMP_TYPE,ECHO) DROP      # no pings to the inside
         Rule EQ(IP_PROT,ICMP) EQ(IP_MF,1) DROP                 # fragmented ICMP is DoS attack
         Rule SCAN("|0D0A5B52504C5D3030320D0A|") JUMP(found_subseven_trojan)
         Rule EQ(IP_DEST/24,190.10.10.0) SET(R0,192.68.0.0) ADD(R0,IP_DEST/0.0.0.255) SET(IP_DEST,R0)   # Xlate 190.10.10.X to 192.68.0.X
         Rule EQ(IP_DEST/24,boston_gateway) EQ(IP_SOURCE,portland_gateway/24) APPLY(secure1) FORWARD(1)
         Rule EQ(IP_DEST/24,190.10.10.0) APPLY(Group1) FORWARD(2)
         . . .

     Applications shown on the slide:
     • Dynamic peephole firewall
     • SIP proxy/offload
     • Packetcable layer 7 traffic management
     • Content specific filters (e.g., email spam)
     • Lawful content listening
     • Session Border Controller
     • Encrypted content switch
     • Content specific DoS attacks
     • Two-way encryption gateway
     • Layer 7 bandwidth monitoring
     • Layer 7 protocol specific firewall
     • Dynamic intrusion blocking
     • TCP offload
     • Intrusion signature scans
     • IPSec VPN
     • Basic firewall
     • Layer 4 load balance
     • Layer 7 content switch
     • Layer 3,4 DoS attacks

     [Diagram labels: PPL compiler; PPL virtual machine; NPU]
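The address-translation rule on slide 4 (the one commented "Xlate 190.10.10.X to 192.68.0.X") can be read as plain mask arithmetic. The Python sketch below is only an illustration of that arithmetic, not PPL and not the virtual machine's implementation. It assumes that `IP_DEST/24` compares the upper 24 bits and that `IP_DEST/0.0.0.255` keeps only the bits selected by that mask; both readings are inferred from the rule's comment. The function name `translate_dest` is hypothetical.

```python
import ipaddress

OLD_NET = int(ipaddress.IPv4Address("190.10.10.0"))
NEW_NET = int(ipaddress.IPv4Address("192.68.0.0"))
PREFIX_MASK = 0xFFFFFF00   # assumed meaning of "/24": compare the upper 24 bits
HOST_MASK = 0x000000FF     # assumed meaning of "/0.0.0.255": keep the low 8 bits

def translate_dest(ip_dest: str) -> str:
    """Rewrite 190.10.10.X to 192.68.0.X; leave any other address unchanged."""
    dest = int(ipaddress.IPv4Address(ip_dest))
    if (dest & PREFIX_MASK) != OLD_NET:      # EQ(IP_DEST/24,190.10.10.0) is false
        return ip_dest
    r0 = NEW_NET                             # SET(R0,192.68.0.0)
    r0 += dest & HOST_MASK                   # ADD(R0,IP_DEST/0.0.0.255)
    return str(ipaddress.IPv4Address(r0))    # SET(IP_DEST,R0)
```

For example, `translate_dest("190.10.10.37")` yields `"192.68.0.37"`, while an address outside 190.10.10.0/24 is returned as given.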

  5. Two Routes
     Route 1: programming the NPU directly
     • Details the programmer must handle: DRAM transfer registers, context arbitration, 16 microengines, 128 hardware threads, thread signals, errata, instruction sequence restrictions, inter-instruction timing, next neighbor registers, 640-word local memories, dispatch loops, scratch rings, A and B register banks, processor synchronization, ALU instructions, aligned accesses only, byte index register, no OS, register scope, SRAM transfer registers, register lifetime
     • 90% of time spent on underlying tools, devices, details
     • 10% of time spent on application value
     • Very specific to NPU model and family

     Route 2: PPL language on the virtual machine
     • PPL: A very-high-level functional language to express packet processing
     • Virtual machine on NPU fully exploits parallelism while hiding it
     • PPL also includes very powerful primitives, e.g.,
       • Scan packet payload
       • Match payload to regular expression
       • Encryption/authentication
       • Manage connections (e.g., TCP, SIP)
       • Manage "superpackets"
       • High-speed multi-pattern matching
     • 90% of time spent on application value
     • Scalable
     • Portable

  6. PPL – a Fundamentally Different Approach
     NPU tools
     • Tools to help you write and debug microcode. And far removed from the world of packet processing.
     • You still need to understand the NPU's microcode environment, create the microcode, debug it, maintain it.

     PPL virtual-machine environment
     • Application machine. You think about packet processing and express your application in a very-high-level application language.
     • R&D focus is on the value-add in the application, not the many many details of the NPU.

     Therefore huge benefits in
     • Time to market
     • Life cycle software costs
     • Number of NPU experts needed
     • Scalability to new silicon (up and down)

     [Chart labels: Time/$ spent on application value; Time/$ spent on underlying tools/devices]

  7. Comer Bump in the Wire Example
     Task: Write the data-plane code that examines each IP packet to determine if it is TCP and destined for port 80 (HTTP). Count them. And forward all packets.

     Complete PPL program (the only code you write) is

         Define port80counter="Rg20"
         Event(0)
         Rule EQ(IP_PROT,TCP) EQ(L4_DPORT,80) ADD(port80counter,1)
         Rule FORWARD

     • A major undertaking if you sit down to attempt this in an assembly-language or C program.
     • The closest thing we know about (Agere's FPL) was 76 FPL lines in Agere's submission to Comer's web site, and we found two serious bugs in Agere's code that don't exist in the PPL code:
       • If a packet is a fragment, the Agere code can mistake it for something with a TCP header
       • If a packet's layer 3 or 4 headers are malformed or malicious, behavior is unpredictable
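To make the two pitfalls named on slide 7 concrete (fragments mistaken for TCP headers, and malformed headers), here is a minimal Python sketch of the same count-and-forward logic on a raw IPv4 packet. It is not PPL and says nothing about how the virtual machine or Agere's code works internally. The names `is_tcp_port80`, `process`, `make_ipv4`, and the `counters` dictionary are hypothetical, and checks such as checksum validation are deliberately left out.

```python
import struct

IPPROTO_TCP = 6

def is_tcp_port80(packet: bytes) -> bool:
    """True only for a well-formed IPv4 packet that carries a TCP header for port 80."""
    if len(packet) < 20:                       # too short for a minimal IPv4 header
        return False
    version = packet[0] >> 4
    ihl = (packet[0] & 0x0F) * 4               # IPv4 header length in bytes
    if version != 4 or ihl < 20 or len(packet) < ihl + 4:
        return False                           # malformed: do not guess at layer 4
    if packet[9] != IPPROTO_TCP:
        return False
    frag_field = struct.unpack_from("!H", packet, 6)[0]
    if frag_field & 0x1FFF:                    # nonzero fragment offset: no TCP header here
        return False
    dport = struct.unpack_from("!H", packet, ihl + 2)[0]
    return dport == 80

def process(packet: bytes, counters: dict) -> bytes:
    """Count matching packets, then forward every packet unchanged."""
    if is_tcp_port80(packet):
        counters["port80"] += 1
    return packet

def make_ipv4(proto: int, frag_field: int, dport: int) -> bytes:
    """Test helper: 20-byte IPv4 header followed by source and destination ports."""
    header = bytearray(20)
    header[0] = 0x45                           # version 4, header length 5 words
    header[9] = proto
    struct.pack_into("!H", header, 6, frag_field)
    return bytes(header) + struct.pack("!HH", 12345, dport)
```

Even this small sketch needs explicit length, header-length, and fragment-offset checks before it can safely read the destination port, which is the kind of detail the slide says the PPL program does not require the author to write.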

  8. PPL
     • Powerful, easy to use, functional (not procedural) language
     • Main elements: rules, policies, events
       • Rule: expression(s) action(s)
       • Event: rules that are processed together
       • Policies: major algorithms and state machines
     • Defines strong concurrency, yet hides all parallelism in the NPU
       • All rules are evaluated concurrently. The actions of true rules in an event are processed sequentially.
       • Events are processed concurrently (i.e., rules in separate events are processed concurrently).
       • Multiple instances of the same event also process concurrently.

     [Diagram: a PPL program made up of policies and events, each event containing rules; rules apply policies. Event labels: Logical port 82; Logical ports 4-7; Logical ports 0,1; Exceptions; Start up]

  9. Example of a Rule

         Rule EQ(IP_DEST/16,iptable(1)) EQ(TCP_SYNONLY,1) APPLY(tcpconn)

     Means: If the upper 16 bits of the IP destination address match entry 1 in array iptable, and if the packet is a TCP packet with only the SYN flag set, apply the policy labeled tcpconn
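The two tests in the slide 9 rule can be spelled out as bit operations. The following Python sketch is an illustration under stated assumptions, not PPL semantics: it assumes "entry 1" maps to Python index 1, that "only the SYN flag set" refers to the six classic TCP flag bits (FIN, SYN, RST, PSH, ACK, URG), and that the caller has already established the packet is TCP. The function names are hypothetical.

```python
import ipaddress

# Classic TCP flag bits (assumption: these six are the ones "SYN only" is judged against)
FIN, SYN, RST, PSH, ACK, URG = 0x01, 0x02, 0x04, 0x08, 0x10, 0x20

def upper_bits_equal(addr_a: str, addr_b: str, prefix_len: int) -> bool:
    """Compare the upper prefix_len bits of two IPv4 addresses."""
    mask = (0xFFFFFFFF << (32 - prefix_len)) & 0xFFFFFFFF
    a = int(ipaddress.IPv4Address(addr_a))
    b = int(ipaddress.IPv4Address(addr_b))
    return (a & mask) == (b & mask)

def syn_only(tcp_flags: int) -> bool:
    """True when SYN is set and the other five classic flags are clear."""
    return (tcp_flags & 0x3F) == SYN

def rule_matches(ip_dest: str, tcp_flags: int, iptable: list) -> bool:
    """Both expressions must hold before the rule's action would run."""
    return upper_bits_equal(ip_dest, iptable[1], 16) and syn_only(tcp_flags)
```

With `iptable = ["10.0.0.0", "192.168.0.0"]`, a SYN to 192.168.44.7 matches, while a SYN+ACK to the same address or a SYN to 192.169.0.1 does not.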

  10. Easy and Powerful
      • Highly robust – prevents many errors and security holes
      • Layer-2 interfaces are built in
        • Ethernet, PoS, ATM, SPI4, CSIX, PCI
      • Many powerful packet-processing elements built in, e.g.,
        • Payload scanning (absolute and regular expression)
        • Automatic connection lookup/tracking (e.g., TCP, SIP)
        • Content-addressable tables
        • Rate computation
        • Encryption/authentication
        • High-speed, large database, multipattern matching
        • Header insertion/stripping
        • Management of, and operations on, superpackets
      • Interface to non-PPL programs in data-, control-, or mgmt plane

  11. PPL Rules

          Rule expression expression … action action …

      [Tables: Expression examples; Value examples (used in expressions, actions, policies); Action examples]

  12. PPL Policies

  13. Complete Example

          Define myregex = "re ""GET.*?redirect.html[[:space:]]*?HTTP/1.*?Cookie:"""
          Source_track: Policy ASSOCIATE NUMBER(100000) SEARCHKEYS(IP_SOURCE)
          Event(0)
          Rule EQ(IP_PROT,TCP) EQ(L4_DPORT,80) SCAN(myregex) APPLY(Source_track)
          Rule Forward
          Stop

      This is the complete program, i.e., this is the entirety of what you'd have to write for the data plane of the Intel IXP 2xxx

      Application: Examine all packets going to TCP port 80 to see if they are a GET HTTP transaction with a URL ending with 'redirect.html' and containing a session cookie. For each that is found, store its IP source address in a table (unless it previously exists in the table). Then forward the packet.
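For readers who want to see the matching and table logic of the slide 13 example in a general-purpose language, here is a rough Python equivalent of the SCAN and ASSOCIATE steps only. It is a sketch under assumptions, not a description of the PPL runtime: Python's `re` has no POSIX classes, so `[[:space:]]` is written as `\s`; `re.DOTALL` is assumed so the pattern can span the CRLF between the request line and the Cookie header; and silently skipping the insert when the table is full is a guess. The names `track_source`, `PATTERN`, and `MAX_ENTRIES` are hypothetical.

```python
import re

# Same pattern as myregex, with [[:space:]] rewritten as \s for Python.
# The unescaped "." before "html" is kept as written on the slide.
PATTERN = re.compile(rb"GET.*?redirect.html\s*?HTTP/1.*?Cookie:", re.DOTALL)
MAX_ENTRIES = 100_000          # mirrors NUMBER(100000)

def track_source(payload: bytes, ip_source: str, table: set) -> bool:
    """Record ip_source if the payload matches; return whether it matched.

    The caller is assumed to have checked TCP and destination port 80 already.
    """
    if PATTERN.search(payload) is None:
        return False
    if ip_source not in table and len(table) < MAX_ENTRIES:
        table.add(ip_source)   # stored once; repeats leave the table unchanged
    return True
```

Calling `track_source` twice with the same matching request and source address leaves a single entry in `table`, which mirrors the "unless it previously exists in the table" wording.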

  14. PPL DeviceMap Statement
      How one describes their hardware to the virtual machine and controls configuration and mappings.

          DeviceMap NPU(2850,1400)
                    AVAILABLE_PROCESSORS(1,15)
                    PPL_PROCESSORS(ER(10%),AE(70%))
                    PACKET_MEM(DRAM,128000)
                    CONNECTIONS_MEM(DRAM,16000)
                    ARRAY_MAP(SERVLIST,0,ext_$$pdkserv)
                    LINK(0,inout,GE_ON_SPI,0,1518,0,0,0,0,IXF1010,0)
                    LINK(156,out,PCI)
                    PROG(excep_recorder,CONTROL)

      • NPU(2850,1400): NPU is IXP2850 with clock speed of 1400 MHz
      • AVAILABLE_PROCESSORS(1,15): Microengines 1-15 are available to the PPL virtual machine (meaning 0 is being reserved for something else)
      • PPL_PROCESSORS(ER(10%),AE(70%)): Follow the suggestion of allocating 10% of microengine cycles to Ethernet receive, 70% to PPL action processing, and best use of the remaining 20%
      • PACKET_MEM(DRAM,128000): Allow 128 MB for packet memory in DRAM
      • CONNECTIONS_MEM(DRAM,16000): Allow 16 MB for connection tables in DRAM
      • ARRAY_MAP(SERVLIST,0,ext_$$pdkserv): For the array SERVLIST in the PPL program, physically map it to control-plane symbol ext_$$pdkserv
      • LINK(0,inout,GE_ON_SPI,...): Define a network interface as logical port 0; it is GigE SPI-4 port 0 and port 0 in MAC IXF1010
      • LINK(156,out,PCI): Define logical port 156 as an output-only port over PXD
      • PROG(excep_recorder,CONTROL): Define a control-plane interface name that the PPL PROGRAM policy can invoke

  15. Interfacing to Outside Programs
      Components: PPL program; Software on XScale control plane; Software on an IA host processor; Custom or customer NPU microcode; Intel Portability Framework and NPF APIs

      Interface mechanisms shown between the components:
      • FORWARD packet; PROGRAM to invoke XScale program (RPC); Share memory
      • FORWARD packet; PROGRAM to invoke remote program
      • Share memory; Enqueue on PPL VM input ring; Send packet to PPL event; Send packet to anywhere PPL program can; Invoke PPL event (RPC)
      • Enqueue on a ring; Share memory
      • Share memory; Enqueue on PPL VM input ring

  16. PPL Summary
      • Powerful, easy to use, functional (not procedural) language
      • Main elements are rules, policies, events
      • Defines strong concurrency, yet hides all parallelism in the NPU
      • Highly robust – prevents many errors and security holes
      • Many powerful packet-processing elements built in, e.g.,
        • Payload scanning (absolute and regular expression)
        • Automatic connection lookup/tracking (e.g., TCP, SIP)
        • Content-addressable tables
        • Rate computation
        • Encryption/authentication
        • High-speed, large database, multipattern matching
        • Header insertion/stripping
        • Management of, and operations on, superpackets
      • Interface to non-PPL programs in data-, control-, or mgmt plane

  17. Complete Software Solution
      • Be running in, literally, days
      • No need to use Intel SDK, Intel microcode, learn the IXP programming details, etc. unless you want to write low-level microcode

      [Diagram labels]
      • Windows or Linux computer: PPL debug GUI; PPL compiler; PPL transactor
      • Customer PPL
      • PPL applications, e.g., signature analysis, IPv4/v6 translation, layer 7 content switch, encryption gateway, …
      • Customer control plane software
      • Customer mgmt plane software
      • PPL virtual machine
      • Control plane interfaces (i.e., NPF APIs)
      • Receivers/transmitters for Ethernet, CSIX, PCI, POS/PPP, …
      • Extensions for high-speed multi-pattern searching, IPSec, superpackets, PXD, etc.
      • PPL system initialization, PPL debug, logging, stats
      • PXD high-speed packet interface
      • NPU data-plane microengines; XScale; "Pentium"

  18. Translated to Time and Cost
      [Chart: Time to Market, in months, until a functional, measurable, live prototype is available. Compares "Develop NPU hardware and data-plane software from scratch" with "Deploy off-the-shelf NPU hardware and PPL for data-plane software".]
      [Chart: NPU Software Development Cost, in $ million]
      [Chart: NPU Software Life-Cycle Cost*, in $ million, including subscription and royalty]
      * includes maintenance, product enhancement, one port to different NPU model
