A High Performance PlanetLab Node

This project aims to create a high-performance node that remains compatible with the current version of PlanetLab. Performance is improved substantially by adding network-processor (NP) fast paths and multiple general-purpose blades, developed in phases whose long-term goals include the appearance of a single PlanetLab node and dynamic code configuration on the NPs.

Presentation Transcript


  1. A High Performance PlanetLab Node
     Jon Turner, jon.turner@wustl.edu
     http://www.arl.wustl.edu/arl

  2. Objectives
     • Create a system that is essentially compatible with the current version of PlanetLab.
       • secure buy-in of PlanetLab users and staff
       • provide a base on which resource allocation features can be added
     • Substantially improve performance.
       • an NP blade with a simple fast path can forward at 10 Gb/s for minimum-size packets
       • a standard PlanetLab node today forwards 100 Mb/s with large packets
       • multiple GPEs per node allow more resources per user
     • Phased development process.
       • long-term goals include the appearance of a single PlanetLab node and dynamic code configuration on NPs
       • phased development provides useful intermediate steps that defer certain objectives
     • Limitations.
       • does not fix PlanetLab’s usage model
       • each “slice” is limited to a Vserver plus a slice of an NP

  3. Development Phases
     • Phase 0
       • node with a single GP blade hosting standard PlanetLab software
       • Line Card and one or more NP blades to host NP-slices
       • NP-slice configuration server running in a privileged Vserver
         • invoked explicitly by slices running in Vservers
       • NP blades with static slice code options (2), but dynamic slice allocation
     • Phase 1
       • node with multiple GP blades, each hosting standard PlanetLab software and with its own externally visible IP address
       • separate control processor hosting the NP-slice configuration server
       • expanded set of slice code options
     • Phase 2
       • multiple GPEs in a unified node with a single external IP address
       • CP retrieves slice descriptions from PLC and creates a local copy for the GPEs
       • CP manages use of external port numbers
       • transparent login process
       • dynamic NP code installation

  4. Phase 0 Overview
     (diagram: GPE, NPE blades, Switch, Line Card)
     • System appears like a standard PlanetLab node.
       • single external IP address
       • alternatively, single address for the whole system
     • Standard PlanetLab mechanisms control the GPE.
       • Node Manager periodically retrieves slice descriptions from PlanetLab Central
       • configures Vservers according to slice descriptions
       • supports user logins to Vservers
     • Resource Manager (RM) runs in a privileged Vserver on the GPE and manages NP resources for user slices.
       • NP slices explicitly requested by user slices
       • RM assigns slices to NPEs (to balance usage)
       • reserves port numbers for users
       • configures Line Cards and NPs appropriately
     • Line Cards demux arriving packets using port numbers (see the filter sketch below).
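
As a rough illustration of the port-based demultiplexing described above, a Line Card filter entry might carry little more than the reserved dport and the blade/slice it maps to. The structure and field names below are assumptions for illustration, not the actual filter format.

    /* Hypothetical sketch of a Line Card demux filter entry installed by the
     * Resource Manager: arriving packets are matched on UDP destination port
     * and steered to the right NPE (or GPE) and NP-slice. */
    #include <stdint.h>

    struct lc_filter_entry {
        uint16_t dport;       /* external UDP port reserved for the slice      */
        uint8_t  dest_blade;  /* NPE (or GPE) blade the packet is switched to  */
        uint8_t  dest_slice;  /* NP-slice on that blade                        */
    };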

  5. Using NP Slices
     (diagram: GPE with Vservers, NPEs, Switch, LC; exception packets use internal port numbers;
      the LC uses dport to demux and determine the MI, and maps MI to sport; outer IP headers
      carry daddr=thisNode on arrival and daddr=nextNode on departure, encapsulating the bare slice packet)
     • External NPE packets use UDP/IP.
       • an NPE slice has ≥1 external port
       • the LC uses the dport number to direct a packet to the proper NPE
       • the NPE uses the dport number to direct a packet to the proper slice
     • Parse block of an NPE slice gets (see the sketch below):
       • bare slice packet
       • input meta-interface, source IP addr and sport
     • Format block of an NPE slice provides:
       • bare slice packet
       • output meta-interface, dest IP addr and dport for the next hop
     • NPE provides multiple queues per slice.
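
To make the parse/format interface concrete, the sketch below (plain C) gathers the per-packet information listed on this slide into a pair of structs. The struct and field names are assumptions for illustration, not the actual NPE interface.

    /* Minimal sketch of the metadata exchanged with a slice's parse and
     * format blocks, as described on this slide. */
    #include <stdint.h>

    struct parse_input {            /* handed to the slice's parse block    */
        uint8_t  *slice_pkt;        /* bare slice packet (UDP payload)      */
        uint16_t  pkt_len;
        uint8_t   in_mi;            /* input meta-interface                 */
        uint32_t  src_ipaddr;       /* outer IP source address              */
        uint16_t  sport;            /* outer UDP source port                */
    };

    struct format_output {          /* produced by the slice's format block */
        uint8_t  *slice_pkt;        /* bare slice packet for the next hop   */
        uint16_t  pkt_len;
        uint8_t   out_mi;           /* output meta-interface                */
        uint32_t  dst_ipaddr;       /* next-hop IP destination address      */
        uint16_t  dport;            /* next-hop UDP destination port        */
        uint8_t   queue;            /* one of the slice's multiple queues   */
    };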

  6. Managing NP Usage
     (diagram: GPE with RM and Vservers, NPEs, Switch, LC)
     • Resource Manager assigns Vservers to NP-slices on request.
       • user specifies which of several processing code options to use
       • RM assigns the slice to an NP with the requested code option
       • when choices are available, balance the load
       • configure filters in the LC based on port numbers
     • Managing external port numbers.
       • user may request a specific port number from the RM when requesting an NP-slice
       • RM opens a UDP connection and attempts to bind the port number to it (see the sketch below)
       • allocated port number is returned to the VS
     • Managing port numbers for the exception channel.
       • user Vserver opens a UDP port and binds a port number to it
       • port number supplied to the RM as part of the NP-slice configuration request
     • Managing per-slice filters in the NP.
       • requests made through the RM, which forwards them to the NP’s xScale
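
The port-reservation step can be pictured as the small POSIX-sockets routine below: the RM holds a bound UDP socket to keep the port reserved, falling back to a kernel-chosen port if the requested one is taken. This is a sketch of the idea, not the RM's actual code; the function name and the fallback policy are assumptions.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Returns the port actually bound (the requested one if available),
     * or -1 on failure; *fd_out holds the socket that keeps the
     * reservation alive until the NP-slice is released. */
    int reserve_udp_port(uint16_t requested, int *fd_out)
    {
        struct sockaddr_in addr;
        socklen_t len = sizeof(addr);
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0) return -1;

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(requested);

        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            /* requested port unavailable: let the kernel pick one */
            addr.sin_port = 0;
            if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                close(fd);
                return -1;
            }
        }
        if (getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
            close(fd);
            return -1;
        }
        *fd_out = fd;
        return ntohs(addr.sin_port);   /* returned to the requesting Vserver */
    }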

  7. Execution Environment for Parse/Format
     (diagram: parse MEs ME1, ME2, ME3 in a pipeline)
     • Statically configure code for parse and format.
       • only trusted developers may provide new code options
       • must ensure that slices cannot interfere with each other
       • shut down the NP & reload the ME program store to configure a new code option
     • User specifies the option at NP-allocation time.
     • Demux determines the code option and passes it along.
     • Each slice may have its own static data area in SRAM.
     • For the IPv4 code option, user-installed filters determine the outgoing MI, daddr, dport of the next hop, or whether the packet should go to the exception channel.
     • To maximize available code space per slice, pipeline the MEs (see the sketch below).
       • each ME has code for a small set of code options
       • MEs just propagate packets for which they don’t have code
       • ok to allow these to be forwarded out of order
       • each code option should be able to handle all traffic (5 Gb/s) in one ME
       • might load-balance over multiple MEs by replicating code segments
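
The pipelining rule ("handle your own code options, propagate the rest") could look roughly like the loop below. This is plain C showing only the control flow; the descriptor layout, the ring_get/ring_put helpers, and the option handlers are hypothetical stand-ins for the real microengine environment.

    struct pkt_desc { int code_option; void *pkt; };

    /* hypothetical stand-ins for scratch-ring I/O and the loaded handlers */
    extern struct pkt_desc ring_get(void);
    extern void ring_put(struct pkt_desc d);
    extern void parse_ipv4(struct pkt_desc *d);
    extern void parse_ipv4_ecn(struct pkt_desc *d);

    void me_parse_loop(void)
    {
        for (;;) {
            struct pkt_desc d = ring_get();      /* from demux or upstream ME */
            switch (d.code_option) {             /* code option chosen at NP-allocation time */
            case 1: parse_ipv4(&d);     break;   /* options loaded on this ME  */
            case 2: parse_ipv4_ecn(&d); break;
            default: break;                      /* not mine: just pass it along
                                                    (may be forwarded out of order) */
            }
            ring_put(d);                         /* hand off to next ME / downstream */
        }
    }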

  8. Monitoring NPE Slice Traffic
     • Three counter blocks per NPE slice.
       • pre-filter counters – parse block specifies the counter pair to use
         • for the IPv4 case, associate counters with meta-interface and type (UDP, TCP, ICMP, options, other)
       • pre-queue counters – format block specifies the counter pair
         • for the IPv4 case, extract from the filter result
       • post-queue counters – format block specifies the counter pair
         • for the IPv4 case, extract from the filter result
     • xScale interface for monitoring counters (see the sketch below).
       • specify groups of counters to poll and the polling frequency
       • counters in a common group are read at the same time and returned with a single timestamp
       • by placing a pre-queue and post-queue counter pair in the same group, one can determine the number of packets/bytes queued in a specific category
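
The counter-group idea can be summarized with the sketch below: a pre-queue and post-queue counter pair sampled together under one timestamp, with their difference giving the backlog for that category. The types and names are assumptions, not the real xScale monitoring API.

    #include <stdint.h>

    struct counter_pair { uint64_t pkts, bytes; };

    struct group_sample {
        uint64_t            timestamp;    /* one timestamp for the whole group */
        struct counter_pair pre_queue;    /* counted before the queues         */
        struct counter_pair post_queue;   /* counted after the queues          */
    };

    /* packets currently queued in this category = enqueued - dequeued */
    static inline uint64_t queued_pkts(const struct group_sample *s)
    {
        return s->pre_queue.pkts - s->post_queue.pkts;
    }

    static inline uint64_t queued_bytes(const struct group_sample *s)
    {
        return s->pre_queue.bytes - s->post_queue.bytes;
    }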

  9. Queue Management
     • Bandwidth resources are allocated on the basis of external physical interfaces.
       • by default, each slice gets an equal share of each external physical interface
       • the NPE has a scheduler for each external physical interface it sends to
     • Each NP-slice has its own set of queues.
       • each queue is configured for a specific external interface
       • each slice has a fixed quantum for each external interface, which it may divide among its different queues as it wishes
       • the mapping of packets to queues is determined by the slice code option
         • may be based on the filter lookup result
     • Dynamic scheduling of physical interfaces (see the sketch below).
       • different NPEs (and GPEs) may send to the same physical interface
       • bandwidth of the physical interface must be divided among senders to prevent excessive queueing in the LC
       • use a form of distributed scheduling to assign shares
       • share based on the number of backlogged slices waiting on the interface
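
The sharing arithmetic on this slide amounts to the two small routines sketched below: a sender's share of an interface is proportional to its number of backlogged slices, and a slice's per-interface quantum is split across its queues, here by slice-chosen weights. The function names and the weighted-split detail are illustrative assumptions.

    #include <stdint.h>

    /* share of an external interface given to one sender (NPE or GPE),
     * proportional to how many of its slices are backlogged on it */
    static double sender_share(uint32_t my_backlogged_slices,
                               uint32_t total_backlogged_slices)
    {
        if (total_backlogged_slices == 0) return 0.0;
        return (double)my_backlogged_slices / (double)total_backlogged_slices;
    }

    /* split a slice's per-interface quantum among its queues according to
     * weights the slice chooses ("divide among the queues as it wishes") */
    static void split_quantum(uint32_t quantum, const uint32_t *weights,
                              uint32_t *per_queue_quantum, int nqueues)
    {
        uint32_t wsum = 0;
        for (int i = 0; i < nqueues; i++) wsum += weights[i];
        for (int i = 0; i < nqueues; i++)
            per_queue_quantum[i] =
                wsum ? (uint32_t)((uint64_t)quantum * weights[i] / wsum) : 0;
    }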

  10. Phase 0 Demonstration
     (diagram: GPE, NPE, Switch, LC connected to the internet and local hosts)
     • Possible NP slice applications.
       • basic IPv4 forwarder
       • enhanced IPv4 forwarder
         • use TOS and queue lengths to make ECN marking decisions and/or discard decisions
       • metanet with geodesic addressing and stochastic path tagging
     • Run multiple NP-slices on each NP.
     • On the GPE, run a pair of standard PlanetLab apps, plus exception code for the NP-slices.
       • select sample PlanetLab apps for which we can get help
     • What do we show?
       • ability to add/remove NP-slices
       • ability to add/remove filters to change routes
       • performance charts of queueing performance
       • compare an NP-slice to a GP-slice and a standard PlanetLab slice

  11. Phase 1
     (diagram: multiple GPEs, NPEs, CP, Switch, LC)
     • New elements.
       • multiple GPE blades, each with its own external IP address
       • CP to manage NPE usage
       • expanded set of code options
     • NP management divided between a Local Resource Manager (LRM) running on the GPEs and a Global Resource Manager (GRM) on the CP.
       • Vservers interact with the LRM as before
       • LRM contacts the GRM to allocate NP slices
       • port number management handled by the LRM
     • LC uses the destination IP addr and dport to direct packets to the correct NPE or GPE.
     • Code options.
       • multicast-capable IPv4 MR
       • ???

  12. Phase 2 Overview
     (diagram: multiple GPEs, NPEs, CP, Switch, LC)
     • New elements.
       • multiple GPEs in a unified node
       • CP manages interaction with PLC
       • CP coordinates use of external port numbers
       • transparent login service
       • dynamic NP code installation
     • Line Cards demux arriving packets using IP filters and remap port numbers as needed.
       • requires NAT functionality in the LCs to handle outgoing TCP connections, ICMP echo, etc.
       • other cases handled with static port numbers

  13. Slice Configuration
     (diagram: PLC, CP running NM and myPLC, GPEs running NM, NPEs, Switch, LC)
     • Slice descriptions are created using standard PlanetLab mechanisms and stored in the PLC database.
     • The CP’s Node Manager periodically retrieves slice descriptions and makes a local copy of the relevant parts in the myPLC database.
     • The GPEs’ Node Managers periodically retrieve slice descriptions and update their local configuration.

  14. Managing Resource Usage
     (diagram: GPE with LRM and Vservers, NPEs, CP with GRM, Switch, LC)
     • Vservers request NP-slices from the Local Resource Manager (LRM).
       • LRM relays the request to the GRM, which assigns an NPE slice
       • LRM configures the NPE to handle the slice
       • GRM configures filters in the LC
     • Managing external port numbers.
       • LRM reserves a pool of port numbers by opening connections and binding port numbers
       • user may request a specific port number from the LRM pool
       • used for NP ports and externally visible “server ports” on GPEs
     • Network Address Translation (see the sketch below).
       • allows outgoing TCP connections to be handled transparently
       • use an LC filter to redirect TCP control traffic to the xScale
       • address translation created when an outgoing connection request is intercepted
       • similar issue for outgoing ICMP echo packets – insert a filter to handle later packets with the same id (both ways)
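
The NAT state implied by this slide might look like the entry below: a translation created when the outgoing connection request is intercepted, keyed so that later packets in both directions match it. The structure and field names are assumptions for illustration, not the LC's actual table format.

    #include <stdint.h>

    struct nat_entry {
        /* internal side: the GPE/Vserver that opened the connection */
        uint32_t int_ipaddr;
        uint16_t int_port;
        /* external side: the node's single external address and a port
         * drawn from the pool managed by the LRM/GRM */
        uint32_t ext_ipaddr;
        uint16_t ext_port;
        /* remote peer, so later packets in both directions match the entry */
        uint32_t peer_ipaddr;
        uint16_t peer_port;
        uint8_t  proto;          /* TCP here; ICMP echo is keyed by id instead */
    };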

  15. Transparent Login Process
     • Objective – allow users to log in to the system and configure things, in a way similar to PlanetLab.
       • currently, they SSH to a selected node, and the SSH server authenticates and forks a process to run in the appropriate Vserver
       • seamless handoff, as the new process acquires the TCP state
     • Tricky to replicate precisely.
       • if we SSH to the CP and authenticate there, we need to transfer the session to the appropriate GPE and Vserver
       • need a general process migration mechanism to make this seamless
     • Another approach is to authenticate on the CP and use user-level forwarding to give the impression of a direct connection.
     • Or, use an alternate client that users invoke to access our system (see the sketch below).
       • client contacts the CP, informing it of the slice the user wants to log in to
       • CP returns an external port number that is remapped by the LC to the SSH port on the target host
       • client then opens an SSH connection to the target host through the provided external port number
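
A minimal sketch of the alternate-client approach, assuming a plain-text request to the CP on a made-up port (7777) and a reply containing the remapped external port. The real exchange is not specified on this slide, so every protocol detail below is an assumption.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <node-addr> <slice>\n", argv[0]);
            return 1;
        }

        /* 1. contact the CP and tell it which slice we want to log in to */
        struct sockaddr_in cp = { .sin_family = AF_INET, .sin_port = htons(7777) };
        if (inet_pton(AF_INET, argv[1], &cp.sin_addr) != 1) return 1;
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0 || connect(fd, (struct sockaddr *)&cp, sizeof(cp)) < 0) return 1;
        char req[128];
        snprintf(req, sizeof(req), "login %s\n", argv[2]);
        write(fd, req, strlen(req));

        /* 2. CP replies with an external port that the LC remaps to the SSH
         *    port on the GPE hosting the slice's Vserver */
        char port[32] = {0};
        if (read(fd, port, sizeof(port) - 1) <= 0) return 1;
        port[strcspn(port, "\r\n")] = '\0';
        close(fd);

        /* 3. open an ordinary SSH connection through that external port */
        execlp("ssh", "ssh", "-p", port, argv[1], (char *)NULL);
        return 1;   /* only reached if exec fails */
    }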

  16. Specifying Parse and Format Code
     • Use a restricted fragment of C (see the example below).
       • all variables are static; the user declares the storage type
         • register, local, SRAM, DRAM
         • registers and local variables are not retained between packets
       • loops with bounded iterations only
         • sample syntax: for (<iterator>) : <constant-expression> { loop body }
         • <constant-expression> can include the pseudo-constant PACKET_LENGTH, which refers to the number of bytes in the packet being processed
       • only non-recursive functions/procedures
       • no pointers
       • no floating point
     • Compiler verifies that the worst-case code path has a bounded number of instructions and memory accesses.
       • at most C1 + C2*PACKET_LENGTH, where C1 and C2 are constants to be determined
     • Limited code size (maybe 500-1000 instructions per slice).
     • Implement as a front-end that produces standard C and sends it to the Intel compiler for code generation – back-end to verify code path lengths.
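
As a concrete example of this dialect, the fragment below counts packets in SRAM and walks the packet with a bounded loop. The storage-class keywords, the loop annotation, and PACKET_LENGTH follow the slide; the exact iterator form and the pkt_byte() accessor are guesses for illustration only.

    /* Hypothetical parse routine in the restricted dialect described above. */
    sram  unsigned int pkts_seen;     /* per-slice static data area in SRAM  */
    local unsigned int i;             /* local: not retained between packets */
    register unsigned int sum;        /* register: not retained either       */

    void parse()
    {
        pkts_seen = pkts_seen + 1;
        sum = 0;
        /* loop iterations bounded by a constant expression, which may use
         * the pseudo-constant PACKET_LENGTH */
        for (i = 0; i < PACKET_LENGTH; i = i + 1) : PACKET_LENGTH {
            sum = sum + pkt_byte(i);  /* pkt_byte(): hypothetical byte accessor
                                         (no pointers allowed in the dialect) */
        }
        /* non-recursive, no floating point, as the dialect requires */
    }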

  17. Dynamic Configuration of Slices
     (diagram: parse MEs ME1, ME2, ME3, plus a bypass ME “MEB” used when reconfiguring)
     • To add a new slice (see the sketch below):
       • configure the bypass ME with code for the old slices and the new slice
       • swap it in, using scratch rings for input and output
       • reconfigure the original ME with the new code image and swap back
       • requires that MEs retain no state in local memory between packets
       • drain packets from the “old” ME before accepting packets from the new one
     • Similar process required if MEs are used in parallel.
       • configure a spare ME with the new code image and add it to the pool
       • iteratively swap out the others and swap them back in
       • for n MEs, need n swap operations
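
The add-a-slice sequence can be written out as the short procedure below, in plain C for the control flow only. The ME identifiers and every helper function (load_me_image, attach_scratch_rings, drain_me) are hypothetical stand-ins for the real ME management interface.

    enum { PARSE_ME = 1, BYPASS_ME = 9 };

    extern void load_me_image(int me, int new_code_option); /* stop ME, reload its program store */
    extern void attach_scratch_rings(int me);               /* divert input/output rings to this ME */
    extern void drain_me(int me);                           /* wait until the ME holds no packets */

    void add_slice(int new_code_option)
    {
        /* 1. load the bypass ME with the old slices' code plus the new slice */
        load_me_image(BYPASS_ME, new_code_option);

        /* 2. swap it in via the scratch rings, draining the original ME so no
         *    packets are lost (MEs keep no per-packet state in local memory) */
        attach_scratch_rings(BYPASS_ME);
        drain_me(PARSE_ME);

        /* 3. reconfigure the original ME with the new image and swap back */
        load_me_image(PARSE_ME, new_code_option);
        attach_scratch_rings(PARSE_ME);
        drain_me(BYPASS_ME);
    }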
