1. Control Update: Focus on PlanetLab Integration and Booting
   Fred Kuhns, fredk@arl.wustl.edu
   Applied Research Laboratory, Washington University in St. Louis

2. Documents
  • Control documentation: http://www.arl.wustl.edu/projects/techX/ppt/
    • This presentation: http://www.arl.wustl.edu/projects/techX/ppt/ControlUpdate.ppt
    • SRM interface: http://www.arl.wustl.edu/projects/techX/ppt/srm.ppt
    • RMP interface: http://www.arl.wustl.edu/projects/techX/ppt/rmp.ppt
    • SCD interface (ingress, egress and NPE): http://www.arl.wustl.edu/projects/techX/ppt/scd.ppt
  • Datapath documentation: http://www.arl.wustl.edu/projects/techX/design/SPP/
    • NAT overview (interface??): http://www.arl.wustl.edu/projects/techX/design/SPP/SPP_V1_NAT_design.ppt
    • FlowStats (interface??): http://www.arl.wustl.edu/projects/techX/design/SPP/FlowStats_Control.ppt

3. Traditional View of a PlanetLab Node
  • Linux OS, vserver
  • System services
    • pl_netflow
    • sirius: brokerage service
    • stork: environmental service
    • CoMon: monitoring and discovery
  • Resource model
    • focused on PCs with single device instances (CPU, NIC)
    • standard Linux/UNIX tools to measure utilization
    • homogeneous environment with a single VMM managing all VM instances on a platform
    • local node manager interface through the loopback interface
  • User requests a slice on a set of distributed nodes
    • assigned a VM instance on each node
    • Fedora Linux environment
    • per-slice flowstats
  [Figure: a PlanetLab node (site, owner, model, ssh_host_key, groups; host.domain, IP address A.B.C.D) attached to the Internet: the node manager ("root" VM), system service VMs and user VMs VM1..VMN running over a single virtual machine monitor (VMM) on a general-purpose PC (CPU, NIC, disk, DRAM).]

4. An SPP Node
  [Figure: an SPP/PlanetLab node (site, owner, model, ssh_host_key, groups; spp_host.domain, IP address A.B.C.D) attached to the Internet. A Control Processor (CP) plus multiple GPEs (GPE1, GPE2), each a general-purpose PC with its own VMM running the node manager, system services and user VMs (VM1..VMN); NPEs hosting per-slice fast paths (vm1:fast path1 ... vmX:fast path1, vm1:fast path2, vmY:fast path2); and a Line Card with the forwarding DB/filters, NAT, datapath and the external interface. All boards connect through the HUB: 1 GbE control (base) and 10 GbE data (fabric) switches.]

5. Challenges
  • Provide the standard PlanetLab slice environment
    • configure and boot individual GPEs with standard PlanetLab software, supporting the standard operational environment
  • Support standard interfaces
    • boot manager
    • node manager's internal and external interfaces
    • resource monitoring
  • Create an interface for allocating and managing fast paths
    • allocate/free NPE resources
    • manage meta-interface mappings to the externally visible IP address and UDP port
    • slice control of allocated fast-path resources

6. SPP Node External Interfaces
  [Figure: placement of the control software across the chassis. On the CP: the System Resource Manager (SRM) and node manager (GNM), SLM and sshd*, PLCAPI proxy (xmlrpc), httpd, dhcpd/tftp/PXE, FlowStats, ntpd, and the resource, slice, flow and node DBs plus nodeconf.xml, boot files and user info/home dirs under /var/www/. On each GPE: RMP, NMP, pl_netflow, user slivers, vnet and ntp. On the NPE: an SCD per NPU (NPU-A/NPU-B XScales, TCAM, SPI/PCI). On the Line Card: the ingress and egress SCDs and NATD on the XScales. The shelf manager is reached over I2C (IPMI). The Hub provides the base Ethernet switch (1 Gbps, control) and the fabric Ethernet switch (10 Gbps, data path); the RTM provides the 10x1G/1x10G external interfaces IP1..IPN.]
  Boot files:
  • dhcpd.conf, ethers
  • tftpboot: bootcd.img, overlay_gpeX.img, pxelinux.0, pxelinux.cfg/ (C0A82031, C0A82041)
  • overlay.img: plnode.txt, plc_config, ethers, spp_conf.txt, spp_netinit.py, server*, certs

7. Software Components
  • Control Processor (CP):
    • Boot and Configuration Control (BCC): node configuration, management and local state management (DB)
      • httpd, dhcpd, tftp and PXE server for GPE and NPE boards; maintains config files
      • Boot CD and distribution file management (overlay images, RPM and tar files) for GPEs and CP
      • PLCAPI proxy (plc_api) and system-level BootManager (part of gnm)
    • System Resource Manager (SRM): centralized resource management
      • responsible for all resource allocation decisions and for maintaining dynamic system state
      • delegates local operations to individual board-level managers
    • System Node Manager (SNM, aka GNM): "top half" of the PlanetLab node manager
    • Slice login manager (SLM) and ssh forwarding (modified sshd) -- Ritun
    • Flow Statistics (FS): aggregates pl_netflow data and translates NAT records
    • Sets default (static) routes in the line card
      • What about dynamic route management (BGP/OSPF/RIP)? For now, assume a single next-hop router for all routes.
  • General-purpose Processing Element (GPE):
    • Local Boot Manager (LBM): modified PlanetLab BootManager running on the GPEs
    • Resource Manager Proxy (RMP)
    • Node Manager Proxy (NMP): lower half of PlanetLab's node manager
  • Network Processor Element (NPE):
    • Substrate Control Daemon (SCD): manages all NPE resources and provides mappings from slice to global name spaces
    • Kernel module to read/write memory locations (wumod)
    • Command interpreter for configuring NPU memory (wucmd)
  • Line Card, ingress:
    • Substrate Control Daemon (scd_ingress): implements the interface to the SRM, manages TCAM access for ingress and egress, and reads/writes scratch rings for NATD
    • Network Address Translation daemon (NATD), port only
  • Line Card, egress:
    • Substrate Control Daemon (scd_egress): implements the interface to the SRM, reads/writes scratch rings, and communicates with the FS and NATD

8. Boot and Configuration Control
  • Read the node configuration DB: currently this is an XML file
  • Allocate IP subnets and addresses for all boards
  • Assign external IP addresses to GPE fabric interfaces with the default VLAN id
  • Create the per-GPE configuration DB: currently this is written to files
  • Create the dhcp configuration file and start dhcpd, httpd and the system sshd
    • assigns control IP subnets and addresses; assigns the internal substrate IP subnet on the fabric Ethernet
  • Start the PLCAPI proxy (plc_api) server and the system node manager
    • read the node DB for initialization data: currently uses static configuration data and/or re-reads the XML file
  • Create the GPE overlay images: currently this is done manually
  • Currently the SNM is split between the plc_api server and the SRM, because there is no DB yet and we did not want to implement a transaction-like interface for the SNM
    • begin periodic slice updates and GPE assignments; maintain the DB
  • Start the SRM and bring up boards as they "report in"
  • Initialize the Line Card to forward "default" traffic (i.e., ssh and icmp) to the CP
  • Initialize the Hub: base and fabric switches; initialize any switches not within the chassis
  • Start the SLM and the ssh daemon
    • remove the SLM configuration file for slices; it may contain old mappings
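
  As a companion to the list above, a minimal Python sketch of the start-up sequence. It is illustrative only: the helper names, the assumption that nodeconf.xml follows the layout shown on the Node Configuration File slide later in this deck, and the dhcpd.conf stanza format are mine, not the actual gnm/srm code.

    import subprocess
    import xml.etree.ElementTree as ET

    def control_interfaces(conf="nodeconf.xml"):
        """Yield (board name, MAC, IP) for each base-switch control interface
        listed in the node configuration XML (cp/gpe/npe/lc elements under
        <components>, each with <interface> children carrying lanid attributes
        and <hwaddr>/<ipaddr> bodies)."""
        root = ET.parse(conf).getroot()
        for board in root.find("components"):
            for iface in board.findall("interface"):
                if (iface.get("lanid") or "").startswith("base"):
                    mac = (iface.findtext("hwaddr") or "").strip()
                    ip = (iface.findtext("ipaddr") or "").strip()
                    if mac and ip:
                        yield board.get("name"), mac, ip

    def write_dhcpd_conf(entries, path="/etc/dhcpd.conf"):
        """Emit one fixed-address host stanza per board control interface."""
        with open(path, "w") as f:
            for name, mac, ip in entries:
                f.write("host %s { hardware ethernet %s; fixed-address %s; }\n"
                        % (name, mac, ip))

    if __name__ == "__main__":
        write_dhcpd_conf(control_interfaces())
        for daemon in ("dhcpd", "httpd", "sshd"):
            subprocess.call(["service", daemon, "start"])
        # ...then start the plc_api proxy, SNM, SRM and SLM as described above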

9. Booting SPP1: Example Configuration
  [Figure: the SPP1 chassis used for boot testing. Boards: CP, Hub (base switch at 192.168.32.17), GPE1 (slot 4), GPE2 (slot 3), NPE (slot 5) and Line Card (slot 6, ingress and egress XScales). Control (base) addresses are drawn from 192.168.32.0/20 (cp_ctrl 192.168.32.1, gpe1_ctrl 192.168.32.65, gpe2_ctrl 192.168.32.49, lc_b1a/b1b 192.168.32.97/98); data (fabric) addresses from 171.16.1.0/26; internal GPE interfaces from 172.16.1.64/26. The external name drn05.arl.wustl.edu (128.252.153.209) is carried on VLAN 2 (no ARP) over the data path; an IP router on the ARL network proxy-ARPs for drn05, and myPLC runs on drn06.arl.wustl.edu. Daemons shown: gnm*, srm, dhcpd, fs, plc_api and httpd on the CP; rmp and nm on each GPE; scd on each NPE XScale; scd and natd on the Line Card XScales.]
  CP files:
  • /etc/: dhcpd.conf, ethers
  • /tftpboot/: ramdisk.gz, zImage.ppm10, bootcd.img, overlay_gpe1.img, overlay_gpe2.img, pxelinux.0, pxelinux.cfg/ (C0A82031, C0A82041)
  • /var/www/html/boot/: index.html, bootmanager.sh, bootstrapfs-planetlab-i386.tar.bz2

10. Example Configuration, SPP3
  [Figure: the same layout for the SPP3 chassis. Boards: CP, Hub (base switch at 192.168.0.17), GPE1 (slot 3), GPE2 (slot 4), NPE (slot 5) and Line Card (slot 6). Control (base) addresses are drawn from 192.168.0.0/20 (cp_ctrl 192.168.0.1, gpe1_ctrl 192.168.0.49, gpe2_ctrl 192.168.0.65, lc_b1a/b1b 192.168.0.97/98); data and internal addressing follow the SPP1 example. The external name spp3.arl.wustl.edu (128.252.153.3) is carried on VLAN 2 over the data path, with an IP router on the ARL network providing proxy ARP; myPLC runs on drn06.arl.wustl.edu. The CP holds the same /etc, /tftpboot and /var/www/html/boot files as in the SPP1 example.]

11. bootcd file system
  /
  • bin/, dev/, home/, lib/, ...
  • etc/
    • init.d/: pl_boot, pl_netinit, pl_validateconf, pl_sysinit, pl_hwinit, ...
  • root/, selinux/, sys/, usr/, ...
  Modifications:
  • pl_boot: modified to not use ssl or pgp to retrieve the BootManager script from the CP
  • pl_netinit: sets boot_server to reference the CP
  • pl_validateconf: added SPP-specific variables

12. overlay image
  /
  • etc/{issue, passwd}
  • kargs.txt
  • pl_version
  • usr/
    • isolinux/
    • boot/
      • spp_netinit.py, ethers, spp_conf.txt
      • boot_server, boot_server_port, boot_server_path
      • plnode.txt, cacert.pem, plc_config, pubring.gpg
      • backup/: boot_server, boot_server_path, boot_server_port, cacert.pem, pubring.gpg
      • bootme/: BOOTPORT, BOOTSERVER, BOOTSERVER_IP, ID
      • cacert/drn06.arl.wustl.edu/cacert.pem
  Modifications:
  • changed to list the CP as the boot server, with port 81
  • added the SPP initialization script and config files
  • changed plnode.txt to list this GPE's MAC address for the control interface

13. GPE Configuration file: spp_conf.txt
  # Config name: spp1.txt
  [ nserv ]
  ctrl_ipaddr=192.168.32.1
  ctrl_hwaddr=00:1E:C9:FE:76:22
  data_ipaddr=172.16.1.1
  data_hwaddr=00:1E:C9:FE:76:23

  [ domain ]
  hostname=drn05
  domain=arl.wustl.edu
  dns1=128.252.133.45
  dns2=128.252.120.1
  gateway=128.252.153.31

  [ hosts ]
  nserv_f1.0=172.16.1.1
  nserv=192.168.32.1
  nserv_gbl=192.168.48.1
  shmgr=192.168.48.2
  hub=192.168.32.17
  hub1_f1.0=172.16.1.2
  hub1_m.0=192.168.48.17
  gpe1_f1.0=172.16.1.3
  gpe1_f1.1=172.16.1.65
  gpe1_b1.0=192.168.32.65
  gpe2_f1.0=172.16.1.4
  gpe2_f1.1=172.16.1.66
  gpe2_b1.0=192.168.32.49
  npe1_f1.0=172.16.1.5
  npe1_b1.0=192.168.32.81
  npe1_m.0=192.168.48.81
  npe1_b1.1=192.168.32.82
  lc_f1.0=172.16.1.6
  lc_b1.0=192.168.32.97
  lc_m.0=192.168.48.97
  lc_b1.1=192.168.32.98
  drn05.arl.wustl.edu=128.252.153.209

  [ iface ]
  __name__=eth0
  dev=eth0
  name=gpe1_f1.0
  hwaddr=00:0e:0c:85:e4:40
  type=data
  lanid=fabric1
  port=0
  vlan=0
  ipaddr=172.16.1.3
  ipnet=172.16.1.0
  ipbcast=172.16.1.63
  ipmask=255.255.255.192
  arp=no
  enable=yes

  [ iface ]
  __name__=eth0.2
  dev=eth0.2
  name=gpe1_f1.0
  hwaddr=00:0e:0c:85:e4:40
  vlan=2
  type=data
  lanid=fabric1
  port=0
  ipaddr=128.252.153.209
  ipnet=128.252.0.0
  ipbcast=128.252.255.255
  ipmask=255.255.0.0
  arp=no
  enable=yes

  [ iface ]
  __name__=eth1
  dev=eth1
  name=gpe1_f1.1
  hwaddr=00:0e:0c:85:e4:42
  type=data
  lanid=fabric1
  port=1
  vlan=0
  ipaddr=172.16.1.65
  ipnet=172.16.1.64
  ipbcast=172.16.1.127
  ipmask=255.255.255.192
  arp=no
  enable=yes

  [ iface ]
  __name__=eth2
  dev=eth2
  name=gpe1_b1.0
  hwaddr=00:0e:0c:85:e4:3e
  type=control
  lanid=base1
  port=0
  vlan=0
  ipaddr=192.168.32.65
  ipnet=192.168.32.0
  ipbcast=192.168.39.255
  ipmask=255.255.248.0
  arp=yes
  enable=yes
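
  Because spp_conf.txt repeats the [ iface ] section header, Python's stock ConfigParser will not load it directly. Below is a minimal stand-alone parser sketch that keeps every section as a (name, dict) pair; the function name and return shape are my own, and the real spp_netinit.py may do this differently.

    def parse_spp_conf(path):
        """Parse spp_conf.txt into an ordered list of (section_name, {key: value}),
        preserving the repeated [ iface ] sections."""
        sections = []
        current = None
        for raw in open(path):
            line = raw.strip()
            if not line or line.startswith("#"):
                continue                                # skip blanks and comments
            if line.startswith("[") and line.endswith("]"):
                current = (line.strip("[] "), {})
                sections.append(current)
            elif "=" in line and current is not None:
                key, _, value = line.partition("=")
                current[1][key.strip()] = value.strip()
        return sections

    # Example: pull out every interface record for this GPE.
    ifaces = [body for name, body in parse_spp_conf("spp_conf.txt") if name == "iface"]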

14. ethers
  # ----------------------------------------------------------------------
  # Board Type cp, Name cp1, Slot 0
  # nserv_f1.0   fabric1/0  00:1E:C9:FE:76:23  172.16.1.1
  # nserv        base1/0    00:1E:C9:FE:76:22  192.168.32.1
  # nserv_gbl    maint/0    00:10:18:32:00:76  192.168.48.1
  # ----------------------------------------------------------------------
  # Board Type shmgr, Name shmgr1, Slot 0
  # shmgr        maint/0    00:50:C2:3F:D2:74  192.168.48.2
  # ----------------------------------------------------------------------
  # Board Type hub, Name hub1, Slot 1
  # hub          base1/0    00:00:50:3D:10:6B  192.168.32.17
  # hub1_f1.0    fabric1/0  00:00:50:3D:10:B0  172.16.1.2
  # hub1_m.0     maint/0    00:00:50:3D:10:6C  192.168.48.17
  # ----------------------------------------------------------------------
  # Board Type gpe, Name gpe1, Slot 4
  # gpe1_f1.0    fabric1/0  00:0e:0c:85:e4:40  172.16.1.3
  # gpe1_f1.1    fabric1/1  00:0e:0c:85:e4:42  172.16.1.65
  # gpe1_b1.0    base1/0    00:0e:0c:85:e4:3e  192.168.32.65
  # ----------------------------------------------------------------------
  # Board Type gpe, Name gpe2, Slot 3
  # gpe2_f1.0    fabric1/0  00:0E:0C:85:E6:08  172.16.1.4
  # gpe2_f1.1    fabric1/1  00:0E:0C:85:E6:0A  172.16.1.66
  # gpe2_b1.0    base1/0    00:0E:0C:85:E6:06  192.168.32.49
  # ----------------------------------------------------------------------
  # Board Type npe, Name npe1, Slot 5
  # npe1_f1.0    fabric1/0  00:00:00:00:00:00  172.16.1.5
  # npe1_b1.0    base1/0    00:00:50:3d:07:3e  192.168.32.81
  # npe1_m.0     maint/0    00:00:50:3D:07:3C  192.168.48.81
  # npe1_b1.1    base1/1    00:00:50:3D:07:3D  192.168.32.82
  # ----------------------------------------------------------------------
  # Board Type lc, Name lc1, Slot 6
  # lc_f1.0      fabric1/0  00:00:50:3d:0b:d4  172.16.1.6
  # lc_b1.0      base1/0    00:00:50:3D:08:26  192.168.32.97
  # lc_m.0       maint/0    00:00:50:3D:08:24  192.168.48.97
  # lc_b1.1      base1/1    00:00:50:3D:08:25  192.168.32.98
  # ----------------------------------------------------------------------
  # Gateway for drn05 (128.252.153.209), VLAN 2
  00:00:50:3d:0b:d4  128.252.153.31
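
  Since ARP is disabled on the VLAN 2 data interface, the gateway entry at the bottom of this file has to be installed as a static ARP entry. A small Python sketch of doing that, roughly what "arp -f /etc/ethers" does; treating the "#" board-inventory lines as comments is an assumption based on the listing above.

    import subprocess

    def install_static_arp(path="/etc/ethers"):
        """Add a permanent ARP entry for every non-comment 'MAC IP' line."""
        for raw in open(path):
            line = raw.strip()
            if not line or line.startswith("#"):
                continue                   # the board inventory above is commented out
            mac, addr = line.split()[:2]
            subprocess.call(["arp", "-s", addr, mac])

    install_static_arp()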

15. BootAPI calls made by the BootManager
  PLCAPI/BootAPI calls:
  • GetSession(node_id, auth, node_ip): returns a new session key for the node
  • BootCheckAuthentication(Session): returns true if the Session id is valid
  • GetNodes(Session, node_id, ['nodegroup_ids', 'nodenetwork_ids', 'model', 'site_id']): returns the indicated parameters for this node (i.e., node_id)
  • GetNodeNetworks(Session, node_id, nodenetwork_ids): returns a list of interfaces [broadcast, network, ip, dns1, dns2, hostname, netmask, gateway, nodenetwork_id, method, mac, node_id, is_primary, type, bwlimit, nodenetwork_settings_ids]
  • GetNodes(Session, node_id, 'nodegroup_ids'): returns the list of group ids associated with this node
  • GetNodeGroups(Session, nodegroup_id, 'name'): returns the name string for each node group (in our case 'SPP')
  • GetNodeNetworkSettings()
  • BootUpdateNode(Session, boot_state): sets the node's boot state at PLC
  • BootNotifyOwners(Session, "event", params): causes email to be sent to the list of node owners
  • BootUpdateNode(Session, ssh_host_key): records the latest ssh public key for the node
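
  These calls are ordinary XML-RPC. A hedged sketch of how the BootManager might issue them from Python 2 (xmlrpclib): the proxy URL, the port 81 endpoint on the CP, the shape of the auth struct and of the GetNodes return value are assumptions; the argument order follows the list above.

    import xmlrpclib

    node_id, node_key, node_ip = 1234, "per-node-key", "128.252.153.209"   # placeholders

    # On an SPP GPE this points at the CP's plc_api proxy rather than the real PLC.
    api = xmlrpclib.ServerProxy("http://cp:81/PLCAPI/", allow_none=True)

    auth = {"node_id": node_id, "node_key": node_key}    # auth struct shape is assumed
    session = api.GetSession(node_id, auth, node_ip)     # new session key for this node
    if not api.BootCheckAuthentication(session):
        raise RuntimeError("session key rejected")

    nodes = api.GetNodes(session, node_id,
                         ["nodegroup_ids", "nodenetwork_ids", "model", "site_id"])
    node = nodes[0]                                      # assuming a list of structs
    nets = api.GetNodeNetworks(session, node_id, node["nodenetwork_ids"])

    api.BootUpdateNode(session, "boot")                  # set the node's boot state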

16. Other PLC/Server interactions (HTTP/HTTPS)
  • Upload alpina boot logs: BOOT_SERVER_URL + /alpina-logs/upload.php
  • Compatibility step (we don't use): BOOT_SERVER_URL + /alpina-BootLVM.tar.gz and BOOT_SERVER_URL + /alpina-PartDisk.tar.gz
  • Download the file system tar file containing the basic PlanetLab node environment: BOOT_SERVER_URL + /boot/bootstrapfs-"group"-"arch".tar.bz2 (see the sketch below)
  • If not in the config file, get the node id: BOOT_SERVER_URL + /boot/getnodeid.php
  • Get the yum update configuration file: BOOT_SERVER_URL + /PlanetLabConf/yum.conf.php
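
  These are plain HTTP fetches. A short Python sketch, assuming the CP stands in for the boot server as described earlier (paths follow the list above, with group=planetlab and arch=i386):

    import urllib

    BOOT_SERVER_URL = "http://cp"        # the CP substitutes for PLC on an SPP node
    urllib.urlretrieve(BOOT_SERVER_URL + "/boot/bootstrapfs-planetlab-i386.tar.bz2",
                       "/tmp/bootstrapfs-planetlab-i386.tar.bz2")
    urllib.urlretrieve(BOOT_SERVER_URL + "/PlanetLabConf/yum.conf.php", "/tmp/yum.conf")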

17. System Initialization: Stage 1
  • Use PXE boot to download pxelinux and its config file
    • boot using a basic initial ramdisk, overlay and kernel
    • uses the dhcp, tftp and PXE server on the CP; files are stored in the tftpboot directory: pxelinux.0, pxelinux.cfg/<GPE_IPADDR> (see the sketch below), bootcd.img, overlay_gpeX.img, kernel
  • The overlay image is modified for each GPE to include its configuration file, modified PlanetLab config files and an SPP node python script
    • currently this is a manual step, but the long-term plan is for the gnm daemon to create the individual images
  • The overlay image contains several files that identify the node and provide the name and address of the PLC and boot servers. I have modified these to point to the CP.
  • Just before booting the final kernel I change these values back to refer to the "real" PLC/API servers.
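
  PXELINUX derives its per-host config file name from the uppercase hex of the client's IP address, which is where the C0A82031 and C0A82041 files in tftpboot come from. A small sketch the BCC could use when generating pxelinux.cfg/ entries (the function name is mine):

    def pxelinux_cfg_name(ipaddr):
        """Return the pxelinux.cfg/ file name PXELINUX requests for this client IP:
        the address rendered as eight uppercase hex digits."""
        return "".join("%02X" % int(octet) for octet in ipaddr.split("."))

    # The GPE control addresses in the SPP1 example map to the files listed earlier.
    assert pxelinux_cfg_name("192.168.32.65") == "C0A82041"   # gpe1_ctrl
    assert pxelinux_cfg_name("192.168.32.49") == "C0A82031"   # gpe2_ctrl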

18. System Initialization: Stage 2
  • Boot into a basic, intermediate environment
  • Initial configuration information is obtained from the overlay image
    • includes spp_conf.txt, which defines the GPE interfaces
    • includes the ethers file, which contains MAC addresses for static ARP entries
    • updated plnode.txt with the GPE's control-interface MAC address
    • modified boot-server files listing the CP as the boot server
    • includes spp_netinit.py, a python script to configure the interfaces and update system configuration files
  • Enables the "primary" interface and key network configuration files such as resolv.conf
  • Downloads the BootManager source from the "boot_server"
    • in our case we download from the CP
    • I explicitly disable the use of ssl and certs (the certificates on the overlay image are for the PLC server, not the CP)
    • our assumption is that the control (base) network is "secure"; within an SPP node we also don't have to worry about authentication issues

19. BootManager
  • Opens a connection to the PLCAPI on the boot server
    • opens a connection to our proxy plcapi/bootapi server running on the CP
  • Gets a node session key: GetSession(node_id, auth, node_ip)
    • since each call to create a session invalidates any existing keys, we intercept this call on the CP and use a common session key for all GPEs (see the sketch below)
  • Determines the node's configuration
    • reads plnode.txt for node_id, node_key and the primary interface settings
    • we use DHCP to configure the control interface but do not define a DNS server
    • if node_id is not found, reads URL=BootServer/boot/getnodeid.php
  • Calls BootCheckAuthentication(Session) to verify the session key
  • Calls GetNodes to get the boot_state, node_groups, model and site_id
  • Calls GetNodeNetworks to get configuration information for all interfaces
    • in our case the call would return the externally visible network parameters, which differ from how each GPE is configured
    • long term, we can intercept this call and return GPE-specific interface config info
    • short term, we use a configuration file in the overlay image with similarly formatted information. I have replaced the BootManager code that reads the config info and configures the interfaces.
    • I had to add support for VLANs and our internal interfaces.
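
  A minimal sketch of how the CP-side proxy could hand out one shared session key, assuming a plain XML-RPC front end on port 81 (per the overlay-image slide) and straight pass-through of every other call; the real plc_api server is more involved.

    import xmlrpclib
    from SimpleXMLRPCServer import SimpleXMLRPCServer

    PLC_URL = "https://www.planet-lab.org/PLCAPI/"      # the "real" PLC; illustrative URL
    plc = xmlrpclib.ServerProxy(PLC_URL, allow_none=True)

    class ProxyAPI(object):
        """Forward BootAPI/PLCAPI calls from the GPEs, but cache one session key."""
        shared_session = None

        def _dispatch(self, method, params):
            if method == "GetSession":
                # Creating a new session would invalidate the key the other GPEs
                # are already using, so hand every GPE the same cached key.
                if ProxyAPI.shared_session is None:
                    ProxyAPI.shared_session = plc.GetSession(*params)
                return ProxyAPI.shared_session
            return getattr(plc, method)(*params)        # everything else passes through

    server = SimpleXMLRPCServer(("0.0.0.0", 81), allow_none=True)
    server.register_instance(ProxyAPI())
    server.serve_forever()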

20. BootManager, Continued
  • Downloads the node's final filesystem image from the boot_server
    • in our case this is the CP: http://CP/boot/bootstrapfs-planetlab-i386.tar.bz2
  • Downloads the yum config file
    • I am not currently downloading http://CP/PlanetLabConf/yum.conf
  • Calls BootUpdateNode with the new boot_state
    • we will need to intercept this call and both report and set node state based on all GPEs
  • Calls BootNotifyOwners with the new state
    • forwarded to PLC
  • Updates the network configuration in the new "sysimg"
    • downloads //BootServer/PlanetLabConf/plc_config; in our case I have copied it onto the overlay image in the /usr/boot directory
    • calls GetNodeNetworkSettings for a list of any additional interface attributes, then creates various configuration files: hosts, resolv.conf, network, ifcfg-eth*
    • I have replaced this step with our own script spp_netinit.py and configuration file spp_conf.txt, which I use to create the same config files in both the current environment and the new sysimg (see the sketch below)
    • updates devices and creates the initrd image used for the next stage
    • finally boots a new kernel using the bootstrap file system
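
  A sketch of the kind of ifcfg file spp_netinit.py could write for one [iface] record from spp_conf.txt (keys as shown on the GPE configuration slide). The exact set of ifcfg settings emitted here, and treating a dotted device name as a VLAN sub-interface, are assumptions; the real script may emit more.

    import os

    def write_ifcfg(iface, sysimg="/"):
        """Write a Red Hat style ifcfg-<dev> file for one [iface] record."""
        dev = iface["dev"]
        path = os.path.join(sysimg, "etc/sysconfig/network-scripts/ifcfg-" + dev)
        lines = [
            "DEVICE=%s" % dev,
            "BOOTPROTO=static",
            "IPADDR=%s" % iface["ipaddr"],
            "NETMASK=%s" % iface["ipmask"],
            "HWADDR=%s" % iface["hwaddr"],
            "ARP=%s" % iface.get("arp", "yes"),
            "ONBOOT=%s" % ("yes" if iface.get("enable") == "yes" else "no"),
        ]
        if "." in dev:                       # e.g. eth0.2: VLAN 2 carried on eth0
            lines.append("VLAN=yes")
        with open(path, "w") as f:
            f.write("\n".join(lines) + "\n")

    # Called once per [iface] section, both in the running environment and in the
    # new sysimg (e.g. write_ifcfg(iface, sysimg="/tmp/mnt/sysimg")).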

21. Boot States
  • The list of boot states is changing as I write this
  • In our version of the PLC the states are as shown on the right [table not captured in this transcript]

22. PLC Database
  • PlanetLab Central keeps a database describing all nodes, slices and users/people
  • The slice database keeps track of all slices and their node bindings
  • The node database includes externally visible properties and the ability to associate general attributes with them:
    • the current (or next) node state (boot_state)
    • node identifier (node_id)
    • list of interface configuration parameters: IP address information, MAC address, a generic list of attributes
    • node's owner
    • node's site identifier (site_id)
    • model, which can be used to specify a set of attributes for the node, for example: minhw, smp
    • current ssh host key (ssh_host_key)
    • node groups: I believe this is being deprecated in favor of associating a generic set of attributes with a node or its interfaces

23. SPP-Specific Information
  • On an SPP node the resource manager needs to know what kind of board is inserted in each slot and its I/O characteristics
    • needs to associate interface MAC addresses with boards and interfaces, or with a standalone system connected to an RTM or front panel (for example the CP)
    • also needs to know which interfaces connect to the base switch and which to the fabric switch when bringing up general-purpose systems
    • there is no convenient mechanism for determining this at run time, so I have a configuration file
  • Also needs to know what resources are available on each board and the allocation policies
  • Must also have a list of external links, their addresses and the address of any peers (Ethernet)
  • Needs to keep track of the current node state (as kept by PLC) as well as the state of each individual board
  • Needs to share state between different daemons

24. Node Configuration File
  <?xml version="1.0" encoding="utf-8" standalone="yes"?>
  <spp>
    <code_options>
      <IPv4 sram="fixed" queues="variable" id="0" fltrs="variable">
        <sram> 1024 </sram>
      </IPv4>
      <I3 sram="fixed" queues="variable" id="1" fltrs="variable">
        <sram> 1024 </sram>
      </I3>
    </code_options>
    <components>
      <cp name="cp1" slot="0" cat="host" alias="nserv">
        <interface name="nserv_f1.0" dev="GigE" lanid="fabric1" assoc="" port="0"> ... </interface>
        ...
      </cp>
      <shmgr name="shmgr1" slot="0" cat="atca" alias="shmgr1">
        <interface name="shmgr" dev="GigE" lanid="maint" assoc="" port="0"> ... </interface>
        ...
      </shmgr>
      <hub name="hub1" slot="1" cat="atca" alias="hub1">
        <switch lanid="base1"> </switch>
        <switch lanid="fabric1"> <bw> 10000000000 </bw> </switch>
        <interface name="hub" dev="GigE" lanid="base1" assoc="" port="0"> ... </interface>
        ...
      </hub>
      <gpe name="gpe1" slot="4" cat="atca" alias="gpe1">
        <interface name="gpe1_f1.0" dev="GigE" lanid="fabric1" assoc="" port="0"> ... </interface>
        ...
      </gpe>
      <npe name="npe1" slot="5" cat="atca" alias="npe1">
        <product> Radisys_7010 </product>
        <model> NPEv1 </model>
        <interface name="npe1_f1.0" dev="10GigE" lanid="fabric1" assoc="" port="0"> ... </interface>
        ...
      </npe>
      <lc name="lc1" slot="6" cat="atca" alias="lc">
        <product> Radisys_7010 </product>
        <model> LCv1 </model>
        <interface name="lc_f1.0" dev="10GigE" lanid="fabric1" assoc="" port="0"> ... </interface>
        ...
        <interface name="drn05" dev="GigE" lanid="external" port="0">
          ...
          <link peering="true" primary="true" dev="GigE"> ... </link>
          ...
        </interface>
      </lc>
    </components>
  </spp>
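
  The same XML drives the generated files shown earlier; for example, the commented board inventory in the ethers file can be produced directly from it. A sketch, assuming interface bodies carry <hwaddr> and <ipaddr> elements as in the CP record on the next slide:

    import xml.etree.ElementTree as ET

    def ethers_inventory(conf="nodeconf.xml"):
        """Render the commented board/interface inventory used in the ethers file."""
        root = ET.parse(conf).getroot()
        lines = []
        for board in root.find("components"):
            lines.append("# " + "-" * 70)
            lines.append("# Board Type %s, Name %s, Slot %s"
                         % (board.tag, board.get("name"), board.get("slot")))
            for iface in board.findall("interface"):
                mac = (iface.findtext("hwaddr") or "").strip()
                addr = (iface.findtext("ipaddr") or "").strip()
                lines.append("# %-12s %s/%s  %s  %s"
                             % (iface.get("name"), iface.get("lanid"),
                                iface.get("port"), mac, addr))
        return "\n".join(lines)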

25. CP Record
  <!-- Interface parameters defined by the user in the original "xml" file -->
  <cp name="cp1" slot="0" cat="host" alias="nserv">
    <interface name="nserv_f1.0" dev="GigE" lanid="fabric1" assoc="" port="0">
      <!-- All internal IP addrs assigned by the configuration software based on runtime parameters -->
      <ipaddr>172.16.1.1</ipaddr>
      <ipnet>172.16.1.0</ipnet>
      <ipmask>255.255.255.192</ipmask>
      <ipbcast>172.16.1.63</ipbcast>
      <!-- Device parameters and comment set by the user in the original "xml" file -->
      <device> eth0 </device>
      <hwaddr> 00:1E:C9:FE:76:23 </hwaddr>
      <desc> Interface connected to HUB's fabric port </desc>
    </interface>
    <interface name="nserv" dev="GigE" lanid="base1" assoc="" port="0">
      <ipaddr>192.168.32.1</ipaddr>
      <ipnet>192.168.32.0</ipnet>
      <ipmask>255.255.248.0</ipmask>
      <ipbcast>192.168.39.255</ipbcast>
      <device> eth1 </device>
      <hwaddr> 00:1E:C9:FE:76:22 </hwaddr>
      <desc> System control processor's Base Ethernet connection </desc>
    </interface>
    <interface name="nserv_gbl" dev="GigE" lanid="maint" assoc="" port="0">
      <ipaddr>192.168.48.1</ipaddr>
      <ipnet>192.168.48.0</ipnet>
      <ipmask>255.255.248.0</ipmask>
      <ipbcast>192.168.55.255</ipbcast>
      <device> eth2 </device>
      <hwaddr> 00:10:18:32:00:76 </hwaddr>
      <desc> Connection to the maintenance ports </desc>
    </interface>
  </cp>

26. GPE Record
  <gpe name="gpe1" slot="4" cat="atca" alias="gpe1">
    <interface name="gpe1_f1.0" dev="GigE" lanid="fabric1" assoc="" port="0">
      -- IP Address Info --
      <device> eth0 </device> <hwaddr> 00:0e:0c:85:e4:40 </hwaddr>   (Device Data)
      <bw> 1000000000 </bw> <share> 2 </share>                       (Resource Policy)
      <desc> MAC=N+2, Fabric 1/0 or AMC Port 0 </desc>
    </interface>
    <interface name="gpe1_f1.1" dev="GigE" lanid="fabric1" assoc="" port="1">
      -- IP Address Info --
      <device> eth1 </device> <hwaddr> 00:0e:0c:85:e4:42 </hwaddr>
      <desc> MAC=N+4, Fabric 1/1 or Maintenance Port 1 </desc>
    </interface>
    <interface name="gpe1_b1.0" dev="GigE" lanid="base1" assoc="" port="0">
      -- IP Address Info --
      <device> eth2 </device> <hwaddr> 00:0e:0c:85:e4:3e </hwaddr>
      <desc> MAC=N, Base connection to Primary HUB </desc>
    </interface>
    <interface name="gpe1_b2.0" dev="GigE" lanid="base2" assoc="" port="0">
      -- IP Address Info --
      <device> eth3 </device> <hwaddr> 00:0e:0c:85:e4:3f </hwaddr>
      <desc> MAC=N+1, Base connection to alternate HUB </desc>
    </interface>
    <interface name="gpe1_f2.0" dev="GigE" lanid="fabric2" assoc="" port="0">
      -- IP Address Info --
      <device> eth4 </device> <hwaddr> 00:0e:0c:85:e4:41 </hwaddr>
      <desc> MAC=N+3, Fabric 2/0 or AMC Port 1 </desc>
    </interface>
    <interface name="gpe1_f2.1" dev="GigE" lanid="fabric2" assoc="" port="1">
      -- IP Address Info --
      <device> eth5 </device> <hwaddr> 00:0e:0c:85:e4:43 </hwaddr>
      <desc> MAC=N+5, Fabric 2/1 or Maintenance Port 2 </desc>
    </interface>
  </gpe>

27. NPE Record
  <npe name="npe1" slot="5" cat="atca" alias="npe1">
    <product> Radisys_7010 </product>
    <model> NPEv1 </model>
    <interface name="npe1_f1.0" dev="10GigE" lanid="fabric1" assoc="" port="0">
      -- IP Address Info -- -- Device Data -- -- Resource Policy --
      <desc> Fabric interface used for both NPUs </desc>
    </interface>
    <interface name="npe1_b1.0" dev="GigE" lanid="base1" assoc="npua" port="0">
      -- IP Address Info -- -- Device Data --
      <desc> Primary control interface associated with NPUA </desc>
    </interface>
    <interface name="npe1_m.0" dev="GigE" lanid="maint" assoc="npua" port="0">
      -- IP Address Info -- -- Device Data --
      <desc> NPUA Front Maintenance Port </desc>
    </interface>
    <interface name="npe1_b1.1" dev="GigE" lanid="base1" assoc="npub" port="1">
      -- IP Address Info -- -- Device Data --
      <desc> NPUB Front Maintenance Port -- but it has been patched to the Base switch </desc>
    </interface>
  </npe>

28. LC Record
  <lc name="lc1" slot="6" cat="atca" alias="lc">
    <product> Radisys_7010 </product>
    <model> LCv1 </model>   (Model Data)
    <interface name="lc_f1.0" dev="10GigE" lanid="fabric1" assoc="" port="0">
      -- IP Address Info -- -- Device Data -- -- Resource Policy --
    </interface>
    <interface name="lc_b1.0" dev="GigE" lanid="base1" assoc="npua" port="0">
      -- IP Address Info -- -- Device Data --
    </interface>
    <interface name="lc_m.0" dev="GigE" lanid="maint" assoc="npua" port="0">
      -- IP Address Info -- -- Device Data --
    </interface>
    <interface name="lc_b1.1" dev="GigE" lanid="base1" assoc="npub" port="1">
      -- IP Address Info -- -- Device Data --
    </interface>
    <interface name="drn05" dev="GigE" lanid="external" port="0">
      <hwaddr> 00:00:50:29:b1:46 </hwaddr>
      <link peering="true" primary="true" dev="GigE">
        -- Link IP Address Info -- -- Device Data -- -- Resource Policy --
        <domain> arl.wustl.edu </domain>
        <hostname> drn05 </hostname>
        <dns1> 128.252.133.45 </dns1>
        <dns2> 128.252.120.1 </dns2>
        <peerIP> 128.252.153.31 </peerIP>
        <peerMAC> 00:0F:B5:FB:D8:67 </peerMAC>
        <vlan> 2 </vlan>
        <port_pool>  <!-- used for NAT -->
          <udp count="500" start="30000"> </udp>
          <tcp count="500" start="30000"> </tcp>
        </port_pool>
        <desc> p2p link from drn05 to drn06, the plc </desc>
      </link>
    </interface>
  </lc>

29. SRM Interface
  NATD to SRM:
  • [egress_map, ingress_map] get_sched_map(LinkIP, BoardMAC)
  • Deprecated (original natd interface):
    • {fid, port} alloc_epmap(map)
    • status free_epmap(fid)
  FS to SRM:
  • ?? (map vlan to slice id)
  RMP to SRM:
  • Interfaces (Line Card links):
    • if_list get_interfaces(plabID)
    • ifn get_ifn(plabID, ipaddr)
    • if_entry get_ifattrs(plabID, ifn)
    • ipaddr get_ifpeer(plabID, ifn)
    • retcode resrv_fpath_ifbw(bw, ifn)
    • retcode reles_fpath_ifbw(bw, ifn)
    • to be implemented:
      • retcode resrv_slice_ifbw(plabID, bw, ifn)
      • retcode reles_slice_ifbw(plabID, bw, ifn)
  • Endpoints (local IP and port number); NATD changes may have broken these:
    • ep alloc_endpoint(PlabID, ep)
    • status free_endpoint(PlabID, ipaddr, port, proto)
  • Fast path:
    • fp_params alloc_fastpath(PlabID, copt, bwspec, rcnts, mem)
    • status free_fastpath()
  • Fast-path meta-interfaces:
    • [mi, ep] alloc_udp_tunnel(bw, ipaddr, port)
    • ep get_endpoint(mi)
    • status free_udp_tunnel(ipaddr, port)

30. RMP Interface
  Prototype completed:
  • result noop()
  • version get_version()
  • result add_slice(plabID, len, name)
  • result rem_slice(plabID)
  • ret_t alloc_fastpath(copt, bw, rcnts, mem)
  • void free_fastpath()
  • if_list get_interfaces()
  • ifn get_ifn(ipaddr)
  • if_entry get_ifattrs(ifn)
  • ipaddr get_ifpeer(ifn)
  • retcode alloc_pl_ifbw(ifn, bw)
  • retcode reles_pl_ifbw(ifn, bw)
  • retcode alloc_fpath_ifbw(fpid, ifn, bw)
  • retcode reles_fpath_ifbw(fpid, ifn, bw)
  • retcode bind_queue(fpid, miid, list_type, qids)
  • actual_bw set_queue_params(fpid, qid, threshold, bw)
  • [threshold, bw] get_queue_params(fpid, qid)
  • [u32 Pkts, u32 Bytes] get_queue_len(fpid, qid)
  To do:
  • ep alloc_endpoint(ep)
  • status free_endpoint(ipaddr, port, proto)
  • -- alloc_tunnel --
  • -- free_tunnel --
  • [mi, ep] alloc_udp_tunnel(fpid, bw, ip, port)
  • status free_udp_tunnel(ipaddr, port)
  • ep get_endpoint(fpid, mi)
  • retcode write_fltr(fpid, fid, fltr)
  • retcode update_result(fpid, fid, result)
  • fltr_t get_fltr_bykey(fpid, key)
  • fltr_t get_fltr_byfid(fpid, fid)
  • result lookup_fltr(fpid, key)
  • retcode rem_fltr_bykey(fpid, key)
  • retcode rem_fltr_byfid(fpid, fid)
  • stats_t read_stats(fpid, sindx, flags)
  • result clear_stats(sindx)
  • handle create_periodic(fp, indx, P, cnt, flags)
  • retcode delete_periodic(fpid, handle)
  • retcode set_callback(fpid, handle, xport)
  • stats_t get_periodic(fpid, handle)
  • retcode mem_write(fpid, offset[, len], data)
  • data mem_read(fpid, offset, len)

31. NPE SCD Interface
  SRM to SCD:
  • status set_fastpath(fpid, copt, VLAN, params, mem)
  • status enable_fastpath(fpid)
  • status disable_fastpath(fpid)
  • status rem_fastpath(fpid)
  • status set_sched_params(sid, ifn, BWmax, BWmin)
  • status set_encap_cb(sid, srcIP, dMAC)
  • status set_fpmi_bw(fpid, sid, miid, bw)
  • status start_mes()
  • status stop_mes()
  • status set_encap_gpe(fpid, gpeIP, npeIP)
  • result write_mem(kpa, len, data)
  • data read_mem(kpa, len)
  SRM and RMP to SCD:
  • ret_t write_fltr(dbid, fid, key, mask, result)
  • ret_t update_result(dbid, fid, result)
  • fltr get_fltr_bykey(dbid, key)
  • fltr get_fltr_byfid(dbid, fid)
  • result lookup_fltr(dbid, key)
  • retcode rem_fltr_bykey(dbid, key)
  • retcode rem_fltr_byfid(dbid, fid)
  RMP to SCD:
  • status set_gpe_info(exPort, ldPort, exQID, ldQID)
  • u32 result bind_queue(u16 miid, u8 list_type, u16[] qid_list)
  • u32 bw set_queue_params(u16 qid, u32 threshold, u32 bw)
  • {u32 threshold, u32 bw} get_queue_params(u16 qid)
  • {u32 pktCnt, u32 byteCnt} get_queue_len(u16 qid)
  • result write_sram(offset, len, data)
  • data read_sram(offset, len)
  • stats read_stats(sindx, flags)
  • result clear_stats(sindx)
  • handle create_periodic(sindx, P, cnt, flags)
  • retcode del_periodic(handle)
  • retcode set_callback(handle, udp_port)
  • stats get_periodic(handle)

32. LC SCD Interface
  SRM to SCD:
  • status set_sched_params(sid, ifn, BWmax, BWmin)
  • status set_sched_mac(sid, MACdst, MACsrc)
  • u32 result set_queue_sched(u16 qid, u16 sid)
  • result write_mem(kpa, len, data)
  • data read_mem(kpa, len)
  SRM and RMP to SCD:
  • ret_t write_fltr(dbid, fid, key, mask, result)
  • ret_t update_result(dbid, fid, result)
  • fltr get_fltr_bykey(dbid, key)
  • fltr get_fltr_byfid(dbid, fid)
  • result lookup_fltr(dbid, key)
  • retcode rem_fltr_bykey(dbid, key)
  • retcode rem_fltr_byfid(dbid, fid)
  RMP to SCD:
  • u32 actual_bw set_queue_params(u16 qid, u32 threshold, u32 bw)
  • {u32 threshold, u32 bw} get_queue_params(u16 qid)
  • {u32 pktCnt, u32 byteCnt} get_queue_len(u16 qid)
  • stats read_stats(sindx, flags)
  • result clear_stats(sindx)
  • handle create_periodic(sindx, P, cnt, flags)
  • retcode del_periodic(handle)
  • retcode set_callback(handle, udp_port)
  • stats get_periodic(handle)

33. Slice Example
  • Get the list of interfaces, their IP addresses and available bandwidth:
    if_list = {if_entry, ...}
    if_entry = {u16 ifn,      // logical interface number
                u16 type,     // peering or multi-access
                u32 ipaddr,   // interface's IP address
                u32 linkBW,   // link's native BW
                u32 availBW}  // BW available for allocation
    struct epoint_t {
                u32 ipaddr;   // interface's IP address
                u16 port;     // UDP port number for the meta-interface
                u32 bw;}      // total BW required for the meta-interface
    iflist = get_interfaces();  // returns the list of all available interfaces
  • Estimate the computational complexity and memory bandwidth requirements on the NPE:
    bwSpec = {BWmax=totalBW, BWmin=0};  // fast-path total BW requirement
  • Maximum general NPE resource counts: for this example I just assume a maximum, but in general a user may scale it by the number of meta-interfaces they will use:
    fpCounts = {FLTR_CNT, QID_CNT, BUFF_CNT, STATS_CNT};
  • Request the substrate to allocate a fast-path instance for the IPv4 code option, assuming the default SRAM buffer sizes. We will also need to listen on the returned sockets:
    [fpid, sockets] = alloc_fastpath(ipv4_copt, bwSpec, fpCounts, {IPV4_SRAM_SZ, 0});

34. Slice Example, Continued
  • Allocate one meta-interface for each external interface and assign our default UDP port number and BW requirement:
    struct mi_t {uint_t mi; epoint_t rp;};
    mi_t milist[iflist.len()];
    for (indx = 0, mi = 0; indx < len(iflist); ++indx) {
        if (miBW > iflist[indx].availBW)
            throw Error;
        // allocate the total BW required on this interface
        if (alloc_fpath_ifbw(fpid, iflist[indx].ifn, miBW) == -1)
            throw Error;
        // allocate one meta-interface on this interface
        milist[indx] = alloc_udp_tunnel(fpid, miBW, iflist[indx].ipaddr, myPort);
        my_bind_queues(milist + indx);
        my_add_routes(milist + indx);
    }

35. Test SPP Node
  [Figure: the test chassis. CP (keystone.arl.wustl.edu, 128.252.153.81, running srm and dhcpd), Hub (base switch at 192.168.64.17), Line Card in slot 6 (ingress and egress XScales running scd and natd), and four GPEs in slots 2-5. Control (base) addresses are drawn from 192.168.64.0/20 (cp_ctrl 192.168.64.1, GPE control interfaces 192.168.64.33/.49/.65/.81); data (fabric) addresses from 171.16.1.0/26 and internal GPE interfaces from 172.16.1.64/26. The external name keystone.arl.wustl.edu is carried on VLAN 2 (no ARP); a separate "Router" host on the ARL network (128.252.153.*) does IP routing and proxy ARP for keystone, with links via the RTM front-panel ports. Each GPE holds /etc/{ethers, hosts} and /etc/sysconfig/network-scripts/ifcfg-eth*; the CP holds /etc/{dhcpd.conf, ethers, hosts} and /tftpboot/{ramdisk.gz, zImage.ppm10}.]
  • Issue: mounting /opt/crossbuild/* from ebony. We could export the dirs from the "Router" host, or use ebony rather than "Router"; in that case we will need an external switch connecting the line cards of the SPP nodes to ebony's eth2.2.

36. Test Bed Use
  • Core platform issues:
    • Can we use the second fabric port on the GPE boards?
    • The Hub does not display stats or MAC forwarding entries for the slots with GPEs; it used to work.
    • The Radisys shelf manager does not reliably reset boards.
    • The base1 interface is disabled on slot 2.
  • NAT/Line Card testing
    • overall reliability
    • add support for aging
  • Specific issues (jdd)
    • restarting the line card (without a reboot) occasionally results in the datapath thinking the scratch ring to the XScale is full
    • a looping iperf test from the CP occasionally stalls with no packets getting through the LC
    • lookup needs a fix so it does not use the DONE bit to indicate that a TCAM lookup is done
  • GPE/Intel board testing
