
Control Update: Focus on PlanetLab Integration and Booting

Fred Kuhns

[email protected]

Applied Research Laboratory

Washington University in St. Louis


Documents

  • Control documentation

    http://www.arl.wustl.edu/projects/techX/ppt/

    • This presentation

      • http://www.arl.wustl.edu/projects/techX/ppt/ControlUpdate.ppt

    • SRM interface

      • http://www.arl.wustl.edu/projects/techX/ppt/srm.ppt

    • RMP interface

      • http://www.arl.wustl.edu/projects/techX/ppt/rmp.ppt

    • SCD interface (ingress, egress and npe)

      • http://www.arl.wustl.edu/projects/techX/ppt/scd.ppt

  • Datapath documentation

    http://www.arl.wustl.edu/projects/techX/design/SPP/

    • NAT overview (Interface??)

      • http://www.arl.wustl.edu/projects/techX/design/SPP/SPP_V1_NAT_design.ppt

    • FlowStats (Interface??)

      • http://www.arl.wustl.edu/projects/techX/design/SPP/FlowStats_Control.ppt


Traditional View of a PlanetLab Node

  • Linux OS, vserver

  • System services

    • pl_netflow

    • sirius: brokerage service

    • stork: environmental service

    • CoMon: monitoring and discovery

  • Resource model

    • focused on PCs with single device instances (CPU, NIC)

    • standard Linux/UNIX tools to measure utilization

    • homogeneous environment with single vmm to manage all vm instances on a platform

    • local node manager interface through loopback interface

  • User requests slice on a set of distributed nodes

    • assigned VM instance on each node

    • Fedora Linux environment

    • per slice flowstats

[Figure: a traditional PlanetLab node. A Node Manager (the "root" VM), system-service VMs and user VMs (VM1 ... VMN) run over a Virtual Machine Monitor (VMM) on a general-purpose PC (CPU, NIC, disk, DRAM). The node is reachable from the Internet as a single host.domain with one IP address (A.B.C.D); its PLC record carries site, owner, model, ssh_host_key and groups.]


An SPP Node

[Figure: an SPP node. A single PlanetLab node record (site, owner, model, ssh_host_key, groups; spp_host.domain at address A.B.C.D) now covers multiple boards. Each GPE is a general-purpose PC with its own VMM, Node Manager, system services and user VMs (VM1 ... VMX, ... VMN); the CP hosts the node-level control software; NPEs host per-slice fast paths (vm1:fast path1, ..., vmX:fast path1, vm1:fast path2, vmY:fast path2); the Line Card holds the FwdDB/Filters and NAT on the datapath to the external interface. All boards connect through the HUB: 1GbE control (Base) and 10GbE data (Fabric).]


Challenges

  • Provide the standard PlanetLab slice environment

    • configure and boot individual GPEs with the standard PlanetLab software while supporting the standard operational environment

  • Support standard interfaces

    • boot manager

    • the node manager's internal and external interfaces

    • resource monitoring

  • Create interface for allocating and managing fast-paths

    • allocate/free NPE resources

    • manage meta-interface mappings to externally visible IP address and UDP port

    • slice control of allocated fastpath resources


SPP Node

[Figure: SPP node control software, part 1. The CP runs the System Resource Manager (SRM) and node manager (GNM), the SLM and a modified sshd*; each GPE runs an RMP and NMP; each NPE XScale runs an SCD; the Line Card XScales run the ingress and egress SCDs plus NATD. External interfaces IP1 ... IPN attach through an RTM (10x1G/1x10G).]

Boot files (served by the CP to the NPE, GPE and LC boards):

  • dhcpd.conf

  • ethers

  • tftpboot:

    • bootcd.img

    • overlay_gpeX.img

    • pxelinux.0

    • pxelinux.cfg

      • C0A82031

      • C0A82041

  • overlay.img:

    • plnode.txt

    • plc_config

    • ethers

    • spp_conf.txt

    • spp_netinit.py

    • server*, certs

[Figure, continued: each GPE also runs pl_netflow, the user slivers, vnet and ntp. The NPE and Line Card boards each contain NPU-A and NPU-B with a TCAM and XScale control processors, attached via SPI and PCI. The Hub provides the Base Ethernet switch (1 Gbps, control) and the Fabric Ethernet switch (10 Gbps, data path). The CP additionally runs FlowStats, httpd, the xmlrpc PLCAPI proxy, PXE/dhcpd/tftp, ntpd and the I2C (IPMI) link to the shelf manager, and stores the flowDB, sliceDB, Resource DB, Slice DB, node DB, nodeconf.xml, the boot files under /var/www/, and user info/home dirs.]


Software Components

  • Control Processor (CP):

    • Boot and Configuration Control (BCC): Node configuration, management and local state management (DB)

      • httpd, dhcpd, tftp and PXE server for GPE and NPE boards; maintain config files

      • Boot CD and distribution file management (overlay images, RPM and tar files) for GPEs and CP

      • PLCAPI proxy (plc_api) and system level BootManager (part of gnm)

    • System Resource Manager (SRM): Centralized resource management

      • responsible for all resource allocation decisions and maintaining dynamic system state

      • delegates local operations to individual board-level managers

    • System Node Manager (SNM, aka GNM): “top-half” of the PlanetLab node manager

    • Slice login manager (SLM) and ssh forwarding (modified sshd) -- Ritun

    • Flow Statistics (FS): aggregates pl_netflow data and translates NAT records

    • Set default (static) routes in line card

    • What about dynamic route management (BGP/OSPF/RIP)? For now, assume a single next-hop router for all routes.

  • General purpose Processing Element (GPE)

    • Local Boot Manager (LBM): Modified PlanetLab BootManager running on the GPEs

    • Resource Manager Proxy (RMP)

    • Node Manager Proxy (NMP): the lower half of PlanetLab's node manager

  • Network Processor Element (NPE)

    • Substrate Control Daemon (SCD):

      • manages all NPE resources and provides mappings from slice to global name spaces

    • Kernel module to read/write memory locations (wumod)

    • Command interpreter for configuring NPU memory (wucmd)

  • Line Card, Ingress

    • Substrate Control Daemon (scd_ingress)

      • implements interface to srm

      • manages TCAM access for ingress and egress

      • reads/writes scratch rings for NATD

    • Network Address Translation daemon (NATD), port only

  • Line Card Egress:

    • Substrate Control Daemon (scd_egress)

      • implements interface to srm

      • reads/writes scratch rings and communicates with the FS and NATD.


Boot and Configuration Control

  • Read node configuration DB: currently this is an xml file

    • Allocate IP subnets and addresses for all boards

    • Assign external IP addresses to GPE fabric interfaces with default VLAN id

    • Create per GPE configuration DB: currently this is written to files.

  • Create dhcp configuration file and start dhcpd, httpd and system sshd

    • assigns control IP subnets and addresses; assigns internal substrate IP subnet on fabric Ethernet

  • Start PLCAPI proxy (plc_api) server and system node manager

    • read node DB for initialization data: currently use static configuration data and/or re-read xml file

    • Create GPE overlay images: currently this is done manually

    • Currently the SNM is split between the plc_api server and the srm, because we do not yet have a DB and did not want to implement a transaction-like interface for the snm.

    • begin periodic slice updates and gpe assignments, maintain DB

  • Start SRM and bring up boards as they “report in”

    • Initialize Line Card to forward “default” (i.e. ssh and icmp) to CP

    • Initialize Hub: base and fabric switches; Initialize any switches not within the chassis

  • Start SLM and the ssh daemon

    • Remove the SLM slice configuration file, since it may contain old mappings
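The BCC steps above are implemented by scripts on the CP. As a rough illustration of the per-board configuration step, the sketch below derives dhcpd host stanzas and ethers entries from a nodeconf.xml shaped like the Node Configuration File shown later in this deck; the file names, option layout and helper name are assumptions, not the actual BCC code.

# Hedged sketch: derive dhcpd.conf host stanzas and an ethers file from nodeconf.xml.
# The XML layout follows the "Node Configuration File" slide; names here are illustrative.
import xml.etree.ElementTree as ET

def emit_boot_config(nodeconf_path="nodeconf.xml"):
    root = ET.parse(nodeconf_path).getroot()
    dhcp_stanzas, ethers_lines = [], []
    for board in root.find("components"):
        for iface in board.findall("interface"):
            hwaddr = (iface.findtext("hwaddr") or "").strip()
            ipaddr = (iface.findtext("ipaddr") or "").strip()
            if not hwaddr or not ipaddr:
                continue                # e.g. external links carry no internal address
            ethers_lines.append(hwaddr + " " + ipaddr)
            if iface.get("lanid", "").startswith("base"):   # only control interfaces PXE boot
                dhcp_stanzas.append("host %s { hardware ethernet %s; fixed-address %s; }"
                                    % (iface.get("name"), hwaddr, ipaddr))
    return "\n".join(dhcp_stanzas), "\n".join(ethers_lines)

if __name__ == "__main__":
    dhcp, ethers = emit_boot_config()
    print(dhcp)
    print(ethers)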


Booting SPP1: Example Configuration

Boot and configuration files kept on the CP:

  • /tftpboot/

    • ramdisk.gz

    • zImage.ppm10

    • bootcd.img

    • overlay_gpe1.img

    • overlay_gpe2.img

    • pxelinux.0

    • pxelinux.cfg/

      • C0A82031

      • C0A82041

  • /etc/

    • dhcpd.conf

    • ethers

  • /var/www/html/boot/

    • index.html

    • bootmanager.sh

    • bootstrapfs-planetlab-i386.tar.bz2

[Figure: SPP1 internal layout and addressing. The CP (externally drn05.arl.wustl.edu, 128.252.153.209 on the ARL network) runs IP routing and proxy ARP for drn05; its control interface is cp_ctrl = 192.168.32.1/20 and its data interface sits on the fabric, with external traffic carried over VLAN 2 toward the gateway at 128.252.153.31. The Hub is at 192.168.32.17. GPE1 (slot 4) has gpe1_ctrl = 192.168.32.65/20 and internal fabric interface gpe1_int = 172.16.1.65/26; GPE2 (slot 3) has gpe2_ctrl = 192.168.32.49/20 and gpe2_int = 172.16.1.66/26. The Line Card (slot 6) XScales use lc_b1a = 192.168.32.97/20 and lc_b1b = 192.168.32.98/20; the NPE is in slot 5. myPLC runs on drn06.arl.wustl.edu; Ebony is also shown on the external side.]


Example Configuration, SPP3

Boot and configuration files kept on the CP:

  • /tftpboot/

    • ramdisk.gz

    • zImage.ppm10

    • bootcd.img

    • overlay_gpe1.img

    • overlay_gpe2.img

    • pxelinux.0

    • pxelinux.cfg/

      • C0A82031

      • C0A82041

  • /etc/

    • dhcpd.conf

    • ethers

  • /var/www/html/boot/

    • index.html

    • bootmanager.sh

    • bootstrapfs-planetlab-i386.tar.bz2

[Figure: SPP3 internal layout and addressing. The node is externally spp3.arl.wustl.edu (128.252.153.3); the base/control network uses 192.168.0.0/20, with cp_ctrl = 192.168.0.1/20 and the Hub at 192.168.0.17. GPE1 (slot 3) has gpe1_ctrl = 192.168.0.49/20 and gpe1_int = 172.16.1.65/26; GPE2 (slot 4) has gpe2_ctrl = 192.168.0.65/20 and gpe2_int = 172.16.1.66/26. The Line Card (slot 6) XScales use lc_b1a = 192.168.0.97/20 and lc_b1b = 192.168.0.98/20; the NPE is in slot 5. The CP does IP routing and proxy ARP for the external address, with external traffic entering over VLAN 2 from the ARL network (other external addresses shown: 128.252.153.34 and 128.252.153.39). myPLC runs on drn06.arl.wustl.edu; cp5.arl.wustl.edu is also shown on the external side.]


Bootcd File System

  • /

    • bin/

    • dev/

    • home/

    • lib/

    • ...

    • etc/

      • init.d/

        • pl_boot, pl_netinit, pl_validateconf, pl_sysinit, pl_hwinit

      • ...

    • ...

    • root/

    • selinux/

    • sys/

    • usr/

  • pl_boot: modified to not use SSL or PGP when retrieving the BootManager script from the cp

  • pl_netinit: sets boot_server to reference the cp

  • pl_validateconf: added SPP specific variables


Overlay Image

  • /

    • etc/{issue, passwd}

    • kargs.txt

    • pl_version

    • usr/

      • isolinux

      • boot/

        • spp_netinit.py, ethers, spp_conf.txt

        • boot_server, boot_server_port, boot_server_path

        • plnode.txt, cacert.pem, plc_config, pubring.gpg

        • backup/

          • boot_server, boot_server_path, boot_server_port, cacert.pem, pubring.gpg

      • bootme/

        • BOOTPORT, BOOTSERVER, BOOTSERVER_IP, ID

        • cacert/drn06.arl.wustl.edu/cacert.pem

  • Changed to list the cp as the boot server, on port 81

  • Added SPP initialization script and config files

  • Changed plnode.txt to list this GPE's MAC address for the control interface


GPE Configuration File: spp_conf.txt

# Config name: spp1.txt

[ nserv ]
ctrl_ipaddr=192.168.32.1
ctrl_hwaddr=00:1E:C9:FE:76:22
data_ipaddr=172.16.1.1
data_hwaddr=00:1E:C9:FE:76:23

[ domain ]
hostname=drn05
domain=arl.wustl.edu
dns1=128.252.133.45
dns2=128.252.120.1
gateway=128.252.153.31

[ hosts ]
nserv_f1.0=172.16.1.1
nserv=192.168.32.1
nserv_gbl=192.168.48.1
shmgr=192.168.48.2
hub=192.168.32.17
hub1_f1.0=172.16.1.2
hub1_m.0=192.168.48.17
gpe1_f1.0=172.16.1.3
gpe1_f1.1=172.16.1.65
gpe1_b1.0=192.168.32.65
gpe2_f1.0=172.16.1.4
gpe2_f1.1=172.16.1.66
gpe2_b1.0=192.168.32.49
npe1_f1.0=172.16.1.5
npe1_b1.0=192.168.32.81
npe1_m.0=192.168.48.81
npe1_b1.1=192.168.32.82
lc_f1.0=172.16.1.6
lc_b1.0=192.168.32.97
lc_m.0=192.168.48.97
lc_b1.1=192.168.32.98
drn05.arl.wustl.edu=128.252.153.209

[ iface ]
__name__=eth0
dev=eth0
name=gpe1_f1.0
hwaddr=00:0e:0c:85:e4:40
type=data
lanid=fabric1
port=0
vlan=0
ipaddr=172.16.1.3
ipnet=172.16.1.0
ipbcast=172.16.1.63
ipmask=255.255.255.192
arp=no
enable=yes

[ iface ]
__name__=eth0.2
dev=eth0.2
name=gpe1_f1.0
hwaddr=00:0e:0c:85:e4:40
vlan=2
type=data
lanid=fabric1
port=0
ipaddr=128.252.153.209
ipnet=128.252.0.0
ipbcast=128.252.255.255
ipmask=255.255.0.0
arp=no
enable=yes

[ iface ]
__name__=eth1
dev=eth1
name=gpe1_f1.1
hwaddr=00:0e:0c:85:e4:42
type=data
lanid=fabric1
port=1
vlan=0
ipaddr=172.16.1.65
ipnet=172.16.1.64
ipbcast=172.16.1.127
ipmask=255.255.255.192
arp=no
enable=yes

[ iface ]
__name__=eth2
dev=eth2
name=gpe1_b1.0
hwaddr=00:0e:0c:85:e4:3e
type=control
lanid=base1
port=0
vlan=0
ipaddr=192.168.32.65
ipnet=192.168.32.0
ipbcast=192.168.39.255
ipmask=255.255.248.0
arp=yes
enable=yes
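Because the [ iface ] section repeats once per interface, this file cannot be fed straight to a stock INI parser. A minimal parser sketch; the function name, file path and return shape are assumptions reused by later sketches in this deck.

# Hedged sketch of a parser for the spp_conf.txt format above: bracketed section headers
# (repeatable, e.g. several "[ iface ]" blocks), key=value lines, and '#' comments.
def parse_spp_conf(path="spp_conf.txt"):
    sections = []                       # list of (section_name, {key: value}) in file order
    current = None
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            if line.startswith("[") and line.endswith("]"):
                current = (line[1:-1].strip(), {})
                sections.append(current)
            elif "=" in line and current is not None:
                key, _, value = line.partition("=")
                current[1][key.strip()] = value.strip()
    return sections

if __name__ == "__main__":
    for name, values in parse_spp_conf():
        if name == "iface":
            print("interface", values.get("__name__"), "->", values.get("ipaddr"))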


ethers

# ----------------------------------------------------------------------
# Board Type cp, Name cp1, Slot 0
# nserv_f1.0 fabric1/0
00:1E:C9:FE:76:23 172.16.1.1
# nserv base1/0
00:1E:C9:FE:76:22 192.168.32.1
# nserv_gbl maint/0
00:10:18:32:00:76 192.168.48.1
# ----------------------------------------------------------------------
# Board Type shmgr, Name shmgr1, Slot 0
# shmgr maint/0
00:50:C2:3F:D2:74 192.168.48.2
# ----------------------------------------------------------------------
# Board Type hub, Name hub1, Slot 1
# hub base1/0
00:00:50:3D:10:6B 192.168.32.17
# hub1_f1.0 fabric1/0
00:00:50:3D:10:B0 172.16.1.2
# hub1_m.0 maint/0
00:00:50:3D:10:6C 192.168.48.17
# ----------------------------------------------------------------------
# Board Type gpe, Name gpe1, Slot 4
# gpe1_f1.0 fabric1/0
00:0e:0c:85:e4:40 172.16.1.3
# gpe1_f1.1 fabric1/1
00:0e:0c:85:e4:42 172.16.1.65
# gpe1_b1.0 base1/0
00:0e:0c:85:e4:3e 192.168.32.65
# ----------------------------------------------------------------------
# Board Type gpe, Name gpe2, Slot 3
# gpe2_f1.0 fabric1/0
00:0E:0C:85:E6:08 172.16.1.4
# gpe2_f1.1 fabric1/1
00:0E:0C:85:E6:0A 172.16.1.66
# gpe2_b1.0 base1/0
00:0E:0C:85:E6:06 192.168.32.49
# ----------------------------------------------------------------------
# Board Type npe, Name npe1, Slot 5
# npe1_f1.0 fabric1/0
00:00:00:00:00:00 172.16.1.5
# npe1_b1.0 base1/0
00:00:50:3d:07:3e 192.168.32.81
# npe1_m.0 maint/0
00:00:50:3D:07:3C 192.168.48.81
# npe1_b1.1 base1/1
00:00:50:3D:07:3D 192.168.32.82
# ----------------------------------------------------------------------
# Board Type lc, Name lc1, Slot 6
# lc_f1.0 fabric1/0
00:00:50:3d:0b:d4 172.16.1.6
# lc_b1.0 base1/0
00:00:50:3D:08:26 192.168.32.97
# lc_m.0 maint/0
00:00:50:3D:08:24 192.168.48.97
# lc_b1.1 base1/1
00:00:50:3D:08:25 192.168.32.98
# ----------------------------------------------------------------------
# Gateway for drn05 (128.252.153.209), VLAN 2
00:00:50:3d:0b:d4 128.252.153.31
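The ethers file pairs each MAC address with the IP address it answers for; the per-board init scripts presumably turn these into static ARP entries on the noarp fabric interfaces. A hedged sketch using the standard arp(8) command; the interface name and file path are assumptions.

# Hedged sketch: install static ARP entries from an ethers-style file
# (MAC then IP per line, '#' comments ignored). Device and path are illustrative.
import subprocess

def load_ethers(path="ethers"):
    entries = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            mac, ip = line.split()[:2]
            entries.append((ip, mac))
    return entries

def install_static_arp(entries, dev="eth0"):
    for ip, mac in entries:
        # equivalent to: arp -i <dev> -s <ip> <mac>
        subprocess.run(["arp", "-i", dev, "-s", ip, mac], check=False)

if __name__ == "__main__":
    install_static_arp(load_ethers())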


BootAPI Calls Made by the BootManager

  • PLCAPI/BootAPI calls

    • GetSession(node_id, auth, node_ip): returns a new session key for the node

    • BootCheckAuthentication(Session): returns true if the Session id is valid

    • GetNodes(Session, node_id, ['nodegroup_ids', 'nodenetwork_ids', 'model', 'site_id']): returns the indicated parameters for this node (i.e. node_id)

    • GetNodeNetworks(Session, node_id, nodenetwork_ids): returns a list of interfaces [broadcast, network, ip, dns1, dns2, hostname, netmask, gateway, nodenetwork_id, method, mac, node_id, is_primary, type, bwlimit, nodenetwork_settings_ids]

    • GetNodes(Session, node_id, 'nodegroup_ids'): returns the list of group ids associated with this node

    • GetNodeGroups(Session, nodegroup_id, 'name'): returns the name string for each node group (in our case 'SPP')

    • GetNodeNetworkSettings()

    • BootUpdateNode(Session, boot_state): sets the node's boot state at PLC

    • BootNotifyOwners(Session, "event", params): causes email to be sent to the list of node owners

    • BootUpdateNode(Session, ssh_host_key): records the latest ssh public key for the node
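PLCAPI/BootAPI is an XML-RPC interface, so the calls above map directly onto a stock XML-RPC client. A minimal sketch against the proxy plc_api server on the CP; the proxy URL, node_id and session values are placeholders, and the argument lists simply follow the signatures above.

# Hedged sketch: exercise a few of the BootAPI calls listed above over XML-RPC.
# The proxy URL and the session/node_id values are illustrative placeholders.
import xmlrpc.client

plc = xmlrpc.client.ServerProxy("http://cp/PLCAPI/", allow_none=True)

session = "SESSION-KEY"    # normally obtained via GetSession(node_id, auth, node_ip)
node_id = 1

if plc.BootCheckAuthentication(session):
    # assuming GetNodes returns a one-element list for this node_id
    node = plc.GetNodes(session, node_id,
                        ["nodegroup_ids", "nodenetwork_ids", "model", "site_id"])[0]
    for net in plc.GetNodeNetworks(session, node_id, node["nodenetwork_ids"]):
        print(net["ip"], net["mac"], net["is_primary"])
    plc.BootUpdateNode(session, "boot")    # set the node's boot_state, per the signature above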


Other PLC/Server Interactions

  • HTTP/HTTPS

    • Upload Alpina boot logs: BOOT_SERVER_URL += /alpina-logs/upload.php

    • Compatibility step (we don't use it): BOOT_SERVER_URL += /alpina-BootLVM.tar.gz, BOOT_SERVER_URL += /alpina-PartDisk.tar.gz

    • Download the file system tar file containing the basic plab node environment: BOOT_SERVER_URL += /boot/bootstrapfs-"group"-"arch".tar.bz2

    • If the node id is not in the config file, get it from: BOOT_SERVER_URL += /boot/getnodeid.php

    • Get the yum update configuration file: BOOT_SERVER_URL += /PlanetLabConf/yum.conf.php
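These are plain HTTP(S) fetches against BOOT_SERVER_URL (on an SPP node, the CP). A minimal sketch; the server URL, group and arch values are illustrative placeholders built from the paths listed above.

# Hedged sketch: fetch BootManager support files from the boot server.
# BOOT_SERVER_URL, the node group and the architecture are placeholders.
import urllib.request

BOOT_SERVER_URL = "http://cp:81"   # the CP acts as the boot server (port 81, per the overlay image slide)
group, arch = "planetlab", "i386"

def fetch(path, dest):
    with urllib.request.urlopen(BOOT_SERVER_URL + path) as resp, open(dest, "wb") as out:
        out.write(resp.read())

fetch("/boot/bootstrapfs-%s-%s.tar.bz2" % (group, arch), "bootstrapfs.tar.bz2")
fetch("/PlanetLabConf/yum.conf.php", "yum.conf")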


System Initialization: Stage 1

  • Use PXE boot and download pxelinux and config file:

    • boot using basic initial ramdisk, overlay and kernel

    • Use the dhcp, tftp and pxe server on the cp; the files are stored in the tftpboot directory: pxelinux.0, pxelinux.cfg/<GPE_IPADDR>, bootcd.img, overlay_gpeX.img, kernel

    • The overlay image is modified for each GPE to include its configuration file, modified PlanetLab config files and an spp node python script.

      • Currently this is a manual step, but the ultimate (long term) plan is for the gnm daemon to create the individual images

      • The overlay image contains several files that identify the node and provide the name and address for the PLC and Boot servers. I have modified these to point to the cp.

      • Just before booting the final kernel I change these values to refer to the “real” plc/api servers.
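pxelinux looks for a per-host file in pxelinux.cfg/ named after the client's IP address written as eight upper-case hex digits, which is where the C0A82031 and C0A82041 file names come from. A small sketch of the mapping, using the GPE control addresses from the SPP1 example configuration:

# pxelinux.cfg file names are the client IP address as eight upper-case hex digits.
import ipaddress

def pxelinux_cfg_name(ip):
    return format(int(ipaddress.IPv4Address(ip)), "08X")

print(pxelinux_cfg_name("192.168.32.49"))   # C0A82031  (gpe2_ctrl on SPP1)
print(pxelinux_cfg_name("192.168.32.65"))   # C0A82041  (gpe1_ctrl on SPP1)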


System Initialization: Stage 2

  • Boot into basic, intermediate environment

  • Initial configuration information obtained from the overlay image

    • Includes spp_conf.txt, which defines the GPE interfaces

    • Includes the ethers file, which contains the MAC addresses for static ARP entries

    • Includes an updated plnode.txt with the GPE's control-interface MAC address

    • Includes modified bootserver files listing the cp as the bootserver

    • Includes spp_netinit.py, a python script to configure the interfaces and update system configuration files.

  • Enables the "primary" interface and writes key network configuration files such as resolv.conf

  • Downloads BootManager source from the “boot_server”

    • In our case we download from the CP

    • I explicitly disable the use of SSL and certs (the certificates on the overlay image are for the PLC server, not the CP)

    • Our assumption is that the control (base) network is "secure"; in addition, within an SPP node we don't have to worry about authentication issues.
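As one concrete example of the "key network configuration files" step, the [ domain ] section of spp_conf.txt carries everything needed for resolv.conf. A hedged sketch, reusing the parse_spp_conf() helper sketched on the spp_conf.txt slide; the output path is illustrative.

# Hedged sketch: write resolv.conf from the [ domain ] section of spp_conf.txt.
# 'sections' is the [(name, {key: value}), ...] list produced by the parse_spp_conf() sketch.
def write_resolv_conf(sections, path="resolv.conf"):
    domain = next(values for name, values in sections if name == "domain")
    lines = ["search " + domain["domain"]]
    lines += ["nameserver " + domain[k] for k in ("dns1", "dns2") if k in domain]
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")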


BootManager

  • Opens connection to PLCAPI on bootserver

    • Opens connection to our proxy plcapi/bootapi server running on the CP

  • Get node session key: GetSession(node_id, auth, node_ip)

    • Since each call to create a session invalidates any existing keys we intercept this call on the cp and use a common session key for all gpes.

  • Determines node’s configuration

    • reads plnode.txt for node_id, node_key and the primary interface settings

      • we use DHCP to configure the control interface but I do not define a dns server

    • if node_id is not found then reads URL=BootServer/boot/getnodeid.php

  • Call BootCheckAuthentication(Session) to verify session key

  • Calls GetNodes to get the boot_state, node_groups, model, site_id

  • Calls GetNodeNetworks to get configuration information for all interfaces

    • in our case the call would return the externally visible network parameters, which differ from how each GPE is configured

    • long term, we can intercept this call and return GPE specific interface config info.

    • Short term we use a configuration file in the overlay image with similarly formatted information. I have replaced the BootManager code that reads the config info and configures the interfaces.

    • I had to add support for VLANs and our internal interfaces.


BootManager, Continued

  • Download the node's final filesystem image from the boot_server

    • in our case this is the CP: http://CP/boot/bootstrapfs-planetlab-i386.tar.bz2

  • Download yum config file

    • I am not currently downloading, http://CP/PlanetLabConf/yum.conf

  • Call BootUpdateNode with new boot_state

    • we will need to intercept this call and both report and set node state based on all GPEs.

  • Call BootNotifyOwners with new state

    • forward to PLC

  • Update network configuration in new “sysimg”

    • downloads the PlanetLabConf/plc_config file from the BootServer

      • In our case I have copied it onto the overlay image in the /usr/boot directory.

    • calls GetNodeNetworkSettings for a list of any additional interface attributes then creates various configuration files: hosts, resolv.conf, network, ifcfg-eth*

      • I have replaced this step with our own script spp_netinit.py and configuration file spp_conf.txt which I use to create the same config files in both the current environment and the new sysimg.

    • updates devices and creates the initrd image used for the next stage

    • finally boots a new kernel using the bootstrap file system
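For the network-configuration step, each [ iface ] block in spp_conf.txt maps fairly directly onto a Red Hat style ifcfg file under /etc/sysconfig/network-scripts. A hedged sketch of that translation; the key set and output directory are common initscripts conventions, not the actual spp_netinit.py code.

# Hedged sketch: turn one parsed [ iface ] section from spp_conf.txt into an ifcfg-<dev> file.
import os

def write_ifcfg(iface, outdir="network-scripts"):
    dev = iface["dev"]
    keys = {
        "DEVICE": dev,
        "BOOTPROTO": "static",
        "IPADDR": iface["ipaddr"],
        "NETMASK": iface["ipmask"],
        "ONBOOT": "yes" if iface.get("enable", "yes") == "yes" else "no",
        "ARP": "yes" if iface.get("arp", "yes") == "yes" else "no",
    }
    if "." in dev:                      # e.g. eth0.2 is a VLAN sub-interface
        keys["VLAN"] = "yes"
    with open(os.path.join(outdir, "ifcfg-" + dev), "w") as f:
        for key, value in keys.items():
            f.write("%s=%s\n" % (key, value))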


Boot States

  • The list of boot states is changing as I write this

  • In our version of the plc the states are shown on the right


PLC Database

  • The PlanetLab Central (PLC) database describes all nodes, slices and users/people.

  • The slice database keeps track of all slices and their node bindings

  • The Node database includes externally visible properties and the ability to associate general attributes with these properties

    • the current (or next) node state (boot_state)

    • node identifier (node_id)

    • list of interface configuration parameters

      • ip address information, mac address, generic list of attributes

    • node’s owner

    • node’s site identifier (site_id)

    • model: can be used to specify a set of attributes for the node, for example minhw, smp

    • current ssh host key (ssh_host_key)

    • node groups: I believe this is being deprecated in favor of associating a generic set of attributes with a node or its interfaces.


SPP-Specific Information

  • On an SPP node the resource manager needs to know what kind of board is inserted in each slot and its I/O characteristics

  • It needs to associate interface MAC addresses with boards and interfaces, or with a standalone system connected to an RTM or front panel (for example the CP).

  • It also needs to know which interfaces are connected to the base switch and which to the fabric switch when bringing up general-purpose systems.

  • There is no convenient mechanism for determining this at run time, so I use a configuration file.

  • It also needs to know what resources are available on each board and the allocation policies.

  • It must also have a list of external links, their addresses and the address of any peers (Ethernet).

  • It needs to keep track of the current node state (as kept by PLC) as well as the state of each individual board.

  • Need to share state between different daemons


Node Configuration File

<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<spp>
  <code_options>
    <IPv4 sram="fixed" queues="variable" id="0" fltrs="variable"> <sram> 1024 </sram> </IPv4>
    <I3 sram="fixed" queues="variable" id="1" fltrs="variable"> <sram> 1024 </sram> </I3>
  </code_options>
  <components>
    <cp name="cp1" slot="0" cat="host" alias="nserv">
      <interface name="nserv_f1.0" dev="GigE" lanid="fabric1" assoc="" port="0"> ... </interface> ...
    </cp>
    <shmgr name="shmgr1" slot="0" cat="atca" alias="shmgr1">
      <interface name="shmgr" dev="GigE" lanid="maint" assoc="" port="0"> ... </interface> ...
    </shmgr>
    <hub name="hub1" slot="1" cat="atca" alias="hub1">
      <switch lanid="base1"> </switch> <switch lanid="fabric1"> <bw> 10000000000 </bw> </switch>
      <interface name="hub" dev="GigE" lanid="base1" assoc="" port="0"> ... </interface> ...
    </hub>
    <gpe name="gpe1" slot="4" cat="atca" alias="gpe1">
      <interface name="gpe1_f1.0" dev="GigE" lanid="fabric1" assoc="" port="0"> ... </interface> ...
    </gpe>
    <npe name="npe1" slot="5" cat="atca" alias="npe1">
      <product> Radisys_7010 </product> <model> NPEv1 </model>
      <interface name="npe1_f1.0" dev="10GigE" lanid="fabric1" assoc="" port="0"> ... </interface> ...
    </npe>
    <lc name="lc1" slot="6" cat="atca" alias="lc">
      <product> Radisys_7010 </product> <model> LCv1 </model>
      <interface name="lc_f1.0" dev="10GigE" lanid="fabric1" assoc="" port="0"> ... </interface> ...
      <interface name="drn05" dev="GigE" lanid="external" port="0"> ...
        <link peering="true" primary="true" dev="GigE"> ... </link> ...
      </interface>
    </lc>
  </components>
</spp>
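Since the node configuration currently lives in this XML file, the control software can load it with a standard XML parser. A minimal sketch that walks the file and lists each board and its interfaces; the element and attribute names follow the file above, everything else is illustrative.

# Hedged sketch: walk nodeconf.xml and print each board and its interfaces.
import xml.etree.ElementTree as ET

def list_components(path="nodeconf.xml"):
    root = ET.parse(path).getroot()
    for board in root.find("components"):
        print("%s %s slot=%s cat=%s" % (board.tag, board.get("name"),
                                        board.get("slot"), board.get("cat")))
        for iface in board.findall("interface"):
            print("  %s dev=%s lanid=%s port=%s" % (iface.get("name"), iface.get("dev"),
                                                    iface.get("lanid"), iface.get("port")))

if __name__ == "__main__":
    list_components()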


CP Record

<!-- Interface parameters defined by user in original “xml” file -->

<cp name="cp1" slot="0" cat="host" alias="nserv">

<interface name="nserv_f1.0" dev="GigE" lanid="fabric1" assoc="" port="0">

<!-- All internal IP addrs assigned by configuration software based on runtime parameters -->

<ipaddr>172.16.1.1</ipaddr> <ipnet>172.16.1.0</ipnet>

<ipmask>255.255.255.192</ipmask> <ipbcast>172.16.1.63</ipbcast>

<!-- Device parameters and comment set by user in the original “xml” file -->

<device> eth0 </device> <hwaddr> 00:1E:C9:FE:76:23 </hwaddr>

<desc> Interface connected to HUB's fabric port </desc>

</interface>

<interface name="nserv" dev="GigE" lanid="base1" assoc="" port="0">

<ipaddr>192.168.32.1</ipaddr> <ipnet>192.168.32.0</ipnet>

<ipmask>255.255.248.0</ipmask> <ipbcast>192.168.39.255</ipbcast>

<device> eth1 </device> <hwaddr> 00:1E:C9:FE:76:22 </hwaddr>

<desc> System control processor's Base Ethernet connection </desc>

</interface>

<interface name="nserv_gbl" dev="GigE" lanid="maint" assoc="" port="0">

<ipaddr>192.168.48.1</ipaddr> <ipnet>192.168.48.0</ipnet>

<ipmask>255.255.248.0</ipmask> <ipbcast>192.168.55.255</ipbcast>

<device> eth2 </device> <hwaddr> 00:10:18:32:00:76 </hwaddr>

<desc> Connection to the maintenance ports </desc>

</interface>

</cp>


GPE Record

<gpe name="gpe1" slot="4" cat="atca" alias="gpe1">

<interface name="gpe1_f1.0" dev="GigE" lanid="fabric1" assoc="" port="0">

-- IP Address Info --

<device> eth0 </device> <hwaddr> 00:0e:0c:85:e4:40 </hwaddr> (Device Data)

<bw> 1000000000 </bw><share> 2 </share> (Resource Policy)

<desc> MAC=N+2, Fabric 1/0 or AMC Port 0 </desc></interface>

<interface name="gpe1_f1.1" dev="GigE" lanid="fabric1" assoc="" port="1">

-- IP Address Info --<device> eth1 </device> <hwaddr> 00:0e:0c:85:e4:42 </hwaddr>

<desc> MAC=N+4, Fabric 1/1 or Maintenance Port 1 </desc></interface>

<interface name="gpe1_b1.0" dev="GigE" lanid="base1" assoc="" port="0">

-- IP Address Info --<device> eth2 </device> <hwaddr> 00:0e:0c:85:e4:3e </hwaddr>

<desc> MAC=N, Base connection to Primary HUB </desc></interface>

<interface name="gpe1_b2.0" dev="GigE" lanid="base2" assoc="" port="0">

-- IP Address Info --<device> eth3 </device> <hwaddr> 00:0e:0c:85:e4:3f </hwaddr>

<desc> MAC=N+1, Base connection to alternate HUB </desc></interface>

<interface name="gpe1_f2.0" dev="GigE" lanid="fabric2" assoc="" port="0">

-- IP Address Info --<device> eth4 </device> <hwaddr> 00:0e:0c:85:e4:41 </hwaddr>

<desc> MAC=N+3, Fabric 2/0 or AMC Port 1 </desc></interface>

<interface name="gpe1_f2.1" dev="GigE" lanid="fabric2" assoc="" port="1">

-- IP Address Info --<device> eth5 </device> <hwaddr> 00:0e:0c:85:e4:43 </hwaddr>

<desc> MAC=N+5, Fabric 2/1 or Maintenance Port 2 </desc></interface>

</gpe>


NPE Record

<npe name="npe1" slot="5" cat="atca" alias="npe1">

<product> Radisys_7010 </product> <model> NPEv1 </model>

<interface name="npe1_f1.0" dev="10GigE" lanid="fabric1" assoc="" port="0">

-- IP Address Info --

-- Device Data --

-- Resource Policy --

<desc> Fabric interface used for both NPUs </desc></interface>

<interface name="npe1_b1.0" dev="GigE" lanid="base1" assoc="npua" port="0">

-- IP Address Info --

-- Device Data --

<desc> Primary control interface associated with NPUA </desc></interface>

<interface name="npe1_m.0" dev="GigE" lanid="maint" assoc="npua" port="0">

-- IP Address Info --

-- Device Data --

<desc> NPUA Front Maintenance Port </desc></interface>

<interface name="npe1_b1.1" dev="GigE" lanid="base1" assoc="npub" port="1">

-- IP Address Info --

-- Device Data --

<desc> NPUB Front Maintenance Port -- But it's been patched to the Base switch </desc>

</interface>

</npe>


LC Record

<lc name="lc1" slot="6" cat="atca" alias="lc">

<product> Radisys_7010 </product> <model> LCv1 </model> (Model Data)

<interface name="lc_f1.0" dev="10GigE" lanid="fabric1" assoc="" port="0">

-- IP Address Info -- -- Device Data -- -- Resource Policy --</interface>

<interface name="lc_b1.0" dev="GigE" lanid="base1" assoc="npua" port="0">

-- IP Address Info -- -- Device Data -- </interface>

<interface name="lc_m.0" dev="GigE" lanid="maint" assoc="npua" port="0">

-- IP Address Info -- -- Device Data -- </interface>

<interface name="lc_b1.1" dev="GigE" lanid="base1" assoc="npub" port="1">

-- IP Address Info -- -- Device Data --</interface>

<interface name="drn05" dev="GigE" lanid="external" port="0">

<hwaddr> 00:00:50:29:b1:46 </hwaddr>

<link peering="true" primary="true" dev="GigE">

-- Link IP Address Info -- -- Device Data -- -- Resource Policy --

<domain> arl.wustl.edu </domain> <hostname> drn05 </hostname>

<dns1> 128.252.133.45 </dns1> <dns2> 128.252.120.1 </dns2>

<peerIP> 128.252.153.31 </peerIP> <peerMAC> 00:0F:B5:FB:D8:67 </peerMAC>

<vlan> 2 </vlan>

<port_pool> <!-- used for NAT -->

<udp count="500" start="30000"> </udp>

<tcp count="500" start="30000"> </tcp> </port_pool>

<desc> p2p link from drn05 to drn06, the plc </desc></link></interface>

</lc>


SRM Interface

  • NATD to SRM:

    • [egress_map, ingress_map] get_sched_map(LinkIP, BoardMAC)

    • Deprecated (the original natd interface):

    • {fid, port} alloc_epmap(map)

    • status free_epmap(fid)

  • FS to SRM:

  • ?? (map vlan to slice id)

  • RMP to SRM:

  • Interfaces (Line Card Links):

    • if_list get_interfaces(plabID)

    • ifn get_ifn(plabID, ipaddr)

    • if_entry get_ifattrs(plabID, ifn) :

    • ipaddr get_ifpeer(plabID, ifn) :

    • retcode resrv_fpath_ifbw(bw, ifn)

    • retcode reles_fpath_ifbw(bw, ifn)

    • To be implemented:

    • retcode resrv_slice_ifbw(plabID, bw, ifn)

    • retcode reles_slice_ifbw(plabID, bw, ifn)

  • EndPoints (local IP and Port number):

    • NATD changes may have broken these

    • ep alloc_endpoint(PlabID, ep)

    • status free_endpoint(PlabID, ipaddr, port, proto)

  • Fast Path:

    • fp_params alloc_fastpath(PlabID, copt, bwspec, rcnts, mem)

    • status free_fastpath()

  • Fast-Path Meta-Interfaces:

    • [mi, ep] alloc_udp_tunnel(bw, ipaddr, port)

    • ep get_endpoint(mi)

    • status free_udp_tunnel(ipaddr, port)


RMP Interface

  • Prototype completed:

    • result noop()

    • version get_version()

    • result add_slice(plabID, len, name)

    • result rem_slice(plabID)

    • ret_t alloc_fastpath(copt, bw, rcnts, mem)

    • void free_fastpath()

    • if_list get_interfaces()

    • ifn get_ifn(ipaddr)

    • if_entry get_ifattrs(ifn)

    • ipaddr get_ifpeer(ifn)

    • retcode alloc_pl_ifbw(ifn, bw)

    • retcode reles_pl_ifbw(ifn, bw)

    • retcode alloc_fpath_ifbw(fpid, ifn, bw)

    • retcode reles_fpath_ifbw(fpid, ifn, bw)

    • retcode bind_queue(fpid, miid, list_type, qids)

    • actual_bw set_queue_params(fpid, qid, threshold, bw)

    • [threshold, bw] get_queue_params(fpid, qid)

    • [u32 Pkts, u32 Bytes] get_queue_len(fpid, qid)

  • To do:

    • ep alloc_endpoint(ep)

    • status free_endpoint(ipaddr, port, proto)

    • -- alloc_tunnel --

    • -- free_tunnel --

    • [mi, ep] alloc_udp_tunnel(fpid, bw, ip, port)

    • status free_udp_tunnel(ipaddr, port)

    • ep get_endpoint(fpid, mi)

    • retcode write_fltr(fpid, fid, fltr)

    • retcode update_result(fpid, fid, result)

    • fltr_t get_fltr_bykey(fpid, key)

    • fltr_t get_fltr_byfid(fpid, fid)

    • result lookup_fltr(fpid, key)

    • retcode rem_fltr_bykey(fpid, key)

    • retcode rem_fltr_byfid(fpid, fid)

    • stats_t read_stats(fpid, sindx, flags)

    • result clear_stats(sindx)

    • handle create_periodic(fp,indx,P,cnt,flags)

    • retcode delete_periodic(fpid, handle)

    • retcode set_callback(fpid, handle, xport)

    • stats_t get_periodic(fpid, handle)

    • retcode mem_write(fpid, offset[, len], data)

    • data mem_read(fpid, offset, len)


NPE SCD Interface

SRM to SCD:

status set_fastpath(fpid, copt, VLAN, params, mem)
status enable_fastpath(fpid)
status disable_fastpath(fpid)
status rem_fastpath(fpid)
status set_sched_params(sid, ifn, BWmax, BWmin)
status set_encap_cb(sid, srcIP, dMAC)
status set_fpmi_bw(fpid, sid, miid, bw)
status start_mes()
status stop_mes()
status set_encap_gpe(fpid, gpeIP, npeIP)
result write_mem(kpa, len, data)
data read_mem(kpa, len)

SRM & RMP to SCD:

ret_t write_fltr(dbid, fid, key, mask, result)
ret_t update_result(dbid, fid, result)
fltr get_fltr_bykey(dbid, key)
fltr get_fltr_byfid(dbid, fid)
result lookup_fltr(dbid, key)
retcode rem_fltr_bykey(dbid, key)
retcode rem_fltr_byfid(dbid, fid)

RMP to SCD:

status set_gpe_info(exPort, ldPort, exQID, ldQID)
u32 result bind_queue(u16 miid, u8 list_type, u16[] qid_list)
u32 bw set_queue_params(u16 qid, u32 threshold, u32 bw)
{u32 threshold, u32 bw} get_queue_params(u16 qid)
{u32 pktCnt, u32 byteCnt} get_queue_len(u16 qid)
result write_sram(offset, len, data)
data read_sram(offset, len)
stats = read_stats(sindx, flags)
result = clear_stats(sindx)
handle create_periodic(sindx, P, cnt, flags)
retcode del_periodic(handle)
retcode set_callback(handle, udp_port)
stats = get_periodic(handle)


LC SCD Interface

SRM to SCD:

status set_sched_params(sid, ifn, BWmax, BWmin)
status set_sched_mac(sid, MACdst, MACsrc)
u32 result set_queue_sched(u16 qid, u16 sid)
result write_mem(kpa, len, data)
data read_mem(kpa, len)

SRM & RMP to SCD:

ret_t write_fltr(dbid, fid, key, mask, result)
ret_t update_result(dbid, fid, result)
fltr get_fltr_bykey(dbid, key)
fltr get_fltr_byfid(dbid, fid)
result lookup_fltr(dbid, key)
retcode rem_fltr_bykey(dbid, key)
retcode rem_fltr_byfid(dbid, fid)

RMP to SCD:

u32 actual_bw set_queue_params(u16 qid, u32 threshold, u32 bw)
{u32 threshold, u32 bw} get_queue_params(u16 qid)
{u32 pktCnt, u32 byteCnt} get_queue_len(u16 qid)
stats = read_stats(sindx, flags)
result = clear_stats(sindx)
handle create_periodic(sindx, P, cnt, flags)
retcode del_periodic(handle)
retcode set_callback(handle, udp_port)
stats = get_periodic(handle)


Slice Example

  • Get the list of interfaces, their IP addresses and available bandwidth:

    if_list = {if_entry, ...}
    if_entry = {u16 ifn,      // logical interface number
                u16 type,     // peering or multi-access
                u32 ipaddr,   // interface's IP address
                u32 linkBW,   // link's native BW
                u32 availBW}  // BW available for allocation

    struct epoint_t {
        u32 ipaddr;           // interface's IP address
        u16 port;             // UDP port number for meta-interface
        u32 bw;               // total BW required for meta-interface
    };

    iflist = get_interfaces(iflist);  // return list of all available interfaces

  • Estimate the computational complexity and memory bandwidth requirements on the NPE:

    bwSpec = {BWmax=totalBW, BWmin=0};  // fast path total BW requirement

  • Maximum general NPE resource counts: for this example I just assume a fixed maximum, but in general a user may scale it by the number of meta-interfaces they will use.

    fpCounts = {FLTR_CNT, QID_CNT, BUFF_CNT, STATS_CNT};

  • Request the substrate to allocate a fastpath instance for the IPv4 code option, assuming the default SRAM buffer sizes. The slice will also need to listen on the returned sockets.

    [fpid, sockets] = alloc_fastpath(ipv4_copt, bwSpec, fpCounts, {IPV4_SRAM_SZ, 0});


Slice Example, Continued

  • Allocate one meta-interface for each external interface and assign our default UDP port number and BW requirement:

    struct mi_t {uint_t mi; epoint_t rp;};
    mi_t milist[iflist.len()];

    for (indx = 0, mi = 0; indx < len(iflist); ++indx) {
        if (miBW > iflist[indx].availBW) throw Error;
        // allocate the total BW required on this interface
        if (alloc_fpath_ifbw(fpid, iflist[indx].ifn, miBW) == -1)
            throw Error;
        // allocate one meta-interface on this interface
        milist[indx] = alloc_udp_tunnel(fpid, miBW, iflist[indx].ipaddr, myPort);
        my_bind_queues(milist + indx);
        my_add_routes(milist + indx);
    }


Test SPP Node

Files on the CP:

  • /etc/

    • dhcpd.conf

    • ethers

    • hosts


  • /tftpboot/

    • ramdisk.gz

    • zImage.ppm10

[Figure: test SPP node layout. The node is externally keystone.arl.wustl.edu (128.252.153.81); the CP runs srm, dhcpd, IP routing and proxy ARP for keystone, with cp_ctrl = 192.168.64.1/20 on the base network and a data interface on the fabric. The Hub is at 192.168.64.17. Four GPEs are installed (GPE1 in slot 2 through GPE4 in slot 5) with control addresses 192.168.64.33, .49, .65 and .81 (all /20) and internal fabric addresses 172.16.1.66 through .69 (/26); each keeps its own /etc/{ethers,hosts} and /etc/sysconfig/network-scripts/ifcfg-eth* files. The Line Card (slot 6) XScales run scd and natd, with lc_b1a = 192.168.64.97/20 and lc_b1b = 192.168.64.98/20. External traffic flows over VLAN 2 through the line card and RTM front-panel ports to a "Router" host on the ARL network (128.252.153.*).]

Issue: we mount /opt/crossbuild/* from ebony. We could export the directories from the "Router" host instead, or use ebony rather than "Router"; in that case we will need an external switch connecting the spp? line cards to ebony's eth2.2.


Test Bed Use

  • Core platform issues:

    • Can we use the second fabric port on the GPE boards?

    • The hub does not display stats or mac fwd entries for the slots with GPEs. It used to work.

    • The Radisys shelf manager

      • does not reliably reset boards

      • Base1 interface disabled on slot 2

  • NAT/Line Card testing

    • Overall reliability

    • Add support for aging

    • Specific issues (jdd)

      • restarting the line card (without a reboot) occasionally results in the data path thinking the scratch ring to the XScale is full

      • a looping iperf test from the cp occasionally stalls, with no packets getting through the LC

      • the lookup code needs a fix so it does not use the DONE bit to indicate a TCAM lookup is done

  • GPE/Intel board testing

    • Login