Session E: Zoni


Zoni

Richard Gass

Intel

Sessions:
  • (A) Intro 8.30-9.00
  • (B) Hadoop 9.00-10.00
  • Break 10.00-10.30
  • Hadoop 10.30-12.00
  • Lunch 12.00-1.30
  • Pig 1.30-2.00
  • (D) Tashi 2.00-3.00
  • Break 3.00-3.30
  • Zoni 3.30-4.45
  • Wrap up 4.45-5.00

Agenda
  • Overview
  • Plans/Status
  • User View
  • Administration
  • Installation
  • Summary
Open Cirrus Stack

[Diagram labels]
  • Compute + network + storage resources
  • Management and control subsystem
  • Power + cooling
  • Physical Resource set (Zoni) service

Credit: John Wilkes (HP)

Open Cirrus Stack

Zoni clients, each with their own “physical data center”
  • Eucalyptus
  • Tashi/HDFS
  • NFS storage service
  • Experiment

Base layer: Zoni service

Open Cirrus Stack

[Diagram: two virtual clusters shown above the Zoni clients]
  • Virtual clusters: Virtual cluster, Virtual cluster
  • Zoni clients: Eucalyptus, Tashi/HDFS, NFS storage service, Experiment
  • Base layer: Zoni service

Open Cirrus Stack

Application running
  • On Hadoop
  • On Tashi virtual cluster
  • On Zoni
  • On real hardware

[Diagram labels: Web Service, BigData App; Hadoop; Virtual cluster (x2); Eucalyptus, Tashi/HDFS, NFS storage service, Experiment; Zoni service]

Open Cirrus Stack - Zoni
  • Initial PRS implementation from HP
  • Re-write from Intel (in collaboration with HP) soon to be contributed to Apache Software Foundation
  • Zoni service goals
    • Provide mini-datacenters to users
    • Isolate mini-datacenters from each other
  • Zoni service approach
    • Allocate sets of physical co-located nodes, isolated inside VLANs.
  • Allow running without virtualization overhead
    • Necessary for predictable QoS
      • e.g. cache interference
Goals
  • Reduce complexity in allocating physical resources
  • Gain User Confidence
    • Show users that we can efficiently allocate/deallocate resources
  • Stop the squatting
    • Incentives
      • HP’s Tycoon (economic model)
      • Simple points scheme for good behavior or early return
Responsibilities of Zoni
  • Isolate domains (VLAN)
  • Provision system software (PXE)
  • Provide platform control, On/Off (IPMI)
  • Provide boot debug (IPMI)
VLAN
  • Virtual LAN technology allows a single physical network to appear as several isolated networks
    • Ethernet packets are tagged with a VLAN id
    • Switches and NICs enforce the policies associated with each VLAN
  • By associating Zoni domains with different VLANs, they can be isolated from each other
  • The Zoni system provides the interfaces necessary to abstract switch configuration programming across multiple switch vendors
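The tagging mentioned above can be made concrete with a small sketch. It assumes the common IEEE 802.1Q tag layout (a 0x8100 type field followed by 3 priority bits, 1 drop-eligible bit, and a 12-bit VLAN id); the function names are illustrative and are not part of Zoni.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def vlan_tag(vlan_id, priority=0, dei=0):
    """Return the 4-byte 802.1Q tag for a VLAN id (1-4094)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN id must be in 1..4094")
    # TCI = 3-bit priority, 1-bit drop-eligible flag, 12-bit VLAN id.
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def tag_frame(frame, vlan_id):
    """Insert the tag after the destination and source MAC (12 bytes)."""
    return frame[:12] + vlan_tag(vlan_id) + frame[12:]


def frame_vlan(frame):
    """Return the VLAN id of a tagged frame, or None if it is untagged."""
    tpid, tci = struct.unpack("!HH", frame[12:16])
    return tci & 0x0FFF if tpid == TPID_8021Q else None
```

A switch port configured for one VLAN would drop or ignore frames whose id, as read by something like `frame_vlan`, belongs to another domain; that is the isolation property Zoni relies on.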
PXE (Pre-eXecution Environment)
  • Enables provisioning of OS image over the network
  • On machine boot, the NIC firmware contacts a PXE server via the DHCP process for the appropriate kernel and initrd to load
  • Once loaded, the init scripts in the initrd can pull the filesystem to the machine
  • In our environment, we download the desired filesystem to a ramdisk from an NFS server, enabling very rapid provisioning (30 seconds or less) while leaving the host filesystem undisturbed
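As a rough illustration of per-node boot selection, the sketch below assumes the PXE server runs PXELINUX, which the slides do not state. PXELINUX looks for a per-machine config file named `01-` plus the MAC address in lowercase with dashes; the kernel, initrd, and label names used here are placeholders.

```python
def pxelinux_config_name(mac):
    """PXELINUX looks for '01-' plus the MAC, lowercase, dash-separated."""
    octets = mac.lower().replace("-", ":").split(":")
    if len(octets) != 6 or any(len(o) != 2 for o in octets):
        raise ValueError("expected a MAC such as 00:1A:2B:3C:4D:5E")
    return "01-" + "-".join(octets)


def pxelinux_entry(label, kernel, initrd, append=""):
    """Render a one-entry PXELINUX config that boots kernel + initrd."""
    lines = [
        "DEFAULT %s" % label,
        "LABEL %s" % label,
        "  KERNEL %s" % kernel,
        "  APPEND initrd=%s %s" % (initrd, append),
    ]
    # rstrip drops the trailing space left when no extra options are given.
    return "\n".join(line.rstrip() for line in lines) + "\n"
```

Writing the rendered entry to `pxelinux.cfg/<config name>` on the TFTP root is one way an image assignment could take effect on the node's next boot.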
IPMI (Intelligent Platform Management Interface)
  • Defines a standardized, abstracted, message-based interface to intelligent platform management hardware
  • Defines standardized records for describing platform management devices and their characteristics
  • Operates independently of the operating system
  • Enables cross-platform management
Some History
  • Previous prototype developed at HP Labs
  • Focus on economic model
  • Nice web interface which will be available upon reconvergence of code
Zoni Roadmap
  • Stage 1
      • Manages all cluster hardware
      • Handles resource provisioning
      • Provides interfaces for VLAN definition/programming
      • Administrator is still in the allocation decision-making loop
  • Stage 2
      • Introduces a request queue and primitive scheduler
      • Admin may still be in loop, definitely for special cases
      • Enables provisioning of OS to local disk
      • Enables virtual disk conversion to physical
  • Stage 3
      • Incentives module added (Tycoon)
      • Tashi integration
Zoni Roles
  • Admin: root of all authority
    • Controls the physical resources
  • User: requests domains
    • Controls the domain, once allocated
Domains
  • A Domain is the unit of Zoni isolation
  • A simple domain is a set of compute nodes gathered into a single VLAN
  • Nodes are allocated from pools of available resources
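The relationship between domains, VLANs, and pools can be sketched as a small data model. This is only an illustration of the three statements above; the class and method names are hypothetical and do not come from the Zoni code base.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """Unit of isolation: a named set of nodes inside one VLAN."""
    name: str
    vlan_id: int
    nodes: set = field(default_factory=set)


class Pool:
    """A pool of free nodes from which domains draw."""

    def __init__(self, nodes):
        self.free = set(nodes)

    def allocate(self, domain, count):
        """Move `count` free nodes into the domain and return their names."""
        if count > len(self.free):
            raise RuntimeError("not enough free nodes in pool")
        picked = sorted(self.free)[:count]
        self.free.difference_update(picked)
        domain.nodes.update(picked)
        return picked

    def release(self, domain):
        """Return every node of the domain to the pool."""
        self.free.update(domain.nodes)
        domain.nodes.clear()
```

Keeping the VLAN id on the domain rather than on each node reflects the idea that a simple domain maps to exactly one VLAN.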
Zoni Domains *

[Diagram: Domain 0 and Domain 1, separated by a boundary labeled ISOLATION, with a Gateway]
  • Domain 0: Domain 0 Services (DNS, PXE, DHCP, HTTP), Server Pool 0
  • Domain 1: Domain 1 Services (DNS, PXE, DHCP, HTTP), Server Pool 1

The Zoni Interface
  • Users and Admins currently interact with the Zoni system through a command line interface
  • This interface both:
    • Queries and updates records in the Zoni database
    • Wraps the various commands that must be issued to effect changes in the cluster
  • Zoni is currently a centralized system; users log into the Zoni manager to issue commands
    • An RPC interface is planned for the near future
Zoni Usage

Usage: zoni

Standard options:

--help [show this help message and exit]

--version [show program's version number and exit]

--verbose [be verbose]

Common options:

--nodeName [Specify node]

--switchPort [Specify switchport switchname:portnum]

Image Management Interface

--addImage [Add image to Zoni]

--delImage [Delete image]

User Allocation Interface

--createDomain

  • May fail if name already exists

--submitDomainRequest

--destroyDomain --domain

--requestNodes --domain [--count ] [--nodeName ] [--cores …]

  • Add the requested nodes to the domain

--assignImage

  • Assign image to resource

--associateNewVlan --domain

  • Allocate an unused VLAN number to domain

--createReservation

  • Specify duration of node reservation where start time may be “ASAP”

--reservationNotes “notes”

--updateReservation

Admin Allocation Interface

--allocateNode [Assign node to a user]

--releaseNode [Release node allocation]

--vlanIsolate [Specify vlan for isolation]

Hardware Control

--hardware [Make hardware call]

--powerStatus [Get power status]

--rebootNode [Reboot node (Soft)]

--powerCycle [Power Cycle (Hard)]

--powerOff [Power off node]

--powerOn [Power on node]
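The slides do not show how these options reach the hardware. As one possible sketch, assuming the nodes expose IPMI over LAN and the standard `ipmitool` utility is installed, the power options could map onto `ipmitool chassis power` subcommands. The mapping, function name, host, and user below are assumptions; `--rebootNode` (soft) is left out because a soft reboot is normally an OS-level action.

```python
# Hypothetical mapping from Zoni option names to ipmitool power subcommands.
POWER_ACTIONS = {
    "powerStatus": "status",
    "powerOn": "on",
    "powerOff": "off",
    "powerCycle": "cycle",
}


def ipmitool_argv(host, user, action):
    """Build the ipmitool command line for a power action.

    The -E flag makes ipmitool read the password from the
    IPMI_PASSWORD environment variable instead of the command line.
    """
    if action not in POWER_ACTIONS:
        raise ValueError("unsupported action: %s" % action)
    return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-E",
            "chassis", "power", POWER_ACTIONS[action]]
```

The returned list can be passed to `subprocess.run(argv, check=True)` with `IPMI_PASSWORD` set in the environment. Building the argument list separately keeps the mapping testable without touching real hardware.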

Query Interface

--showReservations [Show current node reservations]

--showResources [Show available resources to choose from]

--procs [Filter by number of processors]

--clock [Filter by processor clock]

--memory [Filter by amount of memory (Bytes)]

--cpuflags “flags” [Filter by CPU flags]

--cores [Filter by number of cores]

--showPxeImages [Show available PXE images to choose from]

--showPxeImageMap [Show PXE images host mapping]
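A minimal sketch of how the `--showResources` filters might combine is shown below. It assumes that numeric filters act as minimums and that every requested CPU flag must be present; the slides do not say whether Zoni matches exactly or by minimum, and the `Node` fields are illustrative rather than the real database schema.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class Node:
    name: str
    procs: int
    cores: int
    clock_mhz: int
    memory_bytes: int
    cpuflags: frozenset = frozenset()


def show_resources(nodes, procs=None, cores=None, clock_mhz=None,
                   memory_bytes=None, cpuflags=()):
    """Return the nodes that meet every filter that was supplied."""
    minimums = {"procs": procs, "cores": cores,
                "clock_mhz": clock_mhz, "memory_bytes": memory_bytes}
    required_flags = frozenset(cpuflags)
    matches = []
    for node in nodes:
        # A filter left as None is skipped entirely.
        meets_minimums = all(
            getattr(node, attr) >= wanted
            for attr, wanted in minimums.items() if wanted is not None)
        if meets_minimums and required_flags <= node.cpuflags:
            matches.append(node)
    return matches
```

For example, `show_resources(nodes, cores=8, memory_bytes=16 * GIB)` would keep only nodes with at least 8 cores and 16 GiB of memory.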

Administration Interface

--admin [Enter Admin mode]

--addPxeImage [Add PXE image to database]

--enableHostPort [Enable a switch port]

--disableHostPort [Disable a switch port]

--removeVlan [Remove vlan from all switches]

--createVlan [Create a vlan on all switches]

--addNodeToVlan [Add node to a vlan]

--removeNodeFromVlan [Remove node from a vlan]

--setNativeVlan [Configure native vlan]

--restoreNativeVlan [Restore native vlan]

--removeAllVlans [Removes all vlans from a switchport]

--sendSwitchCommand “” [Send Raw Switch Command, BE CAREFUL]

--interactiveSwitchConfig “” [Interactively configure a switch]

--showSwitchConfig [Show switch config for node]
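Options such as `--createVlan` and `--addNodeToVlan` have to work across switch vendors. One way to structure that, sketched here with hypothetical names, is a common driver interface with one implementation per switch type. The `RecordingSwitch` below only records state in memory and emits no vendor commands; a real driver would translate each call into the switch's own syntax.

```python
from abc import ABC, abstractmethod


class SwitchDriver(ABC):
    """Vendor-neutral operations that a switch module would implement."""

    @abstractmethod
    def create_vlan(self, vlan_id): ...

    @abstractmethod
    def remove_vlan(self, vlan_id): ...

    @abstractmethod
    def add_port_to_vlan(self, port, vlan_id): ...


class RecordingSwitch(SwitchDriver):
    """Stand-in driver that tracks VLAN membership without real hardware."""

    def __init__(self, name):
        self.name = name
        self.vlans = {}  # vlan id -> set of member ports

    def create_vlan(self, vlan_id):
        self.vlans.setdefault(vlan_id, set())

    def remove_vlan(self, vlan_id):
        self.vlans.pop(vlan_id, None)

    def add_port_to_vlan(self, port, vlan_id):
        if vlan_id not in self.vlans:
            raise KeyError("VLAN %d does not exist on %s" % (vlan_id, self.name))
        self.vlans[vlan_id].add(port)


def create_vlan_on_all(switches, vlan_id):
    """Mirror of --createVlan: define the VLAN on every switch."""
    for switch in switches:
        switch.create_vlan(vlan_id)
```

With this shape, supporting a new switch type means adding one subclass, which matches the later note that new switch types may require new Zoni modules.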

Typical Workflow
  • Admin queries available systems
  • Admin requests systems with desired user configuration
      • e.g., cores, memory, image, duration
  • Request goes in queue
  • Zoni locates resources and provides a list to admin/Tashi
  • Admin/Tashi moves VMs to free resources
      • Add node to blacklist and tell Hadoop to reload
  • Zoni allocates resources
      • Provides estimated time to get resources
      • User can query
      • Zoni sends notification when allocated
  • Zoni reclaims resources and adds them back into respective pools
      • User may extend time period before expiration
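The last two steps of the workflow (expiry, extension, and reclaiming) can be sketched as follows. This is an illustration under stated assumptions: durations are in days, an extension is only accepted before expiration, and the class and function names are hypothetical.

```python
from datetime import datetime, timedelta


class Reservation:
    def __init__(self, user, nodes, start, days):
        self.user = user
        self.nodes = list(nodes)
        self.expires = start + timedelta(days=days)

    def extend(self, now, days):
        """Push the expiry out, but only while the reservation is live."""
        if now >= self.expires:
            raise RuntimeError("reservation already expired")
        self.expires += timedelta(days=days)


def reclaim(reservations, free_pool, now):
    """Return nodes of expired reservations to free_pool.

    Returns the list of reservations that are still active.
    """
    active = []
    for res in reservations:
        if now >= res.expires:
            free_pool.update(res.nodes)
        else:
            active.append(res)
    return active
```

Running `reclaim` periodically is one simple way to implement the "stop the squatting" goal without waiting for users to return nodes themselves.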
[Diagram: System Servers and Management Servers hosting VMs; DB, Zoni server, Zoni client, PXE server, Tashi Cluster Manager; Administrator or Cluster Manager]
  • Zoni client queries Zoni server for available resources
  • User chooses machine attributes and submits a request for the resources for some time period
  • Zoni queries DB to locate available resources
  • Results are sent back to the client

Node 1: 8 cores, 16 GB memory, 6 TB disk, 30 days
Node 2: 8 cores, 16 GB memory, 6 TB disk, 30 days
Node 3: 8 cores, 16 GB memory, 6 TB disk, 90 days
Node 4: 8 cores, 16 GB memory, 6 TB disk, 1 day
Node 5: 8 cores, 8 GB memory, 2 TB disk, 90 days
Node 6: 8 cores, 8 GB memory, 2 TB disk, 90 days
Node 7: 8 cores, 8 GB memory, 2 TB disk, 90 days
Node 8: 8 cores, 8 GB memory, 2 TB disk, 90 days
Node 9: 8 cores, 8 GB memory, 2 TB disk, 90 days
Node 10: 8 cores, 8 GB memory, 2 TB disk, 30 days

[Diagram: the cluster view again, now with request R1 in the Request Queue]

[Diagram: the cluster view again]
  • Zoni processes the request and identifies physical machines that satisfy the user request

[Diagram: the cluster view again]
  • Zoni sends request to Tashi to free selected nodes
  • Tashi moves virtual machines off of selected nodes

[Diagram: the cluster view again, with four nodes now marked PXE]
  • Tashi notifies Zoni that migration of virtual machines has completed
  • Virtual disk image is converted to PXE image
  • Zoni reboots the physical machine and sets the PXE image to the user's VM
  • Physical machines boot up with PXE image
  • Zoni allocates the physical machines to the requesting user and isolates them from the network using VLANs

[Diagram: the cluster view again, with the allocated nodes marked PXE]
  • Zoni updates reservation database
  • Zoni client queries server for allocation
  • User connects to the machines and starts running experiments

After allocation
  • A returned Zoni node is typically untrusted
    • Update the system to default settings
      • Clean physical node by PXE booting a reset image
        • Restore all settings to defaults (address, IPMI passwords)
        • Repartition and format disks
  • (Option) Trust images from some users
    • No re-format needed
  • Clean network configuration (VLAN)
Example: Minicluster

./zoni --addimage amd64-rgass-testing:hardy:8.03

./zoni --assignimage amd64-rgass-testing --nodename r1r1u25

./zoni --allocatenode --nodename r1r1u25 --username rgass --reservationDuration 30 --vlanisolate 300 --notes "Practice allocation"

./zoni --addnodetovlan 300 --nodename r1r1u25

./zoni --hardware --rebootnode --nodename r1r1u25

Example: CloudConnect 1
  • Network isolate a rack of machines and PXE boot them with a user’s kernel and initrd
  • Create a VM that acts as a SSH gateway and a NAT for the private cluster
  • Dynamically configure switches to support the networking experiment
[Diagram: Racks A, B, C, and D, each with its own region and manager nodes (M); 100Mb/s switches, 1Gb/s switches, a 4Gb/s switch, and a 4x1Gb trunk link. VLAN #1: Electrical. VLAN #2: Optical. Legend: server, switch, manager.]
Example: CloudConnect 2

for i in r1r1u12 r1r1u13 r1r1u14 r1r1u15;do

./zoni --admin --setnativevlan 300 -n ${i}

./zoni --admin --addnodetovlan 800 -n ${i}

./zoni --admin --addnodetovlan 801 -n ${i}

./zoni --admin --addnodetovlan 802 -n ${i}

done

./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface range ethernet g(25-28); spanning-tree disable"

./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g25;switchport mode trunk;exit"

./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g26;switchport mode trunk;exit"

./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g27;switchport mode trunk;exit"

./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g28;switchport mode trunk;exit"

./zoni --admin --switchport sw0-r1r1:25 --setnativevlan 802 -v

./zoni --admin --switchport sw0-r1r1:26 --setnativevlan 804 -v

./zoni --admin --switchport sw0-r1r1:27 --setnativevlan 806 -v

./zoni --admin --switchport sw0-r1r1:28 --setnativevlan 808 -v

for i in $(seq 12 16);do

./zoni --hardware --rebootnode -n r1r1u${i}

done

Future Work
  • Introduce a request queue and primitive scheduler
  • Enable provisioning of OS to local disk
  • Enable virtual disk conversion to physical
  • Integration with Tashi
    • Would enable free exchange of resources between the Tashi pool and the free pool
Necessary Components
  • DHCP Server
  • PXE Server
  • NFS Server
  • DNS Server (optional)
  • Configurable switches
    • New switch types may require new Zoni modules
  • Hardware access method
    • e.g. IPMI/iLO/DRAC
    • IP-addressable PDUs enable rescue if IPMI becomes compromised
Zoni Register *
  • Gather unique identifier from system
    • MAC address / Dell tag
  • Assign hostname (r1r2u24)
  • Switch/PDU info

Example

  • J3GPGD r1r2u24 172.16.129.100 tashi_nm sw0-r1r2:9 pdu0-r1r2:18
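The example record above can be split into fields with a few lines of Python. The field names are guesses from the surrounding bullets (identifier, hostname, switch and PDU info); in particular, the meaning of the fourth field (`tashi_nm`) is not stated in the slides, so it is given the neutral name `label` here.

```python
from collections import namedtuple

NodeRecord = namedtuple(
    "NodeRecord",
    "tag hostname ip label switch switch_port pdu pdu_port")


def parse_register_line(line):
    """Parse one whitespace-separated registration record."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError("expected 6 whitespace-separated fields")
    tag, hostname, ip, label, switch, pdu = fields
    # Switch and PDU entries have the form name:port.
    switch_name, switch_port = switch.rsplit(":", 1)
    pdu_name, pdu_port = pdu.rsplit(":", 1)
    return NodeRecord(tag, hostname, ip, label,
                      switch_name, int(switch_port),
                      pdu_name, int(pdu_port))
```

Splitting the port numbers out as integers makes it straightforward to hand them to switch or PDU control code later.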
Zoni Register *

[Diagram: Server Node, PXE Server, Image store, Web Server]
  • Server boots for the first time and starts the PXE boot process
  • Defaults to register
  • Downloads register kernel and initrd from PXE server
Zoni Register *

[Diagram: Server Node, PXE Server, Image store, Web Server]
  • register_node scrapes for system information and populates Zoni database
    • Number of procs/cores
    • Number of memory sticks/slots
    • Disk info
    • NIC info
  • Final server prep
    • Wipe disks
    • Configure IPMI (IP/admin accounts)
    • Register node with DNS/DHCP
    • Assign image
    • Reboot
  • Init script downloads files from web server
    • register_node.sh
    • register_automate
  • Install-specific details
    • register_automate
    • Interactive mode
Notes on Current Software
  • Zoni client code is Python 2.5
  • Zoni database implemented in MySQL
    • Reachable through python-MySQLdb interface
  • Pexpect used for switch configuration
  • User information currently obtained through LDAP
Zoni
  • Zoni lays the foundation of the Open Cirrus software stack, easing management of multiple projects in a single cluster
  • Zoni enables partitioning clusters into isolated domains of physical resources
  • Current implementation allows rapid provisioning of system software
  • Zoni code base is open source software available through Tashi project in Apache Incubator
    • Contributions welcome
  • http://opencirrus.intel-research.net/sc09/sc09-zoni.pdf
Intel BigData Cluster

[Diagram: cluster network and rack layout; labels listed below]

Network
  • 45 Mb/s T3 to Internet
  • One 24 Gb/s switch and seven 48 Gb/s switches
  • 1 Gb/s links: x4, x8, x8 p2p, x2x5 p2p, x4x4 p2p, x15 p2p
  • PDU w/per-port monitoring and control

Racks
  • Mobile Rack: 8 (1U) nodes; 2 Xeon E5440 (quad-core) [Harpertown/Core 2], 16GB DRAM, 2 1TB Disk
  • 3U Rack: 5 storage nodes; 12 1TB Disks
  • Blade Rack: 40 nodes (shown twice)
  • 1U Rack: 15 nodes
  • 2U Rack: 15 nodes (shown twice)

Node configurations
  • 20 nodes: 1 Xeon (1-core) [Irwindale/Pent4], 6GB DRAM, 366GB disk (36+300GB)
  • 10 nodes: 2 Xeon 5160 (2-core) [Woodcrest/Core], 4GB RAM, 2 75GB disks
  • 10 nodes: 2 Xeon E5345 (4-core) [Clovertown/Core], 8GB DRAM, 2 150GB Disk
  • 2 Xeon E5345 (quad-core) [Clovertown/Core], 8GB DRAM, 2 150GB Disk
  • 2 Xeon E5420 (quad-core) [Harpertown/Core 2], 8GB DRAM, 2 1TB Disk
  • 2 Xeon E5440 (quad-core) [Harpertown/Core 2], 8GB DRAM, 6 1TB Disk
  • 2 Xeon E5520 (quad-core) [Nehalem-EP/Core i7], 16GB DRAM, 6 1TB Disk

Key: rXrY = row X rack Y; rXrYcZ = row X rack Y chassis Z

Rack locations shown: (r1r5), (r2r1c1-4), (r2r2c1-4), (r1r1, r1r2), (r1r3, r1r4, r2r3), (r3r2, r3r3); multipliers x2, x3, x2
