
Session E: Zoni


Zoni

Richard Gass, Intel


Agenda

Sessions:

  • (A) Intro 8.30-9.00

  • (B) Hadoop 9.00-10.00

  • Break 10.00-10.30

  • Hadoop 10.30-12.00

  • Lunch 12.00-1.30

  • Pig 1.30-2.00

  • (D) Tashi 2.00-3.00

  • Break 3.00-3.30

  • Zoni 3.30-4.45

  • Wrap up 4.45-5.00

Zoni session outline:

  • Overview

  • Plans/Status

  • User View

  • Administration

  • Installation

  • Summary


Overview


Open Cirrus Stack

  • Compute + network + storage resources

  • Management and control subsystem

  • Power + cooling

  • Physical Resource Set (Zoni) service

Credit: John Wilkes (HP)


Open Cirrus Stack

Zoni clients, each with their own “physical data center”:

  • Eucalyptus

  • Tashi/HDFS

  • NFS storage service

  • Experiment

Underlying layer: Zoni service


Open Cirrus Stack

Adds virtual clusters to the stack:

  • Virtual clusters (two shown)

  • Zoni clients: Eucalyptus, Tashi/HDFS, NFS storage service, Experiment

  • Underlying layer: Zoni service


Open Cirrus Stack

Application running:

  • On Hadoop

  • On Tashi virtual cluster

  • On Zoni

  • On real hardware

Stack components shown: Web Service, BigData App, Hadoop, virtual clusters, Eucalyptus, Tashi/HDFS, NFS storage service, Experiment, Zoni service.


Open Cirrus Stack - Zoni

  • Initial PRS implementation from HP

  • Rewrite from Intel (in collaboration with HP), soon to be contributed to the Apache Software Foundation

  • Zoni service goals

    • Provide mini-datacenters to users

    • Isolate mini-datacenters from each other

  • Zoni service approach

    • Allocate sets of physical co-located nodes, isolated inside VLANs.

  • Allow running without virtualization overhead

    • Necessary for predictable QoS

      • e.g. cache interference


Goals

  • Reduce complexity in allocating physical resources

  • Gain User Confidence

    • Show users that we can efficiently allocate/deallocate resources

  • Stop the squatting

    • Incentives

      • HP’s Tycoon (economic model)

      • Simple points scheme for good behavior or early return


Responsibilities of Zoni

  • Isolate domains (VLAN)

  • Provision system software (PXE)

  • Provide platform control, e.g. power on/off (IPMI)

  • Provide boot debug (IPMI)


VLAN

  • Virtual LAN technology allows a single physical network to appear as several isolated networks

    • Ethernet packets are tagged with a VLAN id

    • Switches and NICs enforce the policies associated with each VLAN

  • By associating Zoni domains with different VLANs, they can be isolated from each other

  • The Zoni system provides the interfaces necessary to abstract switch configuration programming across multiple switch vendors
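The tagging step can be made concrete with a small sketch. It assumes standard IEEE 802.1Q tagging (a 4-byte tag with TPID 0x8100 inserted after the source MAC address); the function names are illustrative and are not part of Zoni, which programs switches rather than building frames itself. The sketch is written for Python 3.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q-tagged frame


def vlan_tag(vlan_id, priority=0, dei=0):
    """Return the 4-byte 802.1Q tag: TPID followed by PCP/DEI/VID."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN id must be in 1..4094")
    if not 0 <= priority <= 7 or dei not in (0, 1):
        raise ValueError("priority must be 0..7 and dei 0 or 1")
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def tag_frame(frame, vlan_id):
    """Insert the tag after the destination and source MAC addresses."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    return frame[:12] + vlan_tag(vlan_id) + frame[12:]


def frame_vlan(frame):
    """Return the VLAN id of a tagged frame, or None if it is untagged."""
    if len(frame) < 16:
        return None
    tpid, tci = struct.unpack("!HH", frame[12:16])
    return (tci & 0x0FFF) if tpid == TPID_8021Q else None
```

A switch port that is a member of VLAN 300 would only forward frames for which `frame_vlan` returns 300, which is what keeps two domains on the same physical network from seeing each other's traffic.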


PXE (Preboot eXecution Environment)

  • Enables provisioning of OS image over the network

  • On machine boot, the NIC firmware contacts a PXE server via the DHCP process for the appropriate kernel and initrd to load

  • Once loaded, the init scripts in the initrd can pull the filesystem to the machine

  • In our environment, we download the desired filesystem to a ramdisk from an NFS server, enabling very rapid provisioning (30 seconds or less) while leaving the host filesystem undisturbed
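As an illustration of how a PXE server can hand a specific kernel and initrd to one machine, here is a minimal sketch that assumes the PXELINUX boot loader, which looks for a per-host file named after the NIC's MAC address under `pxelinux.cfg/`. The slides do not say which boot loader Zoni uses, and the `zoni_image=` kernel parameter is a hypothetical name for whatever the init scripts read to locate the filesystem. Written for Python 3.

```python
import re

MAC_RE = re.compile(r"([0-9a-f]{2}:){5}[0-9a-f]{2}")


def pxelinux_config_name(mac):
    """Per-host PXELINUX file name: '01-' plus the MAC in lowercase, dash-separated."""
    normalized = mac.strip().lower().replace("-", ":")
    if not MAC_RE.fullmatch(normalized):
        raise ValueError("not a MAC address: %r" % mac)
    return "01-" + normalized.replace(":", "-")


def pxelinux_config(kernel, initrd, image_location):
    """Render a minimal PXELINUX config that boots one kernel/initrd pair."""
    lines = [
        "DEFAULT zoni",
        "LABEL zoni",
        "  KERNEL " + kernel,
        # zoni_image= is a hypothetical parameter for the init scripts.
        "  APPEND initrd=%s zoni_image=%s" % (initrd, image_location),
    ]
    return "\n".join(lines) + "\n"
```

Writing the rendered text to `pxelinux.cfg/<name>` on the TFTP server is what "assigning an image" to a node would amount to under these assumptions.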


IPMI (Intelligent Platform Management Interface)

  • Defines a standardized, abstracted, message-based interface to intelligent platform management hardware

  • Defines standardized records for describing platform management devices and their characteristics

  • Operates independently of the operating system

  • Enables cross-platform management


Status/Plans


Some History

  • Previous prototype developed at HP Labs

  • Focus on economic model

  • Nice web interface which will be available upon reconvergence of code


Zoni Roadmap

  • Stage 1

    • Manages all cluster hardware

    • Handles resource provisioning

    • Provides interfaces for VLAN definition/programming

    • Administrator is still in the allocation decision-making loop

  • Stage 2

    • Introduces a request queue and primitive scheduler

    • Admin may still be in loop, definitely for special cases

    • Enables provisioning of OS to local disk

    • Enables virtual disk conversion to physical

  • Stage 3

    • Incentives module added (Tycoon)

    • Tashi integration


    User View


    Zoni Roles

    • Admin: root of all authority

      • Controls the physical resources

    • User: requests domains

      • Controls the domain, once allocated


    Domains

    • A Domain is the unit of Zoni isolation

    • A simple domain is a set of compute nodes gathered into a single VLAN

    • Nodes are allocated from pools of available resources
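The relationship between domains, VLANs, and pools can be sketched as a small data model. This is only an illustration under assumed names (`Domain`, `NodePool`, `allocate`, `release`); it does not reflect Zoni's actual database schema or classes. Written for Python 3.

```python
from dataclasses import dataclass, field


@dataclass
class Domain:
    """A simple domain: a named set of compute nodes in a single VLAN."""
    name: str
    vlan_id: int
    nodes: set = field(default_factory=set)


class NodePool:
    """A pool of available nodes from which domains are populated."""

    def __init__(self, nodes):
        self.free = set(nodes)

    def allocate(self, domain, count):
        """Move `count` free nodes into the domain and return their names."""
        if count > len(self.free):
            raise RuntimeError("not enough free nodes in the pool")
        chosen = sorted(self.free)[:count]
        self.free.difference_update(chosen)
        domain.nodes.update(chosen)
        return chosen

    def release(self, domain):
        """Return all of the domain's nodes to the pool."""
        self.free.update(domain.nodes)
        domain.nodes.clear()
```

A node is either in the pool or in exactly one domain, which mirrors the isolation guarantee: the VLAN id on the domain is what the switch configuration would enforce.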


    Zoni Domains

    • Domain 0: Server Pool 0 plus Domain 0 services (DNS, PXE, DHCP, HTTP)

    • Domain 1: Server Pool 1 plus Domain 1 services (DNS, PXE, DHCP, HTTP)

    • The diagram also shows a Gateway and the ISOLATION boundary between the two domains


    The Zoni Interface

    • Users and Admins currently interact with the Zoni system through a command line interface

    • This interface both:

      • Queries and updates records in the Zoni database

      • Wraps the various commands that must be issued to effect changes in the cluster

    • Zoni is currently a centralized system; users log into the Zoni manager to issue commands

      • An RPC interface is planned for the near future


    Zoni Usage

    Usage: zoni <options>

    Standard options:

    --help                   [show this help message and exit]
    --version                [show program's version number and exit]
    --verbose                [be verbose]

    Common options:

    --nodeName <name>        [Specify node]
    --switchPort <port>      [Specify switchport switchname:portnum]


    Image Management Interface

    --addImage <img>         [Add image to Zoni]
    --delImage <img>         [Delete image]


    User Allocation Interface

    --createDomain <name>

      • May fail if name already exists

    --submitDomainRequest <name>

    --destroyDomain --domain <name>

    --requestNodes --domain <name> [--count <N>] [--nodeName <name>] [--cores <n> …]

      • Add the requested nodes to the domain

    --assignImage <kernel> <image>

      • Assign image to resource

    --associateNewVlan --domain <name>

      • Allocate an unused VLAN number to domain

    --createReservation <YYYYMMDD> <YYYYMMDD>

      • Specify duration of node reservation, where start time may be "ASAP"

    --reservationNotes "notes"

    --updateReservation
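The reservation arguments can be illustrated with a short parsing sketch. It assumes that "ASAP" means "starting today" and that an end date earlier than the start date is an error; neither rule is stated in the slides, and `parse_reservation` is a hypothetical helper, not a Zoni function. Written for Python 3.

```python
from datetime import date, datetime


def parse_reservation(start, end, today=None):
    """Turn two YYYYMMDD strings (start may be 'ASAP') into a pair of dates."""
    today = today or date.today()

    def parse(text):
        return datetime.strptime(text, "%Y%m%d").date()

    # Assumption: "ASAP" is interpreted as starting on the current day.
    start_date = today if start.upper() == "ASAP" else parse(start)
    end_date = parse(end)
    if end_date < start_date:
        raise ValueError("reservation ends before it starts")
    return start_date, end_date
```

The difference `(end_date - start_date).days` then gives the reservation length in days.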


    Admin Allocation Interface

    --allocateNode           [Assign node to a user]
    --releaseNode            [Release node allocation]
    --vlanIsolate <vlanid>   [Specify vlan for isolation]


    Hardware Control

    --hardware               [Make hardware call]
    --powerStatus            [Get power status]
    --rebootNode             [Reboot node (Soft)]
    --powerCycle             [Power Cycle (Hard)]
    --powerOff               [Power off node]
    --powerOn                [Power on node]
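One plausible way to back these options is the `ipmitool` command-line utility. The sketch below only builds the argument list and does not run anything; the mapping from Zoni option names to `chassis power` subcommands is an assumption, the BMC host name is hypothetical, and the slides do not say how Zoni actually issues IPMI calls. The soft reboot option is left out because it has no single `chassis power` equivalent. Written for Python 3.

```python
# Assumed mapping from Zoni hardware options to ipmitool power subcommands.
POWER_ACTIONS = {
    "powerStatus": "status",
    "powerCycle": "cycle",
    "powerOff": "off",
    "powerOn": "on",
}


def ipmi_power_command(bmc_host, user, action):
    """Build (but do not run) an ipmitool command line for a power action."""
    if action not in POWER_ACTIONS:
        raise ValueError("unsupported action: %r" % action)
    return [
        "ipmitool",
        "-I", "lanplus",   # IPMI v2.0 over LAN
        "-H", bmc_host,    # address of the node's management controller
        "-U", user,
        "-E",              # read the password from the IPMI_PASSWORD variable
        "chassis", "power", POWER_ACTIONS[action],
    ]
```

The returned list could be passed to `subprocess.run` by a caller that has set `IPMI_PASSWORD`; because IPMI operates independently of the operating system, this works even when the node's OS is hung or absent.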


    Query Interface

    --showReservations       [Show current node reservations]
    --showResources          [Show available resources to choose from]
    --procs <N>              [Filter by number of processors]
    --clock <N>              [Filter by processor clock]
    --memory <N>             [Filter by amount of memory (Bytes)]
    --cpuflags "flags"       [Filter by CPU flags]
    --cores <N>              [Filter by number of cores]
    --showPxeImages          [Show available PXE images to choose from]
    --showPxeImageMap        [Show PXE images host mapping]
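The filter options suggest a query of the following shape. This sketch assumes the numeric filters act as minimums and that "16G" means 16 GiB; the slides state neither, and `show_resources` here is a stand-in function over an in-memory list, not Zoni's MySQL-backed query. The sample inventory mirrors two node types from the node list shown in the workflow diagram. Written for Python 3.

```python
GIB = 2 ** 30

# Two node types taken from the deck's example inventory (8 cores each).
NODES = [
    {"name": "node1", "cores": 8, "memory": 16 * GIB},
    {"name": "node5", "cores": 8, "memory": 8 * GIB},
]


def show_resources(nodes, cores=None, memory=None):
    """Return nodes with at least the requested cores and memory (bytes)."""
    result = []
    for node in nodes:
        if cores is not None and node["cores"] < cores:
            continue
        if memory is not None and node["memory"] < memory:
            continue
        result.append(node)
    return result
```

Leaving a filter as `None` disables it, which corresponds to omitting the option on the command line.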


    Administration Interface

    --admin                                   [Enter Admin mode]
    --addPxeImage                             [Add PXE image to database]
    --enableHostPort                          [Enable a switch port]
    --disableHostPort                         [Disable a switch port]
    --removeVlan <vlanId>                     [Remove vlan from all switches]
    --createVlan <vlanId>                     [Create a vlan on all switches]
    --addNodeToVlan <vlanId>                  [Add node to a vlan]
    --removeNodeFromVlan <vlanId>             [Remove node from a vlan]
    --setNativeVlan <vlanId>                  [Configure native vlan]
    --restoreNativeVlan                       [Restore native vlan]
    --removeAllVlans                          [Removes all vlans from a switchport]
    --sendSwitchCommand "<command>"           [Send Raw Switch Command, BE CAREFUL]
    --interactiveSwitchConfig "<switchname>"  [Interactively configure a switch]
    --showSwitchConfig <nodename>             [Show switch config for node]


    Administration


    Typical Workflow

    • Admin queries available systems

    • Admin requests systems with desired user configuration

      • e.g., cores, memory, image, duration

    • Request goes in queue

    • Zoni locates resources and provides a list to admin/Tashi

    • Admin/Tashi moves VMs off the selected nodes to free the resources

      • Add node to blacklist and tell Hadoop to reload

    • Zoni allocates resources

      • Provides estimated time to get resources

      • User can query

      • Zoni sends notification when allocated

    • Zoni reclaims resources and adds them back into respective pools

      • User may extend time period before expiration
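The queue, allocate, and reclaim steps of this workflow can be sketched as a first-come, first-served scheduler. This is an assumed policy for illustration only: the slides describe the scheduler as "primitive" without specifying it, and the class and method names below are not Zoni's. Written for Python 3.

```python
from collections import deque


class RequestQueue:
    """FIFO scheduler over a list of free nodes (illustrative only)."""

    def __init__(self, free_nodes):
        self.free = list(free_nodes)
        self.pending = deque()   # (user, node_count) in arrival order
        self.allocations = {}    # user -> list of node names

    def submit(self, user, count):
        """Place a request in the queue."""
        self.pending.append((user, count))

    def schedule(self):
        """Grant requests in order; stop at the first that cannot be met."""
        granted = []
        while self.pending and self.pending[0][1] <= len(self.free):
            user, count = self.pending.popleft()
            nodes = [self.free.pop(0) for _ in range(count)]
            self.allocations.setdefault(user, []).extend(nodes)
            granted.append((user, nodes))
        return granted

    def reclaim(self, user):
        """Return a user's nodes to the free list, e.g. at expiration."""
        self.free.extend(self.allocations.pop(user, []))
```

Stopping at the first unsatisfiable request keeps later, smaller requests from starving an earlier large one, at the cost of leaving some nodes idle; a real scheduler might trade this off differently.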


    Diagram components: System Servers and Management Servers hosting VMs, the Zoni server with its DB, the Tashi Cluster Manager, the PXE server, and a Zoni client used by an administrator or cluster manager.

    • Zoni client queries Zoni server for available resources

    • Zoni queries DB to locate available resources

    • Results are sent back to the client

      Node 1 : 8 Core, 16G memory, 6TB disk, 30 day
      Node 2 : 8 Core, 16G memory, 6TB disk, 30 day
      Node 3 : 8 Core, 16G memory, 6TB disk, 90 day
      Node 4 : 8 Core, 16G memory, 6TB disk, 1 day
      Node 5 : 8 Core, 8G memory, 2TB disk, 90 day
      Node 6 : 8 Core, 8G memory, 2TB disk, 90 day
      Node 7 : 8 Core, 8G memory, 2TB disk, 90 day
      Node 8 : 8 Core, 8G memory, 2TB disk, 90 day
      Node 9 : 8 Core, 8G memory, 2TB disk, 90 day
      Node 10: 8 Core, 8G memory, 2TB disk, 30 day

    • User chooses machine attributes and submits a request for the resources for some time period


    Request Queue

    • The request (R1) is placed in the request queue

    • Zoni processes the request and identifies physical machines that satisfy the user request


    • Zoni sends request to Tashi to free selected nodes

    • Tashi moves virtual machines off of selected nodes


    • Tashi notifies Zoni that migration of virtual machines has completed

    • Zoni allocates the physical machines to the requesting user and isolates them from the network using VLANs

    • Virtual disk image is converted to PXE image

    • Zoni reboots the physical machines and sets the PXE image to the user's VM

    • Physical machines boot up with PXE image


    • Zoni updates reservation database

    • Zoni client queries server for allocation

    • User connects to the machines and starts running experiments


    After allocation

    • A returned Zoni node is typically untrusted

      • Update the system to default settings

        • Clean physical node by PXE booting a reset image

          • Restore all settings to defaults (address, IPMI passwords)

          • Repartition and format disks

    • (Option) Trust images from some users

      • No reformat needed

    • Clean network configuration (VLAN)


    Example: Minicluster

    ./zoni --addimage amd64-rgass-testing:hardy:8.03
    ./zoni --assignimage amd64-rgass-testing --nodename r1r1u25
    ./zoni --allocatenode --nodename r1r1u25 --username rgass --reservationDuration 30 --vlanisolate 300 --notes "Practice allocation"
    ./zoni --addnodetovlan 300 --nodename r1r1u25
    ./zoni --hardware --rebootnode --nodename r1r1u25


    Example: CloudConnect 1

    • Network-isolate a rack of machines and PXE boot them with a user’s kernel and initrd

    • Create a VM that acts as an SSH gateway and a NAT for the private cluster

    • Dynamically configure switches to support the networking experiment


    Diagram: Racks A-D and their rack regions, connected through 100 Mb/s, 1 Gb/s, and 4 Gb/s switches and a 4x1Gb trunk link. VLAN #1 is labeled Electrical and VLAN #2 is labeled Optical. Legend: server, switch, manager (M).


    Example: CloudConnect 2

    for i in r1r1u12 r1r1u13 r1r1u14 r1r1u15; do
      ./zoni --admin --setnativevlan 300 -n ${i}
      ./zoni --admin --addnodetovlan 800 -n ${i}
      ./zoni --admin --addnodetovlan 801 -n ${i}
      ./zoni --admin --addnodetovlan 802 -n ${i}
    done

    ./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface range ethernet g(25-28); spanning-tree disable"
    ./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g25;switchport mode trunk;exit"
    ./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g26;switchport mode trunk;exit"
    ./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g27;switchport mode trunk;exit"
    ./zoni --admin --switchport sw0-r1r1 --sendswitchcommand "config;interface ethernet g28;switchport mode trunk;exit"

    ./zoni --admin --switchport sw0-r1r1:25 --setnativevlan 802 -v
    ./zoni --admin --switchport sw0-r1r1:26 --setnativevlan 804 -v
    ./zoni --admin --switchport sw0-r1r1:27 --setnativevlan 806 -v
    ./zoni --admin --switchport sw0-r1r1:28 --setnativevlan 808 -v

    for i in $(seq 12 16); do
      ./zoni --hardware --rebootnode -n r1r1u${i}
    done


    Future Work

    • Introduce a request queue and primitive scheduler

    • Enable provisioning of OS to local disk

    • Enable virtual disk conversion to physical

    • Integration with Tashi…

      • Would enable free exchange of resources between the Tashi pool and the free pool


    Installation


    Necessary Components

    • DHCP Server

    • PXE Server

    • NFS Server

    • DNS Server (optional)

    • Configurable switches

      • New switch types may require new Zoni modules

    • Hardware access method

      • e.g. IPMI/iLO/DRAC

      • IP-addressable PDUs enable rescue if IPMI becomes compromised


    Zoni Register

    • Gather unique identifier from system

      • MAC address / Dell tag

    • Assign hostname (r1r2u24)

    • Switch/PDU info

    • Example:

      J3GPGD r1r2u24 172.16.129.100 tashi_nm sw0-r1r2:9 pdu0-r1r2:18
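The example record above can be split into fields with a few lines of code. The field names are guesses from context: the slides do not label the columns, so treating `tashi_nm` as an image name and the trailing `u24` of the hostname as a rack unit are assumptions (only "rXrY = row X rack Y" is given in the deck's key). Written for Python 3.

```python
import re
from collections import namedtuple

# Column names are assumptions; the slides show the record without headers.
NodeRecord = namedtuple(
    "NodeRecord",
    "tag hostname ip image switch switch_port pdu pdu_port",
)

HOSTNAME_RE = re.compile(r"^r(\d+)r(\d+)u(\d+)$")


def parse_record(line):
    """Split a whitespace-separated registration record into named fields."""
    tag, hostname, ip, image, switch_port, pdu_port = line.split()
    switch, sport = switch_port.split(":")
    pdu, pport = pdu_port.split(":")
    return NodeRecord(tag, hostname, ip, image, switch, int(sport), pdu, int(pport))


def parse_hostname(hostname):
    """Decode an rXrYuZ hostname into row, rack, and (assumed) rack unit."""
    match = HOSTNAME_RE.match(hostname)
    if not match:
        raise ValueError("unexpected hostname format: %r" % hostname)
    row, rack, unit = (int(g) for g in match.groups())
    return {"row": row, "rack": rack, "unit": unit}
```

The `switchname:portnum` form matches the `--switchPort` option described in the usage text, so the same split would apply when parsing that option.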


    Zoni Register

    Diagram components: PXE server, image store, server node, web server.

    • Server boots for the first time and starts the PXE boot process

    • Defaults to register

    • Downloads register kernel and initrd from PXE server


    Zoni Register

    Diagram components: PXE server, image store, server node, web server.

    • Init script downloads files from web server

      • register_node.sh

      • register_automate

    • register_node scrapes for system information and populates Zoni database

      • Number of procs/cores

      • Number of memory sticks/slots

      • Disk info

      • NIC info

    • Install-specific details

      • register_automate

      • Interactive mode

    • Final server prep

      • Wipe disks

      • Configure IPMI (IP/admin accounts)

      • Register node with DNS/DHCP

      • Assign image

      • Reboot


    Internals


    Notes on Current Software

    • Zoni client code is Python 2.5

    • Zoni database implemented in MySQL

      • Reachable through python-MySQLdb interface

    • Pexpect used for switch configuration

    • User information currently obtained through LDAP


    Summary


    Zoni

    • Zoni lays the foundation of the Open Cirrus software stack, easing management of multiple projects in a single cluster

    • Zoni enables partitioning clusters into isolated domains of physical resources

    • Current implementation allows rapid provisioning of system software

    • Zoni code base is open source software available through the Tashi project in the Apache Incubator

      • Contributions welcome

    • http://opencirrus.intel-research.net/sc09/sc09-zoni.pdf


    Backup


    Intel BigData Cluster

    • Mobile Rack: 8 (1U) nodes; 2 Xeon E5440 (quad-core) [Harpertown/Core 2], 16GB DRAM, 2 1TB disks

    • 3U Rack: 5 storage nodes; 12 1TB disks

    • Blade Rack, 40 nodes (r2r1c1-4)

      • 20 nodes: 1 Xeon (1-core) [Irwindale/Pent4], 6GB DRAM, 366GB disk (36+300GB)

      • 10 nodes: 2 Xeon 5160 (2-core) [Woodcrest/Core], 4GB RAM, 2 75GB disks

      • 10 nodes: 2 Xeon E5345 (4-core) [Clovertown/Core], 8GB DRAM, 2 150GB disks

    • Blade Rack, 40 nodes (r2r2c1-4): 2 Xeon E5345 (quad-core) [Clovertown/Core], 8GB DRAM, 2 150GB disks

    • 1U Rack, 15 nodes, x2 (r1r1, r1r2): 2 Xeon E5420 (quad-core) [Harpertown/Core 2], 8GB DRAM, 2 1TB disks

    • 2U Rack, 15 nodes, x3 (r1r3, r1r4, r2r3): 2 Xeon E5440 (quad-core) [Harpertown/Core 2], 8GB DRAM, 6 1TB disks

    • 2U Rack, 15 nodes, x2 (r3r2, r3r3): 2 Xeon E5520 (quad-core) [Nehalem-EP/Core i7], 16GB DRAM, 6 1TB disks

    • Additional rack label in the diagram: (r1r5)

    • Network: 1 Gb/s node links and x4 uplinks, 24 Gb/s and 48 Gb/s switches, 45 Mb/s T3 to the Internet

    • PDU with per-port monitoring and control

    Key: rXrY = row X rack Y; rXrYcZ = row X rack Y chassis Z

