Leveraging Standard Core Technologies to Programmatically Build Linux Cluster Appliances

Mason Katz

San Diego Supercomputer Center

IEEE Cluster 2002

Outline
  • Problem definition
    • What is so hard about clusters?
  • Distinction between
    • Software Packages (bits)
    • System Configuration (functionality and state)
  • Programmatic software installation with:
    • XML, SQL, HTTP, Kickstart
  • Future Work

Build this cluster
  • Build a 128 node cluster
    • Known configuration
    • Consistent configuration
    • Repeatable configuration
    • Do this in an afternoon
  • Problems
    • How to install software?
    • How to configure software?
  • We manage clusters with (re)installation
    • So we care a lot about this problem
    • Other strategies still must solve this

The Myth of the Homogeneous COTS Cluster
  • Hardware is not homogeneous
    • Different chipset revisions
    • Chipset of the day (e.g. Linksys Ethernet cards)
    • Different disk sizes (e.g. changing sector sizes)
    • Vendors do not know this is happening!
  • Entropy happens
    • Hardware components fail
    • Cannot replace with the same components past a single Moore cycle
  • A Cluster is not just compute nodes (appliances)
    • Fileserver Nodes
    • Management Nodes
    • Login Nodes

What Heterogeneity Means
  • Hardware
    • Cannot blindly replicate machine software
      • AKA system imaging / disk cloning
      • Requires patching the system after cloning
    • Need to manage system software at a higher level
  • Software
    • Subsets of a cluster have unique software configuration
    • One “golden image” cannot build a cluster
    • Multiple images replicate common configuration
    • Need to manage system software at a higher level

Packages vs. Configuration

  • (Diagram) Two inputs feed every appliance build:
    • Software Packages: RPMs drawn from the collection of all possible software packages (AKA the distribution)
    • System Configuration: descriptive information to configure a node (the Kickstart file)
  • Appliances built from both: Compute Node, IO Server, Web Server

Software Packages

  • (Same diagram, package side) RPMs are the bits, drawn from the collection of all possible software packages (AKA the distribution)

System Configuration

  • (Same diagram, configuration side) The Kickstart file is the descriptive information that configures a node's functionality and state

What is a Kickstart File?

Setup & Packages (20%)

cdrom
zerombr yes
bootloader --location mbr --useLilo
skipx
auth --useshadow --enablemd5
clearpart --all
part /boot --size 128
part swap --size 128
part / --size 4096
part /export --size 1 --grow
lang en_US
langsupport --default en_US
keyboard us
mouse genericps/2
timezone --utc GMT
rootpw --iscrypted nrDq4Vb42jjQ.
text
install
reboot

%packages
@Base
@Emacs
@GNOME

Post Configuration (80%)

%post

cat > /etc/nsswitch.conf << 'EOF'
passwd: files
shadow: files
group: files
hosts: files dns
bootparams: files
ethers: files
EOF

cat > /etc/ntp.conf << 'EOF'
server ntp.ucsd.edu
server 127.127.1.1
fudge 127.127.1.1 stratum 10
authenticate no
driftfile /etc/ntp/drift
EOF

/bin/mkdir -p /etc/ntp
cat > /etc/ntp/step-tickers << 'EOF'
ntp.ucsd.edu
EOF

/usr/sbin/ntpdate ntp.ucsd.edu
/sbin/hwclock --systohc

Issues
  • High level description of software installation
    • List of packages (RPMs)
    • System configuration (network, disk, accounts, …)
    • Post installation scripts
  • De facto standard for Linux
  • Single ASCII file
    • Simple, clean, and portable
    • Installer can handle simple hardware differences
  • Monolithic
    • No macro language (as of RedHat 7.3 this is changing)
    • Differences require forking (and code replication)
    • Cut-and-Paste is not a code re-use model

It looks something like this

Implementation
  • Nodes
    • Single purpose modules
    • Kickstart file snippets (XML tags map to kickstart commands)
    • Over 100 node files in Rocks
  • Graph
    • Defines interconnections for nodes
    • Think OOP or dependencies (class, #include)
    • A single default graph file in Rocks
  • Macros
    • SQL Database holds site and node specific state
    • Node files may contain <var name="state"/> tags
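
A rough sketch of what such a macro-bearing node file might look like (the <kickstart>/<post> syntax follows the sample node file later in this deck, but the NTP example and the variable name are illustrative, not an actual Rocks node file):

<?xml version="1.0" standalone="no"?>
<!DOCTYPE kickstart SYSTEM "@KICKSTART_DTD@">
<kickstart>
<description>
Point NTP at the site time server (illustrative sketch only).
</description>
<package>ntp</package>
<post>
<!-- "Kickstart_TimeServer" is a hypothetical key; kpp would replace the
     <var/> tag with the matching value held in the SQL database -->
cat &gt; /etc/ntp.conf &lt;&lt; 'EOF'
server <var name="Kickstart_TimeServer"/>
EOF
</post>
</kickstart>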

Composition
  • Aggregate Functionality
  • Scripting
    • IsA perl-development
    • IsA python-development
    • IsA tcl-development
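
Expressed in the graph (a sketch using the <edge> syntax from the sample graph file later in this deck; the node names here are illustrative), "IsA" is simply an edge, so any appliance that points at a single "scripting" node inherits all three language environments:

<edge from="compute"   to="scripting"/>
<edge from="frontend"  to="scripting"/>
<edge from="scripting" to="perl-development"/>
<edge from="scripting" to="python-development"/>
<edge from="scripting" to="tcl-development"/>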

Functional Differences
  • Specify only the deltas
  • Desktop IsA
    • Standalone
  • Laptop IsA
    • Standalone
    • PCMCIA
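
As graph edges (again a sketch in the sample-graph syntax, with illustrative node names), the laptop differs from the desktop by exactly one edge:

<edge from="desktop" to="standalone"/>
<edge from="laptop"  to="standalone"/>
<edge from="laptop"  to="pcmcia"/>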

Architecture Differences
  • Conditional inheritance
  • Annotate edges with target architectures
  • if i386
    • Base IsA lilo
  • if ia64
    • Base IsA elilo

Putting it all together

  • "Complete" appliances (compute, NFS, frontend, desktop, …)
  • Some key shared configuration nodes (slave-node, node, base)

Sample Node File

<?xml version="1.0" standalone="no"?>
<!DOCTYPE kickstart SYSTEM "@KICKSTART_DTD@" [<!ENTITY ssh "openssh">]>
<kickstart>

<description>
Enable SSH
</description>

<package>&ssh;</package>
<package>&ssh;-clients</package>
<package>&ssh;-server</package>
<package>&ssh;-askpass</package>

<post>

cat &gt; /etc/ssh/ssh_config &lt;&lt; 'EOF' <!-- default client setup -->
Host *
ForwardX11 yes
ForwardAgent yes
EOF

chmod o+rx /root
mkdir /root/.ssh
chmod o+rx /root/.ssh

</post>

</kickstart>

Sample Graph File

<?xml version="1.0" standalone="no"?>
<!DOCTYPE kickstart SYSTEM "@GRAPH_DTD@">
<graph>

<description>
Default Graph for NPACI Rocks.
</description>

<edge from="base" to="scripting"/>
<edge from="base" to="ssh"/>
<edge from="base" to="ssl"/>
<edge from="base" to="lilo"  arch="i386"/>
<edge from="base" to="elilo" arch="ia64"/>

<edge from="node" to="base" weight="80"/>
<edge from="node" to="accounting"/>

<edge from="slave-node" to="node"/>
<edge from="slave-node" to="nis-client"/>
<edge from="slave-node" to="autofs-client"/>
<edge from="slave-node" to="dhcp-client"/>
<edge from="slave-node" to="snmp-server"/>
<edge from="slave-node" to="node-certs"/>

<edge from="compute" to="slave-node"/>
<edge from="compute" to="usher-server"/>

<edge from="master-node" to="node"/>
<edge from="master-node" to="x11"/>
<edge from="master-node" to="usher-client"/>

</graph>

Nodes and Groups

  • (Table screenshots) The Nodes table lists each physical machine; the Memberships table names the group each machine belongs to
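
The screenshots do not survive in this transcript, so as a sketch only (column names and types are guesses, not the actual Rocks schema), the idea is that every node row points at a membership row:

CREATE TABLE memberships (
    id        INTEGER PRIMARY KEY,
    name      VARCHAR(32),          -- e.g. a group such as "Compute"
    appliance INTEGER               -- resolved against the appliances table (next slide)
);

CREATE TABLE nodes (
    id         INTEGER PRIMARY KEY,
    name       VARCHAR(64),         -- e.g. "compute-0-0"
    membership INTEGER REFERENCES memberships(id),
    rack       INTEGER,
    rank       INTEGER
);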

Groups and Appliances

  • (Table screenshots) The Memberships table maps each group of nodes onto an entry in the Appliances table, i.e. onto the appliance configuration it should receive
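
Continuing the same hypothetical schema, resolving which appliance kickstart a given machine should receive becomes a pair of joins:

CREATE TABLE appliances (
    id   INTEGER PRIMARY KEY,
    name VARCHAR(32)                -- e.g. "compute", "nfs", "frontend"
);

-- Which appliance type is node "compute-0-0"?
SELECT a.name
FROM nodes n
JOIN memberships m ON n.membership = m.id
JOIN appliances  a ON m.appliance  = a.id
WHERE n.name = 'compute-0-0';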

Simple key-value pairs
  • Used to configure DHCP and to customize appliance kickstart files
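
The slide's table screenshot is lost; a sketch of the kind of key-value store meant here (table and key names are illustrative, not the real Rocks schema):

CREATE TABLE site_globals (
    service VARCHAR(32),            -- e.g. "Kickstart" or "DHCP"
    name    VARCHAR(64),            -- the key
    value   VARCHAR(128)
);

INSERT INTO site_globals VALUES ('Kickstart', 'Timezone', 'GMT');
INSERT INTO site_globals VALUES ('DHCP',      'Gateway',  '10.1.1.1');

Values like these are substituted into the generated DHCP configuration and into <var/> tags when appliance kickstart files are built.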

Space-Time and HTTP

  • (Sequence diagram) Node appliances and the frontend/servers exchange:
    • DHCP request; the answer carries an IP address plus a kickstart URL
    • Kickstart request; the frontend generates the file (kpp and kgen, driven by the SQL DB)
    • Package requests; a package server serves the RPMs and the node installs them
    • Post configuration, then reboot
  • HTTP:
    • Kickstart URL (Generator) can be anywhere
    • Package Server can be (a different) anywhere
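
From the node's side the whole exchange is plain HTTP; a sketch of what the installer effectively does (the hostname and CGI path are made up for illustration, the real URL is whatever the DHCP answer supplies):

# 1. DHCP hands the node an IP address and a kickstart URL.
# 2. The installer fetches its generated kickstart file over HTTP:
wget -O /tmp/ks.cfg "http://frontend.example.org/install/kickstart.cgi?node=compute-0-0"
# 3. It then pulls every RPM named in the file from the (possibly different)
#    package server, runs the %post section, and reboots.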

256 Node Scaling
  • Attempt a TOP500 run on two fused 128-node PIII (1 GHz, 1 GB memory) clusters
    • 100 Mbit Ethernet, Gigabit to the frontend
    • Myrinet 2000, a 128-port switch on each cluster
  • Questions
    • What LINPACK performance could we get?
    • Would Rocks scale to 256 nodes?
    • Could we set up, tear down, and run benchmarks in the allotted 48 hours?
  • SDSC's TeraGrid Itanium 2 system is about this size

Setup

  • Fri night: built a new frontend; physically rewired the Myrinet, added an Ethernet switch
  • Sat: initial LINPACK runs and debugging of hardware failures; 240-node Myrinet run
  • Sun: submitted the 256-node Ethernet run, re-partitioned the clusters, complete re-installation (40 min)
  • (Diagram) The new frontend joins two 128-node clusters (120 on Myrinet each) through 8 Myrinet cross connects

Some Results

240 Dual PIII (1 GHz, 1 GB) - Myrinet

  • 285 GFlops
  • 59.5% Peak
  • Over 22 hours of continuous computing

Installation, Reboot, Performance
  • < 15 minutes to reinstall a 32-node subcluster (including rebuilding the Myrinet driver)
  • 2.3 minutes for a 128-node reboot

(Chart) Timeline of the 32-node re-install: start, finish, reboot, start HPL

Future Work
  • Other backend targets
    • Solaris Jumpstart
    • Windows Installation
  • Supporting on-the-fly system patching
    • Cfengine approach
    • But using the XML graph for programmability
  • Traversal order
    • Subtleties with order of evaluation for XML nodes
    • Ordering requirements != Code reuse requirements
  • Dynamic cluster re-configuration
    • Node re-targets appliance type according to system need
    • Autonomous clusters?

Summary
  • Installation/Customization is done in a straightforward programmatic way
    • Leverages existing standard technologies
    • Scaling is excellent
  • HTTP is used as a transport for reliability/performance
    • Configuration Server does not have to be in the cluster
    • Package Server does not have to be in the cluster
    • (Sounds grid-like)

www.rocksclusters.org
