
FutureGrid: Design and Implementation of a National Grid Test-Bed

David Hancock – [email protected]

HPC Manager - Indiana University

Hardware & Network Lead - FutureGrid

IU in a nutshell

  • $1.7B Annual Budget, >$100M annual IT budget

  • Recent credit upgrade to AAA

  • One university with

    • 8 campuses

    • 107,000 students

    • 3,900 faculty

  • Nation’s 2nd largest school of medicine

  • Serious HPC since 1990’s

  • Research staff increased from 30 to 120 since 1995

  • 50/50 Split in base and grant funding

  • Large scale projects: TeraGrid, Open Science Grid (ATLAS Tier2 center), PolarGrid, Data Capacitor

  • New Data Center opened in 2009

Research Technologies

NSF Track Overview

  • Track 1 – NCSA Blue Waters

  • Track 2a – TACC Ranger

  • Track 2b – NICS Kraken

  • Track 2d

    • Data Intensive High Performance System (SDSC)

    • Experimental High Performance System (GaTech)

    • Experimental High Performance Test-Bed (IU)


  • The goal of FutureGrid is to support the research on the future of distributed, grid, and cloud computing.

  • FutureGrid will build a robustly managed simulation environment and test-bed to support the development and early use in science of new technologies at all levels of the software stack: from networking to middleware to scientific applications.

  • The environment will mimic TeraGrid and/or general parallel and distributed systems – FutureGrid is part of TeraGrid and one of two experimental TeraGrid systems (the other is a GPU system)

  • This test-bed will succeed if it enables major advances in science and engineering through collaborative development of science applications and related software.

  • FutureGrid is a small (5,400-core) Science/Computer Science cloud, but it is more accurately a virtual-machine-based simulation environment

FutureGrid Partners

  • Indiana University (Architecture, core software, Support)

  • Purdue University (HTC Hardware)

  • San Diego Supercomputer Center at University of California San Diego (INCA, Performance Monitoring)

  • University of Chicago/Argonne National Labs (Nimbus)

  • University of Florida (ViNe, Education and Outreach)

  • University of Southern California Information Sciences Institute (Pegasus to manage experiments)

  • University of Tennessee Knoxville (Benchmarking)

  • University of Texas at Austin/Texas Advanced Computing Center (Portal)

  • University of Virginia (OGF, User Advisory Board)

  • Center for Information Services and GWT-TUD from Technische Universität Dresden (VAMPIR)

  • Institutions shown in blue host FutureGrid hardware

Other Important Collaborators

  • Early users from an application and computer science perspective and from both research and education

  • Grid5000 and D-Grid in Europe

  • Commercial partners such as

    • Eucalyptus ….

    • Microsoft (Dryad + Azure)

    • Application partners

  • NSF

  • TeraGrid – Tutorial at TG10

  • Open Grid Forum – Possible BoF

  • Possibly OpenNebula, Open Cirrus Testbed, Open Cloud Consortium, Cloud Computing Interoperability Forum, IBM-Google-NSF Cloud, and other DoE/NSF/… clouds

FutureGrid Timeline

  • October 2009 – Project Starts

  • November 2009 – SC09 Demo

  • January 2010 – Significant Hardware installed

  • April 2010 – First Early Users

  • May 2010 – FutureGrid network complete

  • August 2010 – FutureGrid Annual Meeting

  • September 2010 – All hardware, except shared memory system, available

  • October 2011 – FutureGrid allocatable via TeraGrid process – first two years by user/science board

FutureGrid Usage Scenarios

  • Developers of end-user applications who want to create new applications in cloud or grid environments, including analogs of commercial cloud environments such as Amazon or Google.

    • Is a Science Cloud right for me? Is my application secure?

  • Developers of end-user applications who want to experiment with multiple hardware environments.

  • Grid/Cloud middleware developers who want to evaluate new versions of middleware or new systems.

  • Networking researchers who want to test and compare different networking solutions in support of grid and cloud applications and middleware.

  • Education as well as research

  • Interest in performance testing means that bare-metal images are important

Storage Hardware

  • FutureGrid has a dedicated network (except to TACC) and a network fault and delay generator

  • Experiments can be isolated by request

  • Additional partner machines may run FutureGrid software and be supported (but allocated in specialized ways)

System Milestones

  • New Cray System (xray)

    • Delivery: January 2010

    • Acceptance: February 2010

    • Available for Use: April 2010

  • New IBM Systems (india)

    • Delivery: January 2010

    • Acceptance: March 2010

    • Available for Use: May 2010

  • Dell System (tango)

    • Delivery: April 2010

    • Acceptance: June 2010

    • Available for Use: July 2010

  • Existing IU iDataPlex (sierra)

    • Move to SDSC: January 2010

    • Available for Use: April 2010

  • Storage Systems (Sun & DDN)

    • Delivery: December 2009

    • Acceptance: January 2010

Network Impairments Device

  • Spirent XGEM Network Impairments Simulator for jitter, errors, delay, etc.

  • Full bidirectional 10 Gbps with 64-byte packets

  • Up to 15 seconds of introduced delay (in 16 ns increments)

  • 0–100% introduced packet loss in 0.0001% increments

  • Packet manipulation in the first 2000 bytes

  • Frame sizes up to 16 KB

  • Tcl for scripting, HTML for manual configuration
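The impairment ranges above lend themselves to a small sanity-check helper. A minimal sketch in Python (a hypothetical helper, not Spirent's actual Tcl API; the constants encode only the limits listed on this slide):

```python
# Hypothetical validator for the XGEM impairment ranges listed above.
# The device itself is driven via Tcl scripts or its HTML interface;
# this only illustrates the stated limits.

DELAY_INCREMENT_NS = 16        # delay is settable in 16 ns steps
MAX_DELAY_NS = 15 * 10**9      # up to 15 seconds of introduced delay
LOSS_INCREMENT_PCT = 0.0001    # packet loss settable in 0.0001% steps

def valid_delay_ns(delay_ns: int) -> bool:
    """True if the requested delay is within range and on a 16 ns step."""
    return 0 <= delay_ns <= MAX_DELAY_NS and delay_ns % DELAY_INCREMENT_NS == 0

def valid_loss_pct(loss_pct: float) -> bool:
    """True if the requested loss is within 0-100% on a 0.0001% step."""
    if not 0.0 <= loss_pct <= 100.0:
        return False
    steps = round(loss_pct / LOSS_INCREMENT_PCT)
    return abs(steps * LOSS_INCREMENT_PCT - loss_pct) < 1e-9
```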

Network Milestones

  • December 2009

    • Setup and configuration of core equipment at IU

    • Juniper EX 8208

    • Spirent XGEM

  • January 2010

    • Core equipment relocated to Chicago

    • IP addressing & AS #

  • February 2010

    • Coordination with local networks

    • First Circuits to Chicago Active

  • March 2010

    • Peering with TeraGrid & Internet2

  • April 2010

    • NLR Circuit to UFL (via FLR) Active

  • May 2010

    • NLR Circuit to SDSC (via CENIC) Active

Global NOC Background

  • ~65 total staff

  • Service Desk: proactive & reactive monitoring 24x7x365, coordination of support

  • Engineering: All operational troubleshooting

  • Planning/Senior Engineering: Senior Network engineers dedicated to single projects

  • Tool Developers: Developers of GlobalNOC tool suite

Supported Projects



FutureGrid Architecture

  • The open architecture allows resources to be configured based on images

  • Managed images allow similar experiment environments to be created

  • Experiment management enables reproducible activities

  • Through our modular design we allow different clouds and images to be “rained” upon hardware.

  • Will support deployment of preconfigured middleware including TeraGrid stack, Condor, BOINC, gLite, Unicore, Genesis II

Software Goals

  • An open-source, integrated suite of software to

    • instantiate and execute grid and cloud experiments

    • perform an experiment

    • collect the results

  • Tools for instantiating a test environment

    • TORQUE, Moab, xCAT, bcfg, Pegasus, Inca, ViNe, and a number of other tools from our partners and the open-source community

    • A portal to interact

  • Benchmarking


Draft GUI for FutureGrid Dynamic Provisioning

Command Line

  • fg-deploy-image – deploys an image on a host

    • host name

    • image name

    • start time

    • end time

    • label name

  • fg-add – adds a feature to a deployed image

    • label name

    • framework hadoop

    • version 1.0
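The argument structure of these two commands can be sketched with Python's argparse (an illustration, not the actual FutureGrid tooling; the host name india appears elsewhere in this deck, while the image and label values are made up):

```python
import argparse

# Illustrative sketch of the fg-deploy-image / fg-add argument structure
# described above; the real FutureGrid command-line tools may differ.

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="fg")
    sub = parser.add_subparsers(dest="command", required=True)

    deploy = sub.add_parser("deploy-image", help="deploy an image on a host")
    deploy.add_argument("--host")   # host name
    deploy.add_argument("--image")  # image name
    deploy.add_argument("--start")  # start time
    deploy.add_argument("--end")    # end time
    deploy.add_argument("--label")  # label name

    add = sub.add_parser("add", help="add a feature to a deployed image")
    add.add_argument("--label")      # label of the deployed image
    add.add_argument("--framework")  # e.g. hadoop
    add.add_argument("--version")    # e.g. 1.0
    return parser

# Example invocation (image and label values are hypothetical):
args = build_parser().parse_args(
    ["deploy-image", "--host", "india", "--image", "centos-hadoop",
     "--start", "2010-05-01T08:00", "--end", "2010-05-01T17:00",
     "--label", "exp42"]
)
```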


FG Stratosphere

  • Objective

    • Sits at a higher level than any particular cloud

    • Provides all mechanisms to provision a cloud on given FG hardware

    • Allows the management of reproducible experiments

    • Allows monitoring of the environment and the results

  • Risks

    • Lots of software

    • Possibly multiple paths to do the same thing

  • Good news

    • We work as a team, know the different solutions, and have identified a solid plan

    • We can componentize Stratosphere


Dynamic Provisioning

  • Change underlying system to support current user demands

  • Linux, Windows, Xen/KVM, Nimbus, Eucalyptus

  • Stateless images

    • Shorter boot times

    • Easier to maintain

  • Stateful installs

    • Windows

  • Use Moab to trigger changes and xCAT to manage installs


xCAT and Moab

  • xCAT

    • uses installation infrastructure to perform installs

    • creates stateless Linux images

    • changes the boot configuration of the nodes

    • remote power control and console

  • Moab

    • meta-schedules over resource managers

      • TORQUE and Windows HPC

    • controls nodes through xCAT

      • changing the OS
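A minimal sketch of the reprovisioning step described above, assuming xCAT's standard nodeset and rpower commands (the node range and image name below are hypothetical, and the Moab trigger wiring is omitted):

```python
# Sketch of how a provisioning trigger could translate a requested OS image
# into xCAT commands. nodeset and rpower are standard xCAT commands; the
# node range and image names used here are made up for illustration.

def reprovision_commands(noderange: str, osimage: str) -> list:
    """Return the xCAT command lines that would switch a node set to a new image."""
    return [
        f"nodeset {noderange} osimage={osimage}",  # point nodes at the new image
        f"rpower {noderange} boot",                # power-cycle so they netboot it
    ]

cmds = reprovision_commands("c01-c04", "rhel-stateless")
```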


Experiment Manager

  • Objective

    • Manage the provisioning for reproducible experiments

    • Coordinate workflow of experiments

    • Share workflow and experiment images

    • Minimize space through reuse

  • Risk

    • Images are large

    • Users have different requirements and need different images
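One way to pursue "minimize space through reuse" despite large images is content-addressed storage: identical image payloads are stored once and shared across experiments. A hedged sketch (not the actual experiment manager design):

```python
import hashlib

# Illustrative content-addressed image store: deduplicates identical image
# payloads by SHA-256 digest, in the spirit of "minimize space through reuse".
# This is a sketch, not the FutureGrid experiment manager's actual design.

class ImageStore:
    def __init__(self):
        self._blobs = {}    # digest -> image bytes, stored once
        self._labels = {}   # experiment label -> digest

    def add(self, label: str, image: bytes) -> str:
        digest = hashlib.sha256(image).hexdigest()
        self._blobs.setdefault(digest, image)  # store the payload only once
        self._labels[label] = digest
        return digest

    def unique_images(self) -> int:
        return len(self._blobs)

store = ImageStore()
store.add("exp-a", b"base-linux-image")
store.add("exp-b", b"base-linux-image")  # identical payload: reused, not duplicated
store.add("exp-c", b"hadoop-image")
```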



References

  • FutureGrid - http://www.futuregrid.org/

  • NSF Award OCI-0910812

  • NSF Solicitation 08-573

    • http://www.nsf.gov/pubs/2008/nsf08573/nsf08573.htm

  • ViNe - http://vine.acis.ufl.edu/

  • Nimbus - http://www.nimbusproject.org/

  • Eucalyptus - http://www.eucalyptus.com/

  • VAMPIR - http://www.vampir.eu/

  • Pegasus - http://pegasus.isi.edu/