


Academic Compute Cloud Provisioning and Usage Project

Peter Kunszt

ETH/SystemsX.ch

2012, November 19

Bern


Motivation

Researchers often only want services, not products.

Services rely on

  • Infrastructure

  • Middleware

  • Application Software

  • Research Informatics ‘Glue’

We ‘the supporters’ want to offer ‘Apps’

  • Maintainable services

  • Published, usable tools and software

  • Browsable published research data

SDCD Bern


Motivation: SystemsX.ch

Largest Swiss national research effort to date

[Logo: Schweizerische Eidgenossenschaft / Confédération suisse / Confederazione Svizzera / Confederaziun svizra]

Some numbers

  • Funded by the Swiss government with CHF 25 million/year for 2008-2011, 2012, and 2013-2016

  • 12 Swiss universities and research institutions invest a matching CHF 25 million/year

  • Projects approved by the SNSF

  • 14 large research projects (CHF 4-7 million) until 2012, 10 new ones starting in 2013 (CHF 3 million)

  • 50+ PhD projects

  • 20+ interdisciplinary pilot projects

  • 1 strategic support project: SyBIT, CHF 2 million/year on average

SyBIT Project Motivation

  • SystemsX.ch will produce and analyze a large amount of data

  • Strong need for coordination among data providers

  • Strong need for common semantics and compatible service offerings

  • Increased need for professionally supported tools and services

SyBIT provides support

[Diagram labels: IPP; Service Providers; Platforms; IT Infrastructure; SyBIT; Bioinformatics; IPHD. Projects shown: PhosphoNetX, LipidX, MetaNetX, PlantGrowth, CellPlasticity, LiverX, CycliX, Neurochoice, WingX, YeastX, DynamiX, CINA, BattleX, InfectX]

SyBIT gives feedback

[Diagram labels: IPP; Service Providers; Platforms; IT Infrastructure; SyBIT; Bioinformatics; IPHD. Projects shown: PhosphoNetX, LipidX, MetaNetX, PlantGrowth, CellPlasticity, LiverX, CycliX, Neurochoice, WingX, YeastX, DynamiX, CINA, BattleX, InfectX]

Motivation

Researchers often only want services, not products.

Services rely on

  • Infrastructure

  • Middleware

  • Application Software

  • Research Informatics ‘Glue’

We ‘the supporters’ want to offer ‘Apps’

  • Maintainable services

  • Published, usable tools and software

  • Browsable published research data

Enabling Research IT as a Service

Project Goals

Provide input to the mid- and long-term strategy for cluster and cloud infrastructure at ETH and UZH.

  • How to extend current cluster services using cloud technology?

  • Support new application models (MapReduce, specialized servers).

  • Test real applications.

  • Understand performance implications.

  • Define Service Models: How to move to cloud-like service orientation models.

  • Define Business Models: How to accommodate pay-per-use, OpEx vs. CapEx, how to plan an academic private cloud, and how to use and offer public clouds

  • Run real applications: Run a regular, a compute-intensive and a data-intensive application on the cloud.

Disseminate results in Switzerland broadly in academia and to interested parties (Workshop at project end)


DEFINITION

Cloud Attributes: When do we talk about a cloud?

  • Self-service, On-demand, Cost transparency

    • Access to immediately available resources, paying for usage only. No long-term commitments. No up-front investments needed. Operational expenses only.

  • Elasticity, Multi-tenancy, Scalability

    • Grow and shrink size of resource on request. Sharing with other users without impacting each other. Economies of scale.

Definitions

  • Self-service: A consumer can unilaterally provision computing capabilities, such as server time and network storage, without requiring human interaction.

  • On-demand: As needed, at the time when needed, automatic provisioning.

  • Cost Transparency: Accounting of actual usage transparent to user and service provider both, measured in corresponding terms (Hours CPU time, GB per Month, MB Transfer, etc)
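The cost-transparency definition above can be illustrated with a small metering sketch. It is only an illustration: the unit prices, the `UsageRecord` type and the function names are hypothetical and do not come from the project.

```python
from dataclasses import dataclass

# Hypothetical unit prices in CHF; the slides give no actual rates.
RATES_CHF = {
    "cpu_hours": 0.05,          # per CPU hour
    "storage_gb_month": 0.10,   # per GB stored for one month
    "transfer_mb": 0.0001,      # per MB transferred
}


@dataclass
class UsageRecord:
    """Metered usage of one user, in the units named on the slide."""
    user: str
    cpu_hours: float = 0.0
    storage_gb_month: float = 0.0
    transfer_mb: float = 0.0


def itemize(record, rates=RATES_CHF):
    """Return the cost per metric, so each charge can be traced to its usage."""
    return {metric: getattr(record, metric) * price
            for metric, price in rates.items()}


def total(record, rates=RATES_CHF):
    """Sum of all itemized charges for one usage record."""
    return sum(itemize(record, rates).values())


if __name__ == "__main__":
    record = UsageRecord("alice", cpu_hours=100, storage_gb_month=50,
                         transfer_mb=10_000)
    for metric, cost in itemize(record).items():
        print(f"{metric}: CHF {cost:.2f}")
    print(f"total: CHF {total(record):.2f}")
```

Because the breakdown is derived directly from the metered quantities, user and provider can both check the same figures.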

Definitions

  • Elastic: Capabilities can be elastically provisioned and released, in some cases automatically, to scale rapidly outward and inward commensurate with demand.

  • Multi-tenant: The provider’s computing resources are pooled to serve multiple consumers, with resources dynamically assigned and reassigned according to consumer demand.

  • Scalable: To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time.

http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
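As a rough illustration of "scaling outward and inward commensurate with demand", the sketch below sizes a pool of nodes from the amount of queued work. The sizing rule, the parameter names and the one-hour horizon are assumptions made for this example, not part of the NIST definition or of the project.

```python
import math


def target_nodes(queued_core_hours, cores_per_node, min_nodes, max_nodes,
                 horizon_hours=1.0):
    """Nodes needed to drain the queue within the horizon, clamped to the pool limits.

    Hypothetical sizing rule: demand is expressed as queued core hours, and the
    pool grows or shrinks so that the queue could be worked off in
    `horizon_hours`.
    """
    if cores_per_node <= 0 or horizon_hours <= 0:
        raise ValueError("cores_per_node and horizon_hours must be positive")
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    needed = math.ceil(queued_core_hours / (cores_per_node * horizon_hours))
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    for demand in (0, 100, 1000):
        print(demand, "core hours ->", target_nodes(demand, 16, 2, 10), "nodes")
```

The upper clamp reflects the difference between a real installation and the consumer's impression of unlimited capacity: a private cloud always has a finite `max_nodes`.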


HPC Pyramid

[Pyramid diagram. Labels: CSCS; Computing needs; Number of users]

Relation to Cloud: As User (extension)

[Pyramid diagram. Labels: Cloud; CSCS; Computing needs; Use; Burst; Number of users]

Today, University Clusters do not make use of the Cloud:

  • Technical details to be investigated:

    • Bursting the cluster into the cloud

      • Networking?

      • User Management?

      • File System?

  • Cloud-compatible licenses for commercial products are often not available

  • No billing mechanism to bill users of cluster for pay-per-use services

Relation to Cloud: As Provider

[Pyramid diagram. Labels: Cloud; CSCS; Account / charge usage; Computing needs; Expose to; Number of users]

Not clear how to be a Cloud Provider with a University Cluster

  • Univ. cluster is not self-service

  • Capital expenses, not just pay-per-use

  • Long-term commitment

  • Not extensible on-demand, not elastic

  • Sharing with others only according to policies

  • More stringent terms of use, needs account

  • We have examples to look at:

    • SDSC, Cornell, Oslo

Infrastructure and Platform as a Service

[Figure from www.cloudadoption.org. Labels: Classic Approach Today; IaaS (Infrastructure); PaaS (Platform); SaaS (Software); 95% time savings; START; FINISH]

Software & Apps run on platforms, NOT infrastructure.

Source: www.cloudadoption.org


DEFINITION

Cloud Stack

CLIENTS: Users or Portals. Can directly use each stack.

  • Software: User Interface, Machine Interface

  • Platform: Components, Services

  • Infrastructure: Compute, Storage, Network

HARDWARE: Any kind of infrastructure for any of the stacks.


Who can make use of what

  • Users may use any service

  • Portals may use any service

  • SaaS may or may not be built on top of PaaS or IaaS

  • PaaS may or may not be built on top of IaaS

[Diagram labels: User, Portal; SaaS; PaaS; IaaS; Hardware]

DEFINITION

Public, Private, Hybrid Clouds

Private Cloud

  • Own infrastructure only

  • In-house or hosted

  • Internal use or for sale

  • Full control on cloud stack, accounting, etc.

Public Cloud

  • Offered by partner organizations or cloud providers

  • Only operational expenses

  • No control on cloud stack, dependency on external partner

Hybrid Cloud

  • Private Cloud connected to Public Cloud

  • Remote cloud resources on-demand

  • Constraints on own cloud stack: needs to interoperate with public cloud

[Diagram labels: Connect; Institutional boundary]

How to evolve the HPC Service..

  • ..to be able to offer a Platform as a Service.

  • ..to be able to make use of public clouds seamlessly (Hybrid model, CloudBursting)

Information Gathering

  • We collected a lot of information and conducted a survey on existing solutions (mandate to CloudBroker)

Lots of Interactions

  • With Cloud providers

    • IBM, Amazon, CloudSigma, HP, Google

  • Software providers

    • VMware, HP, Dell, OpenStack flavors (Piston, ..)

  • Universities

    • SWITCH, ZHAW, SDSC, Cornell, Imperial College, U Oslo, Zaragoza

Choices

  • Commercial Cloud Appliance

    • Evaluate HP CloudSystem Matrix

    • Integrated hardware: HP blades and 3PAR storage

    • Runs with VMware or Hyper-V

    • Complete management and end-user interfaces

  • Build our own

    • 2 different systems (Dell based)

    • OpenStack: Several distributions to test

    • Special software: ScaleMP, cloud FS

Infrastructure 1

  • ETH: HP CloudSystem Matrix Testbed

    • Operational as of THIS WEEK

  • 8 Intel, 8 AMD blades

  • 128GB memory per blade

  • 10TB storage 3PAR

  • HP Matrix cloud software is fixed

  • This is on RENT: we have to give it back


Infrastructure 2

  • ETH: Build our own from new components.

    • Standard cluster nodes x16, diskless

    • 128GB RAM on each node

    • Very fast storage (SSD based) for VM images

  • Attach standard storage NAS from ETH

  • Cloud Stack:

    • OpenStack

    • VMware

  • Being installed next Monday

  • This remains at ETH after the project


Infrastructure 3

  • University of Zurich: Recycle existing components.

    • Set of old cluster nodes, heterogeneous

    • Cloud filesystem using local node storage (technologies will be evaluated)

      • GlusterFS

      • Ceph

HPC + Cloud: On the same HW

[Diagram labels: Compute Nodes; Storage; HPC CLUSTER; CLOUD HW]

HPC + Cloud: On the same HW

  • Classic CLUSTER – Not Virtualized

    • Can be heterogeneous HW

    • OS controlled by Admins

    • Scheduler for job submission

    • Applications compiled and installed

    • Shared FS

  • CLOUD – Virtualized

    • Hypervisor and Cloud Stack controlled by Admins

    • Template ‘Apps’

    • Users can create new

    • Different kinds of storage

    • Different setups possible

    • Virtual SMP

[Diagram labels: Compute Nodes; Storage; HPC CLUSTER; CLOUD HW]

Storage

  • Ceph, Gluster

  • Mount a real (non-virtual) cluster FS (Lustre, GPFS)

  • Mount NFS

  • Object stores, e.g. Swift

  • Different HW

    • Local Disks

    • iSCSI

    • Very fast SSD-based appliance over 10Gb or FC or IB (deduplication, compression) – for VMs and fast disk

Cloud HPC Use Cases to Test 1

  • Extending the regular cluster into the cloud

    • Just run cluster node instances

    • Register back with cluster scheduler

    • Jobs can request these nodes explicitly

    • ALREADY tested using Amazon

  • Building a full virtualized cluster in our Cloud

    • Everything virtual: Cluster nodes, headnodes

    • Cluster FS : several options (see storage)

    • What do we learn? Reality check: HPC performance
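The first use case above (cloud instances register back with the cluster scheduler, and jobs request them explicitly) can be sketched with a toy scheduler. This is not the scheduler used on the ETH cluster; `ToyScheduler`, the pool names and the placement rule are hypothetical and only show the idea that a job lands on cloud nodes only when it asks for them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    pool: str        # "local" for physical cluster nodes, "cloud" for burst instances
    free_cores: int


@dataclass
class Job:
    name: str
    cores: int
    pool: Optional[str] = None   # None means the default local pool


class ToyScheduler:
    """Hypothetical stand-in for a batch scheduler with named node pools."""

    def __init__(self) -> None:
        self.nodes: List[Node] = []

    def register(self, node: Node) -> None:
        """Called by a node (local or cloud instance) when it joins the cluster."""
        self.nodes.append(node)

    def place(self, job: Job) -> Optional[str]:
        """Place the job on the first fitting node of the requested pool.

        Returns the node name, or None if the job has to wait.
        """
        wanted = job.pool or "local"
        for node in self.nodes:
            if node.pool == wanted and node.free_cores >= job.cores:
                node.free_cores -= job.cores
                return node.name
        return None


if __name__ == "__main__":
    sched = ToyScheduler()
    sched.register(Node("local-01", "local", 8))
    sched.register(Node("cloud-01", "cloud", 4))   # burst instance registering back
    print(sched.place(Job("a", 8)))                # fits on the local node
    print(sched.place(Job("b", 4)))                # local pool is full, job waits
    print(sched.place(Job("c", 4, pool="cloud")))  # explicit request for cloud nodes
```

Keeping cloud nodes in a separate pool means that existing jobs behave as before, and only users who opt in pay for or depend on remote resources.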

Test Case 1 Software

  • Use regular cluster workloads, NOT data intensive

  • Rosetta: structural biology

  • GAMESS: molecular chemistry simulation

  • SMSCG workloads (if we get there)

Cloud HPC Use Cases to Test 2

  • Hadoop cluster

    • Build the virtual cluster dedicated to Hadoop

    • HDFS or Swift

  • Commercial tool cluster: Matlab

    • Matlab ‘cluster’: allocate a few ‘fat’ VMs to Matlab

    • Let it run its internal clustering, expose to user
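For readers unfamiliar with the programming model behind the Hadoop use case above, here is a minimal in-memory sketch of MapReduce. It does not use Hadoop itself; the function names and the token-counting task are chosen only for illustration, and everything runs in a single process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit one (key, 1) pair per whitespace-separated token."""
    return [(token, 1) for token in record.split()]


def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce over an iterable of text records."""
    pairs = chain.from_iterable(map_phase(record) for record in records)
    return dict(reduce_phase(key, values)
                for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(map_reduce(["a b a", "b c"]))
```

In a real Hadoop cluster the map and reduce calls run on different nodes and the shuffle moves data over the network, which is why the choice of storage backend matters for this use case.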

Test Case 2 Software

  • A bit more data intensive

  • Hadoop use cases

    • Proteomics: analysis of selected reaction monitoring data

    • Genomics: Bowtie over Hadoop (Crossbow)

  • Matlab and R

    • Set up cluster Matlab on the regular cluster

    • On SMP’d nodes

Cloud HPC Use Cases to Test 3

  • Data intensive workflow

    • InfectX pipeline: Image analysis – several TB of small files

    • Many kinds of scripts, mostly Matlab

    • Same workflow can be submitted many times

    • Error prone!

  • OpenBIS on-demand workflow

    • Extend metadata catalog with some basic processing capabilities using remote resources

    • Streaming of data to perform some processing in the cloud


Business Models

  • Cannot charge at full cost if we want to be the service provider (competitive advantage)

    • Internal and external views

  • Efficient, fair, feasible and generally accepted funding and charging model

  • New opportunities should not require changing existing business procedures for existing infrastructure (evolution not revolution)

  • Transparent Financial Accounting mechanism

Business Models

  • Several models are being worked out

    • Shareholder model – one-time fee for TFLOPS or TB

    • Subscription model – yearly fee

    • Pay-per-use model

  • Self service options

    • Very detailed like Amazon

    • High-level ‘virtual cluster’ or PaaS

    • Top-level SaaS user gateways
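To make the three charging models above comparable, the following sketch computes what a research group would pay under each of them over a number of years. All fees, rates and usage figures are invented for the example, and the cost formulas are a simplification assumed here; the slides do not specify how the models are priced.

```python
def shareholder_cost(one_time_fee, years):
    """One-time fee for a share of capacity (e.g. TFLOPS or TB); assumed independent of duration."""
    return float(one_time_fee)


def subscription_cost(yearly_fee, years):
    """Flat yearly fee, paid for every year of use."""
    return float(yearly_fee) * years


def pay_per_use_cost(rate_per_core_hour, core_hours_per_year, years):
    """Charge proportional to metered usage."""
    return rate_per_core_hour * core_hours_per_year * years


def compare(years, one_time_fee, yearly_fee, rate_per_core_hour,
            core_hours_per_year):
    """Return the cost of each model and the name of the cheapest one."""
    costs = {
        "shareholder": shareholder_cost(one_time_fee, years),
        "subscription": subscription_cost(yearly_fee, years),
        "pay_per_use": pay_per_use_cost(rate_per_core_hour,
                                        core_hours_per_year, years),
    }
    return costs, min(costs, key=costs.get)


if __name__ == "__main__":
    # Hypothetical numbers in CHF: a light user and a heavy user over three years.
    for usage in (100_000, 1_000_000):
        costs, best = compare(3, 30_000, 12_000, 0.05, usage)
        print(usage, "core hours/year:", costs, "-> cheapest:", best)
```

With these made-up numbers the light user is best served by pay-per-use and the heavy user by the shareholder model, which illustrates why several models may need to coexist.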

Timeline

[Timeline chart with axis marks Apr ’12, Jul ’12, Oct ’12, Jan ’13, Apr ’13 and a "today" marker. Milestones: ETH Project Start; SWITCH AAA Project Start; SWITCH AAA Project End]

  • Information Gathering

  • Business Model

  • Refinement of Targets

  • Application Definition

  • HP CloudSystem on lease: delivered, ready, return to HP

  • ETH Self-built system: call, decision, delivery, ready

  • UZH Self-built system: assembly from existing stuff, ready

  • Application testing


Output

  • Workshop in April’13 to show results of project

    • For the whole Swiss research community – See you there!

  • Input to ETH, UZH strategies for research infrastructure

    • Drive next procurement processes

    • Drive strategies for cooperation/outsourcing models

    • Drive new policy models for funding and sustainability
