
Distributed computing at the Facility level: applications and attitudes

Tom Griffin

STFC ISIS Facility

[email protected]

NOBUGS 2008, Sydney



Spare cycles

  • Typical PC CPU usage is about 10%

  • Usage is minimal from 5 pm to 8 am

  • Most desktop PCs are really fast

  • Waste of energy

  • How can we use (“steal?”) unused CPU cycles to solve computational problems?



Types of Application

  • CPU Intensive

  • Low to moderate memory use

  • Not too much file output

  • Coarse-grained

  • Command line / batch driven

  • Licensing issues?



Distributed computing solutions

Lots of choice: Condor, Grid Engine, Grid MP…

  • Grid MP Server hardware

    • Two dual-Xeon 2.8 GHz servers, RAID 10

  • Software

    • Servers run Red Hat Enterprise Linux / DB2

    • Unlimited Windows (and other) clients

  • Programming

    • Web Services interface – XML, SOAP

    • Accessed with C++, Java, C#

  • Management Console

    • Web browser based

    • Can manage services, jobs, devices, etc.

  • Large industrial user base

    • GSK, J&J, Novartis etc.



Installing and Running Grid MP

Server installation: about 2 hours.

Client installation: create an MSI and RPM using ‘setmsiprop’; installs in about 30 seconds.

Manual install: better security on Linux and Macs.



Adapting a program for GridMP

  • Fairly easy to write

  • Interface to grid via Web Services

    • C++, Java, C#

  • Think about how to split your data

  • Wrap your executable

  • Write the application service

    • Pre- and post-processing (a pre-processing sketch follows below)
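
A minimal conceptual sketch of the pre-processing half of an application service (the run counts, file names, and layout are illustrative assumptions, not the actual ISIS code): a large request, say 64 simulated-annealing runs, is split into one small input file per workunit, each of which the wrapped executable can process on its own.

using System.IO;

class PreProcessSketch
{
    static void Main()
    {
        int totalRuns = 64;        // illustrative: e.g. 64 SA runs in one request
        int runsPerWorkunit = 4;   // granularity chosen when splitting the data

        for (int w = 0; w < totalRuns / runsPerWorkunit; w++)
        {
            // One directory (package) per workunit
            string dir = Path.Combine("packages", "pkg" + w);
            Directory.CreateDirectory(dir);

            // Each package tells the wrapped executable which runs it owns
            File.WriteAllText(Path.Combine(dir, "runs.txt"),
                string.Format("first_run={0}\nnum_runs={1}\n",
                              w * runsPerWorkunit + 1, runsPerWorkunit));
        }
        // Post-processing does the inverse: collect each workunit's result file
        // and merge them into a single result set.
    }
}

The coarser the split, the fewer workunits and the less scheduling overhead; the finer the split, the better the load balancing across a grid of mixed-speed machines.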



Package your executable

The executable, its DLLs, standard data files, and environment variables are packaged together into a program module (optionally compressed and/or encrypted), which is uploaded to, and resident on, the server.



Create / run a job

Client side: the input data (e.g. proteins and molecules) is split into packages (Pkg1–Pkg4) and uploaded over HTTPS as datasets.

Server side: create the job, generate the cross product of the datasets to produce the workunits, then start the job.


Code examples

// Create the job and attach it to the application
Mgsi.Job job = new Mgsi.Job();
job.application_gid = app.application_gid;
job.description = txtJobName.Text.Trim();
job.state_id = 1;
job.job_gid = ud.createJob(auth, job);

// Define the job step that will run the program module
Mgsi.JobStep js = new Mgsi.JobStep();
js.job_gid = job.job_gid;
js.state_id = 1;
js.max_concurrent = 1;
js.max_errors = 20;
js.num_results = 1;
js.program_gid = prog.program_gid;
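
The slide stops before the job step is registered with the server. Since js.job_step_gid is used later when the workunits are generated, a registration call presumably follows, by analogy with createJob above (the exact MGSI method name here is an assumption):

js.job_step_gid = ud.createJobStep(auth, js); // assumed call: registers the job step and returns its gid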



Code examples (continued)

// Create a data set attached to the job and upload one input file per workunit
Mgsi.DataSet ds = new Mgsi.DataSet();
ds.job_gid = job.job_gid;
ds.data_set_name = job.description + "_ds_" + DateTime.Now.Ticks;
ds.data_set_gid = ud.createDataSet(auth, ds);

// One Data record per workunit, collected for a single createDatas call
Mgsi.Data[] datas = new Mgsi.Data[(int)numWorkunits.Value];

for (int i = 1; i <= numWorkunits.Value; i++)
{
    // Upload this workunit's input file and record its hash and size
    FileTransfer.UploadData uploadD = ft.uploadFile(auth, Application.StartupPath + "\\testdata.tar");
    Mgsi.Data data = new Mgsi.Data();
    data.data_set_gid = ds.data_set_gid;
    data.index = i;
    data.file_hash = uploadD.hash;
    data.file_size = long.Parse(uploadD.size);
    datas[i - 1] = data;
}

ud.createDatas(auth, datas);
ud.createWorkunitsFromDataSetsAsync(auth, js.job_step_gid, new string[] { ds.data_set_gid }, options);



Performance

Famotidine form B
13 degrees of freedom
P2₁/c, V = 1421 Å³
Synchrotron data to 1.64 Å
1 × 10⁷ moves per run, 64 runs

Standard DASH on a single core of a 2.4 GHz Core 2 Quad: job complete in 9 hours.

GDASH submitting to a test grid of 5 in-use PCs (4 × 2.4 GHz Core 2 Quad, 1 × 2.8 GHz Core 2 Quad): job complete in 24 minutes.

Speedup = 22.5×
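
The speedup follows directly from the two timings: 9 hours is 540 minutes, and 540 / 24 = 22.5.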



Performance – 999 SA runs, full grid

4 days 18 hours of CPU time delivered in ~40 minutes of elapsed time, using 317 cores from 163 devices:

42 Athlons: 1.6–2.2 GHz
168 Core 2 Duos: 1.8–3.0 GHz
36 Core 2 Quads: 2.4–2.8 GHz
1 Duron @ 1.2 GHz
42 Pentium 4s: 2.4–3.6 GHz
27 Xeons: 2.5–3.6 GHz
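
For scale: 4 days 18 hours is 114 CPU-hours, delivered in roughly 40 minutes of wall-clock time, i.e. an effective parallel speedup of about 114 × 60 / 40 ≈ 171 across this mix of cores.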

(Chart: workunits completed against elapsed time.)



A Particular Success - McStas

HRPD supermirror guide design

Complex design

Meaningful simulations take a long time

Want to try lots of ideas

Many runs of >200 CPU days

Simpler model was best value

Massive improvement in flux

Significant cost savings



Problems

  • McStas: interactions ‘in the wild’ with Symantec Anti-Virus
  • The clash did not show up in testing
  • McStas is now restricted to running at night only



User Attitudes

  • Attitudes cover a wide range
  • ‘Theft’ of cycles: “I’m not having that on my machine”
  • The grid is the first thing to get blamed when anything goes wrong
    • Gaining more trust over time
    • Evangelism by users helps



Flexibility with virtualisation

  • Request to run ‘GARefl’ code

  • ISIS is Windows based

  • Few Linux PCs

  • VMware Server is freeware

  • 8 Hosts gave 26 cores

  • More cores = more demand

  • 56 real cores recruited from servers, 64-core Beowulf

  • 10 Mac cores

  • Run Linux as a job





The Future

The grid grows in power every day: new machines are added, and old ones are still left on.

Electricity: an energy-saving drive at STFC means switching machines off. Wake-on-LAN ‘magic packets’ plus remote hibernation can reconcile the two (see the sketch below).

Laptops: good or bad?
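
As a concrete illustration of the ‘magic packet’ idea (a sketch only, not the STFC implementation; the MAC address is a placeholder): a Wake-on-LAN packet is simply six 0xFF bytes followed by the target machine's MAC address repeated 16 times, broadcast over UDP.

using System;
using System.Net;
using System.Net.Sockets;

class WakeOnLan
{
    static void Main()
    {
        // Placeholder MAC address of the PC to wake
        byte[] mac = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };

        // Magic packet: 6 x 0xFF, then the MAC repeated 16 times
        byte[] packet = new byte[6 + 16 * 6];
        for (int i = 0; i < 6; i++) packet[i] = 0xFF;
        for (int rep = 0; rep < 16; rep++)
            Buffer.BlockCopy(mac, 0, packet, 6 + rep * 6, 6);

        // Broadcast on UDP port 9, the conventional Wake-on-LAN port
        using (UdpClient udp = new UdpClient())
        {
            udp.EnableBroadcast = true;
            udp.Send(packet, packet.Length, new IPEndPoint(IPAddress.Broadcast, 9));
        }
    }
}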



Summary

Distributed computing: perfect for coarse-grained, CPU-intensive, ‘disk-lite’ applications.

Resources: use existing resources; power grows over time, with no need to write off assets; scalable.

Not just faster: allows one to try different scenarios.

Virtualisation: Linux under Windows, Windows under Linux.

Green credentials: the PCs are running anyway, so better to utilise them; they can be powered down and up.



Acknowledgements

  • ISIS Data Analysis Group

    • Kenneth Shankland

    • Damian Flannery

  • STFC FBU IT Service Desk and ISIS Computing Group

  • Key Users

    • Richard Ibberson (HRPD)

    • Stephen Holt (GARefl)

  • Questions?

