Connecting LRMS to GRMS

Jeff Templon

PDP Group, NIKHEF

HEPiX Batch Workshop

12-13 May 2005

Example Site Scenario

  • Computer cluster at NIKHEF:

    • 50% guaranteed for SC-Grid

    • 50% guaranteed for LHC experiments

    • Allow either group to exceed 50% if other group not active

    • Allow D0 experiment to scavenge any crumbs

    • Give ‘dteam’ (operations group) extremely high priority, but limit it to 2 concurrent jobs

    • Limit running jobs from production groups to ~ 95% of capacity (always keep a few CPUs free for e.g. operations checks)
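The site policy above boils down to one decision repeated whenever a CPU frees up: which group's job starts next? The following is a minimal sketch under stated assumptions, not NIKHEF's actual LRMS configuration. The group keys (`sc-grid`, `lhc`, `d0`, `dteam`), the 100-CPU cluster size and the function name `pick_next_group` are all hypothetical. A real batch scheduler would express the same rules in its own fair-share and limit settings.

```python
from typing import Dict, Optional

# Hypothetical parameters mirroring the example site policy; the group
# names and cluster size are illustrative, not a real configuration.
TOTAL_CPUS = 100
PRODUCTION_CAP = 0.95          # keep ~5% of CPUs free, e.g. for operations checks
GUARANTEED = {"sc-grid": 0.5, "lhc": 0.5}
DTEAM_MAX_RUNNING = 2


def pick_next_group(running: Dict[str, int], waiting: Dict[str, int]) -> Optional[str]:
    """Return the group whose next waiting job should start on a free CPU, or None."""
    busy = sum(running.values())
    if busy >= TOTAL_CPUS:
        return None
    # dteam: top priority, capped at 2 concurrent jobs; it may use the
    # CPUs held back from the production groups.
    if waiting.get("dteam", 0) and running.get("dteam", 0) < DTEAM_MAX_RUNNING:
        return "dteam"
    # Production groups (everything except dteam) stop at the ~95% cap.
    if busy - running.get("dteam", 0) >= PRODUCTION_CAP * TOTAL_CPUS:
        return None
    active = [g for g in GUARANTEED if waiting.get(g, 0)]
    # A group below its guaranteed share goes first; furthest below wins.
    under = [g for g in active if running.get(g, 0) < GUARANTEED[g] * TOTAL_CPUS]
    if under:
        return min(under, key=lambda g: running.get(g, 0) / GUARANTEED[g])
    # No active group is below its share, i.e. the other guaranteed group has
    # nothing waiting, so an active group may exceed its 50%.
    if active:
        return min(active, key=lambda g: running.get(g, 0) / GUARANTEED[g])
    # D0 scavenges any remaining crumbs.
    if waiting.get("d0", 0):
        return "d0"
    return None
```

The order of the checks encodes the priorities directly: dteam first (but capped), then the 95% production cap, then the 50/50 guarantees, then D0 as scavenger.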

Example User Scenarios

  • “polite user”

    • Uses grid job submission tools ‘bare’ & lets grid figure it out

  • “high-throughput user”

    • Ignores the grid’s site suggestions; floods each site until jobs start piling up in the ‘waiting’ state, then moves on to the next site

  • “sneaky high-throughput user”

    • Like above, but doesn’t even check whether jobs pile up: the jobs aren’t real jobs but ‘pilot’ jobs that pick up real work only once they start (supermarket approach)

  • “fast turnaround user”

    • Wants jobs to complete as soon as possible (special priority)
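The "high-throughput user" strategy is simple enough to write as a loop. This is a hypothetical sketch: `high_throughput_submit`, `waiting_count` and `submit` are illustrative stand-ins, not calls from any real grid middleware. The "sneaky" variant would simply drop the `waiting_count` check and submit pilots everywhere.

```python
from typing import Callable, Dict, List


def high_throughput_submit(sites: List[str], n_jobs: int, waiting_limit: int,
                           waiting_count: Callable[[str], int],
                           submit: Callable[[str], None]) -> Dict[str, int]:
    """Hypothetical 'high-throughput user' loop: ignore the grid's site
    ranking and keep submitting to one site until `waiting_limit` of our
    jobs are waiting there, then move on to the next site."""
    placed: Dict[str, int] = {}
    remaining = n_jobs
    for site in sites:
        while remaining and waiting_count(site) < waiting_limit:
            submit(site)
            placed[site] = placed.get(site, 0) + 1
            remaining -= 1
        if not remaining:
            break
    return placed


# Toy model: every site is full, so each submitted job just waits.
waiting: Dict[str, int] = {}


def fake_submit(site: str) -> None:
    waiting[site] = waiting.get(site, 0) + 1


placed = high_throughput_submit(["A", "B", "C"], 12, 5,
                                lambda s: waiting.get(s, 0), fake_submit)
print(placed)  # {'A': 5, 'B': 5, 'C': 2}
```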

Connect Users to Sites with “Maximal Joint Happiness”

  • Users: work finished ASAP

  • Sites: always full and usage matches fair-share commitments

Key Question: How Long to Run?

  • Users: want to submit to sites that will complete job as fast as possible

  • Sites: a site may be “full”, i.e. have no free CPUs, BUT:

    • NIKHEF being 100% full with ATLAS jobs means that

    • Any ‘SC-Grid’ jobs submitted will run as soon as a free CPU appears

    • If the site can’t get this message to users, it won’t get any SC-Grid jobs

  • It should be clear from this that the answer to “how long?” depends on who is asking!
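To make "depends on who is asking" concrete, here is a deliberately crude, hypothetical estimator. It assumes a full cluster where one CPU frees up every `free_interval_s` seconds on average, and that a freed CPU always goes to a group below its guaranteed share first. The function name, parameters and numbers are illustrative only, not a real information-system schema.

```python
from typing import Dict


def estimated_wait(group: str, running: Dict[str, int], shares: Dict[str, float],
                   total_cpus: int, queued: Dict[str, int],
                   free_interval_s: float) -> float:
    """Crude, hypothetical estimate of seconds until a new job from `group` starts.

    Assumes a full cluster where one CPU frees up every `free_interval_s`
    seconds on average, and that each freed CPU goes to a group below its
    guaranteed share before any group at or above it.
    """
    if running.get(group, 0) < shares.get(group, 0.0) * total_cpus:
        # Below its share: only waits behind its own already-queued jobs.
        return (queued.get(group, 0) + 1) * free_interval_s
    # At or above its share: waits for every job already queued.
    return (sum(queued.values()) + 1) * free_interval_s


# Cluster 100% full of LHC (e.g. ATLAS) jobs, with 200 more LHC jobs queued.
running = {"lhc": 100}
shares = {"sc-grid": 0.5, "lhc": 0.5}
queued = {"lhc": 200, "sc-grid": 0}
print(estimated_wait("sc-grid", running, shares, 100, queued, 60.0))  # 60.0
print(estimated_wait("lhc", running, shares, 100, queued, 60.0))      # 12060.0
```

The same "full" site answers one minute for an SC-Grid job and over three hours for another LHC job, which is why a single published number cannot serve both.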

Different answers, same question

[Figure: time to start (sec) vs. real time (sec)]

Black lines are measured, blue triangles are statistical predictions

See Laurence’s Talk

How Long to Run

  • Need reasonable normalized estimates from users

  • Need normalized CPU units

  • Need a solution for the heterogeneous CPU population behind most sites’ grid entry points (NIKHEF has one too)

  • Probably see Laurence’s talk here too!

  • Added value: good run-time estimates help LRMS scheduling (e.g. MPI jobs & backfill)
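A minimal sketch of why normalized units matter behind a heterogeneous entry point, and how the same estimate feeds backfill. It assumes users quote run time in reference-CPU seconds and that the site knows a speed factor per node type; the node names, factors and helper functions are hypothetical.

```python
# Hypothetical speed factors relative to a reference CPU (1.0 = reference);
# a real site would derive these from a benchmark of each node type.
NODE_SPEED = {"old-node": 0.5, "new-node": 1.5}


def wall_time_s(normalized_cpu_s: float, node: str) -> float:
    """Convert an estimate in reference-CPU seconds to wall seconds on `node`."""
    return normalized_cpu_s / NODE_SPEED[node]


def fits_backfill(normalized_cpu_s: float, node: str, gap_s: float) -> bool:
    """True if the job should finish on `node` before a reservation
    (e.g. for a waiting MPI job) starts `gap_s` seconds from now."""
    return wall_time_s(normalized_cpu_s, node) <= gap_s


# A job estimated at one reference-CPU hour, with a 50-minute gap before
# the nodes are reserved for an MPI job.
print(wall_time_s(3600, "old-node"))          # 7200.0
print(fits_backfill(3600, "new-node", 3000))  # True  (2400 s needed)
print(fits_backfill(3600, "old-node", 3000))  # False (7200 s needed)
```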

Sneaky HT vs Polite Users

  • Polite almost always loses

  • Sneaky HT good for sites to 0th order – mix of waiting jobs allows good scheduling

  • However

    • Templon needs to run 10 jobs

    • Submits 10 jobs to each of 100 sites in grid

    • First ten to start grab the ‘real’ jobs

    • Other 990 look exactly like black hole jobs

    • Waste ~ 16 CPU hrs (2 min scheduling cycle * 500 passes)
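The arithmetic in the example above, written out. The 500-pass figure is taken as given from the slide; the code only reproduces the multiplication and does not model where that number comes from.

```python
SITES = 100
JOBS_PER_SITE = 10
REAL_JOBS = 10

submitted = SITES * JOBS_PER_SITE   # 1000 pilot jobs in total
empty = submitted - REAL_JOBS        # 990 pilots find no real job left: black-hole lookalikes

# Waste figure from the slide: 2-minute scheduling cycle x 500 passes.
SCHED_CYCLE_MIN = 2
PASSES = 500
wasted_cpu_hours = SCHED_CYCLE_MIN * PASSES / 60

print(submitted, empty)             # 1000 990
print(round(wasted_cpu_hours, 1))   # 16.7
```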

Polite Users still Lose unless we solve:

  • One question, one answer … one size fits nobody

  • High overhead in WMS: avg 250 sec life cycle for 20 sec job!

[Figure: grid speedup vs. number of jobs submitted; conditions: two-hour job, single user, single RB, best RB performance, scheduling cycle is the only delay at the site]
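The WMS overhead figure quoted above (an average 250 s life cycle for a 20 s job) translates into a useful-work fraction as follows. The second line rests on an assumption not stated on the slide: that the same fixed ~230 s overhead also applies to a two-hour job.

```python
def useful_fraction(job_s: float, lifecycle_s: float) -> float:
    """Fraction of a job's total WMS life cycle spent actually running."""
    return job_s / lifecycle_s


OVERHEAD_S = 250 - 20   # 230 s of overhead implied by the slide's 20 s job

print(useful_fraction(20, 250))                            # 0.08
# Assumption: the same fixed overhead applies to a two-hour job.
print(round(useful_fraction(7200, 7200 + OVERHEAD_S), 3))  # 0.969
```

Under that assumption the overhead is crippling for short jobs (8% useful) but minor for long ones (about 97% useful).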

High Priority Users

  • Sol’n 1: dedicated CPUs (standing reservations) (expensive!)

  • Sol’n 2: virtualization w/preemption (long way off?)

Other Issues

  • Transferring Info to LRMS

    • Run-time estimate

      • helps enormously in e.g. scheduling MPI jobs

      • Also may help in answering “the question”

    • Memory usage, disk space needs, etc.

  • MPI & accounting – what about “the dip”?

  • Self-disabling sites (avoid hundreds of lost jobs and tens of lost person-hours)

  • “Circuit breakers”? (Miron Livny)
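One possible reading of "self-disabling sites" and "circuit breakers" is the generic circuit-breaker pattern, sketched below. This is a hypothetical illustration, not the mechanism Miron Livny proposed: the class name, the failure threshold and the cooldown are assumptions. After a run of consecutive job failures the site stops accepting work, then admits a single probe job after a cooldown and re-enables only if the probe succeeds.

```python
import time
from typing import Callable, Optional


class SiteCircuitBreaker:
    """Hypothetical self-disabling switch for a grid site.

    After `max_failures` consecutive job failures the site stops accepting
    jobs ("open"). Once `cooldown_s` seconds have passed it lets a single
    probe job through ("half-open") and re-enables only if that job succeeds.
    """

    def __init__(self, max_failures: int = 5, cooldown_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None   # None: closed, site accepts jobs
        self.probing = False

    def accepting_jobs(self) -> bool:
        if self.opened_at is None:
            return True
        if not self.probing and self.clock() - self.opened_at >= self.cooldown_s:
            self.probing = True                    # half-open: admit one probe job
            return True
        return False

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            self.opened_at = None
            self.probing = False
            return
        self.failures += 1
        if self.probing or self.failures >= self.max_failures:
            self.opened_at = self.clock()          # (re)open: stop accepting jobs
            self.probing = False


# Usage with a fake clock so the example is deterministic.
now = [0.0]
breaker = SiteCircuitBreaker(max_failures=3, cooldown_s=10.0, clock=lambda: now[0])
for _ in range(3):
    breaker.record(success=False)     # e.g. three jobs lost to a broken worker node
print(breaker.accepting_jobs())        # False
now[0] = 10.0
print(breaker.accepting_jobs())        # True  (single probe job admitted)
print(breaker.accepting_jobs())        # False (probe still outstanding)
breaker.record(success=True)
print(breaker.accepting_jobs())        # True
```

Stopping after a few failures, rather than letting a broken site keep attracting and losing jobs, is what limits the damage to a handful of jobs instead of hundreds.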