Connecting LRMS to GRMS

Jeff Templon, PDP Group, NIKHEF

HEPiX Batch Workshop, 12-13 May 2005

Example Site Scenario
  • Computer cluster at NIKHEF:
    • 50% guaranteed for SC-Grid
    • 50% guaranteed for LHC experiments
    • Allow either group to exceed 50% if other group not active
    • Allow D0 experiment to scavenge any crumbs
    • Give ‘dteam’ (operations group) extremely high priority; but limit to 2 concurrent jobs
    • Limit running jobs from production groups to ~ 95% of capacity (always keep a few CPUs free for e.g. operations checks)
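The policy above can be sketched as a toy dispatch function. In practice this would be expressed in the LRMS scheduler configuration (e.g. Maui fair-share targets and throttles); the 100-CPU capacity, the group names, and the selection logic below are illustrative assumptions, not the site's actual setup:

```python
# Toy dispatcher for the site policy above (illustrative sketch only).
# Assumptions: 100 CPUs total; 'scgrid' and 'lhc' each guaranteed 50%
# and allowed to borrow the other's idle share; 'dzero' scavenges
# leftovers; 'dteam' has top priority but at most 2 concurrent jobs;
# production jobs capped at ~95% of capacity.

CAPACITY = 100                      # total CPUs (assumed)
PROD_CAP = int(0.95 * CAPACITY)     # keep a few CPUs free for ops checks
SHARE = {"scgrid": 0.5, "lhc": 0.5}

def pick_next(queue, running):
    """Return the group whose next waiting job should start, or None.

    queue:   dict group -> number of waiting jobs
    running: dict group -> number of running jobs
    """
    if sum(running.values()) >= CAPACITY:
        return None
    # dteam: extremely high priority, but never more than 2 concurrent jobs
    if queue.get("dteam", 0) > 0 and running.get("dteam", 0) < 2:
        return "dteam"
    prod = sum(running.get(g, 0) for g in ("scgrid", "lhc", "dzero"))
    if prod >= PROD_CAP:
        return None                 # ~95% cap on production groups
    # guaranteed shares first: pick the group furthest below its 50% target
    below = [g for g in SHARE
             if queue.get(g, 0) > 0 and running.get(g, 0) < SHARE[g] * CAPACITY]
    if below:
        return min(below, key=lambda g: running.get(g, 0) / (SHARE[g] * CAPACITY))
    # borrowing: either group may exceed 50% while the other has nothing queued
    for g in SHARE:
        other = "lhc" if g == "scgrid" else "scgrid"
        if queue.get(g, 0) > 0 and queue.get(other, 0) == 0:
            return g
    # D0 scavenges any remaining crumbs
    return "dzero" if queue.get("dzero", 0) > 0 else None
```

For example, with 60 SC-Grid jobs running and nothing queued from LHC, a 61st SC-Grid job is still admitted (borrowing), while a queued dteam job jumps ahead of everyone until its 2-job cap is hit.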
Example User Scenarios
  • “polite user”
    • Uses grid job submission tools ‘bare’ & lets grid figure it out
  • “high-throughput user”
    • Ignores grid suggestions on sites; blasts each site until jobs start piling up in the ‘waiting’ state, then moves on to the next site
  • “sneaky high-throughput user”
    • Like above but doesn’t even look at whether jobs pile up … jobs aren’t real jobs, they are ‘pilot’ jobs (supermarket approach)
  • “fast turnaround user”
    • Wants jobs to complete as soon as possible (special priority)
Connect Users to Sites with “Maximal Joint Happiness”
  • Users: work finished ASAP
  • Sites: always full and usage matches fair-share commitments
Key Question: How Long to Run?
  • Users: want to submit to sites that will complete job as fast as possible
  • Sites: a site may be “full”, i.e. no free CPUs, BUT:
    • NIKHEF being 100% full for ATLAS means that any ‘SC-Grid’ jobs submitted will run as soon as a free CPU appears
    • If you can’t get this message to users, you won’t get any SC-Grid jobs
  • Should be clear from this that answer to “how long” depends on who is asking!
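A crude sketch of a per-group answer (the numbers and the drain model are my assumptions, not from the talk): a group below its guaranteed share can be promised roughly one scheduling cycle even when the site reports zero free CPUs, while the over-share group must wait for its own jobs to drain.

```python
# Per-group "how long until my job starts" sketch (all numbers assumed).

CAPACITY = 100
SHARE = {"scgrid": 0.5, "lhc": 0.5}
SCHED_CYCLE = 120          # seconds between scheduler passes (assumed)

def estimated_wait(group, running, mean_remaining):
    """Rough wait in seconds before one job of `group` starts.

    running:        dict group -> running job count
    mean_remaining: mean remaining run time of the jobs now on CPUs (s)
    """
    if CAPACITY - sum(running.values()) > 0:
        return SCHED_CYCLE                 # free CPU: next pass picks it up
    if running.get(group, 0) < SHARE.get(group, 0) * CAPACITY:
        # under-share group gets the very next CPU that frees; with CAPACITY
        # jobs draining, the first finishes in roughly mean_remaining/CAPACITY
        return SCHED_CYCLE + mean_remaining / CAPACITY
    return mean_remaining                  # over-share: wait for own share

# site 100% full with two-hour ATLAS (lhc) jobs:
wait_lhc    = estimated_wait("lhc",    {"lhc": 100}, mean_remaining=7200)
wait_scgrid = estimated_wait("scgrid", {"lhc": 100}, mean_remaining=7200)
```

The same question gets 7200 s for an LHC user but about 192 s for an SC-Grid user, which is exactly the slide's point: the answer depends on who is asking.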
Different answers, same question



[Figure: time to start (sec) vs. real time (sec). Black lines are measured, blue triangles are statistical predictions. See Laurence’s talk.]

How Long to Run
  • Need reasonable normalized estimates from users
  • Need normalized CPU units
  • Need a solution for the heterogeneous CPU population behind most sites’ grid entry points (NIKHEF has these)
  • Probably see Laurence’s talk here too!
  • Added value: good run-time estimates help LRMS scheduling (e.g. MPI jobs & backfill)
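One way to read “normalized CPU units”: the user states the estimate in reference-CPU seconds (SpecInt2000-style ratings were the usual currency at the time), and the site scales it per node type. The ratings below are invented for illustration:

```python
# Scaling a normalized run-time estimate to wall-clock limits on a
# heterogeneous cluster (sketch; all ratings are invented).

REFERENCE_RATING = 1000.0      # agreed reference CPU rating (e.g. SI2k)

NODE_RATINGS = {               # hypothetical node types behind one entry point
    "old-xeon": 800.0,
    "new-opteron": 1600.0,
}

def wall_clock_limit(ref_seconds, node_type):
    """Convert an estimate in reference-CPU seconds to a wall-clock
    limit on the given node type."""
    return ref_seconds * REFERENCE_RATING / NODE_RATINGS[node_type]

# a job estimated at 2 reference-CPU hours:
limits = {n: wall_clock_limit(7200, n) for n in NODE_RATINGS}
```

The same 2-hour estimate becomes a 2.5-hour limit on the slow nodes and 1.25 hours on the fast ones; a per-node limit like this is what backfill scheduling around MPI jobs needs.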
Sneaky HT vs Polite Users
  • Polite almost always loses
  • Sneaky HT good for sites to 0th order – mix of waiting jobs allows good scheduling
  • However
    • Templon needs to run 10 jobs
    • Submits 10 jobs to each of 100 sites in grid
    • First ten to start grab the ‘real’ jobs
    • Other 990 look exactly like black hole jobs
    • Waste ~ 16 CPU hrs (2 min scheduling cycle * 500 passes)
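The slide’s waste estimate can be checked arithmetically; the 500 scheduler passes are the slide’s own figure (how the 990 stray jobs map onto 500 passes is not spelled out on the slide):

```python
# Reproducing the slide's numbers for the sneaky-HT example.
SITES = 100
JOBS_PER_SITE = 10
REAL_WORK = 10                      # only 10 real jobs exist

stray_jobs = SITES * JOBS_PER_SITE - REAL_WORK   # black-hole look-alikes

SCHED_CYCLE_MIN = 2                 # 2-minute scheduling cycle (slide)
PASSES = 500                        # slide's figure
wasted_cpu_hours = SCHED_CYCLE_MIN * PASSES / 60
```

That gives 990 stray jobs and about 16.7 wasted CPU hours, matching the slide’s “~16 CPU hrs”.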
Polite Users still Lose unless we solve:
  • One question, one answer … one size fits nobody
  • High overhead in WMS: avg 250 sec life cycle for 20 sec job!
[Figure: grid speedup vs. number of jobs submitted, assuming a two-hour job, a single user, a single RB at best RB performance, and the scheduling cycle as the only delay at the site.]
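A toy model reproduces the shape of such a speedup curve. Only the 250 s WMS life cycle and the two-hour job come from the slides; the serial-submission model and the scheduler-cycle value are my assumptions: a single RB pushes jobs out one at a time at the per-job overhead, after which they all run in parallel.

```python
# Toy grid-speedup model (model assumed; 250 s overhead from the slide).

OVERHEAD = 250.0        # avg WMS life cycle per job, seconds (slide)
SCHED_CYCLE = 120.0     # assumed site scheduler pass, seconds

def grid_speedup(n_jobs, job_seconds):
    serial = n_jobs * job_seconds                        # one local CPU
    makespan = n_jobs * OVERHEAD + SCHED_CYCLE + job_seconds
    return serial / makespan
```

In this model the speedup for two-hour jobs saturates near job_seconds/OVERHEAD ≈ 29 no matter how many jobs are submitted, which shows why the 250 s per-job overhead caps what a polite user can gain.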

High Priority Users
  • Soln 1: dedicated CPUs (standing reservations) (expensive!)
  • Soln 2: virtualization w/ preemption (long way off?)
Other Issues
  • Transferring Info to LRMS
    • Run-time estimate
      • helps enormously in e.g. scheduling MPI jobs
      • Also may help in answering “the question”
    • Memory usage, disk space needs, etc.
  • MPI & accounting – what about “the dip”?
  • Self-disabling sites (avoid hundreds of lost jobs and tens of lost person-hours)
  • “Circuit breakers”? (Miron Livny)
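A minimal sketch of the “circuit breaker” idea for a self-disabling site (thresholds and class names are invented): if too many recent jobs fail suspiciously fast, the black-hole signature, the site stops accepting new work.

```python
from collections import deque

# Circuit-breaker sketch for a self-disabling site (all thresholds invented).

class SiteBreaker:
    def __init__(self, window=10, max_fast_failures=5, fast_seconds=60):
        self.recent = deque(maxlen=window)  # True = fast failure (suspicious)
        self.max_fast = max_fast_failures
        self.fast_seconds = fast_seconds
        self.tripped = False                # tripped = site disabled

    def record(self, succeeded, runtime_seconds):
        """Record one finished job; trip if the window looks like a black hole."""
        self.recent.append((not succeeded) and runtime_seconds < self.fast_seconds)
        if sum(self.recent) >= self.max_fast:
            self.tripped = True             # stop advertising/accepting jobs

    def accepting(self):
        return not self.tripped
```

Five jobs dying within seconds trips the breaker, avoiding the “hundreds of lost jobs” above; how the breaker is closed again (operator check, probe job) is left out of the sketch.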