
Condor and the Grid

Condor and the Grid. D. Thain, T. Tannenbaum, M. Livny. Christopher M. Moretti, 23 February 2007. Problem & Opportunity: users need CPUs (scientific computing, mathematical modeling, data mining); many CPU cycles are unused (personal workstations, general use laboratories, research machines).


Presentation Transcript


  1. Condor and the Grid • D. Thain, T. Tannenbaum, M. Livny • Christopher M. Moretti • 23 February 2007

  2. Problem & Opportunity • Users need CPUs • Scientific computing • Mathematical modeling • Data mining • Many CPU cycles are unused • Personal workstations • General use laboratories • Research machines

  3. Solution: Condor • “A hunter of idle workstations” • Keeps track of resources • needed and available • Determines and assigns matches • Monitors progress • Cleans up and reports results

  4. Architecture • Three principals: • Agent: machine needing resources • Matchmaker • Resource: machine lending resources • Three phases: • Advertising • Matching/Claiming • Deploying/Executing

  5. Advertising • Agent (needy.cse.nd.edu) to MatchMaker: “I need X” • Lender (idle.cse.nd.edu) to MatchMaker: “I have Y” • MatchMaker: “Does Y satisfy X?”

  6. Matching & Claiming • MatchMaker to Agent (needy.cse.nd.edu): “Use idle.cse.nd.edu” • MatchMaker to Lender (idle.cse.nd.edu): “Listen for needy.cse.nd.edu” • Agent to Lender: “Are you still available?” • Lender to Agent: “Yes.”
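The advertise, match, and claim steps on slides 5 and 6 can be sketched roughly as below. This is a toy model, not Condor's actual protocol: the class names, the method names, and the rule that an offer satisfies a need when the two strings are equal are all assumptions made for illustration. It does show the one design point the slides stress, namely that the matchmaker only suggests a pairing and the agent still has to claim the lender directly.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Matchmaker:
    """Collects advertisements and suggests matches; it never runs jobs."""
    needs: dict = field(default_factory=dict)   # agent host -> what it needs
    offers: dict = field(default_factory=dict)  # lender host -> what it has

    def advertise_need(self, agent, need):
        self.needs[agent] = need

    def advertise_offer(self, lender, offer):
        self.offers[lender] = offer

    def match(self):
        """Return (agent, lender) pairs where the offer satisfies the need."""
        pairs = []
        free = dict(self.offers)
        for agent, need in self.needs.items():
            for lender, offer in list(free.items()):
                if offer == need:  # assumed stand-in for "Does Y satisfy X?"
                    pairs.append((agent, lender))
                    del free[lender]
                    break
        return pairs


@dataclass
class Lender:
    host: str
    available: bool = True
    claimed_by: Optional[str] = None

    def claim(self, agent):
        """Answer "Are you still available?"; the lender may refuse."""
        if not self.available:
            return False
        self.available = False
        self.claimed_by = agent
        return True


mm = Matchmaker()
lender = Lender("idle.cse.nd.edu")
mm.advertise_need("needy.cse.nd.edu", "X")
mm.advertise_offer(lender.host, "X")
pairs = mm.match()
claimed = lender.claim(pairs[0][0])
```

Because the claim is a separate step between agent and lender, a stale advertisement costs only a refused claim, and an existing claim does not depend on the matchmaker staying up.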

  7. Deploying / Executing • Agent (needy.cse.nd.edu): “Fork!” creates the Shadow • Lender (idle.cse.nd.edu): “Fork!” creates the Sandbox • Shadow to Sandbox: “Run job J.” • Job J: “I need file /tmp/foo.” • Split Execution
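A minimal sketch of the split-execution idea on slide 7, assuming a much simpler interface than Condor really uses. The `Shadow` and `Sandbox` classes and the `read` callback are hypothetical, the file contents are invented, and a direct method call stands in for what would be a network request between the two machines. The point is only that the job runs on the lending machine while its file request is answered from the submitting machine.

```python
class Shadow:
    """Runs on the submitting machine and serves the job's file requests."""

    def __init__(self, home_files):
        self.home_files = home_files  # path -> contents on the submit side

    def read(self, path):
        return self.home_files[path]


class Sandbox:
    """Runs on the lending machine and confines the job."""

    def __init__(self, shadow):
        self.shadow = shadow

    def run(self, job):
        # The job gets only the read function the sandbox hands it,
        # so every file access is forwarded to the shadow.
        return job(self.shadow.read)


def job_j(read_file):
    """Job J: needs /tmp/foo, which exists only on the submit side."""
    data = read_file("/tmp/foo")
    return len(data.split())


shadow = Shadow({"/tmp/foo": "three small words"})  # invented contents
result = Sandbox(shadow).run(job_j)
```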

  8. Matching • How are matches determined? • Policy • ClassAds • Why independently claim a match? • What if the Matchmaker dies?

  9. ClassAds
     Job ad:
       MyType = "Job"
       TargetType = "Machine"
       Requirements = ((other.Arch == "INTEL" && other.OpSys == "LINUX" && KeyboardIdle > 600))
       Cmd = "/tmp/a.out"
       Owner = "cmoretti"
     Machine ad:
       MyType = "Machine"
       TargetType = "Job"
       Machine = "dustpuppy.cse.nd.edu"
       Requirements = ((KeyboardIdle > 600))
       Arch = "INTEL"
       OpSys = "LINUX"
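The two-sided matching that these ads imply can be sketched in Python as follows. This is not the ClassAd language or its evaluator: each ad is modeled as a plain dict, each `Requirements` expression as a hand-written function of (my ad, other ad), and the `KeyboardIdle` value of 900 is a hypothetical number, since the slide gives no value for it. A match requires both ads' requirements to hold.

```python
job_ad = {
    "MyType": "Job",
    "TargetType": "Machine",
    "Cmd": "/tmp/a.out",
    "Owner": "cmoretti",
    # The job's requirements are checked against the machine's attributes.
    "Requirements": lambda my, other: (
        other.get("Arch") == "INTEL"
        and other.get("OpSys") == "LINUX"
        and other.get("KeyboardIdle", 0) > 600
    ),
}

machine_ad = {
    "MyType": "Machine",
    "TargetType": "Job",
    "Machine": "dustpuppy.cse.nd.edu",
    "Arch": "INTEL",
    "OpSys": "LINUX",
    "KeyboardIdle": 900,  # hypothetical value, not given on the slide
    # The machine only accepts jobs while its own keyboard is idle.
    "Requirements": lambda my, other: my.get("KeyboardIdle", 0) > 600,
}


def matches(a, b):
    """Two ads match when their types line up and both Requirements hold."""
    types_ok = a["TargetType"] == b["MyType"] and b["TargetType"] == a["MyType"]
    return types_ok and a["Requirements"](a, b) and b["Requirements"](b, a)


idle_match = matches(job_ad, machine_ad)
busy_machine = dict(machine_ad, KeyboardIdle=30)  # owner is at the keyboard
busy_match = matches(job_ad, busy_machine)
```

Evaluating requirements in both directions is what lets a machine owner's policy carry the same weight as the job's needs.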

  10. Flocking • Using another pool’s resources • Utilize more total resources • Find resources that match needs • Two methods • Gateway flocking • Direct flocking

  11. Gateway Flocking • Each pool has a known “gateway” • Gateways negotiate sharing • Advertise resources and needs • Transmit requests to local matchmaker • Pool-level granularity • Accounting • Policy • Now obsolete

  12. Gateway Flocking • [Diagram: an agent (A), two matchmakers (MM), resources (R), and two gateways; steps numbered 1 to 5]

  13. Direct Flocking • Agents report to other matchmakers • No gateways • Equivalent to being in multiple pools? • Now the preferred (only) method
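Direct flocking, as described on slide 13, can be sketched as an agent that consults more than one matchmaker. Everything below is an illustrative assumption: the `Pool` class, the host names under example.edu, matching on architecture alone, and the choice to try pools in order and stop at the first match.

```python
class Pool:
    """A pool seen through its matchmaker: a name plus its idle machines."""

    def __init__(self, name, idle_machines):
        self.name = name
        self.idle_machines = list(idle_machines)  # (host, arch) pairs

    def find_match(self, wanted_arch):
        for host, arch in self.idle_machines:
            if arch == wanted_arch:
                return host
        return None


def flock(wanted_arch, pools):
    """Ask each pool's matchmaker in turn; return (pool name, host) or None."""
    for pool in pools:
        host = pool.find_match(wanted_arch)
        if host is not None:
            return pool.name, host
    return None


home = Pool("home", [("a.example.edu", "SPARC")])
remote = Pool("remote", [("b.example.edu", "INTEL")])
found = flock("INTEL", [home, remote])
missing = flock("ALPHA", [home, remote])
```

No gateway appears anywhere in the sketch: the agent itself holds the list of pools, which is why the relationship is between an individual agent and each remote pool.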

  14. Direct Flocking • [Diagram: an agent (A), two matchmakers (MM), and resources (R), with no gateways; steps numbered 1 to 3]

  15. Flocking Comparison • Gateway Flocking: Transparency; Fosters organization-level sharing; Poor accounting; Complicated • Direct Flocking: No gateways; Individual relationships supported; Non-transparent; Fewer organization-level agreements

  16. Things Aren’t Perfect • What happens if (when) … • Matchmaker goes down • Network or Agent fails during deploy • Resource or App fails during compute • Non-dedicated machines. • How do we keep owners happy? • What happens when an owner reclaims a resource?

  17. Condor at Notre Dame: Total Consumption in 2006 • 2376456 (100%) CPU-Hours Total • 281003 (11%) CPU-Hours Consumed by Owner at Keyboard • 934277 (39%) CPU-Hours Totally Unused • 1161176 (48%) CPU-Hours Harnessed by Condor • http://www.cse.nd.edu/~ccl/operations/condor/2005/users.html • “Harnessing Idle Computers with Condor at Notre Dame: Impact on Research in 2006”, Douglas Thain
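As a quick check of the figures on slide 17, the short calculation below recomputes each category's share of the total. The variable names are made up for the sketch. The three categories add up exactly to the stated total, and the percentages on the slide agree with truncating each share to a whole percent, which would explain why they sum to 98 rather than 100.

```python
total_cpu_hours = 2376456

cpu_hours = {
    "consumed by owner at keyboard": 281003,
    "totally unused": 934277,
    "harnessed by Condor": 1161176,
}

# Integer division truncates each share to a whole percent.
shares = {name: hours * 100 // total_cpu_hours for name, hours in cpu_hours.items()}

accounted_for = sum(cpu_hours.values())
```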

  18. Current Donors, Feb 2007 • “Harnessing Idle Computers with Condor at Notre Dame: Impact on Research in 2006”, Douglas Thain

  19. CPU History • “Harnessing Idle Computers with Condor at Notre Dame: Impact on Research in 2006”, Douglas Thain

  20. Recap • Condor facilitates distributed computation on dedicated or scavenged CPUs arranged by a matchmaker using ClassAds. • Split Execution is necessary to fit the job’s needs to the environment. • An agent can advertise to multiple matchmakers to examine more potential matches.
