
Grid Application Deployment



  1. Grid Application Deployment Adam Birnbaum Grid Applications Group -- SDSC birnbaum@sdsc.edu

  2. Outline • Grid Promises and Reality • NPACI and Teragrid Resources • Costs of Distribution/Parallelism • Good Candidates for Grid Deployment

  3. Grid Promises • Interoperation of Distributed Resources • Seamless Virtual Organizations • Computing as a Commodity • Transparent Computing • Harvesting of Idle Compute Time

  4. Grid Reality • Problems • Emerging Middleware • Confusing Hype • Difficulty of Application Development • Local Accounting and Scheduling Policies • No meta-schedulers or co-allocation • Grid: Distributed Computing, or HPC? (SETI@Home vs NPACI Grid?) • Opportunities • Some apps are well-suited to distributed resources • New types of systems are possible

  5. Grid Reality: Submitting Jobs [diagram: a user workstation, jimbob.foo.edu, and two remote machines, peanutcluster.foo.edu and bigiron.bar.edu]

  6. Grid Reality: Submitting Jobs (SSH) 1. Set up authorized keys [diagram: copy ~/.ssh/identity.pub from jimbob.foo.edu into ~/.ssh/authorized_keys on both peanutcluster.foo.edu and bigiron.bar.edu]
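
  In practice this step is one shell command per remote machine; a minimal sketch, assuming the SSH1-style identity.pub shown on the slide and an existing ~/.ssh directory on each remote host:

    jimbob> cat ~/.ssh/identity.pub | ssh peanutcluster.foo.edu 'cat >> ~/.ssh/authorized_keys'
    jimbob> cat ~/.ssh/identity.pub | ssh bigiron.bar.edu 'cat >> ~/.ssh/authorized_keys'

  Afterwards, ssh from jimbob.foo.edu to either machine proceeds without a password prompt.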

  7. Grid Reality: Submitting Jobs (SSH) 2. Write a batch submission script for each scheduler

  PBS script on peanutcluster.foo.edu (submit.pbs):
    #!/bin/sh
    #PBS -l nodes=80
    #PBS -l walltime=00:10:00
    ~/proj/doit

  LoadLeveler script on bigiron.bar.edu (submit.ll):
    #!/bin/sh
    #@ node=10
    #@ wall_clock_limit=00:10:00
    #@ <other special commands>
    ~/proj/doit

  8. Grid Reality: Submitting Jobs (SSH) 3. Submit and monitor the jobs (via ssh from jimbob.foo.edu)

  On peanutcluster.foo.edu:
    > qsub submit.pbs
    8553.peanutcluster
    > qstat 8553

  On bigiron.bar.edu:
    > llsubmit submit.ll
    llsubmit: The job "bigiron.7993" has been submitted
    > llq 7993

  9. Grid Reality: Submitting Jobs (Globus) [diagram: jimbob.foo.edu submitting through Globus to peanutcluster.foo.edu and bigiron.bar.edu]

  10. Grid Reality: Submitting Jobs (Globus) 1. Get into the remote grid-mapfiles (email your certificate subject to each site's administrator)

  From jimbob.foo.edu:
    > grep 'Subject:' ~/.globus/usercert.pem | mail theadmin@bar.edu
    > grep 'Subject:' ~/.globus/usercert.pem | mail sysadmin@foo.edu
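
  The administrator then maps your certificate's distinguished name (DN) to a local account with a one-line grid-mapfile entry. A sketch with a made-up DN and username (the Globus grid-cert-info command prints the subject directly):

    > grid-cert-info -subject
    /C=US/O=NPACI/OU=SDSC/CN=Jim Bob

    Entry added to /etc/grid-security/grid-mapfile on peanutcluster.foo.edu:
    "/C=US/O=NPACI/OU=SDSC/CN=Jim Bob" jimbob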

  11. Grid Reality: Submitting Jobs (Globus) 2. Write one RSL submission script for all machines

  Globus RSL script (submit.rsl):
    &(executable=$(HOME)/doit)
     (count=80)
     (maxtime=10)
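
  RSL can also set a working directory and capture output; a slightly fuller sketch using standard GRAM attributes (the paths are illustrative; maxtime is in minutes, matching the 10-minute limit in the earlier batch scripts):

    &(executable=$(HOME)/proj/doit)
     (directory=$(HOME)/proj)
     (stdout=$(HOME)/proj/doit.out)
     (stderr=$(HOME)/proj/doit.err)
     (count=80)
     (maxtime=10)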

  12. Grid Reality: Submitting Jobs (Globus) 3. Authenticate once on jimbob.foo.edu, then submit to both machines

    > grid-proxy-init
    > globus-job-submit peanutcluster.foo.edu submit.rsl
    https://peanutcluster.foo.edu:44159/30282/1061323349/
    > globus-job-submit bigiron.bar.edu submit.rsl
    https://bigiron.bar.edu:44159/33242/1068476352/
    > globus-job-status https://bigiron.bar.edu:44159/33242/1068476352/
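
  Each submission returns a job-contact URL that the other GT2 client tools accept as well; a sketch of fetching results and cleaning up after the job completes, assuming the standard globus-job-* commands are installed:

    > globus-job-get-output https://bigiron.bar.edu:44159/33242/1068476352/
    > globus-job-clean https://bigiron.bar.edu:44159/33242/1068476352/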

  13. Grid Reality: Submitting Jobs • SSH Advantages • Ubiquitous • Can set up keys without involving sysadmins • Richness of batch scheduler syntax • Globus Advantages • Single batch submission syntax • Single sign-on • Certificates work transparently on well-configured systems (like NPACI Grid and Teragrid) • Interface with portals

  14. Outline • Grid Promises and Reality • NPACI and Teragrid Resources • Costs of Distribution/Parallelism • Good Candidates for Grid Deployment

  15. NPACI Grid and NPACKage • NPACI Grid • BlueHorizon (SDSC, 1152 CPUs, LoadLeveler) • Longhorn (TACC, 224 CPUs, LoadLeveler) • Morpheus (U Mich., 134 CPUs, PBS) • Hypnos (U Mich., 256 CPUs, PBS) • http://npacigrid.npaci.edu • http://hotpage.npaci.edu • Software – NPACKage

  16. NPACKage • Grid middleware packaging effort • Easy installation • Included software: • NMI (Globus, GSI-SSH, Condor-G, NWS…) • NPACI products (SRB client, DataCutter, Ganglia, APST, LAPACK for Clusters) • http://npackage.npaci.edu

  17. Teragrid • Machines (all PBS) • SDSC (256 CPUs, 500 TB) • CalTech (64 Itanium 2 CPUs, 80 TB) • NCSA (512 CPUs, 230 TB) • Pittsburgh (3000 CPUs + 128 CPUs, 221 TB) • ANL (198 CPUs, 20 TB) • Very fast backbone! But 50 ms latency. • http://www.teragrid.org • Available Q1 '04 • Software – NMI

  18. Outline • Grid Promises and Reality • NPACI and Teragrid Resources • Costs of Distribution/Parallelism • Good Candidates for Grid Deployment

  19. Parallel Application Scalability • Speedup: ratio of serial to parallel runtime, S = ts/tp • Amdahl's law: upper limit on speedup, S <= 1/(fs + fp/N) • ts = serial runtime • tp = parallel runtime • fs = serial fraction of program • fp = parallel fraction of program (fs + fp = 1) • N = number of CPUs
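
  A quick worked example, using the fp = 0.95 case plotted on the next slide: at N = 80, S <= 1/(0.05 + 0.95/80) ≈ 16, and even with unlimited CPUs the speedup can never exceed 1/fs = 1/0.05 = 20.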

  20. Amdahl's Law w/ Extra Communication Costs, fp = 0.95 [chart: speedup vs. number of CPUs, with communication costs pulling the curves below the pure Amdahl limit]

  21. Parallelizing: Keep in mind… • Scale, not speed, may be the biggest advantage of parallelizing • Programming & debugging: tricky! • Keep inter-process communication within a single cluster • The same issues apply to Grid apps, with a heavier emphasis on communication • NPACI consultants can help!

  22. Costs of Distribution • Jim Gray, Microsoft Research (http://www.clustercomputing.org/content/tfcc-5-1-gray.html) • A dollar buys roughly: • 10 Tops (a 2 GHz CPU for $2000 over a 666-day lifetime) • 1 GB sent over a WAN (1 Mbps link @ $100/mo) • Conclusion: break even at 10,000 ops/byte (10^13 ops per dollar vs. 10^9 bytes per dollar) • But: time on NPACI Grid & Teragrid is "free" • Metrics: time-to-solution, throughput, problem size • Time costs: data transfer, queue wait, execution • N_max CPUs ≈ t_execution / t_transfer
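
  An illustrative use of that last rule of thumb (the numbers are made up): a task that computes for 1000 s on input that takes 100 s to transfer saturates at roughly N_max ≈ 1000/100 = 10 CPUs; beyond that, the run is limited by data transfer rather than computation.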

  23. Grid Software Challenges • Tracking task states • Managing data files and dependencies • Failure detection and mitigation • Accessing heterogeneous resources • Application performance modeling • Application-specific issues

  24. Deploy on the Grid if… • Your app has limited inter-process communication • Bag-of-tasks or parameter sweeps • Workflow of parallel tasks • You have no other choice • Expensive or distributed data sources (detector, electron microscope) • Requirement is for a service or portal • Insufficient allocations on any one cluster. Otherwise, keep everything on one cluster!

  25. Good Grid App: Bag of Tasks • "Embarrassingly parallel": lots of single-CPU jobs, 1 process per task • Performance limited by file transfer times • Systems to look at • APST (http://grail.sdsc.edu/projects/apst) • Condor (http://www.cs.wisc.edu/condor) • MOSIX (http://www.mosix.org)
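
  With Condor, for example, an entire bag of tasks fits in one submit description file; a minimal sketch (the executable and file names are illustrative):

    # doit.sub -- 100 independent single-CPU tasks
    universe   = vanilla
    executable = doit
    arguments  = $(Process)
    input      = in.$(Process)
    output     = out.$(Process)
    error      = err.$(Process)
    log        = doit.log
    queue 100

  Running "condor_submit doit.sub" queues 100 tasks, each reading its own numbered input file.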

  26. Good Grid App: Workflow • A series of separate codes, each internally tightly coupled (parallel), chained by data dependencies • New performance factor: queue time • No silver bullets, but expect rapid progress • System to look at: Condor-G / DAGMan
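
  DAGMan describes such a workflow as a directed acyclic graph over ordinary Condor submit files; a minimal sketch (job names and submit files are illustrative):

    # pipeline.dag -- B and C wait for A; D waits for both
    Job A prep.sub
    Job B sim1.sub
    Job C sim2.sub
    Job D analyze.sub
    PARENT A CHILD B C
    PARENT B C CHILD D

  Running "condor_submit_dag pipeline.dag" submits each stage only after its parents finish, which is where the per-stage queue time shows up.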

  27. Portal • Hides complexity of distributing applications from users. • User interfaces for data management • Sharing of experiments and data • Systems to look at • GridPort (http://gridport.npaci.edu) • MyProxy (http://grid.ncsa.uiuc.edu/myproxy/) • GPDK (http://doesciencegrid.org//projects/GPDK/)

  28. Grid Application Deployment: Conclusions • Recommendations • Justify parallelization and distribution • Plan communication and data transfers carefully • Measure and monitor performance, utilization, throughput • Keep expectations of available software realistic • References • Parallel Computing: http://www.npaci.edu/PCOMP • NPACI Grid: http://npacigrid.npaci.edu • Teragrid: http://www.teragrid.org • NPACI Resources & User Manuals: http://hotpage.npaci.edu • GGF: http://www.gridforum.org
