
From Grid to Global Computing: Deploying Parameter Sweep Applications






Presentation Transcript


  1. From Grid to Global Computing: Deploying Parameter Sweep Applications Henri Casanova Grid Research And Innovation Laboratory (GRAIL) http://grail.sdsc.edu/ San Diego Supercomputer Center (SDSC) Computer Science and Engineering Dept. (CSE) University of California, San Diego (UCSD)

  2. Parameter Sweep Applications • [workflow diagram: input data → tasks → raw output → post-processing → final output] • Many compute tasks • No or simple dependencies • Several output post-processing stages • Potentially large datasets
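
Concretely, a parameter sweep runs the same code over the cross product of parameter values, each run independent of the others, followed by one or more post-processing stages over the raw outputs. The minimal sketch below only illustrates how such a task set might be enumerated; the program name, parameters, and file layout are invented for illustration.

```python
from itertools import product

# Hypothetical sweep: every combination of these parameter values is one task.
params = {
    "temperature": [280, 300, 320],
    "seed": range(10),
}

tasks = [
    {"cmd": ["./simulate", f"--temperature={t}", f"--seed={s}"],
     "output": f"raw/T{t}_s{s}.out"}
    for t, s in product(params["temperature"], params["seed"])
]
print(len(tasks), "independent tasks; raw outputs are post-processed afterwards")
```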

  3. Relevance • Arise in virtually every field of science and engineering • Monte Carlo, Parameter Space Searches, Parameter Studies, etc. • Biology, Astrophysics, Physics, Bioinformatics, Economics, etc. • Primary candidate for Grid computing • Latency-tolerant, amenable to simple fault-tolerance • Need huge amounts of resources

  4. Outline of the Presentation • Parameter Sweep Applications (PSAs) • APST • The Virtual Instrument • BIO@Home

  5. Scheduling of PSAs? • [diagram: the application mapped onto Grid resources]

  6. Grid Scheduling Practice • Ad-hoc solutions: • specific to one application • hand-tuned to the environment (e.g. SF-Express demo) • Large body of work on Scheduling • What can we re-use on the Grid? • Heterogeneous resources • Dynamic performance characteristics • Resource downtimes • Complex network topologies • Performance prediction errors

  7. “DataGrid” Scheduling Goal: Co-locate/replicate data and computation • Dynamic Priority List-Scheduling (see the sketch below) • Built on heuristics described in [Ibarra77, Siegel99] • Added adaptivity • Simulation results • List-scheduling works; adaptivity should make it practical • Experimental results (demos at SC’00 and SC’01) • [HCW’00] H. Casanova, A. Legrand, et al.
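
The cited heuristics (the family that includes min-min and sufferage) share a simple pattern: estimate each pending task's completion time on each host, pick the most urgent task, and refresh the estimates as the run progresses so priorities stay current. The Python sketch below illustrates that pattern only; it is not APST's code, and the names `Host`, `Task`, and `estimate_completion` are made up.

```python
from dataclasses import dataclass

@dataclass
class Host:
    name: str
    speed: float             # relative compute speed
    ready_time: float = 0.0  # estimated time at which the host is next free

@dataclass
class Task:
    name: str
    predicted_runtime: float  # predicted runtime on a speed-1.0 host

def estimate_completion(task, host):
    """Estimated completion time of `task` on `host`, given current load
    and (possibly stale) performance predictions."""
    return host.ready_time + task.predicted_runtime / host.speed

def schedule_step(pending, hosts, use_sufferage=True):
    """Assign one task per call; re-invoking as tasks complete lets the
    priorities adapt to fresh resource information (the 'dynamic' part)."""
    if not pending:
        return None
    best = None
    for task in pending:
        ects = sorted(estimate_completion(task, h) for h in hosts)
        if use_sufferage and len(ects) > 1:
            priority = ects[1] - ects[0]   # sufferage: penalty of losing the best host
        else:
            priority = -ects[0]            # min-min: smallest best completion time first
        if best is None or priority > best[0]:
            host = min(hosts, key=lambda h: estimate_completion(task, h))
            best = (priority, task, host)
    _, task, host = best
    host.ready_time = estimate_completion(task, host)  # commit the assignment
    pending.remove(task)
    return task, host

# Tiny usage example on two hypothetical hosts.
hosts = [Host("ucsd", 1.0), Host("utk", 2.0)]
tasks = [Task(f"t{i}", 10.0 * (i + 1)) for i in range(4)]
while tasks:
    t, h = schedule_step(tasks, hosts)
    print(f"{t.name} -> {h.name} (estimated completion {h.ready_time:.1f})")
```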

  8. Lessons • Much scheduling work to re-use • List-scheduling with Dynamic Priorities seems effective • Simulation • Experimental • Let’s build software that uses it • Let’s target scientific communities

  9. Motivation for APST • Started as scheduling research • Evolved into a tool that provides • Transparency of Grid execution • Data movements • Remote job management • Multiple Grid middleware back-ends • Scheduling • Self-scheduling • List scheduling w/ dynamic priorities

  10. APST Design • The AppLeS Parameter Sweep Template: An Application Execution Environment • [architecture diagram: an APST client submits XML application and resource descriptions; the scheduler makes decisions from the metadata bookkeeper’s information; compute and transport actions are carried out on the Grid through Grid services]

  11. APST: Lessons • The Grid is difficult to use • APST provides a simple software layer that does one thing well • Minimal user interface (XML, command-line) • Used as a building block for domain-specific applications • E.g. multi-cluster bioinformatics (Singapore) • Ssh? • Default mechanism • Critical for gaining user buy-in • Natural way to lead users to the Grid

  12. APST Status • Version 1.1 released 2 weeks ago • Available for public download • Used for 10+ applications • Bioinformatics (BLAST, HMM, …) • Computational Neuroscience • Globus, NetSolve, Ssh, Condor • GASS, IBP, Scp, GridFTP, SRB, … • NWS, MDS, Ganglia, … http://grail.sdsc.edu/projects/apst

  13. APST Research Directions • APST is a research platform • Maintained by one staff member • Several graduate student contributors • Partitionable workloads • Bioinformatics (database splitting) • Factoring: decrease chunk size • Pipelining: increase chunk size • Combined? (see the sketch below) • Create APST-BLAST (Mario Lauria, OSU; Yang Yang, UCSD)
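
For partitionable workloads such as a split sequence database, the two chunk-sizing policies pull in opposite directions: factoring shrinks chunks as the run progresses so that hosts finish at roughly the same time, while pipelining favors chunks large enough to overlap data transfer with computation. The sketch below shows only a factoring-style chunk generator under assumed constants (`alpha`, `min_chunk`); it is not APST-BLAST's actual code, and a pipelining variant would simply enforce a larger minimum chunk.

```python
def factoring_chunks(total_units, workers, alpha=2.0, min_chunk=1):
    """Yield successively smaller chunk sizes (self-scheduling by factoring):
    each round hands each worker roughly remaining / (alpha * workers) units,
    so early chunks are large and final chunks are small, which evens out
    finish times on heterogeneous hosts."""
    remaining = total_units
    while remaining > 0:
        chunk = max(min_chunk, remaining // int(alpha * workers))
        for _ in range(workers):
            if remaining <= 0:
                break
            size = min(chunk, remaining)
            remaining -= size
            yield size

# Example: split a 1000-sequence database across 4 workers.
print(list(factoring_chunks(1000, 4)))
```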

  14. Outline of the Presentation • Parameter Sweep Applications (PSAs) • APST • Virtual Instrument • BIO@Home

  15. Computational Neuroscience • MCell: Monte Carlo Cell simulator • Developed at Salk and PSC • Gain knowledge about neuro-transmission mechanisms • Fundamental for drug design (psychiatry) • Large user base (yearly MCell workshop) • Parallel MC simulations at the molecular level

  16. Traditional MCell usage • “By hand” • No automatic project management • No transparent resource access • No automated data management • Consequences • No interactive simulations • No fault-tolerance, scheduling, … • MCell limited to resources in the lab

  17. MCell and APST • APST alleviates some of the limitations • Large-scale simulations • Fault-tolerance and scheduling • Data retrieval from distributed storage • XML application descriptions • No interactivity • MCell is exploratory • User interaction is fundamental for many users

  18. The Virtual Instrument • $2.5M funding from the NSF • Salk, PSC, UCSB, UTK, UCSD • A running MCell simulation should behave as a lab instrument • Computational steering for MCell • User interface • Grid software • Application software • Scheduling research (how does one schedule an application that’s being steered interactively?)

  19. VI Software • [architecture diagram: the VI User interacts with the VI Interface (OpenDX for visualization); control and data flow between the VI Interface, the VI Daemon, and the VI Database; the VI Daemon drives computation and storage on Grid compute and storage resources through Grid Services]

  20. Scheduling Goals • Reduce the “search” time • Let the user assign levels of importance to regions of the parameter space • Assign fractions of resources with respect to the importance levels • Assign priorities to tasks (see the sketch below) • Interesting questions • Job control is limited on Grid resources • Cannot assign exact fractions • Interesting trade-offs between control overhead and accuracy of priorities
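
One straightforward way to act on these goals, sketched below under simplifying assumptions (this is not the Virtual Instrument's actual scheduler), is to normalize the user's importance levels into target resource fractions and, since exact fractions cannot be enforced, always serve the parameter-space region that is furthest behind its target share.

```python
def target_fractions(importance):
    """Normalize user-assigned importance levels (region -> weight) into
    target fractions of the compute resources."""
    total = sum(importance.values())
    return {region: w / total for region, w in importance.items()}

def next_region(importance, work_done):
    """Pick the parameter-space region furthest behind its target share.
    `work_done` maps region -> work consumed so far (CPU time or task count)."""
    targets = target_fractions(importance)
    consumed = sum(work_done.values()) or 1.0
    return max(targets,
               key=lambda r: targets[r] - work_done.get(r, 0.0) / consumed)

# Example: the user marks region "B" as twice as interesting as "A".
importance = {"A": 1.0, "B": 2.0}
work_done = {"A": 30.0, "B": 30.0}          # B is behind its 2/3 target share
print(next_region(importance, work_done))   # -> B
```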

  21. Current Status • First software prototype released in Feb 2002 • Globus and Ssh • MySQL • OpenDX • priority-based scheduling • 20,000 lines of C++ • Upcoming papers • JPDC submission • Scheduling paper (SC submission)

  22. Outline of the Presentation • Parameter Sweep Applications (PSAs) • PSAs on the Grid with APST • MCell Virtual Instrument • Global Computing

  23. SETI@home • Over 500,000 active participants, most of whom run the screensaver on a home PC • Over 20 TeraFlop/sec cumulative • Versus 12.3 TeraFlop/sec for IBM’s ASCI White • Cost: $500,000 + $200,000 in donated hardware • Less than 1% of the $110 million required for ASCI White
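
As a quick check of that last figure: ($500,000 + $200,000) / $110,000,000 ≈ 0.64%, which is indeed under 1% of the ASCI White budget.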

  24. Global vs. Grid Computing • Nature of resources • Home desktops running Windows, completely autonomous • Machines powered on and off by the user • Behind firewalls, dynamic IP, transient network connections • Programming model • Server cannot “push” tasks to clients • Server has little means for remote job control • Server has incomplete information about resources and availability

  25. Goal • SETI@home limitations: • Embarrassingly parallel • Infinite amount of input data • Pure throughput • Can we do something more? • Short-lived applications? • Parallel applications? • Compute service? • BIO@Home • Smith-Waterman for short/long sequences • No real software yet (build on XtremWeb?)
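
BIO@Home has no software yet, but the kernel a work unit would compute is well defined: a Smith-Waterman local alignment between a query and a database sequence. Below is a minimal, unoptimized sketch of that kernel (score only, no traceback), just to fix ideas about the per-work-unit computation; the scoring parameters are arbitrary.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Score of the best local alignment between sequences a and b
    (Smith-Waterman dynamic programming, O(len(a) * len(b)) time and space)."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

# Example work unit: align a short query against one database sequence.
print(smith_waterman("ACACACTA", "AGCACACA"))
```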

  26. Scheduling? • Sophisticated scheduling algorithms need information and control • At the moment: simple mechanisms (see the sketch below) • Work unit duplication: the maximum number of times a work unit can be resent • Timeouts: the time that must elapse before a work unit is resent
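
Here is a minimal server-side sketch of those two mechanisms: work units are handed to whichever client asks next, re-sent only once a timeout has expired, and never sent more than a fixed number of times. The class and parameter names are illustrative; platforms such as XtremWeb or Entropia implement their own variants.

```python
import time
from dataclasses import dataclass, field

@dataclass
class WorkUnit:
    uid: int
    sent_times: list = field(default_factory=list)  # timestamps of each send
    done: bool = False

class Dispatcher:
    def __init__(self, units, max_copies=3, timeout=3600.0):
        self.units = {u.uid: u for u in units}
        self.max_copies = max_copies   # duplication knob: cap on (re)sends
        self.timeout = timeout         # seconds before a copy is presumed lost

    def next_unit(self, now=None):
        """Hand a work unit to a requesting client, or return None."""
        now = time.time() if now is None else now
        for u in self.units.values():
            if u.done or len(u.sent_times) >= self.max_copies:
                continue
            # Send if never sent, or if the most recent copy has timed out.
            if not u.sent_times or now - u.sent_times[-1] >= self.timeout:
                u.sent_times.append(now)
                return u
        return None

    def report_result(self, uid):
        self.units[uid].done = True

# Example: two work units, 1-hour timeout, at most 3 copies each.
d = Dispatcher([WorkUnit(0), WorkUnit(1)])
first = d.next_unit(now=0.0)     # unit 0 goes out
second = d.next_unit(now=10.0)   # unit 0 has not timed out yet, so unit 1 goes out
print(first.uid, second.uid)
```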

  27. Simulation • Built a simulation model • Using statistics/surveys/extrapolations • Next: logs from real systems (XtremWeb?, Entropia?) • Evaluated the impact of both mechanisms on performance and throughput
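
A toy version of such a model, to show how the two knobs can be evaluated together (this is not the group's simulator, and the failure and runtime distributions are invented): each copy of a work unit is either lost, with some probability, or completes after a random running time.

```python
import random

def unit_stats(timeout, max_copies, fail_prob=0.3, mean_run=600.0, rng=None):
    """One work unit: a copy is (re)sent every `timeout` seconds until one
    finishes or `max_copies` copies have gone out. Each copy is lost (host
    switched off) with probability `fail_prob`, otherwise it runs for an
    exponentially distributed time. Returns (turnaround or None, copies sent)."""
    rng = rng or random.Random()
    completions, sent = [], 0
    for k in range(max_copies):
        send_time = k * timeout
        if completions and min(completions) <= send_time:
            break                       # an earlier copy already finished: no resend
        sent += 1
        if rng.random() >= fail_prob:   # this copy survives
            completions.append(send_time + rng.expovariate(1.0 / mean_run))
    return (min(completions) if completions else None), sent

def evaluate(timeout, max_copies=3, n_units=5000, seed=1):
    rng = random.Random(seed)
    results = [unit_stats(timeout, max_copies, rng=rng) for _ in range(n_units)]
    done = [t for t, _ in results if t is not None]
    copies = sum(c for _, c in results)
    return sum(done) / len(done), copies / n_units

for timeout in (300, 900, 3600):
    turnaround, copies = evaluate(timeout)
    print(f"timeout={timeout:5d}s  mean turnaround={turnaround:7.0f}s  copies/unit={copies:.2f}")
```

Sweeping the timeout in this toy model exposes the trade-off summarized on the next slide: shorter timeouts (and more duplication) cut turn-around time but increase the number of copies sent per unit, i.e. wasted throughput.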

  28. Early Lessons • Trade-off between throughput and turn-around time • Duplication: • aggressively decreases turn-around time • wastes resources • there is an optimal value • Timeouts: • moderately lower turn-around times • preserve good throughput • an infinite timeout is of course not a good idea

  29. Future work • Two knobs • Question: A compute service? • Mix of applications (SETI, short-lived, …) • Singapore Bio-informatics institute • Notion of fairness? • How do we implement policy with many volatile resources? • Software • Re-use existing platforms: • XtremWeb • Entropia

  30. Conclusion • APST, Virtual Instrument, BIO@Home • Other GRAIL activities I didn’t talk about • Scientific Computing • Simulation • Adaptive Scheduling • Networking http://grail.sdsc.edu

  31. Experimental Results • Scheduling strategies: self-scheduling vs. XSufferage • [testbed diagram: sites at TITECH (Tokyo), UCSD, and UTK; services include Globus, Ssh, NetSolve, NWS, GASS, and IBP]
