
Using APST to Run Parameter Sweeps on the Grid



Presentation Transcript


  1. Using APST to Run Parameter Sweeps on the Grid
     Jim Hayes
     Grid Research and Innovation Laboratory (GRAIL)

  2. Parameter Sweep Applications
     • “Many” tasks
     • Tasks vary in switches and/or input files
     • Minimal inter-task communication
     • Typically produce files as output
     • Found in bioinformatics, neuroscience, computer graphics, discrete-event simulations, protein folding, database searches, etc.

  3. PSAs Map Well to Grid
     • Can effectively use huge amounts of resources
     • Flexibility in task assignment
     • Latency tolerant: little communication
     • Fault tolerant: restarting a task is sufficient
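The fault-tolerance point can be illustrated with a minimal local sketch in Python. It is not APST code: `run_sweep`, `flaky_task`, and the retry limit are hypothetical names chosen for illustration, and a thread pool stands in for Grid resources. Because the tasks are independent, a failed task is simply run again without touching the others.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sweep(task_fn, params, max_attempts=3, workers=4):
    """Run task_fn once per parameter; rerun any task that raises."""

    def run_one(param):
        for attempt in range(1, max_attempts + 1):
            try:
                return task_fn(param)
            except Exception:
                # Tasks share no state, so restarting is the whole recovery step.
                if attempt == max_attempts:
                    raise

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as params.
        return list(pool.map(run_one, params))


# Hypothetical task that fails the first time it sees each parameter.
_seen = set()


def flaky_task(x):
    if x not in _seen:
        _seen.add(x)
        raise RuntimeError(f"simulated failure for parameter {x}")
    return x * x


results = run_sweep(flaky_task, [1, 2, 3])
print(results)
```

Each task fails once and succeeds on its second attempt, so the sweep completes without any coordination between tasks.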

  4. Grid Execution Can Be Hard
     • Available infrastructure varies
     • Some infrastructures have steep learning curves
     • Grid is a mix of batch & interactive systems
     • Changing Grid environment complicates task/resource mapping
     • PSAs on Grid require much bookkeeping: task location/status, input file locations, output tracking, etc.

  5. APST Eases PSA Path to Grid
     • Provides a single interface to multiple infrastructures
     • Lowers the learning curve
     • Does intelligent mixing of batch & interactive
     • Incorporates dynamic resource info into smart (re)scheduling
     • Handles bookkeeping

  6. APST Example – EOL Project
     [Diagram. Labels recovered from the slide: Application Tasks; Application Data Files; Analysis Apps (e.g., psiblast, 123d); Postprocessing; Preprocessing; Genome Data; Sequences; Analysis Output; Data Base]

  7. APST Example – EOL Platform
     • horizon.sdsc.edu: AIX/LoadLeveler
     • saxicolous.sdsc.edu: Linux/PBS
     • morpheus.engin.umich.edu: Linux/PBS
     • {multivac/nbcr3/nbcr4/nbcr5/nbcr6}.sdsc.edu: Solaris

  8. APST System
     • Daemon (apstd) schedules tasks, stages input, spawns processes, returns output
     • Client (apst) controls/monitors daemon
     • Daemon is user agent (single-user)
     • Resource/task spec via XML

  9. APST Infrastructure Support
     • Compute: GRAM, SSH
     • Storage: FTP, GASS, SCP, SFTP, SRB
     • Batch: Condor, DQS, LL, LSF, PBS, SGE
     • Meta-data: Ganglia, MDS, NWS, self-generated

  10. APST XML
     • Used for all resource/task descriptions
     • Growing familiarity as a “common language”
     • Availability of editing tools
     • Design philosophy: easy things should be easy (and brief); hard things should be possible
     • Primary tags <storage>, <compute>, <tasks>
     • <files>, <gridinfo> used in special circumstances
     • Most projects produce XML via application-specific scripts/GUI

  11. APST XML - Storage
     <storage>
       <disk id='myDisk' datadir='${HOME}/myData'>
         <ftp|gass|local|sftp|srb server='blue.ufo.edu'/>
       </disk>
     </storage>

  12. APST XML - Compute
     <compute>
       <host id='myHost' disk='myDisk'>
         <globus|local|ssh server='blue.ufo.edu'/>
         <condor|dqs|loadleveler|lsf|pbs|sge|shell/>
       </host>
     </compute>
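Since most projects generate their APST XML from scripts, the <storage> and <compute> descriptions can be produced programmatically. The following is a minimal Python sketch, assuming only the tag and attribute names shown on the slides. The helper `build_resources` is hypothetical, and the sketch assumes every access element accepts a `server` attribute, which the slides show but do not confirm for each alternative (for example `local`). The APST DTD remains the authority on what is valid.

```python
import xml.etree.ElementTree as ET

# Alternatives listed on the Storage and Compute slides.
STORAGE_METHODS = {"ftp", "gass", "local", "sftp", "srb"}
ACCESS_METHODS = {"globus", "local", "ssh"}
BATCH_SYSTEMS = {"condor", "dqs", "loadleveler", "lsf", "pbs", "sge", "shell"}


def build_resources(disk_id, datadir, storage_method,
                    host_id, access_method, batch_system, server):
    """Return (<storage>, <compute>) elements for one disk and one host."""
    if storage_method not in STORAGE_METHODS:
        raise ValueError(f"unknown storage method: {storage_method}")
    if access_method not in ACCESS_METHODS:
        raise ValueError(f"unknown access method: {access_method}")
    if batch_system not in BATCH_SYSTEMS:
        raise ValueError(f"unknown batch system: {batch_system}")

    storage = ET.Element("storage")
    disk = ET.SubElement(storage, "disk", {"id": disk_id, "datadir": datadir})
    ET.SubElement(disk, storage_method, {"server": server})

    compute = ET.Element("compute")
    # The host refers back to the disk by its id.
    host = ET.SubElement(compute, "host", {"id": host_id, "disk": disk_id})
    ET.SubElement(host, access_method, {"server": server})
    ET.SubElement(host, batch_system)
    return storage, compute


storage, compute = build_resources(
    "myDisk", "${HOME}/myData", "sftp",
    "myHost", "ssh", "pbs", "blue.ufo.edu",
)
print(ET.tostring(storage, encoding="unicode"))
print(ET.tostring(compute, encoding="unicode"))
```

Checking the chosen element names against the listed alternatives catches typos before the description is handed to the daemon.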

  13. APST XML - Tasks
     <tasks>
       <task executable='myProgram'
             arguments='arg1 arg2 arg3'
             input='infile1 infile2'
             output='outfile1 outfile2' />
     </tasks>
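A parameter sweep turns into one <task> element per combination of switch and input file. The Python sketch below generates such a <tasks> block using the four attributes from the slide. The helper `build_tasks`, the switches, the file names, and the output naming scheme are all hypothetical, chosen only to make the example concrete.

```python
import itertools
import xml.etree.ElementTree as ET


def build_tasks(executable, switches, input_files):
    """Return a <tasks> element with one <task> per (switch, input file) pair."""
    tasks = ET.Element("tasks")
    for switch, infile in itertools.product(switches, input_files):
        ET.SubElement(tasks, "task", {
            "executable": executable,
            "arguments": f"{switch} {infile}",
            "input": infile,
            # Hypothetical naming scheme: <input>.<switch without dashes>.out
            "output": f"{infile}.{switch.lstrip('-')}.out",
        })
    return tasks


tasks = build_tasks("myProgram", ["-fast", "-slow"], ["seq1.fa", "seq2.fa"])
xml_text = ET.tostring(tasks, encoding="unicode")
print(xml_text)
```

Two switches and two input files yield four independent tasks. Larger sweeps only change the lists passed in.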

  14. APST Advanced Features
     • Site-specific executables/paths
     • Task priority
     • Direct tasks to a host/subset of hosts
     • Task-specific working directories
     • User estimate of task “cost” for scheduling

  15. Major Projects Using APST
     • Encyclopedia of Life (eol.sdsc.edu)
       • An ambitious project seeking to catalog the complete proteome of every living species in a flexible, powerful reference system. This includes calculating three-dimensional models and assigning biological function for all recognizable proteins in all currently known genomes.
     • MCell (www.mcell.cnl.salk.edu)
       • A general simulator for cellular microphysiology. MCell uses Monte Carlo diffusion and chemical reaction algorithms in 3D to simulate the complex biochemical interactions of molecules inside and outside of living cells.

  16. APST Status
     • APST v2.2.0, released 4/9/03, includes all of the features discussed
     • APST v2.3.0, in testing, to include client authentication, faster client response, better task dependency support
     • Software, tutorial, FAQ, man pages, XML DTD available on-line

  17. APST Spin-Offs
     • AppleSeeds: Porting and data structure utility library
       • String manipulation
       • Command-line parsing
       • Process/thread control
       • Hash tables/vectors/property lists
       • XML parsing/manipulation
     • ELAGI: Grid infrastructure access library
       • Remote process spawning and control
       • Individual and bulk file transfer
       • Interactive batch node access
       • Meta-data generation and retrieval

  18. For more information
     • http://grail.sdsc.edu/projects/{apst,appleseeds,elagi}
     • apst@sdsc.edu, apst-users@sdsc.edu
     • jhayes@sdsc.edu, casanova@cs.ucsd.edu
