Workflow Tools Used in the SCEC CyberShake Project

Scott Callaghan, Southern California Earthquake Center, University of Southern California. Gateway Workflow Survey, December 11, 2009.


Presentation Transcript


  1. Workflow Tools Used in the SCEC CyberShake Project • Scott Callaghan • Southern California Earthquake Center, University of Southern California • Gateway Workflow Survey, December 11, 2009

  2. CyberShake Science • What will peak ground motion be over the next 50 years? • Used in building codes, insurance, government, planning • Probabilistic Seismic Hazard Analysis (PSHA) • Communicated via hazard curves and maps [Figures: hazard curve for downtown LA, annotated "2% in 50 years" and "0.6 g"; map of the probability of exceeding 0.1 g in 50 yrs]
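The "2% in 50 years" annotation on the hazard curve can be restated as a return period. The sketch below assumes that exceedances follow a Poisson process, which is a common PSHA convention but is not stated on the slide; the function name is illustrative.

```python
import math


def return_period_years(prob_exceedance: float, window_years: float) -> float:
    """Return period implied by a probability of exceedance over a time window.

    Assumes a Poisson occurrence model (an assumption, not from the slide):
        P = 1 - exp(-rate * T)   =>   rate = -ln(1 - P) / T
    """
    if not 0.0 < prob_exceedance < 1.0:
        raise ValueError("prob_exceedance must be strictly between 0 and 1")
    annual_rate = -math.log(1.0 - prob_exceedance) / window_years
    return 1.0 / annual_rate


period = return_period_years(0.02, 50.0)
print(f"2% in 50 years is about one exceedance every {period:.0f} years")
```

Under that assumption, 2% in 50 years corresponds to a return period of roughly 2,475 years.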

  3. Computational Requirements (per science site) [Table with columns "SGT Creation" and "Post Processing"; values not captured in the transcript]

  4. CyberShake dependencies [Diagram: Mesh generation (x1) → SGT simulation (x2) → extract SGT (x7,000) → Seismogram synthesis (x415,000) → PSA (x415,000)]
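The dependency chain on this slide can be written down as a small directed acyclic graph. The sketch below is a stage-level simplification (the real workflow has dependencies between individual tasks); the snake_case stage names are illustrative, and the counts are the ones shown on the slide.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Task counts per stage, as shown on the slide.
STAGE_COUNTS = {
    "mesh_generation": 1,
    "sgt_simulation": 2,
    "extract_sgt": 7_000,
    "seismogram_synthesis": 415_000,
    "psa": 415_000,
}

# Each stage maps to the set of stages it depends on.
STAGE_DEPENDENCIES = {
    "mesh_generation": set(),
    "sgt_simulation": {"mesh_generation"},
    "extract_sgt": {"sgt_simulation"},
    "seismogram_synthesis": {"extract_sgt"},
    "psa": {"seismogram_synthesis"},
}

# A topological order lists every stage after the stages it depends on.
execution_order = list(TopologicalSorter(STAGE_DEPENDENCIES).static_order())
total_tasks = sum(STAGE_COUNTS.values())

for stage in execution_order:
    print(f"{stage:22s} x{STAGE_COUNTS[stage]:,}")
print(f"total tasks per site: {total_tasks:,}")
```

Almost all of the tasks sit in the last two stages, which is why the post-processing side of the workflow is dominated by a very large number of short jobs.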

  5. Software Requirements • High throughput • Large number of short-running jobs • Data management • 840,000 output files per science site • Stage-in and -out • Resource provisioning • Acquire grid resources for execution • Possibly multiple execution sites • Error identification and recovery • Use community account

  6. Workflow Tools • Pegasus/Condor/Globus stack • Create workflow description in Pegasus • Abstract workflow (DAX) with logical names • Plan workflow for specific execution site • Concrete workflow (DAG) with physical paths • Adds stage-in and stage-out of data • Wraps in kickstart • Can be mined with NetLogger toolkit • Bundles multiple tasks into single job • Easier to change execution sites
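The planning step described on this slide (logical names resolved to physical paths, staging jobs added, tasks bundled) can be illustrated with a toy planner. This is a sketch of the idea only, not the Pegasus API or its file formats; `Task`, `plan`, the catalog dictionary, and every path below are hypothetical, and the tasks are assumed to be independent of one another.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    inputs: list = field(default_factory=list)   # logical file names
    outputs: list = field(default_factory=list)  # logical file names


def plan(tasks, replica_catalog, site_scratch, archive_dir, bundle_size):
    """Turn an abstract task list into a concrete job list for one site."""
    jobs = []
    produced = {f for t in tasks for f in t.outputs}
    consumed = {f for t in tasks for f in t.inputs}

    # Stage in every input that no task in the workflow produces.
    for lfn in sorted(consumed - produced):
        jobs.append(("stage_in", replica_catalog[lfn], f"{site_scratch}/{lfn}"))

    # Bundle consecutive tasks so each submitted job does more work.
    for start in range(0, len(tasks), bundle_size):
        bundle = [t.name for t in tasks[start:start + bundle_size]]
        jobs.append(("compute", bundle))

    # Stage out every output that no task in the workflow consumes.
    for lfn in sorted(produced - consumed):
        jobs.append(("stage_out", f"{site_scratch}/{lfn}", f"{archive_dir}/{lfn}"))
    return jobs


# Hypothetical example: four independent synthesis tasks sharing one input.
tasks = [Task(f"synth_{i}", ["sgt.bin"], [f"seis_{i}.grm"]) for i in range(4)]
catalog = {"sgt.bin": "/archive/site/sgt.bin"}
jobs = plan(tasks, catalog, "/scratch/run1", "/archive/site", bundle_size=2)
for job in jobs:
    print(job)
```

Because the abstract description only mentions logical names, retargeting the same workflow to another execution site in this sketch means changing the catalog and scratch directory, not the task list.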

  7. Workflow Tools (2) • Submit workflow via Pegasus • Submits to Condor DAGMan • Tracks dependencies • Matches jobs to resources • Jobs enter queue on local host • Communicates via Globus to remote system • On job failure • Retry job • Write rescue DAG as checkpoint
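The failure handling on this slide (retry the job, then fall back to a rescue DAG as a checkpoint) can be sketched with a toy runner. This is not DAGMan's implementation or its rescue file format; it assumes a simple linear job order, and `run_workflow` and `fake_run` are hypothetical names.

```python
def run_workflow(order, run_job, max_retries=2, already_done=()):
    """Run jobs in dependency order, retrying each failed job.

    Returns (done, rescue). `rescue` is None when everything succeeded;
    otherwise it lists the jobs that still have to run, so a later call
    can resume from that checkpoint instead of starting over.
    """
    done = list(already_done)
    for index, job in enumerate(order):
        if job in done:
            continue  # completed in an earlier attempt, skip on resume
        for attempt in range(1 + max_retries):
            if run_job(job, attempt):
                done.append(job)
                break
        else:
            # Retries exhausted: record the remaining work and stop.
            return done, [j for j in order[index:] if j not in done]
    return done, None


fail_always = {"extract_sgt"}


def fake_run(job, attempt):
    """Stand-in for real job execution (hypothetical failure pattern)."""
    if job in fail_always:
        return False
    if job == "sgt_simulation" and attempt == 0:
        return False  # transient failure that succeeds on retry
    return True


order = ["mesh_generation", "sgt_simulation", "extract_sgt", "psa"]
done, rescue = run_workflow(order, fake_run)
print("first run:", done, "rescue:", rescue)

fail_always.clear()  # pretend the underlying problem was fixed
done, rescue_after_resume = run_workflow(order, fake_run, already_done=done)
print("resumed run:", done, "rescue:", rescue_after_resume)
```

The first run absorbs the transient failure through a retry and stops at the persistent one; the resumed run skips the finished jobs and only executes what the checkpoint lists.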

  8. Software Requirements Fulfilled • High throughput • Data management • Pegasus adds staging automatically, tracks input and output files for jobs • Resource provisioning • Condor supports remote submission to batch queue • Also glideins (temporary Condor pool) • Error identification and recovery • Automatic retries, rescue DAGs

  9. CyberShake Map Calculation • Calculated Southern California hazard map on Ranger (223 sites) • 1200 wallclock hours • 189 million tasks (43 tasks/sec) • 3.9 million Condor jobs • Averaged 4424 cores, 14544 peak (23% of Ranger) • 2.1 TB and 36,000 staged (zipped) output files
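The figures on this slide can be cross-checked with a little arithmetic. The inputs below are taken from the slide; the derived quantities are not stated there (apart from the task rate) and follow from simple division and multiplication.

```python
# Figures from the slide.
WALLCLOCK_HOURS = 1200
TASKS = 189e6
CONDOR_JOBS = 3.9e6
AVG_CORES = 4424
SITES = 223

# Derived quantities.
tasks_per_sec = TASKS / (WALLCLOCK_HOURS * 3600)
tasks_per_job = TASKS / CONDOR_JOBS        # effect of bundling tasks into jobs
tasks_per_site = TASKS / SITES
core_hours = AVG_CORES * WALLCLOCK_HOURS   # average cores times wallclock time

print(f"tasks per second:    {tasks_per_sec:.2f}")
print(f"tasks per Condor job: {tasks_per_job:.1f}")
print(f"tasks per site:      {tasks_per_site:,.0f}")
print(f"core-hours:          {core_hours:,}")
```

The task rate works out to about 43.75 per second, in line with the 43 tasks/sec quoted on the slide, and each Condor job carried roughly 48 tasks on average.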

  10. Experiences from Map Calculation • For CyberShake, configurability is very important • Adjusted many Condor scheduler parameters • Modified Pegasus bundling factors • Used sub-workflows to manage load • Experimenting with priorities • So is automation • Set up workflow hierarchies to submit multiple workflows • Cron job monitored queue to submit new science sites when needed • Corral automated glidein submission and monitoring

  11. Experiences (cont.) • Can be hard to understand impact of parameters • Performing parameter studies • With high job counts, takes some work to reduce local and remote load • Bundling, glideins, breaking up workflows • A higher-level monitoring tool is needed • Too many log files to tail • Designed a Run Manager to track workflow status

  12. Thanks!
