
Experiences Running Seismic Hazard Workflows


Presentation Transcript


  1. Experiences Running Seismic Hazard Workflows Scott Callaghan Southern California Earthquake Center University of Southern California SC13 Workflow BoF

  2. Overview of SCEC Computational Problem • Domain: Probabilistic Seismic Hazard Analysis (PSHA) • PSHA computation defined as a workflow with two parts • “Strain Green Tensor Workflow” • ~10 jobs, the largest are 4,000 core-hours • Output: two 20 GB files (40 GB total) • “Post-processing Workflow” • High throughput • ~410,000 serial jobs, runtimes under 1 minute • Output: 14,000 files totaling 12 GB • Uses Pegasus-WMS, HTCondor, Globus • Calculates hazard for one location
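The two-part structure on slide 2 maps naturally onto a Pegasus abstract workflow. Below is a minimal sketch, assuming the DAX3 Python API that Pegasus offered around 2013; the executable and file names (sgt_generator, seismogram_task, site_sgt.grm) are illustrative placeholders, not the actual CyberShake codes.

```python
# Illustrative sketch of a two-part CyberShake-style workflow using the
# Pegasus DAX3 Python API (circa Pegasus 4.x). All names are placeholders.
from Pegasus.DAX3 import ADAG, Job, File, Link

dax = ADAG("cybershake_site")

# Part 1: Strain Green Tensor workflow -- few jobs, large and parallel
sgt = Job(name="sgt_generator")          # hypothetical executable name
sgt_file = File("site_sgt.grm")          # stands in for one ~20 GB SGT file
sgt.uses(sgt_file, link=Link.OUTPUT)
dax.addJob(sgt)

# Part 2: post-processing -- ~410,000 short serial tasks, each needing the
# SGTs first (in the real study these ran under Pegasus-MPI-Cluster)
for i in range(410000):
    pp = Job(name="seismogram_task", id="pp%06d" % i)
    pp.uses(sgt_file, link=Link.INPUT)
    dax.addJob(pp)
    dax.depends(parent=sgt, child=pp)

dax.writeXML(open("cybershake_site.dax", "w"))
```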

  3. Recent PSHA Research Calculation Using Workflows – CyberShake Study 13.4 • PSHA calculation exploring the impact of alternative velocity models and alternative codes was defined as 1144 workflows • Parallel Strain Green Tensor (SGT) workflow on Blue Waters • No remote job submission support in Spring 2013 • Created a basic workflow system using qsub dependencies (sketched below) • Post-processing workflow on Stampede • Pegasus-MPI-Cluster to manage serial jobs • A single job is submitted to Stampede • Master-worker paradigm; work is distributed to workers • Relatively little output, so writes were aggregated in the master
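The "basic workflow system using qsub dependencies" refers to PBS/Torque's job-dependency flag, which Blue Waters did support locally. A minimal sketch of chaining stages that way; the stage script names are hypothetical:

```python
# Chain PBS jobs with -W depend=afterok so each stage starts only after
# the previous one completes successfully. Script names are hypothetical.
import subprocess

def qsub(script, after=None):
    """Submit a PBS script, optionally gated on a prior job's success."""
    cmd = ["qsub"]
    if after:
        cmd += ["-W", "depend=afterok:" + after]
    cmd.append(script)
    return subprocess.check_output(cmd).decode().strip()  # qsub prints job ID

mesh_id = qsub("gen_mesh.pbs")
sgt_id = qsub("run_sgt.pbs", after=mesh_id)   # waits for the mesh job
qsub("extract_sgt.pbs", after=sgt_id)         # waits for the SGT job
```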

  4. CyberShake Study 13.4 Workflow Performance Metrics • April 17, 2013 – June 17, 2013 (62 days) • Running jobs 60% of the time • Blue Waters Usage – Parallel SGT calculations: • Average of 19,300 cores, 8 jobs • Average queue time: 401 sec (34% of average job runtime) • Stampede Usage – Many-task post-processing: • Average of 1,860 cores, 4 jobs • 470 million tasks executed (177 tasks/sec) • 21,912 jobs total • Average queue time: 422 sec (127% of average job runtime)

  5. Constructing Workflow Capabilities using HPC System Services • Our workflows rely on multiple services • Remote gatekeeper • Remote GridFTP server • Local GridFTP server • Currently, issues are detected only via workflow failure • It would be helpful to have automated checks that everything is operational (sketched below) • Additional factors to check • Valid X.509 proxy • Available disk space
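A hedged sketch of what such pre-flight checks might look like, assuming the standard Globus ports (gatekeeper on 2119, GridFTP on 2811) and the stock grid-proxy-info tool; hostnames, paths, and thresholds are illustrative:

```python
# Pre-flight checks so service problems surface before a workflow is run.
# Hostnames, paths, and thresholds below are illustrative assumptions.
import shutil
import socket
import subprocess

def port_open(host, port, timeout=10):
    """Rough liveness probe: can we open a TCP connection to the service?"""
    try:
        socket.create_connection((host, port), timeout).close()
        return True
    except OSError:
        return False

def proxy_seconds_left():
    """Seconds remaining on the X.509 proxy, via grid-proxy-info."""
    return int(subprocess.check_output(["grid-proxy-info", "-timeleft"]))

checks = {
    "remote gatekeeper": port_open("gatekeeper.example.edu", 2119),
    "remote gridftp":    port_open("gridftp.example.edu", 2811),
    "local gridftp":     port_open("localhost", 2811),
    "proxy valid":       proxy_seconds_left() > 12 * 3600,  # >12 hours left
    "disk space":        shutil.disk_usage("/scratch").free > 100 * 2**30,
}
for name, ok in sorted(checks.items()):
    print("%-18s %s" % (name, "OK" if ok else "FAILED"))
```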

  6. Why Define PSHA Calculations as Multiple Workflows? • CyberShake Study 13.4 used 1144 workflows • Too many jobs to merge into 1 workflow • Currently, we create 1144 workflows and monitor submission via a cron job (sketched below)
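A sketch of what that cron job might do: keep a bounded number of workflows in flight and submit more as capacity frees up. The sites/ directory layout and marker files are assumptions; pegasus-run itself is the real command for starting an already-planned workflow from its submit directory.

```python
# Cron-driven submission monitor: keep at most MAX_RUNNING of the 1144
# per-site workflows in flight. Directory layout and marker files are
# assumptions; each workflow's final job is assumed to create ".done".
import glob
import os
import subprocess

MAX_RUNNING = 20   # illustrative cap on concurrent workflows

sites = sorted(glob.glob("sites/*"))
running = [d for d in sites
           if os.path.exists(os.path.join(d, ".submitted"))
           and not os.path.exists(os.path.join(d, ".done"))]
pending = [d for d in sites
           if not os.path.exists(os.path.join(d, ".submitted"))]

for d in pending[:max(0, MAX_RUNNING - len(running))]:
    subprocess.check_call(["pegasus-run", os.path.join(d, "submit")])
    open(os.path.join(d, ".submitted"), "w").close()   # mark as submitted
```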

  7. Improving Workflow Capabilities • Workflow tools could provide higher-level organization • Specify set of workflows as a group • Workflows are submitted and monitored • Errors detected and group paused • Easy to obtain status, progress • Statistics are aggregated over the group
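To make the wish on slide 7 concrete, here is a purely hypothetical sketch of the group-level interface being asked for; no existing workflow tool is implied to provide this API.

```python
# Hypothetical "workflow group" interface: submit, pause-on-error, and
# aggregate status across many workflows instead of managing each one.
class WorkflowGroup:
    def __init__(self, workflows):
        self.workflows = list(workflows)  # e.g. the 1144 per-site workflows
        self.paused = False

    def on_error(self, workflow):
        """Pause the whole group when any member workflow fails."""
        self.paused = True

    def progress(self):
        """Aggregate statistics over the group."""
        done = sum(1 for w in self.workflows if w.get("done"))
        return "%d/%d workflows complete" % (done, len(self.workflows))

group = WorkflowGroup([{"site": "USC", "done": True},
                       {"site": "PAS", "done": False}])
print(group.progress())   # -> 1/2 workflows complete
```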

  8. Summary of SCEC Workflow Needs • Workflow tools for large calculations on leadership-class systems that do not support remote job submission • Need capabilities to define job dependencies, manage failure and retry, and track files using a DBMS • Lightweight workflow tools that can be included in scientific software releases • For unsophisticated users, the complexity of workflow tools should not outweigh their advantages • Methods to define workflows with interactive (people-in-the-loop) stages • Needed to automate computational pathways that have manual stages • Methods to gather both scientific and computational metadata into databases • Needed to ensure reproducibility and support user interfaces
