Optimizing for time and space in distributed scientific workflows

Optimizing for Time and Space in Distributed Scientific Workflows

Ewa Deelman

University of Southern California

Information Sciences Institute

Ewa Deelman, [email protected] www.isi.edu/~deelman pegasus.isi.edu


Generating Mosaics of the Sky (Bruce Berriman, Caltech)


*The full moon is 0.5 deg. sq. when viewed from Earth; the full sky is ~400,000 deg. sq.


Issue

  • How to manage such applications on a variety of distributed resources?

    Approach

  • Structure the application as a workflow

    • Allow the scientist to describe the application at a high-level

  • Tie the execution resources together

    • Using Condor and/or Globus

  • Provide an automated mapping and execution tool to map the high-level description onto the available resources



Specification: Place Y = F(x) at L

  • Find where x is--- {S1,S2, …}

  • Find where F can be computed--- {C1,C2, …}

  • Choose c and s subject to constraints (performance, space availability,….)

  • Move x from s to c

    • Move F to c

  • Compute F(x) at c

  • Move Y from c to L

  • Register Y in data registry

  • Record provenance of Y, performance of F(x) at c

Error! x was not at s!

Error! F(x) failed!

Error! c crashed!

Error! there is not enough space at L!
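The placement steps above, with their failure modes, can be sketched in code. This is a minimal illustration under assumed data structures (a replica map, a list of compute sites, a toy registry), not the Pegasus API:

```python
# A minimal sketch of the "place Y = F(x) at L" steps above. The names and
# data structures are hypothetical, not Pegasus internals.

def place(f, x, L, replicas, compute_sites):
    """Compute Y = f(x) at a chosen site c and deliver Y to location L."""
    # Find where x is: {S1, S2, ...}
    sources = [site for site, files in replicas.items() if x in files]
    if not sources:
        raise RuntimeError("x was not at s!")  # one of the failure cases above
    # Find where F can be computed and choose (s, c) subject to constraints.
    # A real planner weighs performance, space availability, etc.; here we
    # simply take the first feasible pair.
    s, c = sources[0], compute_sites[0]
    # Move x (and F) from s to c -- modeled as copying the replica entry.
    replicas.setdefault(c, set()).add(x)
    y = f(x)                                   # compute F(x) at c
    # Move Y from c to L, register Y, and record its provenance.
    replicas.setdefault(L, set()).add("Y")
    registry = {"Y": {"location": L, "produced_at": c, "input": x}}
    return y, registry

# Usage: x lives at site S1; compute at C1; deliver the result to L.
replicas = {"S1": {"x"}}
y, registry = place(lambda v: v + "_out", "x", "L", replicas, ["C1", "C2"])
```

Each step in the sketch corresponds to a failure mode on the slide: a missing replica, a failed computation, a crashed site, or insufficient space at the destination.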



Pegasus Workflow Management System

  • Leverages abstraction for workflow description to obtain ease of use, scalability, and portability

  • Provides a compiler to map from high-level descriptions to executable workflows

    • Correct mapping

    • Performance enhanced mapping

  • Provides a runtime engine to carry out the instructions

    • Scalable manner

    • Reliable manner

  • In collaboration with Miron Livny, UW Madison, funded under NSF-OCI SDCI



Pegasus Workflow Management System

  • Pegasus mapper: a decision system that develops strategies for reliable and efficient execution in a variety of environments

  • DAGMan: reliable and scalable execution of dependent tasks

  • Condor Schedd: reliable, scalable execution of independent tasks (locally, across the network), priorities, scheduling

  • Input: abstract workflow; output: results

  • Cyberinfrastructure: local machine, cluster, Condor pool, OSG, TeraGrid



Basic Workflow Mapping

  • Select where to run the computations

    • Apply a scheduling algorithm (HEFT, min-min, round-robin, random)

    • The quality of the scheduling depends on the quality of the information

  • Transform task nodes into nodes with executable descriptions

    • Execution location, environment variables initialized, appropriate command-line parameters set

  • Select which data to access

  • Add stage-in nodes to move data to the computations

  • Add stage-out nodes to transfer data from remote sites to storage

  • Add data-transfer nodes between computation nodes that execute on different resources
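One of the scheduling algorithms named above, min-min, can be sketched for a set of independent tasks. The runtime estimates and resource names here are hypothetical; as the slide notes, schedule quality depends on the quality of those estimates:

```python
# A sketch of min-min scheduling: repeatedly pick the task whose best
# (earliest) completion time over all resources is smallest, and assign it
# there. Illustrative only, not the Pegasus scheduler.

def min_min(runtimes, free_at):
    """runtimes[task][resource] = estimated execution time;
    free_at[resource] = time the resource becomes available."""
    ready = dict(free_at)          # resource -> time it is next free
    schedule = {}                  # task -> (resource, finish time)
    unscheduled = set(runtimes)
    while unscheduled:
        # For every task, find its earliest completion time over resources.
        best = {t: min((ready[r] + runtimes[t][r], r) for r in ready)
                for t in unscheduled}
        # min-min: take the task whose best completion time is smallest.
        task = min(unscheduled, key=lambda t: best[t][0])
        finish, res = best[task]
        schedule[task] = (res, finish)
        ready[res] = finish        # the chosen resource is busy until then
        unscheduled.discard(task)
    return schedule

# Three tasks, two resources, both free at time 0 (all numbers assumed).
est = {"t1": {"r1": 2, "r2": 4},
       "t2": {"r1": 3, "r2": 1},
       "t3": {"r1": 5, "r2": 5}}
sched = min_min(est, {"r1": 0, "r2": 0})
```

Swapping the task-selection rule (largest best time first) would give max-min; the surrounding bookkeeping stays the same.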



Basic Workflow Mapping (continued)

  • Add nodes that register the newly created data products

  • Add data-cleanup nodes to remove data from remote sites when no longer needed

    • Reduces the workflow data footprint

  • Provide provenance-capture steps

    • Information about the source of the data, executables invoked, environment variables, parameters, machines used, performance



Pegasus Workflow Mapping

Original workflow: 15 compute nodes, devoid of resource assignment.

Resulting workflow mapped onto 3 Grid sites:

  • 11 compute nodes (4 removed based on available intermediate data)

  • 12 data stage-in nodes

  • 8 inter-site data transfers

  • 14 data stage-out nodes to long-term storage

  • 14 data-registration nodes (data cataloging)

  • 60 jobs to execute



Time Optimizations during Mapping

  • Node clustering for fine-grained computations

    • Can obtain significant performance benefits for some applications (~80% in Montage, ~50% in SCEC)

  • Data reuse when intermediate data products are already available

    • Performance and reliability advantages: workflow-level checkpointing

  • Workflow partitioning to adapt to changes in the environment

    • Map and execute small portions of the workflow at a time
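The node-clustering optimization above can be sketched as simple level-based (horizontal) bundling: tasks at the same workflow level are grouped into larger jobs to cut per-task scheduling overhead. The bundling factor and level assignment are illustrative, not the Pegasus clustering implementation:

```python
# A sketch of horizontal node clustering: bundle fine-grained tasks at the
# same workflow level into fewer, larger jobs. Illustrative only.

def cluster_by_level(levels, bundle):
    """levels: {level: [task, ...]}; bundle: max tasks per clustered job."""
    jobs = []
    for level in sorted(levels):
        tasks = levels[level]
        # Slice each level's task list into jobs of at most `bundle` tasks.
        for i in range(0, len(tasks), bundle):
            jobs.append(tasks[i:i + bundle])   # one job runs these tasks
    return jobs

# Eight fine-grained tasks at level 1 become two jobs of four tasks each
# (names are made up for the example).
jobs = cluster_by_level({0: ["prep"], 1: [f"proj{i}" for i in range(8)]}, 4)
```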



LIGO (Laser Interferometer Gravitational-Wave Observatory)

    • Aims to detect gravitational waves predicted by Einstein’s theory of relativity.

    • Can be used to detect

      • binary pulsars

      • mergers of black holes

      • “starquakes” in neutron stars

    • Two installations: in Louisiana (Livingston) and Washington State

      • Other projects: Virgo (Italy), GEO (Germany), Tama (Japan)

    • Instruments are designed to measure the effect of gravitational waves on test masses suspended in vacuum.

    • Data collected during experiments is a collection of time series (multi-channel)





Example of LIGO's Computations

  • Binary inspiral analysis

  • Size of analysis for meaningful results:

    • At least 221 GB of gravitational-wave data

    • Approximately 70,000 computational tasks

  • Desired analysis:

    • Data from November 2005 to November 2006

      • 10 TB of input data

    • Approximately 185,000 computations

      • 1 TB of output data



LIGO's Computational Resources

    • LIGO Data Grid

      • Condor clusters managed by the collaboration

      • ~ 6,000 CPUs

    • Open Science Grid

      • A US cyberinfrastructure shared by many applications

      • ~ 20 Virtual Organizations

      • ~ 258 GB of shared scratch disk space on OSG sites



Problem

    • How to “fit” the computations onto the OSG

      • Take into account intermediate data products

      • Minimize the data footprint of the workflow

      • Schedule the workflow tasks in a disk-space aware fashion



Workflow Footprint

  • To reduce the workflow footprint, we need to determine when data are no longer needed:

    • Because the data were consumed by the next component and no other component needs them

    • Because the data were staged out to permanent storage

    • Because the data are no longer needed on a resource and have been staged out to the resource that needs them
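The cases above amount to tracking, per file, the set of outstanding consumers (including any pending stage-out): once that set is empty, the file can be removed. The following sketch uses purely illustrative data structures, not Pegasus internals:

```python
# A sketch of deciding when a file is no longer needed: each file keeps a
# set of "waiters" (tasks that still have to consume it, plus a pending
# stage-out to permanent storage). When the set empties, the file is
# cleanable. Illustrative only.

from collections import defaultdict

def build_waiters(consumers, staged_out):
    """consumers: file -> tasks that read it;
    staged_out: files destined for permanent storage (the stage-out
    transfer counts as one more waiter)."""
    waiters = defaultdict(set)
    for f, tasks in consumers.items():
        waiters[f] |= set(tasks)
    for f in staged_out:
        waiters[f].add(("stage-out", f))
    return waiters

def complete(waiters, f, who):
    """Mark one consumer (or stage-out) of f as done; return True when f
    has no remaining waiters and may be deleted."""
    waiters[f].discard(who)
    return not waiters[f]

# File d1 is read by t2 and t3 and also staged out (all names made up).
w = build_waiters({"d1": ["t2", "t3"]}, staged_out=["d1"])
complete(w, "d1", "t2")                        # t2 done: d1 still needed
complete(w, "d1", "t3")                        # t3 done: d1 still needed
done = complete(w, "d1", ("stage-out", "d1"))  # stage-out done: cleanable
```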



Cleanup Disk Space as the Workflow Progresses

  • For each node, add dependencies to clean up all the files used and produced by the node

  • If a file is being staged in from r1 to r2, add a dependency between the stage-in node and the cleanup node

  • If a file is being staged out, add a dependency between the stage-out node and the cleanup node
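The cleanup-node insertion described above can be sketched over a DAG represented as a plain edge list. The structures ("uses", "transfers", the "cleanup_*" node names) are hypothetical, not the Pegasus mapper's internals:

```python
# A sketch of adding cleanup nodes to a workflow DAG: for each file, create
# one cleanup node and add an edge from every node that uses or produces
# the file, and from every stage-in/stage-out transfer that touches it, so
# cleanup only runs once the file is no longer needed. Illustrative only.

def add_cleanup_nodes(uses, transfers):
    """uses: file -> workflow nodes that produce or consume it;
    transfers: file -> stage-in/stage-out nodes that move it."""
    edges = []
    for f, nodes in uses.items():
        cleanup = f"cleanup_{f}"
        for n in nodes:                    # cleanup depends on all users
            edges.append((n, cleanup))
        for t in transfers.get(f, []):     # ...and on any transfer of f
            edges.append((t, cleanup))
    return edges

# File d1 is produced by t1, consumed by t2, and staged out (made-up names).
edges = add_cleanup_nodes(
    uses={"d1": ["t1", "t2"]},
    transfers={"d1": ["stage_out_d1"]},
)
```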



Experiments on the Grid with LIGO and Montage



Cleanup on the Grid: Montage Application

~1,200 nodes, run on the Open Science Grid: 1.25 GB versus 4.5 GB of disk space used.



LIGO Inspiral Analysis Workflow

Small test workflow running on the OSG: 166 tasks, 600 GB maximum total storage (including intermediate data products).



Opportunities for Data Cleanup in the LIGO Workflow

Assumes level-based scheduling: all nodes at a level must complete before the next level starts.



Montage Workflow



LIGO Workflows

26% improvement in disk-space usage; 50% slower runtime.





LIGO Workflows

56% improvement in space usage; 3 times slower runtime.



Challenges in Implementing Data Space-Aware Scheduling

[Figure: example DAG of tasks 0–6 exchanging data files a–h]

    • Difficult to get accurate performance estimates for tasks

    • Difficult to get good estimates of the sizes of the output data

      • Errors compound in the workflow

    • Difficult to get accurate estimates of data storage space

      • Space is shared among many users

      • Allocation estimates are hard for users to obtain

      • Even if space is available at scheduling time, it may no longer be available when the data arrive

    • Need space allocation support



Conclusions

    • Data are an important part of today’s applications and need to be managed

    • Optimizing workflow disk-space usage

      • The workflow data-footprint concept applies within a single resource

      • Data-aware scheduling applies across resources

    • Designed an algorithm that can clean up data as a workflow progresses

      • The effectiveness of the algorithm depends on the structure of the workflow and its data characteristics

    • Workflow restructuring may be needed to decrease footprint



Acknowledgments

    • Henan Zhao, Rizos Sakellariou

      • University of Manchester, UK

    • Kent Blackburn, Duncan Brown, Stephen Fairhurst, David Meyers

      • LIGO, Caltech, USA

    • G. Bruce Berriman, John Good, Daniel S. Katz

      • Montage, Caltech and LSU, USA

    • Miron Livny and Kent Wenger

      • DAGMan, UW Madison

    • Gurmeet Singh, Karan Vahi, Arun Ramakrishnan, Gaurang Mehta

      • USC Information Sciences Institute, Pegasus



Relevant Links

    Condor: www.cs.wisc.edu/condor

    Pegasus: pegasus.isi.edu

    LIGO: www.ligo.caltech.edu/

    Montage: montage.ipac.caltech.edu/

    Open Science Grid: www.opensciencegrid.org

Workflows for e-Science, I.J. Taylor, E. Deelman, D.B. Gannon, M. Shields (Eds.), Springer, Dec. 2006

NSF Workshop on Challenges of Scientific Workflows: www.isi.edu/nsf-workflows06, E. Deelman and Y. Gil (chairs)

OGF Workflow Research Group: www.isi.edu/~deelman/wfm-rg


