MS Thesis Defense Dynamic Fault Tolerant Grid Workflow in the Water Threat Management Project


Young Suk Moon

Chair: Dr. Hans-Peter Bischof

Reader: Dr. Gregor von Laszewski

Observer: Dr. Minseok Kwon

Outline
  • Introduction to Water Threat Management Project
  • Motivation
  • Research Objectives
  • Fault-Tolerant Queue
  • Evaluation
  • Conclusion
Water Threat Management
  • Motivation
    • Urban Water Distribution Systems (WDSs) can be an easy target of terror attacks - e.g. contaminating the water.
  • Methods
    • Detect contamination using sensors located across the WDSs.
    • Run algorithms (developed by NCSA) to choose sensor locations that minimize the time needed to find contaminant sources (sensors are expensive).
Water Threat Management
  • Requirements
    • Time sensitive
    • Massive calculation
    • Dynamic adaptation to a Grid environment
    • Fault tolerance
  • Our goals
    • The current system is not fault-tolerant.
    • Develop a fault-tolerant framework and improve performance in a fault-prone environment.
Motivation – (1) Resource Outages
  • TeraGrid resource outages during 2009.

TeraGrid User & System News (http://news.teragrid.org/)

Motivation – (1) Resource Outages
  • Outage Rate (total outage time / year) in 2009

TeraGrid User & System News (http://news.teragrid.org/)
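
The outage rate defined on this slide is the total outage time divided by the length of the year. A minimal sketch of that arithmetic follows. The outage intervals are made up for illustration and are not the 2009 TeraGrid records, and the function assumes the intervals do not overlap each other.

```python
from datetime import datetime, timedelta

def outage_rate(outages, year):
    """Fraction of the given year during which a resource was down.

    `outages` is a list of (start, end) datetime pairs that are assumed
    not to overlap. The values used below are hypothetical.
    """
    year_start = datetime(year, 1, 1)
    year_end = datetime(year + 1, 1, 1)
    total = timedelta(0)
    for start, end in outages:
        # Clip each outage to the year so spill-over is not counted.
        start = max(start, year_start)
        end = min(end, year_end)
        if end > start:
            total += end - start
    return total / (year_end - year_start)

example_outages = [
    (datetime(2009, 3, 1, 8), datetime(2009, 3, 1, 20)),   # 12 hours
    (datetime(2009, 12, 31, 12), datetime(2010, 1, 2)),    # 12 of these hours fall in 2009
]
rate = outage_rate(example_outages, 2009)
```

Multiplying `rate` by 100 gives the outage rate as a percentage of the year.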

Motivation – (1) Resource Outages
  • WTM deployment problem with outages

TeraGrid User & System News (http://news.teragrid.org/)

Research Objectives
  • Develop a fault-tolerant framework dealing with resource outages
    • Strategy: generation distribution on multiple sites
  • Reduce queue wait time
    • Strategy: dynamic job dependency
Water Threat Management Application
  • Sequential & parallel processing
Generation Distribution
  • Divide generations into multiple parts as multiple jobs.
Generation Distribution
  • File communication
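
The two Generation Distribution slides can be illustrated with a small sketch: the generations are split into contiguous parts, each part runs as its own job, and the jobs hand their state to one another through a file. Everything here is an assumption made for illustration (the JSON state file, the function names, and the placeholder "population" update); the slides do not show how the WTM application stores or exchanges its data.

```python
import json
import os
import tempfile

def split_generations(total, parts):
    """Split `total` generations into `parts` contiguous (start, end) ranges."""
    base, extra = divmod(total, parts)
    ranges, start = [], 0
    for i in range(parts):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def run_part(state_path, start, end):
    """Run generations [start, end), reading and writing state through a file."""
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    else:
        state = {"generation": 0, "population": [0.0] * 4}  # placeholder data
    if state["generation"] != start:
        raise ValueError("parts must run in order")
    for _ in range(start, end):
        # Stand-in for one generation of the real computation.
        state["population"] = [x + 1.0 for x in state["population"]]
        state["generation"] += 1
    with open(state_path, "w") as f:
        json.dump(state, f)
    return state

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "state.json")
    for start, end in split_generations(10, 3):
        final_state = run_part(path, start, end)
```

Because each job only needs the state file left by its predecessor, consecutive parts could in principle run on different sites as long as the file is transferred between them.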
Dynamic Job Dependency
  • Problems of generation distribution on multiple sites
    • Additional queue wait times
      • Each job is dependent on another.
      • Cannot submit a job before the prior job finishes.
  • Solution: determine job dependency at run time.
    • Submit jobs at the same time.
    • Whichever job starts first computes the first set of generations.
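
One way to picture run-time job dependency is to let every job, at the moment it starts, atomically claim the next unclaimed set of generations. The sketch below does this with marker files in a directory that all jobs are assumed to be able to see. That shared directory, the file names, and the job names are hypothetical; the slides do not say which coordination mechanism the thesis uses across sites.

```python
import os
import tempfile

def claim_next_part(claim_dir, num_parts, job_name):
    """Atomically claim the lowest-numbered unclaimed part, or return None."""
    for part in range(num_parts):
        marker = os.path.join(claim_dir, f"part-{part}.claimed")
        try:
            # O_EXCL makes creation fail if another job already claimed this part.
            fd = os.open(marker, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue
        with os.fdopen(fd, "w") as f:
            f.write(job_name)
        return part
    return None

with tempfile.TemporaryDirectory() as claim_dir:
    # Suppose the job queued on site B happens to start before the one on site A.
    start_order = ["job-on-site-B", "job-on-site-A"]
    assignment = {job: claim_next_part(claim_dir, 2, job) for job in start_order}
    leftover = claim_next_part(claim_dir, 2, "late-job")
```

With this scheme no job is tied to a fixed part at submission time, so all jobs can sit in their queues simultaneously and the order in which they start decides which generations each one computes.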
Fault-tolerant Queue
  • Most common fault-tolerant strategies in a Grid
    • Replication
    • Checkpointing
  • Limitations of checkpointing for time-critical applications
    • Checkpointing degrades performance
    • Checkpoints may not be compatible with a different site (heterogeneity)
    • A job cannot be rescheduled on the same site during a site outage
  • The fault-tolerant queue therefore uses the replication strategy
Fault-tolerant Queue Design
  • Components
    • Command Line Interface
    • Task Pool
    • Resource Pool
    • Scheduler
    • Resource Checker (integration with the TeraGrid Information Services)
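
As a rough illustration of how these components could fit together with a replication strategy, the sketch below wires a task pool, a resource pool, a resource checker, and a scheduler. All class names, the `submit` callback, and the site names are hypothetical, the command line interface is left out, and the actual design of the thesis framework is not shown in the slides.

```python
from collections import deque

class ResourcePool:
    def __init__(self, names):
        self.up = {name: True for name in names}

    def available(self):
        return [name for name, ok in self.up.items() if ok]

class ResourceChecker:
    """Marks resources as down; a real checker would poll an information service."""
    def __init__(self, pool):
        self.pool = pool

    def report_outage(self, name):
        self.pool.up[name] = False

class Scheduler:
    def __init__(self, tasks, pool, submit):
        self.tasks = deque(tasks)   # task pool
        self.pool = pool
        self.submit = submit        # hypothetical: submit(task, resource) -> bool

    def run(self):
        results = {}
        while self.tasks:
            task = self.tasks.popleft()
            # Replication: send a copy to every available resource and
            # record which replicas succeeded.
            outcomes = {r: self.submit(task, r) for r in self.pool.available()}
            results[task] = sorted(r for r, ok in outcomes.items() if ok)
        return results

def fake_submit(task, resource):
    # Pretend that every job sent to "site-a" fails.
    return resource != "site-a"

pool = ResourcePool(["site-a", "site-b", "site-c"])
ResourceChecker(pool).report_outage("site-c")
results = Scheduler(["task-1", "task-2"], pool, fake_submit).run()
```

In this toy run, "site-c" is skipped because the checker reported it down, and each task still completes because one of its replicas lands on a healthy site.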
Fault-tolerant Queue Design
  • Fault detection
    • Message from Grid Resource Allocation and Management (GRAM) in the Globus Toolkit
      • Communicate with GRAM to detect job failure
    • TeraGrid Information Services
      • Needed as a second source because the GRAM service itself may fail when the resource is down
      • Publishes XML documents containing the outage information
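
The two detection paths above can be combined in a simple check: a resource is treated as faulty if a job failure was reported or if a published outage covers the current time. The XML layout below is invented for this sketch, since the slides do not show the real TeraGrid Information Services schema, and the `job_failed` flag merely stands in for whatever GRAM reports.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical document layout; the real schema is not shown in the slides.
SAMPLE = """
<outages>
  <outage resource="site-a" start="2009-06-01T00:00:00" end="2009-06-01T12:00:00"/>
  <outage resource="site-b" start="2009-06-03T08:00:00" end="2009-06-03T09:30:00"/>
</outages>
"""

def resources_down(xml_text, now):
    """Return the set of resources with an outage in progress at `now`."""
    root = ET.fromstring(xml_text.strip())
    down = set()
    for item in root.iter("outage"):
        start = datetime.fromisoformat(item.get("start"))
        end = datetime.fromisoformat(item.get("end"))
        if start <= now < end:
            down.add(item.get("resource"))
    return down

def is_faulty(resource, job_failed, xml_text, now):
    """Combine both signals: a job-failure report or a published outage."""
    return job_failed or resource in resources_down(xml_text, now)

check_time = datetime(2009, 6, 1, 6)
down_now = resources_down(SAMPLE, check_time)
```

Checking the outage document as well as the job status covers the case where the job manager on a downed resource cannot answer at all.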
Evaluation – WTM Performance
  • WTM application performance (generation)
Evaluation – Queue Wait Time
  • Queue wait time statistics
Evaluation – Overhead
  • Performance overhead
    • Integrating a fault-tolerant framework usually causes performance degradation
    • No performance loss in our framework

Evaluation – Workflow Performance

  • Run time comparison of different workflow types
    • Original deployment vs. fault-tolerant deployment
    • Dynamic job dependency vs. static job dependency
    • Test each type of deployment in the real Grid system including queue wait time
Evaluation – Workflow Performance
  • Setup points
    • What to measure
      • Job run time + queue wait time
    • 4 different types of deployment
      • Original on Abe
      • Original on Big Red
      • Static fault-tolerant workflow on Abe + Big Red
      • Dynamic fault-tolerant workflow on Abe + Big Red
    • 6 different jobs
      • 6 = 1 (original on Abe) + 1 (original on Big Red) + 2 (static) + 2 (dynamic)
Evaluation – Workflow Performance
  • Setup points
    • “Submit” 4 different deployments at the same time
      • 5 of the 6 jobs are submitted at the same time (the remaining job belongs to the static workflow and is submitted later).
    • Repeat this at different times
    • Results differ between runs because the queue wait times vary
Evaluation – Workflow Performance
  • Workflow comparison results
Simulation – Run Time Comparison
  • Average run time
    • Statistical model for the original WTM deployment

t: run time of a job, p: failure rate, q: avg. queue wait time

    • Statistical model for the dynamic WTM deployment

k: number of jobs, q_i: avg. queue wait time of the i-th job, t_i: run time of the i-th job

Simulation – Run Time Comparison
  • Results (queue wait time + job run time + “failure” time)
Simulation – Worst Case Run Time Comparison
  • A threat management system must deliver results under any circumstances.
  • Thus, the worst-case run time is a critical factor in the Water Threat Management system.
Simulation – Worst Case Run Time Comparison
  • Simulation setup
    • Use the 2009 TeraGrid outage data for this simulation
    • Submit jobs every 5 minutes during 2009 and compare the worst-case run time between the original deployment and the dynamic workflow deployment
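
The simulation setup can be sketched as a toy model: submit a job at every 5-minute step, compute when it would finish given a list of outage intervals, and keep the largest turnaround. The model below makes several simplifying assumptions that are not from the slides: time is in minutes, queue wait times are ignored, a part only counts once some site stays up for its whole duration, and the outage intervals are invented rather than taken from the 2009 TeraGrid data.

```python
def finish_time(submit, parts, outages_by_site):
    """Time at which all parts are done for a job submitted at `submit`.

    `parts` lists part durations that run one after another.
    `outages_by_site` maps a site name to (start, end) outage intervals.
    """
    now = submit
    for duration in parts:
        while True:
            waits = []
            for outages in outages_by_site.values():
                # Ends of the outages that overlap the window [now, now + duration).
                ends = [end for start, end in outages
                        if start < now + duration and end > now]
                if not ends:
                    waits = None   # this site stays up for the whole part
                    break
                waits.append(min(ends))
            if waits is None:
                now += duration
                break
            now = min(waits)       # every site is blocked: wait for a recovery
    return now

def worst_case(parts, outages_by_site, horizon, step=5):
    """Largest turnaround over submissions every `step` minutes."""
    return max(finish_time(s, parts, outages_by_site) - s
               for s in range(0, horizon, step))

# Hypothetical outage intervals in minutes.
outages = {"site-a": [(100, 400)], "site-b": [(350, 500)]}
single = worst_case([120], {"site-a": outages["site-a"]}, horizon=600)
split = worst_case([60, 60], outages, horizon=600)
```

Here `single` models one 120-minute job tied to one site, while `split` models the same work divided into two 60-minute parts that may each run on either site. Comparing the two values shows how spreading parts across sites can shorten the worst case in this simplified setting.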
Conclusion
  • In general, the dynamic fault-tolerant workflow performs similarly to the original deployment.
  • However, in the worst-case scenario, the dynamic workflow performs much better than the original deployment.