
VGrADS Programming Tools Research: Vision and Overview

This document provides an overview of the programming tools being developed by VGrADS at Rice University for transparent and efficient grid computing. It discusses the need for abstract views of the grid, workflow scheduling, fault tolerance, platform-independent applications, and performance modeling.




Presentation Transcript


  1. VGrADS Programming Tools Research: Vision and Overview • Ken Kennedy • Center for High Performance Software • Rice University • http://vgrads.rice.edu/site_visit/april_2005/slides/kennedy-tools.pdf

  2. The VGrADS Vision: National Distributed Problem Solving • Where We Want To Be • Transparent Grid computing • Submit job • Find & schedule resources • Execute efficiently • Where We Are • Low-level hand programming • What Do We Need? • A more abstract view of the Grid • Each developer sees a specialized “virtual grid” • Simplified programming models built on the abstract view • Permit the application developer to focus on the problem [Figure labels: Database, Supercomputer]

  3. Programming Tools • Focus: Automating critical application-development steps: • Initiating application execution • Optimize and launch application on heterogeneous resources • Scheduling workflow graphs • Heuristics required (problems are NP-complete at best) • Good initial results if accurate predictions of resource performance are available (see EMAN demo) • Constructing performance models • Based on loop-level performance models of the application • Requires benchmarking with (relatively) small data sets, extrapolating to larger cases • Building workflow graphs from high-level scripts • Examples: Python (EMAN), Matlab
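The performance-model bullet on this slide (benchmark with relatively small data sets, then extrapolate to larger cases) can be illustrated with a minimal sketch. It assumes a single power-law cost model, time = a * size^b, fitted by least squares in log-log space. The slides describe loop-level models, which are richer than this; the function names and the synthetic timings below are invented for illustration.

```python
import math


def fit_power_law(sizes, times):
    """Least-squares fit of time = a * size**b, done in log-log space."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept = log of the constant
    return a, b


def predict(a, b, size):
    """Evaluate the fitted model at a (possibly much larger) problem size."""
    return a * size ** b


# Synthetic "benchmark" timings for small inputs (assumed cubic cost).
small_sizes = [100, 200, 400, 800]
small_times = [2e-6 * n ** 3 for n in small_sizes]

a, b = fit_power_law(small_sizes, small_times)
estimate = predict(a, b, 6400)  # extrapolate to a size never benchmarked
print(f"exponent ~ {b:.3f}, predicted time at n=6400: {estimate:.1f} s")
```

With real measurements the fit would be noisy, and a single exponent rarely holds across cache-size boundaries, which is one reason a per-loop model can be more accurate than a whole-program fit like this one.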

  4. Initiating Application Execution • Vision: Transparent to the User with Fault Tolerance • Binaries shipped or preinstalled • Reconfigured where necessary • Data moved to computations • Support for fault tolerance • Support for rescheduling, migration and restart • Research • Separation of concerns between workflow management and virtual grid • Support for fault tolerance and rescheduling/migration • Checkpointing, load projection, performance modeling • Platform-independent application representation

  5. Application Initiation and Management [Diagram labels: Abstract Workflow; Workflow Scheduler; Virtual Grid; Pegasus; Data Discovery; Workflow Reduction; Resource Discovery; Mapping; Concrete Workflow; DAGMan; Virtual Grid Execution System; Virtual Grid Resources]

  6. Support for Fault Tolerance and Rescheduling • Fault tolerance • Workflows: Use Pegasus mechanisms for recovery • Issue: support for registration of completed steps in vgES • MPI steps: Disk-free checkpointing • See Dongarra talk • Rescheduling • Mechanisms for monitoring and extending virtual grids in vgES • Under development • Issue: Determining whether rescheduling will be profitable • Projection of load on current resources without current application • Models to project performance on current and alternative resources
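The profitability question on this slide can be sketched as a comparison of projected completion times. This is only an illustration under stated assumptions: work is measured in abstract units, a resource's effective speed is its idle speed scaled by the fraction not consumed by other jobs, and migration has a fixed cost. None of these names or formulas come from vgES.

```python
def remaining_time(work_left, speed, projected_load):
    """Seconds to finish `work_left` units on one resource.

    speed: units per second when the resource is idle (assumed model).
    projected_load: fraction of the resource used by other jobs, in [0, 1).
    """
    if not 0.0 <= projected_load < 1.0:
        raise ValueError("projected_load must be in [0, 1)")
    return work_left / (speed * (1.0 - projected_load))


def rescheduling_is_profitable(work_left, current, alternative, migration_cost):
    """Compare staying put against moving and paying the migration cost.

    current, alternative: (speed, projected_load) tuples.
    Returns (profitable, time_if_stay, time_if_move).
    """
    stay = remaining_time(work_left, *current)
    move = remaining_time(work_left, *alternative) + migration_cost
    return move < stay, stay, move


# Current resource is half busy; the alternative is idle but costs 30 s to reach.
decision = rescheduling_is_profitable(1000, (10.0, 0.5), (10.0, 0.0), 30.0)
print(decision)
```

The two inputs that this sketch takes as given, the projected load and the speed on each resource, are exactly the quantities the slide lists as open modeling problems.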

  7. Platform-Independent Applications • Supporting a Single Application Image for Different Platforms • Translation and optimization tools that map the application onto the hardware in an effective and efficient way • At the high level, resource allocation and scheduling • At the low level, optimization, scheduling, and runtime reoptimization • We are working with the LLVM system (from Illinois) • Produces good code for a variety of hardware platforms • Run-time Reoptimization • The Idea: In response to poor performance, reoptimize • The Vision: To reduce the runtime cost of reoptimization, move analysis and planning to compile time • Example: alternative blocking strategies for a loop nest, with shift triggered by excessive cache misses

  8. Scheduling Workflows • Vision: • Application includes performance models for all workflow nodes • Performance models automatically constructed • Software schedules applications onto Grid in two phases • Virtual grid requirement and acquisition • Model-based scheduling on the returned vGrid • Research • Scheduling strategy: Whole workflow • Dramatic makespan reduction • Two-phase scheduling • Exploring trade-offs • Expectation: dramatic reduction in complexity with little loss in performance
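As an illustration of model-based workflow scheduling, here is a minimal sketch of one common greedy heuristic: visit nodes in topological order and place each on the resource with the earliest predicted finish time. It is not the VGrADS scheduler. It assumes that each resource runs one node at a time, that data-transfer costs are zero, and that the per-resource time estimates (the "performance model") are supplied as a plain dictionary. It needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter


def schedule(deps, est_time):
    """Greedy earliest-finish-time placement of a workflow DAG.

    deps: node -> set of predecessor nodes.
    est_time: node -> {resource: predicted seconds} (assumed performance model).
    Returns (placement, makespan).
    """
    resource_free = {r: 0.0 for per_node in est_time.values() for r in per_node}
    finish, placement = {}, {}
    for node in TopologicalSorter(deps).static_order():
        # A node may start only after all of its predecessors have finished.
        ready = max((finish[p] for p in deps.get(node, ())), default=0.0)
        best = None
        for resource, seconds in est_time[node].items():
            end = max(ready, resource_free[resource]) + seconds
            if best is None or end < best[0]:
                best = (end, resource)
        finish[node], placement[node] = best
        resource_free[placement[node]] = finish[node]
    return placement, max(finish.values())


# Diamond-shaped workflow: a feeds b and c, which both feed d.
deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
est_time = {
    "a": {"fast": 2, "slow": 4},
    "b": {"fast": 3, "slow": 5},
    "c": {"fast": 3, "slow": 5},
    "d": {"fast": 1, "slow": 2},
}
placement, makespan = schedule(deps, est_time)
print(placement, makespan)
```

In this example the heuristic sends one of the two parallel nodes to the slower resource because waiting for the fast one would finish later, which is the kind of decision that depends on having usable time predictions.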

  9. EMAN Scheduling: Varying Performance Models • Set of resources: • 50 rtc nodes at Rice (IA-64) • 13 medusa nodes at U of Houston (Opteron) • RDV data set • Varying scheduling strategy • RNP - Random / No PerfModel • RAP - Random / Accurate PerfModel • HCP - Heuristic / GHz Only PerfModel • HAP - Heuristic / Accurate PerfModel

  10. Performance Model Construction • Vision: Automatic performance modeling • From binary for a single resource and execution profiles • Models for all target resources • Research • Uniprocessor modeling • Can be extended to MPI steps • Memory hierarchy behavior • Role of instruction mix • Application-specific models • Scheduling

  11. Performance Prediction Overview [Diagram phases: Static Analysis; Dynamic Analysis; Post Processing. Diagram labels: Object Code; Binary Instrumenter; Instrumented Code; Execute; Binary Analyzer; BB Counts; Communication Volume & Frequency; Memory Reuse Distance; Control flow graph; Loop nesting structure; BB instruction mix; Post Processing Tool; Architecture neutral model; Architecture Description; Scheduler; Performance Prediction for Target Architecture]

  12. Modeling Memory Reuse Distance
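For readers unfamiliar with the term, the reuse distance of a memory access is commonly defined as the number of distinct locations touched since the previous access to the same location. The sketch below computes it with a simple LRU stack and uses it to count misses for a fully associative LRU cache. It assumes the trace is a list of block identifiers; the granularity and the data structures used by the tools described in these slides are not stated in the transcript.

```python
def reuse_distances(trace):
    """Reuse distance per access; None marks the first touch of a location."""
    stack = []       # LRU stack, most recently used at the end
    distances = []
    for block in trace:
        if block in stack:
            pos = stack.index(block)
            # Distinct blocks touched since the previous access to `block`.
            distances.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            distances.append(None)
        stack.append(block)
    return distances


def lru_misses(trace, capacity):
    """Misses in a fully associative LRU cache holding `capacity` blocks.

    An access hits exactly when its reuse distance is below the capacity.
    """
    return sum(1 for d in reuse_distances(trace) if d is None or d >= capacity)


trace = ["a", "b", "c", "a", "b", "b"]
print(reuse_distances(trace))   # [None, None, None, 2, 2, 0]
print(lru_misses(trace, 2), lru_misses(trace, 3))   # 5 3
```

Because the distances depend only on the access pattern, one measured histogram can be reused to predict misses for several cache sizes, which is what makes the metric attractive for architecture-neutral models.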

  13. Execution Behavior: NAS LU 2D 3.0

  14. Building Workflow Graphs • Vision: • Application developer writes in scripting language • Candidates: Python (EMAN), Matlab, S-PLUS/R, LabView • Components represent applications • Software constructs workflow and data movement • Research • Related project: Telescoping languages • Type-based specialization • Type analysis • Plan: Harvest this work for Grid application development • Just getting started
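The idea that software constructs the workflow from a script can be sketched as follows, assuming each scripted component declares the files it reads and writes. An edge is added whenever a step reads a file that an earlier step produced. The step and file names are hypothetical and are not taken from EMAN.

```python
def build_workflow(steps):
    """Derive a dependency graph from steps listed in script order.

    steps: list of (name, input_files, output_files).
    Returns node -> set of predecessor nodes.
    """
    producer = {}   # file -> most recent step that wrote it
    deps = {}
    for name, inputs, outputs in steps:
        # Files with no known producer are external inputs and add no edge.
        deps[name] = {producer[f] for f in inputs if f in producer}
        for f in outputs:
            producer[f] = name
    return deps


# Hypothetical four-step script.
steps = [
    ("preprocess", ["raw.img"], ["clean.img"]),
    ("classify", ["clean.img"], ["classes.lst"]),
    ("align", ["clean.img"], ["aligned.img"]),
    ("reconstruct", ["classes.lst", "aligned.img"], ["model.map"]),
]
workflow = build_workflow(steps)
for node, preds in workflow.items():
    print(node, "<-", sorted(preds))
```

Here `classify` and `align` both depend only on `preprocess`, so a scheduler is free to run them in parallel; the same file-to-producer map also identifies which data must move between resources.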
