

Presentation Transcript


  1. Distributed Applications: Examining the Past, Understanding the Present, Preparing for the Future(Grid)
  Shantenu Jha, Director, Cyber-Infrastructure Development, CCT; Computer Science; e-Science Institute, Edinburgh
  http://www.cct.lsu.edu/~sjha | http://saga.cct.lsu.edu

  2. Outline
  • Critical Perspective on Large-Scale Distributed Applications and Production Cyber-Infrastructure (CI)
  • Understanding Distributed Applications (DA)
  • How DA differ from HPC or parallel applications; the challenges of DA
  • DA Development Objectives (IDEAS)
  • Understanding SAGA
  • Using SAGA to develop Distributed Applications
  • Frameworks
  • Abstractions for Dynamic Execution
  • Data-Intensive Applications
  • Discuss how the IDEAS objectives are met
  • Derive (Initial) User Requirements/Requests for FutureGrid

  3. Critical Perspectives
  • Distributed CI: Is the whole greater than the sum of the parts?
  • Several BIG projects have success stories on TG
  • But REAL science happens at ALL SCALES
  • Are there tools for individual users to innovate and develop?
  • Infrastructure capabilities and policy determine application development, deployment and execution:
  • The proportion of applications that utilize multiple distributed sites sequentially, concurrently or asynchronously is low (~5%)
  • Not referring to tightly-coupled runs across multiple sites
  • TG has (exclusively) supported legacy, static execution models
  • Move data to computing → Compute where the data is?
  • Distributed Data/Jobs vs Bringing it all into the Cloud
  • What novel applications & science has Distributed CI fostered?

  4. Understanding Distributed Applications: Development Challenges
  • Fundamentally a hard problem:
  • Dynamic, heterogeneous resources
  • Variable control (or lack thereof)
  • Add to it: complex underlying infrastructure provisioning
  • Programming systems for Distributed Applications:
  • Incomplete? Customization? Extensibility?
  • Computational models of distributed computing
  • Design points: more than (peak) performance
  • Primary role of usage modes
  • Wide range of DA, no clear taxonomy

  5. Understanding Distributed Applications: Development Challenges
  • Distributed Applications require:
  • Coordination over multiple & distributed sites:
  • Scale-up and scale-out
  • Logically or physically distributed
  • 1st generation of Peta/Exa/Zetta/Yotta-scale applications requiring multiple runs, ensembles, workflows..
  • Core characteristics and challenges of logically and physically distributed applications are the SAME
  • Inter-play of Requirements, Infrastructure, Usage Mode
  The ability to develop simple, novel or effective distributed applications lags behind other aspects of CI. General-purpose distributed application development is lacking in NSF/OCI's portfolio…

  6. Understanding Distributed Applications: Development Objectives
  • Interoperability: Ability to work across multiple distributed resources
  • Distributed Scale-Out: The ability to utilize multiple distributed resources concurrently
  • Extensibility: Support for new patterns/abstractions, different programming systems, functionality & infrastructure
  • Adaptivity: Response to fluctuations in dynamic resources and the availability of dynamic data
  • Simplicity: Accommodate the above distributed concerns at different levels, easily…
  Challenge: How to develop DA effectively and efficiently with the above as first-class objectives?

  7. SAGA: Basic Philosophy
  • There is a lack of programmatic approaches that:
  • Provide general-purpose, common grid functionality for applications, and thus hide underlying complexity and varying semantics..
  • Provide the building blocks upon which to construct "consistent" higher levels of functionality and abstraction
  • Hide "bad" heterogeneity, while providing the means to address "good" heterogeneity
  • Meet the needs of a broad spectrum of applications:
  • Simple scripts, Gateways, Smart Applications, Production-Grade Tooling, Workflow…
  • Simple, integrated, stable, uniform and high-level interface
  • Simple and Stable: 80:20 restricted scope, and a Standard
  • Integrated: Similar semantics & style across packages
  • Uniform: Same interface for different distributed systems
  • SAGA provides application* developers with the basic units required to compose high-level functionality across (distinct) distributed systems
  (*) One person's application is another person's tool

  8. SAGA: The Standard Landscape

  9. SAGA: In a thousand words..

  10. SAGA: Job Submission and the Role of Adaptors (middleware binding)
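The adaptor idea on this slide can be sketched in a few lines: one uniform API call is bound at runtime to different middleware depending on the URL scheme. This is an illustrative stand-in, not the actual SAGA adaptor machinery; the adaptor classes and messages are hypothetical.

```python
# Sketch of middleware binding via adaptors: the JobService interface
# stays the same, while the backend is picked from the URL scheme.
from urllib.parse import urlparse

class ForkAdaptor:
    """Hypothetical local-execution adaptor."""
    def submit(self, executable):
        return f"fork: started {executable} locally"

class GramAdaptor:
    """Hypothetical Globus GRAM adaptor."""
    def submit(self, executable):
        return f"gram: submitted {executable} via Globus GRAM"

ADAPTORS = {"fork": ForkAdaptor, "gram": GramAdaptor}

class JobService:
    """One uniform interface; the middleware binding happens here."""
    def __init__(self, url):
        scheme = urlparse(url).scheme
        self._adaptor = ADAPTORS[scheme]()   # late binding by scheme

    def submit(self, executable):
        return self._adaptor.submit(executable)

print(JobService("fork://localhost").submit("/bin/date"))
print(JobService("gram://qb.loni.org").submit("/bin/date"))
```

The application code above never names a middleware system; swapping `fork://` for `gram://` changes the backend without changing the program, which is the "hide bad heterogeneity" point of the previous slide.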

  11. SAGA Job API: Example
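The example slide did not survive extraction, so here is a runnable sketch of the canonical SAGA job API call sequence (Service, Description, create_job, run, wait). The classes below are minimal local stand-ins that mimic the lifecycle; the real API lives in the SAGA C++/Python bindings.

```python
# SAGA-style job lifecycle sketch: build a Description, get a Job from
# a Service, then run/wait and inspect the state. Local stand-ins only.
import subprocess
import sys

class Description:
    def __init__(self):
        self.executable = None
        self.arguments = []

class Service:
    def __init__(self, url):
        self.url = url                    # e.g. "fork://localhost"
    def create_job(self, jd):
        return Job(jd)

class Job:
    def __init__(self, jd):
        self.jd, self.state = jd, "New"
    def run(self):
        self.state = "Running"
        self._proc = subprocess.Popen([self.jd.executable] + self.jd.arguments)
    def wait(self):
        self._proc.wait()
        self.state = "Done" if self._proc.returncode == 0 else "Failed"

js = Service("fork://localhost")
jd = Description()
jd.executable = sys.executable            # portable: run Python itself
jd.arguments = ["-c", "print('hello from a SAGA-style job')"]
job = js.create_job(jd)
job.run()
job.wait()
print(job.state)
```

With the real library, only the three class names change; the call sequence is the part the API standardizes across middleware.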

  12. SAGA: Other Packages

  13. SAGA and Distributed Applications

  14. SAGA-based Frameworks: Types
  • Frameworks: Logical structure for capturing application requirements, characteristics & patterns
  • Runtime and/or Application Frameworks
  • Application Frameworks are designed around either:
  • Patterns: Commonly recurring modes of computation
  • Programming, deployment, execution, data-access..
  • MapReduce, Master-Worker, H-J Submission
  • Abstractions: Mechanisms to support patterns and application characteristics
  • Runtime Frameworks:
  • Load-balancing – compute and data distribution
  • SAGA-based frameworks are infrastructure-independent
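The Master-Worker pattern named above can be made concrete with a stdlib-only sketch: a master fills a task queue, a fixed pool of workers drains it, and a poison pill shuts each worker down. In a SAGA-based framework the workers would be remote jobs; here they are threads.

```python
# Minimal master-worker sketch: queue-fed workers, poison-pill shutdown.
import queue
import threading

def worker(tasks, results):
    while True:
        n = tasks.get()
        if n is None:              # poison pill: this worker is done
            break
        results.put(n * n)         # stand-in for real work

tasks, results = queue.Queue(), queue.Queue()
workers = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(4)]
for w in workers:
    w.start()

for n in range(10):                # master distributes the work
    tasks.put(n)
for _ in workers:                  # one pill per worker
    tasks.put(None)
for w in workers:
    w.join()

print(sorted(results.queue))       # results arrive in any order
```

The point of the framework layer is that only `worker` and the task payload change between applications; the coordination skeleton stays fixed.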

  15. Abstractions for Dynamic Execution (1): Container Task
  Adaptive replica-exchange:
  • Type A: Fix the number of replicas; vary the cores assigned to each replica
  • Type B: Fix the size of each replica; vary the number of replicas (Cool Walking)
  -- Same temperature range (adaptive sampling)
  -- Greater temperature range (enhanced dynamics)
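The two adaptive policies reduce to a small allocation calculation, sketched below. The policy names follow the slide; the core counts are illustrative.

```python
# Type A vs Type B adaptivity as allocation functions returning
# (number_of_replicas, cores_per_replica) for a given core budget.
def type_a(total_cores, n_replicas):
    """Type A: replica count fixed; cores per replica vary."""
    return n_replicas, total_cores // n_replicas

def type_b(total_cores, cores_per_replica):
    """Type B: replica size fixed; replica count varies."""
    return total_cores // cores_per_replica, cores_per_replica

print(type_a(256, 8))    # 8 replicas share the 256 cores
print(type_b(256, 16))   # as many 16-core replicas as fit
```

Type A trades per-replica speed against a fixed sampling width; Type B keeps per-replica cost fixed and widens (or narrows) the temperature range as resources fluctuate.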

  16. Abstractions for Dynamic Execution (2): SAGA Pilot-Job (BigJob)

  17. Coordinate Deployment & Scheduling of Multiple Pilot-Jobs
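The coordination problem on this slide can be sketched simply: each pilot is a placeholder job that has already acquired cores on some resource, and application sub-tasks are placed into whichever pilot has free capacity, without another trip through the batch queue. The resource names and sizes below are illustrative.

```python
# Greedy sub-task scheduling across multiple pilots.
class Pilot:
    def __init__(self, resource, cores):
        self.resource, self.free = resource, cores
        self.tasks = []

def schedule(task_cores, pilots):
    """Place a sub-task on the first pilot with enough free cores."""
    for p in pilots:
        if p.free >= task_cores:
            p.free -= task_cores
            p.tasks.append(task_cores)
            return p.resource
    return None            # no capacity anywhere: the task must wait

pilots = [Pilot("ranger", 64), Pilot("qb", 32)]
placements = [schedule(c, pilots) for c in [48, 32, 16, 32]]
print(placements)          # ['ranger', 'qb', 'ranger', None]
```

A real coordinator would also grow or shrink the pilot set as load changes; the decoupling of resource acquisition (pilots) from task execution (sub-tasks) is the essential idea.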

  18. Distributed Adaptive Replica Exchange (DARE): Scale-Out, Dynamic Resource Allocation and Aggregation

  19. Multi-Physics Runtime Frameworks: Extensibility
  • Coupled multi-physics requires two distinct, but concurrent, simulations
  • Can co-scheduling be avoided? With an adaptive execution model: Yes
  • Load-balancing is required, and the Pilot-Job facilitates it!
  • Across sites? (open question)
  • First demonstrated multi-platform Pilot-Job: MPI-based TG and Condor GI

  20. Dynamic Execution: Reduced Time to Solution

  21. Ensemble Kalman Filters: Heterogeneous Sub-Tasks
  • Ensemble Kalman filters (EnKF) are recursive filters for handling large, noisy data; we use the EnKF for history matching and reservoir characterization
  • EnKF is a particularly interesting case of irregular, hard-to-predict run-time characteristics

  22. Results: Scale-Out Performance (Khamra & Jha, GMAC, ICAC’09)
  • Using more machines decreases the TTC and the variation between experiments
  • Using BQP decreases the TTC & variation between experiments further
  • The lowest time-to-completion is achieved when using BQP and all available resources
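The mechanism behind the BQP result can be sketched in one function: choose the resource that minimizes predicted queue wait plus estimated runtime, rather than runtime alone. The timings below are made up for illustration, not BQP output.

```python
# Queue-wait-aware resource selection, the idea underlying the BQP
# experiments: total time-to-completion = predicted wait + runtime.
def pick_resource(predicted_wait_s, est_runtime_s):
    """Both arguments map resource name -> seconds."""
    return min(predicted_wait_s,
               key=lambda r: predicted_wait_s[r] + est_runtime_s[r])

wait = {"ranger": 3600, "abe": 300, "qb": 900}   # illustrative waits
run  = {"ranger": 600,  "abe": 1200, "qb": 800}  # illustrative runtimes
print(pick_resource(wait, run))
```

Note that the fastest machine (ranger, 600 s runtime) loses here because of its queue: wait-time prediction is what lets the scheduler avoid that trap, which is why BQP reduces both the mean TTC and its variance.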

  23. But Why Does BQP Help? The Case for System Sensors

  24. Autonomic Integration of HPC Grids-Clouds: EnKF Extensibility and Interoperability (work with M. Parashar et al., accepted for e-Science 2009)
  • Application Objectives:
  • Acceleration
  • Resilience
  • Conservation
  • Pull vs Push task map

  25. Application-Level Interoperability: Cloud-Cloud; Cloud-Grid
  • Application-level (ALI) vs. System-level Interoperability (SLI)
  • Infrastructure independence is a pre-requisite for ALI
  • The case for both Grids AND Clouds:
  • Hybrid & heterogeneous workloads: data-compute affinity differs
  • Availability zones, data-transfer costs..
  • Complex data-flow dependencies: need runtime determination
  • Just because you can use Grids AND Clouds, should you? Important research question: When should you?
  • Runtime decision: what mechanism determines when/if?
  • Should be influenced by application objectives
  • The programming model should be infrastructure-independent
  • Same application, priced differently, for the same performance
  • Same application, priced the same, for different performance
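The runtime decision the slide calls for can be sketched as a function of the application objective. The three objectives (acceleration, conservation, resilience) come from the previous slide; the selection rules are a deliberate simplification, not the published decision logic.

```python
# Objective-driven grid-vs-cloud selection at runtime.
def choose_infrastructure(objective, deadline_s, grid_wait_s,
                          cloud_cost, budget):
    if objective == "acceleration":
        # cloud instances start almost immediately; use the grid only
        # if its predicted queue wait still fits the deadline
        return "grid" if grid_wait_s <= deadline_s else "cloud"
    if objective == "conservation":
        # conserve money: avoid the cloud when it would bust the budget
        return "grid" if cloud_cost > budget else "cloud"
    if objective == "resilience":
        return "grid+cloud"        # replicate across both, keep first result
    return "grid"

print(choose_infrastructure("acceleration", 600, 3600, 1.0, 5.0))
print(choose_infrastructure("resilience", 600, 0, 1.0, 5.0))
```

Because the decision lives above an infrastructure-independent programming model, the same application code runs unchanged whichever branch is taken; only the pricing and performance differ, as the last two bullets note.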

  26. SAGA-based Frameworks: Examples
  • SAGA-based Pilot-Job Framework (FAUST)
  • Extended to support load-balancing for multi-component applications
  • SAGA MapReduce Framework:
  • Controls the distribution of tasks (workers)
  • Master-Worker: file-based and/or stream-based
  • Data-locality optimization using SAGA's replica API
  • SAGA NxM Framework:
  • Compute matrix elements, each of which is a task
  • All-to-all sequence comparison
  • Controls the distribution of tasks and data
  • Data-locality optimization via an external (runtime) module

  27. Distributed Data-Intensive Applications: Research Challenges
  • Goal: Develop DDI scientific applications that utilize a broad range of distributed systems, without vendor lock-in or disruption, yet with the flexibility and performance that scientific applications demand
  • Frameworks as possible solutions:
  • Frameworks address some primary challenges in developing distributed DI applications
  • Coordination of distributed data & computing
  • Runtime (dynamic) scheduling, placement
  • Fault-tolerance
  • Many challenges in developing such frameworks:
  • What are the components? How are they coupled? How is functionality expressed/exposed? Coordination?
  • Layering, ordering, encapsulation of components
  • "Just because you can't use MPI (on distributed systems), doesn't mean you can't use other approaches"

  28. Frameworks: Logical ordering SAGA

  29. Frameworks: Logical ordering

  30. SAGA-MapReduce (Miceli, Jha et al., CCGrid’09; Merzky, Jha et al., GPC’09)
  • Interoperability: use multiple infrastructures concurrently
  • Control the NW placement
  • Simple staging of data
  • SAGA-Sphere-Sector:
  • Open Cloud Consortium
  • Stream-processing model
  • Ongoing work
  • Apply to all elements (files) in a data-set (stream)
  Ts: time-to-solution, including data-staging, for SAGA-MapReduce (simple file-based mechanism)
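To make the pattern behind SAGA-MapReduce concrete, here is a self-contained word count: map each input chunk to (word, 1) pairs, shuffle by key, reduce by summing. In SAGA-MapReduce the map and reduce workers would be SAGA jobs, possibly on several backends at once; here they are plain function calls.

```python
# Minimal MapReduce word count: map -> shuffle -> reduce.
from collections import defaultdict

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in this chunk."""
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    """Group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Sum the values for each key."""
    return {key: sum(values) for key, values in groups.items()}

chunks = ["the grid the cloud", "the cloud"]          # two input "files"
pairs = [p for chunk in chunks for p in map_phase(chunk)]
print(reduce_phase(shuffle(pairs)))  # {'the': 3, 'grid': 1, 'cloud': 2}
```

The interoperability claim on the slide amounts to running the map workers on one infrastructure and the reduce workers on another, with the same three-phase structure unchanged.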

  31. Controlling Relative Compute-Data Placement

  32. SAGA All-Pairs: Runtime Data Placement
  • Classical: Place tasks on 4 LONI machines (512-processor Dell clusters)
  • Simple data staging
  • "Intelligent": Map a task to a resource based upon cost
  • Cost = data dependency + transfer times (latency)
  • "Ignoring intelligent mapping is no longer an option" (quote from Miceli, an undergraduate)
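The cost model on the slide can be sketched directly: map each task to the resource minimizing (transfer time if the data is not already there) plus latency. The cost formula follows the slide; the resource names and all timings below are illustrative.

```python
# Cost-based task-to-resource mapping: data-local placement wins.
def placement_cost(task_data, resource, transfer_s, latency_s, holds):
    """Cost = data-dependency term (transfer if absent) + latency."""
    transfer = 0 if task_data in holds[resource] else transfer_s[resource]
    return transfer + latency_s[resource]

def place(task_data, resources, transfer_s, latency_s, holds):
    return min(resources,
               key=lambda r: placement_cost(task_data, r,
                                            transfer_s, latency_s, holds))

resources = ["eric", "oliver"]                 # two illustrative clusters
holds     = {"eric": {"seqA"}, "oliver": set()}  # who already has the data
transfer  = {"eric": 120, "oliver": 120}         # seconds to stage seqA
latency   = {"eric": 5, "oliver": 2}
print(place("seqA", resources, transfer, latency, holds))
```

Even though "oliver" has the lower latency, "eric" wins because the data dependency is already satisfied there, which is exactly the behaviour the "intelligent" mapping delivers over classical staging.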

  33. Understanding Distributed Applications: Development Objectives Redux
  • Interoperability: Ability to work across multiple distributed resources
  • SAGA: middleware-agnostic
  • Distributed Scale-Out: The ability to utilize multiple distributed resources concurrently
  • Support for multiple Pilot-Jobs: Ranger, Abe, QB
  • Extensibility: Support for new patterns/abstractions, different programming systems, functionality & infrastructure
  • Pilot-Job also coupled CFD-MD; integrated BQP
  • Adaptivity: Response to fluctuations in dynamic resources and the availability of dynamic data
  • Simplicity: Accommodate the above distributed concerns at different levels, easily…

  34. Does SAGA Provide A Fresh Perspective?

  35. Early User: An Environment that Supports
  • Echo what Andrew Grimshaw said!!
  • e.g., a test-bed for Standards interoperation
  • Trivial remarks:
  • Not obsessed with system utilization like TG
  • Policies that support IDEAS as first-class concerns
  • Support for dynamic, first-principles, explicitly distributed applications
  • Dynamic, adaptive applications:
  • Dynamic resource utilization:
  • e.g., BQP (Jha et al., GMAC, ICAC Barcelona 2009)
  • Grid Observatory (EGEE) – all kinds of traces
  • Dynamic, adaptive data:
  • Network-aware applications (Jha et al., IEEE eScience ’07)
  • Data scheduler: big data, frequent data

  36. Early User: An Environment that Supports
  • Autonomic computational science applications
  • Support for the tuning of, and by, applications
  • A platform for developing (SAGA) application and runtime frameworks
  • Design, stand up, and experiment with frameworks
  • e.g., a load-balancer for dynamic resource allocation
  • SAGA-MapReduce, NxM
  • e.g., control the relative placement of data/compute
  • Supporting distributed abstractions – at the development, deployment and execution levels
  • A controlled but realistic environment
  • RAIN – dynamic provisioning (provide a clean API)
  • (Reproducible) Experiment Manager, VAMPIR
  • [Connection with the Grid Observatory]

  37. SAGA-based Tools and Projects: One person's tool is another person's application
  • DESHL: DEISA-based shell and workflow library
  • JSAGA from IN2P3 (Lyon): http://grid.in2p3.fr/jsaga/index.html
  • GANGA-DIANE: gLite
  • XtreemOS (based upon SAGA for the distribution)
  • NAREGI/KEK:
  • SD Specification
  • With gLite adaptors
  The advantage of Standards

  38. Acknowledgements
  SAGA Team and DPA Team, and the UK-EPSRC (UK EPSRC: DPA, OMII-UK, OMII-UK PAL)
  People:
  SAGA D&D: Hartmut Kaiser, Ole Weidner, Andre Merzky, Joohyun Kim, Lukasz Lacinski, João Abecasis, Chris Miceli, Bety Rodriguez-Milla
  SAGA Users: Andre Luckow, Yaakoub el-Khamra, Kate Stamou, Cybertools (Abhinav Thota, Jeff, N. Kim), Owain Kenway
  Google SoC: Michael Miceli, Saurabh Sehgal, Miklos Erdelyi
  Collaborators and Contributors: Steve Fisher & Group, Sylvain Renaud (JSAGA), Go Iwai & Yoshiyuki Watase (KEK)
  DPA: Dan Katz, Murray Cole, Manish Parashar, Omer Rana, Jon Weissman

  39. Abstractions for Distributed Applications and Systems: A Computational Science Perspective
  Authors: S. Jha, D. Katz, M. Parashar, O. Rana, J. Weissman
  Upcoming book by Wiley (Summer 2010)

  40. SAGA: Building the abstractions to Bridge the Infrastructure-Applications Gap Focus on Application Development and Characteristics, not infrastructure details

  41. Interoperability

  42. DAG-based Workflow Applications: Extensibility Approach
  • Application Development Phase
  • Generation & Execution Planning Phase
  • Execution Phase

  43. SAGA-based DAG Execution: Preserving Performance
