Understanding Distributed Applications: Challenges and Development Objectives
Explore the critical perspectives, challenges, and development objectives of distributed applications with a focus on the SAGA framework. Understand the application usage modes and the role of explicit vs. implicit approaches. Discover the basic philosophy and goals of SAGA for effective application development.
Presentation Transcript
SAGA-based Frameworks: Supporting Application Usage Modes • Shantenu Jha • Director, Cyber-Infrastructure Development, CCT • Asst Research Professor, CS • e-Science Institute, Edinburgh • http://www.cct.lsu.edu/~sjha • http://saga.cct.lsu.edu
Outline (1) • Understanding Distributed Applications (DA) • How they differ from HPC or parallel applications; challenges of DA • DA Development Objectives (IDEAS) • Understanding SAGA (and the SAGA-Landscape) • Rough Taxonomy of Distributed Applications • Using SAGA to develop Distributed Applications • Examples: Application & Application Frameworks • Discuss how IDEAS are met • Some SAGA-based Tools and Projects • Advantages of Standards • Derive (Initial) User Requirements for FutureGrid
Understanding Distributed Applications: Critical Perspectives • The number of applications that utilize multiple sites sequentially, concurrently or asynchronously is low (~5%): • Not referring to tightly-coupled applications running across multiple sites • Distributed CI: Is the whole greater than the sum of the parts? • Managing data and applications across multiple resources is (increasingly) hard: • Distributed Data/Jobs vs Bring it to the Computing • Compute where the data is, or move data to where the computing is • Challenges are set to get worse, both qualitatively and quantitatively: • Increasing complexity, heterogeneity and scale
Understanding Distributed Applications • Distributed Applications Require: • Coordination over Multiple & Distributed sites: • Scale-up and Scale-out • Peta/Exa/Atta - Scientific Applications requiring multiple-runs, ensembles, workflows etc. • Core characteristics of logically and physically distributed applications are the SAME • Application Usage Mode: • Composed using Application as the UNIT of execution • Not a workflow (i.e., composed using control and data flow) • Usage Mode: Closer to an Abstract Workflow (template) • Examples: Run once; or Set of copies of an application with varied input data (Ensemble); Loosely-Coupled ensembles..
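The ensemble usage mode described on this slide (a set of copies of one application, each with different input data) can be sketched as follows. This is a minimal local illustration, not SAGA code: the "application" is a hypothetical Python one-liner that squares its argument, and `run_member` / `run_ensemble` are names invented for this sketch. The point is only that the application is the unit of execution and is launched unmodified once per input.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a real application binary: a Python one-liner
# that squares its single command-line argument and prints the result.
APP = [sys.executable, "-c", "import sys; print(float(sys.argv[1]) ** 2)"]


def run_member(param):
    """Run one unmodified copy of the application with one input value."""
    done = subprocess.run(APP + [str(param)],
                          capture_output=True, text=True, check=True)
    return float(done.stdout.strip())


def run_ensemble(params, max_concurrent=4):
    """Launch one application instance per input; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_member, params))


if __name__ == "__main__":
    print(run_ensemble([1.0, 2.0, 3.0]))
```

Nothing in the application itself knows it is part of an ensemble; the coordination lives entirely outside it, which is what distinguishes a usage mode from an explicitly distributed application.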
Understanding Distributed Applications: Development Challenges • Fundamentally a hard problem: • Dynamic resources, heterogeneous resources • Add to it: Complex underlying infrastructure • Programming Systems for Distributed Applications: • Incomplete? Customization? Extensibility? • What should the end-user control? What must they control? • Computational Models of Distributed Computing • Range of DA, no clear taxonomy • More than (peak) performance • Application Usage Mode • Inter-play of Application, Infrastructure, Usage Mode
Understanding Distributed Applications: Implicit vs Explicit? • Which approach (implicit vs explicit) is used depends on: • How is the application used? • Is there a need to control/marshal more than one resource? • Why are distributed resources being used? • How much can be kept out of the application? • Can’t predict in advance? • Not obvious what to do, application-specific metric • If possible, Applications should not be explicitly distributed • GATEWAYS approach: • Implicit for the end-users • Supporting Applications? Or Application Usage Modes?
Understanding Distributed Applications: Development Objectives • Interoperability: Ability to work across multiple distributed resources • Distributed Scale-Out: The ability to utilize multiple distributed resources concurrently • Extensibility: Support new patterns/abstractions, different programming systems, functionality & Infrastructure • Adaptivity: Respond to fluctuations in dynamic resources and in the availability of dynamic data • Simplicity: Accommodate the above distributed concerns at different levels easily… • Challenge: How to develop DA effectively and efficiently with the above as first-class objectives?
SAGA: Basic Philosophy • There is a lack of programmatic approaches that: • Provide general-purpose, common grid functionality for applications and thus hide underlying complexity, varying semantics.. • Hide “bad” heterogeneity and provide means to address “good” heterogeneity • Offer building blocks upon which to construct higher levels of functionality and abstractions • Meet the needs of a broad spectrum of applications: • Simple Distributed Scripts, Gateways, Smart Applications and Production Grade Tooling, Workflow… • Simple, integrated, stable, uniform and high-level interface • Simple and Stable: 80:20 restricted scope and Standard • Integrated: Similar semantics & style across commonly used distributed functional requirements • Uniform: Same interface for different distributed systems • SAGA: Provides Application* developers with the basic units required to compose high-level functionality across different distributed systems • (*) One person’s Application is another person’s Tool
SAGA: Job Submission • Role of Adaptors (middleware binding)
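The role of adaptors as a middleware binding can be illustrated with a small dispatch sketch. This is not the SAGA API: `JobService`, `ForkAdaptor`, `DryRunAdaptor` and the `dryrun://` scheme are hypothetical names chosen for this example. It only shows the idea that the application talks to one uniform interface, while the URL scheme selects which backend actually carries out the request.

```python
import subprocess
import sys
from urllib.parse import urlparse


class ForkAdaptor:
    """Runs the job as a local child process (the 'fork' case)."""

    def run(self, executable, arguments):
        done = subprocess.run([executable, *arguments],
                              capture_output=True, text=True)
        return done.returncode, done.stdout


class DryRunAdaptor:
    """Hypothetical stand-in for a remote backend: records the request only."""

    def __init__(self):
        self.submitted = []

    def run(self, executable, arguments):
        self.submitted.append((executable, list(arguments)))
        return 0, ""


class JobService:
    """Uniform front end; the URL scheme selects the adaptor."""

    def __init__(self, url, adaptors):
        scheme = urlparse(url).scheme
        if scheme not in adaptors:
            raise ValueError(f"no adaptor registered for scheme {scheme!r}")
        self._adaptor = adaptors[scheme]

    def run_job(self, executable, arguments=()):
        return self._adaptor.run(executable, arguments)


# Registry of available bindings (assumed for this sketch).
ADAPTORS = {"fork": ForkAdaptor(), "dryrun": DryRunAdaptor()}

if __name__ == "__main__":
    local = JobService("fork://localhost", ADAPTORS)
    code, out = local.run_job(sys.executable, ["-c", "print('hello')"])
    print(code, out.strip())
```

Switching the target from `fork://localhost` to another scheme changes only the URL string passed to `JobService`; the calling code stays the same, which is the property the slide attributes to adaptors.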
SAGA: Implementations • Currently there are several implementations under active development: • C++ Reference Implementation (LSU) -- OMII-UK: http://saga.cct.lsu.edu/cpp/ • Java Implementation (VU Amsterdam), part of the OMII-UK project: http://saga.cct.lsu.edu/java/ • JSAGA (IN2P3/CNRS): http://grid.in2p3.fr/jsaga/ • DEISA (partial): job, file package • C++: Currently at v1.3.3 (October 2009) • Python bindings to the C++ implementation are available • Good-faith effort to keep things working
SAGA: Available Adaptors • Job Adaptors • Fork (localhost), SSH, Condor, Globus GRAM2, OMII GridSAM, Amazon EC2, Platform LSF • File Adaptors • Local FS, Globus GridFTP, Hadoop Distributed Filesystem (HDFS), CloudStore KFS, OpenCloud Sector-Sphere • Replica Adaptors • PostgreSQL/SQLite3, Globus RLS • Advert Adaptors • PostgreSQL/SQLite3, Hadoop HBase, Hypertable
SAGA: Available Adaptors • Other Adaptors • Default RPC / Stream / SD • Planned Adaptors • CURL file adaptor, gLite job adaptor • Open issues: • Consolidating the Adaptor code base and adding rigorous tests in order to improve adaptor quality • Capability Provider Interface (CPI - the ‘Adaptor API’) is not documented or standardized (yet), but looking at existing adaptor code should get you started if you want to develop your own adaptor • Proof by example..
Taxonomy of Distributed Applications • Example of Distributed Execution Mode: • Implicitly Distributed • 1000 job submissions on the TG • SAGA shell example/tutorial • Example of Explicit Coordination and Distribution • Explicitly Distributed • DAG-based Workflows • EnKF-HM application • Example of SAGA-based Frameworks • MapReduce, Pilot-Jobs
Developing Distributed Application Frameworks • Frameworks: Logical structure for Capturing Application Requirements, Characteristics & Patterns • Pattern: Commonly recurring modes of computation • Programming, Deployment, Execution, Data-access.. • Abstraction: Mechanism to support patterns and application characteristics • Frameworks are designed to either: • Support Patterns: Map-Reduce, Master-Worker, Hierarchical Job-Submission • Provide the abstractions and/or support the requirements & characteristics of applications • i.e. Encode a Usage-Mode using a Framework
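As an illustration of a framework that supports a pattern, the Map-Reduce pattern named above can be sketched in a few lines. This is a generic, single-machine word count using threads as stand-in workers; it is not the SAGA-based MapReduce framework, and all function names here are invented for this sketch.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped):
    """Group intermediate values by key across all map outputs."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the counts collected for each word."""
    return {key: sum(values) for key, values in groups.items()}


def map_reduce(chunks, workers=2):
    """Run the map tasks concurrently, then shuffle and reduce."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(map_reduce(["a b a", "b c"]))
```

The user supplies only `map_phase` and `reduce_phase`; where and how the tasks run is the framework's concern. In a distributed setting the thread pool would be replaced by job submission to remote resources.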
Abstractions for Distributed Computing (1) • BigJob: Container Task • Adaptive: • Type A: Fix the number of replicas; vary the cores assigned to each replica • Type B: Fix the size of each replica; vary the number of replicas (Cool Walking) • Same temperature range (adaptive sampling) • Greater temperature range (enhanced dynamics)
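The two adaptive modes differ only in which quantity is held fixed when the available core count changes. A minimal arithmetic sketch follows; the function names are hypothetical, and the use of integer division (with leftover cores simply left idle) is an assumption of this example rather than something stated on the slide.

```python
def type_a(total_cores, n_replicas):
    """Type A: replica count is fixed; cores per replica track the allocation."""
    if total_cores < n_replicas:
        raise ValueError("need at least one core per replica")
    return {"replicas": n_replicas,
            "cores_per_replica": total_cores // n_replicas}


def type_b(total_cores, cores_per_replica):
    """Type B: replica size is fixed; replica count tracks the allocation."""
    return {"replicas": total_cores // cores_per_replica,
            "cores_per_replica": cores_per_replica}


if __name__ == "__main__":
    # The container grows from 64 to 128 cores in both cases.
    for cores in (64, 128):
        print(cores, type_a(cores, n_replicas=8), type_b(cores, cores_per_replica=16))
```

When the container doubles in size, Type A gives each of the 8 replicas twice as many cores, while Type B keeps 16-core replicas and doubles how many of them run.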
Abstractions for Distributed Computing (2) • SAGA Pilot-Job (Glide-In)
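The Pilot-Job idea is that one container job acquires resources once, and sub-jobs are then scheduled inside it without each going through the resource's batch queue. The toy below sketches that separation on a single machine, under loud assumptions: `PilotJob` is a hypothetical class (not the SAGA BigJob API), a thread pool stands in for the cores held by the pilot, and `simulate` is a placeholder for a real sub-job.

```python
from concurrent.futures import ThreadPoolExecutor


class PilotJob:
    """Toy pilot: holds a fixed number of slots and runs sub-jobs in them."""

    def __init__(self, slots):
        self.slots = slots
        self._pool = ThreadPoolExecutor(max_workers=slots)
        self._futures = []

    def submit(self, func, *args):
        """Queue a sub-job inside the pilot; no new batch-queue request."""
        future = self._pool.submit(func, *args)
        self._futures.append(future)
        return future

    def wait(self):
        """Block until all sub-jobs finish; return results in submit order."""
        results = [future.result() for future in self._futures]
        self._futures.clear()
        return results

    def cancel(self):
        """Release the slots held by the pilot."""
        self._pool.shutdown(wait=True)


def simulate(replica_id):
    # Hypothetical sub-job: stands in for one replica or ensemble member.
    return replica_id * 10


if __name__ == "__main__":
    pilot = PilotJob(slots=2)        # acquired once
    for rid in range(5):             # five sub-jobs share the two slots
        pilot.submit(simulate, rid)
    print(pilot.wait())
    pilot.cancel()
```

Because sub-jobs are placed by the application rather than by the batch system, the application can decide at run time how many to start and how large each should be.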
Distributed Adaptive Replica Exchange (DARE): Scale-Out, Dynamic Resource Allocation and Aggregation
Multi-Physics Runtime Frameworks: Extensibility • Coupled multi-physics applications require two distinct, but concurrent, simulations • Can co-scheduling be avoided? • Adaptive execution model: Yes • Load-balancing required. Capability comes for free! • First demonstrated multi-platform Pilot-Job: • TG (MD) – Condor (CFD)
Ensemble Kalman Filters: Heterogeneous Sub-Tasks • Ensemble Kalman filters (EnKF) are recursive filters for handling large, noisy data; here the EnKF is used for history matching and reservoir characterization • EnKF is a particularly interesting case of irregular, hard-to-predict run-time characteristics.
Results: Scale-Out Performance • Using more machines decreases the time to completion (TTC) and the variation between experiments • Using BQP decreases the TTC and the variation between experiments further • The lowest time to completion is achieved when using BQP and all available resources
Performance Advantage from Scale-Out • But why does BQP help?
Understanding Distributed Applications: Development Objectives Redux • Interoperability: Ability to work across multiple distributed resources • SAGA: Middleware Agnostic • Distributed Scale-Out: The ability to utilize multiple distributed resources concurrently • Support Multiple Pilot-Jobs: Ranger, Abe, QB • Extensibility: Support new patterns/abstractions, different programming systems, functionality & Infrastructure • Pilot-Job also used for Coupled CFD-MD, Integrated BQP • Adaptivity: Respond to fluctuations in dynamic resources and in the availability of dynamic data • Simplicity: Accommodate the above distributed concerns at different levels easily…
SAGA: Bridging the Gap between Infrastructure and Applications • Focus on Application Development and Characteristics, not infrastructure details
SAGA-based Tools and Projects • JSAGA from IN2P3 (Lyon) • http://grid.in2p3.fr/jsaga/index.html • Slides Ack: Sylvain Renaud • GANGA-DIANE (EGEE) • http://faust.cct.lsu.edu/trac/saga/wiki/Applications/GangaSAGA • Slides Ack: Jackub Mosciki, Massimo L, O. Weidner • NAREGI/KEK (Active) • DESHL • DEISA-based Shell and Workflow library • XtreemOS • SD Specification • With gLite adaptors • Advantage of Standards
JSAGA: Implementer and user of SAGA • JSAGA uses SAGA in a module, which hides the heterogeneity of grid infrastructures • JSAGA implements SAGA to hide the heterogeneity of middleware • Diagram layers: Applications; JSAGA jobs collection; SAGA; JSAGA core engine + plug-ins; Legacy APIs
Projects using JSAGA • Elis@ • a web portal for submitting jobs to industrial and research grid infrastructures • SimExplorer • a set of tools for managing simulation experiments • includes a workflow engine that submits jobs to heterogeneous distributed computing resources • JJS • a tool for efficiently running short-lived jobs on EGEE • JUX • a multi-protocol file browser
DIANE Integration (cont.) • Figure panels: DIANE without SAGA; DIANE with SAGA
Applications on heterogeneous resources • Federating resources! • Payload distribution (not in this demo: cloud resources, additional Grid infrastructures…) • Master • Application-aware (and resource-aware) scheduling • Ganga/SAGA (to *) • Ganga/SAGA (to TeraGrid) • Ganga/gLite • Agents scheduling • Heterogeneous resource allocation (Ganga + Ganga/SAGA)
Acknowledgements • SAGA Team, DPA Team and the UK EPSRC (UK EPSRC: DPA, OMII-UK, OMII-UK PAL) • People: • SAGA D&D: Hartmut Kaiser, Ole Weidner, Andre Merzky, Joohyun Kim, Lukasz Lacinski, João Abecasis, Chris Miceli, Bety Rodriguez-Milla • SAGA Users: Andre Luckow, Yaakoub el-Khamra, Kate Stamou, Cybertools (Abhinav Thota, Jeff, N. Kim), Owain Kenway • Google SoC: Michael Miceli, Saurabh Sehgal, Miklos Erdelyi • Collaborators and Contributors: Steve Fisher & Group, Sylvain Renaud (JSAGA), Go Iwai & Yoshiyuki Watase (KEK) • DPA: Dan Katz, Murray Cole, Manish Parashar, Omer Rana, Jon Weissman