Enabling Self-management of Component-based High-performance Scientific Applications

Hua (Maria) Liu and Manish Parashar, The Applied Software Systems Laboratory, Department of Electrical and Computer Engineering, Rutgers University.

Presentation Transcript


  1. Enabling Self-management of Component-based High-performance Scientific Applications
     Hua (Maria) Liu and Manish Parashar
     The Applied Software Systems Laboratory, Department of Electrical and Computer Engineering, Rutgers University

  2. Challenges
     • Emerging scientific applications are
       • Distributed, heterogeneous, long-running, dynamic
       • Changing user requirements
       • Changing problem domains
       • Changing context environments
     • Emerging execution environments are also
       • Distributed, heterogeneous, dynamic
       • Changing workload and communication capabilities

  3. Solution
     • Applications should be aware of changes in application/system state and execution context, and respond to them.
       • i.e., applications should be self-managing or autonomic
     • However, this requires a programming system that can support the development and execution of such autonomic self-managing applications.
       • Extend computational elements (objects, components, and services) to support autonomic behaviors
       • Define dynamic composition (interactions) of autonomic elements that responds to changing user requirements and execution context
       • Provide a runtime infrastructure to achieve self-management

  4. Outline
     • Challenges and solution
     • Conceptual model of Accord
     • Prototype implementation based on CCA Ccaffeine framework
     • Illustrative applications

  5. Overview of Accord Programming System
     • Accord supports
       • Dynamic specification of adaptation behaviors in rules
       • Runtime enforcement of adaptation behaviors by invoking sensors and actuators
       • Runtime conflict detection and resolution
     • Key contributions
       • Accord provides programming abstractions to define the control port
       • Accord enables applications to be context-aware and self-managing
       • Accord enables element behavior adaptation and interaction adaptation at runtime

  6. Autonomic Element
     [Figure: diagram of an autonomic element. Labels: computational element; functional port, control port, operational port; element manager; rules; internal state; contextual state; interface invocation; event generation; actuator invocation; other.]

  7. The Accord Runtime Infrastructure
     [Figure: diagram of the runtime infrastructure. Labels: application workflow; application strategies; application requirements; composition manager; composition rules (repeated); component rules (repeated).]

  8. CCA and Ccaffeine Framework
     [Figure: four processes, P0 to P3. Components: blue, green, red. Framework: gray.]
     • Each process is loaded with the same set of components, wired the same way
     • Different components in the same process talk to each other via ports and the framework
     • The same component in different processes talks to its peers through their favorite communications layer (e.g., MPI, PVM, GA)
     • The characteristics of scientific applications
       • These applications are component-based.
       • The execution of these applications typically consists of a series of computational phases.
     Note: this slide is taken from the CCA tutorial (www.cca-forum.org)

  9. Accord-CCA: Extend Ccaffeine to Enable Self-Management Behaviors
     [Figure: Ccaffeine framework + TAU hosting a driver and components C1, C2, C3, C4. Legend: composition manager; component manager; controllable component.]

  10. Manager Components
     • Component managers provide component-level adaptations via
       • Adapting the runtime behaviors of individual components based on component rules
       • Dynamically replacing components based on composition rules
     • Composition managers provide application-level adaptations via
       • Coordinating component managers' behaviors
     [Figure labels: C2; C3; events; TAU; RulePort.]

  11. Rule
     Rule {
       on events;        (component or system events)
       when conditions;  (component or system sensors)
       do actions;       (component or system actuators)
     }
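The on/when/do structure can be illustrated with a minimal Python sketch. Every name here (`Rule`, `fire`, the `phase_end` event, the `bandwidth` sensor) is hypothetical and is not the Accord API; conflicts between rules that set the same actuator are deliberately ignored in this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration only; not the Accord API.
Sensors = Dict[str, float]   # sensor name -> current reading
Actions = Dict[str, str]     # actuator name -> value to apply


@dataclass
class Rule:
    on: str                            # event that triggers evaluation
    when: Callable[[Sensors], bool]    # condition over sensor readings
    do: Actions                        # actuator settings to apply


def fire(rules: List[Rule], event: str, sensors: Sensors) -> Actions:
    """Collect the actions of every rule triggered by `event` whose condition holds."""
    actions: Actions = {}
    for rule in rules:
        if rule.on == event and rule.when(sensors):
            # Later rules overwrite earlier ones; conflict handling is omitted.
            actions.update(rule.do)
    return actions


rules = [
    Rule(on="phase_end",
         when=lambda s: s["bandwidth"] < 10.0,
         do={"algorithm": "x"}),
]
result = fire(rules, "phase_end", {"bandwidth": 4.0})
```

Keeping the condition as a function over sensor readings mirrors the slide's split between sensors (read) and actuators (written).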

  12. The Rule Enforcement Engine
     [Figure: rule enforcement pipeline. Labels: pre-condition; post-condition; context; internal state of elements. Stages: batch condition inquiry; condition evaluation in parallel; conflict detection and resolution; batch action invocation; reconciliation.]
     • Sensor-actuator conflict
       • Detection: execution of some rules will change the pre-condition
       • Resolution: disable these rules
     • Actuator-actuator conflict
       • Detection: the post-condition contains multiple values for the same actuator
       • Resolution: relax the rule condition by incrementally deleting sensors in a user-specified sequence, until no actuator is invoked with different values
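The two detection steps can be sketched in Python under stated assumptions. `FiredRule` and its fields are hypothetical, and the sketch assumes the engine already knows which sensors each rule's actions would change (`affects`). The relaxation step for actuator-actuator conflicts is not implemented, because the slide does not specify it in enough detail.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FiredRule:
    name: str
    reads: Set[str]                 # sensors used in the rule's condition
    writes: Dict[str, str]          # actuator -> value the rule wants to set
    affects: Set[str] = field(default_factory=set)  # sensors its actions would change (assumed known)


def drop_sensor_actuator_conflicts(fired: List[FiredRule]) -> List[FiredRule]:
    """Disable rules whose actions would change a sensor read by any fired rule."""
    precondition = set().union(*(r.reads for r in fired))
    return [r for r in fired if not (r.affects & precondition)]


def actuator_conflicts(fired: List[FiredRule]) -> Dict[str, Set[str]]:
    """Return the actuators that fired rules want to set to more than one value."""
    requested: Dict[str, Set[str]] = defaultdict(set)
    for rule in fired:
        for actuator, value in rule.writes.items():
            requested[actuator].add(value)
    return {a: v for a, v in requested.items() if len(v) > 1}


fired = [
    FiredRule("r1", reads={"temperature"}, writes={"algorithm": "x"}),
    FiredRule("r2", reads={"bandwidth"}, writes={"algorithm": "y"}),
    FiredRule("r3", reads={"load"}, writes={"grid": "coarse"}, affects={"bandwidth"}),
]
kept = drop_sensor_actuator_conflicts(fired)   # r3 would change a sensor r2 reads
conflicts = actuator_conflicts(kept)           # r1 and r2 disagree on "algorithm"
```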

  13. Reconciliation
     [Figure: three nodes. Node x and node y: Algorithm 1 / Algorithm 2 with components C1, C2, C3. Node z: Algorithm 1 with components C1, C2, C4.]
     • Case 1: If the replacement on node z has a high priority and the other two have a low priority, propagate the replacement with C4. If there are multiple high-priority replacements: error.
     • Case 2: If all the replacements have a low priority, the replacement with the highest performance gain will be propagated.
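The two cases translate into a short selection function. This is a sketch only: `Proposal`, its fields, and the numeric `gain` values are hypothetical, and how performance gain is measured is not stated on the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Proposal:
    node: str
    replacement: str       # component proposed as the replacement
    high_priority: bool
    gain: float            # estimated performance gain (hypothetical metric)


def reconcile(proposals: List[Proposal]) -> Proposal:
    """Pick the single replacement to propagate to all nodes."""
    if not proposals:
        raise ValueError("no proposals to reconcile")
    high = [p for p in proposals if p.high_priority]
    if len(high) > 1:
        raise ValueError("multiple high-priority replacements")
    if len(high) == 1:
        return high[0]                               # Case 1
    return max(proposals, key=lambda p: p.gain)      # Case 2


case1 = reconcile([
    Proposal("x", "C3", high_priority=False, gain=0.4),
    Proposal("y", "C3", high_priority=False, gain=0.2),
    Proposal("z", "C4", high_priority=True, gain=0.1),
])
case2 = reconcile([
    Proposal("x", "C3", high_priority=False, gain=0.4),
    Proposal("z", "C4", high_priority=False, gain=0.1),
])
```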

  14. The Self-managing CH4 Ignition Simulation: Self-optimizing Via Component Adaptation
     • A set of algorithms is provided to simulate a set of reaction processes.
     • Some algorithms may not work at some temperatures. Further, these algorithms demonstrate different performance levels (execution time) at the same temperature.
     • So algorithms have to be dynamically selected to avoid an application crash and/or to optimize application execution.
     • Export sensor "temperature" and actuator "algorithm"
     [Figure labels: Rule Generator; Component Manager; Thermo Chemistry; Initializer; Executor; Cvode; Ref.]
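The selection logic can be sketched as: keep only the algorithms that work at the current temperature, then take the fastest. The `Algorithm` record, the temperature ranges, and the time estimates below are made-up illustrations, and a fixed per-algorithm time estimate is a simplifying assumption (the slide implies that performance varies with temperature).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Algorithm:
    name: str
    t_min: float       # lowest temperature at which the algorithm works (assumed)
    t_max: float       # highest temperature at which it works (assumed)
    est_time: float    # estimated execution time (assumed constant)


def select(algorithms: List[Algorithm], temperature: float) -> Optional[Algorithm]:
    """Value for the 'algorithm' actuator given the 'temperature' sensor reading."""
    usable = [a for a in algorithms if a.t_min <= temperature <= a.t_max]
    if not usable:
        return None    # no safe choice; the caller must handle this case
    return min(usable, key=lambda a: a.est_time)


# Illustrative values only, not taken from the simulation.
algorithms = [
    Algorithm("fast_low_T", t_min=300.0, t_max=1200.0, est_time=1.0),
    Algorithm("robust", t_min=300.0, t_max=3000.0, est_time=2.5),
]
cool = select(algorithms, 800.0)
hot = select(algorithms, 2000.0)
```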

  15. The Self-managing Shock Simulation: Self-optimizing Via Component Replacement
     Rule: IF cache miss of GodunovFlux > value THEN REPLACE GodunovFlux EFMFlux
     1. Register cache miss event (component manager, with the performance toolkit TAU)
     2. Collect cache miss of GodunovFlux
     3. Evaluate the rule
     4. Replace GodunovFlux with EFMFlux
     EFMFlux will be used from the next computation.
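A minimal sketch of the replacement flow follows. It assumes the cache miss reading is handed to the manager as a plain number; no TAU calls are shown, and the `ComponentManager` class, the threshold, and the stand-in flux functions are hypothetical.

```python
from typing import Callable, Dict


class ComponentManager:
    """Swaps the watched component for its replacement when the rule fires."""

    def __init__(self, components: Dict[str, Callable[[float], str]],
                 active: str, replacement: str, threshold: float) -> None:
        self.components = components
        self.active = active
        self.watched = active
        self.replacement = replacement
        self.threshold = threshold

    def compute(self, x: float, cache_miss_rate: float) -> str:
        # The current computation still runs on the active component.
        result = self.components[self.active](x)
        # IF cache miss of <watched> > value THEN REPLACE <watched> <replacement>
        if self.active == self.watched and cache_miss_rate > self.threshold:
            self.active = self.replacement   # takes effect from the next computation
        return result


manager = ComponentManager(
    components={
        "GodunovFlux": lambda x: f"Godunov({x})",   # stand-in, not the real solver
        "EFMFlux": lambda x: f"EFM({x})",           # stand-in, not the real solver
    },
    active="GodunovFlux",
    replacement="EFMFlux",
    threshold=0.2,
)
first = manager.compute(1.0, cache_miss_rate=0.35)
second = manager.compute(2.0, cache_miss_rate=0.05)
```

The swap is deferred until after the current call returns, matching the slide's note that EFMFlux is used from the next computation.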

  16. The Self-managing Shock Simulation: Self-optimizing Via Component Adaptation
     Rule: IF bandwidth < threshold THEN algorithm x
     1. Export actuator "algorithm" (AMRMesh, with algorithms x and y)
     2. Register communication bandwidth (performance toolkit TAU)
     3. Collect current bandwidth
     4. Evaluate the rule
     5. Invoke algorithm with x
     Algorithm x will be used from the next computation.

  17. The Self-managing Shock Simulation: Self-healing Via Component Replacement
     Rule: IF GodunovFlux error THEN REPLACE GodunovFlux EFMFlux
     1. Register execution error as a sensor
     2. Evaluate the rule
     3. Replace GodunovFlux with EFMFlux
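Treating an execution error as a sensor can be sketched with exception handling. The function and component stand-ins are hypothetical, and retrying the failed step on the replacement component is an assumption; the slide only states that the component is replaced.

```python
from typing import Callable, Dict, Tuple


def run_with_healing(components: Dict[str, Callable[[float], float]],
                     active: str, fallback: str, x: float) -> Tuple[float, str]:
    """Return (result, name of the component to use for later computations)."""
    try:
        return components[active](x), active
    except ArithmeticError:
        # Error sensor fired. Rule: IF <active> error THEN REPLACE <active> <fallback>.
        # Assumption: the failed step is retried on the replacement.
        return components[fallback](x), fallback


def godunov(x: float) -> float:
    return 1.0 / x    # stand-in that fails for x == 0


def efm(x: float) -> float:
    return 0.0        # stand-in that always succeeds


components = {"GodunovFlux": godunov, "EFMFlux": efm}
ok = run_with_healing(components, "GodunovFlux", "EFMFlux", 2.0)
healed = run_with_healing(components, "GodunovFlux", "EFMFlux", 0.0)
```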

  18. Conclusion
     • The distribution, heterogeneity, and dynamism of emerging environments and applications impose new requirements on programming systems
       • To support development and execution of autonomic self-managing applications
     • The Accord programming system extends the CCA Ccaffeine framework to meet these requirements
       • Extends CCA components with component managers to form autonomic components
       • Provides a runtime infrastructure to enforce adaptation behaviors and detect/resolve runtime conflicts

  19. Additional Slides

  20. Centralized vs Decentralized Reconciliation
     • Centralized approach: one instance collects proposals from the other instances and propagates the reconciliation result
       • Convergence rate = O(n)
       • Low scalability
       • Not robust
     • Decentralized approach: each instance communicates only with its neighbors to achieve local consensus
       • Convergence rate = O(lg n)
       • High scalability
       • Robust
     • Problems to be solved
       • Local rules used by individual component instances
       • How to define neighbors
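The O(n) versus O(lg n) contrast can be illustrated by counting communication steps. This sketch assumes one particular neighbor definition (instance i exchanges with instance i XOR 2^k in round k, for a power-of-two number of instances) and takes the maximum proposed gain as the consensus value. The slide leaves the neighbor definition open, so both choices are assumptions and not the authors' scheme.

```python
from typing import List, Tuple


def centralized_steps(values: List[float]) -> Tuple[float, int]:
    """Coordinator (instance 0) receives the other n-1 proposals one at a time."""
    best = values[0]
    steps = 0
    for v in values[1:]:
        best = max(best, v)
        steps += 1
    return best, steps


def decentralized_rounds(values: List[float]) -> Tuple[List[float], int]:
    """Pairwise exchange with a hypercube neighbor in each round."""
    n = len(values)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("this sketch assumes n is a power of two")
    state = list(values)
    rounds = 0
    k = 1
    while k < n:
        # All instances exchange with their round-k neighbor at the same time.
        state = [max(state[i], state[i ^ k]) for i in range(n)]
        k <<= 1
        rounds += 1
    return state, rounds


gains = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
central_best, central_steps = centralized_steps(gains)
final_state, rounds = decentralized_rounds(gains)
```

With eight instances the coordinator handles seven messages in sequence, while the pairwise scheme needs three rounds. The count for the centralized case covers collection only, not propagating the result back.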
