Monitoring, Configuration and Resource Management of Service Workflows in Virtualized Clusters and Clouds

Yi Wei. Advisors: Prof. M. Brian Blake and Prof. Greg Madey. April 11, 2013. Outline: Introduction and Motivations; Background and Definitions; Technical Approaches and Evaluations.

Presentation Transcript


  1. Monitoring, Configuration and Resource Management of Service Workflows in Virtualized Clusters and Clouds. Yi Wei. Advisors: Prof. M. Brian Blake and Prof. Greg Madey. April 11, 2013

  2. Outline • Introduction and Motivations • Background and Definitions • Technical Approaches and Evaluations (Workflow Configuration; Service Monitoring; Resource Management; Cross-workflow Coordination) • Conclusion and Future Work

  3. The Cloud, Illustrated [Image from cloudtweaks.com]

  4. Introduction: Managing Service Workflows in Cloud Environments [Figure: long-standing workflows running on VMs spread across Clouds A, B, and C, which supply cloud-oriented resources]

  5. Real-World Distributed Service Workflows: A workflow for detecting tumor types through microarray analysis (part of the CancerGrid environment).

  6. Evolution of the Topic • Integrate Service-oriented Computing (SoC) and Cloud Computing by means of an improved service lifecycle model, a semi-automated software framework, and its accompanying discovery and management methods and tools • SoC → service workflows • Service modeling, discovery and management → management of service workflows • Quantitative evaluations

  7. A More Focused Thesis: Deployment and management of service workflows across virtualized clusters and clouds. Relevance: • New requirements from hosting complex applications in clouds • The infinite resource pool is only an illusion • Benefit of separation of concerns

  8. Major Challenge in Resource Allocation • How to differentiate services in a workflow and determine the amount of resources allocated to each service? • Existing approach: SLA- or QoS-based allocation (Van et al., Zhu et al.) • My approach: infer service ranks from the workflow structure and use these values to guide resource allocation [Figure: workflow of services S1 to S4 assigned VMs of sizes S, M, and L]
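
The slide does not say how ranks are inferred from the workflow structure. Purely as an illustration, the Python sketch below assumes a service's rank is 1 plus the number of services downstream of it in the DAG, and maps ranks onto the S/M/L sizes with hypothetical one-third thresholds. `service_ranks` and `size_for` are illustrative names, not the author's method.

```python
def service_ranks(edges):
    """Hypothetical rank: 1 + number of services reachable downstream in the DAG.

    edges: iterable of (upstream, downstream) service-name pairs.
    """
    successors = {}
    nodes = set()
    for up, down in edges:
        successors.setdefault(up, set()).add(down)
        nodes.update((up, down))

    def reachable(node):
        # Iterative DFS collecting every service that transitively consumes node's output
        seen, stack = set(), [node]
        while stack:
            for nxt in successors.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    return {n: 1 + len(reachable(n)) for n in nodes}


def size_for(rank, max_rank):
    """Map a rank to a VM size tier using assumed one-third thresholds."""
    ratio = rank / max_rank
    if ratio > 2 / 3:
        return "L"
    if ratio > 1 / 3:
        return "M"
    return "S"


# Diamond-shaped workflow: S1 feeds S2 and S3, which both feed S4
ranks = service_ranks([("S1", "S2"), ("S1", "S3"), ("S2", "S4"), ("S3", "S4")])
top = max(ranks.values())
sizes = {s: size_for(r, top) for s, r in sorted(ranks.items())}
print(sizes)  # {'S1': 'L', 'S2': 'M', 'S3': 'M', 'S4': 'S'}
```

Counting downstream services favors entry points whose delays propagate furthest; critical-path length or fan-in would be equally plausible choices.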

  9. Major Challenge in Resource Management • How to fulfill the dynamic resource requirements of services in the workflow? • Existing approach: reactive or policy-based scaling (Amazon Cloud) • My approach: proactive scaling, using idle resources from underutilized services to satisfy the needs of busy services, plus cross-workflow coordination [Figure: VM sizes assigned to services S1 and S2 changing across time steps T0 to T3]

  10. Overall Novelty of the Dissertation • Considers service workflows instead of individual services • Provides an end-to-end solution for workflow deployment and management • Deploys and manages workflows from a resource perspective • Offers mechanisms for cross-workflow coordination

  11. Contributions Overview: A model, algorithms, and a generic, agent-based framework to monitor, configure, and allocate service workflows across virtualized resources in a cloud. • A conceptual framework for managing service workflows • An algorithm to deploy workflows onto a virtualized resource pool • An algorithm to adaptively monitor deployed services • An algorithm to dynamically manage running workflows • An algorithm to realize cross-workflow coordination

  12. Background & Definitions

  13. Target: Services and Service Workflows • A service is a self-contained software artifact equipped with standardized APIs, designed to perform a specific task. • A service workflow is a series of interdependent and loosely coupled services. Its goal is to accomplish complex business logic or scientific processes.

  14. Environment: Virtualized Clusters and Clouds. A highly virtualized resource pool that is managed on behalf of users and provides on-demand provisioning based on user requests and resource availability. [Figure: user requests served by a virtualized resource pool in which physical machines PM1, PM2, PM3, … host VMs V1 to V8]

  15. Virtualized Resources • A portion of the hosting physical machine’s capacity • Multiple predefined configurations (different numbers of CPUs, RAM sizes, etc.) • Support for on-demand creation and termination [Figure: PMs partitioned into VMs of different sizes, such as XLarge, Large, and Medium]

  16. Operations: Deployment and Management • Deploy new workflows • Monitor deployed services • Manage resources for deployed workflows [Figure: workflow of services S1 to S4 with VM sizes L, S, and M]

  17. Agent-based Management Framework

  18. Why Agents? • Distributed and autonomous nature • Exhibit self-organization and self-steering • Model the behaviors of entities with different (possibly conflicting) interests, and their interactions with the environment and with one another

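  19. The Management Framework [Figure: four-tier agent hierarchy] • Cloud tier: Cloud Management Agent • Workflow tier: Workflow Management Agent, which sends resource adjustment requests to the Cloud Management Agent and receives decisions • Service tier: Service Management Agent, which sends resource adjustment requests to the Workflow Management Agent and receives decisions • Instance/VM tier: Instance Monitoring Agent, which receives monitoring requests from the Service Management Agent and returns status information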
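
To make the request/decision flow concrete, here is a minimal Python sketch of the three management tiers. It assumes capacity is a single number and that a workflow agent first serves requests from capacity its own services have released, escalating only the remainder to the cloud agent; this is one reading of the slides, not the author's code. The Instance Monitoring Agent is omitted, and all class and method names are hypothetical.

```python
class CloudManagementAgent:
    """Grants capacity from the cloud's free pool (capacity in abstract units)."""

    def __init__(self, free_capacity):
        self.free = free_capacity

    def request(self, amount):
        granted = min(amount, self.free)
        self.free -= granted
        return granted


class WorkflowManagementAgent:
    """Serves requests from capacity released by its own services before asking the cloud."""

    def __init__(self, cloud):
        self.cloud = cloud
        self.spare = 0

    def release(self, amount):
        self.spare += amount

    def request(self, amount):
        internal = min(amount, self.spare)
        self.spare -= internal
        external = self.cloud.request(amount - internal) if amount > internal else 0
        return internal + external


class ServiceManagementAgent:
    """Turns a desired capacity change into requests or releases to its workflow agent."""

    def __init__(self, workflow, capacity):
        self.workflow = workflow
        self.capacity = capacity

    def adjust(self, delta):
        if delta > 0:
            self.capacity += self.workflow.request(delta)
        elif delta < 0:
            released = min(-delta, self.capacity)
            self.workflow.release(released)
            self.capacity -= released


cloud = CloudManagementAgent(free_capacity=10)
wf = WorkflowManagementAgent(cloud)
s1, s2 = ServiceManagementAgent(wf, 8), ServiceManagementAgent(wf, 4)
s1.adjust(-3)  # S1 is underutilized and hands 3 units back to the workflow agent
s2.adjust(5)   # S2 gets 3 units internally and 2 from the cloud
print(s1.capacity, s2.capacity, cloud.free)  # 5 9 8
```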

  20. Workflow Configuration

  21. Workflow Deployment: The goal is to produce a mapping plan from services to VMs, and then from VMs to PMs. [Figure: services S1 and S2 of a workflow mapped to VMs of sizes L and M, which are placed on PMs P1, P2, and P3]

  22. Considerations and Constraints • Budget of the workflow • Service priority • Capacity of each PM at deployment time • Predicted capacity of each PM

  23. Configuration Algorithm Steps • Service sorting and ranking • PM sorting based on predicted and current free capacities • Expendable budget calculation and VM size search • Residual budget distribution • VM-to-PM mapping (see also: Complete Algorithm; VM Splitting)
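
The five steps can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the dissertation's algorithm: the VM catalogue and prices in `SIZES` are hypothetical, each service's budget share is assumed proportional to its rank, a PM's usable capacity is taken as the minimum of its current and predicted free capacity, and VM splitting is not implemented (the sketch raises an error where splitting would be needed).

```python
# Hypothetical VM catalogue: (name, capacity units, price per hour), smallest first
SIZES = [("S", 1, 1.0), ("M", 2, 2.0), ("L", 4, 4.0), ("XL", 8, 8.0)]


def configure(ranks, budget, pm_free):
    """Sketch of the five configuration steps.

    ranks: service -> rank; budget: total price per hour;
    pm_free: PM -> (current free capacity, predicted free capacity).
    Returns service -> (VM size name, PM).
    """
    # Step 1: sort services by rank, highest first (name breaks ties)
    services = sorted(ranks, key=lambda s: (-ranks[s], s))
    total_rank = sum(ranks.values())
    # Step 2: a PM's usable capacity is the smaller of its current and predicted free capacity
    free = {pm: min(cur, pred) for pm, (cur, pred) in pm_free.items()}
    # Step 3: rank-proportional budget share; largest affordable size (smallest if none fits)
    level = {}
    for s in services:
        share = budget * ranks[s] / total_rank
        level[s] = max((i for i, (_, _, price) in enumerate(SIZES) if price <= share), default=0)
    spent = sum(SIZES[level[s]][2] for s in services)
    # Step 4: spend the residual budget upgrading services one size at a time, by rank
    upgraded = True
    while upgraded:
        upgraded = False
        for s in services:
            i = level[s]
            if i + 1 < len(SIZES) and spent + SIZES[i + 1][2] - SIZES[i][2] <= budget:
                spent += SIZES[i + 1][2] - SIZES[i][2]
                level[s] = i + 1
                upgraded = True
    # Step 5: place each VM on the PM with the most remaining usable capacity
    plan = {}
    for s in services:
        name, cap, _ = SIZES[level[s]]
        pm = max(free, key=free.get)
        if free[pm] < cap:
            raise ValueError(f"no PM can host a {name} VM for {s}")  # VM splitting would go here
        free[pm] -= cap
        plan[s] = (name, pm)
    return plan


plan = configure({"S1": 4, "S2": 2, "S3": 2, "S4": 1}, budget=9.0,
                 pm_free={"P1": (8, 6), "P2": (5, 5), "P3": (4, 7)})
print(plan)
```

Taking the minimum of current and predicted free capacity is a conservative way to honor both PM constraints listed on the previous slide.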

  24. Experiment Configuration • Simulation on traces from real-world clusters • Randomly generated DAG workflows as input • Number of overloaded PMs as the performance metric • FirstFit and BestFit as baseline comparisons
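
For reference, the two baselines are the classic bin-packing heuristics. The Python sketch below treats VM and PM capacities as single numbers (an assumption; the experiments may use multi-dimensional capacities) and counts a PM as overloaded when its actual, trace-driven demand exceeds its capacity, which is one plausible reading of the metric. Function names are illustrative.

```python
def first_fit(vm_sizes, pm_capacities):
    """Place each VM on the first PM with enough free capacity (None if none fits)."""
    free = list(pm_capacities)
    placement = []
    for vm in vm_sizes:
        for i, f in enumerate(free):
            if vm <= f:
                free[i] -= vm
                placement.append(i)
                break
        else:
            placement.append(None)
    return placement


def best_fit(vm_sizes, pm_capacities):
    """Place each VM on the PM that would be left with the least free capacity."""
    free = list(pm_capacities)
    placement = []
    for vm in vm_sizes:
        fits = [i for i, f in enumerate(free) if vm <= f]
        if fits:
            i = min(fits, key=lambda j: free[j] - vm)
            free[i] -= vm
            placement.append(i)
        else:
            placement.append(None)
    return placement


def overloaded_pms(placement, actual_demand, pm_capacities):
    """Count PMs whose actual (e.g. trace-driven) load exceeds their capacity."""
    load = [0] * len(pm_capacities)
    for pm, demand in zip(placement, actual_demand):
        if pm is not None:
            load[pm] += demand
    return sum(l > c for l, c in zip(load, pm_capacities))


caps = [4, 3]
print(first_fit([2, 3], caps))  # [0, 1]
print(best_fit([2, 3], caps))   # [1, 0]
```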

  25. Algorithm Comparison (see also: Additional Results)

  26. Service Monitoring

  27. Motivation: Different services have different degrees of availability, so it is unnecessary to check them all at the same interval.

  28. The Check Period Relaxation (CPR) Algorithm • Inspired by the congestion control mechanism in the TCP protocol • A successful status check doubles the next check interval (up to a limit) • A failed status check halves the next check interval
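
A minimal Python sketch of the basic CPR update rule follows. It assumes abstract time units, an initial period of 1, and hypothetical floor and ceiling values (the slides give no specific limits), and it omits the fast/cautious relax states of the automaton on the next slide.

```python
class CheckPeriodRelaxation:
    """Minimal CPR sketch: double the check period after a success, halve it after a failure."""

    def __init__(self, initial=1.0, min_period=1.0, max_period=64.0):
        self.period = initial
        self.min_period = min_period  # assumed floor
        self.max_period = max_period  # assumed ceiling

    def update(self, check_succeeded):
        if check_succeeded:
            # Relax: check less often, up to the ceiling
            self.period = min(self.period * 2, self.max_period)
        else:
            # Tighten: check more often, down to the floor
            self.period = max(self.period / 2, self.min_period)
        return self.period


cpr = CheckPeriodRelaxation()
periods = [cpr.update(ok) for ok in [True, True, True, False, True]]
print(periods)  # [2.0, 4.0, 8.0, 4.0, 8.0]
```

A highly available service quickly reaches long check periods and so generates few monitoring messages, which is the saving CPR aims for.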

  29. Finite State Automaton for CPR [Figure: automaton with states FR, CR, INV, and FAIL; transition labels: Chk=S & CP<FRL; Chk=F | (Chk=S & CP>=FRL); Chk=S | (Chk=F & FailCount<3); Chk=S & SuCount=3; Chk=F & FailCount=3; FinalChk=S; FinalChk=F] Legend: Chk: check result; S: successful; F: failed; SuCount: successful check count; FailCount: failed check count; FinalChk: final check; CP: check period; FRL: Fast Relax Limit; FR: Fast Relax state; CR: Cautious Relax state; INV: Inactive state

  30. Message Count Comparison

  31. Observations • CPR can reduce the message count across different availability levels. • Its performance is weaker at availability levels between 80% and 90%.

  32. Two Separate Modifications • Allow one additional failure before reducing the check period to filter out transient errors (CPR_2e) • Add an additional state to filter out unstable services (M_CPR)
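
The CPR_2e idea can be sketched by adding a consecutive-failure counter to the basic rule. In this hypothetical Python sketch, a single isolated failure leaves the period unchanged and only a second consecutive failure halves it. Treating the tolerated failure as "consecutive" is an assumption, as are the floor and ceiling values; M_CPR's extra state is not modeled.

```python
class CheckPeriodRelaxation2e:
    """CPR_2e-style sketch: tolerate one extra failure before halving the check period."""

    def __init__(self, initial=1.0, min_period=1.0, max_period=64.0, tolerated_failures=1):
        self.period = initial
        self.min_period = min_period
        self.max_period = max_period
        self.tolerated_failures = tolerated_failures
        self.consecutive_failures = 0

    def update(self, check_succeeded):
        if check_succeeded:
            self.consecutive_failures = 0
            self.period = min(self.period * 2, self.max_period)
        else:
            self.consecutive_failures += 1
            # A single transient failure leaves the period unchanged
            if self.consecutive_failures > self.tolerated_failures:
                self.period = max(self.period / 2, self.min_period)
        return self.period


cpr = CheckPeriodRelaxation2e(initial=8.0)
result = [cpr.update(ok) for ok in [False, True, False, False]]
print(result)  # [8.0, 16.0, 16.0, 8.0]
```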

  33. New Results (see also: Additional Results)

  34. Workflow Resource Management

  35. The Problem: Dynamically manage (allocate or release) the resources of a workflow so that the load levels of its component services stay within a specified range.

  36. Limitation of Existing Approaches: State-of-the-art approaches usually rely on reactive scaling or predefined rules, which can be inflexible and inefficient in many situations.

  37. Resource Reallocation: Use idle resources of underutilized services to meet the needs of busy services. [Figure: services S1 and S2 of a workflow with VMs of sizes L and M placed on PMs P1, P2, and P3]
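
The reallocation idea can be sketched in Python under simplifying assumptions: capacities are scalars, a service counts as underutilized below a lower load bound `low` and busy above an upper bound `high` (the 0.3/0.8 defaults are invented), and a busy service's unmet need becomes a cloud-level request. VM placement on PMs and billing are ignored; `reallocate` is a hypothetical name.

```python
def reallocate(services, low=0.3, high=0.8):
    """Move idle VMs from underutilized services to busy ones within one workflow.

    services: name -> {"demand": float, "vms": list of VM capacities}; each service
    is assumed to keep at least one VM. Utilization = demand / total VM capacity.
    Returns (VMs left in the shared pool, extra capacity to request from the cloud).
    """
    pool = []
    for s in services.values():
        s["vms"].sort()
        # Underutilized: hand over the smallest VM while utilization without it stays <= high
        while (len(s["vms"]) > 1
               and s["demand"] / sum(s["vms"]) < low
               and s["demand"] / (sum(s["vms"]) - s["vms"][0]) <= high):
            pool.append(s["vms"].pop(0))
    pool.sort(reverse=True)  # hand out the largest idle VMs first
    cloud_requests = {}
    for name, s in services.items():
        # Busy: take pooled VMs until utilization is back under `high`
        while s["demand"] / sum(s["vms"]) > high:
            if pool:
                s["vms"].append(pool.pop(0))
            else:
                cloud_requests[name] = s["demand"] / high - sum(s["vms"])
                break
    return pool, cloud_requests


services = {"A": {"demand": 1.0, "vms": [2, 2, 4]}, "B": {"demand": 7.0, "vms": [4]}}
pool, cloud_requests = reallocate(services)
print(services["A"]["vms"], services["B"]["vms"], pool, cloud_requests)
```

Here service A gives two small VMs to B, and only B's remaining shortfall is escalated to the cloud.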

  38. Billing and Management Cycles [Figure: timeline comparing a billing cycle with management cycles m1 to m4. Events: V1 is started and allocated to A; A releases V1 to the workflow agent; B requests a VM; V1 is allocated to B; B releases V1]
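
One reason to hand a released VM to the workflow agent rather than terminate it is that, when billing is per whole cycle, the rest of the current cycle is already paid for. The Python sketch below illustrates one possible keep-or-terminate rule; the 60-minute billing cycle, the 15-minute management cycle, and the rule itself are assumptions for illustration only.

```python
def should_terminate(started_at, now, billing_cycle=60, management_cycle=15):
    """Decide whether to terminate an idle VM now or keep it for reuse.

    Assumes VMs are billed in whole billing cycles counted from `started_at`
    (all times in minutes). An idle VM is kept while it may still be handed to
    another service in this already-paid cycle, and terminated only when the
    cycle will end before the next management cycle.
    """
    elapsed_in_cycle = (now - started_at) % billing_cycle
    remaining = billing_cycle - elapsed_in_cycle
    return remaining <= management_cycle


print(should_terminate(started_at=0, now=20))  # False: 40 paid minutes remain
print(should_terminate(started_at=0, now=50))  # True: only 10 minutes remain
```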

  39. Management Algorithm Overview • Service level prediction and decision making • Workflow level matching • Cloud level allocation

  40. Algorithm Process [Flowchart] • Service level: Load Prediction → Resource Adjustment Calculation • Workflow level: Resource Request Processing → Internal VM Mapping and Request Forwarding • Cloud level: Resource Assignment
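
The service-level half of this pipeline (load prediction followed by resource adjustment calculation) might look like the Python sketch below. The moving-average predictor, the [0.3, 0.8] load range, and the choice to aim for the middle of the range are all assumptions; the slides do not specify the predictor or the adjustment rule.

```python
def predict_load(history, window=3):
    """Predict next-step load as the mean of the last `window` observations (assumed predictor)."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def capacity_adjustment(predicted_load, capacity, low=0.3, high=0.8):
    """Return the capacity change (positive = request, negative = release).

    If the predicted utilization leaves [low, high], aim for the middle of the range.
    """
    utilization = predicted_load / capacity
    if low <= utilization <= high:
        return 0.0
    target_capacity = predicted_load / ((low + high) / 2)
    return target_capacity - capacity


load = predict_load([4.0, 5.0, 6.0])           # 5.0
print(capacity_adjustment(load, capacity=5))   # positive: scale up before overload
print(capacity_adjustment(load, capacity=40))  # negative: release idle capacity
print(capacity_adjustment(load, capacity=10))  # 0.0: utilization 0.5 is in range
```

Acting on the predicted rather than the current load is what makes the scaling proactive; the positive or negative result would feed the workflow-level request processing.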

  41. Synthetic Data Generation • Four simple generators • Complex patterns are linear combinations of their component generators • The generated data represents the required capacity
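
The slide does not name the four generators. Assuming, for illustration only, a constant level, a linear trend, a sinusoid, and uniform noise, the Python sketch below combines weighted generators into a required-capacity trace, clipped at zero because capacity cannot be negative. All function names are hypothetical.

```python
import math
import random


# Four hypothetical simple generators; each maps a time step t to a value
def constant(level):
    return lambda t: level


def trend(slope):
    return lambda t: slope * t


def periodic(amplitude, period):
    return lambda t: amplitude * math.sin(2 * math.pi * t / period)


def noise(scale, seed=0):
    rng = random.Random(seed)  # seeded so traces are reproducible
    return lambda t: rng.uniform(-scale, scale)


def required_capacity(components, steps):
    """Linear combination of (weight, generator) pairs, clipped at zero."""
    return [max(0.0, sum(w * g(t) for w, g in components)) for t in range(steps)]


trace = required_capacity([(1.0, constant(10)), (0.5, trend(2)), (1.0, periodic(3, 24)),
                           (1.0, noise(0.5))], steps=48)
print([round(v, 2) for v in trace[:5]])
```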

  42. Workflow Level Comparison (see also: Additional Results)

  43. Average VM Lifespan Comparison

  44. VM Creation/Termination Comparison

  45. Allocation Request Fulfillment Composition

  46. Side Effect: Average VM Size Shrinkage [Figure: total capacity is 242 at step 401 and 273 at step 801]

  47. VM Merge Mechanism • Triggered when the load level is stable • One merge per request • Merge requests have lower priority than allocation requests
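
The three merge rules can be sketched in Python as shown below. The sketch assumes that a merge combines a service's two smallest VMs, that the load counts as stable when recent loads stay within a hypothetical 10% of their mean, and that allocation and merge requests share one priority queue. All names and thresholds are illustrative.

```python
import heapq
import itertools

ALLOCATE, MERGE = 0, 1  # smaller value = served first


class RequestQueue:
    """Priority queue in which allocation requests always precede merge requests."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # FIFO tie-breaker within a priority level

    def submit(self, kind, service, payload):
        heapq.heappush(self._heap, (kind, next(self._order), service, payload))

    def pop(self):
        kind, _, service, payload = heapq.heappop(self._heap)
        return kind, service, payload


def merge_candidate(vms, recent_loads, tolerance=0.1):
    """Propose merging the two smallest VMs if recent loads stay within tolerance of their mean."""
    if len(vms) < 2 or not recent_loads:
        return None
    mean = sum(recent_loads) / len(recent_loads)
    if any(abs(x - mean) > tolerance * mean for x in recent_loads):
        return None  # load not stable: no merge
    smallest, second = sorted(vms)[:2]
    return (smallest, second)  # a single merge per request


queue = RequestQueue()
queue.submit(MERGE, "S1", merge_candidate([2, 1, 1], [5.0, 5.2, 4.9]))
queue.submit(ALLOCATE, "S2", 2)
print(queue.pop())  # (0, 'S2', 2): the allocation is served first
```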

  48. Evaluation of the Merge Mechanism

  49. Cross-Workflow Coordination

  50. Motivations • Same service is used in multiple workflows • Different workflows have different load patterns • For the same service, resources from one instance group can be used to serve another instance group
