
Self-Managing Federated Services


Presentation Transcript


  1. Self-Managing Federated Services Francisco Matias Cuenca-Acuna and Thu D. Nguyen Department of Computer Science, Rutgers University

  2. Federated Computing • Rising Internet connectivity is driving a new model of federated computing • Computing systems that span multiple organizations • Sharing of computing resources, data, and services • Federated computing appearing at every level • Peer-to-peer: e.g., Netnews, Gnutella, KaZaA • Business-to-business: e.g., federated web services • Scientific computing: e.g., grids (http://www.gpds.org/), PlanetLab • Federated applications are hard to build and manage because they must execute in environments that are • Inherently decentralized • Widely distributed • Widely heterogeneous • Highly volatile PANIC Lab, Rutgers University

  3. PlanetP • Infrastructure support for federated computing • Data Management • Collaborative organization of shared data • Content search, rank, and retrieval • Automatic replication for predictable availability (SRDS 2003) • Application management • Automatic (self-) management of #components and their placements • Provide a common application container • PlanetP approach • Strongly consistent protocols scale poorly in wide-area environments [Gray et al. 1996, Gupta et al. 2002] • Build on probabilistic and weakly consistent protocols • Weakly consistent shared information • Autonomous actions • Randomized algorithms • Let’s deal with system sizes of up to several thousands first PANIC Lab, Rutgers University
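
The PlanetP approach above builds on weakly consistent, gossip-based information sharing rather than strongly consistent protocols. As a rough illustration of that style only (not PlanetP's actual protocol or API; all names here are hypothetical), the sketch below shows nodes that publish entries locally and lazily spread them by merging views pairwise with randomly chosen peers.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of weakly consistent, gossip-based sharing.
// Each node keeps a local map of published entries and periodically exchanges
// its view with a randomly chosen peer; conflicting entries are merged by
// keeping the one with the newer timestamp (last-writer-wins).
public class GossipNode {
    // key -> (value, timestamp); hypothetical record format for illustration.
    static final class Entry {
        final String value;
        final long timestamp;
        Entry(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    private final Map<String, Entry> store = new ConcurrentHashMap<>();
    private final Random random = new Random();

    // Publish or update a local entry; it spreads lazily via gossip.
    public void publish(String key, String value) {
        store.put(key, new Entry(value, System.currentTimeMillis()));
    }

    // Called once per gossip interval (e.g., 1 second in the talk's case study).
    public void gossipOnce(List<GossipNode> peers) {
        if (peers.isEmpty()) return;
        GossipNode peer = peers.get(random.nextInt(peers.size()));
        peer.merge(store);       // push our view to the peer
        this.merge(peer.store);  // pull the peer's view (push-pull gossip)
    }

    // Keep the newer of the two versions for every key.
    private void merge(Map<String, Entry> remote) {
        remote.forEach((key, theirs) ->
            store.merge(key, theirs,
                (mine, other) -> mine.timestamp >= other.timestamp ? mine : other));
    }
}
```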

  4. Application Management: Goal • Reduce the deployment and management of an application to • Defining an application “fitness” model • Specify QoS targets such as service availability, max throughput • Self-management of number of components and their placements to achieve QoS targets PANIC Lab, Rutgers University

  5. Application Management [diagram: application components (App) deployed on nodes across a network] PANIC Lab, Rutgers University

  6. Our Approach: Infrastructure [diagram: application replicas (App) on nodes across the community, coordinated through the P2P Information Sharing Infrastructure (PlanetP Data Store); each node runs a management agent and a monitoring agent] • Management agent publishes: a replica is running here, current app load • Monitoring agent publishes: node description, current node load, availability PANIC Lab, Rutgers University
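
To make the two publish arrows in the diagram concrete, here is a hypothetical sketch of the records each agent might place into the shared data store. The field names and types are assumptions for illustration, not PlanetP's actual schema.

```java
// Illustrative records for what each agent publishes into the shared data store.
public final class PublishedRecords {
    // Published by the monitoring agent on every node.
    public static final class NodeDescription {
        public final String nodeId;
        public final double bogomips;        // node capacity rating
        public final double cpuIdleFraction; // current node load, expressed as idleness
        public final double availability;    // measured fraction of time the node is online
        public NodeDescription(String nodeId, double bogomips,
                               double cpuIdleFraction, double availability) {
            this.nodeId = nodeId;
            this.bogomips = bogomips;
            this.cpuIdleFraction = cpuIdleFraction;
            this.availability = availability;
        }
    }

    // Published by the management agent wherever a replica is running.
    public static final class ReplicaInfo {
        public final String appName;
        public final String nodeId;       // "a replica is running here"
        public final double currentLoad;  // current application load on this replica
        public ReplicaInfo(String appName, String nodeId, double currentLoad) {
            this.appName = appName;
            this.nodeId = nodeId;
            this.currentLoad = currentLoad;
        }
    }
}
```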

  7. Our Approach: Algorithm • Periodically, each management agent autonomously searches for a “more fit” configuration • Configuration: #components and their placements • Genetic algorithm (directed random search) • If a configuration is found that improves on the current one by more than a threshold value, deploy it • However, wait for a random delay before deploying the new configuration, to avoid collisions • Publish the new configuration after it has been successfully deployed PANIC Lab, Rutgers University
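
A minimal sketch of the per-agent loop just described, under a few assumptions: Configuration, fitness(), deploy(), and publish() are placeholder interfaces, and the genetic algorithm is abstracted behind searchForBetterConfiguration(); none of these names come from the actual PlanetP code.

```java
import java.util.Random;

// Sketch of the management agent's periodic evaluation step. Each agent acts
// autonomously on its local (weakly consistent) view of the community.
public class ManagementAgent {
    interface Configuration { }                          // #components + their placements
    interface AppModel { double fitness(Configuration c); }

    private final AppModel model;
    private final double improvementThreshold;           // required fitness gain before acting
    private final long maxRandomDelayMs;                  // T from the next slide
    private final Random random = new Random();
    private Configuration current;

    ManagementAgent(AppModel model, Configuration initial,
                    double improvementThreshold, long maxRandomDelayMs) {
        this.model = model;
        this.current = initial;
        this.improvementThreshold = improvementThreshold;
        this.maxRandomDelayMs = maxRandomDelayMs;
    }

    // Invoked once per evaluation period.
    void evaluatePeriodically() throws InterruptedException {
        Configuration candidate = searchForBetterConfiguration();   // directed random search
        if (candidate == null) return;

        double gain = model.fitness(candidate) - model.fitness(current);
        if (gain <= improvementThreshold) return;        // improvement too small to act on

        // Wait a random delay before deploying, to avoid colliding with peers
        // that may have found an improvement at the same time.
        Thread.sleep((long) (random.nextDouble() * maxRandomDelayMs));

        // If another agent's new configuration arrived while we slept, back off.
        if (newerConfigurationObserved()) return;

        deploy(candidate);                                // start/stop/move replicas
        publish(candidate);                               // only after successful deployment
        current = candidate;
    }

    // Placeholders for the pieces the talk leaves to PlanetP and the GA.
    private Configuration searchForBetterConfiguration() { return null; }
    private boolean newerConfigurationObserved() { return false; }
    private void deploy(Configuration c) { }
    private void publish(Configuration c) { }
}
```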

  8. Example Application Model • Fitness is defined as a function f(capacity, #replicas) • Expected capacity of a configuration is computed using node availability and CPU idleness • Function designed to achieve • Sufficient capacity to meet current load • Minimize number of replicas to minimize replication overheads • Spare capacity to gracefully handle load spikes • Two QoS parameters • Availability • Maximum load PANIC Lab, Rutgers University
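
The exact fitness function from the paper is not reproduced here; the sketch below only illustrates the ingredients the slide lists: expected capacity weighted by node availability and CPU idleness, a capacity target with spare headroom, and a penalty on the number of replicas. The spare factor and penalty weight are assumed tuning knobs, not values from the talk.

```java
import java.util.List;

// Illustrative fitness computation in the spirit of f(capacity, #replicas).
public class UddiFitnessModel {
    static final class Node {
        final double bogomips, availability, cpuIdle;   // reported by the monitoring agents
        Node(double bogomips, double availability, double cpuIdle) {
            this.bogomips = bogomips; this.availability = availability; this.cpuIdle = cpuIdle;
        }
    }

    private final double targetCapacity;   // QoS: maximum load to support (in bogomips)
    private final double spareFactor;      // e.g., 1.2 = 20% headroom for load spikes (assumed)
    private final double replicaPenalty;   // per-replica cost to discourage over-replication (assumed)

    UddiFitnessModel(double targetCapacity, double spareFactor, double replicaPenalty) {
        this.targetCapacity = targetCapacity;
        this.spareFactor = spareFactor;
        this.replicaPenalty = replicaPenalty;
    }

    // Expected capacity of a configuration: each replica contributes its node's
    // capacity scaled by that node's availability and CPU idleness.
    double expectedCapacity(List<Node> replicas) {
        return replicas.stream()
                .mapToDouble(n -> n.bogomips * n.availability * n.cpuIdle)
                .sum();
    }

    // Higher is better: approaches 1 as capacity reaches the target (plus spare),
    // then each additional replica subtracts a small penalty.
    double fitness(List<Node> replicas) {
        double capacity = expectedCapacity(replicas);
        double coverage = Math.min(1.0, capacity / (targetCapacity * spareFactor));
        return coverage - replicaPenalty * replicas.size();
    }
}
```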

  9. Random Delay [timeline: deployment time, then V = probabilistic bound on inconsistency in shared data; the random delay is drawn from 0 to T] • Choose T to achieve a target probability of collision • Case study: • 200 nodes • Gossiping interval = 1 sec • Deployment time small • V = 8 secs • Pcollision = 0.1 PANIC Lab, Rutgers University
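
The talk chooses T analytically from the gossip propagation bound; the Monte Carlo sketch below only illustrates the same trade-off. It assumes n contending agents discover the same improvement simultaneously, each waits a uniform delay in [0, T], a deployment becomes visible to everyone within V seconds, and deployment time is negligible; the number of contenders used here is an assumption, not a figure from the talk.

```java
import java.util.Random;

// Estimate the probability that a second agent deploys before it can observe the
// first deployment (i.e., its delay expires within V seconds of the earliest delay).
public class RandomDelayModel {
    static double estimateCollisionProbability(int n, double T, double V, int trials) {
        Random random = new Random();
        int collisions = 0;
        for (int t = 0; t < trials; t++) {
            double first = Double.MAX_VALUE, second = Double.MAX_VALUE;
            for (int i = 0; i < n; i++) {
                double delay = random.nextDouble() * T;   // uniform delay in [0, T]
                if (delay < first) { second = first; first = delay; }
                else if (delay < second) { second = delay; }
            }
            if (second < first + V) collisions++;         // second deployer didn't hear in time
        }
        return (double) collisions / trials;
    }

    public static void main(String[] args) {
        // V = 8 s as in the slide's case study; scan T for an assumed 5 contenders.
        for (double T = 50; T <= 800; T += 50) {
            double p = estimateCollisionProbability(5, T, 8.0, 100_000);
            System.out.printf("T = %4.0f s  ->  estimated Pcollision = %.3f%n", T, p);
        }
    }
}
```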

  10. Evaluation: Environments • Study responsiveness and stability in three environments • Unloaded cluster • 44 PCs interconnected with 100 Mb/s & 1 Gb/s Ethernet • 22 2GHz P4 PCs rated at 4000 bogomips • 22 2.8GHz Pentium Xeon with hyper-threading PCs rated at ~11200 bogomips • Same cluster loaded by other researchers (large simulation jobs) • PlanetLab • Federated system for distributed systems and networking research • Widely heterogeneous x86 PC nodes: 800 – 5600 bogomips PANIC Lab, Rutgers University

  11. Evaluation: UDDI Service [diagram: a PlanetP community spanning Site 1 – Site 4; each node runs the jUDDI registry over HSQLDB/R in the Tomcat web server on a Java runtime, alongside the PlanetP manager agent and monitoring agent] • Base gossiping interval = 1 sec PANIC Lab, Rutgers University

  12. Evaluation: UDDI Service • Populated UDDI service with data from Xmethods • Web site listing publicly available web services • 400 services → 3MB registry database • Clients issue random findBusiness queries • Fitness function for UDDI service set to: • Max desired CPU resources: 60,000 bogomips • ~20 P4 PCs • Max of 4 to 20 replicas PANIC Lab, Rutgers University
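
Below is a hypothetical encoding of the QoS targets listed on this slide; the class name, constant names, and the interpretation of "4 to 20 replicas" as hard bounds on the replica count are assumptions for illustration only.

```java
// QoS targets for the UDDI evaluation, as stated on the slide.
public class UddiQosTargets {
    static final double MAX_DESIRED_CAPACITY_BOGOMIPS = 60_000;  // roughly 20 P4-class PCs
    static final int MIN_REPLICAS = 4;
    static final int MAX_REPLICAS = 20;

    // Candidate configurations outside the replica bounds are rejected before
    // their fitness is evaluated (an assumed way of enforcing the bounds).
    static boolean withinReplicaBounds(int replicas) {
        return replicas >= MIN_REPLICAS && replicas <= MAX_REPLICAS;
    }

    public static void main(String[] args) {
        System.out.println("3 replicas allowed?  " + withinReplicaBounds(3));   // false
        System.out.println("12 replicas allowed? " + withinReplicaBounds(12));  // true
    }
}
```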

  13. Cluster: No. Replicas vs. Load PANIC Lab, Rutgers University

  14. Cluster: Acquired Capacity vs. Load PANIC Lab, Rutgers University

  15. Cluster: Throughput vs. Load PANIC Lab, Rutgers University

  16. PlanetLab: No. Replicas vs. Load PANIC Lab, Rutgers University

  17. PlanetLab: Acquired Capacity vs. Load PANIC Lab, Rutgers University

  18. PlanetLab: Throughput vs. Load PANIC Lab, Rutgers University

  19. Summary and Future Work • Described a decentralized management framework for federated applications • Autonomous actions based on weakly consistent shared data for robustness and scalability • Probabilistic serialization to avoid collisions • Responsive and stable when managing an example UDDI service in 3 test-beds • More sophisticated application models? • Predictive model for resources • Account for the cost of changing application configurations • Applications with multiple types of components PANIC Lab, Rutgers University
