On the Placement of Web Server Replicas

On the Placement of Web Server Replicas. Lili Qiu, Microsoft Research Venkata N. Padmanabhan, Microsoft Research Geoffrey M. Voelker, UCSD IEEE INFOCOM’2001, Anchorage, AK, April 2001. Outline. Overview Related work Our approach Simulation methodology & results Summary. Motivation.

Presentation Transcript


  1. On the Placement of Web Server Replicas Lili Qiu, Microsoft Research Venkata N. Padmanabhan, Microsoft Research Geoffrey M. Voelker, UCSD IEEE INFOCOM’2001, Anchorage, AK, April 2001

  2. Outline • Overview • Related work • Our approach • Simulation methodology & results • Summary

  3. Motivation • Growing interest in Web server replicas • Exponential growth in Web usage • Content providers want to offer better service at lower cost • Solution: replication • Forms of Web server replicas • Mirror sites • Content Distribution Networks (CDNs) • CDN: a network of servers • Examples: Akamai, Digital Island [Figure: Internet with replicas between content providers and clients]

  4. Placement of Web Server Replicas • Problem specification • Among a set of N potential sites, pick K sites as replicas to minimize users’ latency or bandwidth usage [Figure: Internet, content providers, clients]

  5. Related Work • Placement of Web proxies [LGI+99] • Cache location [KRS00] • Placement of Internet instrumentation [JJJ+00]

  6. Our Approach • Model Internet as a graph • Parameterize the graph using measured inputs • # requests generated from each region • Distance between different regions • Map the placement problem onto a graph optimization problem • Assumption: • Each client uses a single replica that is closest to it • Solve graph optimization problem • Using various approximation algorithms

  7. Minimum K-median Problem • Given a complete graph $G = (V, E)$, $d(j)$, $c(i,j)$ • $d(j)$: number of requests from node $j$ • $c(i,j)$: distance between nodes $i$ and $j$ • Latency • or hop counts • or other metric to be optimized • Find a subset $V' \subseteq V$ with $|V'| = K$ that minimizes $\sum_{v \in V} \min_{w \in V'} d(v)\, c(v,w)$ • NP-hard problem [Figure: example graph with weighted edges]
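
The objective on this slide can be evaluated directly for any candidate replica set. The following is a minimal sketch, assuming demand is stored as a list and distances as a nested list; the function name `placement_cost` and the toy numbers are illustrative and do not come from the slides.

```python
def placement_cost(demand, dist, replicas):
    """Sum over all nodes v of d(v) * min over replicas w of c(v, w)."""
    if not replicas:
        raise ValueError("at least one replica is required")
    return sum(
        demand[v] * min(dist[v][w] for w in replicas)
        for v in range(len(demand))
    )


# Toy 4-node instance (made-up numbers): symmetric distances, zero diagonal.
demand = [10, 1, 1, 5]
dist = [
    [0, 2, 4, 7],
    [2, 0, 3, 6],
    [4, 3, 0, 2],
    [7, 6, 2, 0],
]

# Each node is served by its closest replica, as the slides assume.
cost_two_replicas = placement_cost(demand, dist, replicas=[0, 3])
cost_one_replica = placement_cost(demand, dist, replicas=[0])
```

Because the problem is NP-hard, trying every size-K subset with this function is only practical for very small graphs.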

  8. Placement Algorithms • Tree-based algorithm [LGI+99] • Assumes the underlying topology is a tree, and formulates placement as a dynamic programming problem • $O(N^3 M^2)$ for choosing M replicas among N potential places • Random • Pick the best among several random assignments • Hot spot • Place replicas near the clients that generate the largest load
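
The two simplest heuristics on this slide can be sketched in a few lines. This is an illustrative sketch only: the function names, the number of random trials, and the toy data are assumptions, and the hot-spot variant here simply ranks nodes by their own demand, whereas the slide speaks more generally of placing replicas "near" heavy clients.

```python
import random


def placement_cost(demand, dist, replicas):
    """K-median objective: each node is served by its closest replica."""
    return sum(
        demand[v] * min(dist[v][w] for w in replicas)
        for v in range(len(demand))
    )


def random_placement(demand, dist, k, trials=20, seed=0):
    """Draw several random k-subsets and keep the cheapest one."""
    rng = random.Random(seed)
    nodes = range(len(demand))
    candidates = [rng.sample(nodes, k) for _ in range(trials)]
    return min(candidates, key=lambda r: placement_cost(demand, dist, r))


def hot_spot_placement(demand, k):
    """Simplified hot spot: pick the k nodes with the largest demand."""
    ranked = sorted(range(len(demand)), key=lambda v: demand[v], reverse=True)
    return ranked[:k]


# Toy instance with made-up numbers.
demand = [10, 1, 1, 5]
dist = [
    [0, 2, 4, 7],
    [2, 0, 3, 6],
    [4, 3, 0, 2],
    [7, 6, 2, 0],
]
hot = hot_spot_placement(demand, k=2)
rnd = random_placement(demand, dist, k=2, trials=50, seed=1)
```

Note that the simplified hot-spot rule ignores distances entirely, which is one reason a cost-aware heuristic can do better on some topologies.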

  9. Placement Algorithms (Cont.) • Greedy algorithm • Calculate costs of assigning clients to replicas • Select replica with lowest cost • Adjust costs based upon assignment, repeat until done • Super-Optimal algorithm • Lagrangian relaxation + subgradient method
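
The greedy steps listed above can be sketched as follows. This is one plausible reading of the slide, not the authors' code: at each round it adds the site that lowers the total cost the most given the replicas already chosen. The name `greedy_placement` and the toy data are assumptions.

```python
def greedy_placement(demand, dist, k):
    """Pick k replicas one at a time, each time adding the site that
    yields the lowest total cost together with the sites already chosen."""
    n = len(demand)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of nodes")
    chosen = []
    # best[v]: distance from node v to its closest replica chosen so far.
    best = [float("inf")] * n
    for _ in range(k):
        def cost_with(site):
            # Total cost if `site` were added to the current replica set.
            return sum(
                demand[v] * min(best[v], dist[v][site]) for v in range(n)
            )

        site = min((s for s in range(n) if s not in chosen), key=cost_with)
        chosen.append(site)
        # Adjust per-node costs to reflect the new assignment.
        best = [min(best[v], dist[v][site]) for v in range(n)]
    return chosen


# Toy instance with made-up numbers.
demand = [10, 1, 1, 5]
dist = [
    [0, 2, 4, 7],
    [2, 0, 3, 6],
    [4, 3, 0, 2],
    [7, 6, 2, 0],
]
picked = greedy_placement(demand, dist, k=2)
```

In this sketch each round evaluates every remaining site against every node, so choosing k replicas among n nodes takes on the order of k·n² distance lookups.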

  10. Simulation Methodology • Network topology • Randomly generated topologies • Using GT-ITM Internet topology generator • Real Internet network topology • AS-level topology obtained using BGP routing data from a set of seven geographically dispersed BGP peers • Web Workload • Real server traces • MSNBC, ClarkNet, NASA Kennedy Space Center • Performance Metric • Relative performance: $\mathrm{cost}_{\text{practical}} / \mathrm{cost}_{\text{super-optimal}}$

  11. Simulation Methodology (Cont.) • Simulate a network of N nodes ($100 \le N \le 3000$) • Cluster clients using network-aware clustering [KW00] • IP addresses with the same address prefix belong to a cluster • A small number of popular clusters account for most requests • Top 10, 100, 1000, 3000 clusters account for about 24%, 45%, 78%, and 94% of the requests respectively • Pick the top N clusters • Map them to different nodes
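
The clustering step can be illustrated with a small sketch that groups client addresses by longest matching prefix and keeps the busiest clusters. The prefix table, the addresses, and the function names are hypothetical; in the study the clustering follows [KW00], and this sketch only mirrors the "same address prefix" idea from the slide.

```python
import ipaddress
from collections import Counter


def cluster_requests(client_ips, prefixes):
    """Count requests per cluster, where a cluster is the longest
    prefix in `prefixes` that contains the client address."""
    networks = sorted(
        (ipaddress.ip_network(p) for p in prefixes),
        key=lambda net: net.prefixlen,
        reverse=True,  # try the most specific prefixes first
    )
    counts = Counter()
    for ip in client_ips:
        addr = ipaddress.ip_address(ip)
        for net in networks:
            if addr in net:
                counts[str(net)] += 1
                break
        # Addresses that match no prefix are ignored in this sketch.
    return counts


def top_clusters(counts, n):
    """Return the n clusters that generate the most requests."""
    return [prefix for prefix, _ in counts.most_common(n)]


# Hypothetical prefix table and request log (one entry per request).
prefixes = ["10.0.0.0/8", "10.1.0.0/16", "192.168.0.0/24"]
requests = ["10.1.2.3", "10.1.9.9", "10.2.0.1", "192.168.0.7", "8.8.8.8"]
counts = cluster_requests(requests, prefixes)
busiest = top_clusters(counts, 1)
```

Each selected cluster would then be mapped to a node of the simulated topology, with its request count serving as that node's demand.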

  12. Simulation Methodology (Cont.) • Random trees • Random graphs • AS-level topologies • Sensitivity to the error in the input

  13. Random Tree Topologies • The tree-based algorithm performs well, as expected. • The greedy algorithm performs equally well.

  14. Random Graph Topologies • The greedy and hot-spot algorithms outperform the tree-based algorithm.

  15. Large Random Graph Topologies • The greedy algorithm performs best, and the hot-spot algorithm performs nearly as well.

  16. AS-level Internet Topologies • The greedy algorithm performs best, and the hot-spot algorithm performs nearly as well.

  17. Effects of Imperfect Knowledge about Input Data • Predicted workload (using moving window average) • Perfect topology information • Result: within 5% degradation when using the predicted workload
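
A moving window average predictor of the kind mentioned on this slide might look like the sketch below. The window length, the length of a measurement period, and the function name are assumptions; the slides do not state them.

```python
def moving_window_prediction(history, window):
    """Predict next-period demand per node as the mean of the last
    `window` observed periods.

    `history` is a list of per-period demand vectors, oldest first.
    """
    if window <= 0 or not history:
        raise ValueError("need a positive window and at least one period")
    recent = history[-window:]
    n = len(recent[0])
    return [sum(period[v] for period in recent) / len(recent) for v in range(n)]


# Three observed periods for two nodes (made-up request counts).
history = [[10, 0], [20, 4], [30, 8]]
predicted = moving_window_prediction(history, window=2)
```

The predicted vector would then replace the true demand when choosing replica sites, while the resulting placement is still scored against the demand actually observed.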

  18. Effects of Imperfect Knowledge about Input Data (Cont.) • Predicted workload (using moving window average) • Noisy topology information • Perturb the distance between two nodes i and j by up to a factor of 2 • Result: within 15% degradation when using the predicted workload and noisy topology information
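
The distance perturbation can be sketched as below. The slide only says distances are perturbed "by up to a factor of 2"; the log-uniform noise distribution, the symmetric treatment of each pair, and the function name are assumptions made for illustration.

```python
import random


def perturb_distances(dist, max_factor=2.0, seed=0):
    """Return a noisy copy of a symmetric distance matrix in which each
    pairwise distance is scaled by a random factor in
    [1 / max_factor, max_factor]."""
    rng = random.Random(seed)
    n = len(dist)
    noisy = [row[:] for row in dist]
    for i in range(n):
        for j in range(i + 1, n):
            # Log-uniform factor (an assumption): halving and doubling
            # are equally likely extremes.
            factor = max_factor ** rng.uniform(-1.0, 1.0)
            noisy[i][j] = noisy[j][i] = dist[i][j] * factor
    return noisy


# Toy symmetric distance matrix with made-up numbers.
dist = [
    [0, 2, 4],
    [2, 0, 3],
    [4, 3, 0],
]
noisy = perturb_distances(dist, max_factor=2.0, seed=42)
```

A placement algorithm would be run on `noisy`, and the sites it picks would then be evaluated against the true `dist` to measure the degradation.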

  19. Summary • One of the first experimental studies on the placement of Web server replicas • Knowledge about client workload and topology is needed for provisioning replicas • The greedy algorithm performs very well • Within a factor of 1.1 – 1.5 of the super-optimal • Insensitive to noise • Stays within a factor of 2 of the super-optimal when the salted (injected) error is a factor of 4 • The hot spot algorithm performs nearly as well • Within a factor of 1.6 – 2 of the super-optimal • Obtaining input data • Moving window average for load prediction • Using BGP router data to obtain topology information

  20. Conclusion • Recommend using the greedy algorithm for deciding the placement of Web server replicas

  21. Acknowledgement • Craig Labovitz • Yin Zhang • Ravi Kumar
