This study explores strategies for placing web server replicas to minimize user latency or bandwidth usage. Various approximation algorithms are examined, with a focus on the greedy algorithm's effectiveness. Simulation results under different network topologies and placement algorithms are presented, showing the impact of imperfect data on performance.
On the Placement of Web Server Replicas Lili Qiu, Microsoft Research Venkata N. Padmanabhan, Microsoft Research Geoffrey M. Voelker, UCSD IEEE INFOCOM’2001, Anchorage, AK, April 2001
Outline • Overview • Related work • Our approach • Simulation methodology & results • Summary
Motivation • Growing interest in Web server replicas • Exponential growth in Web usage • Content providers want to offer better service at lower cost • Solution: replication • Forms of Web server replicas • Mirror sites • Content Distribution Networks (CDNs) • CDN: a network of servers • Examples: Akamai, Digital Island [Figure: replicas deployed across the Internet, serving clients on behalf of content providers]
Placement of Web Server Replicas • Problem specification • Among a set of N potential sites, pick K sites as replicas to minimize users’ latency or bandwidth usage [Figure: candidate replica sites in the Internet between content providers and clients]
Related Work • Placement of Web proxies [LGI+99] • Cache location [KRS00] • Placement of Internet instrumentation [JJJ+00]
Our Approach • Model the Internet as a graph • Parameterize the graph using measured inputs • # requests generated from each region • Distance between regions • Map the placement problem onto a graph optimization problem • Assumption: each client uses the single replica closest to it • Solve the graph optimization problem using various approximation algorithms
Minimum K-median Problem • Given a complete graph G = (V, E), d(j), c(i,j) • d(j): # requests generated by node j • c(i,j): distance between nodes i and j • Latency, hop count, or another metric to be optimized • Find a subset V′ ⊆ V with |V′| = K that minimizes Σ_{v∈V} min_{w∈V′} d(v)·c(v,w) • NP-hard problem [Figure: example graph with edge weights]
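The K-median objective above can be evaluated directly: each client v is charged d(v) times its distance to the closest chosen replica. A minimal brute-force sketch in Python, using hypothetical toy data rather than any topology from the study (feasible only for tiny graphs, since the problem is NP-hard):

```python
import itertools

def kmedian_cost(d, c, replicas):
    """Total weighted distance when each client uses its closest replica:
    sum over v of d[v] * min over w in replicas of c[v][w]."""
    return sum(d[v] * min(c[v][w] for w in replicas) for v in range(len(d)))

def kmedian_exhaustive(d, c, k):
    """Optimal K-median by trying every K-subset -- exponential, for illustration only."""
    n = len(d)
    return min(itertools.combinations(range(n), k),
               key=lambda subset: kmedian_cost(d, c, subset))

# Hypothetical 4-node example: d[v] = request counts, c[v][w] = symmetric distances.
d = [8, 7, 4, 5]
c = [[0, 3, 2, 2],
     [3, 0, 4, 6],
     [2, 4, 0, 5],
     [2, 6, 5, 0]]
best = kmedian_exhaustive(d, c, 2)  # best 2-replica placement for this toy graph
```

Choosing all nodes as replicas drives the cost to zero, which is why the interesting question is the trade-off at small K.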
Placement Algorithms • Tree-based algorithm [LGI+99] • Assumes the underlying topology is a tree and models placement as a dynamic programming problem • O(N³M²) for choosing M replicas among N potential sites • Random • Pick the best among several random assignments • Hot spot • Place replicas near the clients that generate the largest load
Placement Algorithms (Cont.) • Greedy algorithm • Calculate costs of assigning clients to replicas • Select replica with lowest cost • Adjust costs based upon assignment, repeat until done • Super-Optimal algorithm • Lagrangian relaxation + subgradient method
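The greedy steps above (cost each candidate, pick the cheapest, re-cost given the choices so far, repeat) can be sketched as follows. This is an illustrative Python sketch of the incremental-cost idea, with hypothetical toy data; it is not the paper's implementation:

```python
def greedy_placement(d, c, k):
    """Greedy K-replica placement: repeatedly add the candidate site that
    yields the lowest total client cost given the replicas chosen so far."""
    n = len(d)
    chosen = []
    for _ in range(k):
        best_site, best_cost = None, float("inf")
        for s in range(n):
            if s in chosen:
                continue
            trial = chosen + [s]
            # Cost if s is added: each client goes to its closest replica in `trial`.
            cost = sum(d[v] * min(c[v][w] for w in trial) for v in range(n))
            if cost < best_cost:
                best_site, best_cost = s, cost
        chosen.append(best_site)
    return chosen

# Hypothetical 4-node example (request counts and pairwise distances).
d = [8, 7, 4, 5]
c = [[0, 3, 2, 2],
     [3, 0, 4, 6],
     [2, 4, 0, 5],
     [2, 6, 5, 0]]
placement = greedy_placement(d, c, 2)
```

Each of the K rounds scans all N candidates and re-costs all N clients, so this sketch runs in O(K·N²) time, which is what makes greedy attractive compared with the O(N³M²) tree-based dynamic program.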
Simulation Methodology • Network topology • Randomly generated topologies • Using the GT-ITM Internet topology generator • Real Internet network topology • AS-level topology obtained using BGP routing data from a set of seven geographically dispersed BGP peers • Web workload • Real server traces • MSNBC, ClarkNet, NASA Kennedy Space Center • Performance metric • Relative performance: cost_practical / cost_super-optimal
Simulation Methodology (Cont.) • Simulate a network of N nodes (100 ≤ N ≤ 3000) • Cluster clients using network-aware clustering [KW00] • IP addresses with the same address prefix belong to the same cluster • A small number of popular clusters account for most requests • The top 10, 100, 1000, and 3000 clusters account for about 24%, 45%, 78%, and 94% of the requests, respectively • Pick the top N clusters • Map them to different nodes
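The clustering step above groups clients by address prefix and ranks clusters by request volume. The real scheme [KW00] derives prefixes from BGP routing tables; a simplified sketch that approximates it with fixed-length /24 prefixes (the request-log format here is an assumption for illustration):

```python
from collections import Counter

def cluster_requests(request_log, prefix_len=24):
    """Simplified network-aware clustering: group client IPs by the first
    prefix_len bits of the address, summing request counts per cluster.
    (The actual [KW00] scheme uses variable-length BGP prefixes.)"""
    clusters = Counter()
    for ip, count in request_log:
        octets = [int(o) for o in ip.split(".")]
        addr = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
        clusters[addr >> (32 - prefix_len)] += count
    return clusters.most_common()  # most popular clusters first

# Hypothetical log entries: (client IP, request count).
log = [("10.0.0.1", 5), ("10.0.0.200", 3), ("192.168.1.7", 2)]
ranked = cluster_requests(log)
```

Sorting clusters by popularity is what lets the simulation keep only the top N clusters while still covering the bulk of the requests.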
Simulation Methodology (Cont.) • Random trees • Random graphs • AS-level topologies • Sensitivity to the error in the input
Random Tree Topologies The tree-based algorithm performs well, as expected; the greedy algorithm performs equally well.
Random Graph Topologies The greedy and hot-spot algorithms outperform the tree-based algorithm.
Large Random Graph Topologies The greedy algorithm performs best, and the hot-spot algorithm performs nearly as well.
AS-level Internet Topologies The greedy algorithm performs best, and the hot-spot algorithm performs nearly as well.
Effects of Imperfect Knowledge about Input Data • Predicted workload (using a moving-window average) • Perfect topology information Performance degrades by less than 5% when using the predicted workload
Effects of Imperfect Knowledge about Input Data (Cont.) • Predicted workload (using a moving-window average) • Noisy topology information • Perturb the distance between two nodes i and j by up to a factor of 2 Performance degrades by less than 15% when using the predicted workload and noisy topology information
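The two sources of imperfection above are easy to state concretely: workload is predicted with a moving-window average, and topology noise is modeled by scaling each distance up or down by up to a given factor. A minimal sketch of both, under assumed interfaces (the paper does not specify these function signatures):

```python
import random

def predict_load(history, window=3):
    """Moving-window average: predict the next interval's request count
    as the mean of the last `window` observed counts."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def perturb_distance(dist, max_factor=2.0):
    """Model noisy topology data: multiply or divide a distance by a
    random factor between 1 and max_factor."""
    f = random.uniform(1.0, max_factor)
    return dist * f if random.random() < 0.5 else dist / f

# Hypothetical per-interval request counts for one client cluster.
history = [10, 20, 30, 40]
prediction = predict_load(history, window=3)
```

With max_factor = 2, a perturbed distance always stays within [d/2, 2d] of the true distance d, matching the perturbation described above.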
Summary • One of the first experimental studies on the placement of Web server replicas • Knowledge of client workload and topology is needed for provisioning replicas • The greedy algorithm performs very well • Within a factor of 1.1 – 1.5 of the super-optimal • Insensitive to noise • Stays within a factor of 2 of the super-optimal even when the salted error is a factor of 4 • The hot-spot algorithm performs nearly as well • Within a factor of 1.6 – 2 of the super-optimal • Obtaining input data • Moving-window average for load prediction • BGP routing data to obtain topology information
Conclusion • Recommend using the greedy algorithm for deciding the placement of Web server replicas
Acknowledgement • Craig Labovitz • Yin Zhang • Ravi Kumar