This study explores strategies for placing web server replicas to minimize user latency or bandwidth usage. Various approximation algorithms are examined, with a focus on the greedy algorithm's effectiveness. Simulation results under different network topologies and placement algorithms are presented, showing the impact of imperfect data on performance.
On the Placement of Web Server Replicas Lili Qiu, Microsoft Research Venkata N. Padmanabhan, Microsoft Research Geoffrey M. Voelker, UCSD IEEE INFOCOM’2001, Anchorage, AK, April 2001
Outline • Overview • Related work • Our approach • Simulation methodology & results • Summary
Motivation • Growing interest in Web server replicas • Exponential growth in Web usage • Content providers want to offer better service at lower cost • Solution: replication • Forms of Web server replicas • Mirror sites • Content Distribution Networks (CDNs) • CDN: a network of servers • Examples: Akamai, Digital Island [Figure: replicas deployed across the Internet, serving clients on behalf of content providers]
Placement of Web Server Replicas • Problem specification • Among a set of N potential sites, pick K sites as replicas to minimize users’ latency or bandwidth usage [Figure: candidate replica sites in the Internet between content providers and clients]
Related Work • Placement of Web proxies [LGI+99] • Cache location [KRS00] • Placement of Internet instrumentation [JJJ+00]
Our Approach • Model the Internet as a graph • Parameterize the graph using measured inputs • # requests generated from each region • Distance between regions • Map the placement problem onto a graph optimization problem • Assumption: each client uses the single replica closest to it • Solve the graph optimization problem using various approximation algorithms
Minimum K-median Problem • Given a complete graph G = (V, E), d(j), c(i,j) • d(j): # requests generated by node j • c(i,j): distance between nodes i and j • Latency, hop count, or another metric to be optimized • Find a subset V′ ⊆ V with |V′| = K that minimizes Σ_{v∈V} min_{w∈V′} d(v)·c(v,w) • NP-hard problem [Figure: example graph with edge weights]
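The K-median objective above can be evaluated directly: each client v is charged d(v) times its distance to the closest chosen replica. A minimal brute-force sketch in Python, using hypothetical toy data rather than any topology from the study (feasible only for tiny graphs, since the problem is NP-hard):

```python
import itertools

def kmedian_cost(d, c, replicas):
    """Total weighted distance when each client uses its closest replica:
    sum over v of d[v] * min over w in replicas of c[v][w]."""
    return sum(d[v] * min(c[v][w] for w in replicas) for v in range(len(d)))

def kmedian_exhaustive(d, c, k):
    """Optimal K-median by trying every K-subset -- exponential, for illustration only."""
    n = len(d)
    return min(itertools.combinations(range(n), k),
               key=lambda subset: kmedian_cost(d, c, subset))

# Hypothetical 4-node example: d[v] = request counts, c[v][w] = symmetric distances.
d = [8, 7, 4, 5]
c = [[0, 3, 2, 2],
     [3, 0, 4, 6],
     [2, 4, 0, 5],
     [2, 6, 5, 0]]
best = kmedian_exhaustive(d, c, 2)  # best 2-replica placement for this toy graph
```

Choosing all nodes as replicas drives the cost to zero, which is why the interesting question is the trade-off at small K.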
Placement Algorithms • Tree-based algorithm [LGI+99] • Assumes the underlying topology is a tree and models placement as a dynamic programming problem • O(N³M²) for choosing M replicas among N potential sites • Random • Pick the best among several random assignments • Hot spot • Place replicas near the clients that generate the largest load
Placement Algorithms (Cont.) • Greedy algorithm • Calculate costs of assigning clients to replicas • Select replica with lowest cost • Adjust costs based upon assignment, repeat until done • Super-Optimal algorithm • Lagrangian relaxation + subgradient method
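The greedy steps above (cost each candidate, pick the cheapest, re-cost given the choices so far, repeat) can be sketched as follows. This is an illustrative Python sketch of the incremental-cost idea, with hypothetical toy data; it is not the paper's implementation:

```python
def greedy_placement(d, c, k):
    """Greedy K-replica placement: repeatedly add the candidate site that
    yields the lowest total client cost given the replicas chosen so far."""
    n = len(d)
    chosen = []
    for _ in range(k):
        best_site, best_cost = None, float("inf")
        for s in range(n):
            if s in chosen:
                continue
            trial = chosen + [s]
            # Cost if s is added: each client goes to its closest replica in `trial`.
            cost = sum(d[v] * min(c[v][w] for w in trial) for v in range(n))
            if cost < best_cost:
                best_site, best_cost = s, cost
        chosen.append(best_site)
    return chosen

# Hypothetical 4-node example (request counts and pairwise distances).
d = [8, 7, 4, 5]
c = [[0, 3, 2, 2],
     [3, 0, 4, 6],
     [2, 4, 0, 5],
     [2, 6, 5, 0]]
placement = greedy_placement(d, c, 2)
```

Each of the K rounds scans all N candidates and re-costs all N clients, so this sketch runs in O(K·N²) time, which is what makes greedy attractive compared with the O(N³M²) tree-based dynamic program.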
Simulation Methodology • Network topology • Randomly generated topologies • Using the GT-ITM Internet topology generator • Real Internet network topology • AS-level topology obtained using BGP routing data from a set of seven geographically dispersed BGP peers • Web workload • Real server traces • MSNBC, ClarkNet, NASA Kennedy Space Center • Performance metric • Relative performance: cost_practical / cost_super-optimal
Simulation Methodology (Cont.) • Simulate a network of N nodes (100 ≤ N ≤ 3000) • Cluster clients using network-aware clustering [KW00] • IP addresses with the same address prefix belong to the same cluster • A small number of popular clusters account for most requests • The top 10, 100, 1000, and 3000 clusters account for about 24%, 45%, 78%, and 94% of the requests, respectively • Pick the top N clusters • Map them to different nodes
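The clustering step above groups clients by address prefix and ranks clusters by request volume. The real scheme [KW00] derives prefixes from BGP routing tables; a simplified sketch that approximates it with fixed-length /24 prefixes (the request-log format here is an assumption for illustration):

```python
from collections import Counter

def cluster_requests(request_log, prefix_len=24):
    """Simplified network-aware clustering: group client IPs by the first
    prefix_len bits of the address, summing request counts per cluster.
    (The actual [KW00] scheme uses variable-length BGP prefixes.)"""
    clusters = Counter()
    for ip, count in request_log:
        octets = [int(o) for o in ip.split(".")]
        addr = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
        clusters[addr >> (32 - prefix_len)] += count
    return clusters.most_common()  # most popular clusters first

# Hypothetical log entries: (client IP, request count).
log = [("10.0.0.1", 5), ("10.0.0.200", 3), ("192.168.1.7", 2)]
ranked = cluster_requests(log)
```

Sorting clusters by popularity is what lets the simulation keep only the top N clusters while still covering the bulk of the requests.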
Simulation Methodology (Cont.) • Random trees • Random graphs • AS-level topologies • Sensitivity to the error in the input
Random Tree Topologies The tree-based algorithm performs well, as expected; the greedy algorithm performs equally well.
Random Graph Topologies The greedy and hot-spot algorithms outperform the tree-based algorithm.
Large Random Graph Topologies The greedy algorithm performs best, and the hot-spot algorithm performs nearly as well.
AS-level Internet Topologies The greedy algorithm performs best, and the hot-spot algorithm performs nearly as well.
Effects of Imperfect Knowledge about Input Data • Predicted workload (using a moving-window average) • Perfect topology information Performance degrades by less than 5% when using the predicted workload
Effects of Imperfect Knowledge about Input Data (Cont.) • Predicted workload (using a moving-window average) • Noisy topology information • Perturb the distance between two nodes i and j by up to a factor of 2 Performance degrades by less than 15% when using the predicted workload and noisy topology information
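The two sources of imperfection above are easy to state concretely: workload is predicted with a moving-window average, and topology noise is modeled by scaling each distance up or down by up to a given factor. A minimal sketch of both, under assumed interfaces (the paper does not specify these function signatures):

```python
import random

def predict_load(history, window=3):
    """Moving-window average: predict the next interval's request count
    as the mean of the last `window` observed counts."""
    recent = history[-window:]
    return sum(recent) / len(recent)

def perturb_distance(dist, max_factor=2.0):
    """Model noisy topology data: multiply or divide a distance by a
    random factor between 1 and max_factor."""
    f = random.uniform(1.0, max_factor)
    return dist * f if random.random() < 0.5 else dist / f

# Hypothetical per-interval request counts for one client cluster.
history = [10, 20, 30, 40]
prediction = predict_load(history, window=3)
```

With max_factor = 2, a perturbed distance always stays within [d/2, 2d] of the true distance d, matching the perturbation described above.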
Summary • One of the first experimental studies on the placement of Web server replicas • Knowledge of client workload and topology is needed for provisioning replicas • The greedy algorithm performs very well • Within a factor of 1.1 – 1.5 of the super-optimal • Insensitive to noise • Stays within a factor of 2 of the super-optimal even when the salted error is a factor of 4 • The hot-spot algorithm performs nearly as well • Within a factor of 1.6 – 2 of the super-optimal • Obtaining input data • Moving-window average for load prediction • BGP routing data to obtain topology information
Conclusion • Recommend using the greedy algorithm for deciding the placement of Web server replicas
Acknowledgement • Craig Labovitz • Yin Zhang • Ravi Kumar