
Integrated Approach to Improving Web Performance



Presentation Transcript


  1. Integrated Approach to Improving Web Performance Lili Qiu Cornell University

  2. Outline • Motivation & Open Issues • Solutions • Study Web workload, and properly provision the content distribution networks • Optimizing TCP performance for Web transfers • Fast packet classification • Summary • Other Work

  3. Motivation • Web traffic is the dominant traffic in today’s Internet • Web performance is often unsatisfactory • WWW – World Wide Wait • Consequence: losing potential customers! [Figure: network congestion and an overloaded Web server delaying a client’s request]

  4. Why is the Web so slow? • Application layer • Web servers are overloaded … • Transport layer • Web transfers are short and bursty, and interact poorly with TCP • Network layer • Routers are not fast enough • Network congestion • Route flaps and routing instabilities • … Inefficiency in any layer of the protocol stack can slow down the Web!

  5. Our Solutions • Application layer • Study Web Workload • Properly provision content distribution networks (CDNs) • Transport layer • Optimize TCP startup performance for Web transfers • Network layer • Speed up packet classification (useful for firewall & diff-serv)

  6. Part I Application Layer Approach • Study the workload of busy Web servers • The Content and Access Dynamics of a Busy Web Site: Findings and Implications. Proceedings of ACM SIGCOMM 2000, Stockholm, Sweden, August 2000. (Joint work with V. N. Padmanabhan) • Properly provision content distribution networks • On the Placement of Web Server Replicas. Submitted to INFOCOM'2001. (Joint work with V. N. Padmanabhan and G. M. Voelker)

  7. Introduction • A solid understanding of Web workload is critical for designing robust and scalable systems • The workload of popular Web servers is not well understood • Study the content and access dynamics of the MSNBC Web site • a large news server • one of the busiest sites on the Web • 25 million accesses a day (HTML content alone) • Periods studied: Aug.–Oct. ’99, plus the Dec. 17, ’98 flash crowd • Properly provision content distribution networks • Where to place the edge servers in the CDNs

  8. Temporal Stability of File Popularity • Methodology • Consider the traces from a pair of days • Pick the top n popular documents from each day • Compute the overlap • Results • One day apart: significant overlap (80%) • Two months apart: smaller overlap (20-80%) • Ten months apart: very small overlap (mostly below 20%) The set of popular documents remains stable for days
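
  As a concrete rendering of this overlap metric, here is a minimal sketch (illustrative names; it assumes each day's trace is simply a sequence of requested document IDs):

      # Popularity overlap: fraction of day A's top-n documents that are
      # also among day B's top-n documents.
      from collections import Counter

      def top_n(requests, n):
          # requests: iterable of document IDs, one entry per request
          return {doc for doc, _ in Counter(requests).most_common(n)}

      def popularity_overlap(day_a, day_b, n=100):
          return len(top_n(day_a, n) & top_n(day_b, n)) / n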

  9. Spatial Locality in Client Accesses Domain membership is significant except when there is a “hot” event of global interest

  10. Spatial Distribution of Client Accesses • Cluster clients using network-aware clustering [KW00] • IP addresses with the same address prefix belong to a cluster • The top 10, 100, 1000, and 3000 clusters account for about 24%, 45%, 78%, and 94% of the requests, respectively A small number of client clusters contributes most of the requests.
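
  The [KW00] clustering uses BGP prefixes; as an illustrative stand-in, the simpler 24-bit prefix heuristic (which slide 32 also uses) looks like this:

      # Group client IPs by 24-bit address prefix and measure how much of
      # the load the busiest clusters account for. The full network-aware
      # clustering of [KW00] uses real BGP prefixes instead of /24s.
      from collections import Counter

      def prefix24(ip):
          return ".".join(ip.split(".")[:3])  # "10.1.2.3" -> "10.1.2"

      def top_cluster_share(client_ips, k):
          counts = Counter(prefix24(ip) for ip in client_ips)
          top_k = sum(c for _, c in counts.most_common(k))
          return top_k / sum(counts.values())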

  11. The Applicability of Zipf’s Law to Web Requests • Web requests follow a Zipf-like distribution • Request frequency ∝ 1/i^α, where i is a document’s popularity ranking • The value of α is much larger in the MSNBC traces • 1.4 – 1.8 in MSNBC traces • smaller than or close to 1 in the proxy traces • close to 1 in the small departmental server logs [ABC+96] • Highest when there is a hot event

  12. Impact of Larger α • Accesses in the MSNBC traces are much more concentrated: 90% of the accesses are accounted for by • the top 2-4% of files in the MSNBC traces • the top 36% of files in proxy traces (Microsoft proxies and the proxies studied in [BCF+99]) • the top 10% of files in the small departmental server logs reported in [AW96] Popular news sites like MSNBC see much more concentrated accesses ⇒ Reverse caching and replication can be very effective!
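
  To see how the exponent drives this concentration, here is a quick numerical check (illustrative only; the document counts and exponents I plug in are mine, not from the traces):

      # With request frequency proportional to 1/i**alpha, compute the
      # fraction of all requests that the most popular documents receive.
      def top_share(alpha, n_docs, top_fraction):
          weights = [1.0 / (i ** alpha) for i in range(1, n_docs + 1)]
          k = int(n_docs * top_fraction)
          return sum(weights[:k]) / sum(weights)

      # A larger alpha concentrates the same top 4% of documents far more:
      print(top_share(1.0, 100_000, 0.04))  # proxy-like exponent
      print(top_share(1.6, 100_000, 0.04))  # MSNBC-like exponent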

  13. Introduction to Content Distribution Networks (CDNs) • Content providers want to offer better service to their clients at lower cost • Increasing deployment of content distribution networks (CDNs) • Akamai, Digital Island, Exodus … • Idea: a network of edge servers • Features: • Outsourced infrastructure • Improved performance by moving content closer to end users • Flash crowd protection [Diagram: content providers push content to CDN edge servers, which serve nearby clients]

  14. Placement of CDN Servers • Goal: minimize users’ latency or bandwidth usage • Minimum K-median problem • Select K centers to minimize the sum of assignment costs • The cost can be latency, bandwidth, or any other metric we want to optimize • NP-hard problem [Diagram: choosing which candidate sites should host replicas for a set of client clusters]
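
  Written out (standard K-median formulation; the notation is mine, not from the slides): with client set C, candidate sites V, per-client load w_c, and distance d(c, s),

      % Choose a set S of K replica sites minimizing total assignment cost
      \min_{S \subseteq V,\ |S| = K} \; \sum_{c \in C} w_c \cdot \min_{s \in S} d(c, s)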

  15. Placement Algorithms • Tree-based algorithm [LGG+99] • Assumes the underlying topology is a tree and models placement as a dynamic programming problem • O(N^3 M^2) for choosing M replicas among N potential places • Random • Pick the best among several random assignments • Hot spot • Place replicas near the clients that generate the largest load

  16. Placement Algorithms (Cont.) • Greedy algorithm
      Greedy(N, M) {
        for i = 1 .. M {
          for each remaining candidate site R {
            cost[R] = total cost after placing an additional replica at R
          }
          place a replica at the site R with the lowest cost[R]
        }
      }
  • Super-optimal algorithm • Lagrangian relaxation + subgradient method
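
  A runnable sketch of that greedy loop, under assumptions of mine (a client-to-site distance matrix and per-client load; the slides leave the cost function abstract):

      # Greedy replica placement: M rounds; each round tries every remaining
      # candidate site and keeps the one giving the lowest total cost, where
      # each client is assigned to its nearest chosen replica.
      def greedy_placement(dist, load, M):
          # dist[c][s]: distance from client c to site s; load[c]: c's demand
          chosen, remaining = [], set(range(len(dist[0])))

          def total_cost(sites):
              return sum(load[c] * min(dist[c][s] for s in sites)
                         for c in range(len(dist)))

          for _ in range(M):
              best = min(remaining, key=lambda r: total_cost(chosen + [r]))
              chosen.append(best)
              remaining.remove(best)
          return chosen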

  17. Simulation Methodology • Network topology • Randomly generated topologies • Using the GT-ITM Internet topology generator • Real Internet network topology • AS-level topology obtained using BGP routing data from a set of seven geographically dispersed BGP peers • Web workload • Real server traces • MSNBC, ClarkNet, NASA Kennedy Space Center • Performance metric • Relative performance: cost_practical / cost_super-optimal

  18. Simulation Results in Random Tree Topologies

  19. Simulation Results in Random Graph Topologies

  20. Simulation Results in Real Internet Topologies

  21. Effects of Imperfect Knowledge About Input Data • Predict load using a moving window average (a) Perfect knowledge about the topology (b) Topology known only to within a factor of 2
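
  The load predictor itself is simple; a minimal sketch (the window length is an arbitrary choice of mine):

      # Moving-window load prediction: forecast the next interval's request
      # rate as the mean of the last w measured intervals.
      from collections import deque

      def make_load_predictor(w=6):
          history = deque(maxlen=w)
          def observe_and_predict(latest_load):
              history.append(latest_load)
              return sum(history) / len(history)
          return observe_and_predict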

  22. Conclusion • Characterized Web workload using the MSNBC traces • Placement of CDN servers • Knowledge about client workload and topology is crucial for provisioning CDNs • The greedy algorithm performs best • Within a factor of 1.1 – 1.5 of super-optimal • The greedy algorithm is insensitive to noise • Stays within a factor of 2 of super-optimal even when the input is salted with a factor-of-4 error • The hot-spot algorithm performs nearly as well • Within a factor of 1.6 – 2 of super-optimal • How to obtain the inputs • Moving window average for load prediction • BGP routing data for topology information

  23. Part II Transport Layer Approach • Speeding Up Short Data Transfers: Theory, Architectural Support, and Simulation Results. Proceedings of NOSSDAV 2000 (Joint work with Yin Zhang and Srinivasan Keshav)

  24. Motivation • Characteristics of Web data transfers • Short & bursty [Mah97] • Use TCP • Problem: short data transfers interact poorly with TCP!

  25. TCP/Reno Basics • Slow start • Exponential growth of the congestion window • Still slow: log(n) round trips to deliver n segments • Congestion avoidance • Linear probing for available bandwidth • Fast retransmission • Triggered by 3 duplicate ACKs
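
  To make the log(n) round trips concrete, a back-of-the-envelope helper (idealized: no loss, no delayed ACKs):

      # Idealized slow start: cwnd doubles every RTT (1, 2, 4, ...), so a
      # transfer of n segments needs roughly ceil(log2(n + 1)) round trips.
      def slow_start_rtts(n_segments, init_cwnd=1):
          rtts, cwnd, sent = 0, init_cwnd, 0
          while sent < n_segments:
              sent += cwnd   # one window's worth of segments per RTT
              cwnd *= 2
              rtts += 1
          return rtts

      # e.g. slow_start_rtts(10) == 4 (1 + 2 + 4 + 8 >= 10); for a short
      # Web transfer these startup RTTs dominate the total latency.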

  26. Related Work • P-HTTP [PM94] • Reuses a single TCP connection for multiple Web transfers, but still pays the slow start penalty • T/TCP [Bra94] • Caches connection count and RTT • TCP Control Block Interdependence [Tou97] • Caches cwnd, but large bursts cause losses • Rate-Based Pacing [VH97] • 4K Initial Window [AFP98] • Fast Start [PK98, Pad98] • Needs router support to ensure TCP friendliness

  27. Our Approach • Directly enter Congestion Avoidance • Choose optimal initial congestion window • A Geometry Problem: Fitting a block to the service rate curve to minimize completion time

  28. Optimal Initial cwnd • Minimize completion time by having the transfer end at an epoch boundary.

  29. Shift Optimization • Minimize the initial cwnd while keeping the same integer number of RTTs • Before optimization: cwnd = 9 • After optimization: cwnd = 5
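
  My reconstruction of the geometry in code (a sketch, not the paper's exact formula): in congestion avoidance the window grows by one segment per RTT, so m RTTs starting from cwnd w deliver m*w + m*(m-1)/2 segments; pick the fewest RTTs that fit the file, then shrink w as far as that RTT count allows:

      import math

      def fits(w, m, n_segments):
          # Segments delivered in m congestion-avoidance RTTs from cwnd w:
          # w + (w+1) + ... + (w+m-1) = m*w + m*(m-1)/2
          return m * w + m * (m - 1) // 2 >= n_segments

      def optimal_initial_cwnd(n_segments, w_max):
          # Fewest RTTs achievable given the cap w_max on the initial window
          m = 1
          while not fits(w_max, m, n_segments):
              m += 1
          # Shift optimization: smallest w that still finishes in m RTTs
          w = math.ceil((n_segments - m * (m - 1) / 2) / m)
          return max(w, 1), m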

  30. Effect of Shift Optimization

  31. TCP/SPAND • Estimate network state by sharing performance information • SPAND: Shared PAssive Network Discovery [SSK97] • Directly enter congestion avoidance, starting with the optimal initial cwnd • Avoid large bursts by pacing [Diagram: Web servers share measurements through an Internet performance server]

  32. Implementation Issues • Scope for sharing and aggregation • 24-bit heuristic • network-aware clustering [KW00] • Collecting performance information • Performance reports, a new TCP option, Windmill’s approach, … • Information aggregation • Sliding window average • Retrieving network state estimates • Explicit query, active push, … • Pacing • Leaky-bucket based pacing
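
  For the pacing piece, a minimal leaky-bucket pacer sketch (illustrative; a real implementation would sit in the kernel's send path rather than sleep in user space):

      import time

      # Leaky-bucket pacing: release at most `rate` segments per second so
      # a large initial cwnd does not enter the network as one big burst.
      class LeakyBucketPacer:
          def __init__(self, rate_segments_per_sec):
              self.interval = 1.0 / rate_segments_per_sec
              self.next_send = time.monotonic()

          def wait_to_send(self):
              now = time.monotonic()
              if now < self.next_send:
                  time.sleep(self.next_send - now)
              self.next_send = max(now, self.next_send) + self.interval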

  33. Opportunity for Sharing • MSNBC: 90% of requests arrive within 5 minutes of the most recent request from the same client network (using the 24-bit heuristic)

  34. Cost for Sharing • MSNBC: 15,000-25,000 different client networks in a 5-minute interval during peak hours (using 24-bit heuristic)

  35. Simulation Results • Methodology • Download files in rounds • Performance Metric • Average completion time • TCP flavors considered • reno-ssr: Reno with slow start restart • reno-nssr: Reno w/o slow start restart • newreno-ssr: NewReno with slow start restart • newreno-nssr: NewReno w/o slow start restart

  36. Simulation Topologies

  37. T1 Terrestrial WAN Link with Single Bottleneck

  38. T1 Terrestrial WAN Link with Multiple Bottlenecks

  39. T1 Terrestrial WAN Link with Multiple Bottlenecks and Heavy Congestion

  40. TCP Friendliness (I): Against reno-ssr with 50-ms Timer

  41. TCP Friendliness (II): Against reno-ssr with 200-ms Timer

  42. Conclusions • TCP/SPAND significantly reduces latency for short data transfers • 35-65% compared to reno-ssr / newreno-ssr • 20-50% compared to reno-nssr / newreno-nssr • Even larger gains for fatter pipes • TCP/SPAND is TCP-friendly • TCP/SPAND is incrementally deployable • Server-side modification only • No modification on the client side

  43. Part III Network Layer Approach • Fast Packet Classification on Multiple Dimensions. Cornell CS Technical Report 2000-1805, July 2000. (Joint work with G. Varghese and S. Suri, in progress)

  44. Motivation • Traditionally, routers forward packets based on the destination field only • Diff-serv and firewalls require layer-4 switching • forward packets based on multiple fields in the packet header, e.g. source IP address, destination IP address, source port, destination port, protocol, type of service (tos) … • The general packet classification problem has poor worst-case cost: • Given N arbitrary filters over k packet fields • either the worst-case search time is Ω((log N)^(k-1)) • or the worst-case storage is O(N^k)

  45. Problem Specification • Given a set of filters (or rules), where each filter specifies • a class of packet headers based on K fields • an associated directive, which specifies how to forward the packet matching this filter • Goal: Find the best matching filter for each incoming packet • A packet P matches a filter F if every field of P matches the corresponding field of F • Exact match, prefix match, or range match • Assume prefix matching

  46. Problem Specification (Cont.) • Example of a Cisco Access Control List (ACL) • access-list 100 deny udp 26.145.168.192 255.255.255.255 74.199.168.192 255.255.255.0 eq 2049 • access-list 100 permit ip 74.199.191.192 255.255.0.0 74.199.168.192 255.255.0.0 • access-list 100 permit tcp 250.197.149.202 255.0.0.0 74.199.20.76 255.0.0.0 • Packet: tcp 250.19.34.34 74.23.5.12 matches filter 3
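
  A packet matches a filter iff every field matches, which for address fields is a masked bitwise comparison. A small checker sketch (my own field layout; it treats the masks above as ordinary netmasks, and ports are omitted for brevity):

      import ipaddress

      def addr_match(pkt_addr, flt_addr, flt_mask):
          # Prefix/mask match: packet and filter addresses must agree on
          # every bit covered by the mask.
          p = int(ipaddress.ip_address(pkt_addr))
          a = int(ipaddress.ip_address(flt_addr))
          m = int(ipaddress.ip_address(flt_mask))
          return (p & m) == (a & m)

      def first_match(packet, filters):
          # packet: (proto, src, dst); each filter: (proto, src, smask,
          # dst, dmask, action), checked in order as in a Cisco ACL.
          for proto, src, smask, dst, dmask, action in filters:
              if ((proto == "ip" or proto == packet[0])
                      and addr_match(packet[1], src, smask)
                      and addr_match(packet[2], dst, dmask)):
                  return action
          return "deny"  # implicit deny at the end of an ACL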

  47. Backtracking Search • A trie is a binary branching tree, with each branch labeled 0 or 1 • The prefix associated with a node is the concatenation of all the bits from the root to that node [Figure: example one-bit trie]

  48. Backtracking Search (Cont.) • Extend to multiple dimensions • Backtracking is a depth-first traversal of the tree which visits all the nodes satisfying the given constraints • Example: search for [00*,0*,0*]
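
  A compact sketch of this search over hierarchical tries (my own simplified rendering: each filter is a tuple of bit-string prefixes, one per field; a node ending some filter's field-f prefix holds a sub-trie over field f+1, and the search backtracks through every matching prefix):

      # Hierarchical tries with backtracking search (simplified sketch).
      # A filter like [00*, 0*, 0*] is stored as ("00", "0", "0").
      class Node:
          def __init__(self):
              self.child = {}          # '0'/'1' -> Node
              self.next_field = None   # sub-trie over the next field
              self.filters = []        # filters whose last field ends here

      def insert(root, filt, field=0):
          node = root
          for bit in filt[field]:
              node = node.child.setdefault(bit, Node())
          if field == len(filt) - 1:
              node.filters.append(filt)
          else:
              node.next_field = node.next_field or Node()
              insert(node.next_field, filt, field + 1)

      def search(node, pkt, field=0, out=None):
          # Depth-first backtracking: every node on this field's path is a
          # matching prefix, so descend into its next-field sub-trie too.
          out = [] if out is None else out
          for bit in pkt[field] + "$":      # "$" visits the final node
              out.extend(node.filters)
              if node.next_field is not None:
                  search(node.next_field, pkt, field + 1, out)
              if bit == "$" or bit not in node.child:
                  break
              node = node.child[bit]
          return out

      # e.g. after inserting filters, search(root, ("00", "0", "0"))
      # collects every filter matching the example [00*, 0*, 0*] lookup.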

  49. Trie Compression Algorithm • If a path A→B satisfies the compressible property: • all nodes on its left point to the same place L • all nodes on its right point to the same place R • then we compress the entire path into 3 edges • a center edge with value(A→B) pointing to B • a left edge with values < value(A→B) pointing to L • a right edge with values > value(A→B) pointing to R • Advantages of compression: saves both time and storage

  50. Trading Storage for Time • Smoothly trade off storage for time • Selective push • Push down the filters with large backtracking time • Iterate until the worst-case backtracking time satisfies our requirement [Spectrum: exponential search time at one extreme, exponential storage at the other]
