
Empirical Quantification of Opportunities for Content Adaptation in Web Servers



Presentation Transcript


  1. Empirical Quantification of Opportunities for Content Adaptation in Web Servers
  Michael Gopshtein and Dror Feitelson
  School of Engineering and Computer Science, The Hebrew University of Jerusalem
  Supported by a grant from the Israel Internet Association

  2. Capacity Planning
  [Chart: daily cycle of activity, capacity vs. time, marking utilized and wasted capacity]

  3. Capacity Planning
  [Chart: a flash crowd, capacity vs. time]

  4. Capacity Planning
  • The problem:
    • Required capacity for flash crowds cannot be anticipated in advance
    • Even provisioning for daily fluctuations is highly wasteful
  • Academic solution: use admission control
  • Business practice: rejecting any client is unacceptable
    • Especially during a surge in traffic

  5. Content Adaptation
  • Trade off quality for throughput
    • Installed capacity matches the normal load
    • Handle abnormal load by reducing quality
    • But still manage to provide meaningful service to all clients
  • Assumes the usual optimizations have already been made
    • Compress or combine images, promote caching, …
    • Empirically, this is usually not the case

  6. Content Adaptation
  [Illustration: under low load, every client receives the full-quality page]

  7. Content Adaptation
  [Illustration: under high load, many more clients are served a reduced-quality page]

  8. Content Adaptation
  • Maintain the invariant [formula shown on slide]
  • Need to change the quality (and cost!) of content
  • Prepare multiple versions in advance

  9. The Questions
  • What are the main costs in web service?
    • Is the bottleneck the CPU, the network, or the disk?
    • What do we gain by eliminating HTTP requests?
    • What do we gain by reducing file sizes?
  • What can realistically be done?
    • What is the structure of a “random” site?
    • How much can we reduce quality?
  Assumption: static web pages only

  10. Costs of Serving Web Pages

  11. Measuring Random Web Sites
  • http://en.wikipedia.org/wiki/Special:Random
  • Use the title of the page as input to a Google search
  • Extract the domain of the first link to get a home page
  • Retrieve it using IE
  • Collect statistical data by intercepting the system calls to send and receive
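The sampling pipeline above can be sketched as follows; the function and constant names are my own, and the network-dependent steps (fetching the random article, searching, intercepting send/receive) are only indicated in comments:

```python
from urllib.parse import urlparse

# Starting point of the sampling pipeline (from the slide).
RANDOM_ARTICLE = "http://en.wikipedia.org/wiki/Special:Random"

def home_page_of(result_url):
    """Reduce the first search-result link to the site's home page
    by keeping only the scheme and domain."""
    parts = urlparse(result_url)
    return f"{parts.scheme}://{parts.netloc}/"

# The full pipeline would:
#   1. fetch RANDOM_ARTICLE and read the article title,
#   2. submit the title to Google and take the first result link,
#   3. retrieve home_page_of(first_result) with IE while intercepting
#      the send/receive system calls to collect statistics.
```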

  12. Retrieved Component Sizes
  A quarter of the total data comes from components larger than 200 KB, yet these are only 0.02% of all components

  13. Download Times
  [Chart] Download time (and bandwidth requirement) is roughly proportional to image size

  14. Network Bandwidth
  • Typical Ethernet packets are 1526 bytes
  • Ethernet and TCP/IP headers require 54 bytes
  • HTTP response headers require 280-325 bytes
  • Most components fit into a few packets
    • 43% fit into a single packet
    • A further 24% fit into 2 packets
  Save bandwidth by reducing the number of small components or the size of large components
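Taking the slide's figures literally (1526-byte frames, 54 bytes of Ethernet/TCP/IP headers, roughly 300 bytes of HTTP response headers), the packet count per response works out as follows; this is a back-of-the-envelope sketch, not the authors' model:

```python
import math

ETH_FRAME = 1526   # frame size quoted on the slide
LINK_HDRS = 54     # Ethernet + TCP/IP headers, per the slide
HTTP_HDRS = 300    # mid-range of the 280-325 bytes quoted

def packets_needed(body_bytes, http_hdrs=HTTP_HDRS):
    """Rough packet count for one HTTP response, assuming the HTTP
    headers travel in the first packet."""
    per_packet = ETH_FRAME - LINK_HDRS        # 1472 bytes of payload
    first = per_packet - http_hdrs            # room left in packet 1
    if body_bytes <= first:
        return 1
    return 1 + math.ceil((body_bytes - first) / per_packet)
```

This makes the slide's point concrete: a component of about 1 KB fits in one packet, so its cost is dominated by per-request and header overhead rather than by its payload.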

  15. Locality and Caching
  • Flash crowds typically involve a very small number of pages (possibly only the home page)
  • Servers allocate GBs of memory for cache
    • This is enough for thousands of files
  The disk is not expected to be a bottleneck

  16. CPU Overhead
  • CPU usage reflects several activities
    • Opening the TCP connection
    • Processing the request
    • Sending the data
  • Measured using combinatorial microbenchmarks
    • Open a connection only
    • One extremely large file
    • Many small files
    • Many requests for a non-existent file

  17. CPU Overhead
  Example: a single 10 KB file
  • Per-request processing cost equals data-transfer cost at about 240 KB
  • Only 0.3% of files are that big
  If the CPU is the bottleneck, the number of requests must be reduced
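The break-even figure implies how lopsided the costs are for typical files. In the arithmetic below, only the 240 KB break-even point comes from the measurements; the unit transfer cost is an arbitrary placeholder:

```python
# If the fixed per-request CPU cost equals the transfer cost of a
# 240 KB file, then for smaller files the fixed cost dominates.
BREAK_EVEN_KB = 240
transfer_cost_per_kb = 1.0                       # arbitrary unit
request_overhead = BREAK_EVEN_KB * transfer_cost_per_kb

def overhead_ratio(file_kb):
    """How many times the fixed request cost exceeds the transfer
    cost for a file of the given size."""
    return request_overhead / (file_kb * transfer_cost_per_kb)
```

For the slide's 10 KB example the fixed overhead is 24 times the transfer cost, which is why eliminating whole requests saves far more CPU than shrinking such files.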

  18. Optimizations

  19. Guidelines
  • Either the CPU or the network is the bottleneck
    • Network bandwidth is saved by reducing large components
    • CPU is saved by eliminating small components
  • Maintaining “acceptable” quality is subjective

  20. Eliminating Images
  • Images have many functions
    • Story (main illustrative item)
    • Preview (for another page)
    • Commercial
    • Logo
    • Decoration (bullets, background)
    • Navigation (buttons, menus)
    • Text (special formatting)
  • Some can be eliminated or replaced

  21. Distribution of Types
  • Manually classified 959 images from 30 random sites
    • 50% decoration
    • 18% preview
    • 11% commercial
    • 6% logo
    • 6% text

  22. Automatic Identification
  • Decorations are candidates for elimination
  • Identified by a combination of attributes:
    • Use the GIF format
    • Appear in HTML tags other than <IMG>
    • Appear multiple times in the same page
    • Small original size
    • Displayed size much bigger than the original
    • Large change in aspect ratio when displayed
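A heuristic classifier built from these attributes might look like the sketch below. The voting threshold and the numeric cutoffs (2 KB, 4x stretch, aspect-ratio change of 1) are my own choices; the slide lists the attributes without giving exact values:

```python
def looks_decorative(img, threshold=2):
    """Score an image against the slide's heuristics and flag it as
    decorative when enough signals fire.

    `img` is a dict with keys: format, tag, repeats, bytes,
    display_w, display_h, natural_w, natural_h.
    """
    signals = [
        img["format"] == "gif",                        # GIF format
        img["tag"] != "img",                           # used outside <IMG>
        img["repeats"] > 1,                            # repeated in the page
        img["bytes"] < 2048,                           # small original size
        img["display_w"] * img["display_h"]
            > 4 * img["natural_w"] * img["natural_h"], # stretched when shown
        abs(img["display_w"] / img["display_h"]
            - img["natural_w"] / img["natural_h"]) > 1,  # aspect change
    ]
    return sum(signals) >= threshold
```

A tiny tiled background GIF trips several signals at once, while a story photograph shown at its natural size trips none.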

  23. Image Size Distribution
  [Chart: size distributions of commercial, preview, and decoration images]

  24. Auxiliary Files
  • JavaScript
    • May be crucial for page function
    • Impossible to understand automatically
  • CSS (style sheets)
    • May be crucial for page structure
    • It may be possible to identify those parts that are actually used

  25. Auxiliary Files
  • Cannot be eliminated
  • Common wisdom: use separate files
    • Allows caching at the client
    • Saves retransmission with each page
  • Alternative: embed in the HTML
    • Reduces the number of requests
    • May be better for flash crowds that do not request multiple pages
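The embedding alternative can be sketched as a simple rewriting pass. This is an illustrative helper, not the paper's implementation; it uses naive regular expressions and assumes simple, well-formed markup (a real tool would use an HTML parser):

```python
import re

def inline_assets(html, files):
    """Replace external CSS/JS references with inline <style>/<script>
    blocks. `files` maps href/src values to the file contents."""
    def css(m):
        body = files.get(m.group(1))
        return f"<style>{body}</style>" if body is not None else m.group(0)

    def js(m):
        body = files.get(m.group(1))
        return f"<script>{body}</script>" if body is not None else m.group(0)

    # Inline stylesheets referenced via <link href="...">.
    html = re.sub(r'<link[^>]*href="([^"]+)"[^>]*>', css, html)
    # Inline scripts referenced via <script src="..."></script>.
    html = re.sub(r'<script[^>]*src="([^"]+)"[^>]*>\s*</script>', js, html)
    return html
```

Each inlined reference trades cacheability for one fewer HTTP request, which is exactly the trade-off the slide describes for single-page flash crowds.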

  26. Text and HTML
  • Some areas may be eliminated under extreme conditions
    • Commercials
    • Some previews and navigation options
  • Often encapsulated in <DIV> tags
    • Sometimes identified by ID or class names, e.g. “sidebanner”
    • Especially when the site uses a modular design
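Stripping such blocks by name could be sketched as below. The blacklist names are illustrative (only "sidebanner" comes from the slide), and the regex assumes the targeted DIVs contain no nested <div>s; real pages would need a proper HTML parser:

```python
import re

def strip_blocks(html, names=("sidebanner", "ads")):
    """Remove <div> blocks whose id or class matches a blacklist of
    names, e.g. commercials or side banners."""
    pattern = re.compile(
        r'<div[^>]*(?:id|class)="(?:%s)"[^>]*>.*?</div>' % "|".join(names),
        re.IGNORECASE | re.DOTALL,
    )
    return pattern.sub("", html)
```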

  27. Summary

  28. Content Adaptation
  • Degraded content is usually better than exclusion
    • The only way to handle flash crowds that overwhelm the installed capacity
  • Empirical results identify the main options
    • Identify and eliminate decorations
    • Compress large images (story, commercial)
    • Embed JavaScript and CSS
    • Hide unnecessary blocks

  29. Next Paper Preview
  • Implementation in Apache
  • Monitor CPU utilization and idle threads to switch between modes
  • Use mod_rewrite to redirect URLs to adapted content
  • Achieves up to a 10x increase in throughput under extreme adaptation
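The mod_rewrite mechanism mentioned on this slide might look roughly like the following configuration fragment; the flag file and the /adapted directory layout are invented for illustration and are not taken from the paper:

```apache
# Hypothetical sketch of mode switching via mod_rewrite. An external
# monitor creates /var/run/adapt.flag when CPU utilization is high and
# removes it when the load drops; all paths here are assumptions.
RewriteEngine On
# Only rewrite while the adaptation flag exists...
RewriteCond /var/run/adapt.flag -f
# ...and a pre-generated adapted variant of the requested file exists.
RewriteCond %{DOCUMENT_ROOT}/adapted%{REQUEST_URI} -f
RewriteRule ^ /adapted%{REQUEST_URI} [L]
```

Because the second RewriteCond checks for the adapted file, requests with no prepared low-quality variant fall through to the original content unchanged.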
