
Web Performance Modeling Issues

Web Performance Modeling Issues. Daniel A. Menascé, Department of Computer Science, George Mason University. http://www.cs.gmu.edu/faculty/menasce.html. © 1998 Menascé, D. A. All Rights Reserved. Outline. E-commerce facts. WWW Traffic Characterization. Improving Web Performance.


Presentation Transcript


  1. Web Performance Modeling Issues Daniel A. Menascé Department of Computer Science George Mason University http://www.cs.gmu.edu/faculty/menasce.html © 1998 Menascé, D. A. All Rights Reserved.

  2. Outline • E-commerce facts. • WWW Traffic Characterization. • Improving Web Performance. • Predicting Web Performance. • An Example. • Concluding Remarks.

  3. Part I E-commerce Facts

  4. Electronic Commerce: online sales are soaring “… IT and electronic commerce can be expected to drive economic growth for many years to come.” The Emerging Digital Economy, US Dept. of Commerce, 1998.

  5. Caution Signs Along the Road There will be jolts and delays along the way for electronic commerce: congestion is the most obvious challenge. (Gross & Sager, Business Week, June 22, 1998, p. 166.)

  6. What people are saying about Web performance… • “Tripod’s Web site is our business. If it’s not fast and reliable, there goes our business.” Don Zereski, Tripod’s vice-president of Technology (Internet World)

  7. What people are saying about Web performance… • “Sites have been concentrating on the right content. Now, more of them -- specially e-commerce sites -- realize that performance is crucial in attracting and retaining online customers.” Gene Shklar, Keynote, The New York Times, 8/8/98

  8. What people are saying about Web performance… • “Capacity is King.” Mike Krupit, Vice President of Technology, CDnow, 06/01/98 • “Being able to manage hit storms on commerce sites requires more than just buying more plumbing.” Harry Fenik, vice president of technology, Zona Research, LANTimes, 6/22/98

  9. E-commerce facts • Businesses will exchange $327 billion in goods and services by the year 2002. • Cisco Systems sells $4 billion/yr on the Web at a cost savings of $363 million. • General Electric estimates that e-commerce will save them $500 million over the next three years. • Boeing booked $100 million in spare parts in the first seven months of activity of its Web site. • Texas Instruments fills 60,000 orders a month through its Web site, meeting delivery deadlines 95% of the time.

  10. Business in the Internet Age (Business Week, June 22, 1998)

  11. Part II WWW Traffic Characteristics

  12. WWW Traffic Characteristics • Unpredictable in nature. • Self-similar, i.e., bursty over several time scales. • Load spikes can be many times higher than average traffic. • Workload characterization studies done at: • client side • proxy cache • server • Web • see http://www.parc.xerox.com/istl/projects/http-ng/web-characterization-reading.html

  13. Workload Characterization at the Client Side Cunha, Bestavros, and Crovella (1995) • Half a million requests from instrumented Mosaic in an academic setting. • The distribution of document sizes, popularity of documents as a function of size, distribution of user requests for documents, and number of references to documents as a function of overall rank in popularity can be modeled by power-law distributions.

  14. Workload Characterization at the Client Side Cunha, Bestavros, and Crovella (1995) • 22% of the requests generated by the browser were cache misses. • 96% of the total requests were for html files and only 1% for CGI bin requests. • Current studies show that dynamically generated pages range from 2 to 6% (Almeida98)

  15. Workload Characterization at the Client Side Cunha, Bestavros, and Crovella (1995) • 79% of requests were for external servers. • Less than 10% of requests were for unique URLs, i.e., URLs not previously referenced. • 9.6% of accesses were to html files with an average size of 6.4 KB and 69% to images with an average size of 14 KB.

  16. Workload Characterization at the Client Side Tauscher and Greenberg (1997) • Six weeks of WWW usage by 23 users. • 58% of pages visited are revisits. • Users tend to visit pages just visited more often than pages visited less recently.

  17. Workload Characterization at the Proxy Server Abrams, Standridge, Abdulla, Williams, and Fox (1995) • Six months of data from 3 educational sites. • Trace-driven simulation of a cache proxy server. • The maximum cache hit rate was between 30 and 50% for infinite size caches regardless of cache design.

  18. Workload Characterization at the Server Arlitt and Williamson (1996) • Six WWW servers: academic and commercial. • Number of requests ranged from 188K to 3.5M per site. • Search for invariants.

  19. Workload Characterization at the Server Arlitt and Williamson (1996) • HTML and image files account for 90-100% of requests. • The average size of a transferred document does not exceed 21 KB. • Less than 3% of the requests are for distinct files. • The file size distribution is Pareto with $0.40 < \alpha < 0.63$, i.e., this distribution is heavy-tailed.
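To make the heavy-tailed claim concrete, the following is a minimal sketch that draws file sizes from a Pareto distribution by inverse-transform sampling. The shape value 0.5 is simply picked from inside the range reported on the slide; the 1 KB minimum size, the sample count, the seed, and the function name `pareto_sample` are assumptions for illustration and do not come from the study.

```python
import random


def pareto_sample(alpha, k, rng):
    """Draw one value from a Pareto distribution with shape alpha and minimum k.

    Inverse-transform sampling: if U is uniform on (0, 1],
    then k / U**(1/alpha) follows Pareto(alpha, k).
    """
    u = 1.0 - rng.random()  # rng.random() is in [0, 1), so u is in (0, 1]
    return k / u ** (1.0 / alpha)


rng = random.Random(42)  # fixed seed so the run is repeatable
# Assumed parameters: shape 0.5, minimum file size 1024 bytes.
sizes = sorted(pareto_sample(0.5, 1024.0, rng) for _ in range(10_000))

median = sizes[len(sizes) // 2]
mean = sum(sizes) / len(sizes)
top_1pct_share = sum(sizes[-100:]) / sum(sizes)

print(f"median size: {median:.0f} bytes")
print(f"mean size:   {mean:.0f} bytes")
print(f"share of bytes in the largest 1% of files: {top_1pct_share:.2%}")
```

With a shape below 1 the theoretical mean is infinite, so the sample mean is dominated by a handful of very large files and sits far above the median. That gap is the practical signature of a heavy tail.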

  20. Workload Characterization at the Server Arlitt and Williamson (1996) • Ten percent of the files accessed account for 90% of server requests and 90% of the bytes transferred. • File inter-reference times are exponentially distributed and independent. • At least 70% of the requests come from remote sites. These requests account for at least 60% of the bytes transferred.

  21. Workload Characterization at the Server Crovella and Bestavros (1996) • Traces of users using Mosaic reflecting requests to over half a million documents. • Purpose: show the presence of self-similarity in Web traffic and explain it through the underlying characteristics of the WWW workload.

  22. Workload Characterization at the Server Crovella and Bestavros (1996) • File sizes have a heavy-tailed distribution. • This distribution may explain the fact that transmission time distributions are also heavy-tailed.

  23. Workload Characterization at the Server Almeida and Oliveira (1996) • Used fractal models to study the document reference pattern at Web servers. • Used an LRU stack model to study references to documents stored in two Web sites. • Found strong evidence of self-similarity in the document reference pattern.
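As an illustration of what an LRU stack model measures, here is a small sketch that computes the LRU stack distance of each reference in a trace. It is not the authors' code; the function name and the toy trace are made up for the example.

```python
def lru_stack_distances(references):
    """Return the LRU stack distance of each reference.

    The stack keeps documents ordered from most to least recently used.
    A re-reference at depth d (1 = top of stack) has distance d;
    a first-time reference has no distance and is reported as None.
    """
    stack = []
    distances = []
    for doc in references:
        if doc in stack:
            depth = stack.index(doc) + 1
            stack.remove(doc)
        else:
            depth = None
        distances.append(depth)
        stack.insert(0, doc)  # the referenced document moves to the top
    return distances


trace = ["a", "b", "a", "c", "b", "b"]  # hypothetical document references
print(lru_stack_distances(trace))
```

Small distances mean a document is re-referenced soon after its last use, so a trace with many small distances exhibits strong temporal locality.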

  24. Web Traffic Workload Characterization Bray (1996) • Over 11 million Web pages were analyzed in 1995. • The average page size was 6,518 bytes with a standard deviation of 31,678 bytes. • About 50% of the pages were found to have at least one embedded image and 15% were found to have exactly one image.

  25. Web Traffic Workload Characterization Bray (1996) • Over 80% of the sites are pointed to by only a few (between 1 and 10) other sites. • Almost 80% of the sites contain no links to off-site URLs. • Around 45% of the files had no extension and 37% were html files. The .gif and .txt files were the next most popular, with 2.5% each.

  26. Web Workload Characterization • File sizes and request sizes are heavy-tailed. • Popularity: • Zipf’s Law: the number of references, P, to a file tends to be inversely proportional to its rank r: P = k/r • Temporal locality: • refers to the likelihood that, once a document has been requested, it will be requested again in the near future.
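The Zipf relation P = k/r can be sketched numerically. The example below assumes a hypothetical site with 1,000 files and 100,000 requests and picks k so that the per-rank counts add up to the request total; the numbers and the function name are illustrative only.

```python
def zipf_references(total_requests, num_files):
    """Expected reference counts under Zipf's law, P = k / r.

    k is chosen so that the counts over ranks 1..num_files
    sum to total_requests.
    """
    harmonic = sum(1.0 / r for r in range(1, num_files + 1))
    k = total_requests / harmonic
    return [k / r for r in range(1, num_files + 1)]


# Hypothetical workload: 100,000 requests spread over 1,000 files.
refs = zipf_references(total_requests=100_000, num_files=1_000)
top_10pct_share = sum(refs[:100]) / sum(refs)

print(f"rank 1: {refs[0]:.0f} references, rank 10: {refs[9]:.0f} references")
print(f"share of requests to the most popular 10% of files: {top_10pct_share:.1%}")
```

Because the count falls off as 1/r, the file at rank 10 receives one tenth of the references of the file at rank 1, and a small set of popular files absorbs most of the requests.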

  27. Web Workload Characterization • SURGE (Barford and Crovella, ACM Sigmetrics 1998): workload generator that mimics real Web users. • SURGE exercises Web servers quite differently from most commonly used benchmarks (e.g., SPECweb96): • maintains a higher number of open connections • results in much higher CPU load

  28. Part III Improving Web Performance

  29. Improving Web Performance Through Caching and Prefetching • Prefetching and caching of inlines. (Dodge and Menascé, 1998) • Prefetching Results of Queries to Search Engines. (Foxwell and Menascé, 1998)

  30. Improving Web Performance Through Caching and Prefetching • Prefetching and caching of inlines. (Dodge and Menascé, 1998) • Prefetching Results of Queries to Search Engines. (Foxwell and Menascé, 1998)

  31. No Caching/Prefetching of Inlines [Figure: message sequence between browser and server. The browser sends an HTTP request, the server reads its disk and returns the HTML document, the browser parses the document, and then each inline file (inline 1, inline 2) is fetched with its own request.]

  32. Caching/Prefetching of Inlines

  33. [Figure: model of a Web server. Labels: Web browsers, Network, CPU, Cache (branch labeled h), Disk (branch labeled 1 - h), Web server.]

  34. Response Time of Inline Files (in sec) vs. Cache Size (KB)

  35. Improving Web Performance Through Caching and Prefetching • Prefetching and caching of inlines. (Dodge and Menascé, 1998) • Prefetching Results of Queries to Search Engines. (Foxwell and Menascé, 1998)

  36. Probability of Access for Lycos Queries vs. URL Position

  37. Hit Ratio of Query Results

  38. Hit Ratio vs. Threshold for Lycos Queries

  39. Part IV Predicting Web Performance

  40. The Impact of Burstiness • As shown by some measurements (Banga and Druschel 1997), the maximum throughput of a Web server decreases as burstiness increases. • How can we represent the effects of burstiness in performance models? • We know that the maximum throughput is equal to the inverse of the maximum service demand or the service demand of the bottleneck resource.
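The bound stated in the last bullet, maximum throughput = 1 / (largest service demand), can be checked with a few lines. The resource names and service demands below are hypothetical values chosen for the example, not measurements from the talk.

```python
def max_throughput(service_demands):
    """Upper bound on throughput: 1 / (largest service demand).

    service_demands maps a resource name to seconds of service per request.
    Returns (bottleneck resource name, requests per second).
    """
    bottleneck = max(service_demands, key=service_demands.get)
    return bottleneck, 1.0 / service_demands[bottleneck]


# Hypothetical service demands, in seconds per HTTP request.
demands = {"cpu": 0.008, "disk": 0.020, "network": 0.012}
resource, x_max = max_throughput(demands)
print(f"bottleneck: {resource}, max throughput: {x_max:.1f} requests/sec")
```

Anything that raises the service demand at the bottleneck lowers this bound, which is why a demand that grows with burstiness translates into lower maximum throughput.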

  41. WWW Traffic Burst [Figure: bytes transferred (axis ticks at 10^6 and 10^7) versus chronological time (slots of 1000 sec).]

  42. Traffic Burstiness on the Web • a: ratio between the maximum observed request rate and the average request rate during an observation period. • b: fraction of time during which the instantaneous arrival rate exceeds the average arrival rate.
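A minimal sketch of how the two burstiness parameters could be estimated from a log, assuming the observation period is split into equal-length slots and the request count per slot stands in for the instantaneous arrival rate. The slotting, the function name, and the sample counts are assumptions made for the example.

```python
def burstiness(requests_per_slot):
    """Return (a, b) for request counts over equal-length time slots.

    a: peak slot rate divided by the average rate.
    b: fraction of slots whose rate exceeds the average rate.
    Assumes a non-empty list with at least one request overall.
    """
    n = len(requests_per_slot)
    average = sum(requests_per_slot) / n
    a = max(requests_per_slot) / average
    b = sum(1 for count in requests_per_slot if count > average) / n
    return a, b


# Hypothetical counts: mostly quiet slots with two bursts.
counts = [10, 12, 8, 90, 11, 9, 60, 10, 10, 10]
a, b = burstiness(counts)
print(f"a = {a:.2f}, b = {b:.2f}")
```

The result depends on the slot length: shorter slots approximate the instantaneous rate more closely but make the peak estimate noisier.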

  43. The Impact of Burstiness (Menascé and Almeida, 1998) • To account for burstiness, we write the service demand of the bottleneck resource as: • $D = D_f + \alpha \times b$ • $D_f$ is the portion of the service demand that does not depend on burstiness. • $\alpha$ is a factor used to inflate the service demand according to the burstiness factor $b$. It is given by: • $\alpha = (U_1/X_0^1 - U_2/X_0^2)/(b_1 - b_2)$ • The measurement interval is divided into two subintervals, 1 and 2, to obtain $U_i$, $X_0^i$, and $b_i$.
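The sketch below works through this adjustment with made-up measurements. It computes the inflation factor from the utilization U, throughput X, and burstiness b of two subintervals, then evaluates the service demand as the fixed part plus the factor times b. The numbers are hypothetical, and recovering the fixed part by applying D = U/X to subinterval 1 is an assumption of this example rather than a step shown on the slide.

```python
def burstiness_inflation(u1, x1, b1, u2, x2, b2):
    """Inflation factor: (U1/X1 - U2/X2) / (b1 - b2), from two subintervals."""
    return (u1 / x1 - u2 / x2) / (b1 - b2)


def adjusted_demand(d_fixed, alpha, b):
    """Bottleneck service demand inflated for burstiness: d_fixed + alpha * b."""
    return d_fixed + alpha * b


# Hypothetical measurements: utilization, throughput (req/sec), burstiness b.
u1, x1, b1 = 0.60, 30.0, 0.30
u2, x2, b2 = 0.45, 25.0, 0.20

alpha = burstiness_inflation(u1, x1, b1, u2, x2, b2)
# Assumption: the measured demand U1/X1 equals d_fixed + alpha * b1.
d_fixed = u1 / x1 - alpha * b1

for b in (0.0, 0.1, 0.2, 0.3):
    d = adjusted_demand(d_fixed, alpha, b)
    print(f"b = {b:.1f}: D = {d * 1000:.1f} ms, max throughput = {1 / d:.1f} req/sec")
```

Since the maximum throughput is the inverse of the bottleneck demand, the printed bound falls as b grows, which matches the qualitative behavior the model is meant to capture.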

  44. Effects of Burstiness on Performance [Figure: chart; the only surviving axis ticks are 0.0, 0.1, 0.2, and 0.3.]

  45. Part V Predicting Web Performance: An Example

  46. Upgrading the Capacity of Your Link to the ISP

  47. Using QN models to predict Web Performance

  48. Results of QN Model

  49. Concluding Remarks • The Web is becoming an important element of the IPG. • Understanding the nature of the Web workload is crucial to being able to predict its performance. • New workload characterization studies for e-commerce sites are required (use of dynamic pages, XML, etc). • Need performance models for the Web that capture the effects of Web traffic characteristics on performance.

  50. Capacity Planning for Web Performance: Metrics, Models, and Methods. Daniel Menascé and Virgilio Almeida. Prentice Hall, June 1998.
