
Characterizing Web Workload of Mobile Clients



  1. Characterizing Web Workload of Mobile Clients Chuang Yu Juha Raitio

  2. Outline • Web workload analyses • What • Why • How • Characteristics of workload • Wireline • Wireless • Case study results • Statistical characteristics of Web workload • Power laws • Self-similarity • Examples of workload analysis tools • Summary

  3. What? • Content analysis • User behavior analysis • User load distribution • Session duration • Temporal stability • Spatial locality • System load analysis • How do users come to visit the Web site? • Why do users leave the Web site? • What content are users interested in? • How do users’ interests vary over time? • How do users’ interests vary across geographic regions?

  4. Why? • Characteristics of user load have significant implications on • Web site design • Content management • Protocol design • Capacity planning • Content provider: Enhance user experience through more effective design and content management • Service provider: Efficient resource allocation, capacity planning, and pricing • System designer: Shed light on performance bottlenecks and effectiveness of protocols

  5. How? • Gather requirements: what are the goals of the analysis? • Plan and design the data collection • What data to collect? • Over how long a period? • From where? Web proxies, Web browsers and Web servers • What is the scope? How large? How many? • What methods to use? What analysis is needed? How to analyze the data? • Collect the data • Analyze the traces with statistical and mathematical methods • Run the different analyses • Content analysis • User behavior analysis • System load analysis

  6. Wireline user workload characterization (1) Content analysis • Content type • Pure text • Graphics-rich multimedia • The majority is a mix of both • Content size • Size of all content stored on a Web server • Size of content transferred by a Web server • A non-negligible fraction of files is very large • Median transfer size ~2 kB; median content size a few hundred bytes larger • Content popularity • Depends strongly on where the traces are collected • Content modification pattern • Modification patterns vary widely: much content is never modified, while some is modified at least once between two consecutive accesses • Depends on content type, e.g. news Web sites • Most file modifications are small • A document's past modification interval gives a rough prediction of its next modification time

  7. Wireline user workload characterization (2) User behavior analysis • User request arrival and duration • Occur at three levels: session, click and request • User dependent • The number of clicks in a session, the number of embedded images in a Web page, think time, and active time can be modeled with heavy-tailed Pareto distributions • 8 second rule • Temporal locality and stability • If a page is accessed now, how likely is it to be accessed again in the near future? • Stronger temporal locality implies caching would be effective • Access ranking stability: stability is high on the scale of days • Spatial locality • Captures how likely people in the same geographic location or the same organization are to request a similar set of documents • Determines the effectiveness of proxy caching • Organization and domain membership is significant • “Hot” events dominate the membership effect
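The three levels of request arrival can be made concrete with a small sketch that groups one user's request timestamps into sessions and measures their durations. The function names and the 30-minute inactivity timeout are assumptions made for illustration; the slides do not say how sessions were delimited.

```python
from typing import List


def split_sessions(timestamps: List[float], timeout: float = 1800.0) -> List[List[float]]:
    """Group request timestamps (in seconds) into sessions.

    A new session starts whenever the gap to the previous request exceeds
    `timeout`. The 1800 s default is an assumed value, not taken from the slides.
    """
    sessions: List[List[float]] = []
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][-1] <= timeout:
            sessions[-1].append(t)   # same session: gap is within the timeout
        else:
            sessions.append([t])     # first request, or gap too long: new session
    return sessions


def session_durations(sessions: List[List[float]]) -> List[float]:
    """Duration of each session: time between its first and last request."""
    return [s[-1] - s[0] for s in sessions]


# Hypothetical request times for a single user.
requests = [0.0, 5.0, 65.0, 4000.0, 4010.0, 9000.0]
sessions = split_sessions(requests)
durations = session_durations(sessions)
```

With real traces, the resulting session durations and per-session click counts are the quantities one would then compare against a Pareto model.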

  8. Wireline user workload characterization (3) System load analysis • Load varies with time and with recent events, e.g. the World Cup, Sept 11 • Web traffic is self-similar

  9. Wireless user workload characterization • WAP traffic • Access rate is still low, 80,000 entries in 7 months (99) • Amount of data is less than voice • Metropolitan wireless network • Usage behavior shows diurnal and weekly patterns • Users do not move frequently • WLAN • Campus: traffic is session-oriented and chat-oriented; incoming traffic exceeds outgoing traffic; high degree of roaming within sessions; sessions are normally short • Conference: users are evenly distributed across APs; Web and SSH account for 64% of traffic; sessions are short, 60% last less than 10 min; bandwidth distribution is highly uneven across APs • Corporate: different users impose different loads

  10. Case study • “A popular commercial Web site designed for mobile clients” • Provides Web access for wireline, wireless and offline use • Provides notification services • Analyses • Web access • Notifications • Comparison between Web access and notification use • Comparison between wireline and wireless use • Motivation • To give a general overview of the analysis process and data • To show some more concrete results • To illustrate the possibilities of the analyses • To propose direct implications of the results

  11. Case study - architecture

  12. Case study - material • Web access logs • for 12 days (August 2000) • per user • per request • Notification logs • for 6 days • per user • per notification • Types of Web access

  13. Case study – Web content • What content was available for wireless use?

  14. Case study – Web content size • How did retrieved content vary in size? • Replies are small: • 98% of replies for wireless are less than 3 kB • 98% of replies for offline are less than 6 kB • 80% of bytes are carried in replies of size 10 kB or more • Implications: systems could be optimized for small replies

  15. Case study – Web content popularity • How did popularity vary across documents? • Heavy tailed distribution • 0.1-0.5% of documents account for 90% of the requests • Implications: caching could be very effective
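A statistic of this kind can be computed directly from a request log. The sketch below is illustrative only: the function name and the toy log are hypothetical, and it is not the procedure used in the case study. It returns the fraction of distinct documents that, taken in order of decreasing popularity, covers a given share of all requests.

```python
from collections import Counter
from typing import Hashable, Iterable


def docs_needed_for_coverage(requests: Iterable[Hashable], coverage: float = 0.9) -> float:
    """Fraction of distinct documents needed to serve `coverage` of all requests.

    Documents are taken from most to least requested until their cumulative
    request count reaches the target share.
    """
    counts = Counter(requests)
    total = sum(counts.values())
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= coverage * total:
            return rank / len(counts)
    return 1.0  # only reached for an empty log


# Hypothetical log: one very popular document and three rarely requested ones.
log = ["a"] * 95 + ["b"] * 3 + ["c"] + ["d"]
fraction = docs_needed_for_coverage(log, coverage=0.9)
```

The smaller this fraction, the smaller the cache needed to absorb most of the request load.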

  16. Case study – Web user load distribution • How did individual users contribute to the load? • Heavy tailed distribution • A small group of users generates the majority of the load • Implications: different pricing for different user groups is needed

  17. Case study – stability of Web access • How did interest vary during weekdays? • Interests are relatively stable • Of the top 100 popular requests, 80% remain popular during a week • Of the top 1000, 70% • Implications: performance can be optimized over the stable set
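One simple way to quantify this kind of ranking stability is the overlap between the top-N sets of two observation periods. The sketch below assumes that definition; the function names and sample data are hypothetical, and the case study may have measured stability differently.

```python
from collections import Counter
from typing import Hashable, Iterable, Set


def top_n(requests: Iterable[Hashable], n: int) -> Set[Hashable]:
    """Set of the n most frequently requested documents."""
    return {doc for doc, _ in Counter(requests).most_common(n)}


def stability(period_a: Iterable[Hashable], period_b: Iterable[Hashable], n: int) -> float:
    """Fraction of period A's top-n documents that are also in period B's top-n."""
    top_a = top_n(period_a, n)
    if not top_a:
        return 0.0
    return len(top_a & top_n(period_b, n)) / len(top_a)


# Hypothetical request logs for two days.
monday = ["news"] * 5 + ["weather"] * 4 + ["sports"] * 3 + ["stocks"]
tuesday = ["news"] * 6 + ["stocks"] * 4 + ["weather"] * 3 + ["sports"]
overlap_top2 = stability(monday, tuesday, 2)
```

A value close to 1 across consecutive days would indicate a stable hot set that can be cached or otherwise optimized.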

  18. Case study – locality of Web access • Did people in the same region issue similar requests? • Randomly sampled user groups don’t differ from local users • Geographic locality in requests is insignificant • Implications: geographic distribution of servers/content does not require localization

  19. Case study – notification popularity • What type of content was available as notifications and how popular it was?

  20. Case study – notification size • How did notification messages vary in size? • Notifications are small • All messages contain less than 256 bytes • Implications: if delivery is not optimized, the overhead caused by network protocols may be considerable

  21. Case study – notification popularity • How did popularity vary across notifications? • Heavy tailed distribution • Top 1% of notifications accounted for 60% of messages • Implications: multicasting notifications would yield significant savings

  22. Case study – notification load distribution • How did individual users contribute to the notification load? • Heavy tailed distribution • Top 5% of clients received 25% of notification messages • Top 10% received 40% • Implications: different pricing for different user groups needed

  23. Case study – locality of notifications • Did people in the same region receive the same notifications? • Randomly sampled user groups differ from local users • Users in the same region share notification content • Implications: regional differences may be utilized in planning the geographic distribution of servers/content

  24. Correlation between browsing and notification • Limited correlation between a client’s notification and browsing usage • People use the two services for different purposes; the two services deliver different types of content • The result is useful for Web design and pricing plans • Figure: number of users whose top N browsing categories overlap with their top N notification categories

  25. Workload comparison between wireline and mobile Web • Comparison in content • Wireline Web content is richer than wireless content • Content size is smaller in wireless because of limited display and bandwidth • Wireless content shares the Zipf-like popularity distribution of wireline content • Comparison in user behavior • Both are user dependent • Both exhibit temporal stability • Wireless users do not exhibit strong spatial locality (limited content) • Comparison in system load • Both exhibit diurnal and weekly variation • Wireless server load is smaller than wireline server load • The Web site for mobile clients has a more heterogeneous population of users

  26. Power laws • A measure y follows a power law in another measure x when y is proportional to the a-th power of x (y ∝ x^a) • Power-law distributions (a.k.a. heavy-tailed distributions) include e.g. the Zipf and Pareto distributions • Why? Finding a suitable distribution for the observed data allows probabilistic inference on the underlying phenomenon in closed form
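For reference, the power-law relation and the two named distributions can be written in standard textbook notation. The symbols C, x_m, α, s and r are introduced here and do not appear on the slides.

```latex
% Power law: a straight line with slope a on log-log axes
y = C\,x^{a} \quad\Longleftrightarrow\quad \log y = \log C + a \log x

% Pareto distribution: tail probability with scale x_m and shape \alpha
P(X > x) = \left(\frac{x_m}{x}\right)^{\alpha}, \qquad x \ge x_m > 0,\ \alpha > 0

% Zipf's law: frequency of the item with popularity rank r
f(r) \propto r^{-s}, \qquad r = 1, 2, 3, \dots
```

The log-log form is why power laws are usually recognized as straight lines when data are plotted on logarithmic axes.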

  27. Power laws and the Web • Several distributions derived from the topology of the Internet at router and domain level follow a power law • Number of documents per Web site or file system • Size of documents per Web site or file system • Session durations • Links between web pages • Example (a = -0.46):
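An exponent such as the one in the example can be estimated by fitting a straight line on log-log axes. The sketch below assumes an ordinary least-squares fit, which is simple but can be biased for real data (maximum-likelihood estimators are often preferred); the function name and the synthetic data are hypothetical and do not reproduce the slide's figure.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log y = log C + a * log x; returns (a, C).

    All xs and ys must be strictly positive.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    a = sxy / sxx                      # slope on log-log axes = exponent
    c = math.exp(mean_y - a * mean_x)  # intercept gives the constant C
    return a, c


# Synthetic data generated from y = 100 * x^(-0.5), so the fit should recover it.
xs = [1, 2, 4, 8, 16]
ys = [100.0 * x ** -0.5 for x in xs]
a, c = fit_power_law(xs, ys)
```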

  28. Self-similarity • Self-similar (a.k.a. fractal) data: • Maintains its bursty character even when aggregated over a wide range of time scales • Slowly decaying variance • Long-range dependence (not memoryless) • Underlying phenomenon • Data generators that are either ON or OFF • The distributions of ON and OFF times (or message sizes) are heavy tailed • Aggregating many such sources leads to self-similarity • Internet/WWW traffic is self-similar
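The "slowly decaying variance" property suggests a common diagnostic, the variance-time method: average the series over blocks of length m and observe how the variance of the block means decays with m. For a self-similar series the variance decays like m^(2H-2), where H is the Hurst parameter, so H can be read from the slope on log-log axes. The sketch below is a minimal version of that method; the function names are hypothetical, and the method is not claimed to be the one used in the referenced studies.

```python
import math
import random
from typing import List, Sequence


def aggregate(series: Sequence[float], m: int) -> List[float]:
    """Means of consecutive non-overlapping blocks of length m."""
    return [sum(series[i:i + m]) / m for i in range(0, len(series) - m + 1, m)]


def variance(values: Sequence[float]) -> float:
    """Population variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def hurst_variance_time(series: Sequence[float], scales: Sequence[int]) -> float:
    """Estimate H from the log-log slope of Var(block means) versus block size.

    Var ~ m^(2H - 2), so H = 1 + slope / 2. The series must not be constant,
    otherwise the variance is zero and the logarithm is undefined.
    """
    xs = [math.log(m) for m in scales]
    ys = [math.log(variance(aggregate(series, m))) for m in scales]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return 1.0 + slope / 2.0


# Independent samples have no long-range dependence, so H should be near 0.5.
random.seed(0)
iid = [random.random() for _ in range(4096)]
h_iid = hurst_variance_time(iid, [1, 2, 4, 8, 16, 32])
```

Applied to request counts per time interval, an estimate clearly above 0.5 would be consistent with self-similar traffic.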

  29. Self-similarity and the Web

  30. WebTraff: A GUI for Web Proxy Cache Workload Modelling and Analysis • An extended and improved version of ProWGen (Proxy Workload Generator), including a GUI for a useful set of tools for Web traffic modelling and analysis • Purpose: to facilitate easy generation and analysis of controllable and representative workloads for Web caching simulations • The WebTraff toolkit provides three main functions: • Web workload trace generation • Web workload trace analysis • Web proxy cache simulation • Graphs displayed in PostScript format

  31. WebTraff GUI Interface

  32. Web Workload Generation

  33. Web Workload Analysis • Two main categories of analysis functions: • Time series analysis (on the left) • Web workload analysis (on the right) • Radio buttons, slide bars and text boxes available to control plotting characteristics

  34. Requests per Interval (time series plot)

  35. Popularity Distribution plot

  36. Document Size Distribution (zoomed)

  37. Web Proxy Cache Simulation • Application-level caching simulation parameters • Cache size • Cache replacement policy • Five replacement policies currently available • Random replacement (RAND) • First-In-First-Out (FIFO) • Least-Recently-Used (LRU) (default setting) • Least-Frequently-Used (LFU) • Greedy-Dual-Size (GDS)
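To show how a replacement policy affects the hit ratio, here is a minimal trace-driven sketch of two of the listed policies, LRU and FIFO. It is not WebTraff's code: the function names are hypothetical, and it assumes unit-size documents with the cache size counted in documents rather than bytes.

```python
from collections import OrderedDict, deque
from typing import Hashable, Sequence


def simulate_lru(requests: Sequence[Hashable], capacity: int) -> float:
    """Hit ratio of a Least-Recently-Used cache holding `capacity` documents."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache: "OrderedDict[Hashable, bool]" = OrderedDict()
    hits = 0
    for doc in requests:
        if doc in cache:
            hits += 1
            cache.move_to_end(doc)         # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used
            cache[doc] = True
    return hits / len(requests) if requests else 0.0


def simulate_fifo(requests: Sequence[Hashable], capacity: int) -> float:
    """Hit ratio of a First-In-First-Out cache holding `capacity` documents."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    cache = set()
    order = deque()                        # insertion order of cached documents
    hits = 0
    for doc in requests:
        if doc in cache:
            hits += 1                      # a hit does not change FIFO order
        else:
            if len(cache) >= capacity:
                cache.discard(order.popleft())
            cache.add(doc)
            order.append(doc)
    return hits / len(requests) if requests else 0.0


# Hypothetical trace in which document "a" is re-referenced often.
trace = ["a", "b", "a", "c", "a", "b"]
lru_ratio = simulate_lru(trace, 2)
fifo_ratio = simulate_fifo(trace, 2)
```

On this trace LRU keeps the frequently re-referenced document cached while FIFO evicts it, which is the kind of difference a cache simulation is meant to expose.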

  38. For More Information about WebTraff • WebTraff toolkit: • http://www.cpsc.ucalgary.ca/~carey/software.htm • “ProWGen: A Synthetic Workload Generation Tool for the Simulation Evaluation of Web Proxy Caches” • Busari/Williamson, Computer Networks, Vol 38, No 6, June 2002 • http://www.cpsc.ucalgary.ca/~carey/publications.htm • Contact information: • Email {carey,nayden}@cpsc.ucalgary.ca

  39. Summary • Workload characterization provides information that is useful for making better decisions on • Web site/application design • Content management • Protocol design • Capacity planning • Service pricing • etc. • Workload characterization can be obtained through • Gathering requirements for the analyses • Planning of data acquisition • Statistical analyses of the data • Mathematical modeling • There are tools for workload characterization • Power-law and self-similarity characteristics of the load make the Web different from the good old telephony world • The same models and optimizations don’t necessarily apply in both worlds

  40. References • Adya A, Bahl P, Qiu L. “Characterizing Web Workload of Mobile Clients” in “Content Networking in the Mobile Internet”, Ch5. Dixit S, Wu T (eds), 2004 • Adya A, Bahl P, Qiu L. “Characterizing Alert and Browse Services for Mobile Clients”, 2002 • Kramer G. “Self-similar Network Traffic”, 2001 • Fischer MJ, Fowler TB. “Fractals, Heavy-Tails, and the Internet”, 2001 • Markatchev N, Williamson C. “WebTraff: A GUI for Web Proxy Cache Workload Modelling and Analysis”, Department of Computer Science, University of Calgary, 2002
