
On Filter Effects in Web Caching Hierarchies



Presentation Transcript


  1. On Filter Effects in Web Caching Hierarchies. Carey Williamson, Department of Computer Science, University of Calgary

  2. Introduction
     • “The Web is both a blessing and a curse…”
     • Blessing:
        • Internet available to the masses
        • Seamless exchange of information
     • Curse:
        • Internet available to the masses
        • Stress on networks, protocols, servers, users
     • Motivation: techniques to improve the performance and scalability of the Web

  3. Why is the Web so slow?
     • Client-side bottlenecks (PC, modem)
        • Solution: better access technologies
     • Server-side bottlenecks (busy Web site)
        • Solution: faster, scalable server designs
     • Network bottlenecks (Internet congestion)
        • Solutions: caching, replication; improved protocols for client-server communication

  4. Example of a Web Proxy Cache
     [Figure: five Web clients connected through a proxy server to three Web servers]

  5. Our Previous Work
     • Evaluation of Canada’s national Web caching infrastructure for CANARIE’s CA*net II backbone
     • Workload characterization and evaluation of the CA*net II Web caching hierarchy (IEEE Network, May/June 2000)
     • Developed a Web proxy caching simulator for trace-driven simulation evaluation of Web proxy caching architectures
     • Developed a synthetic Web proxy workload generator called ProWGen [Busari/Williamson, INFOCOM 2001]

  6. CA*net II Web Caching Hierarchy (Dec 1998)
     [Figure: map of the hierarchy marking the selected measurement points for our traffic analyses (6-9 months of data from each); labeled nodes include USask, CANARIE (Ottawa), and the link to NLANR]

  7. Caching Hierarchy Overview
     [Figure: a tree of proxy caches with clients (C) at the leaves; cache sizes and empirically observed cache hit ratios per level]
     • Top-level/international proxy (20-50 GB): 5-10% hit ratio
     • National proxies (10-20 GB): 15-20% hit ratio
     • Regional/university proxies (5-10 GB): 30-40% hit ratio
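
The per-level hit ratios above compound as requests move up the tree. A minimal sketch of that arithmetic, assuming each level's hit ratio is measured on the requests that reach it (i.e., the misses of the level below) and taking the midpoint of each range on the slide; `hierarchy_hit_ratios` is a hypothetical helper, not part of the original work:

```python
def hierarchy_hit_ratios(level_hit_ratios):
    """Return (share of all client requests served at each level, overall hit ratio).

    level_hit_ratios is ordered from the lowest level (closest to clients) upward.
    Assumption: each level's hit ratio applies only to requests that reach it.
    """
    reaching = 1.0      # fraction of client requests that reach the current level
    shares = []
    for h in level_hit_ratios:
        shares.append(reaching * h)   # requests satisfied at this level
        reaching *= 1.0 - h           # misses forwarded to the next level up
    return shares, 1.0 - reaching


# Midpoints of the observed ranges: regional 30-40%, national 15-20%, top-level 5-10%.
shares, overall = hierarchy_hit_ratios([0.35, 0.175, 0.075])
print(shares, overall)
```

Under these assumed inputs the regional, national, and top-level caches serve about 35%, 11.4%, and 4.0% of all client requests, for an overall hit ratio of roughly 50%: each added level contributes less than the one below it, which is the “diminishing returns” effect described on slide 8.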

  8. Some Observations on Multi-Level Caching...
     • Caching hierarchy not very effective, due to a “diminishing returns” effect
     • Reason: workload characteristics change as you move up the caching hierarchy (due to filtering effects, etc.)
     • Bigger caches aren’t really the answer
     • Better caching system design might be...

  9. Research Goals
     • Develop a better understanding of cache filter effects (intuitively, quantitatively)
     • Try to do something about it!
     • Idea #1: Try different cache replacement policies at different levels of the hierarchy
     • Idea #2: Try partitioning cache content in the overall hierarchy based on size or type to limit replication, etc.

  10. Talk Overview
     • Background/Motivation
     • Understanding Cache Filtering Effects
     • Exploiting Cache Filtering Effects
     • Summary and Conclusions

  11. Part I: Understanding Cache Filter Effects

  12. Simulation Model
     [Figure: Web clients connect to lower-level (child) proxy servers, which sit below an upper-level (parent) proxy server in front of the Web servers]

  13. Experimental Methodology
     • Trace-driven simulation (empirical traces)
     • Multi-factor experimental design
        • Cache size: 1 MB to 32 GB
        • Cache replacement policy:
           • Recency-based LRU (currently active docs)
           • Frequency-based LFU-Aging (popular docs)
           • Size-based GD-Size (favours smaller docs)
     • Analyze workload characteristics
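
Two of the replacement policies above can be sketched as byte-capacity caches. This is an illustrative sketch, not the simulator used in the study: `LRUCache` and `GDSizeCache` are hypothetical names, documents are evicted until the new one fits, and GD-Size is shown as GreedyDual-Size with unit cost, where each document's priority is H = L + 1/size and L is raised to the H of each evicted document (this is what makes it favour smaller docs). LFU-Aging is omitted for brevity.

```python
from collections import OrderedDict


class LRUCache:
    """Byte-capacity cache that evicts the least recently used document."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.docs = OrderedDict()  # doc_id -> size, least recently used first

    def request(self, doc, size):
        """Return True on a hit; on a miss, insert the document if it can fit."""
        if doc in self.docs:
            self.docs.move_to_end(doc)  # mark as most recently used
            return True
        if size > self.capacity:
            return False  # larger than the whole cache: never stored
        while self.used + size > self.capacity:
            _, evicted_size = self.docs.popitem(last=False)
            self.used -= evicted_size
        self.docs[doc] = size
        self.used += size
        return False


class GDSizeCache:
    """GreedyDual-Size with unit cost: priority H = L + 1/size."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.inflation = 0.0  # the "L" value, set to the H of the last victim
        self.h = {}           # doc_id -> H value
        self.sizes = {}       # doc_id -> size in bytes

    def request(self, doc, size):
        if doc in self.h:
            self.h[doc] = self.inflation + 1.0 / self.sizes[doc]  # refresh on hit
            return True
        if size > self.capacity:
            return False
        while self.used + size > self.capacity:
            victim = min(self.h, key=self.h.get)  # O(n) scan; a heap would scale better
            self.inflation = self.h.pop(victim)
            self.used -= self.sizes.pop(victim)
        self.h[doc] = self.inflation + 1.0 / size
        self.sizes[doc] = size
        self.used += size
        return False


lru = LRUCache(capacity=100)
print([lru.request(d, s) for d, s in [("a", 60), ("b", 30), ("a", 60), ("c", 40), ("b", 30)]])
```

In the LRU run, the re-reference to "a" is a hit, but the later request for "b" misses because "b" was the least recently used document when "c" needed space. With GD-Size, a large document has a small 1/size term, so it is usually the first candidate for eviction.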

  14. Web Workload Characteristics
     • “One-timers”: 60-70% of documents are referenced only once, so caching them is useless!
     • Zipf-like document referencing popularity
     • Heavy-tailed file size distribution (i.e., most files are small, but most bytes are in big files)
     • No correlation between document size and document popularity (debated!)
     • Temporal locality (correlation between references in the recent past and in the near future)
     [Mahanti et al. PER 2000]
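
The one-timer statistic can be measured directly from a trace. A minimal sketch, assuming a trace is simply a sequence of document identifiers in request order; `one_timer_fraction` is a hypothetical helper:

```python
from collections import Counter


def one_timer_fraction(trace):
    """Fraction of distinct documents that are referenced exactly once.

    `trace` is any iterable of document identifiers (e.g. URLs) in request order.
    """
    counts = Counter(trace)
    if not counts:
        return 0.0
    one_timers = sum(1 for c in counts.values() if c == 1)
    return one_timers / len(counts)


# Hypothetical toy trace: "c" and "d" are one-timers, so 2 of 4 documents.
print(one_timer_fraction(["a", "b", "a", "c", "d", "a", "b"]))  # 0.5
```

A one-timer can never produce a cache hit, so any space it occupies is wasted; this is why a high one-timer fraction limits achievable hit ratios regardless of replacement policy.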

  15. Zipf-Like Referencing
     • An intrinsic “power-law” relationship in the way that humans organize, access, and use information (e.g., library books, English words in text, movie rentals, Web sites, Web pages, ...)
     • Plotting item popularity versus relative rank on a log-log scale results in a straight line (popularity ∝ rank^-α, so the slope of the line is -α)
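
The straight-line property suggests a simple way to estimate the Zipf slope of a trace: rank documents by reference count and fit a least-squares line to log(count) versus log(rank). A minimal sketch, assuming an ordinary unweighted fit over all ranks (the study may have fitted differently); `zipf_slope` is a hypothetical helper:

```python
import math
from collections import Counter


def zipf_slope(trace):
    """Least-squares slope of log10(reference count) versus log10(rank).

    For a Zipf-like workload the points fall near a straight line whose slope
    is about -alpha. Needs at least two distinct documents.
    """
    counts = sorted(Counter(trace).values(), reverse=True)
    xs = [math.log10(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log10(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical trace whose counts are exactly 2520/rank for ranks 1..10,
# i.e. a perfect Zipf profile with alpha = 1.
trace = [doc for rank in range(1, 11) for doc in [f"doc{rank}"] * (2520 // rank)]
print(round(zipf_slope(trace), 6))  # -1.0
```

A steeper (more negative) slope means references are more concentrated on a few popular documents; a flatter slope means popularity is spread more evenly.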

  16. Example: Zipf-Like Document Popularity Profile for UofS Trace

  17-23. Quiz Time: What do you get AFTER the cache?
     [Figure: candidate answers (a), (b), (c), and (d), revealed one at a time]
     Answer: (c)

  24. Simulation Results for Input Workload Traces with Different Initial Zipf Slopes

  25. The Magnitude of the Filter Effect Depends on Cache Size

  26. Filter Effect Depends on Cache Replacement Policy

  27. Filter Effect is Most Pronounced at First-Level Cache
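
The filter effect can be reproduced in a few lines: generate a Zipf-like request stream, pass it through a cache, and examine the stream of misses that the next level up would receive. This is an illustrative sketch under assumed parameters (1,000 equal-sized documents, α = 1, an LRU cache holding 100 documents, a fixed random seed), not the trace-driven experiment from the talk; `zipf_trace` and `lru_miss_stream` are hypothetical helpers.

```python
import random
from collections import Counter, OrderedDict


def zipf_trace(num_docs, alpha, length, seed=1):
    """Synthetic trace in which document r is requested with probability ~ 1/r**alpha."""
    rng = random.Random(seed)
    docs = range(1, num_docs + 1)
    weights = [1.0 / r ** alpha for r in docs]
    return rng.choices(docs, weights=weights, k=length)


def lru_miss_stream(trace, slots):
    """Pass a trace through an LRU cache of `slots` equal-sized documents and
    return the requests that miss, i.e. the workload seen by the parent cache."""
    cache = OrderedDict()
    misses = []
    for doc in trace:
        if doc in cache:
            cache.move_to_end(doc)  # hit: refresh recency
            continue
        misses.append(doc)
        cache[doc] = True
        if len(cache) > slots:
            cache.popitem(last=False)  # evict least recently used
    return misses


trace = zipf_trace(num_docs=1000, alpha=1.0, length=50_000)
misses = lru_miss_stream(trace, slots=100)
print("input top-5 counts:   ", [c for _, c in Counter(trace).most_common(5)])
print("filtered top-5 counts:", [c for _, c in Counter(misses).most_common(5)])
```

In this setup the most popular documents almost always hit in the child cache, so they nearly vanish from the miss stream; the head of the popularity profile seen by the parent is flattened, which leaves the parent cache far less locality to exploit.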

  28. Part II: Exploiting Cache Filter Effects

  29. Research Questions: Multi-Level Caches
     • In a multi-level caching hierarchy, can overall caching performance be improved by using different cache replacement policies at different levels of the hierarchy?
     • In a multi-level caching hierarchy, can overall performance be improved by keeping disjoint document sets at each level of the hierarchy?

  30. Simulation Model
     [Figure: the two-level model of Web clients, lower-level (child) proxy servers, an upper-level (parent) proxy server, and Web servers, with three cases for the overlap among child workloads: complete overlap, partial overlap (50%), and no overlap]

  31. Performance Metrics
     • Document Hit Ratio (HR)
        • Percent of requested docs found in cache
     • Byte Hit Ratio (BHR)
        • Percent of requested bytes found in cache
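
Both metrics can be computed from the same simulation log. A minimal sketch, assuming the log is a list of (document, size in bytes) requests plus a parallel list of hit/miss outcomes; `hit_ratios` is a hypothetical helper:

```python
def hit_ratios(requests, hits):
    """Return (HR, BHR) as percentages.

    requests: list of (doc_id, size_in_bytes) in request order.
    hits: parallel list of booleans, True where the request hit in the cache.
    """
    total_docs = len(requests)
    total_bytes = sum(size for _, size in requests)
    hit_docs = sum(1 for h in hits if h)
    hit_bytes = sum(size for (_, size), h in zip(requests, hits) if h)
    return 100.0 * hit_docs / total_docs, 100.0 * hit_bytes / total_bytes


requests = [("a", 1_000), ("b", 9_000), ("a", 1_000), ("a", 1_000)]
hits = [False, False, True, True]
print(hit_ratios(requests, hits))  # HR = 50.0, BHR is about 16.7
```

Because hits on small documents raise HR much more than BHR, a policy that favours small files (such as GD-Size) can improve one metric while hurting the other.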

  32. Experiment 1: Different Policies at Different Levels of the Hierarchy
     [Figure: (a) Hit Ratio and (b) Byte Hit Ratio, with results for the parent and children caches]

  33. [Figure: results for the parent and children caches]

  34. Experiment 2: Sensitivity to Workload Overlap
     • The greater the degree of workload overlap amongst the child proxies, the greater the role for the parent cache
     • In the “no overlap” scenario, the parent cache has negligible hit ratios, particularly when child caches are large

  35. Experiment 3: Size-based Partitioning
     • Partition files across the two levels of the hierarchy based on size (e.g., keep small files at the lower level and large files at the upper level, or vice versa)
     • Three size thresholds for “small”...
        • 5,000 bytes
        • 10,000 bytes
        • 100,000 bytes
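
Size-based partitioning can be sketched as a routing rule in front of two caches, so that each document is stored at exactly one level and never replicated. This is a hypothetical sketch, not the study's simulator: it assumes a document is “small” when its size is at most the threshold, uses a simple byte-capacity LRU at both levels, and the names `LRU` and `partitioned_request` are illustrative.

```python
from collections import OrderedDict


class LRU:
    """Minimal byte-capacity LRU cache (hypothetical helper for this sketch)."""

    def __init__(self, capacity):
        self.capacity, self.used, self.docs = capacity, 0, OrderedDict()

    def lookup(self, doc):
        if doc in self.docs:
            self.docs.move_to_end(doc)  # refresh recency on a hit
            return True
        return False

    def insert(self, doc, size):
        if size > self.capacity:
            return  # too large to store at this level
        while self.used + size > self.capacity:
            self.used -= self.docs.popitem(last=False)[1]
        self.docs[doc] = size
        self.used += size


def partitioned_request(child, parent, doc, size, threshold, small_at_child=True):
    """Serve one request from a two-level hierarchy in which each document is
    cached at a single level chosen by its size. Returns "child", "parent",
    or "server" to indicate where the request was satisfied."""
    is_small = size <= threshold
    cache = child if is_small == small_at_child else parent
    level = "child" if cache is child else "parent"
    if cache.lookup(doc):
        return level
    cache.insert(doc, size)  # fetched from the origin server, stored at one level only
    return "server"


child, parent = LRU(50_000), LRU(500_000)
for doc, size in [("logo.gif", 2_000), ("video.mpg", 400_000),
                  ("logo.gif", 2_000), ("video.mpg", 400_000)]:
    print(doc, partitioned_request(child, parent, doc, size, threshold=10_000))
```

Setting `small_at_child=False` gives the reversed arrangement (large files at the lower level, small files at the upper level), so both variants can be compared on the same trace.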

  36. Small files at the lower level; large files at the upper level
     [Figure: results for the children and parent caches, with size thresholds of 5,000 and 10,000 bytes]

  37. Large files at the lower level; small files at the upper level
     [Figure: results for the children and parent caches, with size thresholds of 5,000 and 10,000 bytes]

  38. Summary: Multi-Level Caches
     • Different policies at different levels
        • LRU/LFU-Aging at the lower level + GD-Size at the upper level provided improvement in performance
        • GD-Size at both levels provided better performance in hit ratio, but with some penalty in byte hit ratio
     • Size-threshold approach
        • Small files at the lower level + large files at the upper level provided improvement in performance
        • Reversing this policy offered no performance advantage

  39. Conclusions
     • Existing multi-level caching hierarchies are not always that effective, due to cache filtering effects
     • “Heterogeneous” caching architectures may better exploit workload characteristics and improve Web caching performance

  40. For More Information...
     • M. Busari, “Simulation Evaluation of Web Caching Hierarchies”, M.Sc. Thesis, Dept. of Computer Science, U. Saskatchewan, June 2000
     • C. Williamson, “On Filter Effects in Web Caching Hierarchies”, ACM Transactions on Internet Technology, 2002 (to appear)
     • Email: carey@cpsc.ucalgary.ca
     • http://www.cpsc.ucalgary.ca/~carey/
