
Professor: Wan-Shiou Yang Presenter: He-Min Chu Date: 2005/11/04

Mining Web Logs for Prediction Models in WWW Caching and Prefetching. Qiang Yang, Haining Henry Zhang, Carolina Ruiz.


Presentation Transcript


  1. Mining Web Logs for Prediction Models in WWW Caching and Prefetching • Qiang Yang, Haining Henry Zhang, Carolina Ruiz • Professor: Wan-Shiou Yang • Presenter: He-Min Chu • Date: 2005/11/04

  2. Outline • Introduction • Previous Work In Proxy caching And Prefetching • Building Association-based Prediction Models • Experimental Results • Integrated Predictive Caching And Prefetching • Conclusions And Future Work

  3. Introduction • The WWW is growing fast, so researchers need to contain network traffic; web caching is one way to do so. • Performance improvement strategies • Web caching maintains a small but highly efficient set of retrieved results in a cache. • Prefetching retrieves documents that are highly likely to be requested in the near future.

  4. Introduction (Con.) • Many web servers keep an access log of their users' requests. • These logs can be used to train a prediction model for future document accesses. • Obtain frequent access patterns from web logs and mine association rules for path prediction. • Integrate the association-based prediction model into proxy caching and prefetching algorithms.

  5. Previous Work In Proxy caching And Prefetching • Page replacement policy: decides which existing page a new page will replace. • Objects are ranked by a key value computed from factors such as size, frequency, and cost. When a replacement is needed, lower-ranked objects are evicted from the cache first. • Example: GDSF (Greedy-Dual-Size-Frequency) ranks page p by K(p) = L + F(p) * C(p) / S(p), where F(p) is the access frequency, C(p) the retrieval cost, S(p) the size, and L an aging factor.
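
The ranking-and-eviction idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' implementation: the class and method names (`GDSFCache`, `access`) are hypothetical, the cost defaults to 1, and L is updated to the key of the last evicted object, which is the usual GDSF convention and is assumed here because the slide does not spell it out.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    size: int          # S(p): object size in bytes
    cost: float        # C(p): cost of fetching the object
    freq: int = 0      # F(p): number of accesses while cached
    key: float = 0.0   # K(p): current rank


class GDSFCache:
    """Hypothetical GDSF sketch: the lowest-ranked object is evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity   # total bytes available
        self.used = 0              # bytes currently occupied
        self.L = 0.0               # aging factor (assumed update rule, see _evict)
        self.entries = {}          # url -> Entry

    def access(self, url, size, cost=1.0):
        """Record one request and return True on a cache hit."""
        entry = self.entries.get(url)
        hit = entry is not None
        if not hit:
            if size > self.capacity:
                return False       # object can never fit, so it is not cached
            while self.used + size > self.capacity:
                self._evict()
            entry = Entry(size=size, cost=cost)
            self.entries[url] = entry
            self.used += size
        entry.freq += 1
        # K(p) = L + F(p) * C(p) / S(p)
        entry.key = self.L + entry.freq * entry.cost / entry.size
        return hit

    def _evict(self):
        victim = min(self.entries, key=lambda u: self.entries[u].key)
        self.L = self.entries[victim].key   # assumption: L becomes the evicted key
        self.used -= self.entries[victim].size
        del self.entries[victim]
```

Raising L on every eviction means that objects which were popular long ago eventually rank below recently inserted ones, even if their frequency count is higher.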

  6. Previous Work In Proxy caching And Prefetching (Con.) • Previous work • Prefetching popular documents • Prefetching the pages referenced by hyperlinks • Considering the access frequency of the hyperlinks • This work • Extracts useful knowledge from large-scale web logs and applies it to web caching and prefetching.

  7. Building Association-based Prediction Models • Extracting embedded objects • such as images, audio, and video files • Mining frequent sequences • Accumulate the occurrence count of each sequence and prune sequences whose support is lower than the minimum support.
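
As a rough illustration of the counting-and-pruning step, the sketch below counts contiguous URL subsequences per user session and keeps those that meet a minimum support. The function name, the `max_len` limit, and the assumption that the log has already been split into sessions are all hypothetical; the slides do not give these details.

```python
from collections import Counter


def mine_frequent_sequences(sessions, max_len=3, min_support=2):
    """Count contiguous URL subsequences and keep the frequent ones.

    sessions: list of sessions, each a list of requested URLs in order.
    Returns a dict mapping each frequent sequence (a tuple) to its count.
    """
    counts = Counter()
    for session in sessions:
        for n in range(1, max_len + 1):
            for i in range(len(session) - n + 1):
                counts[tuple(session[i:i + n])] += 1
    # Prune sequences whose support is lower than the minimum support.
    return {seq: c for seq, c in counts.items() if c >= min_support}


sessions = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"]]
frequent = mine_frequent_sequences(sessions)
```

With these three toy sessions, the sequence `("A", "B")` appears twice and survives pruning, while `("B", "D")` appears once and is dropped.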

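  8. Building Association-based Prediction Models (Con.) • Constructing association rules • S1 S2 … SK-1 -> SK (conf): the visited path S1 … SK-1 predicts the next page SK with confidence conf. • S1 S2 … SK-1 -> Oi (conf): the same path predicts the embedded object Oi with confidence conf.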
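
One way to turn sequence counts into rules of this form is sketched below. It assumes the standard association-rule definition of confidence, support(S1 … SK) / support(S1 … SK-1), which the slide does not state explicitly; `build_rules` and `min_conf` are hypothetical names. The same code covers rules that predict an embedded object if objects are included in the counted sequences.

```python
def build_rules(support, min_conf=0.5):
    """Build rules prefix -> (target, conf) from sequence support counts.

    support: dict mapping a sequence (tuple of URLs) to its occurrence count.
    Returns a dict mapping each prefix to a list of (target, conf) pairs,
    sorted by decreasing confidence.
    """
    rules = {}
    for seq, count in support.items():
        if len(seq) < 2:
            continue                      # a rule needs a prefix and a target
        prefix, target = seq[:-1], seq[-1]
        if prefix not in support:
            continue                      # prefix was pruned, so no rule is built
        conf = count / support[prefix]    # assumed definition of confidence
        if conf >= min_conf:
            rules.setdefault(prefix, []).append((target, conf))
    for targets in rules.values():
        targets.sort(key=lambda t: t[1], reverse=True)
    return rules


support = {("A",): 4, ("A", "B"): 3, ("A", "C"): 1}
rules = build_rules(support, min_conf=0.2)
```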

  9. Building Association-based Prediction Models (Con.) • Prediction Algorithm

  10. Experimental Results • W(p): the predicted future access frequency of page p, added to the GDSF key • Rank key value: K(p) = L + (W(p) + F(p)) * C(p) / S(p)
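
The slide gives the extended key but not how W(p) is obtained. The sketch below makes one plausible assumption: for each active session, the rule whose left-hand side matches the longest suffix of the session's path fires, and W(p) is the sum of the confidences with which p is predicted. Both function names and this matching scheme are assumptions for illustration only.

```python
def predicted_frequency(active_sessions, rules):
    """Estimate W(p) by summing rule confidences over active sessions.

    active_sessions: list of paths (lists of URLs) currently in progress.
    rules: dict mapping a prefix tuple to a list of (page, conf) pairs.
    """
    w = {}
    for path in active_sessions:
        # Try the longest suffix of the path first, then shorter ones.
        for start in range(len(path)):
            suffix = tuple(path[start:])
            if suffix in rules:
                for page, conf in rules[suffix]:
                    w[page] = w.get(page, 0.0) + conf
                break
    return w


def rank_key(L, W, F, C, S):
    """K(p) = L + (W(p) + F(p)) * C(p) / S(p)."""
    return L + (W + F) * C / S


rules = {("A", "B"): [("C", 0.5)], ("B",): [("D", 0.25)]}
weights = predicted_frequency([["A", "B"], ["X", "B"]], rules)
key_for_c = rank_key(L=0.0, W=weights["C"], F=2, C=1.0, S=10)
```

A page with no past accesses (F(p) = 0) can still earn a non-zero rank through W(p), which is what lets the predictive policy keep pages that are about to be requested.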

  11. Experimental Results (Con.)

  12. Experimental Results (Con.) • Data log sources • EPA: 24 hours • NASA: 17 days • GDSP • Hit ratio: access_hits / access_times • Byte hit ratio: hit_bytes / access_bytes
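
The two metrics can be computed from a replayed trace as follows. The variable names follow the slide; the trace format, a list of (size, was_hit) pairs, is an assumption made for this example.

```python
def hit_ratios(trace):
    """Return (hit ratio, byte hit ratio) for a replayed request trace.

    trace: iterable of (size_in_bytes, was_hit) pairs, one per request.
    """
    access_times = access_bytes = access_hits = hit_bytes = 0
    for size, was_hit in trace:
        access_times += 1
        access_bytes += size
        if was_hit:
            access_hits += 1
            hit_bytes += size
    if access_times == 0 or access_bytes == 0:
        return 0.0, 0.0                   # empty trace: avoid dividing by zero
    return access_hits / access_times, hit_bytes / access_bytes


trace = [(100, True), (300, False), (100, True), (500, False)]
hit_ratio, byte_hit_ratio = hit_ratios(trace)   # 0.5 and 0.2
```

The two numbers differ here because the hits fall on small objects: half of the requests are served from the cache, but only a fifth of the bytes.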

  13. Experimental Results (Con.) • The n-gram-based algorithm outperforms the other algorithms at all of the selected cache sizes. • Users' access patterns are much more stable over this extended period of time.

  14. Experimental Results (Con.)

  15. Integrated Predictive Caching And Prefetching • Hit rate and byte hit rate do not increase as fast as the cache size does. • Trade a minor hit rate loss in caching for a greater reduction of network latency from prefetching. • Almost all prefetching methods require a prediction model; here the n-gram model is used.

  16. Integrated Predictive Caching And Prefetching (Con.) • Partition memory into two parts • cache-buffer • prefetch-buffer • A prefetching agent keeps pre-loading the prefetch-buffer with the documents predicted to have the highest Wi. • If a hit occurs in the prefetch-buffer, the requested object is moved into the cache-buffer according to the original replacement algorithm.
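
The two-buffer arrangement can be sketched as below. This is a simplified illustration under stated assumptions: buffers are sized in object slots rather than bytes, plain LRU stands in for the original replacement algorithm, and the class and method names are hypothetical.

```python
from collections import OrderedDict


class PartitionedProxy:
    """Hypothetical proxy memory split into a cache-buffer and a prefetch-buffer."""

    def __init__(self, cache_slots, prefetch_slots):
        self.cache_slots = cache_slots        # assumed to be at least 1
        self.prefetch_slots = prefetch_slots
        self.cache = OrderedDict()            # cache-buffer (LRU as a stand-in policy)
        self.prefetch = {}                    # prefetch-buffer: url -> predicted weight

    def refill_prefetch(self, weights):
        """Pre-load the documents with the highest predicted weight Wi."""
        candidates = {u: w for u, w in weights.items() if u not in self.cache}
        top = sorted(candidates, key=candidates.get, reverse=True)
        self.prefetch = {u: candidates[u] for u in top[: self.prefetch_slots]}

    def request(self, url):
        """Serve one request and report where it was found."""
        if url in self.cache:
            self.cache.move_to_end(url)       # refresh LRU position
            return "cache"
        source = "prefetch" if url in self.prefetch else "miss"
        self.prefetch.pop(url, None)          # a prefetch hit moves to the cache-buffer
        self._admit(url)
        return source

    def _admit(self, url):
        while len(self.cache) >= self.cache_slots:
            self.cache.popitem(last=False)    # evict the least recently used object
        self.cache[url] = True


proxy = PartitionedProxy(cache_slots=2, prefetch_slots=1)
proxy.refill_prefetch({"a": 0.9, "b": 0.4})
first = proxy.request("a")    # found in the prefetch-buffer
second = proxy.request("a")   # now served from the cache-buffer
```

Only a prefetch hit avoids the network round trip on first access, so the size of the prefetch-buffer controls how much cache space is traded for lower latency.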

  17. Integrated Predictive Caching And Prefetching (Con.) • Reduce network latency

  18. Integrated Predictive Caching And Prefetching (Con.) • Increase network load

  19. Conclusions And Future Work • Applied association rules mined from web logs to improve the GDSF algorithm. • By integrating path-based predictive caching and prefetching, it is possible to improve both the hit rate and the byte hit rate. • The approach can be extended by taking into account other statistical features, such as data transmission rates.
