
Mining Web Logs for Prediction Models in WWW Caching and Prefetching

Source: The Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01), August 26-29, 2001, San Francisco, California, USA





  1. Mining Web Logs for Prediction Models in WWW Caching and Prefetching Source: The Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '01), August 26-29, 2001, San Francisco, California, USA Authors: Qiang Yang, Haining Henry Zhang, Tianyi Li Professor: Dr. Yang Student: Gun-Ren Wang

  2. Outline • Introduction • Page replacement policy • GD-Size & GDSF • Extracting Embedded Object • Mining Frequent Sequences • Prediction Algorithm • Conclusion

  3. Introduction • As the World Wide Web grows at a very rapid rate, researchers have designed various effective caching algorithms to contain network traffic. • An important advantage of the WWW is that many web servers keep a server access log of their users. These logs can be used to train a prediction model for future document accesses.

  4. Performance Metrics • Hit Rate (HR): the ratio of the number of requests that hit in the proxy cache to the total number of requests. • Byte Hit Rate (BHR): the ratio of the number of bytes that hit in the proxy cache to the total number of bytes requested.
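
The two metrics can be illustrated with a short Python sketch. The function name `hit_rates`, the request format, and the fixed `cached` set are assumptions made for illustration; a real proxy cache changes its contents as requests arrive.

```python
def hit_rates(requests, cached):
    """Compute (hit rate, byte hit rate) for a list of (url, size_bytes) requests.

    `cached` is the set of URLs assumed to be in the proxy cache. It is held
    fixed here to keep the example small (an illustrative simplification).
    """
    hits = sum(1 for url, _ in requests if url in cached)
    hit_bytes = sum(size for url, size in requests if url in cached)
    total_bytes = sum(size for _, size in requests)
    return hits / len(requests), hit_bytes / total_bytes


# Hypothetical trace: two of four requests hit, but they are small documents.
requests = [("/a.html", 100), ("/b.gif", 300), ("/a.html", 100), ("/c.mp3", 500)]
hr, bhr = hit_rates(requests, cached={"/a.html"})
print(hr, bhr)
```

The example shows why both metrics are reported: a cache that favors small documents can reach a high hit rate while its byte hit rate stays low.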

  5. Page replacement policy • Least-Recently-Used (LRU): evicts the document that was requested least recently • Least-Frequently-Used (LFU): evicts the document that has been accessed the fewest times • Size: evicts the largest document • Lowest-Latency-First: aims to minimize the average latency

  6. GD-Size Based on the original GD (Greedy-Dual) algorithm, Cao and Irani incorporated the size factor and introduced the Greedy-Dual-Size algorithm for web caching to improve the efficiency of the original GD algorithm. K(P) = L + C(P) / S(P), where C(P) is the cost to bring document P into the cache; S(P) is the document size; L is an aging factor that starts at 0 and is updated to the key value of the last replaced document.

  7. Algorithm GD-Size
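
The slide's original figure is not part of this transcript. As a stand-in, here is a minimal Python sketch of GD-Size built only from the key formula K(P) = L + C(P) / S(P) and the aging rule on the previous slide. The class name `GDSizeCache`, the byte-based `capacity`, and the default cost of 1 are assumptions for illustration, not the authors' implementation.

```python
class GDSizeCache:
    """Sketch of a Greedy-Dual-Size cache (illustrative, not the slide's code)."""

    def __init__(self, capacity):
        self.capacity = capacity  # total bytes available (assumed unit)
        self.used = 0             # bytes currently stored
        self.L = 0.0              # aging factor, starts at 0
        self.docs = {}            # url -> (key, size, cost)

    def access(self, url, size, cost=1.0):
        """Process one request; return True on a cache hit."""
        if url in self.docs:
            # Hit: restore the key using the current aging factor.
            _, size, cost = self.docs[url]
            self.docs[url] = (self.L + cost / size, size, cost)
            return True
        if size > self.capacity:
            return False  # document can never fit; do not cache it
        # Miss: evict lowest-key documents until the new one fits.
        while self.used + size > self.capacity:
            victim = min(self.docs, key=lambda u: self.docs[u][0])
            key, victim_size, _ = self.docs.pop(victim)
            self.L = key  # L becomes the key of the last replaced document
            self.used -= victim_size
        self.docs[url] = (self.L + cost / size, size, cost)
        self.used += size
        return False


cache = GDSizeCache(capacity=10)
cache.access("a", 4)   # key = 0 + 1/4
cache.access("b", 6)   # key = 0 + 1/6, cache is now full
cache.access("c", 5)   # evicts "b" (lowest key) and raises L to 1/6
print(sorted(cache.docs), cache.L)
```

A linear scan finds the victim here for brevity; a priority queue would be the usual choice for larger caches.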

  8. GDSF Cherkasova improved the GD-Size algorithm by incorporating a frequency count in the computation of key values. The result is called the Greedy-Dual-Size-Frequency (GDSF) algorithm. K(P) = L + F(P) * C(P) / S(P), where F(P) is the access count of document P, incremented on every access: F(P) = F(P) + 1. We denote this replacement policy as GDSF. When the cost function is set to the document size, the key reduces to K(P) = L + F(P), which achieves the best byte hit rate.
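
The key formula and its special case can be checked with a few lines of Python. The function name `gdsf_key` and the sample numbers are hypothetical.

```python
def gdsf_key(L, freq, cost, size):
    """GDSF key: K(P) = L + F(P) * C(P) / S(P)."""
    return L + freq * cost / size


# Constant cost of 1 favors small, frequently accessed documents.
small = gdsf_key(L=0.0, freq=4, cost=1, size=8)
large = gdsf_key(L=0.0, freq=4, cost=1, size=64)

# With cost equal to size, the size cancels and the key is L + F(P).
byte_oriented = gdsf_key(L=0.5, freq=3, cost=2048, size=2048)
print(small, large, byte_oriented)
```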

  9. Extracting Embedded Object HTML documents also act as containers of other web objects, such as images, audio, and video files. These objects, which are requested as part of their HTML documents, are called embedded objects.

  10. Mining Frequent Sequences From the graph, we generate N-gram prediction rules of the form S1.S2.S3…Sk-1 → Sk. The confidence of a rule is the conditional probability P(Sk | S1.S2…Sk-1): Conf = count(S1.S2…Sk) / count(S1.S2…Sk-1). If Sk has embedded objects, the following rules can be deduced immediately from the Embedded Object Table (EOT): S1.S2.S3…Sk-1 → Oi, for i from 0 to n, where Conf(i) = Conf * Pi.

  11. Algorithm of mining frequent sequences
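
The slide's original figure is not part of this transcript. The sketch below shows one way to derive N-gram rules with the confidence formula Conf = count(S1.S2…Sk) / count(S1.S2…Sk-1). The function name `mine_rules` and the thresholds `max_len`, `min_support`, and `min_conf` are assumptions for illustration; the paper's actual procedure and pruning rules may differ.

```python
from collections import Counter


def mine_rules(sessions, max_len=3, min_support=2, min_conf=0.3):
    """Return {prefix_tuple: {next_page: confidence}} from page sequences."""
    counts = Counter()
    for session in sessions:
        # Count every contiguous subsequence of length 1..max_len.
        for n in range(1, max_len + 1):
            for i in range(len(session) - n + 1):
                counts[tuple(session[i:i + n])] += 1

    rules = {}
    for seq, count in counts.items():
        if len(seq) < 2 or count < min_support:
            continue  # need a prefix and enough occurrences
        prefix, nxt = seq[:-1], seq[-1]
        conf = count / counts[prefix]
        if conf >= min_conf:
            rules.setdefault(prefix, {})[nxt] = conf
    return rules


# Hypothetical sessions extracted from a server access log.
sessions = [["A", "B", "C"], ["A", "B", "C"], ["A", "B", "D"], ["B", "C"]]
rules = mine_rules(sessions)
for prefix, successors in sorted(rules.items()):
    print(".".join(prefix), "->", successors)
```

In this trace the rule A.B → C gets confidence 2/3, because A.B occurs three times and A.B.C twice, while B → D is dropped for falling below the assumed support threshold.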

  12. Embedded object Table 1

  13. Embedded object Table 2

  14. Prediction Algorithm • The process of building a set of association rules and the EOT is called training. Once training is finished, we can apply these rules to predict future requests by matching the longest path first. • Let O(i) denote a web object on the server, let S(j) be a session for object O(i), and let W(i) be the future frequency of requests to object O(i).

  15. Example

  16. Prediction Algorithm
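
The slide's original figure is not part of this transcript. The following Python sketch illustrates longest-path-first matching as described on slide 14. How W(i) is accumulated is not given in the transcript; summing the confidences of the matched rules over all active sessions is an assumption here, as are the names `predict` and `future_frequency` and the sample rules.

```python
def predict(session, rules, max_len=2):
    """Return {object: confidence} from the longest rule matching the session's tail."""
    for n in range(min(max_len, len(session)), 0, -1):
        suffix = tuple(session[-n:])
        if suffix in rules:
            return rules[suffix]  # longest match wins
    return {}


def future_frequency(sessions, rules, max_len=2):
    """Estimate W(i) by summing predicted confidences over sessions (assumed rule)."""
    W = {}
    for session in sessions:
        for obj, conf in predict(session, rules, max_len).items():
            W[obj] = W.get(obj, 0.0) + conf
    return W


# Hypothetical trained rules and two active sessions.
rules = {("B",): {"C": 0.75}, ("A", "B"): {"C": 0.5, "D": 0.25}}
active = [["A", "B"], ["X", "B"]]
W = future_frequency(active, rules)
print(W)
```

The first session matches the two-page rule A.B, so the shorter rule for B is ignored; the second session has no two-page match and falls back to the rule for B.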

  17. Extend GDSF We extend GDSF to incorporate W(p): K(p) = L + ( W(p) + F(p) ) * C(p) / S(p), which implies that the key value of a page p is determined not only by its past occurrence frequency but is also affected by its predicted future frequency. A weighted form is K(p) = L + ( K*W(p) + (1-K)*F(p) ) * C(p) / S(p), where the weight K (distinct from the key value K(p)) is between 0 and 1.
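
The weighted key can be written as a small Python function. The name `extended_gdsf_key` and the sample values are hypothetical, and the slide's weight K is called `weight` here to avoid clashing with the key value K(p).

```python
def extended_gdsf_key(L, W, F, cost, size, weight=0.5):
    """Weighted key: L + (weight * W + (1 - weight) * F) * cost / size."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return L + (weight * W + (1 - weight) * F) * cost / size


# Equal weighting of predicted future frequency W and past frequency F.
blended = extended_gdsf_key(L=0.0, W=1.25, F=3, cost=1, size=2, weight=0.5)

# weight = 0 ignores the prediction and reduces to the plain GDSF key.
plain = extended_gdsf_key(L=0.0, W=1.25, F=3, cost=1, size=2, weight=0.0)
print(blended, plain)
```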

  18. Conclusion We applied association rules mined from web logs to improve the well-known GDSF algorithm. By integrating path-based prediction with caching and prefetching, it is possible to dramatically improve both the hit rate and the byte hit rate while reducing network latency.
