
Hybrid Prefetching for WWW Proxy Servers

Hybrid Prefetching for WWW Proxy Servers. Yui-Wen Horng, Wen-Jou Lin, Hsing Mei. Department of Computer Science and Information Engineering, Fu Jen Catholic University, Taiwan, R.O.C. International Conference on Parallel and Distributed Systems, 1998. Mikt Tien, Miketien@syslab.cse.yzu.edu.tw


Presentation Transcript


  1. Hybrid Prefetching for WWW Proxy Servers. Yui-Wen Horng, Wen-Jou Lin, Hsing Mei. Department of Computer Science and Information Engineering, Fu Jen Catholic University, Taiwan, R.O.C. International Conference on Parallel and Distributed Systems, 1998. Mikt Tien, Miketien@syslab.cse.yzu.edu.tw, Syslab Yan Zen

  2. Outline • 1. Introduction • 2. Related Work • 3. Prefetching Mechanism • 4. Experiment Results • 5. Conclusion and Future Work

  3. 1. Introduction • Depending on where the cache is located, caches can be classified into three types: client cache, server cache, and proxy cache. • Some studies show that the maximum possible hit rate of a proxy cache is about 30%-50%. To overcome this limit, prefetching is a clear solution. • Prefetchers can likewise be classified into three types: client prefetcher, server prefetcher, and proxy prefetcher. • A client prefetcher can analyze one user's requests to predict future requests, while a proxy prefetcher can gather information about requests from multiple clients to multiple servers.

  4. 2. Related Work • Interactive Prefetching Proxy Server (Wcol) (content parsing) -- Finds linked documents (including images) by parsing HTML pages. -- Advantage: the cache hit rate is more than 60%. -- Disadvantage: traffic is 4.12 times larger than that of a normal caching proxy, and parsing HTML also adds overhead to the server.

  5. Related Work (cont.) • Top-10 Approach -- Requires cooperation between the web server, the proxy, and the client browser. Higher-level servers know which documents are popular among their lower-level clients. -- Advantage: the hit rate is more than 40%, and the increase in traffic is no more than 10% in most cases. -- Disadvantage: to achieve good prediction, all proxies and servers need to follow the same policy. That is the major implementation problem.

  6. Related Work (cont.) • Predictive Prefetching -- The prefetcher is installed in the client but communicates with a prediction engine that is part of the web server. The engine tracks client request sequences and builds a dependency graph containing probability information, so the prefetcher can prefetch the files with high access probability. -- Disadvantage: requires a specially designed protocol or a modification to HTTP.

  7. Related Work (cont.) • Prefetching File System for WWW Servers -- Uses the "Referer" information contained in HTTP request messages to build an access probability graph. "Referer" is an HTTP request header that indicates which URL the requested URL was linked from. -- Advantage: response time can be reduced by more than 20%. -- Disadvantage: not all requests contain this information, and it takes time to accumulate enough data to build the graph.

  8. Related Work (cont.) • Our approach -- A hybrid prefetcher that both parses HTML and builds an access probability graph. To prefetch more intelligently, both access popularity and access probability are considered.

  9. 3. Prefetching Mechanism

  10. 3.1 Problem 1: How to find more documents that may be requested in the near future? • Prefetch by Parsing HTML -- Needs no information from past request history and can find related URLs even if the requested URL has never been retrieved before. -- However, it increases the overhead on the server and increases traffic.
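
The parsing step described on this slide can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation; the names `LinkExtractor` and `extract_links`, and the choice to follow only `<a href>` and `<img src>`, are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href> and <img src> tags (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a":
            target = attr_map.get("href")
        elif tag == "img":
            target = attr_map.get("src")
        else:
            target = None
        if target:
            # Resolve relative links against the page URL and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(self.base_url, target))
            if absolute not in self.links:
                self.links.append(absolute)


def extract_links(base_url, html_text):
    """Return prefetch candidates found in one HTML page, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    return parser.links
```

Every page passed through `extract_links` costs parsing time, and every returned URL is a potential extra fetch, which is where the overhead and traffic noted on the slide come from.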

  11. 3.1 Problem 1: How to find more documents that may be requested in the near future? (cont.) • Prefetch by Referer -- Builds a "referer link graph". -- The accumulated weight of each node and edge can also be used to calculate access probability, which is useful for prefetching. -- Disadvantage: maintaining the graph increases memory overhead, and not all requests contain referer information.
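
One way to picture the referer link graph is the sketch below. The slide does not give the probability formula, so the ratio used here (edge weight divided by the weight of the referring node) is an assumption, as are the names `RefererGraph`, `record`, and `probability`.

```python
from collections import defaultdict


class RefererGraph:
    """Hypothetical referer link graph with accumulated node and edge weights."""

    def __init__(self):
        # node_weight[url]: how many times url was requested.
        self.node_weight = defaultdict(int)
        # edge_weight[referer][url]: how many requests for url named referer.
        self.edge_weight = defaultdict(lambda: defaultdict(int))

    def record(self, url, referer=None):
        """Update the graph for one request; referer is None if the header is absent."""
        self.node_weight[url] += 1
        if referer:
            self.edge_weight[referer][url] += 1

    def probability(self, referer, url):
        """Assumed estimate: fraction of visits to referer that were followed by url."""
        visits = self.node_weight.get(referer, 0)
        if visits == 0:
            return 0.0
        followed = self.edge_weight.get(referer, {}).get(url, 0)
        # Cap at 1.0 because the proxy may not have seen every visit to referer.
        return min(1.0, followed / visits)
```

The two dictionaries grow with the number of distinct URLs and links seen, which corresponds to the memory overhead mentioned on the slide.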

  12. 3.1 Problem 1: How to find more documents that may be requested in the near future? (cont.) • Hybrid Prefetch -- If the referer exists, use it to build the "referer link graph"; otherwise, parse the HTML file to build the link graph. -- Fewer HTML files require parsing than in the first approach, so the CPU overhead is smaller.
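
The hybrid rule on this slide amounts to a simple branch, sketched below. The function names, the `parse_links` callback, and the choice to give parsed (not yet observed) links a weight of 0 are assumptions for illustration; the slides do not specify these details.

```python
from collections import defaultdict


def new_link_graph():
    """Link graph as source URL -> {target URL: weight} (hypothetical layout)."""
    return defaultdict(lambda: defaultdict(int))


def update_link_graph(graph, url, referer=None, parse_links=None):
    """Apply the hybrid rule for one request and report which path was taken.

    parse_links is a caller-supplied function that returns the URLs linked
    from the HTML document at `url`; it is only called when no referer exists.
    """
    if referer:
        # Cheap path: the Referer header already tells us the link.
        graph[referer][url] += 1
        return "referer"
    if parse_links is not None:
        # Fallback path: parse the document to discover its outgoing links.
        for target in parse_links(url):
            graph[url].setdefault(target, 0)  # known link, not yet observed
        return "parsed"
    return "skipped"
```

Because the parser runs only on the fallback path, requests that carry a Referer header add no parsing cost.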

  13. 3.1 Problem 1: How to find more documents that may be requested in the near future? (cont.) • Prefetch by Directory -- Assumption: related documents are usually placed in the same directory on the web server. -- If the directory structure of the web site does not match this assumption, the ratio of successful prefetching may be low.
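
As a rough illustration of the directory assumption, the sketch below selects known URLs that share a host and directory with the requested URL. The helper names and the idea of drawing candidates from a list of already known URLs are assumptions, not details from the slides.

```python
import posixpath
from urllib.parse import urlsplit


def directory_key(url):
    """Return (host, directory) for a URL, e.g. ("h", "/docs")."""
    parts = urlsplit(url)
    return (parts.netloc, posixpath.dirname(parts.path))


def same_directory_candidates(requested_url, known_urls):
    """Known URLs in the same directory as requested_url, excluding itself."""
    key = directory_key(requested_url)
    return [
        url
        for url in known_urls
        if url != requested_url and directory_key(url) == key
    ]
```

On a site that groups unrelated pages in one directory, this selection returns many candidates that are never requested, which matches the weakness noted on the slide.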

  14. 3.2 Problem 2: How to increase the ratio of prefetched documents that are actually requested? • Popularity Constraint -- Builds a table that tracks the popularity of each requested document. The table is updated whenever a new request arrives. • Probability Constraint --
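
The popularity table can be sketched as a request counter with a threshold test. The slides name a threshold for this constraint but do not define how popularity is measured, so the raw request count used here, like the `PopularityTable` name, is an assumption.

```python
from collections import Counter


class PopularityTable:
    """Hypothetical popularity table: request counts per URL plus a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()

    def record_request(self, url):
        """Update the table when a new request arrives."""
        self.counts[url] += 1

    def passes(self, url):
        """True if url has been requested at least `threshold` times."""
        return self.counts[url] >= self.threshold
```

A higher threshold prefetches fewer documents and so trades hit rate for lower traffic.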

  15. 3.2 Problem 2: How to increase the ratio of prefetched documents that are actually requested? (cont.) • Combined Constraint -- Combines both constraints with a logical OR. That is, a document is prefetched if it passes either constraint.
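
The OR combination can be written as a single predicate. The parameter names and the use of "greater than or equal" comparisons are assumptions; the slide states only that passing either constraint is enough.

```python
def should_prefetch(popularity, probability,
                    popularity_threshold, probability_threshold):
    """Prefetch if the document passes either constraint (logical OR)."""
    passes_popularity = popularity >= popularity_threshold
    passes_probability = probability >= probability_threshold
    return passes_popularity or passes_probability
```

For example, `should_prefetch(10, 0.1, 5, 0.5)` is true because the document is popular enough even though its access probability is low.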

  16. 4. Experiment Results: Experiment A

  17. Experiment B: Popularity Constraint (threshold), prefetch level = 2, cache size = 10 MB

  18. Experiment B: Probability Constraint

  19. 5. Conclusion and Future Work • The hybrid prefetching technique is effective at improving the hit rate of a caching proxy, and its prediction accuracy is higher than that of other methods. • It can achieve a cache hit rate of more than 70% while the increase in traffic stays below 40%. • Our experiments also show that separate caches are better than one common cache if the total size is small.
