
Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol

By Abuzafor Rasal and Vinoth Rayappan.


Presentation Transcript


  1. Summary Cache: A Scalable Wide-Area Web Cache Sharing Protocol By Abuzafor Rasal and Vinoth Rayappan

  2. Web caching [Diagram: HTTP requests and HTTP responses flow between Client1, Client2, Client3, the cache, and the server.]

  3. Web Cache Sharing [Diagram: users connect to proxy caches in a regional network; the link to the rest of the Internet is the bottleneck.]

  4. Web Cache Sharing: Internet Cache Protocol (ICP) • The Internet Cache Protocol is the currently deployed technique for web cache sharing. • Under ICP, a proxy multicasts a query message to all other proxies whenever a cache miss occurs.

  5. Internet Cache Protocol [Diagram: a client, four proxies each with its own cache, and the Internet.]

  6. Internet Cache Protocol • First request: the document is available in the local proxy (HIT). [Diagram: clients 1 to n reach proxies 1 to N over HTTP; the proxies connect to the Internet.]

  7. Internet Cache Protocol • Second request: the document is not available in the local proxy, so the proxy sends ICP queries to the other proxies. [Diagram: clients 1 to n reach proxies 1 to N over HTTP; the proxies exchange ICP messages and connect to the Internet.]

  8. Problem of ICP • As the number of collaborating proxies increases, the overhead increases dramatically, so ICP does not scale. • The cause: a proxy multicasts a query message to all other proxies whenever a cache miss occurs.
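A rough way to see the scaling problem is to count messages. The sketch below assumes every local miss sends one query to each of the other proxies and that each of them sends one reply; the function names and the equal-miss-count assumption are illustrative and not taken from the slides.

```python
def icp_messages_per_miss(num_proxies: int) -> int:
    """UDP messages caused by one local cache miss, assuming each of the
    other proxies receives one query and sends one reply."""
    if num_proxies < 1:
        raise ValueError("need at least one proxy")
    peers = num_proxies - 1
    return 2 * peers  # one query + one reply per peer


def icp_messages_total(num_proxies: int, misses_per_proxy: int) -> int:
    """Total ICP messages when every proxy sees the same number of misses."""
    return num_proxies * misses_per_proxy * icp_messages_per_miss(num_proxies)


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, icp_messages_total(n, misses_per_proxy=1000))
```

Under these assumptions the total grows roughly with the square of the number of proxies, which is the scalability concern the slide describes.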

  9. Problem of ICP • UDP = ICP query and reply messages • TCP = HTTP traffic between proxies, servers, and clients • Total IP packets = UDP + TCP

  10. Problem of ICP [Figure not recoverable from the transcript.]

  11. Summary Cache • Each proxy maintains a Bloom filter (a compressed representation) of its local cache directory. • It also holds the Bloom filters representing the caches of the other proxies. • Updates to the Bloom filters are exchanged periodically, or after a certain percentage of the documents in the cache has been replaced. • On a local miss, a request is sent only to the proxies whose summaries indicate they probably hold the requested document.
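The lookup path described on this slide can be sketched as follows. This is a toy model, not the protocol implementation: plain Python sets stand in for the Bloom filter summaries, and the class and method names (`Proxy`, `publish_summary`, `peers_to_query`) are hypothetical.

```python
from typing import Dict, List, Set


class Proxy:
    """Toy proxy: exact sets stand in for the Bloom filter summaries."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: Set[str] = set()                    # URLs cached locally
        self.peer_summaries: Dict[str, Set[str]] = {}   # peer name -> summary copy

    def publish_summary(self, peers: List["Proxy"]) -> None:
        """Send a snapshot of the local directory to every peer."""
        for peer in peers:
            peer.peer_summaries[self.name] = set(self.cache)

    def peers_to_query(self, url: str) -> List[str]:
        """On a local miss, return only the peers whose summary lists the URL."""
        if url in self.cache:
            return []  # local hit, no inter-proxy traffic
        return sorted(p for p, s in self.peer_summaries.items() if url in s)
```

A proxy whose peers' summaries do not list the URL sends no query at all and goes straight to the origin server, which is where the message savings over ICP come from.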

  12. Summary Cache • First request: the document is in another proxy. [Diagram: a client, four proxies each with its own cache, and the Internet.]

  13. Summary Cache • Second request: the document is not in any proxy. [Diagram: a client, four proxies each with its own cache, and the Internet.]

  14. Summary Cache • Third request: a summary gives a false hit. [Diagram: a client, four proxies each with its own cache, and the Internet.]

  15. Summary Cache • Two parameters govern the design of the Summary Cache protocol: • The frequency of summary updates (inter-proxy traffic, overhead). • The representation of the summary (memory). • Solutions: • Delay summary updates until a fixed percentage (e.g. 1%) of the cached documents are new. • Positive: reduces overhead (traffic) • Negative: introduces “false miss” errors • Store summaries as Bloom filters, an efficient hash-based probabilistic scheme that represents the URLs of cached documents. • Positive: reduces the memory requirement • Negative: introduces “false hit” errors
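The delayed-update rule can be sketched as a counter that triggers a summary broadcast once the share of new documents reaches the threshold. The class name, the attribute names, and the choice to measure the threshold against the current number of cached documents are assumptions made for illustration.

```python
class SummaryPublisher:
    """Tracks how many cached documents are new since the last summary update."""

    def __init__(self, threshold: float = 0.01) -> None:
        self.threshold = threshold   # fraction of cached documents, e.g. 1%
        self.cached_docs = 0
        self.new_since_update = 0
        self.updates_sent = 0

    def on_document_cached(self) -> bool:
        """Record one newly cached document; return True if an update was sent."""
        self.cached_docs += 1
        self.new_since_update += 1
        if self.new_since_update >= self.threshold * self.cached_docs:
            self.updates_sent += 1       # here the real proxy would send the update
            self.new_since_update = 0
            return True
        return False
```

Documents cached between two updates are invisible to the other proxies, which is the source of the false misses listed as the negative above.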

  16. Summary Cache • False misses: • Definition: the requested document is cached at some other proxy, but that proxy's summary does not reflect the fact. • Effect: a remote cache hit is lost, and the total hit ratio within the collection of caches is reduced. • Improvement: can be reduced by updating summaries more frequently. • False hits: • Definition: the requested document is not cached at some other proxy, but that proxy's summary indicates that it is. The proxy sends a query message to the other proxy, only to be told that the document is not cached there. • Effect: a query message is wasted. • Improvement: can be reduced by increasing the size of the Bloom filter bit vector, i.e. the memory spent on the representation.
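The two error types can be told apart by comparing what a peer's summary claims with what the peer actually holds. The helper below is a minimal sketch with hypothetical names; sets stand in for the summary and the real cache contents.

```python
def classify_lookup(url: str, summary: set, actual_cache: set) -> str:
    """Compare what a peer's summary claims with what the peer really holds."""
    claimed = url in summary
    present = url in actual_cache
    if claimed and present:
        return "remote hit"
    if claimed and not present:
        return "false hit"    # a query message is wasted
    if present:
        return "false miss"   # a remote hit is lost
    return "miss"
```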

  17. Summary Cache • Remote stale hits: a document is cached at another proxy, but the cached copy is stale (this is not caused by the update delay). • Delta compression can be used to transfer the new document: it sends only the difference between the old and the new version instead of the whole document.
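To illustrate the idea of delta compression, the sketch below encodes a new document as copy ranges from the old version plus literal inserts, using Python's standard `difflib`. It is only an illustration of the principle; the slides do not say which delta algorithm is meant, and the function names are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(old: str, new: str) -> list:
    """Encode `new` as copy-from-old ranges plus literal inserts."""
    delta = []
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("insert", new[j1:j2]))
        # "delete" needs no entry: the old range is simply not copied
    return delta


def apply_delta(old: str, delta: list) -> str:
    """Rebuild the new document from the stale copy and the delta."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)
```

Only the literal inserts carry document bytes, so a small change to a large page produces a small delta.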

  18. Summary Cache • Two factors limit scalability: • The network overhead of inter-proxy communication. • Determined by the update frequency, false hits, and remote hits. • The memory required to store the summaries. • Determined by the size of an individual summary and the number of proxies.
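The memory factor is simple arithmetic: each proxy stores one summary per peer. The sketch below assumes all proxies cache the same number of documents; the function name and the example figures are illustrative only.

```python
def summary_memory_bytes(num_proxies: int, docs_per_proxy: int, bits_per_doc: float) -> float:
    """Memory one proxy needs to hold the summaries of all its peers."""
    return (num_proxies - 1) * docs_per_proxy * bits_per_doc / 8


if __name__ == "__main__":
    # Hypothetical setup: 16 proxies, one million cached documents each.
    print(summary_memory_bytes(16, 1_000_000, 128))  # 16-byte signature per document
    print(summary_memory_bytes(16, 1_000_000, 8))    # 8 bits per document
```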

  19. Impact of Update Delay: Explanation of the Graph • ICP = hit ratio when no update delay is introduced • exact_dir = hit ratio with update delay introduced • Hit ratio lost to false misses = no delay - delay = ICP - exact_dir • false_hit = requests for which a summary indicates a remote copy that is not actually cached there • stale-hit = remote hits where the cached copy is stale (outdated), which the summary does not reflect

  20. Impact of Update Delay: Observation of the Graph • exact_dir: the hit ratio decreases linearly as the threshold increases. • stale-hit: not affected by the threshold, because stale-hit errors exist for both ICP and Summary Cache. • false-hit: increases as the threshold increases, because a document deleted from a cache may still be shown as present in the summary.

  21. Summary Representations • Summary representation = how the summaries are stored in the proxies. • Summaries need to be stored in DRAM (main memory) because: • Disk arms become bottlenecks in proxy caches • DRAM prices continue to drop • DRAM is faster

  22. Summary Representations: Naïve approach • Exact-directory = the summary is essentially the list of URLs of cached documents, with each URL represented by its 16-byte MD5 signature. • Positive: fewer errors • Negative: consumes too much memory • Server-name = the summary is the list of web server names in the URLs of cached documents. • Positive: cuts the memory requirement by a factor of 10 • Negative: generates too many false hits and thus increases network traffic
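The two naïve representations can be compared directly. This sketch uses the standard `hashlib` and `urllib.parse` modules; the function names and the example URLs under www.abc.com are made up for illustration.

```python
import hashlib
from urllib.parse import urlsplit


def exact_directory_entry(url: str) -> bytes:
    """16-byte MD5 signature of the full URL."""
    return hashlib.md5(url.encode("utf-8")).digest()


def server_name_entry(url: str) -> str:
    """Only the web server name from the URL."""
    return urlsplit(url).hostname or ""


urls = ["http://www.abc.com/a.html", "http://www.abc.com/b.html"]
exact_summary = {exact_directory_entry(u) for u in urls}    # one entry per document
server_summary = {server_name_entry(u) for u in urls}       # one entry per server

# A document that was never cached still matches the server-name summary.
probe = "http://www.abc.com/never-cached.html"
false_hit_with_server_name = server_name_entry(probe) in server_summary
false_hit_with_exact_dir = exact_directory_entry(probe) in exact_summary
```

The server-name summary is much smaller, but any URL on a known server looks like a hit, which is why it generates so many wasted queries.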

  23. Summary Representations: Bloom Filters • Process • Step 1: Feed each URL to four different hash functions. • Step 2: Map each hash output (32 bits) to one bit position in the bit vector. • Step 3: Set the four bits at those positions, one per hash function. • Positive: consumes much less memory • Negative: introduces a small rate of false hits

  24. Summary Representations • Server-name produces too much network traffic because a request is sent to every proxy whose summary contains the server name.

  25. Bloom filter • A Bloom filter is a technique for compressing the memory needed to represent a set, at the cost of a small probability of false hits. • Summary Cache uses Bloom filters to compress the summaries. • It is a method of representing a set A of n elements to support membership queries. • Bloom filters have also been used as a mechanism for identifying which pages have associated comments stored within a CommonKnowledge server.

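  26. Problem? [Diagram: place A holds a set of URLs such as cnn.com/index.html and wayne.edu/; place B wants to test an arbitrary URI against that set; a Bloom filter serves as the compact representation passed between them.]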

  27. How does the Bloom filter work? • Start with a large bit array with all bits set to 0. • Pick a number of independent hash functions; in this case we have four (4). • For every URL in the bag (the proxy's cache directory), apply the four hash functions to get four integers. • Use the four integers as positions in the bit array. • Set the bits at those positions to 1. • Repeat this for every URL in the proxy's cache directory. • The above is the insertion process. • To query a URL, apply the same four hash functions and check whether all four bits are 1. The filter is not an encryption and cannot be reversed to recover the URLs.
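The steps above can be sketched as a small class. This is a minimal illustration, not the authors' code: the class and method names are hypothetical, and the four hash values are derived here by salting MD5 with an index, which is an assumption chosen only to keep the example short.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits: int, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # bit array, all zeros

    def _positions(self, url: str):
        # Assumption: derive k hash values by salting MD5 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.md5(f"{i}:{url}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:4], "big") % self.num_bits

    def add(self, url: str) -> None:
        """Insertion: set the bit at each hashed position."""
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, url: str) -> bool:
        """Query: False means definitely absent, True means probably present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(url))
```

A `True` answer can be wrong when other URLs happen to have set the same bits; a `False` answer is always correct for the set that was inserted.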

  28. How does a hash function work? • A hash function turns data into a relatively small number that may serve as a digital "fingerprint" of the data.

  29. Bloom filter • A hashing technique • An m-bit vector • k independent hash functions • Many-to-one mapping • “False positives”

  30. Bloom filter • False positive - Given a query for b, we check the bits at positions h1(b), h2(b), ..., hk(b). If any of them is 0, b is not in the set A. - Otherwise we conclude that b is in the set A, although there is a certain probability that we are wrong. • If false positives increase, the number of wasted queries goes up; if false negatives (false misses from out-of-date summaries) increase, remote hits are lost. • The salient feature of the Bloom filter is the trade-off between memory size (the bit array) and the false positive rate.

  31. Probability of false positive • Upper graph: 4 hash functions • Lower graph: the optimal integral number of hash functions (5 hash functions)
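The curves on this slide are not preserved in the transcript, but the standard approximation for the false positive probability of a Bloom filter with m bits, n entries, and k hash functions is (1 - e^(-kn/m))^k, minimized near k = ln(2) * m/n. The sketch below evaluates it; the function names are hypothetical and the bits-per-entry values in the loop are examples only.

```python
import math


def false_positive_rate(bits_per_entry: float, num_hashes: int) -> float:
    """Approximation (1 - e^(-k*n/m))^k, with bits_per_entry = m/n."""
    return (1.0 - math.exp(-num_hashes / bits_per_entry)) ** num_hashes


def best_integral_hashes(bits_per_entry: float) -> int:
    """Integer k minimizing the approximation; the real-valued optimum is ln(2)*m/n."""
    ideal = math.log(2) * bits_per_entry
    candidates = {max(1, math.floor(ideal)), max(1, math.ceil(ideal))}
    return min(candidates, key=lambda k: false_positive_rate(bits_per_entry, k))


if __name__ == "__main__":
    for ratio in (8, 16, 32):
        k = best_integral_hashes(ratio)
        print(ratio, false_positive_rate(ratio, 4), k, false_positive_rate(ratio, k))
```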

  32. Bloom filters as summaries • Provide a straightforward mechanism for building summaries • A proxy builds a Bloom filter from the URLs of its cached documents • Increasing the memory decreases false positives, and vice versa • This gives a clear trade-off between the two

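  33. How are the hash functions built? • The URL (e.g. www.abc.com) is passed through MD5, which yields a 128-bit signature (101101110101010111100 …… 010111). • The 128 bits are divided into four 32-bit groups, and each group serves as one 32-bit hash value.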
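The construction on this slide, one MD5 signature split into four 32-bit values, can be sketched as below. The function name is hypothetical, and two details are assumptions not shown on the slide: the big-endian byte order, and the final reduction of each 32-bit word modulo the bit-vector size to obtain a bit position.

```python
import hashlib
import struct


def md5_hash_positions(url: str, num_bits: int) -> list:
    """Split the 128-bit MD5 signature into four 32-bit words and reduce
    each one modulo the bit-vector size to get four bit positions."""
    digest = hashlib.md5(url.encode("utf-8")).digest()   # 16 bytes = 128 bits
    words = struct.unpack(">4I", digest)                  # four unsigned 32-bit words
    return [w % num_bits for w in words]
```

Computing one MD5 and slicing it is cheaper than running four separate hash functions, and the four slices behave as if they were independent for this purpose.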

  34. Hit ratio

  35. Observations on the cache hit ratio • exact_dir and bloom_filter_8, _16 and _32 have virtually the same hit ratio; server-name is plotted for comparison. • exact_dir gives the same hit ratio as the Bloom filters, but it consumes more memory because it stores a full signature for every URL. • bloom_filter_8, _16 and _32 consume less memory than exact_dir because each URL is reduced to a few bits by the hash functions.

  36. False hit ratio under different summary representations

  37. Observation of the false hit ratio • Server-name has a much higher false hit ratio. Why? • Because the summary holds only the server name, not the specific address of the requested URL. • The request is therefore sent to every proxy that caches anything from that server, but the document can only be in some of them, so the false hit ratio is high. • exact_dir has the lowest false hit ratio of all, but it needs a large amount of memory.

  38. Message per request

  39. Observations on messages per request • We included ICP for comparison. • With ICP (without Summary Cache), the request is sent to all proxies to find the requested URL, so the number of messages per client request is high compared to the others. • At the other extreme, bloom_filter_8, _16 and _32 and exact_dir need far fewer messages per client request to find the URL, which makes them the economical choice. • Server-name falls in between, because it produces more false hits and therefore more messages per client request.

  40. Bytes of message size per request

  41. Observations on the size of inter-network messages in bytes • We consider this because update messages are larger than query messages. • Summary Cache sends occasional bursts of large update messages in between the small query messages, which significantly reduces CPU overhead and the number of network interface packets (results are in Tables 2 and 4).

  42. Memory requirements as a percentage of proxy cache size: NLANR, 4 proxies

  43. Memory requirements as a percentage of proxy cache size: DEC, 16 proxies

  44. Summary • Web caching is an active research area. • Directory server: this approach uses a central server to keep track of the cache directories of all proxies; proxies query the server for cache hits in other proxies. • This approach does not scale, because a centralized server must serve every request and its network overhead is high. • To overcome this, we have the summary-cache enhanced ICP web cache sharing protocol. • Our inspection of the Questnet traces showed that child-to-parent ICP queries can be a significant portion of the messages that the parent proxy has to process, so applying Summary Cache there would significantly reduce the number of queries and the overhead.

  45. Future work • Investigate the impact of the protocol on parent-child proxy cooperation and the optimal hierarchy configuration for a given workload. • Investigate the application of Summary Cache to various web cache consistency protocols. • Design new methods for implementing Summary Cache in a proxy to speed up lookups.

  46. Conclusion • We proposed summary-cache enhanced ICP, a scalable wide-area web cache sharing protocol, and showed that it outperforms the other techniques compared. • Our study addresses two key questions: the effect of delayed summary updates, and the representation of the summary. • For the first, updates can be delayed until 1% to 10% of the cached documents are new (shown by trace-driven simulation); this causes errors, but at a tolerable level. • For the second, we introduced Bloom filters as the representation of the summaries. • We achieve over 50% reduction in bandwidth, and reduce the number of inter-proxy communication messages by a factor of 25 to 60.
