
Objective-Optimal Algorithms for Long-term Web Prefetching



  1. Objective-Optimal Algorithms for Long-term Web Prefetching Ajay Kshemkalyani (jointly with Bin Wu) Univ. of Illinois at Chicago ajayk@cs.uic.edu

  2. Outline • Prefetching: definition and background • Survey of web prefetching algorithms • Performance metrics • Objective-Greedy algorithms (O(n) time) • Hit rate greedy (also hit rate optimal) • Bandwidth greedy (also bandwidth optimal) • H/B greedy • H/B-Optimal algorithm (expected O(n) time) • Simulation results • Variants under different constraints

  3. Introduction Web caching reduces user-perceived latency • Client-server mode • Bottleneck occurs at the server side • Means of improving performance: • local cache, proxy server, server farm, … • Cache management: LRU, Greedy-Dual-Size, … On-demand caching vs. (long-term) prefetching • Prefetching is effective in dynamic environments. • Clients subscribe to web objects • Server “pushes” fresh copies into web caches • Selection of prefetched objects is based on long-term statistical characteristics maintained by content distribution servers (CDSs)

  4. Introduction • Web prefetching • Caches web objects in advance • Updated by web server • Reduces retrieval latency and user access time • Requires more bandwidth and increases traffic. • Performance metrics • Hit rate • Bandwidth usage • Balance of the two

  5. Object Selection Criteria • Popularity (Access frequency) • Lifetime • Good Fetch • APL

  6. Web Object Characteristics • Access frequency • The Zipf-like request model is widely used in web traffic modeling. • The relationship between the access frequency p_i and the popularity rank i of a web object: p_i ∝ 1/i^α

  7. Web Object Characteristics • The generalized Zipf-like distribution of web requests is given by: p_i = k / i^α • k is a normalization constant, i is the object ID (popularity rank), and α is the Zipf parameter, with reported values: • 0.986 (Cunha et al.), • 0.75 (Nishikawa et al.), and • 0.64 (Breslau et al.)
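
The Zipf-like model above can be sketched in a few lines of Python. The population size and the choice of α = 0.75 (one of the reported values) are illustrative, not from the slides:

```python
# Sketch of the Zipf-like popularity model: p_i = k / i**alpha, where k
# normalizes the probabilities to sum to 1 over the n-object population.

def zipf_popularity(n, alpha):
    """Return [p_1, ..., p_n] with p_i = k / i**alpha, summing to 1."""
    raw = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    k = 1.0 / sum(raw)
    return [k * r for r in raw]

# 1,000 objects with alpha = 0.75 (Nishikawa et al.)
probs = zipf_popularity(1000, 0.75)
```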

  8. Web Object Characteristics • Size of objects (heavy-tailed: Pareto, lognormal) • Average object size: 10–15 KB. • No strong correlation between object size s_i and access frequency p_i. • Access (read) pattern of objects: (Poisson) • Average access rate a·p_i • Lifetime of web objects (exponential) • Average time interval between updates l_i • Weak correlation between access frequency p_i and lifetime l_i.
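
A synthetic object population matching these distributional assumptions can be generated as below. The shape, scale, and lifetime parameters are assumptions chosen so that the mean size lands near the 10–15 KB range quoted above; they are not values from the slides:

```python
import random

# Illustrative workload generator: Pareto-distributed sizes and exponentially
# distributed lifetimes, as described on the slide. With shape 2.5 and scale
# 6 KB, the mean size is 6 * 2.5/1.5 = 10 KB (Pareto mean = shape/(shape-1)).

def generate_objects(n, size_shape=2.5, size_scale_kb=6.0,
                     mean_lifetime_s=3600.0, seed=0):
    rng = random.Random(seed)
    objects = []
    for i in range(n):
        size_kb = size_scale_kb * rng.paretovariate(size_shape)  # heavy-tailed size
        lifetime_s = rng.expovariate(1.0 / mean_lifetime_s)      # time between updates
        objects.append({"id": i, "size_kb": size_kb, "lifetime_s": lifetime_s})
    return objects

objs = generate_objects(5)
```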

  9. Caching/Prefetching Architecture [Figure: content sources (Reuters, NYSE, BBC, BSE) feed a prefetching algorithm, which populates the cache]

  10. Caching Architecture • Prefetching selection algorithms use as an input these global statistics: • estimates of object reference frequencies • estimates of object lifetimes • Content distribution servers cooperate to maintain these statistics • When an object is updated in the original server, the new version will be sent to any cache that has subscribed to it.

  11. Solution space for web prefetching • Two extreme cases: • Passive caches (non-prefetching) • Least network bandwidth and lowest cache hit rate • Prefetching all objects • 100% cache hit rate • Huge amount of unnecessary bandwidth • Existing algorithms use different object-selection criteria and prefetch the objects whose criterion value exceeds some threshold.

  12. Existing Prefetching Algorithms • Popularity [Markatos et al.] • Keeps the most popular objects in the system • Updates these objects immediately when they change • Criterion – object’s popularity • Expected to achieve high hit rate • Lifetime [Jiang et al.] • Keeps objects with longest lifetimes • Mostly considers the network resource demands • Threshold – the expected lifetime of object • Expected to minimize bandwidth usage

  13. Existing Prefetching Algorithms • Good Fetch [Venkataramani et al.] • Computes the probability that an object is accessed before it changes. • Prefetches objects with “high probability of being accessed during their average lifetime” • Prefetches object i if that probability exceeds a threshold. • Objects with higher access frequencies and longer update intervals are more likely to be prefetched • Balances the benefit (hit rate increase) against the cost (bandwidth increase) of keeping an object.
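
A sketch of the Good Fetch selection rule follows. The closed form for the access-before-change probability, a·p_i·l_i / (1 + a·p_i·l_i), is the standard expression under Poisson accesses (rate a·p_i) and exponential lifetimes (mean l_i); the function name and tuple layout are my own:

```python
# Good Fetch sketch: prefetch object i when the probability that it is
# accessed before it next changes exceeds a threshold. Assumes Poisson
# accesses at rate a*p_i and exponential lifetimes with mean l_i, giving
# P(access before change) = a*p_i*l_i / (1 + a*p_i*l_i).

def good_fetch_selection(objects, a, threshold):
    """objects: list of (obj_id, p_i, l_i). Returns ids selected for prefetching."""
    selected = []
    for obj_id, p, l in objects:
        prob_access_before_change = a * p * l / (1.0 + a * p * l)
        if prob_access_before_change > threshold:
            selected.append(obj_id)
    return selected
```

Objects with high p_i and long l_i clear the threshold, matching the bullet above.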

  14. Existing Prefetching Algorithms • APL [Jiang et al.] • Computes the apl values of web objects. • The apl value of an object represents its “expected number of accesses during its lifetime” • Prefetches object i if its apl value exceeds a threshold. • Tends to improve hit rate; attempts to balance benefit (hit rate) against cost (bandwidth). • Enhanced APL: a·p_i^k·l_i • k>1 prefers objects with higher popularity (emphasizes hit rate) • k<1 prefers objects with longer lifetime (emphasizes network bandwidth)
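
The APL rule, including its enhanced a·p^k·l variant, reduces to one comparison per object. The helper name and argument layout below are illustrative:

```python
# APL sketch: apl_i = a * p_i * l_i is the expected number of accesses during
# one lifetime; prefetch objects whose value exceeds a threshold. The enhanced
# variant a * p_i**k * l_i biases toward popularity (k > 1) or lifetime (k < 1).

def apl_selection(objects, a, threshold, k=1.0):
    """objects: list of (obj_id, p_i, l_i). Returns ids selected for prefetching."""
    return [obj_id for obj_id, p, l in objects if a * (p ** k) * l > threshold]
```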

  15. Objective-Greedy Algorithms • Existing algorithms choose prefetching criteria based on intuition • not aimed at any specific performance metric • consider only individual objects’ characteristics, not the global impact • None gives optimal performance under any metric • Simple counter-examples can be shown

  16. Objective-Greedy Algorithms • Objective-Greedy algorithms select criteria to intentionally improve performance under specific metrics. • E.g., the Hit Rate-Greedy algorithm aims to improve the overall hit rate and thus reduce the latency of object requests.

  17. Steady State Properties • The steady-state hit rate for object i is its freshness factor f(i): f(i) = 1 if object i is prefetched, and f(i) = a·p_i·l_i / (1 + a·p_i·l_i) otherwise • Overall hit rate: H = Σ_i p_i·f(i) • On-demand hit rate: H₀ = Σ_i p_i · a·p_i·l_i / (1 + a·p_i·l_i)

  18. Steady State Properties • Steady-state bandwidth for object i: b(i) = s_i/l_i if object i is prefetched (refreshed on every update), and b(i) = a·p_i·s_i·(1 − f(i)) otherwise (fetched on every miss) • Total bandwidth: B = Σ_i b(i) • On-demand bandwidth: B₀ = Σ_i a·p_i·s_i / (1 + a·p_i·l_i)
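
Both steady-state metrics can be evaluated for a candidate prefetch set as below. The per-object forms (freshness factor a·p·l/(1 + a·p·l), prefetch bandwidth s/l, demand bandwidth a·p·s·(1 − f)) are the standard ones for this model; the function itself is a sketch:

```python
# Steady-state metric sketch: f(i) = 1 for prefetched objects and
# a*p_i*l_i/(1 + a*p_i*l_i) otherwise; a prefetched object consumes s_i/l_i
# bandwidth per unit time, a demand-fetched one a*p_i*s_i*(1 - f(i)).

def hit_rate_and_bandwidth(objects, a, prefetched):
    """objects: list of (obj_id, p_i, l_i, s_i); prefetched: set of ids."""
    H = B = 0.0
    for obj_id, p, l, s in objects:
        f = a * p * l / (1.0 + a * p * l)
        if obj_id in prefetched:
            H += p            # always fresh: f(i) = 1
            B += s / l        # refreshed on every update
        else:
            H += p * f
            B += a * p * s * (1.0 - f)
    return H, B
```

Prefetching everything drives H to 1 at the cost of paying s_i/l_i for every object, the two extremes of the solution space described earlier.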

  19. Objective Metrics • Hit rate – benefit • Bandwidth – cost • H/B model – balance of benefit and cost • Basic H/B • Enhanced H/B

  20. H/B-Greedy Prefetching • Considers the H/B value of on-demand caching: H₀/B₀ • If object j is prefetched, then H/B is updated to: (H₀ + Δh_j) / (B₀ + Δb_j), where Δh_j = p_j·(1 − f(j)) and Δb_j = s_j/l_j − a·p_j·s_j·(1 − f(j))

  21. H/B-Greedy Prefetching • We define incr(j) = [(H₀ + p_j·(1 − f(j))) / (B₀ + s_j/l_j − a·p_j·s_j·(1 − f(j)))] · (B₀/H₀) as the increase factor of object j. • incr(j) indicates the factor by which H/B is increased if object j is selected.

  22. H/B-Greedy Prefetching • H/B-Greedy prefetching prefetches those m objects with greatest increase factors. • The selection is based on the effect of prefetching individual objects on the hit rate. • H/B-Greedy is still not an optimal algorithm in terms of H/B value.
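
H/B-Greedy can be sketched as follows. The increase factor compares H/B after prefetching j alone against the on-demand H₀/B₀; the Δh and Δb expressions assume the standard freshness-factor model a·p·l/(1 + a·p·l), and the function layout is my own:

```python
# H/B-Greedy sketch: compute each object's increase factor incr(j) -- the
# ratio of H/B after prefetching j alone to the on-demand H0/B0 -- and
# prefetch the m objects with the largest factors.

def hb_greedy(objects, a, m):
    """objects: list of (obj_id, p_i, l_i, s_i). Returns ids of the m chosen objects."""
    H0 = B0 = 0.0
    deltas = {}
    for obj_id, p, l, s in objects:
        f = a * p * l / (1.0 + a * p * l)
        H0 += p * f
        B0 += a * p * s * (1.0 - f)
        # prefetching j adds p*(1-f) to the hit rate and replaces its
        # demand traffic a*p*s*(1-f) with refresh traffic s/l
        deltas[obj_id] = (p * (1.0 - f), s / l - a * p * s * (1.0 - f))

    def incr(obj_id):
        dh, db = deltas[obj_id]
        return ((H0 + dh) / (B0 + db)) / (H0 / B0)

    return sorted(deltas, key=incr, reverse=True)[:m]
```

Because each incr(j) is computed against the same on-demand baseline rather than against the evolving prefetch set, this remains a greedy approximation, as the next slide notes.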

  23. Hit Rate-Greedy Prefetching • To maximize the overall hit rate given the number of objects to prefetch, m, we select the m objects with the greatest hit-rate contribution: Δh_i = p_i·(1 − f(i)) = p_i / (1 + a·p_i·l_i) • This algorithm is optimal in terms of hit rate.

  24. Bandwidth-Greedy Prefetching • To minimize the total bandwidth given m, the number of objects to prefetch, we select the m objects with the least bandwidth contribution: Δb_i = s_i/l_i − a·p_i·s_i / (1 + a·p_i·l_i) • Bandwidth-Greedy Prefetching is optimal in terms of bandwidth consumption.
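
Both greedy-optimal selections reduce to a sort over independent per-object contributions, which is why picking the best m is optimal for each metric. The per-object Δh and Δb expressions assume the standard freshness-factor model; the function is a sketch:

```python
# Hit Rate-Greedy / Bandwidth-Greedy sketch: rank objects by their individual
# hit-rate gain p_i*(1 - f(i)) or bandwidth change s_i/l_i - a*p_i*s_i*(1 - f(i))
# and take the best m. Contributions are independent across objects, so the
# top-m choice is optimal for the corresponding metric.

def greedy_select(objects, a, m, metric):
    """objects: list of (obj_id, p_i, l_i, s_i); metric: 'hit_rate' or 'bandwidth'."""
    def key(obj):
        obj_id, p, l, s = obj
        f = a * p * l / (1.0 + a * p * l)
        if metric == "hit_rate":
            return -p * (1.0 - f)                 # largest hit-rate gain first
        return s / l - a * p * s * (1.0 - f)      # smallest bandwidth increase first
    return [obj[0] for obj in sorted(objects, key=key)[:m]]
```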

  25. H/B-Optimal Prefetching • An optimal algorithm for the H/B metric is provided by a solution to the following selection problem: choose the set of m prefetched objects that maximizes H/B. • This is equivalent to the maximum weighted average problem with pre-selected items.

  26. Maximum Weighted Average The Maximum Weighted Average Problem: • There are n courses in total, with different credit hours and scores • Select m (m < n) courses • Maximize the GPA of the m selected courses Solution: • If m = 1, select the course with the highest score. What if m > 1? • Misleading intuition: select the m courses with the highest scores.

  27. A Course Selection Problem (example) • If m = 2 and we select the 2 courses with the highest scores, C and B, then GPA: 93.33 But if we select C and D, then GPA: 93.57 • Question: how do we select m courses such that the GPA is maximized? Answer: Eppstein & Hirschberg solved this problem.
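
The misleading intuition is easy to demonstrate by brute force. The course list below is a hypothetical instance (the slide's original table was not preserved), constructed so that the two highest-scoring courses do not maximize the credit-weighted GPA:

```python
from itertools import combinations

# Hypothetical instance: name -> (credits, score). Taking the two
# highest-scoring courses does not maximize the credit-weighted GPA.
courses = {"C": (2, 95), "B": (5, 93), "D": (1, 92)}

def gpa(names):
    credits = sum(courses[n][0] for n in names)
    return sum(courses[n][0] * courses[n][1] for n in names) / credits

best = max(combinations(courses, 2), key=gpa)                    # exhaustive search
top_by_score = tuple(sorted(courses, key=lambda n: -courses[n][1])[:2])
```

Here the two highest scores are C (95) and B (93), giving GPA (2·95 + 5·93)/7 ≈ 93.57, while {C, D} gives (2·95 + 1·92)/3 = 94.0, so the intuitive choice loses, because B's large credit weight drags the average down.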

  28. With Pre-selected Items Maximum Weighted Average with pre-selected items: • There are n courses in total, with different credit hours and scores • Example: • Courses A and E must be selected, plus: • Select m additional courses (m is given, m < n), such that the resulting GPA is maximized (m = 1): with D, GPA = 77.7; with C, GPA = 74.3; with B, GPA = 77

  29. Pre-selection clause … (example) • Selection domain B–I, no pre-selection, m = 2 optimal subset: {B, C}, GPA: 88.33 • Selection domain B–I, A pre-selected, m = 2 one candidate subset: {A, D, H}, GPA: 75.61 better than: {A, B, C}, GPA: 70.625 Conclusion: {B, C} is not contained in the optimal subset of the pre-selection problem.

  30. H/B-Optimal vs. Course Selection • The problem is formulated as: maximize (v₀ + Σ_{i∈S} v_i) / (w₀ + Σ_{i∈S} w_i) over subsets S of size m, where v₀ = 5.0·70 + 2.0·75 = 500 and w₀ = 5.0 + 2.0 = 7.0 in the previous example. • This is equivalent to the H/B-Optimal selection problem.

  31. H/B-Optimal vs. Course Selection [Figure: correspondence between the course-selection quantities and the H/B-Optimal prefetching quantities]

  32. H/B-Optimal Algorithm Design • The selection of m courses is not trivial • For course i, we define the auxiliary function r_i(x) = v_i − x·w_i • And for a given number m, we define the utility function F(x) = (v₀ − x·w₀) + [sum of the m largest r_i(x) values]

  33. H/B-Optimal Algorithm Design • Lemma 1 Suppose A* is the maximum GPA we are computing; then for any subset S′ ⊆ S with |S′| = m, (v₀ − A*·w₀) + Σ_{i∈S′} r_i(A*) ≤ 0, with equality iff S′ is optimal. Thus, the optimal subset contains the courses that have the m largest r_i(A*) values

  34. H/B-Optimal Algorithm Design • n = 6, m = 4 • Each line is r_i(x) • Assume we know A* • The optimal subset has the 4 courses with the largest r_i(A*) values. • Dilemma: A* is unknown

  35. H/B-Optimal Algorithm Design • Lemma 2: F(x) > 0 iff x < A*; F(x) = 0 iff x = A*; F(x) < 0 iff x > A* • Lemma 2 is used to narrow the range of A*; (x_l, x_r) is the current A*-range

  36. H/B-Optimal Algorithm Design • If F(x_l) > 0 and F(x_r) < 0, then A* is in (x_l, x_r) • Compute F((x_l+x_r)/2): – if F((x_l+x_r)/2) > 0, then A* > (x_l+x_r)/2 – if F((x_l+x_r)/2) < 0, then A* < (x_l+x_r)/2 – if F((x_l+x_r)/2) = 0, then A* = (x_l+x_r)/2 (Lemma 2) • Narrow the range of A* by half (binary search)
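
The binary search on this slide can be sketched directly, using the standard auxiliary form r_i(x) = v_i − x·w_i and F(x) = sum of the m largest r_i(x) (no pre-selected items here, for simplicity); the function name and bounds are my own choices:

```python
# Bisection sketch for the maximum weighted average A*. With
# r_i(x) = v_i - x*w_i and F(x) = sum of the m largest r_i(x), F(x) > 0
# iff x < A* (Lemma 2), so binary search on x converges to A*.

def max_weighted_average(items, m, iters=60):
    """items: list of (v_i, w_i) with w_i > 0. Returns A*, the maximum of
    sum(v)/sum(w) over all m-element subsets."""
    def F(x):
        r = sorted((v - x * w for v, w in items), reverse=True)
        return sum(r[:m])

    # A* lies between the smallest and largest individual ratios v_i/w_i.
    lo = min(v / w for v, w in items)
    hi = max(v / w for v, w in items)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if F(mid) > 0:
            lo = mid        # A* lies above mid
        else:
            hi = mid        # A* lies at or below mid
    return (lo + hi) / 2.0
```

On the earlier hypothetical course instance, with (v_i, w_i) = (credits·score, credits), this recovers the brute-force optimum.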

  37. H/B-Optimal Algorithm Design (Idea) • Why keep narrowing down the range of A*? • If the intersection of r_j(x) and r_k(x) falls outside the range, then • the ordering of r_j(x) and r_k(x) is determined within the range, and so is that of r_j(A*) and r_k(A*), by comparing their slopes. • If the range is narrow enough that no r(x) lines intersect within it, then • the total ordering of all r(A*) values is determined. • Now our optimization problem is solved: just select the m candidates with the highest r(A*) values.

  38. H/B-Optimal Algorithm Design • However, establishing the total ordering requires O(n²) time • A randomized approach is used instead; this randomized algorithm: • Iteratively reduces the problem domain to a smaller one. • The algorithm maintains 4 sets: • X, Y, E, Z, initially empty • (larger, smaller, equal, or undetermined r)

  39. H/B-Optimal Algorithm Design In each iteration, randomly select a course i and compare it with each other course k. One of 4 possibilities: 1) if r_k(A*) > r_i(A*): insert k in set X 2) if r_k(A*) < r_i(A*): insert k in set Y 3) if w_k = w_i and v_k = v_i: insert k in set E 4) if undetermined: insert k in set Z Then do the following loop: loop: narrow the range of A* by half compare r_i(A*) with r_k′(A*) for each k′ in Z if now determined, move k′ to X or Y accordingly until |Z| is sufficiently small (i.e., |Z| < |S|/32)

  40. H/B-Optimal Algorithm Design • After the loop, either X or Y has “enough” members to ensure speedy “convergence”. • Next, examine and compare the sizes of X, Y and E: • |X|+|E| > m // delete Y • |Y|+|E| > |S|−m // collapse X into one course

  41. H/B-Optimal Algorithm Design 1) If |X|+|E| > m: There are at least m courses whose r(A*) values are greater than the r(A*) values of all courses in Y. All members of Y may be removed. Then: |S| = |S| − |Y|

  42. H/B-Optimal Algorithm Design 2) If |Y|+|E| > |S|−m: All members of X are among the top m courses. All members of X must be in the optimal set. Collapse X into a single course (this course is included in the final optimal set). Then: |S| = |S| − |X| + 1; m = m − |X| + 1.

  43. H/B-Optimal Algorithm Design • In either case, the resulting domain has reduced size. • By iteratively removing or collapsing courses, the problem domain finally has only one course remaining: the one formed by collapsing all courses in the optimal set. • Expected time complexity (assume S_b is the domain before an iteration and S_a the domain after it): 1) Each iteration takes expected time O(|S_b|) 2) Expected size |S_a| = (207/256)·|S_b| The recurrence relation of the iteration, T(n) = O(n) + T((207/256)·n), resolves to linear time complexity.
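
The linear bound claimed for this recurrence can be checked by unrolling it as a geometric series (c is the constant hidden in the O(n) term):

```latex
T(n) = cn + T\!\left(\tfrac{207}{256}\,n\right)
\;\le\; cn \sum_{k=0}^{\infty}\left(\tfrac{207}{256}\right)^{k}
= \frac{cn}{1 - 207/256}
= \frac{256}{49}\,cn
= O(n).
```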

  44. H/B-Greedy vs. H/B-Optimal • H/B-Greedy is an approximation to H/B-Optimal • H/B-Greedy achieves a higher H/B metric than any existing algorithm. • H/B-Greedy is easier to implement than H/B-Optimal. • Lower constant factor • Adjusts easily to updates of object characteristics

  45. Simulation Results • Evaluation of H/B-Greedy Prefetching • Figure 1: H/B, for total object number = 1,000. • Figure 2: H/B, for total object number = 10,000. • Figure 3: H/B, for total object number = 100,000. • Figure 4: H/B, for total object number = 1,000,000. • Evaluation of the H-Greedy and B-Greedy algorithms • Figure 5: H-Greedy algorithm. • Figure 6: B-Greedy algorithm. • Figure 7: B-Greedy algorithm, zoomed in.

  46. Figure 1: H/B, for total object number=1,000

  47. Figure 2: H/B, for total object number=10,000

  48. Figure 3: H/B, total object number=100,000

  49. Figure 4: H/B, total object number=1,000,000
