
A Hybrid Caching Strategy for Streaming Media Files

Jussara M. Almeida Derek L. Eager Mary K. Vernon

University of Wisconsin-Madison

University of Saskatchewan

November 2001


Outline

  • Characteristics of Streaming Media (SM) files

  • Delivery of SM files

  • Hypothesis and Assumptions

  • Previous Caching Policies

  • New Policy Performance Comparison

  • New Caching Policies

  • Conclusions and Future Work


Characteristics of SM Files

  • Large file size

    • cache on disk

  • Sustained I/O bandwidth

    • inserting and reading new content

  • Clients access partial files

    • initial portion

    • favored segment

    • base + variable number of layers of layered encoding


Delivery of SM Files

  • Unicast streaming:

    • server bandwidth is linear in client request rate

    • goal: maximize byte hit ratio

  • Multicast streaming

    • save bandwidth

    • cost sharing introduces new tradeoffs


Caching for Multicast Streams: Tradeoffs

  • example:

    10 distributed proxy servers, each serving a local region;

    100 requests (on average) arrive per region during the playback of a popular video;

    about 7 streams are needed per region (70 in total) if the video is cached at the proxies, versus about 12 streams at the remote server if it is not (arithmetic sketched below)
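A minimal sketch of the arithmetic behind this tradeoff, using only the numbers given on the slide (the per-region and remote-server stream counts of 7 and 12 come from the multicast cost-sharing analysis and are taken as given, not derived here):

```python
# Numbers taken from the example above.
NUM_REGIONS = 10
REQUESTS_PER_REGION = 100
STREAMS_PER_REGION_IF_CACHED = 7       # proxy streams if the video is cached locally (given)
STREAMS_AT_REMOTE_IF_UNCACHED = 12     # remote-server streams serving all regions (given)

proxy_streams_total = NUM_REGIONS * STREAMS_PER_REGION_IF_CACHED   # 70 streams
remote_streams_total = STREAMS_AT_REMOTE_IF_UNCACHED               # 12 streams

# Caching at the proxies keeps traffic off the remote server and backbone,
# but each proxy stream is shared by only ~100/7 clients, while a remote
# stream is shared by ~1000/12 clients -- the cost-sharing tradeoff discussed
# on the next slide.
print(proxy_streams_total, remote_streams_total)
```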


Caching for Multicast Streams: Tradeoffs

  • caching popular content reduces the load on the remote server and network

  • delivering popular content from the remote server amortizes the cost of a stream over more clients

  • earlier portions of a popular video require more bandwidth and have less cost-sharing than later portions


New Caching Policies Research

  • Hypothesis: popularity-based strategy will outperform replacement-based strategy

    • significant fraction of requests to uncached files may be for files that are accessed very sporadically

  • Assumptions:

    • limited disk space implies limited disk bandwidth

    • proxy bandwidth for delivering cached streams is equal to min of proxy disk bw and proxy network bw

      (call this proxy disk bandwidth)


Current Web Caching Policies

  • Replacement based (cache on each miss); a sketch of a policy in this style follows this list

  • Top replacement candidate is chosen by an ad hoc combination of:

    • large files

    • least recently accessed or lowest access frequency

    • miss penalty (server latency, bandwidth)

  • Cache whole file or none

  • Unicast

  • Ignore limited disk bandwidth
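A minimal sketch of a replacement-based web cache in this spirit: insert on every miss, evict by an ad hoc score combining size, recency, and frequency. The score and its weights are illustrative assumptions, not taken from the talk, and the sketch ignores disk bandwidth, as such policies do:

```python
import time

class ReplacementCache:
    """Toy replacement-based web cache: insert on every miss, evict by ad hoc score."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = {}  # name -> {"size": bytes, "last": timestamp, "hits": count}

    def _eviction_score(self, e):
        # Ad hoc ranking: large, cold, rarely accessed objects are evicted first.
        age = time.time() - e["last"]
        return e["size"] * age / (1 + e["hits"])

    def access(self, name, size):
        if name in self.entries:                       # hit
            self.entries[name]["hits"] += 1
            self.entries[name]["last"] = time.time()
            return True
        if size > self.capacity:
            return False                               # object larger than the whole cache
        # Miss: always cache the whole object, evicting as needed.
        while self.used + size > self.capacity:
            victim = max(self.entries, key=lambda n: self._eviction_score(self.entries[n]))
            self.used -= self.entries.pop(victim)["size"]
        self.entries[name] = {"size": size, "last": time.time(), "hits": 1}
        self.used += size
        return False
```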


Previous SM Caching Policies

  • Interval Caching [DaSi93, KaRT95]

  • Resource Based Caching (RBC) [TVDS98]

  • Least Frequently Used (LFU)

  • Block-based insertion and deletion [AcSm00]

  • Popularity-based caching for layered encoding [RYHE00]

  • Prefix and Segment Caching for smoothing [SeRT99,WZDS98]



Interval Caching

  • Cache the smallest intervals between successive streams of the same file (see the sketch below)

  • Target: memory caches (lots of insertions)

[Figure: timelines over [0, T] showing successive streams S1, S2, S3 of a file f and the intervals between consecutive streams]
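A minimal sketch of the interval-caching selection, assuming known start times of the active streams for each file and a uniform playback rate (the helper name and arguments are illustrative). Each pair of consecutive streams of a file defines an interval; if that interval of data stays in the cache, the later stream can be fed from data written by the earlier one, so the policy caches the smallest intervals first:

```python
def pick_intervals(active_streams, cache_capacity, rate=1.0):
    """Select the intervals to cache, smallest first, until the cache is full.

    active_streams: {file_id: sorted list of stream start times (seconds)}
    cache_capacity: bytes available for interval caching
    rate:           playback rate in bytes/second (uniform-rate assumption)
    """
    intervals = []
    for f, starts in active_streams.items():
        for earlier, later in zip(starts, starts[1:]):
            size = (later - earlier) * rate   # data between consecutive streams
            intervals.append((size, f, earlier, later))

    chosen, used = [], 0.0
    for size, f, earlier, later in sorted(intervals, key=lambda t: t[0]):
        if used + size <= cache_capacity:
            chosen.append((f, earlier, later))
            used += size
    return chosen
```

For example, with streams of one file starting at t = 0, 40, and 300 seconds and room for 100 seconds of data, only the 40-second interval is cached.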


Resource Based Caching

  • Cache entire files and intervals/runs

  • Goal: efficiently utilize the limited resource

    • limited space: cache smallest space requirement

    • limited bandwidth: cache smallest write overhead

  • Pre-allocate bandwidth to each cached entity

  • Complex algorithm

    • Complex implementation

    • High time complexity


RBC Algorithm

Step 1: Select an entity x ∈ {interval, run, file} of file i

1) If Ubw > Uspace + δ (bandwidth is the more heavily utilized resource):

choose the entity with the lowest bandwidth (write) requirement

2) If Uspace > Ubw + δ (space is the more heavily utilized resource):

choose the entity with the minimum space requirement Si,x

3) If Uspace − δ < Ubw < Uspace + δ (utilizations roughly balanced):

choose the entity with the largest …

Step 2: Caching decision for entity x

1) If there is enough unallocated space and unallocated bandwidth:

cache entity x

2) If there is enough unallocated space but the cache is bandwidth constrained:

use the bandwidth goodness list to select candidates for eviction

3) If there is enough unallocated bandwidth but the cache is space constrained:

use the space goodness list to select candidates for eviction

4) If both bandwidth and space are constrained:

walk both lists, at each step removing an entity from either the bandwidth goodness list or the space goodness list

Step 3: Allocate space and bandwidth for entity x (decision logic sketched below)
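A minimal sketch of the Step 1 / Step 2 decision structure. The δ tolerance, the goodness metric, and all helper names and dictionary fields are assumptions for illustration; the paper's exact formulas are not reproduced here:

```python
DELTA = 0.05  # assumed tolerance between space and bandwidth utilization

def select_entity(candidates, u_space, u_bw):
    """Step 1: pick the candidate entity (interval, run, or file) to consider.

    candidates: list of dicts with 'space' (bytes), 'bw' (write bandwidth),
                and 'goodness' (a combined caching-gain metric, assumed).
    """
    if u_bw > u_space + DELTA:                           # bandwidth is the bottleneck
        return min(candidates, key=lambda e: e["bw"])
    if u_space > u_bw + DELTA:                           # space is the bottleneck
        return min(candidates, key=lambda e: e["space"])
    return max(candidates, key=lambda e: e["goodness"])  # roughly balanced

def caching_decision(entity, free_space, free_bw, space_list, bw_list):
    """Step 2: cache the entity, evicting from the goodness lists if needed.

    space_list / bw_list: cached entities ordered from least to most valuable
    per unit of space / bandwidth (the 'goodness' orderings from the slide).
    Returns the list of victims to evict, or None if no room can be made.
    """
    victims = []
    while free_space < entity["space"] or free_bw < entity["bw"]:
        pool = bw_list if free_bw < entity["bw"] else space_list
        if not pool:
            return None                      # cannot make room: do not cache
        victim = pool.pop(0)
        victims.append(victim)
        free_space += victim["space"]
        free_bw += victim["bw"]
    return victims
```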


Least Frequently Used

  • Different implementation options:

    • What to do on the first access to an object?

    • How to estimate frequency?

  • Version studied: Currently Most Popular (CMP)

    • Insert only the most frequently accessed files or segments (see the sketch after this list)

    • On-line popularity estimate: future research
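A minimal sketch of the CMP idea under the assumptions above (static, known file popularities; the helper name is illustrative): the cache simply holds the most popular files that fit.

```python
def cmp_cache_contents(file_sizes, popularity, cache_size):
    """
    file_sizes: {file_id: size in bytes}
    popularity: {file_id: access frequency (assumed known, e.g. Zipf-ranked)}
    cache_size: total cache space in bytes

    Returns the set of files CMP keeps cached: the currently most popular
    files, in popularity order, until the cache is full.
    """
    cached, used = set(), 0
    for f in sorted(popularity, key=popularity.get, reverse=True):
        if used + file_sizes[f] <= cache_size:
            cached.add(f)
            used += file_sizes[f]
    return cached
```

Unlike a replacement-based policy, nothing is inserted on a miss; the cache contents change only when the popularity ranking changes.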


Previous Comparison: RBC vs. CMP [TVDS98]

  • Fixed file access frequencies

  • RBC outperforms CMP for all parameter values studied

  • Limited design space

    • e.g., total cache size ≤ 16 GB

  • Inconsistent results


New Performance Comparison

  • Re-evaluate byte hit ratio of CMP and RBC

    • Simulation with synthetic workload

    • Broad design space

  • New Pooled RBC

  • New simple hybrid CMP/interval caching (CMP/IC) policy


System Assumptions

  • Arrivals: Poisson(λ)

    • extra experiments with Pareto(α, k)

  • File access frequency: Zipf(θ)

  • Perfect file popularity

    • extra experiments with approximate file popularity

  • Uniform file size and delivery rate

    • extra experiments with variable file size and delivery rate

  • Load balanced across multiple disks


System Parameters

  • n : number of files

  • θ : Zipf parameter

  • N : arrival rate (avg. number of requests per avg. file duration T)

    N = λ × T (see the workload sketch below)

  • C : cache size (fraction of media data accessed)
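A minimal sketch of a synthetic workload generator using these parameters. It assumes a Zipf convention common in the streaming-media literature, where file i is requested with probability proportional to 1/i^(1-θ), so θ = 0 (the value used in the plots below) is a pure Zipf distribution; this convention and the helper names are assumptions, not taken from the talk:

```python
import random

def zipf_probs(n, theta=0.0):
    # Assumed convention: p_i proportional to 1 / i^(1 - theta),
    # so theta = 0 is pure Zipf and theta = 1 is uniform.
    weights = [1.0 / (i ** (1.0 - theta)) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def generate_requests(n, theta, N, T, horizon):
    """Poisson arrivals at rate lambda = N / T, Zipf(theta) file selection.

    n: number of files, N: requests per average file duration T,
    horizon: length of the generated trace (same time units as T).
    Returns a list of (arrival_time, file_id) tuples.
    """
    lam = N / T                       # arrival rate per unit time
    probs = zipf_probs(n, theta)
    files = list(range(1, n + 1))
    t, trace = 0.0, []
    while True:
        t += random.expovariate(lam)  # exponential inter-arrival times
        if t > horizon:
            return trace
        trace.append((t, random.choices(files, weights=probs)[0]))
```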


System Parameters (cont.)

  • B : normalized disk bandwidth

    (the disk bandwidth expressed as a fraction of the average number of simultaneous streams needed to deliver the data that CMP caches)

  • B depends on N, θ, n, C, and the disk technology

  • Relative performance of the policies depends mainly on B

    • B = 1.0 : the CMP system is bandwidth balanced

    • B < 1.0 : the CMP system is bandwidth deficient

    • B > 1.0 : the CMP system is bandwidth abundant


Normalized Disk Bandwidth (B): Example

  • Ultrastar 72ZX disk:

    • disk space: 116.76 hours of MPEG-1 video (73.4 GB)

    • disk bandwidth: 108 MPEG-1 streams (22–37 MB/s)

  • Assume: 100 requests / hour for cached files

  • If cache contains 2-hour movies:

    • Need 200 streams

    • B = 108/200 = 0.54

  • If cache contains 30-minute TV shows:

    • Need 50 streams for cache content

    • B = 108/50 = 2.16 (computation sketched below)
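A minimal sketch of this computation (the disk figures are from the slide; the function name is illustrative):

```python
def normalized_disk_bandwidth(disk_streams, requests_per_hour, file_hours):
    """B = disk streaming capacity / simultaneous streams needed for cached content.

    disk_streams:      streams the disk can sustain (108 for the Ultrastar 72ZX)
    requests_per_hour: requests per hour for cached files (100 in the example)
    file_hours:        duration of a cached file, in hours
    """
    # With Poisson arrivals, the average number of simultaneous streams is
    # (arrival rate) x (stream duration), by Little's law.
    streams_needed = requests_per_hour * file_hours
    return disk_streams / streams_needed

print(normalized_disk_bandwidth(108, 100, 2.0))   # 2-hour movies    -> 0.54
print(normalized_disk_bandwidth(108, 100, 0.5))   # 30-minute shows  -> 2.16
```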


RBC vs. CMP

  • CMP outperforms RBC if B ≥ 1.0

  • RBC slightly outperforms CMP if B < 1.0 and the cache is small

N = 450, n = 100, θ = 0


Files Cached by RBC

  • Average fraction of each file cached by RBC (N = 450, n = 100, C = 0.25)

  [Plots for B = 0.75, B = 1.0, and B = 2.0]


Space and Bandwidth Utilization

  [Plots of space and bandwidth utilization for B = 0.75, 1.0, and 2.0]


Pooled RBC

  • Three improvements over RBC

    • simpler rule to select entity to cache

    • can keep cached intervals when deleting a full file

    • pool of pre-allocated bandwidth

  • Similar complexity to RBC


Pooled RBC, RBC and LFU

  • Pooled RBC ≈ CMP

  • BUT, Pooled RBC is much more complex than CMP

N = 450, n = 100, θ = 0


Hybrid CMP/IC Policies

  • Do interval caching on a separate (small) cache (see the sketch after this list)

    • Interval Cache in Main Memory: CMP/ICmem and Pooled RBC/ICmem

    • Interval Cache on Disk: CMP/ICdisk

      • e.g. 5% of disk cache
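A minimal sketch of the hybrid structure, reusing the cmp_cache_contents and pick_intervals helpers sketched earlier. The 5% split mirrors the CMP/ICdisk example above; all partitioning numbers and names are illustrative assumptions:

```python
def hybrid_cmp_ic(file_sizes, popularity, active_streams,
                  disk_cache_bytes, interval_fraction=0.05, rate=1.0):
    """CMP on the bulk of the cache, interval caching on a small partition.

    The interval partition (e.g. 5% of the disk cache, or main memory in the
    CMP/ICmem variant) absorbs closely spaced requests, while the CMP
    partition holds the most popular whole files.
    """
    ic_bytes = disk_cache_bytes * interval_fraction
    cmp_bytes = disk_cache_bytes - ic_bytes

    cached_files = cmp_cache_contents(file_sizes, popularity, cmp_bytes)

    # Only files *not* fully cached by CMP benefit from interval caching.
    uncached = {f: s for f, s in active_streams.items() if f not in cached_files}
    cached_intervals = pick_intervals(uncached, ic_bytes, rate)

    return cached_files, cached_intervals
```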


CMP/ICmem vs. Pooled RBC/ICmem

N = 450, n = 100, θ = 0

  • Memory cache improves both CMP and Pooled RBC

  • B < 1.0 : greater improvement for CMP


CMP/ICdisk vs. Pooled RBC

N = 450, n = 100, θ = 0

  • CMP/ICdisk ≈ Pooled RBC ≈ CMP


Conclusions

  • Simple CMP

    • simple to implement

    • performance similar to Pooled RBC and CMP/ICdisk (under static file popularities)

  • Hybrid CMP/IC policy

    • Performance ≈ Pooled RBC

    • simple to implement

    • possibly more robust to imperfect and dynamic popularity measures


Future Work

  • Develop on-line estimate of file popularity

  • Server log analysis

    • client behavior and workloads (NOSSDAV’01 paper)

    • More logs!!!!

  • Caching Policies for Multicast Streams

    • a popular file has greater cost-sharing if not cached

    • determine cache content that minimizes per-client cost

    • caching principles / on-line policy

    • (coming up soon)

  • Prototype and experimental (live) workloads

