
A Hybrid Caching Strategy for Streaming Media Files

Jussara M. Almeida Derek L. Eager Mary K. Vernon

University of Wisconsin-Madison

University of Saskatchewan

November 2001


Outline

  • Characteristics of Streaming Media (SM) files

  • Delivery of SM files

  • Hypothesis and Assumptions

  • Previous Caching Policies

  • New Policy Performance Comparison

  • New Caching Policies

  • Conclusions and Future Work


Characteristics of SM Files

  • Large file size

    • cache on disk

  • Sustained I/O bandwidth

    • inserting and reading new content

  • Clients access partial files

    • initial portion

    • favored segment

    • base + variable number of layers of layered encoding


Delivery of SM Files

  • Unicast streaming:

    • server bandwidth is linear in client request rate

    • goal: maximize byte hit ratio

  • Multicast streaming

    • save bandwidth

    • cost sharing introduces new tradeoffs


Caching for Multicast Streams: Tradeoffs

  • example:

    10 distributed proxy servers, each serving a local region;

    100 requests (on average) arrive per region during the playback of a popular video;

    about 7 streams are needed per region (70 in total) if the video is cached at the proxies, versus about 12 streams at the remote server if it is not (arithmetic sketched below)
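A minimal sketch of the arithmetic behind this tradeoff, using only the numbers given on the slide (the per-region and remote-server stream counts of 7 and 12 come from the multicast cost-sharing analysis and are taken as given, not derived here):

```python
# Numbers taken from the example above.
NUM_REGIONS = 10
REQUESTS_PER_REGION = 100
STREAMS_PER_REGION_IF_CACHED = 7       # proxy streams if the video is cached locally (given)
STREAMS_AT_REMOTE_IF_UNCACHED = 12     # remote-server streams serving all regions (given)

proxy_streams_total = NUM_REGIONS * STREAMS_PER_REGION_IF_CACHED   # 70 streams
remote_streams_total = STREAMS_AT_REMOTE_IF_UNCACHED               # 12 streams

# Caching at the proxies keeps traffic off the remote server and backbone,
# but each proxy stream is shared by only ~100/7 clients, while a remote
# stream is shared by ~1000/12 clients -- the cost-sharing tradeoff discussed
# on the next slide.
print(proxy_streams_total, remote_streams_total)
```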


Caching for Multicast Streams: Tradeoffs

  • caching popular content reduces the load on the remote server and network

  • delivering popular content from the remote server amortizes the cost of a stream over more clients

  • earlier portions of a popular video require more bandwidth and have less cost-sharing than later portions


New Caching Policies Research

  • Hypothesis: popularity-based strategy will outperform replacement-based strategy

    • significant fraction of requests to uncached files may be for files that are accessed very sporadically

  • Assumptions:

    • limited disk space implies limited disk bandwidth

    • proxy bandwidth for delivering cached streams is equal to min of proxy disk bw and proxy network bw

      (call this proxy disk bandwidth)


Current Web Caching Policies

  • Replacement based (cache on each miss); a sketch of a policy in this style follows this list

  • Top replacement candidate is chosen by an ad hoc combination of:

    • large files

    • least recently accessed or lowest access frequency

    • miss penalty (server latency, bandwidth)

  • Cache whole file or none

  • Unicast

  • Ignore limited disk bandwidth
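A minimal sketch of a replacement-based web cache in this spirit: insert on every miss, evict by an ad hoc score combining size, recency, and frequency. The score and its weights are illustrative assumptions, not taken from the talk, and the sketch ignores disk bandwidth, as such policies do:

```python
import time

class ReplacementCache:
    """Toy replacement-based web cache: insert on every miss, evict by ad hoc score."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = {}  # name -> {"size": bytes, "last": timestamp, "hits": count}

    def _eviction_score(self, e):
        # Ad hoc ranking: large, cold, rarely accessed objects are evicted first.
        age = time.time() - e["last"]
        return e["size"] * age / (1 + e["hits"])

    def access(self, name, size):
        if name in self.entries:                       # hit
            self.entries[name]["hits"] += 1
            self.entries[name]["last"] = time.time()
            return True
        if size > self.capacity:
            return False                               # object larger than the whole cache
        # Miss: always cache the whole object, evicting as needed.
        while self.used + size > self.capacity:
            victim = max(self.entries, key=lambda n: self._eviction_score(self.entries[n]))
            self.used -= self.entries.pop(victim)["size"]
        self.entries[name] = {"size": size, "last": time.time(), "hits": 1}
        self.used += size
        return False
```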


Previous SM Caching Policies

  • Interval Caching [DaSi93, KaRT95]

  • Resource Based Caching (RBC) [TVDS98]

  • Least Frequently Used (LFU)

  • Block-based insertion and deletion [AcSm00]

  • Popularity-based caching for layered encoding [RYHE00]

  • Prefix and Segment Caching for smoothing [SeRT99,WZDS98]



Interval Caching

  • Cache the smallest intervals between successive streams of the same file (see the sketch below)

  • Target: memory caches (lots of insertions)

[Figure: timelines over [0, T] showing successive streams S1, S2, S3 of a file f and the intervals between consecutive streams]
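A minimal sketch of the interval-caching selection, assuming known start times of the active streams for each file and a uniform playback rate (the helper name and arguments are illustrative). Each pair of consecutive streams of a file defines an interval; if that interval of data stays in the cache, the later stream can be fed from data written by the earlier one, so the policy caches the smallest intervals first:

```python
def pick_intervals(active_streams, cache_capacity, rate=1.0):
    """Select the intervals to cache, smallest first, until the cache is full.

    active_streams: {file_id: sorted list of stream start times (seconds)}
    cache_capacity: bytes available for interval caching
    rate:           playback rate in bytes/second (uniform-rate assumption)
    """
    intervals = []
    for f, starts in active_streams.items():
        for earlier, later in zip(starts, starts[1:]):
            size = (later - earlier) * rate   # data between consecutive streams
            intervals.append((size, f, earlier, later))

    chosen, used = [], 0.0
    for size, f, earlier, later in sorted(intervals, key=lambda t: t[0]):
        if used + size <= cache_capacity:
            chosen.append((f, earlier, later))
            used += size
    return chosen
```

For example, with streams of one file starting at t = 0, 40, and 300 seconds and room for 100 seconds of data, only the 40-second interval is cached.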


Resource Based Caching

  • Cache entire files and intervals/runs

  • Goal: efficiently utilize the limited resource

    • limited space: cache smallest space requirement

    • limited bandwidth: cache smallest write overhead

  • Pre-allocate bandwidth to each cached entity

  • Complex algorithm

    • Complex implementation

    • High time complexity


RBC Algorithm

Step 1: Select an entity x ∈ {interval, run, file} of file i

1) If Ubw > Uspace + δ (bandwidth is the more heavily utilized resource):

choose the entity with the lowest bandwidth (write) requirement

2) If Uspace > Ubw + δ (space is the more heavily utilized resource):

choose the entity with the minimum space requirement Si,x

3) If Uspace − δ < Ubw < Uspace + δ (utilizations roughly balanced):

choose the entity with the largest …

Step 2: Caching decision for entity x

1) If there is enough unallocated space and unallocated bandwidth:

cache entity x

2) If there is enough unallocated space but the cache is bandwidth constrained:

use the bandwidth goodness list to select candidates for eviction

3) If there is enough unallocated bandwidth but the cache is space constrained:

use the space goodness list to select candidates for eviction

4) If both bandwidth and space are constrained:

walk both lists, at each step removing an entity from either the bandwidth goodness list or the space goodness list

Step 3: Allocate space and bandwidth for entity x (decision logic sketched below)
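A minimal sketch of the Step 1 / Step 2 decision structure. The δ tolerance, the goodness metric, and all helper names and dictionary fields are assumptions for illustration; the paper's exact formulas are not reproduced here:

```python
DELTA = 0.05  # assumed tolerance between space and bandwidth utilization

def select_entity(candidates, u_space, u_bw):
    """Step 1: pick the candidate entity (interval, run, or file) to consider.

    candidates: list of dicts with 'space' (bytes), 'bw' (write bandwidth),
                and 'goodness' (a combined caching-gain metric, assumed).
    """
    if u_bw > u_space + DELTA:                           # bandwidth is the bottleneck
        return min(candidates, key=lambda e: e["bw"])
    if u_space > u_bw + DELTA:                           # space is the bottleneck
        return min(candidates, key=lambda e: e["space"])
    return max(candidates, key=lambda e: e["goodness"])  # roughly balanced

def caching_decision(entity, free_space, free_bw, space_list, bw_list):
    """Step 2: cache the entity, evicting from the goodness lists if needed.

    space_list / bw_list: cached entities ordered from least to most valuable
    per unit of space / bandwidth (the 'goodness' orderings from the slide).
    Returns the list of victims to evict, or None if no room can be made.
    """
    victims = []
    while free_space < entity["space"] or free_bw < entity["bw"]:
        pool = bw_list if free_bw < entity["bw"] else space_list
        if not pool:
            return None                      # cannot make room: do not cache
        victim = pool.pop(0)
        victims.append(victim)
        free_space += victim["space"]
        free_bw += victim["bw"]
    return victims
```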


Least Frequently Used

  • Different implementation options:

    • What to do on the first access to an object?

    • How to estimate frequency?

  • Version studied: Currently Most Popular (CMP)

    • Insert only the most frequently accessed files or segments (see the sketch after this list)

    • On-line popularity estimate: future research
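A minimal sketch of the CMP idea under the assumptions above (static, known file popularities; the helper name is illustrative): the cache simply holds the most popular files that fit.

```python
def cmp_cache_contents(file_sizes, popularity, cache_size):
    """
    file_sizes: {file_id: size in bytes}
    popularity: {file_id: access frequency (assumed known, e.g. Zipf-ranked)}
    cache_size: total cache space in bytes

    Returns the set of files CMP keeps cached: the currently most popular
    files, in popularity order, until the cache is full.
    """
    cached, used = set(), 0
    for f in sorted(popularity, key=popularity.get, reverse=True):
        if used + file_sizes[f] <= cache_size:
            cached.add(f)
            used += file_sizes[f]
    return cached
```

Unlike a replacement-based policy, nothing is inserted on a miss; the cache contents change only when the popularity ranking changes.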


Previous Comparison: RBC vs. CMP [TVDS98]

  • Fixed file access frequencies

  • RBC outperforms CMP for all parameter values studied

  • Limited design space

    • e.g., total cache size ≤ 16 GB

  • Inconsistent results


New Performance Comparison

  • Re-evaluate byte hit ratio of CMP and RBC

    • Simulation with synthetic workload

    • Broad design space

  • New Pooled RBC

  • New simple hybrid CMP/interval caching (CMP/IC) policy


System Assumptions

  • Arrivals: Poisson(λ)

    • extra experiments with Pareto(α, k)

  • File access frequency: Zipf(θ)

  • Perfect file popularity

    • extra experiments with approximate file popularity

  • Uniform file size and delivery rate

    • extra experiments with variable file size and delivery rate

  • Load balanced across multiple disks


System Parameters

  • n : number of files

  • θ : Zipf parameter

  • N : arrival rate (avg. number of requests per avg. file duration T)

    N = λ × T (see the workload sketch below)

  • C : cache size (fraction of media data accessed)
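A minimal sketch of a synthetic workload generator using these parameters. It assumes a Zipf convention common in the streaming-media literature, where file i is requested with probability proportional to 1/i^(1-θ), so θ = 0 (the value used in the plots below) is a pure Zipf distribution; this convention and the helper names are assumptions, not taken from the talk:

```python
import random

def zipf_probs(n, theta=0.0):
    # Assumed convention: p_i proportional to 1 / i^(1 - theta),
    # so theta = 0 is pure Zipf and theta = 1 is uniform.
    weights = [1.0 / (i ** (1.0 - theta)) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def generate_requests(n, theta, N, T, horizon):
    """Poisson arrivals at rate lambda = N / T, Zipf(theta) file selection.

    n: number of files, N: requests per average file duration T,
    horizon: length of the generated trace (same time units as T).
    Returns a list of (arrival_time, file_id) tuples.
    """
    lam = N / T                       # arrival rate per unit time
    probs = zipf_probs(n, theta)
    files = list(range(1, n + 1))
    t, trace = 0.0, []
    while True:
        t += random.expovariate(lam)  # exponential inter-arrival times
        if t > horizon:
            return trace
        trace.append((t, random.choices(files, weights=probs)[0]))
```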


System Parameters (cont.)

  • B : normalized disk bandwidth

    (the disk bandwidth expressed as a fraction of the average number of simultaneous streams needed to deliver the data that CMP caches)

  • B depends on N, θ, n, C, and the disk technology

  • Relative performance of the policies depends mainly on B

    • B = 1.0 : the CMP system is bandwidth balanced

    • B < 1.0 : the CMP system is bandwidth deficient

    • B > 1.0 : the CMP system is bandwidth abundant


Normalized Disk Bandwidth (B): Example

  • Ultrastar 72ZX disk:

    • disk space: 116.76 hours of MPEG-1 video (73.4 GB)

    • disk bandwidth: 108 MPEG-1 streams (22–37 MB/s)

  • Assume: 100 requests / hour for cached files

  • If cache contains 2-hour movies:

    • Need 200 streams

    • B = 108/200 = 0.54

  • If cache contains 30-minute TV shows:

    • Need 50 streams for cache content

    • B = 108/50 = 2.16 (computation sketched below)
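A minimal sketch of this computation (the disk figures are from the slide; the function name is illustrative):

```python
def normalized_disk_bandwidth(disk_streams, requests_per_hour, file_hours):
    """B = disk streaming capacity / simultaneous streams needed for cached content.

    disk_streams:      streams the disk can sustain (108 for the Ultrastar 72ZX)
    requests_per_hour: requests per hour for cached files (100 in the example)
    file_hours:        duration of a cached file, in hours
    """
    # With Poisson arrivals, the average number of simultaneous streams is
    # (arrival rate) x (stream duration), by Little's law.
    streams_needed = requests_per_hour * file_hours
    return disk_streams / streams_needed

print(normalized_disk_bandwidth(108, 100, 2.0))   # 2-hour movies    -> 0.54
print(normalized_disk_bandwidth(108, 100, 0.5))   # 30-minute shows  -> 2.16
```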


RBC vs. CMP

  • CMP outperforms RBC if B ≥ 1.0

  • RBC slightly outperforms CMP if B < 1.0 and the cache is small

N = 450, n = 100, θ = 0


Files Cached by RBC

  • Average fraction of each file cached by RBC (N = 450, n = 100, C = 0.25)

  [Plots for B = 0.75, B = 1.0, and B = 2.0]


Space and Bandwidth Utilization

  [Plots of space and bandwidth utilization for B = 0.75, 1.0, and 2.0]


Pooled RBC

  • Three improvements over RBC

    • simpler rule to select entity to cache

    • can keep cached intervals when deleting a full file

    • pool of pre-allocated bandwidth

  • Similar complexity to RBC


Pooled RBC, RBC and LFU

  • Pooled RBC ≈ CMP

  • BUT, Pooled RBC is much more complex than CMP

N = 450, n = 100, θ = 0


Hybrid CMP/IC Policies

  • Do interval caching on a separate (small) cache (see the sketch after this list)

    • Interval Cache in Main Memory: CMP/ICmem and Pooled RBC/ICmem

    • Interval Cache on Disk: CMP/ICdisk

      • e.g. 5% of disk cache
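A minimal sketch of the hybrid structure, reusing the cmp_cache_contents and pick_intervals helpers sketched earlier. The 5% split mirrors the CMP/ICdisk example above; all partitioning numbers and names are illustrative assumptions:

```python
def hybrid_cmp_ic(file_sizes, popularity, active_streams,
                  disk_cache_bytes, interval_fraction=0.05, rate=1.0):
    """CMP on the bulk of the cache, interval caching on a small partition.

    The interval partition (e.g. 5% of the disk cache, or main memory in the
    CMP/ICmem variant) absorbs closely spaced requests, while the CMP
    partition holds the most popular whole files.
    """
    ic_bytes = disk_cache_bytes * interval_fraction
    cmp_bytes = disk_cache_bytes - ic_bytes

    cached_files = cmp_cache_contents(file_sizes, popularity, cmp_bytes)

    # Only files *not* fully cached by CMP benefit from interval caching.
    uncached = {f: s for f, s in active_streams.items() if f not in cached_files}
    cached_intervals = pick_intervals(uncached, ic_bytes, rate)

    return cached_files, cached_intervals
```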


CMP/ICmem vs. Pooled RBC/ICmem

N = 450, n = 100, θ = 0

  • Memory cache improves both CMP and Pooled RBC

  • B < 1.0 : greater improvement for CMP


CMP/ICdisk vs. Pooled RBC

N = 450, n = 100, θ = 0

  • CMP/ICdisk ≈ Pooled RBC ≈ CMP


Conclusions

  • Simple CMP

    • simple to implement

    • performance similar to Pooled RBC and CMP/ICdisk (under static file popularities)

  • Hybrid CMP/IC policy

    • Performance ≈ Pooled RBC

    • simple to implement

    • possibly more robust to imperfect and dynamic popularity measures


Future Work

  • Develop on-line estimate of file popularity

  • Server log analysis

    • client behavior and workloads (NOSSDAV’01 paper)

    • More logs!!!!

  • Caching Policies for Multicast Streams

    • a popular file has greater cost-sharing if not cached

    • determine cache content that minimizes per-client cost

    • caching principles / on-line policy

    • (coming up soon)

  • Prototype and experimental (live) workloads

