On the Sensitivity of Web Proxy Cache Performance to Workload Characteristics

Mudashiru Busari

Carey Williamson

Department of Computer Science

University of Saskatchewan


Talk Outline

  • Introduction and Motivation

  • ProWGen: Proxy Workload Generator

    • Tool for Synthetic Web Proxy Workloads

  • Simulation Study

    • Simulation Evaluation of Web Proxy Caches

  • Conclusions and Future Work


Introduction

  • “The Web is both a blessing and a curse…”

  • Blessing:

    • Internet available to the masses

    • Seamless exchange of information

  • Curse:

    • Internet available to the masses

    • Stress on networks, protocols, servers, users

  • Motivation: techniques to improve the performance and scalability of the Web


Why is the Web so slow?

  • Client-side bottlenecks (PC, modem)

    • Solution: better access technologies

  • Server-side bottlenecks (busy Web site)

    • Solution: faster, scalable server designs

  • Network bottlenecks (Internet congestion)

    • Solutions: caching, replication; improved protocols for client-server communication


Our Previous Work

  • Evaluation of Canada’s national Web caching infrastructure for CANARIE’s CA*net II backbone

  • Workload characterization and evaluation of CA*net II Web caching hierarchy (IEEE Network, May/June 2000)

  • Developed Web proxy caching simulator for trace-driven simulation evaluation of Web proxy caching architectures


CA*net II Web Caching Hierarchy (Dec 1998)

[Figure: CA*net II Web caching hierarchy. Labels: USask; CANARIE (Ottawa); To NLANR. Annotation: selected measurement points for our traffic analyses; 3-6 months of data from each.]


Caching Hierarchy Overview

Cache hit ratios (empirically observed) at each level of the proxy hierarchy:

  • Top-Level/International (20-50 GB): 5-10%

  • National (10-20 GB): 15-20%

  • Regional/Univ. (5-10 GB): 30-40%

[Figure: three-level tree of proxies, with client nodes (C) at the leaves.]


Overview of This Paper

  • Constructed synthetic Web proxy workload generation tool (ProWGen) that captures the salient characteristics of empirical Web proxy workloads

  • Use ProWGen to evaluate sensitivity of proxy caches to selected Web proxy workload characteristics


Research Methodology

  • Design, construction, and parameterization of aggregate workload models, based on empirical traces (Web proxy access logs)

  • Validation of ProWGen (statistically, and versus empirical workloads)

  • Simulation evaluation of single-level caches

    • Sensitivity to workload characteristics

    • Effect of cache size

    • Effect of cache replacement policy


ProWGen: Key Workload Characteristics

  • “One-timers” (60-70% of docs are requested only once, so caching them is useless!)

  • Zipf-like document referencing popularity

  • Heavy-tailed file size distribution (i.e., most files small, but most bytes are in big files)

  • Correlations (if any) between document size and document popularity (debate!)

  • Temporal locality (temporal correlation between recent past and near future references) [Mahanti et al. Perf.Eval. 2000]
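
As a rough illustration of Zipf-like referencing and one-timers, the sketch below draws independent references with P(rank r) proportional to r^(-slope) and counts the documents seen exactly once. The function names, the document count, and the request count are assumptions made for this example; ProWGen itself takes the one-timer percentage as an input parameter rather than letting it emerge from sampling.

```python
import random
from collections import Counter
from itertools import accumulate


def zipf_probabilities(num_docs, slope):
    """Return P(rank r) proportional to r**(-slope) for r = 1..num_docs."""
    weights = [rank ** -slope for rank in range(1, num_docs + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def sample_references(num_docs, num_requests, slope, seed=0):
    """Draw document ranks independently from the Zipf-like distribution."""
    rng = random.Random(seed)
    cum = list(accumulate(zipf_probabilities(num_docs, slope)))
    return rng.choices(range(1, num_docs + 1), cum_weights=cum, k=num_requests)


# Hypothetical sizes, chosen only to keep the example fast.
refs = sample_references(num_docs=10_000, num_requests=50_000, slope=0.8)
counts = Counter(refs)
one_timers = sum(1 for c in counts.values() if c == 1)
print(f"distinct docs referenced: {len(counts)}")
print(f"one-timers among them:   {one_timers}")
```

Independent sampling like this produces no temporal locality, which is why a separate locality model is listed among the key characteristics.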


ProWGen (Conceptual View)

[Figure: Input Parameters → ProWGen Software → Synthetic Workload. Input parameter labels: 1, Z, a, C, L. Inset plots: Zipf (axes P, r); LLCD (axes F, s); Correlation (scale -1 0 +1).]



ProWGen: Workload Modeling Details

  • Modeled workload characteristics

    • One-time referencing

    • Zipf-like referencing behaviour (Zipf’s Law)

    • File size distribution

      • Body – lognormal distribution

      • Tail – Pareto Distribution

    • Correlation between file size and popularity

    • Temporal locality

      • Static probabilities in finite-size LRU stack model

      • Dynamic probabilities in finite-size LRU stack model
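
The slide does not spell out how the LRU stack model works, so the following is only a generic sketch of a static-probability variant, not ProWGen's code. It assumes that each stack position has a fixed selection weight of 1 / (position + 1), and that a new document is introduced with a fixed probability `p_new`. Both choices, and every name in the block, are hypothetical.

```python
import random


def lru_stack_trace(num_requests, stack_size, p_new=0.3, seed=0):
    """Generate document ids with recency bias from a finite LRU stack.

    Static model (assumed): stack position i (0 = most recent) is chosen
    with a fixed weight 1 / (i + 1), regardless of which document sits there.
    """
    rng = random.Random(seed)
    stack = []          # most recently referenced document first
    next_doc = 0        # id for the next never-seen document
    trace = []
    weights = [1.0 / (i + 1) for i in range(stack_size)]
    for _ in range(num_requests):
        if not stack or rng.random() < p_new:
            doc = next_doc
            next_doc += 1
        else:
            pos = rng.choices(range(len(stack)), weights=weights[:len(stack)])[0]
            doc = stack.pop(pos)
        stack.insert(0, doc)        # referenced document moves to the top
        del stack[stack_size:]      # finite stack: the bottom entry falls off
        trace.append(doc)
    return trace


print(lru_stack_trace(num_requests=20, stack_size=5))
```

A dynamic variant would recompute the weights from properties of the documents currently in the stack instead of from their positions alone.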


Validation of ProWGen

  • To establish that the synthetic workloads possess the desired characteristics (quantitative and qualitative), and that the characteristics are similar to those in empirical workloads

  • Example: analyze 5 million requests from a proxy server trace and parameterize ProWGen to generate a similar workload


Workload Synthesis

  • Total number of requests: 5,000,000

  • Unique documents (of total requests): 34%

  • One-timers (of unique documents): 72%

  • Zipf slope: 0.807

  • Tail index: 1.322

  • Documents in the tail: 22%

  • Beginning of the tail (bytes): 10,000

  • Mean of the lognormal file size distribution: 7,000

  • Standard deviation: 11,000

  • Correlation between file size and popularity: Zero

  • LRU stack model for temporal locality: Static and Dynamic

  • LRU stack size: 1,000
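
To make the file-size parameters concrete, here is a minimal sketch of a lognormal-body, Pareto-tail sampler using the values listed above. It assumes that 7,000 and 11,000 are the mean and standard deviation of the body sizes in bytes (converted to the lognormal's mu and sigma), and that body samples at or above the tail start are redrawn. Neither assumption is confirmed by the slides, and all names are illustrative.

```python
import math
import random

TAIL_FRACTION = 0.22     # "Documents in the tail"
TAIL_START = 10_000      # "Beginning of the tail (bytes)"
TAIL_INDEX = 1.322       # used here as the Pareto shape parameter
BODY_MEAN = 7_000        # assumed: mean of body file sizes, in bytes
BODY_STD = 11_000        # assumed: standard deviation of body sizes, in bytes

# Convert the mean/std of the sizes into (mu, sigma) of the underlying normal.
SIGMA = math.sqrt(math.log(1 + (BODY_STD / BODY_MEAN) ** 2))
MU = math.log(BODY_MEAN) - SIGMA ** 2 / 2


def sample_file_size(rng):
    """Return one file size in bytes from the body/tail mixture."""
    if rng.random() < TAIL_FRACTION:
        # paretovariate returns values >= 1; scale by the tail start.
        return int(TAIL_START * rng.paretovariate(TAIL_INDEX))
    # Body: redraw until the lognormal sample falls below the tail start.
    while True:
        size = rng.lognormvariate(MU, SIGMA)
        if size < TAIL_START:
            return max(1, int(size))


rng = random.Random(42)
sizes = [sample_file_size(rng) for _ in range(100_000)]
tail_files = sum(s >= TAIL_START for s in sizes)
tail_bytes = sum(s for s in sizes if s >= TAIL_START)
print(f"files in tail: {tail_files / len(sizes):.1%}")
print(f"bytes in tail: {tail_bytes / sum(sizes):.1%}")
```

Comparing the two printed shares shows the heavy-tail property: a minority of the files accounts for the majority of the bytes.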


Zipf-like Referencing Behaviour

Empirical Trace Slope = 0.81

Synthetic Trace Slope = 0.83
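
A slope like the ones quoted here is commonly obtained by fitting a straight line to log(frequency) versus log(rank). The sketch below does an ordinary least-squares fit; whether the authors used exactly this procedure is an assumption, and `zipf_slope` is a name invented for the example.

```python
import math
from collections import Counter


def zipf_slope(requests):
    """Least-squares slope of log(frequency) versus log(rank), sign-flipped.

    Needs at least two distinct documents in the request list.
    """
    freqs = sorted(Counter(requests).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Toy trace: the document of rank r is requested 2520 / r times,
# so the fitted slope is close to 1.0 by construction.
trace = [doc for doc in range(1, 11) for _ in range(2520 // doc)]
print(round(zipf_slope(trace), 3))
```

On real traces the flat region of one-timers at the low-frequency end can bias a plain fit, so the fitted range matters.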


Transfer Size Distribution

[Figure: transfer size distribution, plotted for references and for bytes transferred.]


Simulation Evaluation of Single-Level Web Proxy Caches: Some Research Questions

  • In a single-level proxy cache, how sensitive is Web proxy caching performance to certain workload characteristics (one-timers, Zipf slope, heavy-tail index)?

  • How does the degree of sensitivity change depending on the cache replacement policy?


Simulation Model

[Figure: simulation model. Labels: Web Clients; Aggregate Workload; Proxy server; Web Servers.]


Experimental Design: Factors and Levels

  • Cache size

    • 1 MB to 32 GB

  • Cache Replacement Policy

    • Recency-based LRU

    • Frequency-based LFU-Aging

    • Size-based GD-Size

  • Workload Characteristics

    • One-timers, Zipf slope, tail index, correlation, temporal locality model
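
Of the three policies, GD-Size is the least self-explanatory. The sketch below follows the usual GreedyDual-Size formulation with cost 1: each cached document gets priority H = L + 1/size, the lowest H is evicted first, and L is raised to the evicted priority. It is a generic illustration under that assumption, not the simulator used in the study, and the class and method names are made up for the example.

```python
import heapq


class GDSizeCache:
    """GreedyDual-Size with cost = 1, so small documents are favoured."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0   # the running value "L" in GreedyDual-Size
        self.entries = {}      # doc id -> (priority H, size in bytes)
        self.heap = []         # (priority H, doc id); may hold stale pairs

    def _set_priority(self, doc, size):
        priority = self.inflation + 1.0 / size
        self.entries[doc] = (priority, size)
        heapq.heappush(self.heap, (priority, doc))

    def _evict_one(self):
        while True:
            priority, doc = heapq.heappop(self.heap)
            current = self.entries.get(doc)
            if current is not None and current[0] == priority:
                break          # otherwise: stale pair from an earlier update
        self.inflation = priority
        self.used -= current[1]
        del self.entries[doc]

    def access(self, doc, size):
        """Return True on a hit; on a miss, admit the document if it fits."""
        if doc in self.entries:
            self._set_priority(doc, self.entries[doc][1])
            return True
        if size > self.capacity:
            return False       # never cache a document larger than the cache
        while self.used + size > self.capacity:
            self._evict_one()
        self._set_priority(doc, size)
        self.used += size
        return False


cache = GDSizeCache(capacity_bytes=100)
for doc, size in [("big", 80), ("small", 10), ("small", 10), ("new", 30)]:
    print(doc, "hit" if cache.access(doc, size) else "miss")
print(sorted(cache.entries))
```

In this run the 80-byte document is evicted to make room, while the 10-byte document survives because its priority 1/size is higher. Document ids must be mutually comparable (for example all strings) so that heap ties can be broken.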


Performance Metrics

  • Document Hit Ratio

    • Percent of requested docs found in cache (HR)

  • Byte Hit Ratio

    • Percent of requested bytes found in cache (BHR)
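
The two metrics can diverge sharply, which a small trace-driven LRU simulation makes visible. This is a minimal sketch, assuming a trace of (document id, size in bytes) pairs and a cache that never admits a document larger than its capacity; the function name and the toy trace are invented for the example.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity_bytes):
    """Replay (doc_id, size_bytes) requests through an LRU cache.

    Returns (document hit ratio, byte hit ratio), both as percentages.
    """
    cache = OrderedDict()            # doc_id -> size, least recently used first
    used = 0
    hits = hit_bytes = total_bytes = 0
    for doc, size in trace:
        total_bytes += size
        if doc in cache:
            hits += 1
            hit_bytes += size
            cache.move_to_end(doc)   # mark as most recently used
            continue
        if size > capacity_bytes:
            continue                 # too large to cache at all
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[doc] = size
        used += size
    return 100.0 * hits / len(trace), 100.0 * hit_bytes / total_bytes


# A small document that fits and a large one that does not.
toy = [("s", 100), ("L", 5000), ("s", 100), ("L", 5000), ("s", 100)]
hr, bhr = simulate_lru(toy, capacity_bytes=1000)
print(f"HR = {hr:.1f}%, BHR = {bhr:.1f}%")
```

Two of the five requests hit, but they carry only 200 of the 10,300 requested bytes, so HR is far higher than BHR.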


Simulation Results (Preview)

  • Cache performance is very sensitive to:

    • Slope of Zipf-like doc referencing popularity

    • Temporal locality property

    • Correlations between size and popularity

  • Cache performance relatively insensitive to:

    • One-timers

    • Tail index of heavy-tailed file size distribution


Sensitivity to One-timers (LRU)

(a) Doc Hit Ratio

(b) Byte Hit Ratio


Sensitivity to Zipf Slope (LRU)

Difference of 0.2 in Zipf slope impacts performance by as much as 10-15% in hit ratio and byte hit ratio

(a) Hit Ratio

(b) Byte Hit Ratio


Sensitivity to Heavy Tail Index (LRU Replacement Policy)

(a) Doc Hit Ratio

(b) Byte Hit Ratio


Sensitivity to Heavy Tail Index (GD-Size Replacement Policy)

Difference of 0.2 in heavy tail index impacts performance by less than 3%

(a) Hit Ratio

(b) Byte Hit Ratio


Sensitivity to Correlation (LRU)

(a) Doc Hit Ratio

(b) Byte Hit Ratio


Sensitivity to Temporal Locality (LRU)

(a) Doc Hit Ratio

(b) Byte Hit Ratio


Summary: Single-Level Caches

  • Cache performance is sensitive to:

    • Slope of Zipf-like document referencing popularity (steeper slope implies better caching)

    • Temporal locality

    • Correlation between size and popularity

  • Cache Performance is insensitive to:

    • One-timers

    • Tail index of heavy-tailed file size distribution
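
One way to see why a steeper Zipf slope implies better caching is to compute how much of the request stream goes to the most popular documents. The sketch below assumes independent references with P(rank r) proportional to r^(-slope). The population of 100,000 documents and the 1% cut-off are arbitrary choices for illustration, not values from the study.

```python
def top_share(num_docs, slope, top_fraction):
    """Fraction of requests that go to the most popular top_fraction of docs."""
    weights = [rank ** -slope for rank in range(1, num_docs + 1)]
    top_n = max(1, int(num_docs * top_fraction))
    return sum(weights[:top_n]) / sum(weights)


for slope in (0.6, 0.8, 1.0):
    share = top_share(num_docs=100_000, slope=slope, top_fraction=0.01)
    print(f"slope {slope}: top 1% of documents receive {share:.1%} of requests")
```

Under these assumptions the share is roughly the hit ratio of an idealized cache that holds exactly those top documents, so it grows as the slope steepens.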


Conclusions

  • ProWGen is a useful tool for the generation of synthetic Web proxy workloads for the evaluation of Web proxy caches and Web proxy caching architectures

  • Web proxy cache performance is quite sensitive to Zipf slope, temporal locality, and correlations (if any) between document size and document popularity


Future Work

  • Extend and improve ProWGen

    • Request arrival process (timestamps)

    • File modifications, types, and lifetimes

    • Web page structure (spatial locality)

    • Scaling the workload model(s)...

  • Evaluate multi-level Web proxy caches

  • Port to network emulation testbed


For More Information...

  • M. Busari, “Simulation Evaluation of Web Caching Hierarchies”, M.Sc. Thesis, Dept of Computer Science, U. Saskatchewan, June 2000

  • ProWGen tool:

    • http://www.cs.usask.ca/faculty/carey/software/

  • Email: carey@cs.usask.ca

    • http://www.cs.usask.ca/faculty/carey/