Squirrel: A peer-to-peer web cache. Sitaram Iyer (Rice University) Joint work with Ant Rowstron (MSR Cambridge) Peter Druschel (Rice University). PODC 2002 / Sitaram Iyer / Tuesday July 23 / Monterey, CA. Web Caching. Latency, External traffic, Load on web servers and routers.

Squirrel: A peer-to-peer web cache

Sitaram Iyer (Rice University)

Joint work with

Ant Rowstron (MSR Cambridge)

Peter Druschel (Rice University)

PODC 2002 / Sitaram Iyer / Tuesday July 23 / Monterey, CA


Web Caching

Web caching reduces:

  • Latency,

  • External traffic,

  • Load on web servers and routers.

    Deployed at: Corporate network boundaries, ISPs, Web Servers, etc.


Web Cache

[Figure: a centralized web cache serving clients (each a browser with a browser cache) on a corporate LAN, with the web server reached over the Internet.]


Cooperative Web Cache

[Figure: several cooperating web caches serving clients (each a browser with a browser cache) on a corporate LAN, with the web server reached over the Internet.]


Decentralized Web Cache

[Figure: Squirrel running across the clients (each a browser with a browser cache) on a corporate LAN, with the web server reached over the Internet.]


Distributed Hash Table

Peer-to-peer location service: Pastry

[Figure: <key,value> pairs (k1,v1) through (k6,v6) spread across the nodes.]

Operations: Insert(k,v), Lookup(k)

Peer-to-peer routing and location substrate

  • Completely decentralized and self-organizing

  • Fault-tolerant, scalable, efficient
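
The Insert(k,v) and Lookup(k) operations can be illustrated with a toy, single-process stand-in for a distributed hash table. This is only a sketch under stated assumptions: the class `ToyDHT`, the SHA-1 hashing, the 128-bit identifier space, and the "closest node ID on the ring" placement rule are illustrative choices, and nothing here reproduces Pastry's actual routing or API.

```python
import hashlib

ID_BITS = 128  # assumed identifier width for this toy example
ID_SPACE = 1 << ID_BITS


def to_id(text: str) -> int:
    """Hash a string into the circular identifier space."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ID_SPACE


def ring_distance(a: int, b: int) -> int:
    """Shortest distance between two identifiers on the ring."""
    d = abs(a - b)
    return min(d, ID_SPACE - d)


class ToyDHT:
    """Hypothetical in-memory model: each node ID maps to that node's store."""

    def __init__(self, node_names):
        self.nodes = {to_id(name): {} for name in node_names}

    def responsible_node(self, key: str) -> int:
        # The node whose ID is closest to the key's ID holds the key.
        key_id = to_id(key)
        return min(self.nodes, key=lambda node_id: ring_distance(node_id, key_id))

    def insert(self, key: str, value) -> None:
        self.nodes[self.responsible_node(key)][key] = value

    def lookup(self, key: str):
        return self.nodes[self.responsible_node(key)].get(key)


dht = ToyDHT([f"node-{i}" for i in range(8)])
dht.insert("k1", "v1")
dht.insert("k2", "v2")
```

Because placement depends only on the hash of the key, any node can compute where a key lives without a central index, which is the property the rest of the talk builds on.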


Why peer-to-peer?

  • Cost of dedicated web cache

    No additional hardware

  • Administrative effort

    Self-organizing network

  • Scaling implies upgrading

    Resources grow with clients


Setting

  • Corporate LAN

  • 100 - 100,000 desktop machines

  • Located in a single building or campus

  • Each node runs an instance of Squirrel and sets it as the browser's proxy


Mapping Squirrel onto Pastry

Two approaches:

  • Home-store

  • Directory


Home-store model

[Figure: a client and a home node on the LAN, with the Internet outside. The client's request is routed to the home node by the URL hash.]

…that's how it works!
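
A minimal sketch of the home-store flow, under loud assumptions: the names `HomeStoreNode`, `HomeStoreCache`, and `fetch_from_origin` are hypothetical, the home node is picked by hashing the URL modulo the node count (a stand-in for Pastry routing), and freshness checks and cache eviction are left out entirely.

```python
import hashlib


class HomeStoreNode:
    def __init__(self, name):
        self.name = name
        self.local_cache = {}  # objects this node fetched as a client
        self.home_store = {}   # objects this node keeps as a home node


class HomeStoreCache:
    def __init__(self, names, fetch_from_origin):
        self.nodes = [HomeStoreNode(n) for n in names]
        self.fetch_from_origin = fetch_from_origin  # hypothetical origin fetch
        self.origin_fetches = 0

    def home_for(self, url):
        # Assumption: hash modulo node count stands in for DHT routing.
        digest = hashlib.sha1(url.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest, "big") % len(self.nodes)]

    def get(self, client_index, url):
        client = self.nodes[client_index]
        if url in client.local_cache:          # local hit, no LAN traffic
            return client.local_cache[url]
        home = self.home_for(url)
        if url not in home.home_store:         # home miss: go to the origin
            home.home_store[url] = self.fetch_from_origin(url)
            self.origin_fetches += 1
        obj = home.home_store[url]
        client.local_cache[url] = obj          # clients always cache locally
        return obj


cache = HomeStoreCache(
    [f"node-{i}" for i in range(5)],
    lambda url: f"<body of {url}>",
)
first = cache.get(0, "http://example.com/index.html")
second = cache.get(3, "http://example.com/index.html")
```

In this sketch the second client is served from the home node, so the origin server is contacted only once for the object.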


Directory model

Client nodes always cache objects locally.

Home-store: home node also stores objects.

Directory: the home node only stores pointers to recent clients, and forwards requests.


[Figure: a client and a home node on the LAN, with the Internet outside.]

Randomly choose entry from table
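
The directory flow can be sketched in the same spirit. Everything named here is an assumption made for illustration: `DirectoryNode`, `DirectoryCache`, `fetch_from_origin`, the cap of four pointers per URL, and the hash-modulo choice of home node. The sketch also has no eviction or freshness checks, so every pointer is guaranteed to reference a node that still holds the object.

```python
import hashlib
import random


class DirectoryNode:
    def __init__(self, name):
        self.name = name
        self.local_cache = {}  # objects this node fetched as a client
        self.directory = {}    # url -> indexes of clients that recently fetched it


class DirectoryCache:
    MAX_POINTERS = 4  # assumed cap on directory entries per URL

    def __init__(self, names, fetch_from_origin, seed=0):
        self.nodes = [DirectoryNode(n) for n in names]
        self.fetch_from_origin = fetch_from_origin  # hypothetical origin fetch
        self.origin_fetches = 0
        self.rng = random.Random(seed)

    def home_for(self, url):
        # Assumption: hash modulo node count stands in for DHT routing.
        digest = hashlib.sha1(url.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest, "big") % len(self.nodes)]

    def get(self, client_index, url):
        client = self.nodes[client_index]
        if url in client.local_cache:
            return client.local_cache[url]
        home = self.home_for(url)
        pointers = home.directory.setdefault(url, [])
        if pointers:
            # The home node stores no objects; it forwards the request to a
            # randomly chosen entry from its table of recent clients.
            delegate = self.nodes[self.rng.choice(pointers)]
            obj = delegate.local_cache[url]
        else:
            obj = self.fetch_from_origin(url)
            self.origin_fetches += 1
        client.local_cache[url] = obj
        pointers.append(client_index)
        del pointers[:-self.MAX_POINTERS]  # keep only the most recent clients
        return obj


url = "http://example.com/logo.png"
dcache = DirectoryCache([f"node-{i}" for i in range(5)], lambda u: f"<body of {u}>")
first = dcache.get(0, url)
second = dcache.get(1, url)
```

After these two requests the home node's table for the URL lists clients 0 and 1, and a third requester would be forwarded to one of them at random.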


Directory: Advantages

Avoids storing unnecessary copies of objects.

Rapidly changing directory for popular objects seems to improve load balancing.

Home-store scheme can incur hotspots.


Directory: Disadvantages

Cache insertion only happens at clients, so:

  • active clients store all the popular objects,

  • inactive clients waste most of their storage.

    Implications:

  • Reduced cache size.

  • Load imbalance.


Directory: Load spike example

  • Web page with many embedded images, or

  • Periods of heavy browsing.

    Many home nodes point to such clients!
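
A rough way to see the spike is to count who serves each object when a second client loads a page that one node has just fetched. This is a deliberately simplified sketch, not the simulation behind the results: the node count, the image URLs, and the hash-modulo choice of home node are all made up for illustration, and the directory case assumes each image's table still holds a single pointer to the first client.

```python
import hashlib
from collections import Counter

NODES = 50  # assumed LAN size for the illustration
IMAGES = [f"http://example.com/img/{i}.png" for i in range(200)]  # made-up page


def home_index(url: str) -> int:
    """Assumed stand-in for DHT routing: hash the URL modulo the node count."""
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NODES


directory_load = Counter()   # objects served per node, directory scheme
home_store_load = Counter()  # objects served per node, home-store scheme

# Node 0 loaded the page first. A second client now requests every image.
for url in IMAGES:
    # Directory: each home node's only pointer for this image is node 0.
    directory_load[0] += 1
    # Home-store: the image is served by its own home node.
    home_store_load[home_index(url)] += 1

busiest_directory = max(directory_load.values())
busiest_home_store = max(home_store_load.values())
```

Under these assumptions node 0 serves all 200 images in the directory case, while the home-store case spreads the same requests over whichever nodes the hash selects.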

Evaluate …



Total external traffic (Redmond)

[Figure: total external traffic in GB (lower is better; axis from 85 to 105) against per-node cache size in MB (log scale, 0.001 to 100), with curves for no web cache, directory, home-store, and centralized cache.]


Total external traffic (Cambridge)

[Figure: total external traffic in GB (lower is better; axis from 5.5 to 6.1) against per-node cache size in MB (log scale, 0.001 to 100), with curves for no web cache, directory, home-store, and centralized cache.]


LAN Hops (Redmond)

[Figure: percentage of cacheable requests (0% to 100%) against total hops within the LAN (0 to 6), for centralized, home-store, and directory.]


LAN Hops (Cambridge)

[Figure: percentage of cacheable requests (0% to 100%) against total hops within the LAN (0 to 5), for centralized, home-store, and directory.]


Load in requests per sec (Redmond)

[Figure: number of times observed (log scale, 1 to 100000) against max objects served per node per second (0 to 50), for home-store and directory.]


Load in requests per sec (Cambridge)

[Figure: number of times observed (log scale, 1 to 1e+07) against max objects served per node per second (0 to 50), for home-store and directory.]


Load in requests per min (Redmond)

[Figure: number of times observed (log scale, 1 to 100) against max objects served per node per minute (0 to 350), for home-store and directory.]


Load in requests per min (Cambridge)

[Figure: number of times observed (log scale, 1 to 10000) against max objects served per node per minute (0 to 120), for home-store and directory.]


Fault tolerance

Sudden node failures result in partial loss of cached content.

Home-store: Proportional to failed nodes.

Directory: More vulnerable.


Fault tolerance

If 1% of Squirrel nodes abruptly crash, the fraction of lost cached content is:


Conclusions

  • It is possible to decentralize web caching.

  • Performance is comparable to a centralized web cache.

  • The decentralized cache is better in terms of cost, scalability, and administration effort.

  • Under our assumptions, the home-store scheme is superior to the directory scheme.


Other aspects of Squirrel

  • Adaptive replication

    • Hotspot avoidance

    • Improved robustness

  • Route caching

    • Fewer LAN hops





(backup) Full home-store protocol

[Figure: message flow among the client, other nodes, and the home node on the LAN, and the origin server across the WAN. Labels: req (steps 1 to 3); a: object or notmod from home; b: req; b: object or notmod from origin.]


(backup) Full directory protocol

[Figure: message flow among the client, other nodes, the home node with its directory (dir), the delegate, and the origin server. Labels: req; a: no dir, go to origin (also d); a, d: req; b: not-modified; c, e: req; c, e: object; e: cGET req; object or not-modified.]


(backup) Peer-to-peer Computing

Decentralize a distributed protocol:

  • Scalable

  • Self-organizing

  • Fault tolerant

  • Load balanced

    Not automatic!!


Decentralized Web Cache

[Figure: browsers with browser caches on the LAN, and the web server on the Internet.]


Challenge

Decentralized web caching algorithm:

Need to achieve those benefits in practice!

Need to keep overhead unnoticeably low.

Node failures should not become significant.


Peer-to-peer routing, e.g., Pastry

Peer-to-peer object location and routing substrate = Distributed Hash Table.

Reliably maps an object key to a live node.

Routes in log16(N) steps (e.g. 3-4 steps for 100,000 nodes)
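
The hop count can be checked with a few lines of arithmetic. The helper name `hop_estimate` and the sample sizes are assumptions chosen to span the 100 to 100,000 machine setting; the function evaluates log16(N) and says nothing about hops measured in practice.

```python
import math


def hop_estimate(n_nodes: int, base: int = 16) -> float:
    """Return log_base(n_nodes), the routing-step estimate from the slide."""
    return math.log(n_nodes, base)


# Sample network sizes across the corporate-LAN range (assumed values).
estimates = {n: hop_estimate(n) for n in (100, 1_000, 10_000, 100_000)}

for n, hops in estimates.items():
    print(f"N = {n:>7,}: log16(N) = {hops:.2f}")
```

For N = 100,000 the expression comes to a little over 4, the same order as the 3-4 steps quoted above, and it grows by less than one step for each tenfold increase in nodes.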


Home-store is better!

Simpler home-store scheme achieves load balancing by hash function randomization.

Directory scheme implicitly relies on access patterns for load distribution.


Directory scheme seems better…

Avoids storing unnecessary copies of objects.

Rapidly changing directory for popular objects results in load balancing.


Interesting difference

Consider:

  • Web page with many images, or

  • Heavily browsing node

    Directory: many pointers to some node.

    Home-store: natural load balancing.

Evaluate …


Fault tolerance

When a single Squirrel node crashes, the fraction of lost cached content is:

