Presentation Transcript
Wide-Area Cooperative Storage with CFS

Robert Morris
Frank Dabek, M. Frans Kaashoek, David Karger, Ion Stoica
MIT and Berkeley

Target CFS Uses
  • Serving data with inexpensive hosts:
    • open-source distributions
    • off-site backups
    • tech report archive
    • efficient sharing of music

[Figure: several CFS nodes connected across the Internet]

How to mirror open-source distributions?
  • Multiple independent distributions
    • Each has high peak load, low average
  • Individual servers are wasteful
  • Solution: aggregate
    • Option 1: single powerful server
    • Option 2: distributed service
      • But how do you find the data?
Design Challenges
  • Avoid hot spots
  • Spread storage burden evenly
  • Tolerate unreliable participants
  • Fetch speed comparable to whole-file TCP
  • Avoid O(#participants) algorithms
    • Centralized mechanisms [Napster], broadcasts [Gnutella]
  • CFS solves these challenges
The Rest of the Talk
  • Software structure
  • Chord distributed hashing
  • DHash block management
  • Evaluation
  • Design focus: simplicity, proven properties
CFS Architecture
  • Each node is a client and a server (like xFS)
  • Clients can support different interfaces
    • File system interface
    • Music keyword search (like Napster and Gnutella)

[Figure: client and server roles running together on nodes across the Internet]

Client-server interface
  • Files have unique names
  • Files are read-only (single writer, many readers)
  • Publishers split files into blocks
  • Clients check files for authenticity [SFSRO]

[Figure: an FS client issues "insert file f" / "lookup file f"; servers on each node handle the corresponding "insert block" / "lookup block" operations]

Server Structure

[Figure: each node (Node 1, Node 2) runs a DHash layer on top of a Chord layer]

  • DHash stores, balances, replicates, caches blocks
  • DHash uses Chord [SIGCOMM 2001] to locate blocks
Chord Hashes a Block ID to its Successor

[Figure: circular ID space with nodes N10, N32, N60, N80, N100; each node stores the blocks whose IDs precede it: N10 holds B112, B120, ..., B10; N32 holds B11, B30; N60 holds B33, B40, B52; N80 holds B65, B70; N100 holds B100]

  • Nodes and blocks have randomly distributed IDs
  • Successor: node with next highest ID
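The placement rule above is compact enough to state directly in code. A minimal sketch, assuming SHA-1-derived IDs and a locally known list of node IDs (the real system discovers nodes through the Chord protocol rather than a local list):

    import hashlib

    ID_BITS = 160                        # CFS/Chord IDs are SHA-1 outputs
    ID_SPACE = 2 ** ID_BITS

    def chord_id(data: bytes) -> int:
        """Map a node address or a block's contents to a point on the ring."""
        return int.from_bytes(hashlib.sha1(data).digest(), "big")

    def successor(block_id: int, node_ids: list[int]) -> int:
        """The node responsible for block_id: the first node whose ID is >=
        block_id, wrapping around the circular ID space."""
        for nid in sorted(node_ids):
            if nid >= block_id:
                return nid
        return min(node_ids)             # wrap past the highest node ID

    # The slide's example, with small IDs for readability:
    nodes = [10, 32, 60, 80, 100]
    assert successor(65, nodes) == 80    # B65 and B70 live on N80
    assert successor(112, nodes) == 10   # B112 wraps around to N10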
Basic Lookup

[Figure: on a ring of nodes N5, N10, N20, N32, N40, N60, N80, N99, N110, node N5 asks "Where is block 70?"; the query is forwarded around the ring until the answer "N80" comes back]

  • Lookups find the ID’s predecessor
  • Correct if successors are correct
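A minimal sketch of this successor-walking lookup over an in-memory ring, using the node IDs from the slide (class and function names here are illustrative, not CFS's actual RPC names):

    class Node:
        def __init__(self, node_id: int):
            self.id = node_id
            self.successor: "Node" = self        # set once the ring is built

    def in_arc(x: int, a: int, b: int) -> bool:
        """True if x lies on the arc (a, b] of the circular ID space."""
        return (a < x <= b) if a < b else (x > a or x <= b)

    def find_successor(start: Node, block_id: int) -> Node:
        """Walk successor pointers until we reach the block's predecessor,
        whose successor is the node that stores the block. O(N) hops."""
        n = start
        while not in_arc(block_id, n.id, n.successor.id):
            n = n.successor
        return n.successor

    # Build the ring from the slide and ask "Where is block 70?" from N5.
    nodes = [Node(i) for i in sorted([5, 10, 20, 32, 40, 60, 80, 99, 110])]
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        a.successor = b
    assert find_successor(nodes[0], 70).id == 80     # answer: "N80"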
Successor Lists Ensure Robust Lookup

[Figure: each node on the ring keeps a list of its next r successors, e.g. N5: 10, 20, 32; N20: 32, 40, 60; N80: 99, 110, 5; N110: 5, 10, 20]

  • Each node stores r successors, r = 2 log N
  • Lookup can skip over dead nodes to find blocks
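A small sketch of why the successor list keeps lookups alive, assuming the slide's r = 2 log N rule (base-2 logarithm assumed) and treating liveness as a simple set-membership test:

    import math

    def successor_list_length(n_nodes: int) -> int:
        """The slide's rule of thumb: r = 2 log N successors per node
        (base-2 logarithm assumed here)."""
        return max(1, 2 * math.ceil(math.log2(n_nodes)))

    def next_live(successors: list[int], alive: set[int]) -> int:
        """First live entry in a successor list; dead nodes are skipped."""
        for s in successors:
            if s in alive:
                return s
        raise RuntimeError("all r successors failed at once")

    # From the slide: N80's successor list is [99, 110, 5]. If N99 has died,
    # a lookup passing through N80 skips ahead to N110 instead of stalling.
    assert next_live([99, 110, 5], alive={5, 10, 110}) == 110
    assert successor_list_length(9) == 8      # 2 * ceil(log2(9)) for 9 nodes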
Chord Finger Table Allows O(log N) Lookups

[Figure: node N80's fingers point at nodes ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ID space away]

  • See [SIGCOMM 2001] for table maintenance
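A sketch of how the finger table turns the walk into O(log N) hops: at each step the lookup jumps to the finger that gets closest to the target ID without passing it. This is a simplified in-memory model with precomputed fingers; see the Chord paper for join, stabilization, and table maintenance:

    import bisect

    ID_BITS = 7                 # tiny ring (IDs 0..127) for illustration; Chord uses 160 bits
    RING = 2 ** ID_BITS

    def succ(ring: list[int], point: int) -> int:
        """First node ID >= point, wrapping around the sorted ring."""
        i = bisect.bisect_left(ring, point % RING)
        return ring[i % len(ring)]

    def build_fingers(ring: list[int], node_id: int) -> list[int]:
        """Finger i points at the successor of node_id + 2**i."""
        return [succ(ring, node_id + 2 ** i) for i in range(ID_BITS)]

    def between(x: int, a: int, b: int) -> bool:
        """True if x lies on the arc (a, b] of the circular ID space."""
        return (a < x <= b) if a < b else (x > a or x <= b)

    def lookup(ring: list[int], fingers: dict[int, list[int]], start: int, block_id: int):
        """Route via the finger that gets closest to block_id without passing it."""
        n, hops = start, 0
        while not between(block_id, n, succ(ring, n + 1)):
            n = max((f for f in fingers[n] if between(f, n, block_id)),
                    key=lambda f: (f - n) % RING,
                    default=succ(ring, n + 1))
            hops += 1
        return succ(ring, block_id), hops

    ring = sorted([5, 10, 20, 32, 40, 60, 80, 99, 110])
    fingers = {n: build_fingers(ring, n) for n in ring}
    owner, hops = lookup(ring, fingers, start=5, block_id=70)
    assert owner == 80 and hops <= 4     # a few hops instead of walking node by node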
DHash/Chord Interface
  • lookup() returns a list of nodes with IDs close to the block ID in ID space
    • Sorted, closest first

[Figure: DHash calls Lookup(blockID) on the local Chord layer and gets back a list of <node-ID, IP address> pairs; Chord maintains a finger table of <node ID, IP address> entries on each server]
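The layering might be expressed as the following interface sketch. The names and types are hypothetical (the real implementation is built on asynchronous RPC, not Python classes); the point is the narrowness of the boundary: DHash only ever asks Chord for a sorted list of candidate nodes.

    from dataclasses import dataclass
    from typing import Protocol

    @dataclass
    class NodeAddr:
        node_id: int   # position in the Chord ID space
        ip: str        # where DHash sends its block RPCs

    class ChordLookup(Protocol):
        def lookup(self, block_id: int) -> list[NodeAddr]:
            """Nodes whose IDs are closest to block_id, sorted closest first."""
            ...

    class DHash:
        """Block layer: stores, balances, replicates, and caches blocks,
        relying only on Chord's lookup() to learn which servers to contact."""
        def __init__(self, chord: ChordLookup):
            self.chord = chord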

DHash Uses Other Nodes to Locate Blocks

[Figure: Lookup(BlockID=45) is forwarded in numbered hops (1, 2, 3) across a ring of nodes (N5, N10, N20, N40, N50, N60, N68, N80, N99, N110) until it reaches the block's successor, N50]

Storing Blocks
  • Long-term blocks are stored for a fixed time
    • Publishers need to refresh periodically
  • Cache uses LRU

[Figure: a server's disk is divided between long-term block storage and an LRU cache]
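A sketch of the two kinds of per-server storage described above: long-term blocks held for a fixed period that the publisher must refresh, and a bounded LRU cache for copies made on behalf of lookups. The lease length and cache size here are made-up parameters, and a real server would keep the long-term blocks on disk:

    import time
    from collections import OrderedDict

    LEASE_SECONDS = 7 * 24 * 3600            # hypothetical fixed storage period

    class BlockStore:
        def __init__(self, cache_capacity: int = 1024):
            self.long_term = {}              # block_id -> (data, expiry time)
            self.cache = OrderedDict()       # block_id -> data, in LRU order
            self.cache_capacity = cache_capacity

        def insert(self, block_id: int, data: bytes) -> None:
            """Publisher insert or refresh: resets the block's expiry time."""
            self.long_term[block_id] = (data, time.time() + LEASE_SECONDS)

        def cache_put(self, block_id: int, data: bytes) -> None:
            """Cache a copy made during a lookup; evict the least recently used."""
            self.cache[block_id] = data
            self.cache.move_to_end(block_id)
            if len(self.cache) > self.cache_capacity:
                self.cache.popitem(last=False)

        def get(self, block_id: int) -> bytes | None:
            entry = self.long_term.get(block_id)
            if entry and entry[1] > time.time():     # unexpired long-term block
                return entry[0]
            if block_id in self.cache:
                self.cache.move_to_end(block_id)     # mark as recently used
                return self.cache[block_id]
            return None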

Replicate blocks at r successors

[Figure: block 17 stored at its successor on a ring of nodes (N5, N10, N20, N40, N50, N60, N68, N80, N99, N110), with replicas on the nodes that follow]

  • Node IDs are SHA-1 of IP Address
  • Ensures independent replica failure
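A sketch of the placement rule, using the node IDs from the figure: the block's first r successors on the ring each hold a replica, and because ring positions come from SHA-1 of each node's IP address, adjacent positions are unlikely to be machines that fail together.

    import hashlib

    def node_id_from_ip(ip: str, bits: int = 16) -> int:
        """CFS derives a node's ring position from SHA-1 of its IP address
        (truncated here to a small ID space for readability)."""
        return int.from_bytes(hashlib.sha1(ip.encode()).digest(), "big") % (2 ** bits)

    def replica_nodes(block_id: int, ring: list[int], r: int) -> list[int]:
        """The block's first r successors each hold a replica."""
        ring = sorted(ring)
        start = next((i for i, n in enumerate(ring) if n >= block_id), 0)
        return [ring[(start + k) % len(ring)] for k in range(r)]

    # From the figure: block 17's successor is N20, and copies follow it on the ring.
    ring = [5, 10, 20, 40, 50, 60, 68, 80, 99, 110]
    assert replica_nodes(17, ring, r=3) == [20, 40, 50]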
Lookups find replicas

[Figure: Lookup(BlockID=17) on the same ring; the numbered RPCs below mark the steps, including a fetch attempt that fails at a dead replica]

RPCs:
  1. Lookup step
  2. Get successor list
  3. Failed block fetch
  4. Block fetch
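A sketch of the client-side fetch sequence the RPC labels describe, with hypothetical helper names standing in for the real RPCs:

    def fetch_block(block_id: int, chord, rpc) -> bytes:
        """Fetch one block while tolerating failed replicas.

        chord.lookup(block_id)        -> nodes near the block, closest first (RPC 1)
        rpc.successor_list(node)      -> that node's r successors            (RPC 2)
        rpc.get_block(node, block_id) -> block data, or None if the node is
                                         dead or missing the block           (RPCs 3, 4)
        """
        for candidate in chord.lookup(block_id):            # 1. lookup step
            for server in rpc.successor_list(candidate):    # 2. get successor list
                data = rpc.get_block(server, block_id)      # 3./4. block fetch
                if data is not None:
                    return data                             # first live replica wins
        raise IOError(f"block {block_id:x} unreachable")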

First Live Successor Manages Replicas

[Figure: block 17 and a copy of 17 held on consecutive successors of ID 17 on the ring]

  • Node can locally determine that it is the first live successor
DHash Copies to Caches Along Lookup Path

[Figure: Lookup(BlockID=45) hops across the ring; after the block is fetched, a copy is sent back to a node on the lookup path]

RPCs:
  1. Chord lookup
  2. Chord lookup
  3. Block fetch
  4. Send to cache
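A sketch of the caching step, again with hypothetical helper names: once the block is fetched, a copy is pushed back to nodes on the lookup path, so later lookups for a popular block are answered before they reach the block's successor.

    def fetch_with_path_caching(block_id: int, chord, rpc) -> bytes:
        """Fetch a block and leave cached copies along the lookup path."""
        path = chord.lookup_path(block_id)        # nodes visited en route, in hop order
        data = rpc.get_block(path[-1], block_id)  # last hop holds (a replica of) the block
        for node in path[:-1]:                    # 4. send to caches along the path
            rpc.cache_block(node, block_id, data)
        return data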

Caching at Fingers Limits Load

[Figure: N32 receiving lookups only from the nodes whose fingers point at it]

  • Only O(log N) nodes have fingers pointing to N32
  • This limits the single-block load on N32
Virtual Nodes Allow Heterogeneity
  • Hosts may differ in disk/net capacity
  • Hosts may advertise multiple IDs
    • Chosen as SHA-1(IP Address, index)
    • Each ID represents a “virtual node”
  • Host load is proportional to its number of virtual nodes
  • The number of virtual nodes is set manually

[Figure: virtual nodes N5, N10, N60, N101 belonging to two physical hosts, Node A and Node B]
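A sketch of the ID derivation for virtual nodes (illustrative ID width and example address; the operator chooses the count to match the host's capacity):

    import hashlib

    def virtual_node_ids(ip: str, count: int, bits: int = 16) -> list[int]:
        """One ring position per virtual node: SHA-1(IP address, index)."""
        return [
            int.from_bytes(hashlib.sha1(f"{ip},{i}".encode()).digest(), "big") % (2 ** bits)
            for i in range(count)
        ]

    # A host with roughly 4x the disk and bandwidth of its peers might be
    # configured to run 4 virtual nodes, and so own about 4x as much of the ring.
    print(virtual_node_ids("192.0.2.1", count=4))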

Fingers Allow Choice of Paths

[Figure: Lookup(47) issued from N80 on a ring of nodes (N18, N25, N37, N48, N55, N70, N80, N90, N96, N115); different fingers offer alternative paths with measured RTTs of 10ms, 12ms, 50ms, and 100ms]

  • Each node monitors RTTs to its own fingers
  • Tradeoff: ID-space progress vs delay
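One plausible way to code the tradeoff named above is to score each candidate finger by ID-space progress per unit of measured delay; the scoring rule below is an illustrative choice, not the exact heuristic used by CFS:

    def choose_next_hop(node_id: int, target_id: int,
                        finger_rtt_ms: dict[int, float], ring_size: int) -> int:
        """Pick a finger balancing ID-space progress against measured latency."""
        def progress(f: int) -> int:
            return (f - node_id) % ring_size

        remaining = (target_id - node_id) % ring_size
        # only consider fingers that do not overshoot the target
        candidates = [f for f in finger_rtt_ms if 0 < progress(f) <= remaining]
        # best ID-space progress per millisecond of delay (illustrative rule)
        return max(candidates, key=lambda f: progress(f) / finger_rtt_ms[f])

    # Routing from N80 toward block 47, with illustrative per-finger RTTs:
    rtts = {96: 10.0, 115: 100.0, 18: 50.0, 37: 12.0}
    print(choose_next_hop(80, 47, rtts, ring_size=128))   # prefers the nearby, fast N37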
Why Blocks Instead of Files?
  • Cost: one lookup per block
    • Can tailor cost by choosing good block size
  • Benefit: load balance is simple for large files
    • Storage cost of large files is spread out
    • Popular files are served in parallel
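A sketch of the block-oriented layout that makes this load balance fall out naturally: the publisher splits a file into fixed-size blocks and names each by the SHA-1 hash of its content, so the blocks of a large or popular file scatter across many servers (8 KByte matches the experiments later in the talk; the root-block/inode structure used for real file systems is omitted):

    import hashlib

    BLOCK_SIZE = 8 * 1024           # 8 KByte blocks, as in the CFS experiments

    def split_into_blocks(data: bytes) -> list[tuple[int, bytes]]:
        """Return (block_id, block_data) pairs; block IDs are content hashes,
        so each block lands on an essentially random server on the ring."""
        blocks = []
        for off in range(0, len(data), BLOCK_SIZE):
            chunk = data[off:off + BLOCK_SIZE]
            blocks.append((int.from_bytes(hashlib.sha1(chunk).digest(), "big"), chunk))
        return blocks

    # A 1 MB file becomes 128 blocks, spread over up to 128 different servers
    # that can then serve a popular file in parallel.
    print(len(split_into_blocks(bytes(1024 * 1024))))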
CFS Project Status
  • Working prototype software
  • Some abuse prevention mechanisms
  • SFSRO file system client
    • Guarantees authenticity of files, updates, etc.
  • Napster-like interface in the works
    • Decentralized indexing system
  • Some measurements on RON testbed
  • Simulation results to test scalability
Experimental Setup (12 nodes)

[Figure: testbed hosts spread across the Internet, including links to vu.nl, lulea.se, ucl.uk, kaist.kr, and a site in .ve]

  • One virtual node per host
  • 8 KByte blocks
  • RPCs use UDP
  • Caching turned off
  • Proximity routing turned off
CFS Fetch Time for 1MB File

[Plot: fetch time (seconds) vs. prefetch window (KBytes)]

  • Average over the 12 hosts
  • No replication, no caching; 8 KByte blocks
Distribution of Fetch Times for 1MB

[Plot: fraction of fetches vs. time (seconds) for 8, 24, and 40 KByte prefetch windows]

CFS Fetch Time vs. Whole File TCP

[Plot: fraction of fetches vs. time (seconds), comparing CFS with a 40 KByte prefetch window against whole-file TCP]

Robustness vs. Failures

[Plot: fraction of failed lookups vs. fraction of failed nodes. Annotation: six replicas per block; (1/2)^6 is 0.016, so even with half the nodes gone only about 1.6% of lookups should fail]

Future work
  • Test load balancing with real workloads
  • Deal better with malicious nodes
  • Proximity heuristics
  • Indexing
  • Other applications
Related Work
  • SFSRO
  • Freenet
  • Napster
  • Gnutella
  • PAST
  • CAN
  • OceanStore
CFS Summary
  • CFS provides peer-to-peer read-only storage
  • Structure: DHash and Chord
  • It is efficient, robust, and load-balanced
  • It uses block-level distribution
  • The prototype is as fast as whole-file TCP

http://www.pdos.lcs.mit.edu/chord