Storage management and caching in PAST

Storage management and caching in PAST Antony Rowstron and Peter Druschel Presented to cs294-4 by Owen Cooper

Outline • PAST goals • PAST API • File storage overview • File and replica diversion • Replica management • Caching • Performance • Discussion

PAST (non)goals • P2P global storage network • Use properties of existing p2p systems (Pastry) • Support for strong persistence • Via a core set of replicas • High availability • Via local caching • Scalable • Obtain high storage utilization via local cooperation • Secure • Design goals do not include • Replacing the file system • Updatable files • Directory or lookup service

Security Model • Pastry node ids are a hash of a public key • Smartcard based security • Provides keys • Quota management • Nodeid and fileid generation controlled • Try to stop nodes from getting consecutive ids • Or clients from overloading parts of the network • But node id and real world identity may not be linked • Data not encrypted

PAST API • In PAST, files are immutable • fileid = Insert(filename, credentials, k, file) • Inserts k copies of the file into the network, or fails • Fileid is a hash of (filename, credentials, salt) • Successful if acked with receipts from k nodes • file = Lookup(fileid) • Returns a copy of the file if it exists • Reclaim(fileid, credentials) • Reclaim is accepted if requested by the owner • Allows, but does not require, storage reclamation
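A minimal sketch of how a fileid could be derived from the three inputs named on the slide. The choice of SHA-1, the length-prefixed encoding, and the function name `make_fileid` are assumptions made for illustration; the slide only says which fields go into the fileid.

```python
import hashlib
import os


def make_fileid(filename: str, owner_public_key: bytes, salt: bytes) -> bytes:
    """Hypothetical helper: hash (filename, owner key, salt) into a 160-bit id."""
    h = hashlib.sha1()
    for part in (filename.encode("utf-8"), owner_public_key, salt):
        # Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
        h.update(len(part).to_bytes(4, "big"))
        h.update(part)
    return h.digest()


# Example: the same file name with a fresh salt maps to a different fileid,
# and therefore to a different region of the id space.
key = b"example-public-key"  # placeholder bytes, not a real key
id_a = make_fileid("report.txt", key, os.urandom(8))
id_b = make_fileid("report.txt", key, os.urandom(8))
```

Because the salt is an input, a client can obtain a new fileid for the same file without renaming it, which is what file diversion relies on.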

File insertion • Insert(name, c, k, file) • Computes a storage certificate • Contains fileid, hash of content, k, salt • Deducts k*filesize from the client's quota • Routes the file and storage certificate via Pastry, using the fileid as the key • The receiving node verifies the integrity of the file, stores it, and asks the k-1 closest nodes to store the file • The k-1 nodes are in its leaf set (k-1 <= l) • Node returns an ack with k signed storage receipts, or a nak
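The client-side part of an insert (build the certificate, charge the quota) and the integrity check done by the receiving node can be sketched as follows. The class and function names are hypothetical, SHA-1 is an assumed hash, and signatures are left out entirely to keep the sketch short.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageCertificate:
    """Fields listed on the slide; the signature is omitted in this sketch."""
    fileid: bytes
    content_hash: bytes
    k: int
    salt: bytes


class QuotaExceeded(Exception):
    pass


class Client:
    def __init__(self, quota_bytes: int) -> None:
        self.quota = quota_bytes

    def prepare_insert(self, fileid: bytes, content: bytes, k: int, salt: bytes) -> StorageCertificate:
        cost = k * len(content)  # k replicas are charged against the quota
        if cost > self.quota:
            raise QuotaExceeded(f"need {cost} bytes, have {self.quota}")
        self.quota -= cost
        return StorageCertificate(fileid, hashlib.sha1(content).digest(), k, salt)


def verify_content(cert: StorageCertificate, content: bytes) -> bool:
    """Check a storing node might run before accepting the file."""
    return hashlib.sha1(content).digest() == cert.content_hash
```

Charging the quota before routing means a failed insert has to refund the client; that refund path is not shown here.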

Lookup and Reclamation • Pastry ensures replica is found • Since a lookup is routed to the closest nodeid • Reclamation • Client generates a reclaim certificate • Sends it to the fileid via pastry • Recipients verify the certificate & issue receipt • Client reclaims quota

Diversion • A file or replica can be relocated • For a replica, to another close node • If one of the K closest is overloaded • For a file, to another set of nodes in the idspace • If the nodes around a fileid are (possibly locally) congested • Why is this necessary? • Differing storage capacity at nodes • Differing file size for inserted files

Replica Diversion • Node responsible for fileid asks k-1 neighbors to store the file • Neighbor (N) may divert a copy to a node in its leaf set • Pointer to copy inserted at N • N issues the storage receipt • N also inserts a pointer on the (k+1)th closest node • No orphan if N fails • N remains responsible for pointer maintenance

File Diversion • Replica diversion is local • Allows storage choice between nodes around fileid • File Diversion • Triggered when an insert with a fileid fails • Insert is tried a total of three times • New fileid generated by changing the salt
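The retry loop behind file diversion can be sketched as below. `try_insert` is a hypothetical callback standing in for one complete insert attempt under the fileid that a given salt produces; the default of three attempts follows the slide.

```python
import os
from typing import Callable, Optional


def insert_with_file_diversion(
    try_insert: Callable[[bytes], bool],
    max_attempts: int = 3,
    new_salt: Callable[[], bytes] = lambda: os.urandom(8),
) -> Optional[bytes]:
    """Return the salt of the first successful attempt, or None if all fail."""
    for _ in range(max_attempts):
        salt = new_salt()  # a new salt yields a new fileid, hence new target nodes
        if try_insert(salt):
            return salt
    return None
```

A caller would keep the returned salt, since it is needed to recompute the fileid under which the file was finally stored.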

Storage Policy • How does a node choose to accept or reject a replica? • Computes sizeof(file)/sizeof(free_space) • Compares it to Tpri or Tdiv, depending on the node's role (primary or diverted replica store) • Rejects if the ratio exceeds the threshold • Tpri > Tdiv • How is a node chosen for replica diversion? • Search the leaf set for the node that • Has maximal free space • Doesn't already hold a diverted or primary replica • File diversion • Triggered when k copies cannot be placed (via primary or diverted replicas)
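A small sketch of the acceptance test and the diversion-target choice described on this slide. The threshold values, the `Node` class, and the function names are illustrative assumptions; only the ratio test, Tpri > Tdiv, and the two selection criteria come from the slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

T_PRI = 0.1   # illustrative threshold for primary replicas
T_DIV = 0.05  # illustrative threshold for diverted replicas (T_PRI > T_DIV)


@dataclass
class Node:
    name: str
    free_space: int
    holds: Set[bytes] = field(default_factory=set)  # fileids stored here


def accepts(node: Node, file_size: int, threshold: float) -> bool:
    """Accept only if the file would use at most `threshold` of the free space."""
    if node.free_space <= 0:
        return False
    return file_size / node.free_space <= threshold


def choose_diversion_target(leaf_set: List[Node], fileid: bytes, file_size: int) -> Optional[Node]:
    """Pick the leaf-set node with the most free space that holds no replica yet."""
    candidates = [n for n in leaf_set if fileid not in n.holds]
    if not candidates:
        return None
    best = max(candidates, key=lambda n: n.free_space)
    # The chosen node still applies the stricter diverted-replica threshold.
    return best if accepts(best, file_size, T_DIV) else None
```

Because the ratio grows as free space shrinks, a nearly full node keeps accepting small files while turning away large ones, and the lower T_DIV leaves room at each node for its own primary replicas.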

Replica maintenance • Node join/leave causes a shift in responsibility • Pastry node failure detection will cause leaf set updates • PAST detects responsibility shifts this way • Newly responsible node must copy files • Make a copy immediately, OR • Keep a pointer to the old owner & copy lazily • Diverted replicas • Target of diversion may move out of the leaf set • Node storing the replica can be any one in the leaf set • The two nodes must exchange keepalive messages themselves • Replica should be relocated
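To illustrate a responsibility shift, the sketch below computes the k nodes closest to a fileid before and after a join and reports which nodes must now obtain a copy. The tiny 16-bit circular id space, integer ids, and all function names are assumptions chosen to keep the example readable.

```python
from typing import Iterable, List, Set

ID_SPACE = 2 ** 16  # toy id space for illustration only


def ring_distance(a: int, b: int, space: int = ID_SPACE) -> int:
    """Distance between two ids on a circular id space."""
    d = abs(a - b) % space
    return min(d, space - d)


def k_closest(node_ids: Iterable[int], fileid: int, k: int) -> List[int]:
    """The k node ids closest to fileid (ties broken by smaller id)."""
    return sorted(node_ids, key=lambda n: (ring_distance(n, fileid), n))[:k]


def newly_responsible(old_nodes: Iterable[int], new_nodes: Iterable[int], fileid: int, k: int) -> Set[int]:
    """Nodes that are among the k closest now but were not before."""
    return set(k_closest(new_nodes, fileid, k)) - set(k_closest(old_nodes, fileid, k))


# Example: node 3 joins and displaces node 65530 from the replica set of fileid 5.
before = [10, 20, 40, 65530]
after = before + [3]
joined = newly_responsible(before, after, fileid=5, k=2)
```

Each node in the returned set would either copy the file immediately or install a pointer to the displaced holder and copy lazily, as the slide describes.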

Replica maintenance (2) • Node failure may cause storage shortage • No node in leaf set can take over ownership • Search space is widened • Ask most extreme nodes to locate storage • Increases search space to 2l nodes • If no storage space found, fail.

Caching • Pastry's locality-based routing will tend to direct requests to nearby copies • PAST also stores cached copies • Along the routing path between client and fileid • For insert and lookup operations • Cache maintained using the GreedyDual-Size (GD-S) algorithm • Weight per file: 1/size(file) • Eviction: • Pick the file with minimum weight • Subtract the weight of the evicted file from all others
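The eviction rule on this slide can be written out literally as below. This is a sketch under assumptions: the class name and byte-capacity interface are hypothetical, and resetting a file's weight on a hit is standard GreedyDual-Size behavior that the slide does not spell out. The subtraction loop is O(n) per eviction; it is kept for clarity rather than efficiency.

```python
from typing import Dict, Hashable


class GDSCache:
    """Toy GreedyDual-Size cache with unit cost, so weight = 1/size."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        self.size: Dict[Hashable, int] = {}
        self.weight: Dict[Hashable, float] = {}

    def lookup(self, fileid: Hashable) -> bool:
        if fileid not in self.size:
            return False
        # On a hit, restore the file's weight to its initial value.
        self.weight[fileid] = 1.0 / self.size[fileid]
        return True

    def insert(self, fileid: Hashable, size: int) -> bool:
        if size <= 0 or size > self.capacity or fileid in self.size:
            return False
        while self.used + size > self.capacity:
            self._evict()
        self.size[fileid] = size
        self.weight[fileid] = 1.0 / size
        self.used += size
        return True

    def _evict(self) -> None:
        victim = min(self.weight, key=self.weight.get)  # minimum-weight file
        h = self.weight.pop(victim)
        self.used -= self.size.pop(victim)
        for f in self.weight:  # age every remaining file by the victim's weight
            self.weight[f] -= h
```

With weight 1/size, large files are evicted first, and the subtraction step ages files that have not been requested recently so that they eventually lose out to newer entries.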

Experiments: without diversion • Experiments use • Large trace from web server • Files from local web server • The case for diversion with web trace • Without diversion: • 51.1% of insertions failed • 60.8% storage utilization

Experiments (2): with diversion • With diversion • Bigger leaf set size a plus

Experiments (3): Varying Tpri • Effects of varying Tpri • # files stored vs. size of file

Experiments (4): Varying Tdiv • Varying Tdiv • Tpri is constant

File and Replica Diversion

Caching • 8 traces combined • Requests from clients in each trace are mapped to close PAST nodes
