
Storage management and caching in PAST

Antony Rowstron and Peter Druschel. Presented to cs294-4 by Owen Cooper.


Presentation Transcript


  1. Storage management and caching in PAST Antony Rowstron and Peter Druschel Presented to cs294-4 by Owen Cooper

  2. Outline • PAST goals • PAST API • File storage overview • File and replica diversion • Replica management • Caching • Performance • Discussion

  3. PAST (non)goals • P2P global storage network • Use properties of existing P2P systems (Pastry) • Support for strong persistence • Via a core set of replicas • High availability • Via local caching • Scalable • Obtain high storage utilization via local cooperation • Secure • Design goals do not include • Replacing the file system • Updatable files • Directory or lookup service

  4. Security Model • Pastry nodeids are a hash of a public key • Smartcard-based security • Provides keys • Quota management • Nodeid and fileid generation is controlled • To stop nodes from obtaining consecutive ids • Or clients from overloading parts of the network • But nodeids and real-world identities need not be linked • Data is not encrypted

  5. PAST APIs • In PAST, files are immutable • Fileid = Insert(filename, credentials, k, file) • Inserts k copies of the file into the network, or fails • Fileid is a secure hash of (filename, credentials, salt) • Succeeds if acked with receipts from k nodes • File = Lookup(fileid) • Returns a copy of the file if it exists • Reclaim(fileid, credentials) • Reclaim is accepted only if requested by the owner • Allows, but does not require, storage reclamation
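The fileid derivation can be sketched as follows. This is a minimal illustration that assumes SHA-1 as the secure hash and the owner's public key as the "credentials" part of the input. `make_fileid` and the placeholder key are hypothetical names, not part of any PAST API.

```python
import hashlib
import os

def make_fileid(filename: str, owner_pubkey: bytes, salt: bytes) -> int:
    """Hypothetical helper: a 160-bit fileid from SHA-1 over (filename, owner key, salt).

    Plain concatenation keeps the sketch short; a real encoding should delimit
    the fields so that different inputs cannot yield the same byte string.
    """
    digest = hashlib.sha1(filename.encode("utf-8") + owner_pubkey + salt).digest()
    return int.from_bytes(digest, "big")

owner_key = b"placeholder-public-key"  # assumption: stands in for the key in the credentials
salt = os.urandom(20)                   # fresh random salt for this insert
fileid = make_fileid("paper.pdf", owner_key, salt)
```

Because the salt is part of the input, the same name and owner can map to many different fileids. File diversion relies on this.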

  6. File insertion • Insert(name, c, k, file) • Computes a storage certificate • Contains the fileid, a hash of the content, k, and the salt • Deducts k*filesize from the client's quota • Routes the file and storage certificate via Pastry, using the fileid as the key • The node with the nodeid closest to the fileid verifies the file's integrity, stores it, and asks the k-1 nodes closest to the fileid to store it • These k-1 nodes are in its leaf set (k-1 <= l) • The node returns an ack with k signed storage receipts, or a nak
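The insert steps can be sketched with global knowledge of all nodeids, in place of Pastry routing through leaf sets. This sketch makes several assumptions:

- Ids live on a circular 160-bit space.
- The k replica holders are the k nodeids numerically closest to the fileid.
- `insert`, `Client`, and `StorageCertificate` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import List

ID_SPACE = 1 << 160  # assumption: ids live on a circular 160-bit space

def ring_distance(a: int, b: int) -> int:
    """Numeric distance between two ids, wrapping around the circle."""
    d = (a - b) % ID_SPACE
    return min(d, ID_SPACE - d)

def k_closest(fileid: int, nodeids: List[int], k: int) -> List[int]:
    """The k nodeids numerically closest to fileid (ties broken by nodeid)."""
    return sorted(nodeids, key=lambda n: (ring_distance(n, fileid), n))[:k]

@dataclass
class StorageCertificate:
    fileid: int
    content_hash: bytes
    k: int
    salt: bytes

@dataclass
class Client:
    quota: int  # remaining storage quota in bytes

def insert(client: Client, cert: StorageCertificate, size: int,
           nodeids: List[int]) -> List[int]:
    """Debit k*size from the quota and return the k nodes that should hold replicas."""
    cost = cert.k * size
    if cost > client.quota:
        raise ValueError("insufficient quota")
    client.quota -= cost
    return k_closest(cert.fileid, nodeids, cert.k)
```

For example, with nodeids `[10, 200, 300, 1000, ID_SPACE - 5]`, fileid 0 and k = 3, the replica holders are `ID_SPACE - 5`, `10` and `200`. The first of these counts as close because the id space wraps around.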

  7. Lookup and Reclamation • Pastry ensures a replica is found • Since a lookup is routed to the node with the closest nodeid • Reclamation • Client generates a reclaim certificate • Sends it toward the fileid via Pastry • Recipients verify the certificate and issue receipts • Client reclaims its quota

  8. Diversion • A file or replica can be relocated • A replica, to another nearby node • If one of the k closest nodes is overloaded • A file, to a different set of nodes in the id space • If the nodes around a fileid are (possibly locally) congested • Why is this necessary? • Nodes differ in storage capacity • Inserted files differ in size

  9. Replica Diversion • The node responsible for a fileid asks its k-1 neighbors to store the file • A neighbor N may divert its copy to another node in its leaf set • A pointer to the copy is stored at N • N still issues the storage receipt • N also inserts a pointer on the (k+1)th closest node • So the diverted copy is not orphaned if N fails • N remains responsible for maintaining the pointer
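The bookkeeping in replica diversion can be sketched as below. This is a minimal model that assumes each node keeps two tables: one for locally stored replicas and one for pointers to diverted replicas. `Node` and `divert_replica` are hypothetical names, and the returned tuple only stands in for a signed storage receipt.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

@dataclass
class Node:
    nodeid: int
    free: int                                                # free storage in bytes
    files: Dict[int, int] = field(default_factory=dict)     # fileid -> size of local replica
    pointers: Dict[int, int] = field(default_factory=dict)  # fileid -> nodeid holding diverted replica

def divert_replica(n: Node, target: Node, backup: Node,
                   fileid: int, size: int) -> Tuple[str, int, int]:
    """N cannot store the replica itself, so it places it on `target` (a leaf-set node).

    Pointers are recorded at N and at `backup`, the (k+1)th closest node, so the
    diverted replica stays reachable even if N fails.
    """
    if target.free < size:
        raise ValueError("diversion target lacks space")
    target.files[fileid] = size
    target.free -= size
    n.pointers[fileid] = target.nodeid
    backup.pointers[fileid] = target.nodeid
    return ("receipt", n.nodeid, fileid)  # placeholder for N's signed storage receipt
```

The design point is that the second pointer lives on a node that would inherit N's responsibility. The diverted copy therefore has two routes back to it.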

  10. File Diversion • Replica diversion is local • Lets the nodes around a fileid choose among themselves where to store replicas • File Diversion • Triggered when an insert with a given fileid fails • The insert is tried a total of three times • Each retry uses a new fileid, generated by changing the salt
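The file diversion retry loop can be sketched as follows. This sketch makes two assumptions:

- The fileid is a SHA-1 hash over the filename, the owner key and the salt.
- The callback `try_insert` reports whether k storage receipts came back.

Both `try_insert` and `insert_with_file_diversion` are hypothetical names.

```python
import hashlib
import os
from typing import Callable, Optional

MAX_ATTEMPTS = 3  # from the slide: the insert is tried a total of three times

def insert_with_file_diversion(filename: str, owner_key: bytes,
                               try_insert: Callable[[int], bool]) -> Optional[int]:
    """Retry the insert with a fresh salt (hence a fresh fileid) after each failure."""
    for _ in range(MAX_ATTEMPTS):
        salt = os.urandom(20)  # a new salt moves the file to a new region of the id space
        digest = hashlib.sha1(filename.encode("utf-8") + owner_key + salt).digest()
        fileid = int.from_bytes(digest, "big")
        if try_insert(fileid):  # hypothetical: True if k receipts were returned
            return fileid
    return None  # all attempts failed; the insert is reported as failed
```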

  11. Storage Policy • How does a node decide to accept or reject a replica? • Computes size(file)/free_space • Compares it to Tpri or Tdiv, depending on the node's role (primary replica holder or diversion target) • Rejects the replica if the ratio exceeds the threshold • Tpri > Tdiv • How is a node chosen for replica diversion? • Search the leaf set for the node that • Has maximal free space • Doesn't already hold a diverted or primary replica • File diversion • Triggered when k copies cannot be placed (as primary or diverted replicas)
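The acceptance test and the choice of diversion target can be sketched as below. The threshold values are hypothetical placeholders; the slide only states that Tpri > Tdiv. The sketch assumes that a diversion target applies the stricter Tdiv test. It also assumes the leaf set is given as tuples of (nodeid, free space, set of fileids already held).

```python
from typing import List, Optional, Set, Tuple

T_PRI = 0.10  # hypothetical threshold for a primary replica holder
T_DIV = 0.05  # hypothetical threshold for a diversion target (T_PRI > T_DIV)

def accepts(file_size: int, free_space: int, threshold: float) -> bool:
    """Accept the replica unless size(file)/free_space exceeds the threshold."""
    if free_space <= 0:
        return False
    return file_size / free_space <= threshold

def choose_diversion_target(leaf_set: List[Tuple[int, int, Set[int]]],
                            fileid: int, size: int) -> Optional[int]:
    """Pick the leaf-set node with maximal free space that holds no replica of fileid
    and would itself accept the file under T_DIV. None means file diversion is needed."""
    candidates = [(free, nid) for nid, free, held in leaf_set
                  if fileid not in held and accepts(size, free, T_DIV)]
    if not candidates:
        return None
    return max(candidates)[1]
```

Using a smaller threshold for diverted replicas makes nodes more reluctant to accept them. This reflects the extra pointer overhead that diverted replicas carry.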

  12. Replica maintenance • Node joins and departures shift responsibility for fileids • Pastry's node failure detection triggers leaf set updates • PAST detects responsibility shifts through these updates • A newly responsible node must copy the affected files • Either make a copy immediately, OR • Keep a pointer to the old owner and copy lazily • Diverted replicas • The diversion target can be any node in the leaf set, so it may later move out of the leaf set • The two nodes must then exchange keepalive messages themselves • The replica should be relocated
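The "pointer to old owner and copy lazily" option can be sketched as below. The sketch assumes one particular lazy policy, copy-on-first-lookup; a background copy would be another valid choice. The old owner's store is modeled as a plain dict. `ResponsibleNode`, `take_over` and `lookup` are hypothetical names.

```python
from typing import Dict, Optional

class ResponsibleNode:
    """Hypothetical newly responsible node that installs pointers instead of copying."""

    def __init__(self) -> None:
        self.files: Dict[int, bytes] = {}                # fileid -> locally held content
        self.pointers: Dict[int, Dict[int, bytes]] = {}  # fileid -> old owner's store

    def take_over(self, fileid: int, old_owner_store: Dict[int, bytes]) -> None:
        """Record where the file currently lives; defer the copy."""
        self.pointers[fileid] = old_owner_store

    def lookup(self, fileid: int) -> Optional[bytes]:
        """Serve locally if possible, else fetch once from the old owner and keep the copy."""
        if fileid in self.files:
            return self.files[fileid]
        old = self.pointers.pop(fileid, None)
        if old is None or fileid not in old:
            return None
        data = old[fileid]
        self.files[fileid] = data  # later lookups are served locally
        return data
```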

  13. Replica maintenance (2) • Node failure may cause a storage shortage • No node in the leaf set can take over ownership • The search space is widened • Ask the most extreme nodes in the leaf set to locate storage • Increases the search space to 2l nodes • If no storage space is found, fail
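The widened search can be sketched on a sorted list of nodeids. The sketch assumes a leaf set of l nodes, l/2 on each side. Adding the leaf sets of the two most extreme members yields about 2l candidates. `leaf_set` and `widened_search` are hypothetical helpers.

```python
from typing import List, Set

def leaf_set(ring: List[int], i: int, l: int) -> Set[int]:
    """Leaf set of ring[i]: the l/2 nodeids on each side of it (ring is sorted, wraps around)."""
    half = l // 2
    n = len(ring)
    return {ring[(i + d) % n] for d in range(-half, half + 1) if d != 0}

def widened_search(ring: List[int], i: int, l: int) -> Set[int]:
    """Candidates when ring[i]'s own leaf set has no space: also include the
    leaf sets of its two most extreme leaf-set members."""
    half = l // 2
    n = len(ring)
    leftmost, rightmost = (i - half) % n, (i + half) % n
    candidates = (leaf_set(ring, i, l)
                  | leaf_set(ring, leftmost, l)
                  | leaf_set(ring, rightmost, l))
    candidates.discard(ring[i])  # the searching node itself is not a candidate
    return candidates
```

With 100 nodes and l = 8, the node's own leaf set has 8 members and the widened search covers 16, matching the 2l figure on the slide.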

  14. Caching • Pastry's locality-based routing tends to direct requests to nearby copies • PAST also stores cached copies • Along the routing path between client and fileid • For insert and lookup operations • Cache maintained using the GreedyDual-Size (GD-S) algorithm • Weight per file: 1/size(file) • Eviction: • Pick the file with minimum weight • Subtract the evicted file's weight from all others
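The eviction rule on the slide can be sketched as a small cache class. The sketch covers insertion and eviction only. It does not model how a cache hit updates a file's weight, since the slide does not describe that. `GDSizeCache` is a hypothetical name.

```python
from typing import Dict, List

class GDSizeCache:
    """GreedyDual-Size with unit cost: weight = 1/size; evict the minimum-weight file
    and subtract its weight from every file that remains."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.used = 0
        self.weight: Dict[str, float] = {}
        self.size: Dict[str, int] = {}

    def insert(self, name: str, size: int) -> List[str]:
        """Cache a file, returning the names evicted to make room."""
        if size <= 0:
            raise ValueError("size must be positive")
        if size > self.capacity or name in self.size:
            return []  # too large to cache, or already cached
        evicted = []
        while self.used + size > self.capacity:
            victim = min(self.weight, key=self.weight.get)
            w = self.weight.pop(victim)
            self.used -= self.size.pop(victim)
            for other in self.weight:
                self.weight[other] -= w  # age the survivors by the evicted weight
            evicted.append(victim)
        self.weight[name] = 1.0 / size
        self.size[name] = size
        self.used += size
        return evicted
```

Subtracting from every entry costs O(n) per eviction. A common alternative keeps a single running offset that is added to new weights instead. Large files start with small weights, so they tend to be evicted first.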

  15. Experiments: without diversion • Experiments use • A large trace from a web server • Files from a local web server • The case for diversion, using the web trace • Without diversion: • 51.1% of insertions failed • 60.8% storage utilization

  16. Experiments (2): with diversion • With diversion • A larger leaf set helps

  17. Experiments (3): varying Tpri • Effects of varying Tpri • Number of files stored vs. file size

  18. Experiments (4): Varying Tdiv • Varying Tdiv • Tpri is constant

  19. File and Replica Diversion

  20. Caching • 8 traces combined • Requests from clients in each trace are mapped to nearby PAST nodes
