1 / 15

Peer-to-Peer (P2P) Distributed Storage

Peer-to-Peer (P2P) Distributed Storage. Peer-to-Peer Systems. Definition : “Peer-to-peer systems can be characterized as distributed systems in which all nodes have identical capabilities and responsibilities, and all communication is symmetric.” – Rowstron- Popular Examples: Napster

hamish
Download Presentation

Peer-to-Peer (P2P) Distributed Storage

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Peer-to-Peer (P2P) Distributed Storage Dennis Kafura – CS5204 – Operating Systems

  2. Peer-to-Peer Systems • Definition: “Peer-to-peer systems can be characterized as distributed systems in which all nodes have identical capabilities and responsibilities, and all communication is symmetric.” –Rowstron- • Popular Examples: • Napster • Gnutella • Goals (from Dabek, et. al.) • Symmetric and decentralized • Operate with unmanaged voluntary participants • Fast location of data • Tolerate frequent joining/leaving by servers • Balanced load CS 5204 – Operating Systems

  3. CFS: Properties • Decentralized control (use ordinary Internet hosts) • Scalability (overhead at most logarithmic in the number of servers) • Availability (placement of replicas on unrelated servers) • Load balance (block distribution and caching) • Persistence (renewable lifetimes) • Quotas (source-limited insertions) • Efficiency (comparable to FTP access) CS 5204 – Operating Systems

  4. CFS • Read-only file system interface • Block distribution/fetching • Caching/replication • Quota enforcement Chord client server • Block lookup DHash A generic, distributed block store CFS: Architecture CS 5204 – Operating Systems

  5. data block directory block H(B1) B1 root block Inode block H(D) H(F) D F H( public key ) B2 H(B2) timestamp data block signature CFS: Content-hash indexing • Each block (except for the root block of a file system) is identified by an index obtained from a hash (e.g., SHA-1) of its contents • A root block is signed by the author; the index of the root block is a hash of the user’s public key CS 5204 – Operating Systems

  6. H(block) successor H(IP address + virtual index) block server Chord: Mapping • server s stores all values indexed by key k for which s is the successor of k (successor(k) is the node whose identifier is the smallest one greater than k ) • each Chord server maintains two lists: • a finger table for searching • r immediate successors and their latency information CS 5204 – Operating Systems

  7. Chord: Searching (1) D n3 M finger table for n1 C B n1 n2 A CS 5204 – Operating Systems

  8. Chord: Searching (2) CS 5204 – Operating Systems

  9. Chord: performance CS 5204 – Operating Systems

  10. Chord: Adding Servers (1) Two Invariants maintained: • Successor information is correct • Successor(k) is responsible for key k Steps: • By out-of-band means, locate an existing server, n • Update tables • Update successor/predecessor links • Creates finger tables for new server • Update other server’s finger tables • Redistribute responsibility for keys to n from its successor • Call higher (DHash) layer CS 5204 – Operating Systems

  11. Chord: Adding Servers (2) Adding a new node at 6 assuming that node 6 knows, by out-of-band means, of node 2 CS 5204 – Operating Systems

  12. DHash: Interface • put_h(block) – stores block using content-hashing • put_s(block, pubkey) – stores block as a root block; key is hash of pubkey • get(key) – finds/returns block associated with key CS 5204 – Operating Systems

  13. DHash: replication • Places replicas on k servers following successor • Note: each Chord server maintains a list of r immediate successors. By keeping r >= k, it is easy for DHash to determine replica locations • Existence of replicas eases reallocation when node leaves the system • By fetching the successor list from Chord, the DHash layer can select the most efficient node from which to access a replica of a desired block CS 5204 – Operating Systems

  14. DHash: caching, load balancing, quotas • Caching is effective because searches from different clients converge toward the end of the search • Virtual servers hosted on one machine allow for more capable machines to store a larger portion of the identifier space • Each server enforces a fixed, per-IP address quota on publishing nodes CS 5204 – Operating Systems

  15. DHash: replication and caching server target server identifier replicas CS 5204 – Operating Systems

More Related