
  1. Lecture XIV: P2P CMPT 401 Summer 2008 Dr. Alexandra Fedorova

  2. Outline • Definition of peer-to-peer systems • Motivation and challenges of peer-to-peer systems • Early P2P systems (Napster, Gnutella) • Structured overlays (Pastry) • P2P applications: Squirrel, OceanStore

  3. Definition of P2P [Diagram: client-server vs. peer-to-peer architecture] P2P systems are motivated by the massive computing resources connected over networks all over the world

  4. Definition of Peer-2-Peer

  5. Why P2P? • Enables the sharing of data and resources • Computer and Internet usage has exploded in recent years • Massive computing resources are available at the edges of the Internet – storage, cycles, content, human presence

  6. Benefits and Challenges • Benefits • Massive resources • Load balancing • Anonymity • Fault tolerance • Locality • Challenges • Security • Failure handling (nodes joining and leaving – churn) • Efficiency – how to search a massive system efficiently • Supporting data mutation

  7. Evolution of P2P Systems • Three generations: • Generation 1: early music exchange services (Napster, Gnutella) • Generation 2: offered greater scalability, anonymity and fault tolerance (Kazaa) • Generation 3: emergence of middleware layers for the application-independent management of distributed resources (Pastry, Tapestry)

  8. Architecture [Taxonomy diagram] Peer-to-peer architectures: • Hybrid (Napster, SETI@Home) • Pure – Super-peer (Kazaa), Unstructured (Gnutella), Structured (Pastry)

  9. Overlay Routing versus IP Routing • Routing overlays: route from one node in the P2P system to another • At each hop, deliver to the next P2P node • Another layer of routing on top of the existing IP routing

  10. Search in Hybrid P2P: Napster • Lookup is centralized • Peers provide meta-information to the lookup server • Data exchange happens directly between peers [Diagram: peers A–D and a lookup server holding the index table; 0. peers upload their song names to the server; 1. a peer queries for song A; 2. the server returns IP(B); 3. the peer downloads song A directly from peer B]
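
A minimal sketch of this hybrid-lookup pattern. The class and method names are illustrative, not Napster's actual protocol:

```python
# Illustrative sketch of Napster-style hybrid lookup. The index is
# centralized; the file transfer itself is peer-to-peer.

class LookupServer:
    def __init__(self):
        self.index = {}                         # song title -> set of peer IPs

    def upload_song_names(self, peer_ip, songs):
        """Step 0: a peer registers the songs it hosts."""
        for song in songs:
            self.index.setdefault(song, set()).add(peer_ip)

    def query(self, song):
        """Steps 1-2: return the IPs of peers hosting the song."""
        return self.index.get(song, set())

server = LookupServer()
server.upload_song_names("10.0.0.2", ["song A"])    # peer B registers
print(server.query("song A"))                       # {'10.0.0.2'}
# Step 3: the querying peer downloads song A directly from 10.0.0.2.
```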

  11. Search in Unstructured P2P [Diagram: peers A–I; a peer issues 1. a query for song A; 2. the query is flooded to neighbours with a decreasing time-to-live (TTL = N, then N-1, …); 3. when the file is found at peer I, it is downloaded directly from that peer]
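
A small sketch of TTL-limited flooding, simplified to a synchronous depth-first search with a visited set; real Gnutella forwards queries asynchronously and suppresses duplicates via message IDs:

```python
# TTL-limited flooding search over an unstructured overlay (simplified).

class Peer:
    def __init__(self, name, songs=()):
        self.name = name
        self.songs = set(songs)
        self.neighbours = []

def flood_search(peer, song, ttl, visited=None):
    """Return a peer hosting `song`, or None if the TTL expires first."""
    visited = visited if visited is not None else set()
    if peer.name in visited:
        return None                      # already queried this peer
    visited.add(peer.name)
    if song in peer.songs:
        return peer                      # 3. file found
    if ttl == 0:
        return None                      # TTL exhausted: stop flooding
    for n in peer.neighbours:            # 2. forward query with TTL - 1
        hit = flood_search(n, song, ttl - 1, visited)
        if hit:
            return hit
    return None

a, b, i = Peer("A"), Peer("B"), Peer("I", ["song A"])
a.neighbours, b.neighbours = [b], [i]
hit = flood_search(a, "song A", ttl=2)
print(hit.name if hit else None)         # 'I' -- found within TTL = 2
```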

  12. Common Issues • Organize and maintain the overlay network • node arrivals • node failures • Resource allocation/load balancing • Resource location • Network proximity routing • Idea: provide a generic P2P substrate (Pastry, Chord, Tapestry)

  13. Architecture [Layer diagram, top to bottom] • P2P application layer: network storage, event notification, ? (other applications) • P2P substrate (self-organizing overlay network): Pastry • TCP/IP: the Internet

  14. Pastry: Object distribution • Globally Unique IDs (GUIDs) • 128-bit circular GUID space (0 to 2^128 - 1) • nodeIds (uniform random) • objIds (uniform random) • Invariant: the node with the numerically closest nodeId maintains the object [Diagram: circular GUID space showing nodeIds and an objId O]
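
A small sketch of this mapping. The lecture only specifies 128-bit, SHA-1-based GUIDs, so the truncation to 16 bytes and the helper names are assumptions:

```python
import hashlib

SPACE = 2 ** 128                     # circular 128-bit GUID space

def guid(data: bytes) -> int:
    """128-bit GUID: SHA-1 digest truncated to 16 bytes (assumption)."""
    return int.from_bytes(hashlib.sha1(data).digest()[:16], "big")

def circular_distance(a: int, b: int) -> int:
    """Distance on the GUID circle, with wraparound at 2^128."""
    d = abs(a - b)
    return min(d, SPACE - d)

def responsible_node(obj_id: int, node_ids):
    """Invariant: the node numerically closest to objId stores the object."""
    return min(node_ids, key=lambda n: circular_distance(n, obj_id))

nodes = [guid(f"node{i}".encode()) for i in range(4)]
print(responsible_node(guid(b"song A"), nodes))   # GUID of the home node
```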

  15. Pastry: Object insertion/lookup • A message with key X is routed to the live node whose nodeId is numerically closest to X • Problem: a complete routing table (an entry for every node) is not feasible [Diagram: Route(X) travelling the circular GUID space 0 … 2^128 - 1 towards X]

  16. Pastry Routing • Leaf set – the numerically closest nodes • Routing table – a subset of nodes that are far away • If you are far from the target node/object, route using the routing table • Once you get closer, use the leaf set • The routing table has to be well populated so that you can reach many far-away destinations • But a complete routing table would be very large • How do we make the routing table size feasible?

  17. Pastry: Routing Properties • log16 N routing steps (e.g., with N = 10^6 nodes, about ⌈log16 10^6⌉ = 5 hops) • O(log N) state per node [Diagram: Route(d46a1c) from node 65a1fc via d13da3, d4213f, d462ba towards d46a1c, with nearby nodes d467c4 and d471f1]

  18. Pastry: Routing table (node # 65a1fc) [Table: rows 0–3 shown; the full table has log16 N rows, one column per hex digit value]

  19. Pastry Routing Table • Each row i corresponds to the length of the common prefix • row 0 – 0 hex digits in common • row 1 – 1 hex digit in common • Each column corresponds to the value of the (i+1)-st digit, the first digit that is not in common • column 0 – first uncommon digit is 0 • column A – first uncommon digit is A • The entries are [GUID, IP] pairs • You go as far down the rows of the routing table as possible • When you can't go any further (no more matching digits), forward the request to the [GUID, IP] entry in the column containing the first uncommon digit
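
A tiny sketch of the row/column arithmetic, using the 6-digit GUIDs the slides use; the helper name is ours:

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading hex digits two GUIDs share = routing table row."""
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n

node, dest = "65a1fc", "d46a1c"
row = shared_prefix_len(node, dest)   # 0 digits in common -> row 0
col = int(dest[row], 16)              # first uncommon digit 'd' -> column 13
print(row, col)                       # 0 13
```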

  20. Pastry Routing: What's the Next Hop? [Worked example on the routing table of node 65a1fc; rows 0–3 of its log16 N rows shown]

  21. Pastry: Routing Algorithm
  if (destination D is within range of our leaf set)
      forward to the numerically closest leaf-set member
  else
      let l = length of the prefix shared with D
      let d = value of the (l+1)-th digit of D's address
      let R[l][d] = routing table entry at row = l, column = d
      if (R[l][d] exists)
          forward to the IP address at R[l][d]
      else
          forward to a known node that
          (a) shares at least as long a prefix with D, and
          (b) is numerically closer to D than this node
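
A runnable sketch of this next-hop rule, with GUIDs as hex strings, the leaf set as a list, and the routing table as a dict keyed by (row, column). The data structures are illustrative, not the Pastry paper's implementation:

```python
def shared_prefix_len(a: str, b: str) -> int:
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n

def next_hop(self_id, dest, leaf_set, table):
    """Return the GUID to forward to, or self_id if we deliver locally."""
    num = lambda g: int(g, 16)
    if self_id == dest:
        return self_id
    # Case 1: destination within the leaf set's numeric range ->
    # deliver to the numerically closest member (possibly ourselves).
    if leaf_set and min(map(num, leaf_set)) <= num(dest) <= max(map(num, leaf_set)):
        return min(leaf_set + [self_id], key=lambda g: abs(num(g) - num(dest)))
    # Case 2: routing table entry at row = shared prefix length,
    # column = value of the first uncommon digit.
    l = shared_prefix_len(self_id, dest)
    d = int(dest[l], 16)
    if (l, d) in table:
        return table[(l, d)]
    # Case 3 (rare): any known node that shares at least as long a
    # prefix and is numerically closer to the destination than we are.
    known = leaf_set + list(table.values())
    closer = [g for g in known
              if shared_prefix_len(g, dest) >= l
              and abs(num(g) - num(dest)) < abs(num(self_id) - num(dest))]
    return min(closer, key=lambda g: abs(num(g) - num(dest))) if closer else self_id

# Slide 23's first hop: node 65a1fc routes d46a1c via row 0, column d.
print(next_hop("65a1fc", "d46a1c",
               ["65a123", "65abba", "65badd", "65cafe"],
               {(0, 0xd): "d13da3"}))   # -> d13da3
```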

  22. Let’s Play Pastry! • User at node 65a1fc • Wants to get to object with GUID d46a1c • We will see how each next hop is found using a routing table or leaf set • So, let’s start with routing table and leaf set at node 65a1fc

  23. Node: 65a1fc, Destination: d46a1c • Leaf set: 65a123 65abba 65badd 65cafe • Shared prefix length 0, first uncommon digit d → routing table row 0, column d • Next hop: GUID = d13da3

  24. Node: d13da3, Destination: d46a1c • Leaf set: d13555 d14abc da1367 dbcdd5 • Shared prefix "d" (length 1), next digit 4 → routing table row 1, column 4 • Next hop: GUID = d4213f

  25. Node: d4213f, Destination: d46a1c • Leaf set: d42cab d42fab dacabb ddaddd • Shared prefix "d4" (length 2), next digit 6 → routing table row 2, column 6 • Next hop: GUID = d462ba

  26. Node: d462ba, Destination: d46a1c • Leaf set: d46cab d46fab dacada deaddd • Shared prefix "d46" (length 3), but the routing table entry at row 3, column a is empty • Fallback: forward to any known GUID that shares at least as long a prefix and is numerically closer than the current node • Next hop: GUID = d469ab

  27. Node: d469ab, Destination: d46a1c • Leaf set: d469ac d46a00 d46a1c dcadda • The destination d46a1c is in the leaf set – deliver directly. We are done!

  28. A New Node Joining Pastry • Compute its own GUID X – apply the SHA-1 hash function to its public key • Get the IP address of at least one Pastry node (publicly available) • Find a nearby Pastry node A (by repeatedly querying nodes in the leaf set of a known Pastry node) • Send a join message to A, with destination X • A routes the message to the node Z numerically closest to X • The nodes along the route are B, C, … • Each node on the route sends X a part of its routing table and leaf set • X constructs its own routing table and leaf set, requesting additional information if needed
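
A simplified sketch of how the joining node might assemble its initial state from the nodes on the join route. The rule "the i-th node on the route seeds row i" follows the idea that the i-th node shares an i-digit prefix with X; the data structures are illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class NodeState:
    guid: str
    rows: dict = field(default_factory=dict)       # (row, col) -> neighbour GUID
    leaf_set: list = field(default_factory=list)

def build_join_state(route):
    """route: NodeStates visited by the join message (A, B, ..., Z).
    The i-th node on the route shares an i-digit prefix with the new
    node, so its routing table row i seeds the new node's row i; the
    leaf set is copied from Z, the numerically closest node."""
    table = {}
    for i, node in enumerate(route):
        for (r, c), entry in node.rows.items():
            if r == i:
                table.setdefault((r, c), entry)
    return table, list(route[-1].leaf_set)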

  29. Node Failure or Departure • Repairs to the leaf set • Members of the leaf set are monitored with heartbeat messages • If a member has failed, the node searches for the live node numerically closest to the failed member • It asks that node for its leaf set and adds members from that leaf set to its own • The node also informs its other neighbours of the failure • Repairs to the routing table are done on a "when discovered" basis
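
A sketch of the leaf-set repair step, assuming a hypothetical ask_leaf_set RPC that returns another node's leaf set:

```python
def repair_leaf_set(leaf_set, failed, ask_leaf_set):
    """Remove `failed`, then merge in the leaf set of the live member
    numerically closest to it. `ask_leaf_set(guid)` is a hypothetical
    RPC returning that node's leaf set as a list of GUID strings."""
    num = lambda g: int(g, 16)
    leaf_set = [g for g in leaf_set if g != failed]
    donor = min(leaf_set, key=lambda g: abs(num(g) - num(failed)))
    for g in ask_leaf_set(donor):
        if g != failed and g not in leaf_set:
            leaf_set.append(g)
    return leaf_set
```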

  30. Pastry Evaluation: Experimental Setup • Evaluated on a simulator • A single machine simulates a large network of nodes • Message passing is replaced by simulated transmission delays • Models the join/leave behaviour of hosts • IP delays and join/leave behaviour parameters are based on real measurements • The simulator was validated against a real installation of 52 nodes

  31. Pastry Evaluation: Dependability • With IP message loss rate of 0% • Pastry failed to deliver 1.5 in 100,000 requests (due to unavailability of destination host) • All requests that were delivered arrived at the correct node • With IP message loss rate of 5% • Pastry lost 3.3 in 100,000 requests • 1.6 in 100,000 requests were delivered to the wrong node

  32. Pastry Evaluation: Performance • Performance metric: relative delay penalty (RDP) • RDP: ratio between the delay in delivering a request via the routing overlay and the delay in delivering the same request via UDP/IP • A direct measure of the extra cost incurred by overlay routing (e.g., if IP delivery takes 50 ms and the overlay takes 90 ms, RDP = 1.8) • RDP in Pastry: • 1.8 with zero network message loss • 2.2 with 5% network message loss

  33. Squirrel • A web cache. Idea: P2P caching of web objects • Web objects are cached on the nodes of a local network, organized into a P2P overlay over Pastry • Motivation: no need for a centralized proxy cache • Each Squirrel node has a Pastry GUID • Each URL has a Pastry GUID (computed by applying the SHA-1 hash to the URL) • The Squirrel node whose GUID is numerically closest to the URL's GUID becomes the home node for that URL, i.e., caches that URL • A simulation-based evaluation concluded that performance is comparable to that of a centralized cache • Squirrel was subsequently deployed for real use on a local network of 52 nodes
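
A minimal sketch of the URL-to-home-node mapping and the cache hit/miss path; fetch_from_origin and the other names are hypothetical, not Squirrel's API:

```python
import hashlib

SPACE = 2 ** 128

def guid_of(data: bytes) -> int:
    """128-bit GUID from a truncated SHA-1 digest (assumption)."""
    return int.from_bytes(hashlib.sha1(data).digest()[:16], "big")

def home_node(url: str, node_guids):
    """The node numerically closest to SHA-1(URL) caches that URL."""
    u = guid_of(url.encode())
    dist = lambda n: min(abs(n - u), SPACE - abs(n - u))
    return min(node_guids, key=dist)

def request(url, caches, fetch_from_origin):
    """caches: node GUID -> {url: content}. On a miss, the home node
    fetches from the origin web server (fetch_from_origin is a
    hypothetical callable) and keeps a copy for later requests."""
    cache = caches[home_node(url, caches.keys())]
    if url not in cache:
        cache[url] = fetch_from_origin(url)
    return cache[url]
```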

  34. OceanStore • A massive storage system • An incrementally scalable, persistent storage facility • Replicated storage of both mutable and immutable objects • Built on top of the Tapestry P2P middleware (GUID-based, similar to Pastry) • OceanStore objects are like files – data stored in a set of blocks • Each object is an ordered sequence of immutable versions that are (in principle) kept forever • Any update to an object results in the generation of a new version

  35. OceanStore

  36. OceanStore: Update • Clients contact primary replicas to make update requests • Primary replicas are powerful, stable machines; they reach agreement on whether to accept the update • The update data is sent to archive servers for permanent storage • Meanwhile, the update data is propagated to secondary replicas, which serve queries issued by other clients • Clients must periodically check for new copies

  37. Summary • P2P systems harness the massive computing resources available at the edges of the Internet • Early systems partly depended on a central server (Napster) or used unstructured routing, e.g., flooding (Gnutella) • Later it was recognized that the common requirements of P2P systems could be met by P2P middleware (Pastry, Tapestry, Chord) • P2P middleware provides routing, self-organization, node arrival and departure, and failure recovery • Most P2P applications support sharing of immutable objects (Kazaa, BitTorrent) • Some support mutable objects (OceanStore, Ivy) • Other uses of P2P technology include Internet telephony (Skype)
