
Comparing P2P File Sharing Systems: Structured vs Unstructured

Explore the different types of P2P file sharing systems, including hybrid, unstructured, and structured overlays, and learn about the challenges and advantages of each approach.


Presentation Transcript


  1. Structured P2P Overlays

  2. Classification of the P2P File Sharing Systems • Hybrid (broker-mediated) • Unstructured + centralized • Ex.: Napster • Unstructured + super-peer notion • Ex.: KaZaA, Morpheus • Unstructured decentralized (or loosely controlled) • Files can be anywhere • Support of partial-name and keyword queries • Inefficient search (some heuristics exist) & no guarantee of finding • Ex.: Gnutella • Structured (or tightly controlled, DHT) • Files are rigidly assigned to specific nodes • Efficient search & guarantee of finding • Lack of partial-name and keyword queries • Ex.: Chord, CAN, Pastry, Tapestry

  3. Comparing Some File Sharing Methods: Resource Discovery • Centralized • One or a few central coordinator(s) • e.g. Napster, instant messengers • Fully decentralized • All peers (or none) contain routing information • e.g. Freenet, Gnutella • Hybrid • Some superpeers carry indexing information • e.g. FastTrack (Kazaa, Morpheus), Gnutella derivatives

  4. Resource Discovery in P2P Systems • 1st generation: central server with a central index (Napster) • 2nd generation: no central server; flooding (Gnutella) • 3rd generation: distributed hash table over a self-organizing overlay network (topology, document routing): structured (CAN, Chord, Pastry, etc.) [Figure: peers exchanging search queries and a GET file request, plus a 3-bit Chord ring with nodes 0, 1 and 3 and their finger tables:]
     Node 0: start 1, interval [1,2), successor 1 | start 2, [2,4), successor 3 | start 4, [4,0), successor 0
     Node 1: start 2, interval [2,3), successor 3 | start 3, [3,5), successor 3 | start 5, [5,1), successor 0
     Node 3: start 4, interval [4,5), successor 0 | start 5, [5,7), successor 0 | start 7, [7,3), successor 0
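
To make the reconstructed finger tables concrete, here is a minimal sketch (not from the slides) that recomputes them for the classic 3-bit ring with nodes 0, 1 and 3; entry i of a node's table starts at node + 2^i and points to the successor of that start.

    # Minimal sketch: recomputing the Chord finger tables shown above.
    def successor(nodes, ident):
        # First node on the ring whose id is >= ident, wrapping around.
        ring = sorted(nodes)
        for n in ring:
            if n >= ident:
                return n
        return ring[0]  # wrap around past the largest id

    def finger_table(node, nodes, m):
        # Entry i covers [node + 2^i, node + 2^(i+1)) modulo 2^m.
        table = []
        for i in range(m):
            start = (node + 2 ** i) % 2 ** m
            end = (node + 2 ** (i + 1)) % 2 ** m
            table.append((start, (start, end), successor(nodes, start)))
        return table

    for n in (0, 1, 3):
        print("node", n, finger_table(n, [0, 1, 3], 3))
    # node 0 -> starts 1, 2, 4 with successors 1, 3, 0, matching the table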

  5. Challenges • Duplicated messages, caused by loops and flooding • Some contents missed, caused by loops and the TTL limit • Oriented to file sharing only • Why? The network is unstructured and too application-specific

  6. Structured P2P • Second-generation P2P overlay networks • Self-organizing • Load-balanced • Fault-tolerant • Scalable: guarantees on the number of hops to answer a query • Major difference from unstructured P2P systems: based on a distributed hash table interface

  7. Distributed Hash Tables (DHT) • Distributed version of a hash table data structure • Stores (key, value) pairs • The key is like a filename • The value can be file contents • Goal: Efficiently insert/lookup/delete (key, value) pairs • Each peer stores a subset of (key, value) pairs in the system • Core operation: Find node responsible for a key • Map key to node • Efficiently route insert/lookup/delete request to this node
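
A minimal sketch of the core operation described above: map both node names and keys onto one identifier ring and hand each key to the first node at or after its position. This toy keeps a global view of the ring for clarity; in a real DHT each peer knows only a few other nodes, and the names and hash choice here are illustrative assumptions.

    # Toy DHT: find the node responsible for a key via a hash ring.
    import hashlib

    def ring_position(name, bits=32):
        # Hash a string onto a 2^bits identifier ring.
        return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** bits)

    class ToyDHT:
        def __init__(self, node_names):
            # Each node owns the arc of the ring that ends at its position.
            self.ring = sorted((ring_position(n), n) for n in node_names)

        def responsible_node(self, key):
            p = ring_position(key)
            for pos, name in self.ring:
                if pos >= p:
                    return name
            return self.ring[0][1]  # wrap around the ring

    dht = ToyDHT(["nodeA", "nodeB", "nodeC"])
    print(dht.responsible_node("LetItBe.mp3"))  # insert/lookup route here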

  8. DHT Applications • Many services can be built on top of a DHT interface • File sharing • Archival storage • Databases • Naming, service discovery • Chat service • Rendezvous-based communication • Publish/Subscribe

  9. DHT Desirable Properties • Keys mapped evenly to all nodes in the network • Each node maintains information about only a few other nodes • Messages can be routed to a node efficiently • Node arrival/departures only affect a few nodes

  10. DHT Routing Protocols • DHT is a generic interface • There are several implementations of this interface • Chord [MIT] • Pastry [Microsoft Research UK, Rice University] • Tapestry [UC Berkeley] • Content Addressable Network (CAN) [UC Berkeley] • SkipNet [Microsoft Research US, Univ. of Washington] • Kademlia [New York University] • Viceroy [Israel, UC Berkeley] • P-Grid [EPFL Switzerland] • Freenet [Ian Clarke] • These systems are often referred to as P2P routing substrates or P2P overlay networks

  11. Structured Overlays • Properties • Topology is tightly controlled • Well-defined rules determine to which other nodes a node connects • Files placed at precisely specified locations • Hash function maps file names to nodes • Scalable routing based on file attributes

  12. Second generation P2P systems • They guarantee a definite answer to a query in a bounded number of network hops. • They form a self-organizing overlay network. • They provide a load-balanced, fault-tolerant distributed hash table, in which items can be inserted and looked up in a bounded number of forwarding hops.

  13. Approach to Structured P2P Network • Contributes a way • to construct a structured and general P2P network without loops and TTL • to gain knowledge about the constructed P2P network • 2-D space • Map each node's network identifier into the 2-D space • Zone • Each node occupies its allocated area • Aggregate nodes with the same network identifier into a zone • Maintain a binary tree • Core • Represents its zone • Manages its zone • Gateway between neighbor zones and its members • Member • Belongs to a zone • Each message should be sent to its zone and the members in its zone

  14. Resource Discovery: Document Routing, Shortly • Chord, CAN, Tapestry, Pastry model • Benefits: • More efficient searching • Limited per-node state • Drawbacks: • Limited fault-tolerance vs. redundancy [Figure: nodes with ids 001, 012, 212, 305 and 332 forwarding the query "212?" toward the node responsible for id 212]

  15. Scalability of P2P Systems • Peer-to-peer (P2P) file sharing systems are now one of the most popular Internet applications and have become a major source of Internet traffic • Thus, it is extremely important that these systems be scalable • Unfortunately, the initial designs for P2P systems have significant scaling problems: • Napster has a centralized directory service • Gnutella employs a flooding-based search mechanism that is not suitable for large systems

  16. Motivation • How to find data in a distributed file sharing system? • Lookup is the key problem [Figure: nodes N1-N5 connected over the Internet; a publisher stores key "LetItBe" with MP3 data as the value, and a client issues Lookup("LetItBe")]

  17. Centralized Solution • Central server (Napster) • Requires O(M) state • Single point of failure [Figure: nodes N1-N5 around a central DB server; the publisher registers key "LetItBe" with MP3 data as the value, and the client's Lookup("LetItBe") is answered by the server]

  18. Distributed Solution (1) • Flooding (Gnutella, Morpheus, etc.) • Worst case O(N) messages per lookup [Figure: the client's Lookup("LetItBe") is flooded across nodes N1-N5 until it reaches the publisher holding key "LetItBe"]

  19. Distributed Solution (2) • Routed messages (Freenet, Tapestry, Chord, CAN, etc.) • Only exact matches [Figure: the client's Lookup("LetItBe") is routed hop by hop across nodes N1-N5 to the node responsible for the key]

  20. Distributed Hash Table (DHT) Based Systems • In response to these scaling problems, several research groups have proposed a new generation of scalable P2P systems that support a DHT functionality • Tapestry • Pastry • Chord • Content-Addressable Networks (CAN) • In these systems: • files are associated with a key (produced, e.g., by hashing the file name) and • each node in the system is responsible for storing a certain range of keys

  21. Structured P2P Applications • A fundamental problem that confronts P2P applications is to efficiently locate the node that stores a particular data (file) item • Data location can be easily implemented by associating a key with each data item and storing the key/data-item pair at the node to which the key maps • The algorithms support the following operation: given a key, map the key onto a node • Hash tables are used to map keys onto values that represent nodes

  22. Example P2P Problem: Lookup • At the heart of all P2P systems [Figure: nodes N1-N6 over the Internet; a publisher stores key "title" with the file data as the value, and a client issues Lookup("title")]

  23. Structured P2P Applications • P2P routing protocols like Chord, Pastry, CAN, and Tapestry induce a connected overlay network across the Internet, with a rich structure that enables efficient key lookups • Such protocols have 2 parts: • looking up a file item in a specially constructed overlay structure • a protocol that allows a node to join or leave the network, properly rearranging the ideal overlay to account for its presence or absence

  24. Looking Up • It is a basic operation in these DHT systems • lookup(key) returns the identity (e.g., the IP address) of the node storing the object with that key • This operation allows nodes to put and get files based on their key, thereby supporting the hash-table-like interface

  25. Document Routing • The core of these DHT systems is the routing algorithm • The DHT nodes form an overlay network with each node having several other nodes as neighbors • When a lookup(key) is issued, the lookup is routed through the overlay network to the node responsible for that key • The scalability of these DHT algorithms is tied directly to the efficiency of their routing algorithms

  26. Document Routing Algorithms • They take, as input, a key and, in response, route a message to the node responsible for that key • The keys are strings of digits of some length • Nodes have identifiers, taken from the same space as the keys (i.e., same number of digits) • Each node maintains a routing table consisting of a small subset of nodes in the system • When a node receives a query for a key for which it is not responsible, the node routes the query to the neighbour node that makes the most “progress” towards resolving the query • The notion of progress differs from algorithm to algorithm, but in general is defined in terms of some distance between the identifier of the current node and the identifier of the queried key
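
The greedy forwarding rule in the last two bullets can be sketched as follows; the clockwise ring distance and the flat routing-table layout are illustrative assumptions, since each protocol defines "progress" differently.

    # Generic greedy routing step: forward to the neighbour that makes
    # the most progress toward the key, or stop if no neighbour is closer.
    def distance(a, b, bits=32):
        # Example metric: clockwise distance on a 2^bits identifier ring.
        return (b - a) % (2 ** bits)

    def route_step(key, node_id, neighbours):
        closer = [n for n in neighbours
                  if distance(n, key) < distance(node_id, key)]
        if not closer:
            return None  # this node is responsible for the key
        return min(closer, key=lambda n: distance(n, key))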

  27. Content-Addressable Network (CAN) • A typical document routing method • Virtual Cartesian coordinate space is used • Entire space is partitioned amongst all the nodes • every node “owns” a zone in the overall space • Abstraction • can store data at “points” in the space • can route from one “point” to another • Point = node that owns the enclosing zone

  28. Basic Concept of CAN • Data stored in the CAN is addressed by name (i.e. key), not by location (i.e. IP address) • The task of the routing: how to find the place of a file?

  29. CAN Example: Two Dimensional Space • Space divided between nodes • Together, all nodes cover the entire space • Each node covers either a square or a rectangular area with a side ratio of 1:2 or 2:1 • Example: node n1:(1, 2) is the first node that joins, so it covers the entire space [Figure: an 8x8 coordinate grid owned entirely by n1]

  30. CAN Example: Two Dimensional Space • Node n2:(4, 2) joins, so the space is divided between n1 and n2 [Figure: the grid split between n1 and n2]

  31. CAN Example: Two Dimensional Space • Node n3:(3, 5) joins, so the space is divided between n1 and n3 [Figure: n1's zone split between n1 and n3]

  32. CAN Example: Two Dimensional Space • Nodes n4:(5, 5) and n5:(6, 6) join [Figure: the grid further subdivided to give n4 and n5 their own zones]

  33. CAN Example: Two Dimensional Space • Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6) • Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5) [Figure: the five zones with the four items placed at their coordinates]

  34. CAN Example: Two Dimensional Space • Each item is stored by the node that owns its mapping in the space [Figure: each item drawn inside the zone of the node that stores it]
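
A sketch of the placement rule just described: a key is hashed to a point in the 2-D space, and the item is stored at the node whose zone contains that point. The hash and the zone layout below are assumptions for illustration, matching the figures only loosely.

    # Map a key to a point in an 8x8 space and find the owning zone.
    import hashlib

    def key_to_point(key, side=8):
        h = hashlib.sha1(key.encode()).digest()
        return (h[0] % side, h[1] % side)  # (x, y) coordinates

    # Zones as ((x_lo, y_lo), (x_hi, y_hi)) rectangles (illustrative split).
    zones = {
        "n1": ((0, 0), (4, 4)),
        "n2": ((4, 0), (8, 4)),
        "n3": ((0, 4), (4, 8)),
        "n4": ((4, 4), (8, 8)),
    }

    def owner(point):
        x, y = point
        for node, ((x0, y0), (x1, y1)) in zones.items():
            if x0 <= x < x1 and y0 <= y < y1:
                return node

    print(owner(key_to_point("f1")))  # the node that stores item f1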

  35. CAN: Query Example • Each node knows its neighbours in the d-space • Forward the query to the neighbour that is closest to the query id • Example: assume node n1 queries file item f4 [Figure, repeated on slides 36-38 as animation frames: the query travels hop by hop from n1's zone toward the zone that owns f4]
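
The hop-by-hop forwarding in this example can be sketched as a greedy choice among neighbouring zones: pass the query to the neighbour whose zone centre is closest to the target point. The zone rectangles and names below are illustrative assumptions.

    # One CAN routing step: pick the neighbour zone closest to the target.
    import math

    def centre(zone):
        (x0, y0), (x1, y1) = zone
        return ((x0 + x1) / 2, (y0 + y1) / 2)

    def next_hop(target, my_zone, neighbours):
        # neighbours: {node_name: zone_rectangle}
        best = min(neighbours,
                   key=lambda n: math.dist(centre(neighbours[n]), target))
        if math.dist(centre(neighbours[best]), target) < math.dist(centre(my_zone), target):
            return best
        return None  # our own zone is closest: answer the query locally

    # n1 forwarding a query for f4 at (7, 5), with illustrative zones:
    print(next_hop((7, 5), ((0, 0), (4, 4)),
                   {"n2": ((4, 0), (8, 4)), "n3": ((0, 4), (4, 8))}))  # -> n2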

  39. Resource Discovery: Document Routing in CAN • Associate with each node and each item a unique id (nodeId and fileId) in a d-dimensional space • Goals • Scales to hundreds of thousands of nodes • Handles rapid arrival and failure of nodes • Properties • Routing table size O(d) • Guarantees that a file is found in at most d·n^(1/d) steps, where n is the total number of nodes
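
As a worked example (numbers not from the slides): with d = 4 dimensions and n = 2^20, roughly a million nodes, each node keeps only O(d) neighbours, yet a lookup takes at most d·n^(1/d) = 4·(2^20)^(1/4) = 4·32 = 128 hops, compared with up to O(n) messages for flooding.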

  40. Overview of Structured P2P Network [Figure: a 2-D coordinate space spanning 0 to 2^16 on each axis, divided into zones; legend: zone, core node, member node, member tree, transmission within a zone, transmission between zones]

  41. Elements of Structured P2P Network • Core/Member Nodes • Neighboring zone information • core info, zone info, direction • Member information • member node information, routing table • Strategies • Routing Messages • Constructing Structured P2P Network • Managing Zone • Constructing Member Tree • Discovering Contents

  42. Core/Member Nodes • Information on 7 neighboring zones • Core node (IP, port #) • Zone range (x1, y1)~(x2, y2) • Zone numbering • 4 bits • 00: less than • 01: belongs to • 10: greater than • Member information • IP, port # • Member tree • Uplink node info (only 1) • Downlink node info (limited to 2) [Figure: the direction codes around a zone: 0000, 0001, 0010, 0100, 0110, 1000, 1001, 1010]
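
The 4-bit numbering above suggests two bits per axis (00 = less than, 01 = belongs to, 10 = greater than). A sketch of that encoding follows; putting the x bits before the y bits is an assumption, chosen so the codes match the eight directions in the figure.

    # Compute the 4-bit direction code of a point relative to a zone.
    def axis_code(v, lo, hi):
        if v < lo:
            return "00"  # less than the zone's range
        if v < hi:
            return "01"  # belongs to the zone's range
        return "10"      # greater than the zone's range

    def zone_code(point, zone):
        (x0, y0), (x1, y1) = zone
        x, y = point
        return axis_code(x, x0, x1) + axis_code(y, y0, y1)

    # A point inside the zone codes to "0101"; the eight neighbouring
    # directions give 0000, 0001, 0010, 0100, 0110, 1000, 1001, 1010.
    print(zone_code((2, 9), ((4, 4), (8, 8))))  # "0010": x less, y greater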

  43. Routing Messages • Within a zone • Depends on the member tree (a binary tree) • Between zones • If the sender is not a core, it just sends the message to its core • The core then routes the message along the X coordinate until it reaches the destination x • After that, it routes the message along the Y coordinate • Every message should carry the originator's IP and port [Figure: the same direction codes as on the previous slide]
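
The X-then-Y rule amounts to dimension-ordered routing: correct the X coordinate first, then the Y coordinate. A sketch under that reading, with zones reduced to grid coordinates for brevity:

    # Dimension-ordered inter-zone routing: X first, then Y.
    def next_zone(current, dest):
        cx, cy = current
        dx, dy = dest
        if cx != dx:  # step along the X coordinate first
            return (cx + (1 if dx > cx else -1), cy)
        if cy != dy:  # then step along the Y coordinate
            return (cx, cy + (1 if dy > cy else -1))
        return None   # destination zone reached

    hop = (0, 0)
    while hop is not None:
        print(hop)    # prints the path (0,0), (1,0), (2,0), ..., (2,3)
        hop = next_zone(hop, (2, 3))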

  44. Constructing the Structured P2P Network (JOIN) [Flowchart: a joining node bootstraps via the rendezvous point (RP) and sends a JOIN, forwarded as JOIN_FWD; the message is routed to the responsible core, whose zone management answers with Join as Core or Join as Member; the neighboring zones and the zone's members are then informed]

  45. Managing Zone (1) [Flowchart: on a JOIN, check whether the new node has the same network identifier. If yes, accept it as a member, reply with an AsMember message, and inform the members. If no, split the zone and rearrange the neighbors, reply with an AsCore message; the new node then sets itself up and informs its neighbors. Either way, the join is completed]

  46. Managing Zone (2) • Splitting a zone • The network ID of the new node is within the zone's range, but the network IDs differ • Direction of the split • X or Y direction • Depends on the difference of the X and Y components of the two network IDs • Rearranging neighboring zones • The two nodes inform their neighbors of this change [Figure: a zone split along the X or Y axis]

  47. Constructing the Member Tree • Each node • maintains information on all members • creates a binary tree • using the sorted IP addresses • Rules • one link between the core and a member • only one uplink • downlinks limited to 2 [Figure: the core linked to the root of a binary tree of members numbered 1-7]
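
A sketch of one way to realize these rules: build a balanced binary tree over the members sorted by IP address, so every member has exactly one uplink and at most two downlinks, and the core links to the root only. The balanced shape and the string sort are simplifying assumptions (a real implementation would compare IPs numerically).

    # Build a member tree satisfying: one uplink, at most two downlinks.
    def build_tree(sorted_ips):
        if not sorted_ips:
            return None
        mid = len(sorted_ips) // 2
        return {
            "node": sorted_ips[mid],                   # this member
            "left": build_tree(sorted_ips[:mid]),      # first downlink
            "right": build_tree(sorted_ips[mid + 1:]), # second downlink
        }

    members = sorted(["10.0.0.3", "10.0.0.1", "10.0.0.7", "10.0.0.5"])
    root = build_tree(members)  # the core links only to root["node"]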

  48. Discovering Content • Content discovery • Send the msg to the members and the core of its zone • Core • On receiving it, sends it to the neighbor zones along the X coordinate • Also sends it to the neighboring Y zones by flooding • DiscoveryHit [Figure: the 2-D space with X and Y axes from 0 to 2^16-1]

  49. Other Types of P2P Storage Systems • Examples: Chord, Pastry or Tapestry P2P systems • Every node is responsible for a subset of the data • The routing algorithm locates data with small per-node routing state • Volunteer nodes join and leave the system at any time • All nodes have identical responsibilities • All communication is symmetric [Figure: a Pastry-style ring with nodes 65a1fc, d13da3, d4213f, d462ba, d467c4 and d471f1; Route(d46a1c) is forwarded from 65a1fc toward the node whose id is closest to d46a1c]

  50. P2P Ring • Nodes are arranged in a ring based on id • Ids are assigned randomly • Very large id space
