
Comparing P2P File Sharing Systems: Structured vs Unstructured

Explore the different types of P2P file sharing systems, including hybrid, unstructured, and structured overlays, and learn about the challenges and advantages of each approach.


Presentation Transcript


  1. Structured P2P Overlays

  2. Classification of the P2P File Sharing Systems • Hybrid (broker-mediated) • Unstructured + centralized • Ex.: Napster • Unstructured + super-peer notion • Ex.: KaZaA, Morpheus • Unstructured decentralized (or loosely controlled) • Files can be anywhere • Support of partial-name and keyword queries • Inefficient search (some heuristics exist) & no guarantee of finding • Ex.: Gnutella • Structured (or tightly controlled, DHT) • Files are rigidly assigned to specific nodes • Efficient search & guarantee of finding • Lack of partial-name and keyword queries • Ex.: Chord, CAN, Pastry, Tapestry

  3. Comparing Some File Sharing Methods: Resource Discovery • Centralized • One or a few central coordinator(s) • e.g. Napster, instant messengers • Fully decentralized • All peers (or none) contain routing information • e.g. Freenet, Gnutella • Hybrid • Some superpeers carry indexing information • e.g. FastTrack (Kazaa, Morpheus), Gnutella derivatives

  4. Resource Discovery in P2P Systems • 1st generation: central server with a central index (Napster) • 2nd generation: no central server; flooding (Gnutella) • 3rd generation: distributed hash table over a self-organizing overlay network (topology, document routing): structured (CAN, Chord, Pastry, etc.) [Figure: peers exchanging search queries and a GET file request, plus a 3-bit Chord ring with nodes 0, 1 and 3 and their finger tables:]
     Node 0: start 1, interval [1,2), successor 1 | start 2, [2,4), successor 3 | start 4, [4,0), successor 0
     Node 1: start 2, interval [2,3), successor 3 | start 3, [3,5), successor 3 | start 5, [5,1), successor 0
     Node 3: start 4, interval [4,5), successor 0 | start 5, [5,7), successor 0 | start 7, [7,3), successor 0
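
To make the reconstructed finger tables concrete, here is a minimal sketch (not from the slides) that recomputes them for the classic 3-bit ring with nodes 0, 1 and 3; entry i of a node's table starts at node + 2^i and points to the successor of that start.

    # Minimal sketch: recomputing the Chord finger tables shown above.
    def successor(nodes, ident):
        # First node on the ring whose id is >= ident, wrapping around.
        ring = sorted(nodes)
        for n in ring:
            if n >= ident:
                return n
        return ring[0]  # wrap around past the largest id

    def finger_table(node, nodes, m):
        # Entry i covers [node + 2^i, node + 2^(i+1)) modulo 2^m.
        table = []
        for i in range(m):
            start = (node + 2 ** i) % 2 ** m
            end = (node + 2 ** (i + 1)) % 2 ** m
            table.append((start, (start, end), successor(nodes, start)))
        return table

    for n in (0, 1, 3):
        print("node", n, finger_table(n, [0, 1, 3], 3))
    # node 0 -> starts 1, 2, 4 with successors 1, 3, 0, matching the table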

  5. Challenges • Duplicated messages, caused by loops and flooding • Some contents missed, caused by loops and the TTL limit • Oriented to file sharing only • Why? The network is unstructured and too application-specific

  6. Structured P2P • Second-generation P2P overlay networks • Self-organizing • Load-balanced • Fault-tolerant • Scalable: guarantees on the number of hops to answer a query • Major difference from unstructured P2P systems: based on a distributed hash table interface

  7. Distributed Hash Tables (DHT) • Distributed version of a hash table data structure • Stores (key, value) pairs • The key is like a filename • The value can be file contents • Goal: Efficiently insert/lookup/delete (key, value) pairs • Each peer stores a subset of (key, value) pairs in the system • Core operation: Find node responsible for a key • Map key to node • Efficiently route insert/lookup/delete request to this node
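
A minimal sketch of the core operation described above: map both node names and keys onto one identifier ring and hand each key to the first node at or after its position. This toy keeps a global view of the ring for clarity; in a real DHT each peer knows only a few other nodes, and the names and hash choice here are illustrative assumptions.

    # Toy DHT: find the node responsible for a key via a hash ring.
    import hashlib

    def ring_position(name, bits=32):
        # Hash a string onto a 2^bits identifier ring.
        return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** bits)

    class ToyDHT:
        def __init__(self, node_names):
            # Each node owns the arc of the ring that ends at its position.
            self.ring = sorted((ring_position(n), n) for n in node_names)

        def responsible_node(self, key):
            p = ring_position(key)
            for pos, name in self.ring:
                if pos >= p:
                    return name
            return self.ring[0][1]  # wrap around the ring

    dht = ToyDHT(["nodeA", "nodeB", "nodeC"])
    print(dht.responsible_node("LetItBe.mp3"))  # insert/lookup route here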

  8. DHT Applications • Many services can be built on top of a DHT interface • File sharing • Archival storage • Databases • Naming, service discovery • Chat service • Rendezvous-based communication • Publish/Subscribe

  9. DHT Desirable Properties • Keys mapped evenly to all nodes in the network • Each node maintains information about only a few other nodes • Messages can be routed to a node efficiently • Node arrival/departures only affect a few nodes

  10. DHT Routing Protocols • DHT is a generic interface • There are several implementations of this interface • Chord [MIT] • Pastry [Microsoft Research UK, Rice University] • Tapestry [UC Berkeley] • Content Addressable Network (CAN) [UC Berkeley] • SkipNet [Microsoft Research US, Univ. of Washington] • Kademlia [New York University] • Viceroy [Israel, UC Berkeley] • P-Grid [EPFL Switzerland] • Freenet [Ian Clarke] • These systems are often referred to as P2P routing substrates or P2P overlay networks

  11. Structured Overlays • Properties • Topology is tightly controlled • Well-defined rules determine to which other nodes a node connects • Files placed at precisely specified locations • Hash function maps file names to nodes • Scalable routing based on file attributes

  12. Second generation P2P systems • They guarantee a definite answer to a query in a bounded number of network hops. • They form a self-organizing overlay network. • They provide a load-balanced, fault-tolerant distributed hash table, in which items can be inserted and looked up in a bounded number of forwarding hops.

  13. Approach to Structured P2P Network • Contributes a way • to construct a structured and general P2P network without loops and TTL • to gain knowledge about the constructed P2P network • 2-D space • Map each node's network identifier into the 2-D space • Zone • Each node occupies its allocated area • Aggregate nodes with the same network identifier into a zone • Maintain a binary tree • Core • Represents its zone • Manages its zone • Gateway between neighbor zones and its members • Member • Belongs to a zone • Each message should be sent to its zone and the members in its zone

  14. Resource Discovery: Document Routing, Shortly • Chord, CAN, Tapestry, Pastry model • Benefits: • More efficient searching • Limited per-node state • Drawbacks: • Limited fault-tolerance vs. redundancy [Figure: nodes with ids 001, 012, 212, 305 and 332 forwarding the query "212?" toward the node responsible for id 212]

  15. Scalability of P2P Systems • Peer-to-peer (P2P) file sharing systems are now one of the most popular Internet applications and have become a major source of Internet traffic • Thus, it is extremely important that these systems be scalable • Unfortunately, the initial designs for P2P systems have significant scaling problems: • Napster has a centralized directory service • Gnutella employs a flooding-based search mechanism that is not suitable for large systems

  16. Motivation • How to find data in a distributed file sharing system? • Lookup is the key problem [Figure: nodes N1-N5 connected over the Internet; a publisher stores key "LetItBe" with MP3 data as the value, and a client issues Lookup("LetItBe")]

  17. Centralized Solution • Central server (Napster) • Requires O(M) state • Single point of failure [Figure: nodes N1-N5 around a central DB server; the publisher registers key "LetItBe" with MP3 data as the value, and the client's Lookup("LetItBe") is answered by the server]

  18. Distributed Solution (1) • Flooding (Gnutella, Morpheus, etc.) • Worst case O(N) messages per lookup [Figure: the client's Lookup("LetItBe") is flooded across nodes N1-N5 until it reaches the publisher holding key "LetItBe"]

  19. Distributed Solution (2) • Routed messages (Freenet, Tapestry, Chord, CAN, etc.) • Only exact matches [Figure: the client's Lookup("LetItBe") is routed hop by hop across nodes N1-N5 to the node responsible for the key]

  20. Distributed Hash Table (DHT) Based Systems • In response to these scaling problems, several research groups have proposed a new generation of scalable P2P systems that support a DHT functionality • Tapestry • Pastry • Chord • Content-Addressable Networks (CAN) • In these systems: • files are associated with a key (produced, e.g., by hashing the file name) and • each node in the system is responsible for storing a certain range of keys

  21. Structured P2P Applications • A fundamental problem that confronts P2P applications is to efficiently locate the node that stores a particular data (file) item • Data location can be easily implemented by associating a key with each data item and storing the key/data-item pair at the node to which the key maps • The algorithms support the following operation: given a key, map the key onto a node • Hash tables are used to map keys onto values that represent nodes

  22. Example P2P Problem: Lookup • At the heart of all P2P systems [Figure: nodes N1-N6 over the Internet; a publisher stores key "title" with the file data as the value, and a client issues Lookup("title")]

  23. Structured P2P Applications • P2P routing protocols like Chord, Pastry, CAN, and Tapestry induce a connected overlay network across the Internet, with a rich structure that enables efficient key lookups • Such protocols have 2 parts: • looking up a file item in a specially constructed overlay structure • a protocol that allows a node to join or leave the network, properly rearranging the ideal overlay to account for its presence or absence

  24. Looking Up • It is a basic operation in these DHT systems • lookup(key) returns the identity (e.g., the IP address) of the node storing the object with that key • This operation allows nodes to put and get files based on their key, thereby supporting the hash-table-like interface

  25. Document Routing • The core of these DHT systems is the routing algorithm • The DHT nodes form an overlay network with each node having several other nodes as neighbors • When a lookup(key) is issued, the lookup is routed through the overlay network to the node responsible for that key • The scalability of these DHT algorithms is tied directly to the efficiency of their routing algorithms

  26. Document Routing Algorithms • They take, as input, a key and, in response, route a message to the node responsible for that key • The keys are strings of digits of some length • Nodes have identifiers, taken from the same space as the keys (i.e., same number of digits) • Each node maintains a routing table consisting of a small subset of nodes in the system • When a node receives a query for a key for which it is not responsible, the node routes the query to the neighbour node that makes the most “progress” towards resolving the query • The notion of progress differs from algorithm to algorithm, but in general is defined in terms of some distance between the identifier of the current node and the identifier of the queried key
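
The greedy forwarding rule in the last two bullets can be sketched as follows; the clockwise ring distance and the flat routing-table layout are illustrative assumptions, since each protocol defines "progress" differently.

    # Generic greedy routing step: forward to the neighbour that makes
    # the most progress toward the key, or stop if no neighbour is closer.
    def distance(a, b, bits=32):
        # Example metric: clockwise distance on a 2^bits identifier ring.
        return (b - a) % (2 ** bits)

    def route_step(key, node_id, neighbours):
        closer = [n for n in neighbours
                  if distance(n, key) < distance(node_id, key)]
        if not closer:
            return None  # this node is responsible for the key
        return min(closer, key=lambda n: distance(n, key))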

  27. Content-Addressable Network (CAN) • A typical document routing method • Virtual Cartesian coordinate space is used • Entire space is partitioned amongst all the nodes • every node “owns” a zone in the overall space • Abstraction • can store data at “points” in the space • can route from one “point” to another • Point = node that owns the enclosing zone

  28. Basic Concept of CAN • Data stored in the CAN is addressed by name (i.e. key), not by location (i.e. IP address) • The task of the routing: how to find the place of a file?

  29. CAN Example: Two Dimensional Space • Space divided between nodes • Together, all nodes cover the entire space • Each node covers either a square or a rectangular area with a side ratio of 1:2 or 2:1 • Example: node n1:(1, 2) is the first node that joins, so it covers the entire space [Figure: an 8x8 coordinate grid owned entirely by n1]

  30. CAN Example: Two Dimensional Space • Node n2:(4, 2) joins, so the space is divided between n1 and n2 [Figure: the grid split between n1 and n2]

  31. CAN Example: Two Dimensional Space • Node n3:(3, 5) joins, so the space is divided between n1 and n3 [Figure: n1's zone split between n1 and n3]

  32. CAN Example: Two Dimensional Space • Nodes n4:(5, 5) and n5:(6, 6) join [Figure: the grid further subdivided to give n4 and n5 their own zones]

  33. CAN Example: Two Dimensional Space • Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6) • Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5) [Figure: the five zones with the four items placed at their coordinates]

  34. CAN Example: Two Dimensional Space • Each item is stored by the node that owns its mapping in the space [Figure: each item drawn inside the zone of the node that stores it]
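
A sketch of the placement rule just described: a key is hashed to a point in the 2-D space, and the item is stored at the node whose zone contains that point. The hash and the zone layout below are assumptions for illustration, matching the figures only loosely.

    # Map a key to a point in an 8x8 space and find the owning zone.
    import hashlib

    def key_to_point(key, side=8):
        h = hashlib.sha1(key.encode()).digest()
        return (h[0] % side, h[1] % side)  # (x, y) coordinates

    # Zones as ((x_lo, y_lo), (x_hi, y_hi)) rectangles (illustrative split).
    zones = {
        "n1": ((0, 0), (4, 4)),
        "n2": ((4, 0), (8, 4)),
        "n3": ((0, 4), (4, 8)),
        "n4": ((4, 4), (8, 8)),
    }

    def owner(point):
        x, y = point
        for node, ((x0, y0), (x1, y1)) in zones.items():
            if x0 <= x < x1 and y0 <= y < y1:
                return node

    print(owner(key_to_point("f1")))  # the node that stores item f1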

  35. CAN: Query Example • Each node knows its neighbours in the d-space • Forward the query to the neighbour that is closest to the query id • Example: assume node n1 queries file item f4 [Figure, repeated on slides 36-38 as animation frames: the query travels hop by hop from n1's zone toward the zone that owns f4]
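
The hop-by-hop forwarding in this example can be sketched as a greedy choice among neighbouring zones: pass the query to the neighbour whose zone centre is closest to the target point. The zone rectangles and names below are illustrative assumptions.

    # One CAN routing step: pick the neighbour zone closest to the target.
    import math

    def centre(zone):
        (x0, y0), (x1, y1) = zone
        return ((x0 + x1) / 2, (y0 + y1) / 2)

    def next_hop(target, my_zone, neighbours):
        # neighbours: {node_name: zone_rectangle}
        best = min(neighbours,
                   key=lambda n: math.dist(centre(neighbours[n]), target))
        if math.dist(centre(neighbours[best]), target) < math.dist(centre(my_zone), target):
            return best
        return None  # our own zone is closest: answer the query locally

    # n1 forwarding a query for f4 at (7, 5), with illustrative zones:
    print(next_hop((7, 5), ((0, 0), (4, 4)),
                   {"n2": ((4, 0), (8, 4)), "n3": ((0, 4), (4, 8))}))  # -> n2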

  39. Resource Discovery: Document Routing in CAN • Associate with each node and each item a unique id (nodeId and fileId) in a d-dimensional space • Goals • Scales to hundreds of thousands of nodes • Handles rapid arrival and failure of nodes • Properties • Routing table size O(d) • Guarantees that a file is found in at most d·n^(1/d) steps, where n is the total number of nodes
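
As a worked example (numbers not from the slides): with d = 4 dimensions and n = 2^20, roughly a million nodes, each node keeps only O(d) neighbours, yet a lookup takes at most d·n^(1/d) = 4·(2^20)^(1/4) = 4·32 = 128 hops, compared with up to O(n) messages for flooding.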

  40. Overview of Structured P2P Network [Figure: a 2-D coordinate space spanning 0 to 2^16 on each axis, divided into zones; legend: zone, core node, member node, member tree, transmission within a zone, transmission between zones]

  41. Elements of Structured P2P Network • Core/Member Nodes • Neighboring zone information • core info, zone info, direction • Member information • member node information, routing table • Strategies • Routing Messages • Constructing Structured P2P Network • Managing Zone • Constructing Member Tree • Discovering Contents

  42. Core/Member Nodes • Information on 7 neighboring zones • Core node (IP, port #) • Zone range (x1, y1)~(x2, y2) • Zone numbering • 4 bits • 00: less than • 01: belongs to • 10: greater than • Member information • IP, port # • Member tree • Uplink node info (only 1) • Downlink node info (limited to 2) [Figure: the direction codes around a zone: 0000, 0001, 0010, 0100, 0110, 1000, 1001, 1010]
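
The 4-bit numbering above suggests two bits per axis (00 = less than, 01 = belongs to, 10 = greater than). A sketch of that encoding follows; putting the x bits before the y bits is an assumption, chosen so the codes match the eight directions in the figure.

    # Compute the 4-bit direction code of a point relative to a zone.
    def axis_code(v, lo, hi):
        if v < lo:
            return "00"  # less than the zone's range
        if v < hi:
            return "01"  # belongs to the zone's range
        return "10"      # greater than the zone's range

    def zone_code(point, zone):
        (x0, y0), (x1, y1) = zone
        x, y = point
        return axis_code(x, x0, x1) + axis_code(y, y0, y1)

    # A point inside the zone codes to "0101"; the eight neighbouring
    # directions give 0000, 0001, 0010, 0100, 0110, 1000, 1001, 1010.
    print(zone_code((2, 9), ((4, 4), (8, 8))))  # "0010": x less, y greater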

  43. Routing Messages • Within a zone • Depends on the member tree (a binary tree) • Between zones • If the sender is not a core, it just sends the message to its core • The core then routes the message along the X coordinate until it reaches the destination x • After that, it routes the message along the Y coordinate • Every message should carry the originator's IP and port [Figure: the same direction codes as on the previous slide]
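
The X-then-Y rule amounts to dimension-ordered routing: correct the X coordinate first, then the Y coordinate. A sketch under that reading, with zones reduced to grid coordinates for brevity:

    # Dimension-ordered inter-zone routing: X first, then Y.
    def next_zone(current, dest):
        cx, cy = current
        dx, dy = dest
        if cx != dx:  # step along the X coordinate first
            return (cx + (1 if dx > cx else -1), cy)
        if cy != dy:  # then step along the Y coordinate
            return (cx, cy + (1 if dy > cy else -1))
        return None   # destination zone reached

    hop = (0, 0)
    while hop is not None:
        print(hop)    # prints the path (0,0), (1,0), (2,0), ..., (2,3)
        hop = next_zone(hop, (2, 3))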

  44. Constructing the Structured P2P Network (JOIN) [Flowchart: a joining node bootstraps via the rendezvous point (RP) and sends a JOIN, forwarded as JOIN_FWD; the message is routed to the responsible core, whose zone management answers with Join as Core or Join as Member; the neighboring zones and the zone's members are then informed]

  45. Managing Zone (1) [Flowchart: on a JOIN, check whether the new node has the same network identifier. If yes, accept it as a member, reply with an AsMember message, and inform the members. If no, split the zone and rearrange the neighbors, reply with an AsCore message; the new node then sets itself up and informs its neighbors. Either way, the join is completed]

  46. Managing Zone (2) • Splitting a zone • The network ID of the new node is within the zone's range, but the network IDs differ • Direction of the split • X or Y direction • Depends on the difference of the X and Y components of the two network IDs • Rearranging neighboring zones • The two nodes inform their neighbors of this change [Figure: a zone split along the X or Y axis]

  47. Constructing the Member Tree • Each node • maintains information on all members • creates a binary tree • using the sorted IP addresses • Rules • one link between the core and a member • only one uplink • downlinks limited to 2 [Figure: the core linked to the root of a binary tree of members numbered 1-7]
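
A sketch of one way to realize these rules: build a balanced binary tree over the members sorted by IP address, so every member has exactly one uplink and at most two downlinks, and the core links to the root only. The balanced shape and the string sort are simplifying assumptions (a real implementation would compare IPs numerically).

    # Build a member tree satisfying: one uplink, at most two downlinks.
    def build_tree(sorted_ips):
        if not sorted_ips:
            return None
        mid = len(sorted_ips) // 2
        return {
            "node": sorted_ips[mid],                   # this member
            "left": build_tree(sorted_ips[:mid]),      # first downlink
            "right": build_tree(sorted_ips[mid + 1:]), # second downlink
        }

    members = sorted(["10.0.0.3", "10.0.0.1", "10.0.0.7", "10.0.0.5"])
    root = build_tree(members)  # the core links only to root["node"]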

  48. Discovering Content • Content discovery • Send the msg to the members and the core of its zone • Core • On receiving it, sends it to the neighbor zones along the X coordinate • Also sends it to the neighboring Y zones by flooding • DiscoveryHit [Figure: the 2-D space with X and Y axes from 0 to 2^16-1]

  49. Other Types of P2P Storage Systems • Examples: Chord, Pastry or Tapestry P2P systems • Every node is responsible for a subset of the data • The routing algorithm locates data with small per-node routing state • Volunteer nodes join and leave the system at any time • All nodes have identical responsibilities • All communication is symmetric [Figure: a Pastry-style ring with nodes 65a1fc, d13da3, d4213f, d462ba, d467c4 and d471f1; Route(d46a1c) is forwarded from 65a1fc toward the node whose id is closest to d46a1c]

  50. P2P Ring • Nodes are arranged in a ring based on id • Ids are assigned randomly • Very large id space
