
LOOKING UP DATA IN P2P SYSTEMS


Presentation Transcript


  1. LOOKING UP DATA IN P2P SYSTEMS • Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, Ion Stoica (MIT LCS)

  2. Key Idea • Survey paper • Discusses how to access data in a P2P system • Covers four solutions • CAN • Chord • Pastry • Tapestry

  3. INTRODUCTION • P2P systems are popular due to • Low startup cost • High scalability at very low cost • Use of resources that would otherwise remain unused • Potential for greater robustness • Fully decentralized and distributed

  4. The lookup problem • How do we locate data in large P2P systems? • One solution • Distributed hash tables (DHT)

  5. Previous solutions (I) • Centralized database • Napster • Not scalable • Vulnerable to attacks on database

  6. Previous solutions (II) • Broadcasting • Nodes broadcast their requests to their neighbors, which forward them to their own neighbors, and so on • Gnutella • Does not scale either • Broadcast messages consume too much bandwidth

  7. Previous solutions (III) • Internet DNS • Organizes network nodes into a hierarchy • All searches start at the top of the hierarchy • Propagate down • Used by KaZaA, Grokster and others • Nodes higher in the tree do much more work than lower nodes • Solution vulnerable to loss of root node(s)

  8. Previous solutions (IV) • Freenet • Forwards queries from node to node until requested data are found • Emphasis is on anonymity • Not performance • Unpopular documents may become inaccessible • Nobody cares!

  9. DISTRIBUTED HASH TABLES • Implement a lookup(key) primitive • Produces a path going from any node n0 to the node holding the key • Big tradeoff is between • Keeping paths short • Minimizing the state information kept by nodes
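
  The lookup(key) primitive can be pictured as an abstract interface that each of the four systems implements differently. A minimal sketch in Python follows; the class and function names are illustrative, not taken from the paper.

    # Minimal sketch of the DHT lookup(key) primitive; names are illustrative.
    # Each system (CAN, Chord, Pastry, Tapestry) supplies its own closer_hop().

    class DHTNode:
        def __init__(self, node_id):
            self.node_id = node_id

        def closer_hop(self, key):
            """Return a node closer to the key's owner, or None if this node owns it."""
            raise NotImplementedError   # system-specific routing decision

    def lookup(start, key):
        """Follow hops from 'start' until the node holding 'key' is reached."""
        node, path = start, [start]
        while True:
            nxt = node.closer_hop(key)
            if nxt is None:             # current node is responsible for the key
                return node, path
            node = nxt
            path.append(node)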

  10. Main design issues • Mapping keys to nodes in a balanced way • Use a hash function • Forwarding a lookup for a key to the appropriate node • Find at each step a node closer to the node holding the key • Building routing tables • Each node should have a successor
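
  One common way to get a balanced mapping is consistent hashing: hash both keys and node names into one circular ID space and assign each key to the first node at or after it. The sketch below is illustrative; the node names, key, and ID width are made up.

    import hashlib
    from bisect import bisect_left

    # Illustrative consistent-hashing sketch: keys and nodes share one circular
    # ID space; a key belongs to the first node whose ID is >= the key's ID.

    ID_BITS = 32

    def hash_id(name):
        digest = hashlib.sha1(name.encode()).digest()
        return int.from_bytes(digest, "big") % (2 ** ID_BITS)

    def responsible_node(key, node_names):
        ring = sorted((hash_id(n), n) for n in node_names)
        k = hash_id(key)
        i = bisect_left([node_id for node_id, _ in ring], k)
        return ring[i % len(ring)][1]   # wrap around past the highest node ID

    print(responsible_node("song.mp3", ["node-A", "node-B", "node-C"]))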

  11. CAN • Uses a d-dimensional key space • Partitioned into hyper-rectangles • "Zones" • Each node manages a zone • Responsible for all keys in zone

  12. Neighbors • Each node keeps track of addresses of all its neighbors • Routing table • Neighbors are defined as nodes sharing a (d-1) dimensional hyper-plane • Contacts with fewer dimensions in common do not count
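
  A hedged sketch of a CAN zone and of the neighbor test behind the routing table, for the non-wrapping case; the class name and the omission of the torus wrap-around are simplifications, not the paper's code.

    # Sketch of a CAN zone as a d-dimensional hyper-rectangle, plus the neighbor
    # test: two zones are neighbors if they abut in exactly one dimension and
    # overlap in the other d-1 dimensions (wrap-around ignored here).

    class Zone:
        def __init__(self, lo, hi):              # lo, hi: tuples of d coordinates
            self.lo, self.hi = tuple(lo), tuple(hi)

        def contains(self, point):
            return all(l <= p < h for l, p, h in zip(self.lo, point, self.hi))

        def is_neighbor(self, other):
            abut = overlap = 0
            for (l1, h1), (l2, h2) in zip(zip(self.lo, self.hi),
                                          zip(other.lo, other.hi)):
                if h1 == l2 or h2 == l1:          # sides touch in this dimension
                    abut += 1
                elif min(h1, h2) > max(l1, l2):   # spans overlap in this dimension
                    overlap += 1
            return abut == 1 and overlap == len(self.lo) - 1

    a = Zone((0.0, 0.0), (0.5, 0.5))
    b = Zone((0.5, 0.25), (0.75, 0.5))
    print(a.is_neighbor(b))                       # True: they share part of x = 0.5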

  13. A two-dimensional example (I)

  14. A two-dimensional example (II) • The unit square from (0, 0) to (1, 1) is divided into the zones (0, 0; 0.5, 0.5), (0, 0.5; 0.5, 1), (0.5, 0.5; 1, 1), (0.5, 0; 0.75, 0.25), (0.5, 0.25; 0.75, 0.5) and (0.75, 0; 1, 0.5) • In reality the state space wraps

  15. A path from (0.25, 0.3) to (0.8, 0.8) • The figure shows the same zones, with a lookup path from the zone (0, 0; 0.5, 0.5), which contains (0.25, 0.3), to the zone (0.5, 0.5; 1, 1), which contains (0.8, 0.8) • In reality the state space wraps

  16. Lookup • Routing tries to approximate the straight-line path between the current zone and the zone holding the key • Various optimizations attempt to reduce lookup latency
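
  A sketch of one greedy routing step under that rule, with zones represented as (low corner, high corner) tuples; the wrap-around and the latency optimizations mentioned above are left out.

    import math

    # Greedy CAN routing step: forward the lookup to the neighbor whose zone
    # center is closest to the point the key hashes to (wrap-around ignored).

    def center(zone):
        lo, hi = zone
        return tuple((l + h) / 2 for l, h in zip(lo, hi))

    def owns(zone, point):
        lo, hi = zone
        return all(l <= p < h for l, p, h in zip(lo, point, hi))

    def next_hop(current_zone, neighbor_zones, key_point):
        if owns(current_zone, key_point):
            return None                           # this node already holds the key
        return min(neighbor_zones, key=lambda z: math.dist(center(z), key_point))

    # Route from the (0, 0)-(0.5, 0.5) zone toward the point (0.8, 0.8)
    print(next_hop(((0.0, 0.0), (0.5, 0.5)),
                   [((0.0, 0.5), (0.5, 1.0)), ((0.5, 0.0), (0.75, 0.25))],
                   (0.8, 0.8)))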

  17. Dynamic behavior • When a node joins the network • It picks a random point in the space • Finds the node managing the zone containing that point • Splits that zone with it • When a node departs • Zones are merged • More complex process
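
  The join-time split can be sketched as follows; halving the zone along its longest side is one simple policy chosen here for illustration, not necessarily the exact rule CAN uses.

    # Sketch of the join-time split: the node owning the random point halves its
    # zone and hands the half containing that point to the newcomer.

    def split_zone(zone, point):
        lo, hi = list(zone[0]), list(zone[1])
        d = max(range(len(lo)), key=lambda i: hi[i] - lo[i])   # longest dimension
        mid = (lo[d] + hi[d]) / 2
        lower = (tuple(lo), tuple(hi[:d] + [mid] + hi[d + 1:]))
        upper = (tuple(lo[:d] + [mid] + lo[d + 1:]), tuple(hi))
        new_half = upper if point[d] >= mid else lower
        old_half = lower if new_half is upper else upper
        return old_half, new_half

    # The unit square is split; the joiner, which picked (0.7, 0.2), gets the right half
    print(split_zone(((0.0, 0.0), (1.0, 1.0)), (0.7, 0.2)))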

  18. Fault-tolerance • When a node fails, the neighbor with the smallest zone takes over • Multiple failures may leave too many nodes handling multiple zones

  19. CHORD • Assigns ID's to keys and nodes in the same address space • ID's are organized in a ring • ID 0 follows the highest ID • Each node is responsible for all keys that immediately precede it in the key space

  20. Example • A Chord ring with nodes N4, N12, N20, N24 and keys K1, K6, K10, K15 (ring diagram)

  21. Finger table • Each node keeps a table containing IP addresses of nodes • Halfway around in the key space • Quarter-of-the-way around • … • Table has log N entries • Allows O(log N) searches
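
  A hedged sketch of how such a finger table is built and used to pick the next hop, on an 8-bit ring with the node IDs from the earlier example; the helper names are illustrative, not Chord's actual RPCs.

    # Finger table sketch: on a ring of size 2**M, finger[i] of node n is the
    # first node whose ID is at or after (n + 2**i) mod 2**M. A lookup forwards
    # to the closest finger that precedes the key, giving O(log N) hops.

    M = 8
    RING = 2 ** M

    def successor(node_ids, x):
        """First node ID at or after x, wrapping around the ring."""
        candidates = [n for n in sorted(node_ids) if n >= x]
        return candidates[0] if candidates else min(node_ids)

    def finger_table(n, node_ids):
        return [successor(node_ids, (n + 2 ** i) % RING) for i in range(M)]

    def between_open(x, a, b):
        """True if x lies strictly between a and b going clockwise on the ring."""
        return (a < x < b) if a < b else (x > a or x < b)

    def closest_preceding_finger(n, fingers, key):
        for f in reversed(fingers):                # most distant fingers first
            if between_open(f, n, key):
                return f
        return n

    nodes = [4, 12, 20, 24]
    print(finger_table(4, nodes))                  # [12, 12, 12, 12, 20, 4, 4, 4]
    print(closest_preceding_finger(4, finger_table(4, nodes), 15))   # 12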

  22. Partial example • Finger-table entries on the ring of nodes N4, N12, N20, N24 (diagram)

  23. Fault-tolerance • Each node has a successor list • Contains IP addresses of the next r successors • Guarantees routing progress as long as at least one of the r successors is up

  24. Dynamic behavior • New node n learns its place in the Chord ring by asking any extant node to do a lookup(n) • Must also • Update successor list of its predecessor • Create its own successor list
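
  A hedged global-view sketch of the state the new node ends up with after joining; a real Chord node discovers this incrementally through lookup(n) and stabilization rather than by sorting the whole membership, and r = 3 is an arbitrary example value.

    # Where a joining node slots into the ring, and the successor list it must
    # build (r = 3 here). A global sort is used only to illustrate the end result.

    R = 3

    def join(new_id, existing_ids):
        ring = sorted(existing_ids + [new_id])
        i = ring.index(new_id)
        successor = ring[(i + 1) % len(ring)]
        predecessor = ring[(i - 1) % len(ring)]
        successor_list = [ring[(i + k) % len(ring)] for k in range(1, R + 1)]
        return successor, predecessor, successor_list

    print(join(15, [4, 12, 20, 24]))   # N15 lands between N12 and N20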

  25. PASTRY • Scalable, self-organizing routing and object location infrastructure • Each node has a node ID • IDs are uniformly distributed in the ID space • Includes a proximity metric to measure distances between pairs of nodes

  26. Pastry Nodes • Each node maintains three sets of nodes • Leaf set: the closest nodes in terms of node IDs (same function as Chord's successor list) • Routing table: nodes used for prefix routing (the big idea) • Neighborhood set: the closest nodes in terms of the proximity metric

  27. Dynamic behavior • Pastry is self-organizing • Nodes come and go • Includes a seed discovery protocol

  28. Prefix Routing • At each step, a node forwards an incoming request to a node whose node ID shares the longest common prefix with the destination ID • Example: destination ID 1230, current node ID 1023, next hop 12--
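
  A sketch of that forwarding rule with 4-digit IDs matching the 1023 / 1230 example; the routing-table entry shown is hypothetical.

    # Prefix-routing sketch: the routing table is indexed by (length of the shared
    # prefix, next digit of the destination) and returns a node with a longer match.

    def shared_prefix_len(a, b):
        n = 0
        while n < len(a) and n < len(b) and a[n] == b[n]:
            n += 1
        return n

    def next_hop(node_id, dest_id, routing_table):
        if node_id == dest_id:
            return None                            # this node is the destination
        p = shared_prefix_len(node_id, dest_id)
        return routing_table.get((p, dest_id[p]))  # leaf set would be the fallback

    # Node 1023 forwarding a request for 1230: shared prefix "1", next digit "2"
    table_1023 = {(1, "2"): "1223"}                # hypothetical routing-table entry
    print(next_hop("1023", "1230", table_1023))    # -> 1223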

  29. Routing table for node 1023

  30. Routing request for node 1230 • The request is always sent to a node sharing at least one more prefix digit with the destination • Here it is node 1223

  31. At node 1233 • The node with at least one more common prefix digit is node 1230

  32. TAPESTRY • Interprets keys as sequences of digits • Incremental prefix routing • Similar to Pastry • Main contribution is its emphasis on proximity • In the underlying network • Reduces query latency • Makes the system much more complex

  33. CONCLUSIONS • Major issues include • Operational costs: searches are all O(log n); storage costs vary • Fault-tolerance and concurrent changes: only Chord and Tapestry can handle them • Proximity routing: Pastry, CAN and Tapestry have heuristics • Malicious nodes: Pastry checks node ID's

  34. Summary of costs • Table comparing the four systems (1 = number of other nodes known by a given node)
