
P-Grid: A Self-organizing Access Structure for P2P Information Systems

P-Grid: A Self-organizing Access Structure for P2P Information Systems. Karl Aberer EPFL-DSC Distributed Information Systems Laboratory karl.aberer@epfl.ch. Overview. Peer-to-Peer Information Systems Data Access in a P2P Information System P-Grid Structure Construction algorithm



  1. P-Grid: A Self-organizing Access Structure for P2P Information Systems • Karl Aberer • EPFL-DSC, Distributed Information Systems Laboratory • karl.aberer@epfl.ch ©2001, Karl Aberer, EPFL-DSC, Laboratoire de systèmes d'informations répartis

  2. Overview • Peer-to-Peer Information Systems • Data Access in a P2P Information System • P-Grid • Structure • Construction algorithm • Simulation • P-Grid Search and Update • Algorithms • Simulation • Application to Gnutella • Conclusions

  3. 1. P2P Information Systems • P2P systems currently draw a lot of attention • File-sharing systems • Napster, Gnutella, FreeNet, etc. • Conferences • O’Reilly P2P conference 2001 (conferences.oreilly.com/p2p/) • 2001 International Conference on Peer-to-Peer Computing (P2P2001) (www.ida.liu.se/conferences/p2p/p2p2001/) • …

  4. Napster [www.napster.com] • 1. A asks Napster: "I am searching XXX.mp3" • 2. Napster tells A: "C should have XXX.mp3" • 3. A asks C: "I am requesting XXX.mp3" • 4. C delivers XXX.mp3 to A • [Figure: the Napster server and peers A (YYY.mp3) and C (XXX.mp3), connected via the Internet]

  5. Gnutella [www.gnutella.com] • 1. A asks B: "I am searching XXX.mp3" • 2. B tells A: "C should have XXX.mp3" • 3. A asks C: "I am requesting XXX.mp3" • 4. C delivers XXX.mp3 to A • [Figure: peers A (YYY.mp3), B (ZZZ.mp3), and C (XXX.mp3), connected via the Internet]

  6. Properties of P2P Information Systems • No central coordination • No central database • No peer has a global view of the system • Global behavior emerges from local interactions • Peers are autonomous • Peers and connections are unreliable • Despite these limitations: all existing information should be accessible

  7. 2. Data Access in a P2P System • B2B servers, Napster, eBay etc. • Central database (efficient)! • Gnutella • Search requests are broadcast (inefficient) • Anecdote: the founder of Napster computed that a single search request (18 Bytes) on a Napster community would generate 90 Mbytes of data transfers. [http://www.darkridge.com/~jpr5/doc/gnutella.html]

  8. Problem • Can a set of peers provide • efficient search on a data set • whose storage requirements substantially exceed the resources of any single peer, e.g. s_local = O(log(s_global)) • Answer • In principle, yes! • Requires a scalable data access structure

  9. Scalable Data Access Structures • Work in the following way • Every peer maintains a small fragment of the database and a routing table • The routing tables are organized such that requests can be forwarded at different levels of granularity • Replication is used to increase robustness • [Figure: example peer holding routes R0, R1, R00, R01 and data D01]

  10. Approaches • Scalable data access structures • [Plaxton 97] (distributed object addressing) • CHORD [Dabek 01] (distributed object addressing) • CAN [Ratnasamy 01] (distributed object addressing) • FreeNet [Clarke 00] (file sharing systems) • [Litwin 97] (distributed databases) • [Yokota 99] (parallel databases) • P-Grid [Aberer 01] (decentralized databases) • etc. • Question • Are they decentralized?

  11. Comparison Criteria • Scalable data access structure: routing criteria (trees, key similarity, hashing, multidim. keys, …); search criteria (equality, prefix, range, similarity); performance (search, update, join and leave the network); robustness (use of replication) • De-centralization: global knowledge, except nature of search keys (number of ex. addresses); global control (coordinator, central repository); local autonomy (fixed association of roles with address)

  12. Comparison – Data Access Structure

      | Structure | Routing              | Search   | Search perform. | Replication |
      |-----------|----------------------|----------|-----------------|-------------|
      | Plaxton   | Binary tree          | equality | O(log n)        | yes         |
      | CHORD     | Implicit binary tree | equality | O(log n)        | no          |
      | CAN       | Multi-dim. grid      | equality | O(n^(1/d))      | yes         |
      | FreeNet   | Key similarity       | equality | O(log n) ?      | yes         |
      | Yokota    | B-Tree               | range    | O(log n)        | no          |
      | P-Grid    | Binary tree          | prefix   | O(log n)        | yes         |

  13. Comparison - Decentralization

      | Structure | Global knowledge   | Global control | Local autonomy |
      |-----------|--------------------|----------------|----------------|
      | Plaxton   | Max # participants | no             | no             |
      | CHORD     | IP address space   | no             | no             |
      | CAN       | none               | no             | yes            |
      | FreeNet   | none               | no             | no             |
      | Yokota    | all                | yes            | no             |
      | P-Grid    | none               | no             | yes            |

  14. 3. The P-Grid Search Structure

  15. Data Structure of a Peer • [Figure: a peer whose path is 0101; at each level it holds several references to peers covering the complementary branch (R1, R00, R011, R0100), and at the leaf it holds references to the data for R0101]

  16. P-Grid Construction • Bootstrap problem: how to build the P-Grid? • Without a fixed association of addresses with keys (i.e. a global schema to assign roles, which would violate local autonomy) • Efficiently

  17. P-Grid Construction Algorithm (Bootstrap) • When peers meet (randomly) • Compare the current search paths p and q • Case 1: p and q are the same • If the maximal path length is not reached, extend the paths and split the search space, i.e. to p0 and q1 • Case 2: p is a subpath of q, i.e. q = p0… • Extend p by the complement of q, i.e. p1 • Case 3: only a common prefix exists • Forward to one of the referenced peers • Limit forwarding by recmax • The peers remember each other and, in addition, exchange references at all levels
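
The three cases above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation (which the slides say was written in Mathematica). The names `Peer`, `add_refs` and `exchange` are hypothetical, paths are modelled as bit strings, and two simplifications are assumed: in case 3 each peer forwards to a single randomly chosen reference, and the two peers also keep each other as references at the level where their paths diverge.

```python
import random


class Peer:
    """Hypothetical peer record: a binary search path plus per-level references."""

    def __init__(self, name):
        self.name = name
        self.path = ""   # bit string the peer is responsible for
        self.refs = {}   # level (1-based) -> peers on the other branch at that level


def common_prefix_len(p, q):
    n = 0
    while n < min(len(p), len(q)) and p[n] == q[n]:
        n += 1
    return n


def add_refs(peer, level, candidates, refmax):
    """Merge candidates into peer.refs[level], keeping at most refmax of them."""
    merged = list({*peer.refs.get(level, []), *candidates})
    peer.refs[level] = random.sample(merged, min(refmax, len(merged)))


def exchange(a, b, k, recmax, refmax, depth=0):
    """One meeting between peers a and b (k = maximal path length)."""
    if a is b:
        return
    lc = common_prefix_len(a.path, b.path)
    # Exchange references at all levels on which both paths agree.
    for level in range(1, lc + 1):
        pool = a.refs.get(level, []) + b.refs.get(level, [])
        add_refs(a, level, pool, refmax)
        add_refs(b, level, pool, refmax)
    rest_a, rest_b = len(a.path) - lc, len(b.path) - lc
    if rest_a == 0 and rest_b == 0:
        # Case 1: identical paths; split the search space if k is not reached.
        if lc < k:
            a.path += "0"
            b.path += "1"
            a.refs[lc + 1] = [b]
            b.refs[lc + 1] = [a]
    elif rest_a == 0 or rest_b == 0:
        # Case 2: one path is a prefix of the other; extend it by the complement.
        short, long_ = (a, b) if rest_a == 0 else (b, a)
        short.path += "1" if long_.path[lc] == "0" else "0"
        short.refs[lc + 1] = [long_]
        add_refs(long_, lc + 1, [short], refmax)
    else:
        # Case 3: only a common prefix; remember each other, then forward.
        fwd_a = [p for p in a.refs.get(lc + 1, []) if p is not b]
        fwd_b = [p for p in b.refs.get(lc + 1, []) if p is not a]
        add_refs(a, lc + 1, [b], refmax)
        add_refs(b, lc + 1, [a], refmax)
        if depth < recmax:
            if fwd_a:
                exchange(b, random.choice(fwd_a), k, recmax, refmax, depth + 1)
            if fwd_b:
                exchange(a, random.choice(fwd_b), k, recmax, refmax, depth + 1)
```

A bootstrap run would call `exchange` on randomly chosen pairs until the paths stop growing. In this sketch every reference stored at level i always points to a peer that shares the first i-1 bits and differs in bit i, which is the property the search relies on.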

  18. Simulations • Implementation in Mathematica • Simulation parameters (n, k, recmax, refmax) • Peer population size n • Key length k • Recursion depth recmax • Multiple references refmax • Determine the number of meetings required • by each peer • to reach on average 99% of the maximal path length

  19. Dependency on Peer Population Size • (n = 200..1000, k = 6, recmax = 2, refmax = 1) • None!?

  20. Dependency on Key Length • (n = 500, k = 2..7, recmax = 2, refmax = 1) • Exponential

  21. Dependency on Recursion Depth • (n = 500, k = 6, recmax = 0..6, refmax = 1) • There exists an optimal value

  22. Replica Distribution • (n = 20000, k = 10, recmax = 2, refmax = 20)

  23. Properties of P-Grid Bootstrap Algorithm • Convergence? • Does not depend on population size • Depends on key length exponentially • Depends on recursion depth • Distribution of replicas? • Simulations indicate a reasonable distribution • Access paths to replicas are non-uniformly distributed • Balanced trees? • A simple argument (and simulations) shows that this is very likely

  24. 4. Search and Update • Search is straightforward • Follow own path or references • At most k steps • If multiple references are online, select randomly • Updates • All replicas need to be found • Repeated searches • Breadth first (limited recursion breadth) • Depth first • Depth first and contact buddies with same key
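
The search rule above (follow own path, otherwise hand over to a randomly chosen online reference) can be sketched as below. This is an illustrative sketch under assumptions: the `Peer` class, the `online` flag, the `search` function and the four-peer demo grid are all hypothetical, keys and paths are bit strings, and references at level i point to peers that share the first i-1 bits and differ in bit i.

```python
import random


class Peer:
    """Hypothetical peer: path, online flag, and per-level references."""

    def __init__(self, path, online=True):
        self.path = path
        self.online = online
        self.refs = {}   # level (1-based) -> peers on the other branch at that level


def common_prefix_len(p, q):
    n = 0
    while n < min(len(p), len(q)) and p[n] == q[n]:
        n += 1
    return n


def search(peer, key, hops=0):
    """Return (responsible_peer, hops), or (None, hops) if no online route is found."""
    lc = common_prefix_len(peer.path, key)
    if lc == len(key) or lc == len(peer.path):
        return peer, hops            # this peer's path covers the key
    # The key leaves our path at bit lc+1: try the references for that level
    # in random order, skipping peers that are offline.
    candidates = list(peer.refs.get(lc + 1, []))
    random.shuffle(candidates)
    for nxt in candidates:
        if nxt.online:
            found, used = search(nxt, key, hops + 1)
            if found is not None:
                return found, used
    return None, hops


def build_demo_grid():
    """Four peers covering the key space with paths of length 2 (k = 2)."""
    g = {p: Peer(p) for p in ("00", "01", "10", "11")}
    g["00"].refs = {1: [g["10"]], 2: [g["01"]]}
    g["01"].refs = {1: [g["11"]], 2: [g["00"]]}
    g["10"].refs = {1: [g["00"]], 2: [g["11"]]}
    g["11"].refs = {1: [g["01"]], 2: [g["10"]]}
    return g
```

Every hop moves to a peer that agrees with the key on at least one more bit, so a successful lookup takes at most k hops, matching the bound on the slide. In the demo grid, `search(g["00"], "11")` goes through the peer with path 10 and reaches the peer with path 11 in two hops.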

  25. Simulation Result • (n = 20000, k = 10, recmax = 2, refmax = 20) • online probability 30% • [Figure: plot comparing breadth first search, search with buddies, and depth first search; vertical axis from 0.2 to 1, horizontal axis from 1000 to 5000]

  26. Update vs. Search Cost • Trade lower update quality for higher search cost • Use repeated searches to confirm results

  27. P-Grid Variations • To be further explored • No global, maximal key length • Growing and shrinking of keys • Problem: integrity of referenced peers • Joining and leaving P-Grids

  28. P-Grid Flexibility • The algorithm is a framework rather than a single solution • Options are left open and leave room for optimization • e.g. taking into account • access probability • existing data distribution • reachability and access cost

  29. 5. Application to Gnutella • Currently under implementation • Uses Gnutella protocol and software • Controls routing of search requests using P-Grid • Problem: non-uniform distribution of search keys • Build statistics • Compute a global, prefix-preserving hash function

  30. Computing the Required Resources • Assume • 10^7 searchable keys (substrings of filenames) • 10 Bytes for storing a peer address • 10^5 Bytes per peer provided for indexing • 30% online probability • 99% answer reliability • Then • Approx. 20,000 peers can be supported • refmax = 20 is sufficient
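
The slide does not show how these figures are derived. The sketch below is one plausible back-of-the-envelope calculation that reproduces them, under assumptions that are not stated on the slide: a lookup needs one online reference at each of the k levels, references fail independently, every stored key and every routing reference costs one 10-byte address, and each leaf is replicated refmax times. The function names are hypothetical.

```python
import math


def lookup_success(p_online, refmax, k):
    """Assumed model: at each of k levels at least one of refmax references is online."""
    return (1 - (1 - p_online) ** refmax) ** k


def min_refmax(p_online, k, reliability):
    """Smallest refmax that meets the target reliability for key length k."""
    refmax = 1
    while lookup_success(p_online, refmax, k) < reliability:
        refmax += 1
    return refmax


def size_grid(n_keys, addr_bytes, index_bytes, p_online, reliability, max_k=64):
    """Find the shortest key length whose leaf data and routing table fit in one peer."""
    slots = index_bytes // addr_bytes              # addresses a peer can store
    for k in range(1, max_k + 1):
        refmax = min_refmax(p_online, k, reliability)
        keys_per_leaf = math.ceil(n_keys / 2 ** k)
        if keys_per_leaf + k * refmax <= slots:    # leaf entries + routing entries
            return {
                "k": k,
                "refmax": refmax,
                "keys_per_leaf": keys_per_leaf,
                # Assumption: each of the 2^k leaves is held by refmax replicas.
                "peers": 2 ** k * refmax,
            }
    return None


plan = size_grid(n_keys=10**7, addr_bytes=10, index_bytes=10**5,
                 p_online=0.30, reliability=0.99)
```

Under these assumptions the calculation gives k = 10 and refmax = 20, with about 9,766 keys per leaf plus 200 routing entries fitting into the 10,000 available slots, and 2^10 × 20 = 20,480 peers, in line with the "approx. 20,000 peers" and "refmax = 20" on the slide.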

  31. 6. Conclusions • Scalable distributed and decentralized access structures are possible • P-Grids offer a lot of flexibility to be further exploited • Powerful tools for analysis required • Foundation for many fully decentralized P2P applications • Application in mobile ad-hoc networks (www.terminode.org), Swiss national research centre at EPFL

  32. References • [Aberer 01] Karl Aberer, Zoran Despotovic: Managing Trust in a Peer-2-Peer Information System. To appear in the Proceedings of the Ninth International Conference on Information and Knowledge Management (CIKM 2001), 2001. • [Vingralek 98] Radek Vingralek, Yuri Breitbart, Gerhard Weikum: Snowball: Scalable Storage on Networks of Workstations with Balanced Load. Distributed and Parallel Databases 6(2): 117-156 (1998). • [Stonebraker 96] Michael Stonebraker, Paul M. Aoki, Witold Litwin, Avi Pfeffer, Adam Sah, Jeff Sidell, Carl Staelin, Andrew Yu: Mariposa: A Wide-Area Distributed Database System. VLDB Journal 5(1): 48-63 (1996). • [Plaxton 97] C. Greg Plaxton, Rajmohan Rajaraman, Andréa W. Richa: Accessing Nearby Copies of Replicated Objects in a Distributed Environment. SPAA 1997: 311-320. • [Yokota 99] Haruo Yokota, Yasuhiko Kanemasa, Jun Miyazaki: Fat-Btree: An Update-Conscious Parallel Directory Structure. ICDE 1999: 448-457. • [Litwin 97] Witold Litwin, Marie-Anne Neimat: LH*s: A High-Availability and High-Security Scalable Distributed Data Structure. RIDE 1997. • [Stoica 00] Ion Stoica, Robert Morris, David Karger, Frans Kaashoek, Hari Balakrishnan: Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications. Proceedings of the ACM SIGCOMM, 2001. • [Clarke 00] Ian Clarke, Oskar Sandberg, Brandon Wiley, Theodore W. Hong: Freenet: A Distributed Anonymous Information Storage and Retrieval System. Designing Privacy Enhancing Technologies: International Workshop on Design Issues in Anonymity and Unobservability. LNCS 2009, Springer Verlag, 2001. • [Ratnasamy 01] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker: A Scalable Content-Addressable Network. Proceedings of the ACM SIGCOMM, 2001.
