
Analysis and Design of Algorithms for Peer-to-Peer Networks



Presentation Transcript


  1. Analysis and Design of Algorithms for Peer-to-Peer Networks • Moritz Steiner • Thesis Defense • Ernst Biersack, Wolfgang Effelsberg

  2. Overlay networks [Figure label: overlay edge]

  3. More about overlays • Unstructured overlays • No constraints on the overlay topology or data placement • Query – flood or random walk • Structured overlays (Distributed Hash Tables) • Constraints both on topology and data placement • log(N) hops, log(N) neighbors • Efficient support for exact match query

  4. Analysis and Measurements of Real-World Peer-to-Peer Networks

  5. What is a DHT? • A distributed database for publishing and searching information • Consists of many peers, each of which is responsible for storing part of the database • What is a key: a unique identifier • Key = hash(IP address), or • Key = hash(string) • Each peer and each object is identified by its key • How to partition the content of the DB • Use the key of the object to decide on which peer to store the information
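The key-based partitioning described on this slide can be sketched as follows. This is a minimal illustration, not KAD's actual code: MD5 is used only because it produces 128 bits and ships with `hashlib`, the XOR closeness rule is borrowed from the routing-table slide later in the deck, and the names `make_key` and `responsible_peer` as well as the IP addresses are hypothetical.

```python
import hashlib


def make_key(text: str) -> int:
    """Map a string (an IP address or a title) to a 128-bit integer key.

    Assumption: MD5 stands in for whatever hash the real system uses.
    """
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")


def responsible_peer(object_key: int, peer_keys: list[int]) -> int:
    """Return the peer whose key is closest to the object key.

    Assumption: closeness is measured with the XOR metric.
    """
    return min(peer_keys, key=lambda peer: peer ^ object_key)


# Six hypothetical peers, each identified by the hash of its IP address.
peers = [make_key(f"10.0.0.{i}") for i in range(1, 7)]
obj = make_key("title")
owner = responsible_peer(obj, peers)
print(f"object {obj:032x} is stored on peer {owner:032x}")
```

Because peers and objects share one key space, any client that knows the object's name can recompute the key and find the responsible peer without a central directory.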

  6. What is a DHT? (cont'd) [Figure: peers N1 to N6 connected over the Internet; a publisher issues Publish(hash(“title”)) and a client issues Lookup(hash(“title”))] • Important issues • How to partition key space • How to route • How to maintain information under churn

  7. KAD • Study KAD, a distributed index running on many peers • Why interesting? • Only real-world “production” Distributed Hash Table • Very popular • eMule, aMule, (Azureus) • Permanent KAD ID • Possible to track the peer behavior

  8. KAD Architecture

  9. Routing Table • Peer identifier: 128-bit string • Distance metric: bitwise XOR [Figure: routing-table buckets ordered by XOR distance from our peer, with prefixes 1*, 01*, 001*, 000*; labeled “2-Buckets”]
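The XOR metric and the prefix classes shown in the figure (1*, 01*, 001*, ...) can be illustrated with a short sketch. The function names are hypothetical, and the 4-bit demo IDs are chosen only to keep the output readable; the slide specifies 128-bit identifiers.

```python
ID_BITS = 128  # identifier length stated on the slide


def xor_distance(a: int, b: int) -> int:
    """Distance between two identifiers: their bitwise XOR read as an integer."""
    return a ^ b


def shared_prefix_len(a: int, b: int, bits: int = ID_BITS) -> int:
    """Number of leading bits that the two identifiers have in common."""
    return bits - xor_distance(a, b).bit_length()


def distance_class(own: int, other: int, bits: int = ID_BITS) -> str:
    """Prefix of the XOR distance, e.g. '1*' for the far half of the ID space."""
    return "0" * shared_prefix_len(own, other, bits) + "1*"


# 4-bit demo: our own ID is 0000.
own = 0b0000
for other in (0b1000, 0b0100, 0b0010):
    print(f"{other:04b} -> {distance_class(own, other, bits=4)}")
```

Peers whose distance starts with 1 differ from us in the very first bit and cover half of the ID space, which is why a routing table organized this way needs only a logarithmic number of buckets.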

  10. KAD Architecture • The Lookup module is used by both the Publishing and the Retrieval module

  11. Iterative Lookup • KAD uses iterative routing • Source is responsible for the entire lookup process • At each step, source sends a lookup request to the next hop and waits for the reply • Advantages of iterative routing • Lookup messages cannot be lost • Iterative routing is easier to debug [Figure label: Recursive Routing]
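A minimal sketch of iterative routing, under simplifying assumptions: the source asks one peer at a time (no parallel requests), every peer answers, and the toy network with 4-bit IDs is invented for illustration. The function `iterative_lookup` and the `ask` callback are hypothetical names, not part of any KAD client.

```python
def iterative_lookup(source_contacts, target, ask, max_steps=32):
    """Drive the whole lookup from the source, one hop at a time.

    ask(peer, target) returns the contacts that `peer` reports for `target`.
    The source keeps control: it picks the next hop itself from each reply.
    """
    closest = min(source_contacts, key=lambda p: p ^ target)
    path = [closest]
    for _ in range(max_steps):
        replies = ask(closest, target)
        if not replies:
            break
        best = min(replies, key=lambda p: p ^ target)
        if (best ^ target) >= (closest ^ target):
            break  # no reply brings us closer: the lookup has converged
        closest = best
        path.append(closest)
    return path


# Toy network: each peer knows a few contacts (hypothetical 4-bit IDs).
routing = {
    0b0001: [0b1000, 0b0100],
    0b1000: [0b1100, 0b1010],
    0b1100: [0b1110],
    0b1110: [0b1111],
}
path = iterative_lookup(routing[0b0001], 0b1111,
                        lambda peer, target: routing.get(peer, []))
print([f"{p:04b}" for p in path])
```

Since every request originates at the source, a missing reply is noticed there and the source can fall back to another contact, which is the robustness argument made on the slide.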

  12. KAD Architecture • Note: the Lookup module is used by both the Publishing and the Search module

  13. Publish: How and Where • Where to store the information for a given kID? • On 10 nodes whose first 8 bits are the same as those of kID [Figure label: kID zone] • A zone defined by the first 8 bits is 1/256 of the entire key space, and contains several thousand peers • How to find a key later (contact several thousand peers?) • This high replication ensures that the key is found despite node churn
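The zone rule above can be sketched as follows. The slide only states that the 10 replicas share the first 8 bits with kID; picking the XOR-closest peers inside the zone is an assumption of this sketch, and the peer IDs are synthetic.

```python
ID_BITS = 128
ZONE_BITS = 8    # a zone is defined by the first 8 bits (1/256 of the key space)
REPLICAS = 10    # number of nodes a key is published on


def zone_of(key: int) -> int:
    """The 8-bit zone prefix of a 128-bit key."""
    return key >> (ID_BITS - ZONE_BITS)


def publish_targets(kid: int, peers: list[int]) -> list[int]:
    """Up to 10 peers in the same zone as kid.

    Assumption: among the peers in the zone, the XOR-closest ones are used.
    """
    in_zone = [p for p in peers if zone_of(p) == zone_of(kid)]
    return sorted(in_zone, key=lambda p: p ^ kid)[:REPLICAS]


# Synthetic population: 50 peers in zone 0xE3 and 50 peers in zone 0x10.
peers = [(0xE3 << 120) | i for i in range(50)]
peers += [(0x10 << 120) | i for i in range(50)]
kid = (0xE3 << 120) | 7
targets = publish_targets(kid, peers)
print(len(targets), all(zone_of(t) == 0xE3 for t in targets))
```

A later search only has to reach one of the replicas that is still online, which is how the replication trades publish overhead for robustness under churn.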

  14. Publishing and Retrieval • Iterative lookup (only 3 hops to get to target) • High redundancy (and overhead) to cope with churn in • Routing table • Parallel lookup • Publishing • Theory vs. Practice • Main issue is not number of hops, but • How to assure persistence under churn

  15. User Behavior in KAD • Where are they? • For how long do they stay connected? • Do they come back? • Are there regional differences?

  16. Challenge: The Full Peer Crawl • Our method: full crawl to take a complete snapshot of all peers in KAD at a given instant • Contacts 1.5 million to 4.5 million peers • Takes 8-11 minutes • Saturates a 100 Mbit/s link at Uni Mannheim • 8 GBytes of traffic • Carried out once a day for over a year • Versatile tool: KAD, Overnet, BitTorrent (Azureus), and the Storm Worm

  17. The Full Peer Crawl • Our Approach • Single machine • Main memory • Stateless • Unsynchronized queries • Traditional Approach • Cluster of computers • Centralized database • Stateful client • Synchronized queries • Crucial: synchronization between the machines

  18. Discover the Peers • Functionality • Query seed peer for contacts using “route requests” • Breadth First Search to explore the full graph • Stop when no new peers are discovered
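The discovery loop on this slide is a breadth-first search over the overlay graph. The sketch below runs it against an in-memory toy overlay; `route_request` is a hypothetical stand-in for the network query, and a real crawler would send many such requests concurrently and per peer.

```python
from collections import deque


def crawl(seed, route_request):
    """Breadth-first exploration starting from a seed peer.

    route_request(peer) returns the contacts that `peer` reports.
    The crawl stops when no new peers are discovered.
    """
    discovered = {seed}
    queue = deque([seed])
    while queue:
        peer = queue.popleft()
        for contact in route_request(peer):
            if contact not in discovered:
                discovered.add(contact)
                queue.append(contact)
    return discovered


# Toy overlay: peer "e" exists but is known to nobody reachable from "a".
overlay = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["d"],
    "d": ["a"],
    "e": ["a"],
}
snapshot = crawl("a", lambda peer: overlay.get(peer, []))
print(sorted(snapshot))
```

The only state the loop needs is the set of discovered peers and the queue of peers still to be asked, which fits the single-machine, main-memory approach described on the previous slide.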

  19. Diurnal Pattern [Figure annotations: 21:30 Beijing; 21:00 Madrid, Paris, Rome]

  20. New Peers [Figure annotations: Total ~ 2500, China ~ 1500] • About 700,000 new KAD IDs join KAD every day for the first time • 260 million new peers/year: THIS IS HARD TO BELIEVE! • New peers: peers seen for the first time on day x in one zone
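The yearly figure follows from simple arithmetic, shown below. The second calculation is an interpretation rather than something the slide states: it assumes that the "Total ~ 2500" annotation is the number of new IDs per day in one 8-bit zone, and extrapolates to all 256 zones.

```python
NEW_IDS_PER_DAY = 700_000   # from the slide
DAYS_PER_YEAR = 365

# 700,000 * 365 = 255,500,000, which the slide rounds to 260 million.
new_ids_per_year = NEW_IDS_PER_DAY * DAYS_PER_YEAR

# Assumption: ~2500 new IDs per day were observed in a single 8-bit zone.
ZONES = 2 ** 8
network_estimate = 2_500 * ZONES   # 640,000, the same order as 700,000

print(new_ids_per_year, network_estimate)
```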

  21. Session Length • The Weibull distribution provides a very good fit of the session length distribution • Predicts the stability of a peer
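How a Weibull fit can predict stability is sketched below using the Weibull survival function S(t) = exp(-(t/scale)^shape). The shape and scale values are hypothetical, not the fitted values from the thesis; a shape below 1 is chosen because it gives the heavy tail mentioned in the crawl conclusions.

```python
import math


def survival(t: float, shape: float, scale: float) -> float:
    """P(session length > t) for a Weibull distribution."""
    return math.exp(-((t / scale) ** shape))


def stays_online(extra: float, elapsed: float, shape: float, scale: float) -> float:
    """P(session lasts `extra` more minutes | it has already lasted `elapsed`)."""
    return survival(elapsed + extra, shape, scale) / survival(elapsed, shape, scale)


SHAPE, SCALE = 0.5, 100.0   # hypothetical parameters, time in minutes

young = stays_online(60, elapsed=10, shape=SHAPE, scale=SCALE)
old = stays_online(60, elapsed=600, shape=SHAPE, scale=SCALE)
print(f"online 10 min:  {young:.2f} chance of one more hour")
print(f"online 600 min: {old:.2f} chance of one more hour")
```

With a shape parameter below 1, the longer a peer has already been online, the more likely it is to stay, so the elapsed session time is itself a usable predictor.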

  22. Crawl Conclusions • China, Europe, rest of the world • Chinese peers are distinct: they stay connected for less time • KAD ID aliasing • KAD IDs are not persistent, contrary to what was assumed before • Peers come back over and over again • Mean lifetime greater than 7 months • Core of stable peers with extremely long session times • Up to 78 days • Session times are heavy tailed (Weibull distributed) • Possible to predict the future behavior • Developed the only crawler available today for the full KAD network

  23. Content in KAD • What content is shared? • Movies? Music? Legal material? • What keywords are popular? • How much control traffic is generated?

  24. The Content Spy • How to spy on a part of the hash space, called I? • Introduce a large number of spy peers that have KAD IDs in I • How many spy peers? • Scalability of spy • All the spy peers are running on a single PC • To reduce the memory requirements, no state is kept [Figure: hash space from 00…00 to 11…11]
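Generating spy identifiers that fall inside a chosen part of the hash space can be sketched as follows. The slide does not say how the spy IDs are spaced, so drawing random suffixes is an assumption; the zone prefix 0xE3 is the 8-bit zone named on the next slide, and `spy_ids` is a hypothetical helper.

```python
import random

ID_BITS = 128


def spy_ids(zone_prefix: int, prefix_bits: int, count: int,
            rng: random.Random) -> list[int]:
    """Return `count` distinct 128-bit IDs that all start with `zone_prefix`.

    Assumption: suffixes are drawn uniformly at random.
    """
    suffix_bits = ID_BITS - prefix_bits
    ids = set()
    while len(ids) < count:
        ids.add((zone_prefix << suffix_bits) | rng.getrandbits(suffix_bits))
    return sorted(ids)


spies = spy_ids(0xE3, prefix_bits=8, count=256, rng=random.Random(1))
print(f"{spies[0]:032x} ... {spies[-1]:032x}")
```

Since only the ID list is needed to answer requests for the zone, many such identities can be served from one process, in line with the single-PC, stateless design on the slide.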

  25. Spying: Control Traffic • Spied on the 8-bit zone <e3> during 12 hours • Search: 561,542 messages, 10.8 MBytes of traffic • Publish: 5,549,183 messages, 966 MBytes of traffic • Route: 9,761,278 messages, 342 MBytes of traffic [Figure annotations: x10, x100]
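The ratios behind these measurements can be recomputed directly from the numbers on the slide. The only assumption in the sketch is that the traffic volumes are compared in the same unit (MBytes).

```python
# Measurements for zone <e3> over 12 hours, taken from the slide.
search = {"messages": 561_542, "mbytes": 10.8}
publish = {"messages": 5_549_183, "mbytes": 966.0}
route = {"messages": 9_761_278, "mbytes": 342.0}

message_ratio = publish["messages"] / search["messages"]   # roughly 10
traffic_ratio = publish["mbytes"] / search["mbytes"]       # roughly 90

print(f"publish/search messages: {message_ratio:.1f}x")
print(f"publish/search traffic:  {traffic_ratio:.1f}x")
```

Publish generates about ten times as many messages as search, but close to a hundred times the bytes, so publish messages are on average much larger than search messages.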

  26. Spying: Keyword Popularity in zone <e3> • The most popular words are so-called stop words

  27. Spy Conclusion • Findings • Interesting methodology for spying • Publish traffic 100 times larger than search traffic • Large content base with more than 80 million files • Improvements • Don’t publish stop words • Modify the re-publish frequency to increase the time until the next republish: • Reduces publish traffic by a factor of 10

  28. Contributions • Measurement Methodologies • The fastest crawler available today for the KAD network • First to crawl the entire KAD network • Content Spy • Instrumented client • Proposed Improvements • Publishing Overhead • Security • Content Retrieval

  29. An Augmented Delaunay Overlay for Decentralized Virtual Worlds

  30. Local knowledge • Networked Virtual Environment (NVE) based on the Delaunay Triangulation.

  31. Ignoring the physical network • Neighbors in the overlay may be far away in the network topology (and the other way around…)

  32. Goal: minimize the delay penalty • Small World • hops => hops • Topology awareness • Giving priority to nodes close in the physical network • Augment the overlay: introduce a second type of neighbor relationship, shortcuts • Contribution: exploring these two approaches together

  33. Shortcuts: How to find? • How to find? • Join procedure • Traveling the virtual world • Message forwarding • Learn from existing shortcuts • How to choose? • Network Proximity Awareness • delay < x ms • Network coordinate system or ping • Complete coverage of the virtual world in order to create a small-world

  34. Shortcuts: How to use? • Greedy Walk • Use shortcuts in priority • Fallback to Delaunay routing • Minimize the remaining distance in the overlay • First using shortcuts • Travel long overlay distances • Travel short underlay distances • Close to the destination, using Delaunay neighbors • Travel short overlay distances • Travel long underlay distances
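The greedy walk with shortcut priority can be sketched as below. It is an interpretation of the bullets, not the thesis implementation: a shortcut is taken whenever one reduces the remaining overlay distance, otherwise the walk falls back to the regular overlay neighbors. The toy overlay is a chain of points rather than a real Delaunay triangulation, and all names are hypothetical.

```python
import math


def greedy_route(src, dst, pos, delaunay, shortcuts):
    """Greedy walk: prefer shortcuts, fall back to Delaunay neighbors.

    pos maps each node to its coordinates in the virtual world; every hop
    must strictly reduce the remaining overlay distance to dst.
    """
    def remaining(node):
        return math.dist(pos[node], pos[dst])

    path = [src]
    current = src
    while current != dst:
        here = remaining(current)
        closer = [n for n in shortcuts.get(current, []) if remaining(n) < here]
        if not closer:  # no useful shortcut: fall back to Delaunay routing
            closer = [n for n in delaunay.get(current, []) if remaining(n) < here]
        if not closer:
            break       # no neighbor makes progress
        current = min(closer, key=remaining)
        path.append(current)
    return path


# Toy overlay: ten nodes on a line, each linked to its direct neighbors.
pos = {i: (float(i), 0.0) for i in range(10)}
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}

print(greedy_route(0, 9, pos, neighbors, shortcuts={0: [6]}))
print(greedy_route(0, 9, pos, neighbors, shortcuts={}))
```

In the toy run a single shortcut cuts the walk from nine hops to four: the shortcut covers the long overlay distance and the regular neighbors finish the last few hops near the destination.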

  35. Simulation setup • Overlay based on the Delaunay Triangulation • GT-ITM network topology generator • 2-tier • A fraction of the nodes with one neighbor are chosen to participate in the overlay • Random assignment between the nodes (underlay) and the peers (overlay)

  36. Results: Intuition [Figure: path in the overlay and path in the underlay; panels: with shortcuts, without shortcuts]

  37. Results: Shortcut coverage [Figure panels: random distribution, clustered distribution]

  38. Results: Delay • Polynomial -> logarithmic increase

  39. Results: Delay distribution

  40. Conclusion • Nodes that are close in the underlay may be far away in the overlay • Reducing the average number of hops and the delay by augmenting the overlay in a very simple way • Short in the underlay • Long in the overlay • Approach is not limited to Delaunay-based overlays

  41. Contributions • Distributed algorithms for the construction and maintenance of a peer-to-peer network based on a Delaunay Triangulation (in n-dimensional spaces) • Dynamic and distributed clustering of peers • Augmenting the triangulation with (a few) shortcuts to reduce the delay penalty

  42. Thank You for Your Attention! Questions?
