
Recent Problems in Peer-to-peer Content Retrieval

Brian Neil Levine, Dept. of Computer Science, UMass Amherst. The work by BNL and his students presented here was supported in part by National Science Foundation awards ANI-033055 and EIA-0080199.



  1. Recent Problems in Peer-to-peer Content Retrieval
     Brian Neil Levine, Dept. of Computer Science, UMass Amherst
     The work by BNL and his students presented here was supported in part by National Science Foundation awards ANI-033055 and EIA-0080199.
     NeXtworking’03, June 23-25, 2003, Chania, Crete, Greece. The First COST-IST(EU)-NSF(USA) Workshop on EXCHANGES & TRENDS IN NETWORKING

  2. Motivation
     • Peer-to-peer content sharing is one of the largest portions of traffic on the network.
     • Illegal (Gnutella, Kazaa) or not (Apple iTunes), understanding the characteristics of such traffic is important to a well-performing Internet.
     • This talk:
        • What’s being done in p2p content & retrieval.
        • Overview of research in p2p traffic measurement.
        • How such measurements can affect p2p design.

  3. What is a p2p architecture?
     [Figure. Axis labels: “Resources out of your pocket to make it work (=money)” (Little to Lots); “Peers required to make it work” (1 to Many); “Chance you’ll be held accountable”. Region labels: Centralized, Distributed, P2P, Robust P2P, over-budgeted, robust/fault-tolerant, successful, unsuccessful.]

  4. Overview of P2P research problems
     • Content search
        • P2P designs are not one-size-fits-all.
        • Different applications require different solutions.
     • Peer selection
        • Finding the best peer of many serving a file…
     • Incentives for peers to participate
     • Security and privacy
     • Evaluation against measurement traces
        • What does real p2p traffic look like?
        • What’s the real performance of these protocols?

  5. DHTs work great when:
     • Each node is associated with a unique keyword (e.g., SOS).
     • The keywords stored are well-known, e.g., DNS lookup using a DHT.
     • Hashes of keywords ensure work is evenly distributed.
     Libraries of content? Real measurements show:
     • Nodes store more than one file, and each file brings at least one keyword: h(“The Red Hot Chili Peppers”, “Breaking the girl”)
     • Content search is difficult: index each term? Or index the whole title? Or part? h(“red”), h(“hot”), h(“chili”),… h(“let”), h(“there”), h(“be”), h(“light”)…
     • Some stored keywords are more popular than others.
     • Some queried keywords are more popular than others.
     Circular pegs, square holes…
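The two indexing choices on the slide above (one key for the whole title versus one key per term) can be compared with a minimal sketch. SHA-1, the 160-bit identifier space, and the function names `whole_title_keys` and `per_term_keys` are assumptions made for illustration; the slides do not say which hash or tokenization was used.

```python
import hashlib

RING_BITS = 160  # assumed identifier size (SHA-1 output length)

def h(key):
    """Map a keyword to an integer identifier; case is ignored."""
    digest = hashlib.sha1(key.lower().encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)

def whole_title_keys(artist, title):
    """Index the file under a single key built from artist and title."""
    return {h(artist + "\x00" + title)}

def per_term_keys(artist, title):
    """Index the file under one key per distinct whitespace-separated term."""
    terms = set((artist + " " + title).lower().split())
    return {h(term) for term in terms}

if __name__ == "__main__":
    artist, title = "The Red Hot Chili Peppers", "Breaking the girl"
    print(len(whole_title_keys(artist, title)), "key for the whole title")
    print(len(per_term_keys(artist, title)), "keys when indexing each term")
```

Whole-title indexing keeps the key count at one per file but only answers exact-match queries; per-term indexing supports keyword search at the cost of several keys (and so several index entries) per file.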

  6. How many keys per new user in your app?
     • DNS: 1-2 keys per authoritative domain.
     • [Left]: Unique terms in real collections of shared files (based on file names only, not ID3 tags).
     [Figure. Axes: “Number of unique keys” and “Number of files in user library”.]

  7. Cost of indexing files in DHTs
     [Figure. Y-axis: “Percentage of peers contacted to index files” (0% to 100%). X-axis: “Cumulative percentage of peers (ranked)”.]
     e.g., in a 100-node network, 40% of the nodes must contact 100% of the peers to index filenames for each join and leave.
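The quantity plotted on this slide, the share of peers one node must contact to index its file names, can be approximated as follows. This is a sketch under stated assumptions: keys are placed with a Chord-style successor rule over SHA-1 identifiers, and `ring_id`, `peers_contacted`, and the node and term names are hypothetical, not taken from the measured traces.

```python
import bisect
import hashlib

def ring_id(name):
    """Identifier for a node name or a term (SHA-1 is an assumption)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")

def peers_contacted(terms, sorted_node_ids):
    """Return the set of node ids that store at least one of the terms.

    Each term goes to the first node id >= its key, wrapping to the
    smallest node id at the end of the ring (successor rule).
    """
    owners = set()
    for term in set(terms):
        i = bisect.bisect_left(sorted_node_ids, ring_id(term))
        owners.add(sorted_node_ids[i % len(sorted_node_ids)])
    return owners

if __name__ == "__main__":
    nodes = sorted(ring_id("node%d" % i) for i in range(100))
    library_terms = ["term%d" % i for i in range(500)]  # made-up library
    share = len(peers_contacted(library_terms, nodes)) / len(nodes)
    print("fraction of peers contacted: %.2f" % share)
```

Because each distinct term lands on an effectively random node, a library with many unique terms touches most of the network on every join and leave, which is the effect the slide describes.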

  8. Methods of p2p search
     Distributed Hash Tables: CAN, Chord, Pastry, etc…
     • Distribute the index
     • Cost: updating pointers to content
     Much focus
     • Flooded search over
        • Random graphs
        • Small-world networks
        • Power-law degree networks
     • Return results only on the content you have stored
     • Make it easy for searches to traverse the graph
     • Cost: updating the graph; group similar nodes together
     Not enough focus
     • Links represent
        • Nothing
        • Relational autocorrelation
     • “Heat-seeking search” over an organized network.

  9. Searching for Topics not files…
     • Information Retrieval searches:
        • Show me all documents that are related to “salsa dancing” (as Google does)
     • You can’t index every word of every document
        • It’s hard enough to handle file names.
     • One approach: place nodes with similar content together.

  10. Arranging topology to match content
     • Arrange the topology so that a limited breadth-first search (BFS) of the graph returns more relevant information to peers.
     • Tough problem!
     • Can you find answers without flooding? Can you route queries towards content?
     [Figure labels: Optimal per-query arrangement; Arrangement; Random (gnutella).]
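To make the goal above concrete, the following sketch counts how many relevant peers a hop-limited BFS reaches under two arrangements of the same peers. The graphs, the `ttl` value, and the function names are made up for illustration; they are not the arrangements evaluated in the talk.

```python
from collections import deque

def limited_bfs(graph, start, ttl):
    """Return the set of nodes within ttl hops of start (start included)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == ttl:
            continue  # hop budget spent on this branch
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

def relevant_found(graph, start, ttl, relevant):
    """Number of relevant peers reached by the limited search."""
    return len(limited_bfs(graph, start, ttl) & set(relevant))

if __name__ == "__main__":
    relevant = {"c", "d"}  # hypothetical peers holding matching content
    random_like = {"a": ["b"], "b": ["a", "e"], "e": ["b", "c"],
                   "c": ["e", "d"], "d": ["c"]}
    content_aware = {"a": ["c"], "c": ["a", "d"], "d": ["c", "b"],
                     "b": ["d", "e"], "e": ["b"]}
    print(relevant_found(random_like, "a", 2, relevant))
    print(relevant_found(content_aware, "a", 2, relevant))
```

With the same hop budget, the arrangement that places the relevant peers near the querier returns both of them, while the other returns none.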

  11. Retrieval (briefly)
     • Content is likely to be available from several peers.
     • From which peer do you download?
        • Random (current approach)
        • Heuristics (ping, hop count, download time)
           • (but most peers you’ve never seen before)
        • Learned/Adaptive methods (e.g., MDPs)
           • See [BZLS; IPTPS’03]
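As a rough illustration of an adaptive selection method, the sketch below keeps a running mean of observed transfer rates per peer and mostly picks the best one. It is an epsilon-greedy estimator, a deliberately simpler technique than the MDP formulation cited on the slide, and the class name, parameters, and rates are all hypothetical.

```python
import random

class PeerSelector:
    """Epsilon-greedy choice among peers based on observed transfer rates."""

    def __init__(self, peers, epsilon=0.1, rng=None):
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.mean_rate = {peer: None for peer in peers}  # None = never tried
        self.count = {peer: 0 for peer in peers}

    def choose(self):
        # Try every peer once first: most peers have never been seen before.
        untried = [p for p, mean in self.mean_rate.items() if mean is None]
        if untried:
            return untried[0]
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.mean_rate))  # explore
        return max(self.mean_rate, key=self.mean_rate.get)  # exploit

    def record(self, peer, rate):
        """Fold one observed transfer rate into the peer's running mean."""
        n = self.count[peer] + 1
        old = self.mean_rate[peer] or 0.0
        self.mean_rate[peer] = old + (rate - old) / n
        self.count[peer] = n

if __name__ == "__main__":
    selector = PeerSelector(["p1", "p2", "p3"], epsilon=0.0)
    for peer, rate in [("p1", 10.0), ("p2", 50.0), ("p3", 20.0)]:
        selector.record(peer, rate)
    print(selector.choose())
```

The exploration step is what separates this from a fixed heuristic: a peer whose rate drops after the first measurement is eventually replaced by a better one.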

  12. Selecting for both accuracy and speed
     • Of the set of 100, IR techniques will choose the servers they believe are most accurate (red).
     • Selecting nodes for best transfer times picks a different set (green).
     • Trivial composition doesn’t work.
     [Figure: a client and the set of candidate servers.]

  13. Some other lessons learned from measurement (openNap)
     • What happened to content delivery on the Internet?
     • What happened to serving video on the Internet?

  14. Who’s transferring/serving files? (openNap)
     [Figure. Axes: “Percentage of all down/uploads” and “Percentage of users down/uploading”.]

  15. Session Lengths (gnutella)
     [Figure. Axes: “Percentage of all sessions > x” and “Length of node availability (10 min. increments)”.]

  16. Balance of work in Chord (simulation based on real traces)
     [Figure. Y-axis: “Cumulative percentage of work doing ‘x’ performed” (0% to 100%). X-axis: “Percentage of all nodes (ranked)”. Curves: Queries resolved, Keys indexed, Msgs rcvd, Msgs sent, Equal work.]

  17. Does caching queries balance load? (simulation based on real traces)
     • Normal: 20% of the nodes answer 84% of the queries.
     • Cached (infinite buffer): 20% of the nodes answer 55% of the queries.
     • Answer: yes, but still a problem.
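The "20% answer 84%" style of figure is the share of total work done by the most loaded fraction of nodes. A minimal sketch of that computation follows; the function name `top_share` and the per-node counts are invented for illustration and are not the trace data.

```python
def top_share(work_per_node, fraction=0.2):
    """Share of all work done by the busiest `fraction` of nodes."""
    ranked = sorted(work_per_node, reverse=True)
    k = max(1, round(len(ranked) * fraction))  # at least one node
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0

if __name__ == "__main__":
    skewed = [84, 4, 4, 4, 4]   # made-up queries answered per node
    balanced = [10] * 10        # the "equal work" reference case
    print(top_share(skewed))
    print(top_share(balanced))
```

Under perfectly equal work the top 20% of nodes would answer 20% of the queries, so the gap between that baseline and the measured share indicates how unbalanced the load is.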

  18. Some Measurements of P2P
     • Ripeanu et al. – Gnutella topology does not match underlying network topology. MMCN’02
     • Markatos – A simple query caching scheme can reduce query traffic by a factor of two. CCGrid 2002
     • Saroiu et al. – Gnutella bandwidth, latency, and node availability over a 60-hour period. Multimedia Systems Journal v8n6
     • Adar and Huberman – A free-rider study, using Gnutella’s QueryHit messages to infer peer downloads.
     • Chu, Labonte, Levine – Measurements of Napster and Gnutella file popularity and session lengths. Proc. ITCom 2002
     • Bhagwan et al. – Effects of DHCP on availability of nodes in p2p, TOD, joins and leaves. IPTPS 2003
     • Chu, Labonte, Levine – Measurements of all transfers and most libraries in a large p2p system (openNap); evaluation of Chord

  19. Summary / Open Issues
     • Applications of p2p are broad.
        • Methods other than DHT are possible.
     • Measurement studies have revealed the skewed distributions of p2p systems.
        • Can these be modeled?
     • DHTs are limited in their application to content sharing.
        • Work well for single-key systems
     • Stronger efforts are needed to match research designs to real characteristics of systems.
     • Thanks to Jacky Chu and Kevin Labonte for doing the balance of the work.
