
Data Management in Peer-to-Peer Systems

Data Management in Peer-to-Peer Systems. Qi Sun, Beverly Yang. Introduction. What is P2P? Distributed nodes, equal roles and functionality, providing/exchanging resources. Why now? PCs are becoming valuable resources, and computing devices are becoming pervasive. Many applications.


Presentation Transcript


  1. Data Management in Peer-to-Peer Systems • Qi Sun • Beverly Yang

  2. Introduction • What is P2P? • Distributed nodes • Equal roles and functionality • Providing/exchanging resources • Why now? • PCs are becoming valuable resources! • Computing devices becoming pervasive

  3. Many Applications • Grid computing • e.g., SETI@home • Ubiquitous computing • Cell phones, wireless devices, handhelds • Cars, refrigerators, microwaves • Preservation/Archival systems • File-sharing

  4. File-sharing model • Data: (Title string, File blob) • Query: “Find songs by Madonna” • Result: • 63.274.18.3: Madonna – “Vogue” • 63.274.18.3: Madonna – “Beautiful Stranger” • 27.48.3.124: Madonna – “Like a Prayer” • 17.64.75.18: Madanna – “Vogue” • How is this “search” implemented?

  5. Many Approaches • Napster • Gnutella • KaZaA • OverNet • BitTorrent

  6. Napster • “Hybrid” P2P system [Figure: a central server holding the index, surrounded by peers A–F; a query “?” sent to the server is answered with “C, E, F”]

  7. Napster • Benefits • Efficient • Comprehensive • Can handle complex queries • Disadvantages • Server is single point of failure • Server is performance bottleneck • Server costs money to maintain!!!
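
The central index behind a hybrid system of this kind can be sketched in a few lines. This is a minimal illustration, not Napster's actual protocol: the `CentralIndex` class, its method names, and the peer labels are all hypothetical, and matching is assumed to be a plain case-insensitive substring test.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical server-side index mapping each peer to the titles it shares."""

    def __init__(self):
        self._titles_by_peer = defaultdict(set)

    def register(self, peer, titles):
        """A peer uploads the list of titles it is willing to share."""
        self._titles_by_peer[peer].update(titles)

    def unregister(self, peer):
        """A peer leaves; its entries disappear from the index."""
        self._titles_by_peer.pop(peer, None)

    def search(self, keyword):
        """Return sorted (peer, title) pairs whose title contains the keyword."""
        kw = keyword.lower()
        return sorted(
            (peer, title)
            for peer, titles in self._titles_by_peer.items()
            for title in titles
            if kw in title.lower()
        )


index = CentralIndex()
index.register("peer-C", ["Madonna - Vogue"])
index.register("peer-E", ["Madonna - Like a Prayer"])
index.register("peer-F", ["Madonna - Beautiful Stranger", "Other Artist - Song"])
hits = index.search("madonna")
```

Every query is answered from one place, which is what makes the approach efficient and comprehensive, and also what makes the server a single point of failure.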

  8. Gnutella • “Pure” P2P system [Figure: peers linked by TCP connections, forming an “overlay network”]

  9. Gnutella [Figure: a query propagating through the overlay. Legend: source, forward query, processed query, found result, forward response]

  10. Gnutella • Benefits • No server needed (cost) • Robust (nodes can come and go) • Can handle complex queries per node • Disadvantages • Not comprehensive (can miss results) • Inefficient! (many messages)
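
Both disadvantages show up in a small simulation of query flooding. The sketch below assumes a hop limit (TTL) on each query and a breadth-first forwarding order; the overlay, the file placement, and the function name `flood_query` are made up for illustration and are not taken from the Gnutella specification.

```python
from collections import deque


def flood_query(overlay, files, source, keyword, ttl):
    """Flood a query from `source`; return (hits, number of messages sent)."""
    seen = {source}
    queue = deque([(source, ttl)])
    hits = []
    messages = 0
    while queue:
        node, hops_left = queue.popleft()
        # Each node evaluates the query against its own local files.
        hits.extend((node, t) for t in files.get(node, []) if keyword in t)
        if hops_left == 0:
            continue  # TTL exhausted: do not forward any further
        for neighbor in overlay[node]:
            messages += 1  # every forward costs a message, even a duplicate
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops_left - 1))
    return hits, messages


# Hypothetical overlay: A links to B and C, both link to D, and D links to E.
overlay = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
files = {"D": ["vogue"], "E": ["vogue live"]}

near_hits, near_messages = flood_query(overlay, files, "A", "vogue", ttl=2)
far_hits, far_messages = flood_query(overlay, files, "A", "vogue", ttl=3)
```

With a TTL of 2 the query never reaches E, so one result is missed; raising the TTL to 3 finds it at the cost of more messages, several of which are duplicates delivered to nodes that have already seen the query.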

  11. KaZaA • “Super-peer” P2P system [Figure: three super-peers, each labeled “Index”]

  12. KaZaA • “Super-peer” P2P system [Figure: three super-peers, each labeled “Index”, with a query “?”; annotations: “Like Gnutella” and “Like Napster”]

  13. KaZaA • Change the ratio of clients to super-peers • Napster: everyone (minus one) is a client • Gnutella: no one is a client • Combines strengths of hybrid and pure systems • Leverages heterogeneity of peers • e.g., bandwidth, memory, processing power

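  14. OverNet • Uses all peers to build a distributed index [Figure: Hash(ABC) = 3561246; the key space is split into ranges such as 0 – 10^6, 10^6 – 2×10^6, 3×10^6 – 4×10^6, and 7×10^6 – 8×10^6, assigned to peers W, X, Y, Z, …; the index entry for ABC goes to the peer whose range contains its hash]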
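
The range partitioning in this figure can be written down directly. The sketch below assumes a key space of 8×10^6 split into equal ranges of 10^6, which matches the ranges visible on the slide but is otherwise a guess; the use of SHA-1 and the function names are also assumptions, since the slide does not say which hash function is used.

```python
import hashlib

KEY_SPACE = 8_000_000    # assumed total size of the key space
RANGE_WIDTH = 1_000_000  # each peer slot owns one contiguous range of keys


def key_for(title):
    """Map a title to a key in [0, KEY_SPACE) (SHA-1 is an assumption)."""
    digest = hashlib.sha1(title.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % KEY_SPACE


def responsible_slot(key):
    """Return the number of the range (and hence the peer slot) holding `key`."""
    return key // RANGE_WIDTH


# The slide's example hash value falls in the range 3x10^6 to 4x10^6.
slide_slot = responsible_slot(3_561_246)
```

Because every peer applies the same function, a publisher and a searcher agree on which peer holds the index entry for a title without asking a central server.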

  15. OverNet: Searching • Given key k, which peer has the index? • Distributed Hash Table (DHT) [Figure: a ring of peer IDs with 0, 1, 2, 4, 8, 16, 24, 25, and 31 marked; peer 0 is looking for k=25]
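
The numbers on this slide (peer 0, then 1, 2, 4, 8, 16, and finally 24 and 25) read like a ring in which each peer knows the peers at power-of-two distances. The sketch below follows that reading, which is an assumption about the figure rather than a description of OverNet's real routing. It also assumes a 32-ID ring in which every ID is occupied by a live peer, a simplification that deployed DHTs cannot make.

```python
RING_BITS = 5
RING_SIZE = 2 ** RING_BITS  # IDs 0..31, as on the slide


def fingers(node):
    """Peers that `node` links to: those at distance 1, 2, 4, 8, and 16."""
    return [(node + 2 ** i) % RING_SIZE for i in range(RING_BITS)]


def lookup_path(start, key):
    """Greedy routing: always take the longest link that does not overshoot."""
    path = [start]
    node = start
    while node != key:
        distance = (key - node) % RING_SIZE
        step = 1 << (distance.bit_length() - 1)  # largest power of two <= distance
        node = (node + step) % RING_SIZE
        path.append(node)
    return path


path_to_25 = lookup_path(0, 25)
```

Each hop at least halves the remaining distance, so a lookup needs at most `RING_BITS` hops; for peer 0 and k=25 the route passes through 16 and 24 before reaching 25.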

  16. BitTorrent • Downloading of a single file [Figure: the file is split into blocks Blk1, Blk2, Blk3, …, Blk n; a tracker lists “Peers 2, 3, 6”]

  17. BitTorrent: Downloading • Tit-for-Tat strategy • Choking Mechanism • Periodic un-choke • Rare blocks first [Figure: two snapshots of block lists. First: A: 1,2,3,4; B: 3,5; C: 2,3,4. Second: A: 1,2,3,4; B: 3; C: 4]
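
The “rare blocks first” rule is easy to state in code. The sketch below borrows the block lists A: 1,2,3,4, B: 3,5, and C: 2,3,4 from the slide, but the scenario is hypothetical: it assumes a downloader that already holds block 3 and sees A, B, and C as neighbors. Breaking ties by block number is also an assumption (real clients typically randomize).

```python
from collections import Counter


def rarest_first(have, neighbor_blocks):
    """Order the blocks we still need by how few neighbors hold them."""
    counts = Counter(
        block for blocks in neighbor_blocks.values() for block in blocks
    )
    wanted = [block for block in counts if block not in have]
    # Fewest holders first; ties broken by block number to stay deterministic.
    return sorted(wanted, key=lambda block: (counts[block], block))


neighbors = {"A": {1, 2, 3, 4}, "B": {3, 5}, "C": {2, 3, 4}}
request_order = rarest_first(have={3}, neighbor_blocks=neighbors)
```

Blocks 1 and 5 each have a single holder, so they are requested before blocks 2 and 4, which two neighbors can supply. Fetching scarce blocks early reduces the risk that a block vanishes when its only holder leaves.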

  18. Challenges • Performance, Performance, Performance! • Find rare/popular files quickly • Minimize maintenance cost • Spread workload evenly • Etc. • Zillions of heuristics/variants

  19. Challenges (2) • Participation: Peers are selfish! • Do not want to “donate” bandwidth • Do not want to share their files • Do not care about others • Need some incentive mechanism!!

  20. Challenges (3) • Authenticity of data • How do you know you have the right file? • Bogus copies • Corrupt copies • Need detection/correction mechanisms
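
One common detection mechanism is to compare a cryptographic digest of the downloaded bytes with a digest obtained from a source the downloader already trusts. The sketch below assumes such a trusted SHA-256 digest exists; how it is distributed is the hard part and is not shown. The function name `is_authentic` is hypothetical.

```python
import hashlib
import hmac


def is_authentic(data, expected_sha256_hex):
    """Return True if `data` hashes to the expected SHA-256 digest."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


original = b"example file contents"
trusted_digest = hashlib.sha256(original).hexdigest()

genuine_ok = is_authentic(original, trusted_digest)
corrupt_ok = is_authentic(original + b"\x00", trusted_digest)
```

A digest check detects bogus and corrupt copies but cannot repair them; correction needs a second source or redundant encoding.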

  21. Techniques • Performance: Routing Indices, Network Awareness • Participation: SLIC, Micropayments • Correctness: DoS Prevention, Reputation Systems

  22. Routing Indices [Figure: a query “?”]

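  23. Routing Indices (2) [Figure: a tree of nodes 1–13 tagged with the topics DB, OS, AI, and EE; the query “DB?” arrives at node 1. Routing index at node 1: DB → 2, 4; OS → 2; AI → 2, 3, 4; EE → 3. At node 2: DB → 5; OS → 5, 6, 7. At node 3: AI → 8, 9; EE → 10. At node 4: DB → 11, 13; AI → 11, 12]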

  24. Routing Indices (3) • Benefits • Potentially reduce # messages • Drawbacks • Update cost (any time you have state) • Size of index
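
The message savings can be checked against the example tree from the previous slide. The sketch below reads each index entry as “topic: neighbors through which matching content is reachable”, which is an interpretation of the figure, and it assumes the overlay is a tree so that no cycle detection is needed. The function name `route_query` is hypothetical.

```python
# Routing indices as read off the slide: node -> topic -> neighbors to try.
ROUTING_INDEX = {
    1: {"DB": [2, 4], "OS": [2], "AI": [2, 3, 4], "EE": [3]},
    2: {"DB": [5], "OS": [5, 6, 7]},
    3: {"AI": [8, 9], "EE": [10]},
    4: {"DB": [11, 13], "AI": [11, 12]},
}


def route_query(index, start, topic):
    """Forward a query only along links the index marks as promising.

    Returns (nodes visited in order, number of messages sent).
    """
    visited = []
    messages = 0
    stack = [start]
    while stack:
        node = stack.pop()
        visited.append(node)
        next_hops = index.get(node, {}).get(topic, [])
        messages += len(next_hops)
        stack.extend(reversed(next_hops))  # keep left-to-right visiting order
    return visited, messages


db_nodes, db_messages = route_query(ROUTING_INDEX, 1, "DB")
```

The “DB?” query reaches nodes 2, 5, 4, 11, and 13 with 5 messages, whereas flooding the same 13-node tree would send one message over each of its 12 links. The price is the state in `ROUTING_INDEX`, which must be updated whenever content or links change.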

  25. Reputation Systems [Figure: peers Alice and Bob. One asks “Who has file X?”; the other answers “I do!”, but the file shown is File Y]

  26. Reputation Systems • Have an “opinion list” • Based on personal experience? • Problem: sparse [Figure: Nodes 0–8, several of them marked “?”]

  27. Reputation Systems • Have a “trust list” • Based on personal experience? • Problem: sparse • Ask friends • Efficient • Automatic [Figure: Nodes 1, 2, 4, and 6]
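
The “ask friends” idea can be illustrated with a toy scoring rule. Everything in the sketch below is an assumption made for illustration: scores lie in [0, 1], a peer's own experience overrides hearsay, friends' reports are averaged with equal weight, and strangers get a neutral default. The slides do not specify any of these choices.

```python
def trust_score(target, own_opinions, friends_opinions, default=0.5):
    """Estimate how far to trust `target`, falling back to friends' reports."""
    if target in own_opinions:
        return own_opinions[target]  # personal experience wins when available
    reports = [
        opinions[target]
        for opinions in friends_opinions.values()
        if target in opinions
    ]
    if not reports:
        return default  # nobody we know has dealt with this peer
    return sum(reports) / len(reports)


own = {"node2": 0.9}
friends = {
    "node1": {"node6": 1.0},
    "node4": {"node6": 0.5, "node2": 0.1},
}

score_known = trust_score("node2", own, friends)
score_hearsay = trust_score("node6", own, friends)
score_stranger = trust_score("node8", own, friends)
```

Pooling friends' lists addresses the sparseness problem, since a peer rarely has first-hand experience with more than a few others, but it also means a dishonest friend can skew the result.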

  28. Micropayments • Only if you have money will people do things for you! • Like a vending machine • Goods are cheap • Security can’t be too expensive

  29. Micropayments • Server is needed… • Handle accounts • Distribute and cash coins • Security [Figure: coins (“$”) passing through the server, annotated “Scalability and performance bottleneck”]

  30. Micropayments • Peers can do work too! • Challenge: SECURITY

  31. SLIC: Link-based Incentive • Use quality of service as incentive • They need each other to reach more nodes ⇒ Can retaliate [Figure: node A in Fragment A linked to node B in Fragment B]

  32. SLIC (2) • Adjust weights, and use them to reward good neighbors and to penalize bad ones [Figure: node A with neighbors B, C, and D and link weights W(A,B), W(A,C), W(A,D)]
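
A weight-and-reward loop of this kind might look like the sketch below. It is not the SLIC update rule, which the slides do not give: the exponential moving average, the `learning_rate` parameter, and the proportional split of capacity are stand-ins chosen only to show the feedback between observed service and service offered in return.

```python
def update_weight(weight, service_quality, learning_rate=0.5):
    """Move W(A, X) toward the quality of service X just delivered (0..1)."""
    return (1 - learning_rate) * weight + learning_rate * service_quality


def allocate_service(weights, capacity):
    """Split A's capacity among its neighbors in proportion to their weights."""
    total = sum(weights.values())
    if total == 0:
        return {neighbor: 0.0 for neighbor in weights}
    return {neighbor: capacity * w / total for neighbor, w in weights.items()}


# Hypothetical round: B served A well, C served poorly, D was already at zero.
weights = {
    "B": update_weight(0.5, 1.0),
    "C": update_weight(0.5, 0.0),
    "D": 0.0,
}
shares = allocate_service(weights, capacity=100)
```

A neighbor that stops serving A sees its weight, and therefore its share of A's capacity, shrink over successive rounds, which is the retaliation that gives each peer a reason to cooperate.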

  33. Network Awareness • Overlay network can be poor! [Figure: peers in Timbuktu (Mali, Africa), San Francisco, and Palo Alto]

  34. Network Awareness (2) • Form only “good” links • Probe a few and pick the best [Figure: peers in Timbuktu (Mali, Africa), San Francisco, and Palo Alto]
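
“Probe a few and pick the best” can be sketched as sampling some candidates and keeping those with the lowest measured latency. In the sketch below the latency measurement is passed in as a function so the example runs without a network; the round-trip times are invented, and the names `pick_best_neighbors`, `probes`, and `keep` are hypothetical.

```python
import random


def pick_best_neighbors(candidates, measure_latency, probes=3, keep=1, rng=random):
    """Probe a random sample of candidates and keep the lowest-latency ones."""
    sample = rng.sample(candidates, min(probes, len(candidates)))
    return sorted(sample, key=measure_latency)[:keep]


# Made-up round-trip times (ms) as seen from a peer near Palo Alto.
rtt_ms = {"Timbuktu": 300.0, "San Francisco": 20.0, "Palo Alto": 5.0}

best = pick_best_neighbors(list(rtt_ms), rtt_ms.__getitem__, probes=3, keep=2)
```

Probing only a sample keeps the measurement cost bounded; the trade-off is that the best candidate overall may not be in the sample.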

  35. Network Awareness (3) • “Swap” peers around [Figure: peers in Timbuktu (Mali, Africa), San Francisco, and Palo Alto]

  36. Denial of Service • Malicious peers can flood queries on unstructured networks • Rate limit • Incentive • Micro-payment
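
For the rate-limit option, one standard technique is a token bucket kept per neighbor. The sketch below is a generic token bucket, not something prescribed by the slides; the parameter names `rate` and `burst` are assumptions, and timestamps are supplied by the caller (in practice something like `time.monotonic()`).

```python
class TokenBucket:
    """Allow about `rate` queries per second, with bursts of up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # timestamp of the previous decision

    def allow(self, now):
        """Return True if a query arriving at time `now` may be forwarded."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: drop the query instead of forwarding it


bucket = TokenBucket(rate=1.0, burst=2)
decisions = [bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.0, 1.5)]
```

A peer would keep one bucket per neighbor and drop queries that arrive when the bucket is empty, so a flooding neighbor can consume only its own allowance.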

  37. Denial of Service • Malicious peers can drop queries and indices in structured networks • Tracing/Audit • Reorganization • Alternate path

  38. Concluding Remarks • P2P provides a cheap infrastructure for leveraging the capacities of the masses. • P2P’s “openness” is both its strength and its weakness.
