Open Problems in Data-Sharing Peer-to-Peer Systems

Neil Daswani, Hector Garcia-Molina, Beverly Yang. Shawn Jeffery, CS294-4 Peer-to-Peer Systems, 11/10/03.

Presentation Transcript


  1. Open Problems in Data-Sharing Peer-to-Peer Systems Neil Daswani, Hector Garcia-Molina, Beverly Yang Shawn Jeffery CS294-4 Peer-to-Peer Systems 11/10/03

  2. Overview • P2P has lots of advantages • You know the list • But there are challenges to widespread (lasting) acceptance • Security, efficiency, QoS, transactions, etc. • Old distributed systems techniques don’t apply to the scale and nature of P2P systems • This paper looks at search and security

  3. Caveats • Not an exhaustive survey • Other applications besides data sharing • Other issues besides search and security • Other issues within search and security • Based on work within the Stanford Peers Group

  4. Search • Assume “pure” p2p • Their definition of “hybrid” is the Napster example • Challenges • Scale • Unreliability

  5. Implementation Choices for Peer Behavior • Topology • How peers connect to each other • autonomy vs. efficiency • Data placement • Both data and metadata • Message routing • How queries are propagated • Can utilize both topology and data placement

  6. Requirements for a Search Mechanism • Expressiveness • How powerful is the query language? • Comprehensiveness • All results vs top K vs single • Autonomy • Peers may want to only connect to trusted peers

  7. Goals of a Search Mechanism (Maximize) • Efficiency • Bandwidth + processing + storage + … • Quality of Service (QoS) • User-perceived qualities • Robustness • Efficiency and QoS stay good during churn

  8. Expressiveness • Key Lookup • DHTs • Keyword • Can DHTs handle this? • Ranked Keyword • Want to do ranking in the network if top K is less than total results • Aggregates • Want to do this in the network as well • SQL • PIER and PeerDB
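
To make the "ranking in the network" point concrete, here is a minimal Python sketch of in-network top-K merging. It assumes relevance scores are comparable across peers; the peer names, document ids, and scores are hypothetical and do not come from the slides or the paper. Each peer returns only its K best results, and an intermediate peer merges its children's lists and forwards K results, so no link carries more than K results per reply.

```python
import heapq
from typing import List, Tuple

# (relevance score, document id); the scoring scheme is a hypothetical assumption
Result = Tuple[float, str]


def local_top_k(results: List[Result], k: int) -> List[Result]:
    """A peer keeps only its k best-scoring local results."""
    return heapq.nlargest(k, results)


def merge_top_k(partial_lists: List[List[Result]], k: int) -> List[Result]:
    """An intermediate peer merges partial lists and forwards only k results."""
    merged = [r for partial in partial_lists for r in partial]
    return heapq.nlargest(k, merged)


# Hypothetical per-peer result sets for one keyword query
peers = {
    "peer_a": [(0.91, "doc1"), (0.40, "doc2"), (0.15, "doc3")],
    "peer_b": [(0.88, "doc4"), (0.75, "doc5")],
    "peer_c": [(0.97, "doc6"), (0.10, "doc7")],
}
k = 2
partials = [local_top_k(r, k) for r in peers.values()]
top = merge_top_k(partials, k)
print(top)  # [(0.97, 'doc6'), (0.91, 'doc1')]
```

The merge is exact because every globally top-K result is also in the top K of the peer that holds it. The same pattern does not carry over directly to aggregates such as averages, which need partial sums and counts rather than truncated lists.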

  9. Autonomy vs. Efficiency • Decoupling autonomy and efficiency is a large challenge • With less autonomy, can bound the lookup cost (Chord) • By designating some nodes as more equal than others, some nodes are guaranteed to have the answer (super-peers) • Replication increases the chance of finding the answer on a random node • SkipNet makes progress by allowing the user to tune the autonomy vs. efficiency tradeoff
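
The replication bullet can be quantified with a small calculation. The sketch below assumes replicas are placed on distinct nodes chosen uniformly at random and that a search probes distinct random nodes; the function name and the example numbers are illustrative assumptions, not figures from the slides. Under those assumptions the chance that all k probes miss is C(n - r, k) / C(n, k).

```python
from math import comb


def hit_probability(n_nodes: int, replicas: int, probes: int) -> float:
    """Chance that at least one of `probes` distinct random nodes holds a replica.

    Assumes `replicas` copies sit on distinct nodes chosen uniformly at random.
    """
    if not (0 <= replicas <= n_nodes and 0 <= probes <= n_nodes):
        raise ValueError("replicas and probes must be between 0 and n_nodes")
    # Probability that every probed node is one of the n - r nodes without a copy
    miss = comb(n_nodes - replicas, probes) / comb(n_nodes, probes)
    return 1.0 - miss


# Hypothetical 1000-node network, 20 random probes per query
for r in (1, 10, 50):
    print(r, round(hit_probability(1000, r, 20), 3))
```

Raising the replica count buys search efficiency without constraining where peers connect, at the price of extra storage on nodes that did not ask for the data.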

  10. Autonomy vs. Robustness • By imposing rigid requirements on the system, it becomes hard to maintain

  11. QoS • Different metrics: • Number of results • Response time • Relevance (precision and recall) • Application specific • Example: Gnutella • Tradeoff between # results and cost • Directed BFS and concept clustering address this • What is the best technique to optimize this tradeoff?
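
The results-versus-cost tradeoff in the Gnutella example can be seen in a toy simulation of TTL-limited flooding. This is a simplified sketch, not the Gnutella protocol itself: the five-node graph, the set of nodes holding a match, and the function name are hypothetical, and the only cost counted is the number of query messages sent.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def flood_search(graph: Dict[str, List[str]], holders: Set[str],
                 source: str, ttl: int) -> Tuple[int, int]:
    """Flood a query from `source`; return (results_found, messages_sent)."""
    visited = {source}
    frontier = deque([(source, None, ttl)])  # (node, sender, remaining hops)
    results = 0
    messages = 0
    while frontier:
        node, sender, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL exhausted: this node does not forward further
        for neighbor in graph[node]:
            if neighbor == sender:
                continue  # do not send the query straight back
            messages += 1  # duplicates still cost bandwidth
            if neighbor in visited:
                continue  # duplicate query is dropped by the receiver
            visited.add(neighbor)
            if neighbor in holders:
                results += 1
            frontier.append((neighbor, node, remaining - 1))
    return results, messages


# Hypothetical overlay: nodes "d" and "e" hold matching files
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
holders = {"d", "e"}
for ttl in (1, 2, 3):
    print(ttl, flood_search(graph, holders, "a", ttl))
# 1 (0, 2)
# 2 (1, 4)
# 3 (2, 6)
```

A larger TTL finds more results but sends more messages, including duplicates that reach already-visited nodes. Techniques such as Directed BFS try to keep most of the results while forwarding to fewer neighbors.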

  12. Security • Challenging because of the nature of P2P systems • Open • Autonomous • Have to assume a hostile environment • Address: • Availability • File authenticity • Anonymity • Access control • Want to prevent, detect, manage, and recover from attacks

  13. Availability I • Each node should be able to accept messages as well as offer services to the network • DoS Attack • Chosen-victim attack in Gnutella • A node directs all search queries it gets to a victim node • Adversaries take advantage of loose protocols • Need to prevent amplification and back-door access

  14. Availability II • Malicious nodes create Byzantine failures • Current approaches are unpopular because of complexity and overhead • They also assume complete and secure communication between nodes • How to deal with general node failures? • Being addressed by DHTs • Other issues: • Malicious query/storage flooding • File availability • No mention of OceanStore, etc.

  15. File Authenticity • What is the definition of authenticity? • Different from integrity, which is solved with checksums/signatures • Oldest document: the first submitted • Expert-based: a single expert deems a document authentic • Voting-based: a majority of expert opinions determines authenticity • Reputation-based: weigh the votes of some experts more
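
The voting-based and reputation-based definitions differ only in how votes are weighted, which a short Python sketch can show. The expert names, reputation values, and the strict-majority threshold are illustrative assumptions; the slides do not specify how reputations are computed or combined.

```python
from typing import Dict, Optional


def is_authentic(votes: Dict[str, bool],
                 reputation: Optional[Dict[str, float]] = None) -> bool:
    """Return True when votes for 'authentic' exceed half of the total weight.

    With reputation=None every expert counts equally (voting-based).
    With a reputation map, each vote is scaled by that expert's weight
    (reputation-based); experts missing from the map get weight 0.
    """
    def weight(expert: str) -> float:
        return 1.0 if reputation is None else reputation.get(expert, 0.0)

    total = sum(weight(e) for e in votes)
    if total == 0:
        raise ValueError("no votes with non-zero weight")
    in_favor = sum(weight(e) for e, vote in votes.items() if vote)
    return in_favor > total / 2


# Hypothetical experts judging one document
votes = {"alice": True, "bob": False, "carol": False}
print(is_authentic(votes))                                             # False
print(is_authentic(votes, {"alice": 5.0, "bob": 1.0, "carol": 1.0}))   # True
```

The two calls disagree on the same votes, which is the point of the slide: the answer to "is this file authentic?" depends on which definition the system adopts, and a reputation scheme is only as trustworthy as the process that assigns the weights.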

  16. Anonymity • Good for: • “Borrowing” music • Censorship resistance • Freedom of speech • Privacy protection

  17. Anonymity vs. Efficiency tradeoff • For anonymity, should not be able to determine which node an object is stored at • vs. • For efficiency, should be able to determine exactly which node is responsible for an object • Onion routing/Crowds address anonymity through forwarding • Still have problems if nodes collude
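
The cost of anonymity through forwarding can be illustrated with a rough sketch in the style of Crowds: the initiator hands the request to a random member, and each member then forwards it to another random member with probability `p_forward` or otherwise submits it to the destination. The member names and parameter values are hypothetical, and the sketch models only path selection, with no encryption and no defense against colluding members.

```python
import random
from typing import List


def crowds_path(members: List[str], initiator: str,
                p_forward: float, rng: random.Random) -> List[str]:
    """Return the list of members a request visits before reaching the destination."""
    if not 0.0 <= p_forward < 1.0:
        raise ValueError("p_forward must be in [0, 1)")
    # The initiator always hands the request to a random member (possibly itself)
    path = [initiator, rng.choice(members)]
    # Each holder flips a biased coin: forward again, or submit to the destination
    while rng.random() < p_forward:
        path.append(rng.choice(members))
    return path  # the last member on the path contacts the destination


members = [f"node{i}" for i in range(20)]  # hypothetical crowd
rng = random.Random(42)
lengths = [len(crowds_path(members, "node0", 0.75, rng)) for _ in range(20000)]
print(sum(lengths) / len(lengths))  # close to 1 + 1 / (1 - 0.75) = 5 on average
```

The destination sees only the last member on the path, so it cannot tell who originated the request, but every extra hop adds latency and bandwidth. A higher `p_forward` lengthens the average path, which is the anonymity versus efficiency tension the slide describes.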

  18. Access Control • Utility is limited if there are restrictions on data sharing, but some level is needed for legality • Endpoint vs. P2P network enforcement

  19. Other Open Issues? • What are the most pressing issues for P2P to become widely acceptable? • P2P vs centralized? • Structured vs unstructured? • Hybrid vs pure P2P? • Where will P2P make an impact? • …
