1 / 41

CSE6809-Distributed Search Techniques

CSE6809-Distributed Search Techniques. Lecture-1 P2P Concepts. Evolution of Distributed Computing Models. Client-Server Client : Process wishing to access data, use resources or perform operations on a different computer

coby
Download Presentation

CSE6809-Distributed Search Techniques

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSE6809-Distributed Search Techniques Lecture-1 P2P Concepts

  2. Evolution of Distributed Computing Models • Client-Server • Client: Process wishing to access data, use resources or perform operations on a different computer • Server: Process managing data and all other shared resources amongst servers and clients, allows clients access to resource and performs computation • Interaction: invocation / result message pairs invocation Client Server invocation result result Server Client Key: Process: Computer:

  3. Variants of Client-Server Model • A service provided by multiple servers • Examples: Many commercial web services are implemented through different physical servers • Motivation • Performance (e. g., cnn. com, download servers, etc.) • Reliability Service Server Client Server Client Server

  4. Client Server Proxy Server Server Client Variants of Client-Server Model (2) • Web Proxy Server • Render replication/ distributedness transparent • Caching • Proxy server maintains cache store of recently requested resources

  5. Variants of Client-Server Model (3) • Mobile Code • Code that is sent to a client process to carry out a specific task • Examples • Applets • Active Packets (containing communications protocol code) • a) Client request results in downloading of applet code • b) Client interacts with the applet Web Server Client Applet code Web Server Client Applet

  6. Variants of Client-Server Model (4) • Thin Clients and Compute Servers • Executing windows- based user interface on local computer while application executes on compute server • Example: X11 server (run on the application client side) • In reality: Palm Pilots, Mobile phones Network Computer or PC Compute Server Network Thin Client Web Server

  7. Music service Gateway Alarm service Internet Hotel wireless network Discovery service Camera TV/PC Guest’s Laptop Devices PDA Variants of Client-Server Model (5) • Spontaneous Networking • Discovery services • Services available in the network • Their properties, and how to access them (including device- specific driver information)

  8. Application Application Application Coordination Code Coordination Code Coordination Code The Peer-to-Peer Model • Applications based on peer processes • Not Client-Server • processes that have largely identical functionality

  9. A new energy erupting in the computing field Yet, P2P is the oldest architecture in the world of communications Telephones are peer-to-peer Usenet implementation of UUCP Routing in the Internet Internet endpoints have historically been peers P2P technologies return the Internet to its original vision, in which everyone creates as well as consumes The Peer-to-Peer Model (2)

  10. An Internet evolution: loosening the virtual from the physical DNS decoupled names from physical locations URNs allow to retrieve documents without knowing domain names Virtual hosting and replicated servers changed the one-to-one relationship of names to systems The Peer-to-Peer Model (3) The next major conceptual leap: let go of the notion of location

  11. Definitions • Everything except the client/server model • Network of nodes with equivalent capabilities/responsibilities (symmetrical) • Nodes are both Servers and clients called “Servents” • Direct exchange of information between hosts at the edge of the Internet

  12. A transient network that allows a group of computer users to connect with each other and collaborate by sharing resources (CPU, storage, content). The connected peers construct a virtual overlay network on top of the underlying network infrastructure Examples of overlays: BGP routers and their peering relationships Content distribution networks (CDNs) Application-level multicast And P2P apps ! Definitions (cont.)

  13. An overlay network is a set of logical connections between end hosts Overlay networks can be unstructured or structured Proximity not necessarily taken into account Overlay maintenance is an issue Overlay edge Overlay Networks

  14. application transport network data link physical application transport network data link physical application transport network data link physical Overlays: All in the application layer • Design flexibility • Topology • Protocol • Messaging over TCP, UDP, ICMP Underlying physical net is transparent to developer

  15. Cost reduction through cost sharing Client/Server: Server bears most of the cost P2P: Cost spread over all the peers (+Napster, ++Gnutella,…) aggregation of otherwise unused resources (e.g., seti@home) Improved scalability/reliability resource discovery and search (eg. Chord, CAN, …) Interoperability for the aggregation of diverse resources (storage, CPU, …) Increased autonomy independence from servers, hence providers (e.g., A way around censorship, licensing restrictions, etc.) Goals

  16. Anonymity/privacy Difficult to ensure with a central server Required by users who do not want a server/provider to know their involvement in the system Freenet is a prime example Dynamism Resources (e.g., compute nodes) enter and leave the system continuously Mechanisms are required to avoid polling (e.g., “buddy lists” in Instant messaging) Ad hoc communications P2P systems typically do not rely on an established infrastructure they build their own, e.g. logical overlay in CAN Goals (cont.)

  17. Degree of decentralization Hybrid decentralized P2P Purely decentralized P2P Partially centralized P2P Network Structure & index distribution Structured P2P Loosely structured P2P Unstructured P2P P2P Classification

  18. Central server facilitates the interaction b/w peers. Central server performs the lookups and identifies the nodes of the network. example: Napster (-) Single point of failure, scalability?, … Hybrid decentralized P2P Index Data

  19. network nodes perform the same tasks (Servents) no central coordination activity examples: original Gnutella, Freenet (-) data consistency?, Manageability?, Security?, Comm. overhead Purely decentralized P2P

  20. some of the nodes assume a more important role Supernodes act as local central indexes examples: Kazaa, recent Gnutella Partially centralized P2P files List of files

  21. data is distributed randomly over the peers and broadcasting mechanisms are used for searching. placement of data is unrelated to the overlay topology. examples: Napster, Gnutella, KaZaa Download Music A Where is “music A”? I have it Download Music A Where is “music A”? Reporting a file list music A is… Hybrid decentralized Purely decentralized Unstructured P2P

  22. Network topology is tightly controlled and files are placed at precisely specified locations. Provide a mapping between the file identifier and location Examples: Chord, CAN, PAST, Tapestry, Pastry, etc. File has a key value music A Key (music A) = 2429 Structured P2P Publish file or file position to node that has appropriate key value music A 7239 3482 2429 2145 6823 7328 4321 9832

  23. Between structured and unstructured File locations are affected by routing hints, but they are not completely specified. example: Freenet Loosely structured P2P “B” may be on right side… “A” may be on left side Here I am!! A There is no “B” B

  24. P2P classification summary

  25. Decentralization Scalability Self-Organization Performance Fault Resilience Security Anonymity Others Design requirements

  26. Client data + Server index Client data Server index Client data Napster model (Index centralized) P2P search (totally decentralized) Client Server index + data Traditional model (Yahoo) (Centralized) Decentralization • Emphasizes users’ ownership and control of data and resources

  27. The ease with which the system remains tractable with an increasing number of nodes and data elements Important feature because of the very large number of participants and data elements Immediate benefit of decentralization Early p2p systems had limited scalability! Napster was able to scale up to about 6 millions of users at the peak of its service Achieving good scalability should not be at the expense of other desirable features. Hybrid P2P systems, such as Napster, intentionally keep some amount of the operations and files centralized. Many Structured lookup protocols are proposed Scalability

  28. “A process where the organization of a system is spontaneously established, i.e, without this being controlled by the environment or an encompassing or otherwise external system” Needed for scalability and fault resilience, and because of intermittent connection of resources, and cost of ownership. Self-maintenance and self-repair DHTs are a means to achieve self-organization (e.g., in Chord, Pastry, …) Self-organization

  29. P2P systems aim to improve performance by aggregating distributed storage capacity (e.g., Napster, Gnutella) and processing cycles (e.g., seti@home) Performance is influenced by 3 types of resources: processing, storage, and bandwidth Performance: how long it takes to retrieve a file or how much bandwidth will a query consume? To optimize performance Replication Caching Intelligent routing & network organization Performance

  30. P2P systems are faced with failures commonly associated with systems. Disconnections Unreachability Partitions Node Failure The resilience of the connectivity when failures are encountered by the arbitrary departure of peers Fault resilience

  31. Protection against: Routing attacks Redirect queries in wrong direction or to non-existing nodes Misleading updates Partition Storage/Retrieval attacks Node responsible for holding data item does not store or deliver it as required Inconsistent behavior Node sometimes behaves and sometimes does not Identity spoofing Node claims to have an identity that belongs to another node Node delivers bogus content DoS attacks Let legitimate (authenticated) users through Rapid joins/leaves In P2P systems, security issues are raised by the presence of malicious peers Trust assessment and management Security

  32. Anonymity • The degree to which a system allows for anonymous transactions • Do not want someone to identify author, publisher, reader, server, or document on the systems

  33. Cost of ownership Sharing the cost of owning and maintaining a system and its content Ad-hoc connectivity Tolerate sudden disconnection and ad-hoc addition of nodes Transparency Make the system easy to use Location, address, naming, administration Interoperability Compatibility (protocols, systems, how to advertise and maintain the same levels of security, ...)? Standardization efforts: P2P WG, Grid Forum, JXTA Fairness The extent to which the workload is evenly distributed across nodes Others

  34. File Sharing Communication Collaboration Computation Databases Others P2P Applications

  35. File exchange: Killer application! (+) Potentially unlimited file exchange areas (+) High available safe storage: duplication and redundancy (+) Anonymity : preserve anonymity of authors and publishers (+) Manageability (-) Network bandwidth consumption (-) Security (-) Search capabilities P2P File Sharing

  36. Examples of P2P file sharing applications: Napster disruptive; proof of concept Gnutella open source KaZaA at some point, more KaZaA traffic then Web traffic! eDonkey becoming popular in Europe Some implementations use Kademlia (DHT) for search BitTorrent: 53% of all P2P traffic in June 2004 was BitTorrent traffic and many others… How do you explain this success? P2P File Sharing (cont.)

  37. Instant Messaging (IM) User A runs IM client on her PC Intermittently connects to Internet; gets new IP address for each connection Registers herself with “system” Learns from “system” that user B in her “buddy list” is active User A initiates direct TCP connection with User B: P2P User A and User B chat. Can also be voice, video and text. Audio-Video Conferencing Example: Voice-over-IP (Skype) P2P Communication

  38. Application-level user collaboration Shared Applications Example: Shared file editing (eg. Distributed Powerpoint) Online games Multi-players, distributed Example: Descent (www.planetdescent.com) Technical challenges Locating of peers Fault tolerance Real-time constraints Example: Groove P2P Collaboration

  39. Achieves processing scalability by aggregating the resources of large number of individual PC. Application areas Financial applications Biotechnology Astronomy,… Related projects seti@home Avaki Entropia Gridella P2P Computation

  40. Fragments large database over physically distributed nodes Overcomes limitations of distributed DBMS Static topology Heavy administration work Dissemination of data sources over the Internet Each peer is a node with a database Set of peers changes often (site availability, usage patterns) No global schema Examples: AmbientDB Xpeer : self-organizing XML DB P2P Databases

  41. Summary • Evolution to P2P model • P2P and Overlay networks • Goals in P2P networks • P2P classification • P2P design requirements • P2P applications

More Related