Peer to Peer Technologies

Peer to Peer Technologies. Roy Werber Idan Gelbourt. prof. Sagiv’s Seminar The Hebrew University of Jerusalem, 2001. Lecture Overview. 1 st Part: The P2P communication model, architecture and applications 2nd Part: Chord and CFS. Peer to Peer - Overview.

Presentation Transcript


  1. Peer to Peer Technologies Roy Werber Idan Gelbourt prof. Sagiv’s Seminar The Hebrew University of Jerusalem, 2001

  2. Lecture Overview • 1st Part: • The P2P communication model, architecture and applications • 2nd Part: • Chord and CFS

  3. Peer to Peer - Overview • A class of applications that takes advantage of resources: • Storage, CPU cycles, content, human presence • Available at the edges of the Internet • A decentralized system that must cope with the unstable nature of computers located at the network edge

  4. Client/Server Architecture • An architecture in which each process is a client or a server • Servers are powerful computers dedicated to providing services – storage, traffic, etc • Clients rely on servers for resources

  5. Client/Server Properties • Big, strong server • Well known port/address of the server • Many to one relationship • Different software runs on the client/server • Client can be dumb (lacks functionality), server performs for the client • Client usually initiates connection

  6. Client/Server Architecture [Figure: several clients connected through the Internet to a single server]

  7. Client/Server Architecture [Figure: the client sends “GET /index.html HTTP/1.0” to the server; the server replies “HTTP/1.1 200 OK ...”]
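The exchange on slide 7 can be sketched in Python. This is a minimal illustration that only builds and parses the two messages as byte strings, without opening a network connection. The helper names `build_request` and `parse_status_line` and the sample response headers are hypothetical, chosen for this example.

```python
def build_request(path: str, version: str = "HTTP/1.0") -> bytes:
    """Build a GET request line followed by the blank line that ends the headers."""
    return f"GET {path} {version}\r\n\r\n".encode("ascii")


def parse_status_line(response: bytes) -> tuple[str, int, str]:
    """Split the first line of a response into (version, status code, reason)."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = first_line.split(" ", 2)
    return version, int(code), reason


# The client initiates the exchange; the server only answers.
request = build_request("/index.html")
# Assumed sample response, modelled on the status line shown on the slide.
sample_response = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
status = parse_status_line(sample_response)
```

Note the asymmetry the slide is pointing at: the client side needs only a few lines to ask, while all content and processing sit behind the server's answer.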

  8. Disadvantages of C/S Architecture • Single point of failure • Strong expensive server • Dedicated maintenance (a sysadmin) • Not scalable - more users, more servers

  9. Solutions • Replication of data (several servers) • Problems: • redundancy, synchronization, expensive • Brute force (a bigger, faster server) • Problems: • Not scalable, expensive, single point of failure

  10. The Client Side • Although the model hasn’t changed over the years, the entities in it have • Today’s clients can perform more roles than just forwarding users’ requests • Today’s clients have: • More computing power • Storage space

  11. Thin Client • Performs simple tasks: • I/O • Properties: • Cheap • Limited processing power • Limited storage

  12. Fat Client • Can perform complex tasks: • Graphics • Data manipulation • Etc… • Properties: • Strong computation power • Bigger storage • More expensive than thin

  13. Evolution at the Client Side • ‘70: DEC’s VT100, no storage • ‘80: IBM PC @ 4.77MHz, 360k diskettes • 2001: a PC @ 2GHz, 40GB HD

  14. What Else Has Changed? • The number of home PCs is increasing rapidly • PCs with dynamic IPs • Most of the PCs are “fat clients” • Software cannot keep up with hardware development • As Internet usage grows, more and more PCs are connecting to the global net • Most of the time PCs are idle • How can we use all this?

  15. Sharing • Definition (Merriam-Webster’s online dictionary, www.m-w.com): • To divide and distribute in shares • To partake of, use, experience, occupy, or enjoy with others • To grant or give a share in • There is a direct advantage of a co-operative network versus a single computer

  16. Resources Sharing • What can we share? • Computer resources • Shareable computer resources: • “CPU cycles” - seti@home • Storage - CFS • Information - Napster / Gnutella • Bandwidth sharing - Crowds

  17. SETI@Home • SETI – Search for ExtraTerrestrial Intelligence • @Home – On your own computer • A radio telescope in Puerto Rico scans the sky for radio signals • Fills a DAT tape of 35GB in 15 hours • That data has to be analyzed

  18. SETI@Home (cont.) • The problem – analyzing the data requires a huge amount of computation • Even a supercomputer cannot finish the task on its own • Accessing a supercomputer is expensive • What can be done?

  19. SETI@Home (cont.) • Can we use distributed computing? • YEAH • Fortunately, the problem can be solved in parallel - examples: • Analyzing different parts of the sky • Analyzing different frequencies • Analyzing different time slices

  20. SETI@Home (cont.) • The data can be divided into small segments • A PC is capable of analyzing a segment in a reasonable amount of time • An enthusiastic UFO searcher will lend his spare CPU cycles for the computation • When? Screensavers
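The idea of cutting the data into segments and lending them to volunteer PCs can be sketched as follows. This is a toy model under stated assumptions, not SETI@home's actual protocol: the function names, the round-robin assignment, and the stand-in `analyze` step (which just returns the strongest sample) are all hypothetical.

```python
from collections import deque


def split_into_segments(data: bytes, segment_size: int) -> list[bytes]:
    """Cut the recorded data into independent, fixed-size segments."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def analyze(segment: bytes) -> int:
    """Stand-in for the real signal analysis: report the strongest sample."""
    return max(segment)


def run_project(data: bytes, segment_size: int, volunteers: list[str]):
    """Hand segments to volunteers round-robin and collect results centrally."""
    queue = deque(enumerate(split_into_segments(data, segment_size)))
    results = {}                          # segment index -> analysis result
    assigned = {v: 0 for v in volunteers}  # volunteer -> segments processed
    while queue:
        for volunteer in volunteers:
            if not queue:
                break
            index, segment = queue.popleft()
            results[index] = analyze(segment)  # would run on the volunteer's PC
            assigned[volunteer] += 1
    return results, assigned


results, assigned = run_project(bytes([1, 9, 3, 4, 7, 2, 8, 5]), 3, ["pc-a", "pc-b"])
```

Because no segment depends on another, the central server only has to store data, hand out segments, and merge the answers.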

  21. SETI@Home - Example

  22. SETI@Home - Summary • SETI reverses the C/S model • Clients can also provide services • Servers can be weaker, used mainly for storage • Distributed peers serving the center • Not yet P2P but we’re close • Outcome - great results: • Thousands of unused CPU hours tamed for the mission • 3+ million users

  23. What Exactly is P2P? • A distributed communication model with the properties: • All nodes have identical responsibilities • All communication is symmetric

  24. P2P Properties • Cooperative, direct sharing of resources • No central servers • Symmetric clients [Figure: clients connected to one another through the Internet, with no server]

  25. P2P Advantages • Harnesses client resources • Scales with new clients • Provides robustness under failures • Redundancy and fault-tolerance • Immune to DoS • Load balance

  26. P2P Disadvantages -- A Tough Design Problem • How do you handle a dynamic network (nodes join and leave frequently)? • A number of constraints and uncontrolled variables: • No central servers • Clients are unreliable • Clients vary widely in the resources they provide • Heterogeneous network (different platforms)

  27. Two Main Architectures • Hybrid Peer-to-Peer • Preserves some of the traditional C/S architecture. A central server links between clients, stores indices tables, etc • Pure Peer-to-Peer • All nodes are equal and no functionality is centralized

  28. Hybrid P2P • A main server is responsible for various administrative operations: • Users’ login and logout • Storing metadata • Directing queries • Example: Napster

  29. Examples - Napster • Napster is a program for sharing information (mp3 music files) over the Internet • Created by Shawn Fanning in 1999 although similar services were already present (but lacked popularity and functionality)

  30. Napster Sharing Style: hybrid center+edge
      Users and their shared files:
      • “beastieboy”: song1.mp3, song2.mp3, song3.mp3
      • “kingrook”: song4.mp3, song5.mp3, song6.mp3
      • “slashdot”: song5.mp3, song6.mp3, song7.mp3
      Directory held by the Napster server:
      Title      User        Speed
      song1.mp3  beastieboy  DSL
      song2.mp3  beastieboy  DSL
      song3.mp3  beastieboy  DSL
      song4.mp3  kingrook    T1
      song5.mp3  kingrook    T1
      song5.mp3  slashdot    28.8
      song6.mp3  kingrook    T1
      song6.mp3  slashdot    28.8
      song7.mp3  slashdot    28.8
      Steps:
      1. Users launch Napster and connect to Napster server
      2. Napster creates dynamic directory from users’ personal .mp3 libraries
      3. beastieboy enters search criteria “song5”
      4. Napster displays matches to beastieboy
      5. beastieboy makes direct connection to kingrook for file transfer of song5.mp3
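The directory behaviour on slide 30 can be sketched as a small in-memory index. This is an illustrative model only, not Napster's real protocol or data format; the class name `CentralIndex` and its methods are hypothetical. The server stores only metadata, and the file transfer itself would happen directly between the two peers.

```python
class CentralIndex:
    """Toy model of a hybrid-P2P directory: metadata only, no file contents."""

    def __init__(self):
        self.entries = []  # list of (title, user, speed) tuples

    def login(self, user: str, speed: str, titles: list[str]) -> None:
        """A user connects and publishes the titles in their library."""
        for title in titles:
            self.entries.append((title, user, speed))

    def logout(self, user: str) -> None:
        """A user disconnects, so their files leave the dynamic directory."""
        self.entries = [e for e in self.entries if e[1] != user]

    def search(self, text: str) -> list[tuple[str, str, str]]:
        """Return every entry whose title contains the search text."""
        return [e for e in self.entries if text in e[0]]


index = CentralIndex()
index.login("beastieboy", "DSL", ["song1.mp3", "song2.mp3", "song3.mp3"])
index.login("kingrook", "T1", ["song4.mp3", "song5.mp3", "song6.mp3"])
index.login("slashdot", "28.8", ["song5.mp3", "song6.mp3", "song7.mp3"])

matches = index.search("song5")  # beastieboy then picks a peer to download from
```

The search result tells beastieboy who has the file and at what speed; choosing kingrook over slashdot is then a client-side decision.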

  31. What About Communication Between Servers? • Each Napster server creates its own mp3 exchange community: • rock.napster.com, dance.napster.com, etc… • This creates a separation between communities, which is bad • We would like multiple servers to share a common ground. This reduces the centralized nature of each server and expands searchability

  32. Various HP2P Models – 1. Chained Architecture • Chained architecture – a linear chain of servers • Clients log in to a random server • Queries are submitted to the server • If the server satisfies the query – Done • Otherwise – Forward the query to the next server • Results are forwarded back to the first server • The server merges the results • The server returns the results to the client • Used by OpenNap network
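The chained query flow can be sketched as a walk along a list of servers. This is a minimal sketch under assumptions the slide does not spell out: a query counts as "satisfied" once a requested number of matches (`wanted`) has been collected, the chain does not wrap around, and each server's index is a plain dictionary. The function name `chained_query` is hypothetical.

```python
def chained_query(servers, start, keyword, wanted):
    """Walk the chain from `start`, merging matches until `wanted` are found.

    servers: list of dicts mapping title -> user (each server's local index).
    Returns (merged results, number of servers visited).
    """
    results = []
    visited = 0
    for index in servers[start:]:
        visited += 1
        # Each server contributes its local matches (sorted for determinism).
        results.extend(
            (title, user) for title, user in sorted(index.items()) if keyword in title
        )
        if len(results) >= wanted:
            break  # query satisfied: no need to forward any further
    return results[:wanted], visited


servers = [
    {"song1.mp3": "beastieboy"},
    {"song5.mp3": "kingrook"},
    {"song5.mp3": "slashdot", "song7.mp3": "slashdot"},
]
found_all, hops_all = chained_query(servers, 0, "song5", 2)
found_one, hops_one = chained_query(servers, 1, "song5", 1)
```

The number of servers visited grows with how rare the content is, which is the main cost of this architecture.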

  33. 2. Full Replication Architecture • Replication of constantly updated metadata • A client logs on to a random server • The server sends the updated metadata to all servers • Result: • All servers can answer queries immediately
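Full replication can be sketched as servers that push every login to all of their peers. This is a simplified model that ignores failures, ordering, and logouts; the class `ReplicatedServer` and the helper `connect_all` are hypothetical names.

```python
class ReplicatedServer:
    """Toy server that keeps a full copy of every user's metadata."""

    def __init__(self, name: str):
        self.name = name
        self.peers = []      # every other server in the system
        self.metadata = {}   # user -> list of shared titles

    def apply(self, user: str, titles: list[str]) -> None:
        """Record one user's metadata locally."""
        self.metadata[user] = list(titles)

    def login(self, user: str, titles: list[str]) -> None:
        """Accept a login, then send the updated metadata to all servers."""
        self.apply(user, titles)
        for peer in self.peers:
            peer.apply(user, titles)

    def search(self, keyword: str) -> list[tuple[str, str]]:
        """Answer from the local copy only; no other server is contacted."""
        return sorted(
            (title, user)
            for user, titles in self.metadata.items()
            for title in titles
            if keyword in title
        )


def connect_all(servers):
    """Make every server a peer of every other server."""
    for server in servers:
        server.peers = [p for p in servers if p is not server]


a, b, c = ReplicatedServer("a"), ReplicatedServer("b"), ReplicatedServer("c")
connect_all([a, b, c])
a.login("kingrook", ["song5.mp3"])
c.login("slashdot", ["song5.mp3", "song7.mp3"])
answer_from_b = b.search("song5")  # b never saw a login, yet can answer
```

Queries become cheap, but every login costs one message per server, so the update traffic grows with the number of servers.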

  34. 3. Hash Architecture • Each server holds a portion of the metadata • Each server holds the complete inverted list for a subset of all words • Client directs a query to a server that is responsible for at least one of the keywords • That server gets the inverted lists for all the keywords from the other servers • The server returns the relevant results to the client
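The hash architecture can be sketched by assigning each word to one server with a hash function. This is a minimal sketch with several assumptions: `zlib.crc32` is used as the hash (Python's built-in `hash` for strings varies between runs), a multi-keyword query is treated as an AND of its keywords, and the names `HashServer`, `owner`, `publish`, and `query` are hypothetical.

```python
import zlib


class HashServer:
    """Holds the complete inverted list for the subset of words it owns."""

    def __init__(self):
        self.inverted = {}  # word -> set of file ids containing that word


def owner(word: str, servers: list) -> "HashServer":
    """Pick the single server responsible for a word."""
    return servers[zlib.crc32(word.encode("utf-8")) % len(servers)]


def publish(file_id: str, words: list[str], servers: list) -> None:
    """Add a file to the inverted list of every word that describes it."""
    for word in words:
        owner(word, servers).inverted.setdefault(word, set()).add(file_id)


def query(keywords: list[str], servers: list) -> set:
    """Model the server that receives the query: it gathers each keyword's
    inverted list from that keyword's owner, then intersects the lists."""
    lists = [owner(w, servers).inverted.get(w, set()) for w in keywords]
    return set.intersection(*lists) if lists else set()


servers = [HashServer() for _ in range(3)]
publish("f1", ["green", "toad"], servers)
publish("f2", ["green", "frog"], servers)
both = query(["green", "toad"], servers)
```

Each word lives on exactly one server, so no server stores the whole index, at the price of shipping inverted lists between servers for multi-keyword queries.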

  35. 4. Unchained Architecture • Independent servers which do not communicate with each other • A client who logs on to one server can only see the files of other users at the same local server • A clear disadvantage of separating users into distinct domains • Used by Napster

  36. Pure P2P • All nodes are equal • No centralized server • Example: Gnutella

  37. A completely distributed P2P network • Gnutella network is composed of clients • Client software is made of two parts: • A mini search engine – the client • A file serving system – the “server” • Relies on broadcast search

  38. Gnutella - Operations • Connect – establishing a logical connection • PingPong – discovering new nodes (my friend’s friends) • Query – look for something • Download – download files (simple HTTP)

  39. Gnutella – Form an Overlay [Figure: a new node sends Connect and receives OK, then Ping messages spread through the overlay and Pong replies come back]

  40. How to find a node? • Initially, ad hoc ways • Email, online chat, news groups… • Bottom line: you’ve got to know someone! • Set up some long-lived nodes • A newcomer contacts the well-known nodes • Useful for building better overlay topology

  41. Gnutella – Search • Toad A – looks nice • Toad B – too far [Figure: a query for “Green Toad” is passed from node to node; nodes A and B each answer “I have”]

  42. On a larger scale, things get more complicated

  43. Gnutella – Scalability Issue • Can the system withstand flooding from every node? • Use TTL to limit the range of propagation • 5 ^ 5 = 3125, how much can you get? • Creates a “horizon” of computers • The promise is an expectation that your horizon changes every day when you log in
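TTL-limited flooding and the resulting horizon can be sketched on a small graph. This is a simplified model, not the Gnutella wire protocol: the overlay is an adjacency dictionary, duplicate queries are dropped with a seen-set, and the function names are hypothetical. `horizon_estimate` simply reproduces the slide's 5 ^ 5 arithmetic as fanout raised to the TTL.

```python
from collections import deque


def flood_query(graph, origin, ttl):
    """Return the set of nodes a query reaches when flooded with a given TTL.

    graph: dict mapping node -> list of neighbour nodes.
    """
    seen = {origin}
    frontier = deque([(origin, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL expired: this node does not forward the query
        for neighbour in graph[node]:
            if neighbour not in seen:  # nodes drop queries they already saw
                seen.add(neighbour)
                frontier.append((neighbour, remaining - 1))
    return seen - {origin}


def horizon_estimate(fanout, ttl):
    """The slide's estimate: fanout ** ttl."""
    return fanout ** ttl


# A four-node chain: a - b - c - d
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
reached_ttl2 = flood_query(chain, "a", 2)
reached_ttl3 = flood_query(chain, "a", 3)
estimate = horizon_estimate(5, 5)
```

Node d exists in the network but is invisible to a with a TTL of 2, which is exactly the horizon effect described on the slide.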

  44. The Differences • While the pure P2P model is completely symmetric, in the hybrid model elements of both PP2P and C/S coexist • Each model has its disadvantages • PP2P still has problems locating information • HP2P has scalability problems, as do ordinary server-oriented models

  45. P2P – Summary • The current settings allowed P2P to enter the world of PCs • Controls the niche of sharing resources • The model is being studied from the academic and commercial point of view • There are still problems out there…

  46. End Of Part I

  47. Part II Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications • Robert Morris, Ion Stoica, David Karger, M. Frans Kaashoek, Hari Balakrishnan • MIT and Berkeley • Roy Werber, Idan Gelbourt

  48. A P2P Problem • Every application in a P2P environment must handle an important problem: The lookup problem • What is the problem?

  49. A Peer-to-peer Storage Problem • 1000 scattered music enthusiasts • Willing to store and serve replicas • How do you find the data?

  50. The Lookup Problem • Dynamic network with N nodes, how can the data be found? [Figure: nodes N1–N6 connected through the Internet; a publisher holds Key=“title”, Value=MP3 data…; a client issues Lookup(“title”) and does not know which node to ask]
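To make the problem concrete, here is a naive baseline in which the client simply asks every node in turn. This is only an illustration of why lookup is hard, not the approach taken by Chord; the function `naive_lookup` and the dictionary-per-node model are assumptions made for this sketch.

```python
def naive_lookup(nodes, key):
    """Ask each node in turn; return (value, number of nodes contacted)."""
    contacted = 0
    for node in nodes:
        contacted += 1
        if key in node:
            return node[key], contacted
    return None, contacted


# Six nodes, each modelled as a dict of the keys it stores.
nodes = [{} for _ in range(6)]
nodes[3]["title"] = b"MP3 data"  # the publisher happened to store it on the 4th node

value, contacted = naive_lookup(nodes, "title")
missing, contacted_all = naive_lookup(nodes, "other")
```

In the worst case the client contacts all N nodes, and a miss always costs N messages, which does not scale as the network grows or changes.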