
Artur Andrzejak Zuse-Institute Berlin (ZIB)



Presentation Transcript


  1. Overview: Challenges in P2P Systems • Artur Andrzejak • Zuse-Institute Berlin (ZIB)

  2. What is a Peer-To-Peer System? • Participants are autonomous (different owners) • Resources are distributed • Sites have equal functionality: • clients, when accessing information • servers, when serving information to other peers • routers, when forwarding information • ... and so are called „peers“ • Large number of participants

  3. P2P – a Bad Idea? • „Distribution is expensive, specialized functionality is good!“ (Garcia-Molina) • If distribution is necessary (e.g. due to reliability): • build centralized directory and use backups • → computational efficiency suffers in the P2P scenario!

  4. So Why Do P2P Systems Exist at All? • User‘s view: • exploiting existing inexpensive resources • sharing costs among many • legal protection • autonomy • anonymity • Researcher‘s view: • Scalability • Self-organization and low management cost • High availability and fault-tolerance

  5. Main Challenges • Search • Reliability and security • (Resource Management)

  6. Main Challenges • Search • Reliability and security

  7. Search Mechanism Characteristics • Comprehensiveness and guarantees • Many of today‘s systems do not guarantee that existing items will be found at all, or they do not find all items • Query expressiveness • Today: only key/keyword searches; range queries, aggregates and SQL-like queries desirable • Efficiency • A major problem: too many messages for searching, some systems even use flooding • Robustness • Autonomy

  8. Search Mechanism Determines ... • Topology • From arbitrary (Gnutella) to rigid (Napster) • Rigid topology increases efficiency but decreases autonomy • Placement of Data/Metadata • Gnutella – only own data; Chord – data/metadata is carefully distributed in the whole network; superpeers – metadata for superpeers is centralized • Message Routing • Each query message is sent to a group of peers • From unstructured flooding (Gnutella) to sophisticated protocols (Chord, CAN etc.)

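  9. Gnutella – How it Works • [Figure: a query is sent to neighbouring peers with TTL = 2 and forwarded with TTL = 1; a peer holding the item returns a query hit, followed by a download]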
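The TTL-limited flooding named on this slide can be sketched as follows. This is a minimal illustration, not the Gnutella wire protocol: the five-peer network, the peer names, and the file name are made up, and the sketch assumes every peer drops duplicate queries and never sends a query back to the peer it came from.

```python
from collections import deque

# Hypothetical overlay: each peer knows only its neighbours and its own files.
NETWORK = {
    "A": {"neighbours": ["B", "C"], "files": set()},
    "B": {"neighbours": ["A", "D"], "files": set()},
    "C": {"neighbours": ["A", "D"], "files": set()},
    "D": {"neighbours": ["B", "C", "E"], "files": {"song.mp3"}},
    "E": {"neighbours": ["D"], "files": {"song.mp3"}},
}

def flood_query(network, start, wanted, ttl):
    """Flood a query from `start`; return (peers reporting a hit, messages sent)."""
    seen = {start}            # peers that have already processed this query
    hits = []
    messages = 0
    queue = deque([(start, ttl, None)])  # (peer, remaining TTL, sender)
    while queue:
        peer, remaining, sender = queue.popleft()
        if wanted in network[peer]["files"]:
            hits.append(peer)             # this peer would send a query hit back
        if remaining == 0:
            continue                      # TTL exhausted: do not forward further
        for neighbour in network[peer]["neighbours"]:
            if neighbour == sender:
                continue                  # never echo the query to its sender
            messages += 1                 # every forwarded copy costs a message
            if neighbour not in seen:     # duplicates are counted but dropped
                seen.add(neighbour)
                queue.append((neighbour, remaining - 1, peer))
    return hits, messages

hits, messages = flood_query(NETWORK, "A", "song.mp3", ttl=2)
```

With TTL = 2 the query from A reaches D but not E, so an existing copy goes unnoticed; raising the TTL finds it at the cost of more messages. This is the comprehensiveness/efficiency trade-off listed under the search mechanism characteristics.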

  10. Gnutella – Characteristics

  11. Chord – How it Works • Circular 160-bit ID space • A key is stored at its successor: the node with the next higher ID • Finger i points to the successor of n + 2^i • [Figure: ring with nodes N32, N80, N90, N105, N120 and keys K5, K20, K80; the fingers of a node span ½, ¼, 1/8, 1/16, 1/32, 1/64, 1/128 of the ring]
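The two rules on this slide (a key lives at its successor, finger i points to the successor of n + 2^i) can be tried out on a toy ring. The sketch below assumes a 7-bit ID space instead of 160 bits so the numbers stay readable, reuses the node IDs visible in the figure, and assumes a global sorted list of nodes, which a real Chord node does not have. The function names are illustrative.

```python
from bisect import bisect_left

M = 7                                   # toy ID space of 2**7 = 128 IDs (Chord uses 160 bits)
NODES = sorted([32, 80, 90, 105, 120])  # node IDs taken from the slide's figure

def successor(ident):
    """First node whose ID is equal to or follows `ident`, wrapping around the ring."""
    i = bisect_left(NODES, ident % 2**M)
    return NODES[i % len(NODES)]        # past the last node: wrap to the first

def finger_table(n):
    """Finger i of node n points to successor(n + 2**i), for i = 0 .. M-1."""
    return [successor(n + 2**i) for i in range(M)]

# Keys from the figure: K5 and K20 land on N32, K80 on N80.
placement = {key: successor(key) for key in (5, 20, 80)}
fingers_of_80 = finger_table(80)
```

Because the finger targets double in distance, each node keeps only M entries yet can reach at least halfway to any key in one hop, which is where Chord's logarithmic lookup cost comes from.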

  12. Chord - Characteristics

  13. Decouple Efficiency, Autonomy, Robustness • [Figure: Gnutella and Chord placed relative to three axes: autonomy, efficiency, robustness] • (From „Open Problems in Data Sharing Peer-To-Peer Systems“ by Hector Garcia-Molina)

  14. Novelty: Location-Independent Routing • Each unique document or endpoint has a globally unique identifier (GUID) • Locating data can be seen as a routing problem: • clients construct messages addressed with GUIDs and let peers pass these messages until the object is located • Known as Decentralized Object Location and Routing (DOLR) paradigm or Distributed Hash Table (DHT) • Advantages: • allows for routing messages to objects without knowing their location • data can be stored anywhere, amidst millions of peers → scalability • provides locality: use of local resources instead of distant, if possible • Implemented in Chord, CAN, Pastry, Tapestry
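A minimal sketch of the DHT idea follows: the caller addresses an object only by its GUID and never names the peer that stores it. Several things here are assumptions for illustration: GUIDs are derived with SHA-1 (the slide does not name a hash function), the class `TinyDHT` and its methods are hypothetical, and the responsible peer is computed from a global view of the ring, whereas Chord, CAN, Pastry and Tapestry reach it by hop-by-hop routing.

```python
import hashlib
from bisect import bisect_left

def guid(name: str) -> int:
    """Derive a 160-bit identifier from a name (SHA-1 is an assumption here)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")

class TinyDHT:
    """Toy DHT: peers and objects share one ID space; the successor peer stores the object."""

    def __init__(self, peer_names):
        self.ring = sorted((guid(p), p) for p in peer_names)  # (peer GUID, peer name)
        self.store = {p: {} for p in peer_names}              # per-peer local storage

    def responsible_peer(self, object_guid: int) -> str:
        ids = [peer_id for peer_id, _ in self.ring]
        i = bisect_left(ids, object_guid) % len(self.ring)    # wrap around the ring
        return self.ring[i][1]

    def put(self, name: str, value: bytes) -> None:
        g = guid(name)
        self.store[self.responsible_peer(g)][g] = value

    def get(self, name: str):
        g = guid(name)
        return self.store[self.responsible_peer(g)].get(g)

dht = TinyDHT(["peer-1", "peer-2", "peer-3", "peer-4"])
dht.put("report.pdf", b"contents")
found = dht.get("report.pdf")   # located by GUID alone, no peer address needed
```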

  15. Main Challenges • Search • Reliability and security

  16. Essence: Untrusted/Unreliable Components • Centralized systems have components which are professionally maintained and trusted to behave well • Components of a P2P-system may crash or fail at any time (unreliable components) • Also, the participants might be adversarial, attempting to damage the system (untrusted components) • Failure rate ~ system size → larger P2P-systems are guaranteed to have malfunctioning components • P2P-system builders must invoke new design principles to achieve guarantees • „only the aggregate behaviour of many peers can be trusted“ • Techniques for untrusted components solve issues for unreliable ones (converse is not true)
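The claim that large systems always contain malfunctioning components can be made concrete with a short calculation. It assumes, as a simplification not stated on the slide, that each of n peers fails independently with probability p during some observation period; the values of n and p are made up.

```latex
P(\text{at least one failed peer}) = 1 - (1 - p)^{n}
\qquad\text{e.g. } p = 0.001,\ n = 10\,000:\quad
1 - 0.999^{10\,000} \approx 1 - e^{-10} \approx 0.99995
```

Even with very reliable individual peers, the probability that everything works at once vanishes as n grows, which is why guarantees have to rest on aggregate behaviour.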

  17. Achieving Reliability and Security • Replication • Cryptography • Byzantine Agreement • Exploiting differences • „Thermodynamic“ Systems Design

  18. Replication • Redundancy helps to achieve fault tolerance by providing online replacements for faulty resources • Advanced P2P Systems (Intermemory, OceanStore, FreeHaven) use so-called erasure coding • Each chunk of data is transformed into many fragments • Very low Fraction of Blocks Lost Per Year (FBLPY) • Losses per year for a 6-month repair interval: • Std: 0.03 blocks • Erasure: 10^-35 blocks
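Why erasure coding loses so much less data than plain replication at the same storage cost can be seen from a small probability sketch. It does not reproduce the FBLPY figures on the slide: the failure probability, the fragment counts, and the assumption that fragments fail independently between repairs are all illustrative.

```python
from math import comb

def loss_replication(p: float, replicas: int) -> float:
    """A replicated block is lost only if every replica fails."""
    return p ** replicas

def loss_erasure(p: float, n: int, k: int) -> float:
    """An erasure-coded block (any k of n fragments suffice) is lost
    when more than n - k fragments fail."""
    return sum(
        comb(n, lost) * p**lost * (1 - p) ** (n - lost)
        for lost in range(n - k + 1, n + 1)
    )

P_FAIL = 0.1                                    # assumed per-copy failure probability
replicated = loss_replication(P_FAIL, 2)        # 2 full copies: 2x storage
erasure_coded = loss_erasure(P_FAIL, 32, 16)    # 32 fragments, any 16 rebuild: 2x storage
```

Both schemes use twice the original storage, yet under these assumptions the erasure-coded loss probability comes out several orders of magnitude below the replicated one, because losing a block requires 17 simultaneous fragment failures rather than 2.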

  19. Byzantine Agreement • Immutable (read-only) data can be easily signed („sealed“) by cryptographic means to detect and discard faulty information • Also repairs are possible by these techniques • However, some decisions are active: e.g. changing, replacing or deleting information • These decisions must be taken collectively to eliminate corrupted nodes • Here Byzantine Agreement can be used: only if a sufficient number of nodes agree, a unified decision is taken • Works if fewer than 1/3 of the nodes are compromised • Applied in OceanStore and Farsite
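The one-third bound translates into simple arithmetic: n nodes tolerate f compromised ones only if n >= 3f + 1. The sketch below shows that arithmetic and a vote count against the resulting quorum. It is not an agreement protocol (real ones need several message rounds and authenticated messages), and the function names are hypothetical.

```python
from collections import Counter

def max_faulty(n: int) -> int:
    """Largest f with n >= 3f + 1, i.e. strictly fewer than a third of the nodes."""
    return (n - 1) // 3

def quorum(n: int) -> int:
    """Votes needed so that any two quorums overlap in at least one correct node."""
    return (n + max_faulty(n)) // 2 + 1

def decide(votes):
    """Return the value backed by a quorum of the votes, or None if there is none."""
    value, count = Counter(votes).most_common(1)[0]
    return value if count >= quorum(len(votes)) else None

# Four nodes tolerate one compromised node; three matching votes are required.
outcome = decide(["delete", "delete", "delete", "keep"])
no_outcome = decide(["delete", "delete", "keep", "keep"])
```

With only three nodes, `max_faulty` returns 0: a single compromised node already makes a safe collective decision impossible.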

  20. Exploiting Differences • Some peers are „more equal“ than others: • Different CPUs, memory, storage cap., network connectivity • Some are professionally managed, others not • Physically, some are locked in secure rooms, others are public • We can exploit these differences to tune performance, availability, reliability, security • Examples: • Computers with higher connectivity as supernodes • Actively managed nodes for Byzantine Agreement • Placing archival data on servers deep in mountains

  21. „Thermodynamic“ Systems Design • A new concept of John Kubiatowicz – „Stability through Statistics“ • We can give guarantees on collective behaviour while individual nodes are not predictable • Over time, the latent order of a system is destroyed – this resembles the 2nd law of thermodynamics: „entropy of closed systems increases“ • Therefore, self-organizing behaviour is necessary: • Servers must continuously collect, regenerate and redistribute fragments in a data storage system • They must adjust routing links in the DOLR to correct changes • They must recognize faults without global communication • Entropy reduction can also be achieved by introspection • System observes itself, applies analyses, then adapts accordingly • Research in the area of IBM‘s Autonomic Computing

  22. P2P-Research at ZIB: CSR-DMS • Management of large scientific data sets (up to 400 million files) • Should improve existing approaches in the area of GRID technologies • Also as a framework for research • Architecture is P2P-based • Should exhibit self-management abilities • Candidates for diploma theses (Diplomarbeiten) are very welcome!
