The Research Advances in P2P Systems

The Research Advances in P2P Systems Zou FuTai zoufutai@cs.sjtu.edu.cn Internet Computing Lab 11/7/2003

Outline • 1. Why is P2P so attractive? • 2. Survey on the Current State of the Art of P2P Systems • 3. The Technologies • 3.1 Basic Research Issues • 3.2 The Wide Research Issues • 4. Current Hot Research Issues • 5. My Works • 6. Final Remarks

1. Why is P2P so attractive? • Peer-to-peer applications have grown explosively in recent years, driven by: • Low cost and high availability of large numbers of computing and storage resources • Increased network connectivity • As long as these factors remain important, peer-to-peer applications will continue to gain importance

Why is it called a peer? • Every node is designed to provide some service that helps other nodes in the network get service (although a user may choose not to) • Each node potentially has the same responsibility • Sharing can take different forms: • CPU cycles: SETI@Home • Storage space: Napster, Gnutella, Freenet…

The Broad Application • File Sharing (BitTorrent) • Popularizing Computing (SETI@Home) • Instant Communication (ICQ, MSN) • Information Search (P2Psearch) • Cooperative Work (platform) • Internet Storage Systems (OceanStore) • .NET (SOAP, XML; by Microsoft), JXTA (by Sun)

2. Survey on the Current State of the Art of P2P Systems • 2.1 Definitions • 2.2 A brief overview of the evolution of P2P systems • 2.3 A taxonomy of P2P systems • 2.4 A summary of P2P systems features and performance • 2.5 DHTs

2.1 Definitions • Peer-to-peer computing is a set of techniques and architectures that enable the construction of large-scale distributed systems with the favorable properties of scalability, reliability and self-organization in a potentially highly dynamic environment.

2.2 A brief overview of the evolution of P2P systems

First generation P2P systems • Napster, 1999: MP3 music sharing • Locate & Search: a centralized scheme. • The Napster system was composed of two services, a storage service and a directory service. Storage was decentralized and functioned in a peer-to-peer style, while the directory service was centralized. • Characteristics of peers: i) a dynamic Internet address, and ii) freedom to join and leave the network

Second generation P2P systems • Gnutella, Freenet… • Intended for large-scale sharing of data files • Locate & Search: broadcasting with a limited TTL • Reliable content location was not guaranteed

Third generation P2P systems • Pastry, Tapestry, Chord, CAN… • They guarantee a definite answer to a query in a bounded number of network hops. • Locate & Search: DHTs • They provide a load-balanced, fault-tolerant distributed hash table, in which items can be inserted and looked up in a bounded number of forwarding hops.

Napster • Assume a centralized index system that maps files (songs) to machines that are alive • How to find a file (song): • Query the index system → it returns a machine that stores the required file • Ideally this is the closest/least-loaded machine • ftp the file from that machine • Advantages: • Simplicity; easy to implement sophisticated search engines on top of the index system • Disadvantages: • Robustness, scalability (?)
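
The centralized lookup described on this slide can be sketched as follows. This is only a minimal illustration, not Napster's actual protocol: the class `CentralIndex`, its method names and the `load` dictionary are hypothetical, and "least-loaded" is modelled as a plain number per machine.

```python
from collections import defaultdict


class CentralIndex:
    """Toy centralized directory: file name -> set of live machines holding it."""

    def __init__(self):
        self._holders = defaultdict(set)

    def register(self, machine, files):
        # A machine announces the files it shares when it joins.
        for name in files:
            self._holders[name].add(machine)

    def unregister(self, machine):
        # A machine that leaves must no longer be returned by queries.
        for holders in self._holders.values():
            holders.discard(machine)

    def query(self, name, load):
        """Return the least-loaded live machine storing `name`, or None."""
        holders = self._holders.get(name)
        if not holders:
            return None
        return min(holders, key=lambda m: (load.get(m, 0), m))


index = CentralIndex()
for machine, name in [("m1", "A"), ("m2", "B"), ("m3", "C"),
                      ("m4", "D"), ("m5", "E"), ("m6", "F")]:
    index.register(machine, [name])
index.register("m2", ["E"])          # hypothetical second copy of E
load = {"m2": 1, "m5": 3}            # assumed load figures
print(index.query("E", load))        # the client would then fetch E from this machine
```

The file transfer itself never passes through the index, which is why only the directory service is a single point of failure.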

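Napster: Example • [Figure: machines m1 to m6 store files A to F, and the central index holds the mapping m1 → A, m2 → B, m3 → C, m4 → D, m5 → E, m6 → F. A client sends the query "E?" to the index, receives "m5", and then fetches E directly from m5.]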

Gnutella • Distribute file location • Idea: multicast the request • How to find a file: • Send the request to all neighbors • Neighbors recursively multicast the request • Eventually a machine that has the file receives the request, and it sends back the answer • Advantages: • Totally decentralized, highly robust • Disadvantages: • Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request has a TTL)
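
A rough sketch of TTL-limited flooding, to make the message cost visible. The topology beyond m1 and m3, the file placement, and the function name `flood_query` are assumptions made for illustration; real Gnutella messages carry more state than this.

```python
from collections import deque


def flood_query(neighbors, files, start, wanted, ttl):
    """Flood a request from `start`; return (holders found, messages sent)."""
    seen = {start}
    frontier = deque([(start, ttl)])
    hits, messages = [], 0
    while frontier:
        node, remaining = frontier.popleft()
        if wanted in files.get(node, ()):
            hits.append(node)        # a holder answers instead of forwarding
            continue
        if remaining == 0:
            continue                 # TTL exhausted: the request dies here
        for peer in neighbors.get(node, ()):
            messages += 1
            if peer not in seen:     # duplicate requests are dropped
                seen.add(peer)
                frontier.append((peer, remaining - 1))
    return hits, messages


# Assumed topology: only the neighbor lists of m1 and m3 come from the slides.
neighbors = {"m1": ["m2", "m3"], "m2": ["m1"], "m3": ["m1", "m4", "m5"],
             "m4": ["m3"], "m5": ["m3", "m6"], "m6": ["m5"]}
files = {"m1": {"A"}, "m2": {"B"}, "m3": {"C"},
         "m4": {"D"}, "m5": {"E"}, "m6": {"F"}}

print(flood_query(neighbors, files, "m1", "E", ttl=2))
print(flood_query(neighbors, files, "m1", "E", ttl=1))
```

With a TTL that is too small the file is simply not found, which is the sense in which reliable content location is not guaranteed.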

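Gnutella: Example • Assume: m1’s neighbors are m2 and m3; m3’s neighbors are m4 and m5;… • [Figure: machines m1 to m6 store files A to F. The request "E?" from m1 is forwarded from neighbor to neighbor until it reaches m5, which holds E and sends back the answer.]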

Chord • Associates with each node and item a unique id in a one-dimensional space • Properties: • Routing table size O(log N), where N is the total number of nodes • Guarantees that a file is found in O(log N) steps

Data Structure • Assume the identifier space is 0..2^m - 1 • Each node maintains: • A finger table • Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i (mod 2^m) • Its predecessor node • An item identified by id is stored on the successor node of id
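
The finger-table rule can be written down directly. The sketch below assumes global knowledge of all node ids, which a real Chord node does not have; the function names and the example ring (ids 0, 1, 2, 6 with m = 3) are illustrative choices.

```python
from bisect import bisect_left


def successor(nodes, ident, m):
    """First node id that equals or follows `ident` on a ring of 2**m ids."""
    ring = sorted(nodes)
    pos = bisect_left(ring, ident % (2 ** m))
    return ring[pos % len(ring)]     # wrap around past the highest id


def finger_table(nodes, n, m):
    """Rows (i, n + 2**i mod 2**m, successor of that id) for node n."""
    return [(i, (n + 2 ** i) % 2 ** m, successor(nodes, n + 2 ** i, m))
            for i in range(m)]


if __name__ == "__main__":
    for node in (0, 1, 2, 6):
        print(node, finger_table({0, 1, 2, 6}, node, 3))
```

Each table has only m rows, which is where the O(log N) routing-state bound comes from.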

Chord Example • Assume an identifier space 0..7 (m = 3) • Node n1:(1) joins → all entries in its finger table are initialized to itself • Succ. table of n1 (i, id+2^i, succ): (0, 2, 1), (1, 3, 1), (2, 5, 1) • [Figure: identifier ring 0..7 with node 1]

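Chord Example • Node n2:(2) joins • Succ. table of n1, id 1 (i, id+2^i, succ): (0, 2, 2), (1, 3, 1), (2, 5, 1) • Succ. table of n2, id 2 (i, id+2^i, succ): (0, 3, 1), (1, 4, 1), (2, 6, 1) • [Figure: identifier ring 0..7 with nodes 1 and 2]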

Chord Example • Nodes n3:(0), n4:(6) join • Succ. tables (i, id+2^i, succ): • Node 0: (0, 1, 1), (1, 2, 2), (2, 4, 6) • Node 1: (0, 2, 2), (1, 3, 6), (2, 5, 6) • Node 2: (0, 3, 6), (1, 4, 6), (2, 6, 6) • Node 6: (0, 7, 0), (1, 0, 0), (2, 2, 2) • [Figure: identifier ring 0..7 with nodes 0, 1, 2 and 6]

Chord Examples • Nodes: n1:(1), n2:(2), n3:(0), n4:(6) • Items: f1:(7), f2:(2) • Succ. tables (i, id+2^i, succ): • Node 0: (0, 1, 1), (1, 2, 2), (2, 4, 6) • Node 1: (0, 2, 2), (1, 3, 6), (2, 5, 6) • Node 2: (0, 3, 6), (1, 4, 6), (2, 6, 6) • Node 6: (0, 7, 0), (1, 0, 0), (2, 2, 2) • By the storage rule, f1 (id 7) is stored on node 0, the successor of 7, and f2 (id 2) on node 2

Query • Upon receiving a query for item id, a node: • Checks whether it stores the item locally • If not, forwards the query to the largest node in its successor table that does not exceed id (measured clockwise around the ring) • [Figure: the same ring, successor tables and items as on the previous slide, with query(7) routed to node 0, which stores item 7]
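
A possible rendering of this query procedure, written with circular intervals so that "does not exceed id" also works across the wrap-around at 0. It is a sketch under assumptions: finger tables are computed from global knowledge, the helper names `between` and `lookup` are made up here, and the starting node (1) for query(7) is chosen for illustration.

```python
from bisect import bisect_left


def between(x, a, b, size, inclusive_right):
    """True if x lies in the circular interval (a, b) or (a, b] on the ring."""
    x, a, b = x % size, a % size, b % size
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b            # interval wraps past 0; a == b spans the ring


def lookup(nodes, start, key, m):
    """Route a query for `key` from node `start`; return (owner, path)."""
    size = 2 ** m
    ring = sorted(nodes)

    def succ(ident):
        return ring[bisect_left(ring, ident % size) % len(ring)]

    fingers = {n: [succ(n + 2 ** i) for i in range(m)] for n in ring}
    pred = {n: ring[i - 1] for i, n in enumerate(ring)}

    node, path = start, [start]
    while True:
        if between(key, pred[node], node, size, True):
            return node, path        # this node stores the item locally
        successor = fingers[node][0]
        if between(key, node, successor, size, True):
            nxt = successor          # the immediate successor owns the key
        else:
            # Farthest table entry that still precedes the key on the ring.
            nxt = next(f for f in reversed(fingers[node])
                       if between(f, node, key, size, False))
        node = nxt
        path.append(node)


if __name__ == "__main__":
    print(lookup({0, 1, 2, 6}, start=1, key=7, m=3))
```

Under these assumptions the query for id 7 travels from node 1 to node 6 and ends at node 0, the successor of 7.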

Content Addressable Network (CAN) • Associates with each node and item a unique id in a d-dimensional space • Properties: • Routing table size O(d) • Guarantees that a file is found in at most d*n^(1/d) steps, where n is the total number of nodes
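
To get a feel for the bound quoted on this slide, the short calculation below evaluates it for a few dimensions and sets it next to a log2(n) figure for Chord. The constant factor of 1 in the Chord figure is an assumption made only for comparison; the O(log N) bound itself does not fix it.

```python
import math


def can_hops(n, d):
    """The CAN bound quoted above: at most d * n**(1/d) hops."""
    return d * n ** (1.0 / d)


def chord_hops(n):
    """Chord's O(log N) bound, taken here as log2(n) hops (assumed constant)."""
    return math.log2(n)


if __name__ == "__main__":
    n = 1_000_000
    for d in (2, 4, 10):
        print(f"d={d}: CAN <= {can_hops(n, d):.1f} hops, routing state O({d})")
    print(f"Chord: about {chord_hops(n):.1f} hops, routing state O(log N)")
```

Raising d shortens paths at the price of a larger routing table, which is the state versus efficiency tradeoff in miniature.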

CAN Example: Two Dimensional Space • Space divided between nodes • Together, the nodes cover the entire space • Each node covers either a square or a rectangular area with side ratio 1:2 or 2:1 • Example: • Assume space size (8 x 8) • Node n1:(1, 2) is the first node that joins → it covers the entire space • [Figure: 8 x 8 grid with n1 at (1, 2)]

CAN Example: Two Dimensional Space • Node n2:(4, 2) joins → space is divided between n1 and n2 • [Figure: 8 x 8 grid with n1 and n2]

CAN Example: Two Dimensional Space • Node n3:(3, 5) joins → space is divided between n1 and n3 • [Figure: 8 x 8 grid with n1, n2 and n3]

CAN Example: Two Dimensional Space • Nodes n4:(5, 5) and n5:(6, 6) join • [Figure: 8 x 8 grid with n1 to n5]

CAN Example: Two Dimensional Space • Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6) • Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5) • [Figure: 8 x 8 grid with nodes n1 to n5 and items f1 to f4]

CAN Example: Two Dimensional Space • Each item is stored by the node that owns its mapping in the space • [Figure: 8 x 8 grid with nodes n1 to n5 and items f1 to f4]

CAN: Query Example • Each node knows its neighbors in the d-space • Forward the query to the neighbor that is closest to the query id • Example: assume n1 queries f4 • [Figure: 8 x 8 grid with nodes n1 to n5 and items f1 to f4]
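
The greedy forwarding rule can be sketched for the two-dimensional example. The zone boundaries below are an assumption inferred from the join order (each join halves an existing zone); the slides give only node and item coordinates. The sketch also ignores the wrap-around of CAN's coordinate space, and the names `Zone`, `owner_of` and `route` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    owner: str
    x0: float
    x1: float
    y0: float
    y1: float  # the zone is the half-open rectangle [x0, x1) x [y0, y1)

    def contains(self, p):
        return self.x0 <= p[0] < self.x1 and self.y0 <= p[1] < self.y1

    def distance_to(self, p):
        """Euclidean distance from point p to the nearest point of the zone."""
        dx = max(self.x0 - p[0], 0, p[0] - self.x1)
        dy = max(self.y0 - p[1], 0, p[1] - self.y1)
        return (dx * dx + dy * dy) ** 0.5


def are_neighbors(a, b):
    """Zones are neighbors if they share a border segment, not just a corner."""
    overlap_x = min(a.x1, b.x1) - max(a.x0, b.x0)
    overlap_y = min(a.y1, b.y1) - max(a.y0, b.y0)
    return (overlap_x == 0 and overlap_y > 0) or (overlap_y == 0 and overlap_x > 0)


def owner_of(zones, point):
    """The node whose zone contains the point stores the item mapped there."""
    return next(z.owner for z in zones if z.contains(point))


def route(zones, start, target):
    """Greedy routing: always hand the query to the neighbor nearest the target."""
    by_owner = {z.owner: z for z in zones}
    current, path = by_owner[start], [start]
    while not current.contains(target):
        options = [z for z in zones if are_neighbors(current, z)]
        current = min(options, key=lambda z: (z.distance_to(target), z.owner))
        path.append(current.owner)
        if len(path) > len(zones):
            raise RuntimeError("greedy routing made no progress")
    return path


# Assumed zones for the 8 x 8 example after n1..n5 have joined.
ZONES = [Zone("n1", 0, 4, 0, 4), Zone("n2", 4, 8, 0, 4), Zone("n3", 0, 4, 4, 8),
         Zone("n4", 4, 6, 4, 8), Zone("n5", 6, 8, 4, 8)]

print(owner_of(ZONES, (7, 5)))       # which node stores f4
print(route(ZONES, "n1", (7, 5)))    # n1 queries f4
```

Each node only needs the zones of its direct neighbors to make this decision, which matches the O(d) routing-table size.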

2.3 A taxonomy of P2P systems

2.4 A summary of P2P systems features and performance

2.5 DHTs • Chord • Pastry • Tapestry • CAN • D2B • Koorde • ……

What are DHTs? • [Figure, adopted from Frans Kaashoek: a distributed application calls put(key, data) and get(key) → data on a distributed hash table that is spread across many nodes] • A DHT provides the information lookup service for P2P applications. • Nodes are uniformly distributed across the key space • Nodes form an overlay network • Nodes maintain a list of neighbours in a routing table • Decoupled from the physical network topology

Using DHTs • One way of using the DHT abstraction is to associate a name with each object of interest and hash that name to a key in an m-bit virtual address space. The virtual address space is partitioned into sections, each a contiguous region of the address space. Each P2P node is responsible for one or more sections and maintains copies of those key-value bindings whose keys lie within its assigned sections.
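
A toy, single-process version of this usage pattern is sketched below. It assumes equal-width sections, one per node, and SHA-1 as the hash; the class name `SectionedDHT` and the node names are hypothetical, and there is no networking, routing or replication here.

```python
import hashlib
from bisect import bisect_right

M = 16  # assumed width of the virtual address space, in bits


def key_for(name, m=M):
    """Hash an object name to a key in the m-bit virtual address space."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)


class SectionedDHT:
    """Key space split into contiguous sections, one section per node."""

    def __init__(self, node_names, m=M):
        self.m = m
        self.nodes = list(node_names)
        width = 2 ** m // len(self.nodes)
        self.starts = [i * width for i in range(len(self.nodes))]
        self.store = {name: {} for name in self.nodes}

    def node_for(self, key):
        # The responsible node owns the section whose start is the last one <= key.
        return self.nodes[bisect_right(self.starts, key) - 1]

    def put(self, name, data):
        key = key_for(name, self.m)
        self.store[self.node_for(key)][key] = data
        return key

    def get(self, name):
        key = key_for(name, self.m)
        return self.store[self.node_for(key)].get(key)


dht = SectionedDHT(["node-a", "node-b", "node-c", "node-d"])
dht.put("song.mp3", b"...bytes...")
print(dht.get("song.mp3"))
```

The application sees only `put` and `get`; which node ends up holding a binding is decided entirely by the hash and the section boundaries.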

Why DHTs? • Why Middleware? • Simplifies the development for large-scale distributed Apps • Better security and robustness • Simple API • Why Do We Need DHTs? • Simplifies the development for large-scale distributed Apps • Better security and robustness • Simple API • Exploits P2P resources

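DHTs in Context • [Layered diagram, top to bottom; the calls each layer makes to the layer below are given in parentheses] • User Application (store_file, load_file) • File System, CFS: retrieve and store files, map files to blocks (store_block, load_block) • Reliable Block Storage, DHash: storage, replication, caching (lookup) • DHT, Chord: lookup, routing (send, receive) • Transport, TCP/IP: communication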

3. The Technologies • While P2P has opened up very exciting prospects, many challenges remain. • The basic challenge is that P2P systems are deployed in a dynamic model where nodes can join and leave freely. • The initial steps are towards scalability, reliability and self-organization, and then …

3.1 The basic research issues • For resource-sharing systems, the most popular application of P2P, the key problem is lookup. Research has focused on the following issues: • I. Self Organization • II. Lookup Efficiency • III. Lookup Quality

I. Self Organization • Cost of Maintaining the Structure • Failure Detection • Replication & Cache

Cost of Maintaining the Structure • Most current DHTs depend on periodic checking and correction (the stabilization procedure) to maintain the structure that is crucial to their performance properties. This periodic activity costs a large number of messages, some of them unnecessary, for example when checking stable sections of a routing table. Awareness of this problem motivated research such as [441], in which a network tries to "self-tune" the rate at which it performs periodic stabilization.

Cont. • Papers: • 101 Self-Organization in Peer-to-Peer Systems • 102 Analyzing peer-to-peer traffic across large networks • 441 Controlling the Cost of Reliability in Peer-to-Peer Overlays

Failure Detection • The DHT infrastructure provides robust routing: even in the face of massive failures, most messages reach their intended destination. However, when nodes fail, they take their data with them. Steps must be taken to ensure data availability and durability despite node and network failures. Two primary tools help provide highly available and durable data: • replication • active repair

Cont. • Papers: • 456 Efficient Recovery From Organizational Disconnects in SkipNet • 701 An End-to-End Approach to Network Failure Recovery • 710 Exploring Tradeoffs in Failure Detection in P2P Networks • 981 The Phoenix Recovery System: Rebuilding from the ashes of an Internet catastrophe

Replication & Cache • Replication can be achieved by hashing each object into several different DHT cells. The replicas should be geographically distributed so that, for every host, the round-trip time to the copy nearest that host is as small as possible. The most efficient form of replication, from the standpoint of durability per byte, is achieved through erasure-coding techniques such as Reed-Solomon. • A caching mechanism is important for robustness and efficiency.
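
Two small calculations may help here. The first derives several DHT keys for one object by salting its name, which is one simple way to hash an object into several cells; the salt format and SHA-1 are assumptions. The second compares whole-object replication with an (n, k) erasure code at the same storage overhead, assuming that nodes fail independently and are each available with probability p.

```python
import hashlib
from math import comb


def replica_keys(name, replicas, m=32):
    """Derive `replicas` keys for one object by hashing name plus a replica index."""
    keys = []
    for i in range(replicas):
        digest = hashlib.sha1(f"{name}#{i}".encode("utf-8")).digest()
        keys.append(int.from_bytes(digest, "big") % (2 ** m))
    return keys


def replication_availability(p, r):
    """Object is readable if at least one of r full copies is reachable."""
    return 1 - (1 - p) ** r


def erasure_availability(p, n, k):
    """Object is readable if at least k of its n coded fragments are reachable."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    print(replica_keys("song.mp3", 3))
    # Both schemes below store twice the object's size.
    print(replication_availability(0.9, 2))
    print(erasure_availability(0.9, 8, 4))
```

Under these assumptions, with p = 0.9 the (8, 4) code is more available than two full copies for the same number of stored bytes; with unreliable nodes (for example p = 0.5) the comparison reverses, so the choice depends on node availability.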

Cont. • Papers: • 004 Replication Strategies in Unstructured Peer-to-Peer Networks • 426 Dynamic Replica Placement for Scalable Content Delivery • 430 Erasure Coding vs. Replication: A Quantitative Comparison • 972 A Random structure for optimum cache size distributed hash table (DHT) peer-to-peer design

II. Lookup Efficiency • Logic Hops • Proximity-aware routing • Goal: minimize end-to-end overlay path latency, not only the number of logic hops • Load balance/hot spot problem

Logic Hops • Tradeoff between space and time (state vs. efficiency tradeoff): from O(log N) state vs. O(log N) hops to O(d) state vs. O(log N) hops • The key: the underlying graph (tree, Chord ring, torus) • Papers: • 200 Scalable Peer-to-Peer Indexing with Constant State • 962 Structured Superpeers: Leveraging Heterogeneity to Provide Constant-Time Lookup • 1010 A novel scalable and constant cache size DHT Peer-to-Peer design

Proximity-aware routing (delay) • The technologies: • Proximity neighbor selection (PNS) • The neighbors in the routing table are chosen based on their proximity. (Tapestry, Pastry, Chord) • Proximity route selection (PRS) • Once the routing table is chosen, the choice of next hop when routing to a particular destination depends on the proximity of the neighbors. (CAN, Chord) • Proximity identifier selection (PIS) • Node identifiers are picked based on the nodes' geographic location. • Plain << FRS << FNS ≈ FNS+FRS
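
The difference between PNS and PRS is where the latency measurement is applied: when the routing table is filled, or when a next hop is picked from an already fixed table. A minimal sketch, with hypothetical function names and made-up latency figures:

```python
def pns_choose_neighbor(candidates, latency_ms):
    """PNS: among all nodes eligible for a routing-table slot, keep the closest."""
    return min(candidates, key=lambda node: (latency_ms[node], node))


def prs_choose_next_hop(table, makes_progress, latency_ms):
    """PRS: among table entries that move the query toward the key, use the closest.

    `makes_progress` is a caller-supplied predicate; what counts as progress
    depends on the overlay (ring distance, prefix match, zone distance, ...).
    """
    usable = [node for node in table if makes_progress(node)]
    if not usable:
        return None
    return min(usable, key=lambda node: (latency_ms[node], node))


latency_ms = {"a": 80, "b": 12, "c": 45}   # assumed round-trip times
print(pns_choose_neighbor(["a", "b", "c"], latency_ms))
print(prs_choose_next_hop(["a", "b", "c"], lambda n: n != "b", latency_ms))
```

PNS needs an overlay that leaves some freedom in which node fills a slot, while PRS needs several table entries that all make progress toward the destination.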

Cont. • Papers: • 602 Topology-aware routing in structured peer-to-peer overlay networks • 709 Optimizations for Locality-Aware Structured Peer-to-Peer Overlays • 712 Secure proximity aware routing for structured p2p overlays • 963 Exploiting Network Proximity in Distributed Hash Tables • 1015 The Impact of DHT Routing Geometry on Resilience and Proximity
