Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit

Week 2: P2P Overlay Networks Minseok Kwon Department of Computer Science Rochester Institute of Technology jmk@cs.rit.edu http://www.cs.rit.edu/~jmk

Peer-to-Peer (P2P) Networks
• Lookup problem: Given a network of peers and a keyword, how can we find the data?
[Figure: a publisher stores Key = “Davinci Code”, File = mpg data at one peer on the Internet; a client issues Lookup(“Davinci Code”) and must locate that peer.]

Structured P2P
[Figure: a client issues Lookup(“title”) into a network of peers N1 through N8.]
• Can we find a target efficiently, specifically in O(log n) steps?

Distributed Hash Table (DHT)
• Question: Can we do the lookup in logarithmic time? If so, how?
• From CS 3, we know how to do this when the numbers are sorted in an array: run binary search!
• In P2P networks, we first compute node identifiers and key identifiers using a hash function.
  • Key identifier = SHA-1(keyword)
  • Node identifier = SHA-1(IP address)
• We then arrange nodes and data items in a certain order in the same ID space. This is called a DHT!
• Now, how can we map key IDs to node IDs to support the lookup?
• At the same time, the system must be scalable, robust, self-organizing, and completely decentralized.
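A minimal sketch of the idea: hash keywords and IP addresses into one shared ID space, then map each key ID to a node ID with a binary search. Two choices here are assumptions, not part of the slide: the SHA-1 digest is truncated to a 7-bit space (to match the Chord ring used later), and a key is assigned to the first node ID at or after it, wrapping around (Chord's "successor" rule). The sorted global list is only for illustration; in a real DHT no single node holds the full list of node IDs.

```python
import hashlib
from bisect import bisect_left

M = 7  # assumption: keep only 7 bits of SHA-1 so the IDs stay readable

def to_id(text: str, m: int = M) -> int:
    """Hash a keyword or an IP address into the shared m-bit ID space."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** m)

def responsible_node(node_ids, key_id):
    """Binary-search the sorted node IDs for the first ID >= key_id, wrapping to the smallest."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]

if __name__ == "__main__":
    # Hypothetical peers; two IPs could collide in a 7-bit space, which the dict tolerates.
    nodes = {to_id(ip): ip for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]}
    key = to_id("Davinci Code")
    owner = responsible_node(nodes, key)
    print(f"key {key} -> node {owner} ({nodes[owner]})")
```

Because the same hash function is used on both sides, any peer that knows the keyword can recompute the key ID and find the responsible node.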

CAN: Insert/Retrieve Files
• Node I wants to add file F with keyword K.
  • First, compute the target coordinates x = hx(K) and y = hy(K).
  • Second, route to the target (x, y).
  • Third, store (K, F) at the node that owns the target point.
• How can we retrieve F later?
  • Answer: compute the target coordinates using the same hash functions, then route there.
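The insert/retrieve steps can be sketched as follows, under several assumptions: hx and hy are built here by salting SHA-1 with "x" and "y" (hypothetical choices; the slide only says two hash functions), the coordinate space is the unit square, and the space is split into a fixed 2x2 grid of zones owned by hypothetical nodes A to D. Routing to (x, y) is skipped; `owner()` simply finds the zone containing the point.

```python
import hashlib

def _unit_hash(salt: str, key: str) -> float:
    """Map a key to [0, 1) using SHA-1 with a salt (assumed construction)."""
    digest = hashlib.sha1(f"{salt}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2 ** 64

def hx(key: str) -> float:
    return _unit_hash("x", key)

def hy(key: str) -> float:
    return _unit_hash("y", key)

# Hypothetical CAN state: zone name -> (x0, x1, y0, y1), plus each node's local store.
ZONES = {"A": (0.0, 0.5, 0.0, 0.5), "B": (0.5, 1.0, 0.0, 0.5),
         "C": (0.0, 0.5, 0.5, 1.0), "D": (0.5, 1.0, 0.5, 1.0)}
STORE = {name: {} for name in ZONES}

def owner(x: float, y: float) -> str:
    """Stand-in for routing: return the node whose zone contains (x, y)."""
    for name, (x0, x1, y0, y1) in ZONES.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name
    raise ValueError("point outside the coordinate space")

def insert(key: str, file: bytes) -> str:
    node = owner(hx(key), hy(key))
    STORE[node][key] = file
    return node

def retrieve(key: str):
    # Same hash functions, so the same target point and the same node.
    return STORE[owner(hx(key), hy(key))].get(key)
```

For example, `insert("Davinci Code", b"mpg data")` followed by `retrieve("Davinci Code")` returns the stored bytes, because both calls hash the keyword to the same point.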

CAN: Routing
• Greedy approach
  • Forward the message to the neighbor whose zone is closest to the target.
[Figure: a message routed greedily from (a, b) to (x, y).]
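A minimal sketch of greedy forwarding, assuming an idealized CAN in which the unit torus (CAN's coordinate space wraps around) is split into a K x K grid of equal zones, one node per zone. Each node knows only its 2d = 4 edge neighbors and forwards to the one whose zone center is closest to the target. `K`, `zone_of`, and `greedy_route` are hypothetical names; real CAN zones are uneven, and targets lying exactly on a zone boundary are not handled here.

```python
import math

K = 4  # assumption: a 4 x 4 grid of equal zones

def zone_of(x, y):
    """Zone (i, j) that owns point (x, y)."""
    return (int(x * K) % K, int(y * K) % K)

def center(zone):
    i, j = zone
    return ((i + 0.5) / K, (j + 0.5) / K)

def torus_dist(p, q):
    """Euclidean distance on the unit torus."""
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    return math.hypot(min(dx, 1 - dx), min(dy, 1 - dy))

def neighbors(zone):
    """The 2d = 4 zones sharing an edge with this zone."""
    i, j = zone
    return [((i + 1) % K, j), ((i - 1) % K, j), (i, (j + 1) % K), (i, (j - 1) % K)]

def greedy_route(start_zone, target):
    """Hop to the neighbor closest to the target until reaching the zone that owns it."""
    goal = zone_of(*target)
    path = [start_zone]
    cur = start_zone
    while cur != goal:
        cur = min(neighbors(cur), key=lambda z: torus_dist(center(z), target))
        path.append(cur)
    return path
```

Each hop strictly shrinks the distance to an interior target, so the walk cannot loop; from zone (0, 0) to the point (0.6, 0.6) it takes 4 hops, two per dimension.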

CAN: Node Join
• A new node initially contacts a random node I that is already in the system.
• I routes the join request to a randomly picked target point (p, q), which lies in node X's zone.
• Node X's zone is split in half, and the new node takes over one half.
[Figure: the join request travels from I to the point (p, q) in node X's zone.]
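The zone split can be sketched as below. Assumptions: 2-D zones on the unit square, the caller supplies (p, q) (a real join would draw it at random), and a zone is halved along its longer side, which is one simple rule chosen here rather than the one the slide prescribes. Moving stored keys into the new half is omitted.

```python
from dataclasses import dataclass

@dataclass
class Zone:
    x0: float
    x1: float
    y0: float
    y1: float

    def contains(self, p: float, q: float) -> bool:
        return self.x0 <= p < self.x1 and self.y0 <= q < self.y1

    def split(self):
        """Halve along the longer side (assumed rule); return (half kept, half given away)."""
        if self.x1 - self.x0 >= self.y1 - self.y0:
            mid = (self.x0 + self.x1) / 2
            return Zone(self.x0, mid, self.y0, self.y1), Zone(mid, self.x1, self.y0, self.y1)
        mid = (self.y0 + self.y1) / 2
        return Zone(self.x0, self.x1, self.y0, mid), Zone(self.x0, self.x1, mid, self.y1)

def join(zones: dict, new_node: str, p: float, q: float) -> str:
    """Find the node X owning (p, q); X keeps one half and new_node takes the other."""
    x = next(name for name, z in zones.items() if z.contains(p, q))
    zones[x], zones[new_node] = zones[x].split()
    return x
```

Only X's zone changes, which is why a join touches a single existing node plus its immediate neighbors, who must learn about the new boundary.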

How Good is CAN?
• Suppose that there are n nodes and d dimensions.
• Scalable?
  • Inserting a new node affects only one existing node and its immediate neighbors.
  • Each node maintains state only for its immediate neighbors, of which there are 2d.
• Efficient?
  • Average routing path length: (d · n^(1/d))/4 hops.
• Robust?
  • Resilient to node and link failures.
  • No single point of failure, since the system is completely distributed.
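One way to see where (d · n^(1/d))/4 comes from, assuming the idealized case of n = k^d equal zones on a d-dimensional torus with k even: a greedy hop fixes one coordinate by one zone, so the hop count is the wrap-around Manhattan distance. Along one dimension the average ring distance over k zones is k/4, and summing over d dimensions with k = n^(1/d) gives the formula. The sketch below checks this by brute force; the function names are hypothetical.

```python
from itertools import product

def avg_greedy_hops(k: int, d: int) -> float:
    """Average hops from zone (0, ..., 0) to every zone of a k^d torus grid
    (by symmetry this equals the average over all source/target pairs)."""
    ring = [min(i, k - i) for i in range(k)]   # per-dimension wrap-around distance
    total = sum(sum(ring[c] for c in zone) for zone in product(range(k), repeat=d))
    return total / k ** d

def can_formula(n: int, d: int) -> float:
    return d * n ** (1 / d) / 4
```

For k = 4 and d = 2 (n = 16) both give 2 hops; for k = 8 and d = 3 (n = 512) both give 6 hops, up to floating-point rounding in n^(1/d).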

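Chord: Basic Lookup
• Circular 7-bit ID space (IDs 0 to 127). Key 5 is written K5; node 105 is written N105.
• A key is stored at its successor: the first node whose ID is equal to or follows the key ID clockwise around the ring.
[Figure: a ring with nodes N32, N90, and N105 and keys K5, K20, and K80.]
• Which node is K80 stored at? N90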
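The basic lookup can be sketched under the assumption that each node knows only its successor, so a query walks around the ring one node at a time until the key falls between the current node and its successor. The node set {32, 90, 105} is taken from the figure; `between` and `basic_lookup` are hypothetical helper names.

```python
def between(x, a, b):
    """True if x lies in the ring interval (a, b]; a == b means the whole ring."""
    if a < b:
        return a < x <= b
    return x > a or x <= b   # the interval wraps past 0

def basic_lookup(nodes, start, key):
    """Follow successor pointers from `start`; return (node storing key, hops taken)."""
    ring = sorted(nodes)
    succ = {n: ring[(i + 1) % len(ring)] for i, n in enumerate(ring)}
    n, hops = start, 0
    while not between(key, n, succ[n]):
        n, hops = succ[n], hops + 1
    return succ[n], hops
```

Starting at N90, a lookup for K80 passes N105 and N32 before finding that K80 belongs to N90. In the worst case this takes O(n) hops.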

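Chord: Advanced Lookup
• Finger i of node n points to the successor of n + 2^i (mod 2^m for an m-bit ID space).
[Figure: a ring with nodes N5, N10, N20, N32, N60, N80, N99, N110, and N120; N80's fingers reach 1/32, 1/16, 1/8, 1/4, and 1/2 of the way around the ring; Lookup(K19) jumps along fingers toward K19's successor, N20.]
• Lookup takes O(log N) steps.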
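A sketch of finger-table routing, assuming the 7-bit ring and the node set from the figure. At each step a node forwards the query to its farthest finger that still precedes the key (the "closest preceding finger"), which roughly halves the remaining distance. Finger tables are computed from a global node list purely for illustration; the helper names are hypothetical.

```python
M = 7
RING = 2 ** M

def in_half_open(x, a, b):
    """x in the ring interval (a, b]; a == b means the whole ring."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

def in_open(x, a, b):
    """x in the ring interval (a, b), exclusive at both ends."""
    if a < b:
        return a < x < b
    return x > a or x < b

def successor(nodes, ident):
    """First node ID >= ident, wrapping around to the smallest ID."""
    return next((n for n in sorted(nodes) if n >= ident), min(nodes))

def finger_table(nodes, n):
    """Finger i points to successor(n + 2^i mod 2^M), for i = 0 .. M-1."""
    return [successor(nodes, (n + 2 ** i) % RING) for i in range(M)]

def lookup(nodes, start, key):
    """Route from `start` toward `key`; return (node storing key, forwarding hops)."""
    fingers = {n: finger_table(nodes, n) for n in nodes}
    n, hops = start, 0
    while not in_half_open(key, n, fingers[n][0]):   # fingers[n][0] is n's successor
        # Jump to the farthest finger that still lies before the key.
        nxt = next((f for f in reversed(fingers[n]) if in_open(f, n, key)), None)
        if nxt is None:
            break
        n, hops = nxt, hops + 1
    return fingers[n][0], hops
```

With the figure's nodes, Lookup(K19) from N80 goes N80 → N120 → N10, and N10's successor N20 stores the key: two forwarding hops instead of walking past every node.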

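Chord: Join
• Example: N36 wants to join between N25 and N40; N40 currently stores K30 and K38.
• After the join, N36 is the successor of K30, so K30 moves from N40 to N36, while K38 stays at N40.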
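The key hand-off during a join can be sketched as follows: the new node takes exactly the keys in (predecessor, new ID] from its successor. This assumes a global view of the ring (`stores`, a hypothetical mapping from node ID to its key set) and omits the pointer and finger-table updates a real join also performs.

```python
def between(x, a, b):
    """x in the ring interval (a, b]; a == b means the whole ring."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

def join(stores, new_id):
    """Insert new_id (assumed not already present) and move the keys it now owns."""
    ring = sorted(stores)
    succ = next((n for n in ring if n > new_id), ring[0])
    pred = max((n for n in ring if n < new_id), default=ring[-1])
    moving = {k for k in stores[succ] if between(k, pred, new_id)}
    stores[succ] -= moving      # the old successor gives up keys in (pred, new_id]
    stores[new_id] = moving
    return stores
```

In the slide's example, `join({25: set(), 40: {30, 38}}, 36)` moves K30 to N36 and leaves K38 at N40.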

How Good is Chord?
• Suppose that there are n nodes.
• Scalable?
  • O(log n) state (finger-table entries) per node
• Efficient?
  • O(log n) messages (or steps) per lookup
• Robust?
  • Survives massive failures

Is DHT a Silver Bullet?
• What are the dominant P2P applications?
  • Mass-market file sharing, mostly for music, video, and software
• What are the potential problems with these applications?
  • A wide range of heterogeneity among peers
  • A large, transient user population
• Flooding does not scale; what about DHTs?
  • DHTs scale, especially for finding the location of a given filename.
  • One problem with DHTs: they require a lot of extra work, including caching and keyword search.
• Do we really need DHTs for mass-market file sharing?
  • NOT necessarily!
  • DHTs are great at finding rare files, but most queries are for popular files.

Suggested Solution: GIA
• Can we make Gnutella-like P2P systems scalable?
• Our answer is yes, and we designed GIA!
• Idea
  • Unstructured (based on flooding), but take node capacity into account.
  • High-capacity nodes have room for more queries; why not send more queries to them?
• This will work only if:
  • High-capacity nodes have correspondingly more answers.
  • High-capacity nodes are easily reachable from other nodes.

GIA (Gianduia) Design
• Make high-capacity nodes easily reachable
  • Dynamic topology adaptation
• Make high-capacity nodes have more answers
  • One-hop replication
• Search efficiently
  • Biased random walks
• Prevent overloaded nodes
  • Active flow control

Dynamic Topology Adaptation
• Goals
  • High-capacity nodes are high-degree nodes.
  • Low-capacity nodes are close to higher-capacity ones.
[Figure: a new node attaching to a high-capacity node.]
• How does a new node join?
  • It samples a small number of randomly selected nodes and picks the one with the maximum capacity, provided that capacity exceeds its own.
  • Each node tracks a level of satisfaction, computed from its capacity, degree, and age, to decide whether to keep adding neighbors.
  • Neighbors must have outgoing capacity to handle forwarded queries.
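The join rule on this slide can be sketched as below. The sample size of 3, the capacity numbers, and returning `None` when no sampled node beats the newcomer are all assumptions; the satisfaction-level calculation is not modeled because the slide does not give its formula.

```python
import random

def pick_neighbor(own_capacity, peers, sample_size=3, rng=None):
    """Sample a few random peers (name -> capacity) and return the highest-capacity
    one whose capacity exceeds our own, or None if none qualifies."""
    rng = rng or random.Random()
    candidates = sorted(peers)                  # random.sample needs a sequence
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    better = [p for p in sample if peers[p] > own_capacity]
    if not better:
        return None   # assumption: the slide does not say what happens here
    return max(better, key=lambda p: peers[p])
```

Sampling only a handful of peers keeps the join cheap, while preferring the most capable one steers new edges toward high-capacity nodes, which is what makes them high-degree over time.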

One-hop Replication
• Content information is exchanged when a connection is set up and is then updated incrementally.
• High-capacity peers can act as proxies for low-capacity peers.
• Nodes keep an index of their neighbors' shared files.
[Figure: a high-capacity node maintains indices for both itself and its neighbors.]
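A minimal sketch of one-hop replication: full file lists are exchanged once at connection time, later additions are pushed as incremental updates, and a node answers queries from its own files plus its neighbors' indexed files without forwarding. The `Node` class and its method names are hypothetical; removals and disconnections are not handled.

```python
class Node:
    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        self.neighbors = []      # connected Node objects
        self.index = {}          # neighbor name -> copy of that neighbor's file set

    def connect(self, other):
        """Link two nodes and exchange full content information once."""
        self.neighbors.append(other)
        other.neighbors.append(self)
        self.index[other.name] = set(other.files)
        other.index[self.name] = set(self.files)

    def share(self, filename):
        """Add a file and push only the change to each neighbor's index."""
        self.files.add(filename)
        for n in self.neighbors:
            n.index[self.name].add(filename)

    def lookup(self, filename):
        """Answer from local files plus the one-hop index, without forwarding."""
        hits = [self.name] if filename in self.files else []
        hits += sorted(name for name, fs in self.index.items() if filename in fs)
        return hits
```

A high-capacity hub with many low-capacity neighbors can therefore answer for all of them, which is how "high-capacity nodes have more answers".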

Active Flow Control
• A sender forwards queries to a neighbor only if that neighbor accepts queries.
• How does the sender know whether the neighbor accepts queries?
  • The neighbor has sent it a token.
• Active flow control periodically assigns tokens to neighbors.
• The token allocation rate varies with the node's query-processing capability and its buffer queue.
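The token mechanism can be sketched as follows. Assumptions, none of them from the slide: time advances in rounds, a node processes `capacity` queries per round, it grants at most as many tokens per round as it has room for (bounded by capacity and remaining buffer space), and it splits those tokens evenly among neighbors. GIA's actual allocation policy may weight neighbors differently.

```python
from collections import defaultdict

class FlowControlledNode:
    def __init__(self, name, capacity, buffer_limit):
        self.name = name
        self.capacity = capacity            # queries processed per round (assumed unit)
        self.buffer_limit = buffer_limit    # max queued queries (assumed)
        self.queue = []                     # incoming queries awaiting processing
        self.tokens = defaultdict(int)      # neighbor name -> tokens that neighbor granted us

    def grant_tokens(self, neighbors):
        """Once per round: split the room we have left evenly among neighbors."""
        if not neighbors:
            return
        room = max(0, min(self.capacity, self.buffer_limit - len(self.queue)))
        share, extra = divmod(room, len(neighbors))
        for i, n in enumerate(sorted(neighbors, key=lambda x: x.name)):
            n.tokens[self.name] += share + (1 if i < extra else 0)

    def send(self, neighbor, query):
        """Forward a query only if the neighbor has given us a token."""
        if self.tokens[neighbor.name] == 0:
            return False
        self.tokens[neighbor.name] -= 1
        neighbor.queue.append(query)
        return True

    def process_round(self):
        """Handle up to `capacity` queued queries and return them."""
        done, self.queue = self.queue[:self.capacity], self.queue[self.capacity:]
        return done
```

Because a full buffer means fewer tokens in the next round, senders are pushed back before the receiver is overloaded, rather than after queries have been dropped.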

Search Protocol
• Biased random walk
  • Rather than forwarding an incoming query to a randomly chosen neighbor, a node selects the highest-capacity neighbor for which it holds flow-control tokens and sends the query to that neighbor.
  • Each query carries a GUID, so a node that sees the same query again sends it along a different path.
  • A TTL and max_responses bound how far a query propagates.
• Advantage: reduces flooding
• Disadvantage: sensitive to peer failures
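A single-process sketch of the search protocol on a toy graph. It combines the rules above: forward only to neighbors we hold tokens for, prefer the highest-capacity one, remember per GUID which neighbors each node already used (so a revisit takes a different path), and stop on TTL expiry or after `max_responses` hits. The graph, capacities, and `biased_walk` signature are hypothetical; since the choice is "highest capacity", this walk is deterministic for a given state.

```python
import uuid

def biased_walk(graph, capacity, tokens, files, start, filename, ttl=10, max_responses=2):
    """graph: node -> neighbors; capacity: node -> number;
    tokens: (sender, receiver) -> tokens held; files: node -> set of filenames."""
    guid = uuid.uuid4().hex     # tells this query apart from others at the same node
    forwarded = {}              # (guid, node) -> neighbors already used for this query
    hits, path, node = [], [start], start

    def check(n):
        if filename in files[n] and n not in hits:
            hits.append(n)

    check(start)
    while ttl > 0 and len(hits) < max_responses:
        used = forwarded.setdefault((guid, node), set())
        choices = [n for n in graph[node] if tokens.get((node, n), 0) > 0 and n not in used]
        if not choices:
            break                                      # nowhere eligible to go
        nxt = max(choices, key=lambda n: capacity[n])  # bias toward high capacity
        used.add(nxt)
        tokens[(node, nxt)] -= 1
        node = nxt
        path.append(node)
        ttl -= 1
        check(node)
    return hits, path
```

In a four-node example (L low capacity, H high, M and X holding the file), the query goes L → H → M → H → X: on its second visit to H, the GUID record steers it to X instead of back to M.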

Performance

Transient Behavior • GIA outperforms under heavy churn.

Summary
• GIA: a scalable Gnutella
  • 3-5 orders of magnitude improvement in system capacity
• The unstructured approach is good enough
  • DHTs may be overkill!
