PeerCluster: A Cluster-Based Peer-to-Peer System • Xin-Mao Huang, Cheng-Yue Chang, and Ming-Syan Chen, Fellow, IEEE • ECE 6102 • Qiyu Liu, Ethan Trewhitt
Agenda • Background • Structure • Functional Protocols • Structural Protocols • Scaling • Performance
Background – Existing P2P Systems • Centralized system - Napster • Pro: Low cost to resolve queries • Con: Single point of failure • Decentralized/unstructured - Gnutella • Pro: Fault-tolerant, resilient to nodes joining and leaving • Con: Search mechanism (query flooding) scales poorly • Decentralized/structured - PeerCluster • Same benefits as decentralized/unstructured systems • Cluster structure reduces broadcast flooding
Background – PeerCluster • Principle of interest grouping • A given user has only a few interests • Queries relate to those interests • How to exploit this? • Logically group users with similar interest topics • Increases query efficiency
Background – Query Resolution • When a node receives a query:
    if (query topic == present cluster's interest topic) {
        broadcast to all nodes in the present cluster    // intracluster broadcasting
    } else {
        route to the responsible node in the corresponding interest cluster    // intercluster broadcasting
    }
• Intra- and intercluster broadcasting are the main operations in query resolution • How are they implemented?
Structure – Hypercube • Three interest clusters can be implemented in a 5-D hypercube • Nodes and edges are virtual • Each hypercube address belongs to exactly one computer • However, one computer may hold multiple hypercube addresses
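A minimal Python sketch of the adjacency rule, assuming addresses are stored as plain integers: two hypercube addresses are neighbors exactly when they differ in one bit, so the neighbors of address a are a XOR 2^i for each dimension i. The names `neighbors`, `is_adjacent`, `addresses_of`, and the `owner` map are hypothetical and only illustrate how one computer can hold several addresses.

```python
def neighbors(addr: int, n: int) -> list[int]:
    """Addresses adjacent to `addr` in an n-dimensional hypercube: flip one bit."""
    return [addr ^ (1 << i) for i in range(n)]


def is_adjacent(a: int, b: int) -> bool:
    """True iff the two addresses differ in exactly one bit."""
    diff = a ^ b
    return diff != 0 and (diff & (diff - 1)) == 0


def addresses_of(computer: str, owner: dict[int, str]) -> list[int]:
    """All hypercube addresses held by one computer (hypothetical owner map)."""
    return sorted(a for a, c in owner.items() if c == computer)


if __name__ == "__main__":
    n = 5  # a 5-D hypercube has 2**5 = 32 virtual addresses
    print([format(a, "05b") for a in neighbors(0b00000, n)])
    # ['00001', '00010', '00100', '01000', '10000']
    # Each address has exactly one owner, but computer "A" holds two addresses.
    owner = {0b00000: "A", 0b00001: "A", 0b00010: "B"}
    print(addresses_of("A", owner))  # [0, 1]
```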
Structure – Clusters • Interest-based • Realized as subcubes within the overall system hypercube • Initial cluster size is based on interest popularity, via Huffman coding
Structure – Tree Creation • Assume an n-dimensional hypercube with k different interest topics • $I_j$: the jth interest topic, where $0 \le j \le k - 1$ • $pop[I_j]$: popularity of $I_j$ • $0 < pop[I_j] < 1$ and $\sum_{j=0}^{k-1} pop[I_j] = 1$ • Construct a Huffman tree based on the $pop[I_j]$ values • Cluster size $= 2^{\,n - \mathrm{length}(prefix[I_j])}$
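The tree-creation step can be sketched in Python as follows. This is an illustration under assumptions, not the authors' code: each topic's prefix is taken to be its Huffman codeword, ties are broken by insertion order, and the topic names and popularities in the example are made up.

```python
import heapq
from itertools import count


def huffman_prefixes(pop: dict[str, float]) -> dict[str, str]:
    """Binary Huffman code over interest popularities: topic -> prefix bits."""
    if len(pop) == 1:
        return {topic: "0" for topic in pop}
    tie = count()  # unique tie-breaker so heapq never compares the dicts
    heap = [(p, next(tie), {topic: ""}) for topic, p in pop.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)   # the two least popular subtrees
        p1, _, right = heapq.heappop(heap)
        merged = {t: "0" + code for t, code in left.items()}
        merged.update({t: "1" + code for t, code in right.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]


def cluster_sizes(pop: dict[str, float], n: int) -> dict[str, int]:
    """Cluster size = 2 ** (n - length(prefix[I_j])) for each topic."""
    prefixes = huffman_prefixes(pop)
    if max(len(code) for code in prefixes.values()) > n:
        raise ValueError("hypercube too small for this Huffman tree")
    return {t: 2 ** (n - len(code)) for t, code in prefixes.items()}


if __name__ == "__main__":
    pop = {"music": 0.5, "movies": 0.25, "books": 0.25}
    print(huffman_prefixes(pop))  # {'music': '0', 'movies': '10', 'books': '11'}
    print(cluster_sizes(pop, 5))  # {'music': 16, 'movies': 8, 'books': 8}
```

Because a Huffman tree is a full binary tree, its codeword lengths satisfy the Kraft equality, so the cluster sizes add up to exactly 2^n whenever n is at least the longest prefix length.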
Structure – Routing Table • A routing table is created for each computer • Each computer must track which computers hold its neighboring addresses in order to send messages • addr(A): hypercube addresses owned by computer A • NH(A): neighboring hypercube addresses, $NH(A) = \left(\bigcup_{a_i \in addr(A)} Ne(a_i)\right) \setminus addr(A)$, where $Ne(a_i)$ is the set of hypercube addresses adjacent to address $a_i$
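A short sketch of computing NH(A) and the resulting routing table, assuming integer addresses; `ne`, `nh`, `routing_table`, and the `owner_of` lookup are hypothetical names. The example places computer A at two addresses of a 3-D cube.

```python
def ne(addr: int, n: int) -> set[int]:
    """Ne(a): hypercube addresses adjacent to a (differ in exactly one bit)."""
    return {addr ^ (1 << i) for i in range(n)}


def nh(owned: set[int], n: int) -> set[int]:
    """NH(A): union of Ne(a_i) over all a_i in addr(A), minus addr(A) itself."""
    result: set[int] = set()
    for a in owned:
        result |= ne(a, n)
    return result - owned


def routing_table(owned: set[int], n: int, owner_of: dict[int, str]) -> dict[int, str]:
    """Map every neighboring address to the computer that currently holds it."""
    return {a: owner_of[a] for a in sorted(nh(owned, n))}


if __name__ == "__main__":
    addr_a = {0b000, 0b001}  # computer A holds two addresses in a 3-D cube
    owner_of = {0b010: "B", 0b011: "B", 0b100: "C", 0b101: "D"}
    print(sorted(nh(addr_a, 3)))               # [2, 3, 4, 5]
    print(routing_table(addr_a, 3, owner_of))  # {2: 'B', 3: 'B', 4: 'C', 5: 'D'}
```

Subtracting addr(A) matters: 000 and 001 are adjacent to each other, but a computer never needs a routing entry for an address it already holds.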
Structure – Assigned Tree • The assigned tree records the number of free addresses in every cluster • The root address is the lowest address • Parent and child addresses differ in only 1 bit • A child address is longer than its parent address • Each address manages the assignment of its child addresses • Every address records the number of free addresses under each of its children; initially, each child's count equals the total number of addresses in that child's subtree • When a parent address assigns a free address to a join request, it checks the free-address counts starting from the lowest address
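One way to realize this bookkeeping is sketched below. It assumes the assigned tree is a binomial tree over the cluster's suffix bits, in which the parent of a nonzero address is obtained by clearing its highest set bit; this fits the slide's "differ in 1 bit", "root is the lowest address", and "child is longer" rules, but the exact tree shape is an assumption. For brevity, each address stores the free count of its own subtree, from which a parent reads its children's counts. `AssignedTree` and its methods are hypothetical.

```python
from __future__ import annotations


class AssignedTree:
    """Free-address bookkeeping for one cluster of 2**m addresses.

    Assumption: binomial tree in which parent(x) is x with its highest set bit
    cleared, so each child differs from its parent in one higher-order bit.
    """

    def __init__(self, m: int):
        self.m = m
        self.taken: set[int] = set()
        # free[x] = free addresses in x's subtree (initially the whole subtree)
        self.free = {x: self._subtree_size(x) for x in range(1 << m)}

    @staticmethod
    def _high_bit(x: int) -> int:
        return x.bit_length() - 1  # -1 for the root address 0

    def _subtree_size(self, x: int) -> int:
        return 1 << (self.m - 1 - self._high_bit(x))

    def children(self, x: int) -> list[int]:
        """Child addresses in ascending order (lowest address checked first)."""
        return [x | (1 << i) for i in range(self._high_bit(x) + 1, self.m)]

    def parent(self, x: int) -> int:
        return x ^ (1 << self._high_bit(x))

    def assign(self) -> int | None:
        """Hand out one free address, or None if the cluster is full."""
        if self.free[0] == 0:
            return None  # cluster full: a cluster expansion would be needed
        x = 0
        while x in self.taken:
            # Descend into the lowest child whose subtree still has a free address.
            x = next(c for c in self.children(x) if self.free[c] > 0)
        self.taken.add(x)
        node = x
        while True:  # one fewer free address for x and every ancestor
            self.free[node] -= 1
            if node == 0:
                break
            node = self.parent(node)
        return x


if __name__ == "__main__":
    tree = AssignedTree(2)  # a 4-address cluster
    print([tree.assign() for _ in range(5)])  # [0, 1, 3, 2, None]
```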
Functional Protocol – Broadcast
Proc_Broadcast(subq, msg, node_addr, step)
    for (i = step to subq - 1) {
        dest_addr = node_addr xor 2^i;
        send(subq, msg, dest_addr, i + 1);    // the receiver continues from dimension i + 1
    }
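The broadcast can be checked with a small Python simulation. It assumes `subq` is the dimension of the cluster's subcube and that clusters are identified by their high-order prefix bits, so flipping only the low `subq` bits keeps the message inside the cluster. Under those assumptions every address in the subcube receives the message exactly once, using 2^subq - 1 messages. `broadcast` is a hypothetical name.

```python
from collections import deque


def broadcast(subq: int, origin: int) -> tuple[dict[int, int], int]:
    """Simulate Proc_Broadcast inside a subcube of dimension `subq`.

    Returns (times each address received the message, total messages sent).
    Only the low `subq` bits are flipped, so the higher (cluster-prefix) bits
    of `origin` never change.
    """
    received = {origin: 1}          # the originator already has the message
    pending = deque([(origin, 0)])  # (node_addr, step) pairs waiting to forward
    messages = 0
    while pending:
        node_addr, step = pending.popleft()
        for i in range(step, subq):
            dest_addr = node_addr ^ (1 << i)
            messages += 1
            received[dest_addr] = received.get(dest_addr, 0) + 1
            pending.append((dest_addr, i + 1))  # receiver forwards on higher dims only
    return received, messages


if __name__ == "__main__":
    got, sent = broadcast(3, 0b11000)  # cluster "11" in a 5-D cube: 8 addresses
    print(sorted(got), sent)  # [24, 25, 26, 27, 28, 29, 30, 31] 7
```

Passing i + 1 rather than i is what prevents duplicates: each receiver only flips bits above the one that delivered the message, which makes the forwarding pattern a spanning tree of the subcube.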
Functional Protocol – Route
Proc_Route(msg, dest_addr, node_addr)
    if (dest_addr != node_addr) {
        i = Compare(dest_addr, node_addr);    // a bit position where the two addresses differ
        send(msg, dest_addr, node_addr xor 2^i);
    }
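The slide does not say which differing bit Compare returns; the sketch below assumes the lowest one (dimension-ordered routing). Under that assumption each hop fixes one differing bit, so a route takes as many hops as the Hamming distance between the two addresses. `compare` and `route` are hypothetical names.

```python
def compare(dest_addr: int, node_addr: int) -> int:
    """Hypothetical Compare(): index of the lowest bit where the addresses differ."""
    diff = dest_addr ^ node_addr
    return (diff & -diff).bit_length() - 1


def route(src: int, dest: int) -> list[int]:
    """Apply Proc_Route hop by hop; returns every address the message visits."""
    path = [src]
    node = src
    while node != dest:
        i = compare(dest, node)
        node ^= 1 << i  # forward to the neighbor that agrees with dest on bit i
        path.append(node)
    return path


if __name__ == "__main__":
    print([format(a, "03b") for a in route(0b000, 0b101)])  # ['000', '001', '101']
```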
JOIN Protocol • Joining computer A contacts any computer B already in the system • A asks B to find a computer C with the same major interest • A asks C to find a computer D that holds an available alias address* • A takes the available address and notifies its neighbors • Computer D notifies its parent addresses in the assigned tree that one fewer address is available *If there are no available addresses, a cluster expansion must be performed
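The address hand-off at the heart of JOIN can be sketched as below, assuming the cluster's addresses and their owners are visible in one place. The message exchange with B and C, the neighbor notifications, and the assigned-tree update are left out; `join`, `clusters`, and `owner` are hypothetical.

```python
from collections import Counter


def join(new: str, interest: str, clusters: dict[str, list[int]],
         owner: dict[int, str]) -> int:
    """Hypothetical JOIN step: give `new` one alias address in its interest cluster.

    A computer holding more than one address in the cluster plays the role of D
    and hands one of its alias addresses to the joining computer.
    """
    held = Counter(owner[a] for a in clusters[interest])
    for addr in sorted(clusters[interest], reverse=True):  # D keeps its lowest address
        holder = owner[addr]
        if held[holder] > 1:
            owner[addr] = new
            return addr
    raise RuntimeError("no alias address available: cluster expansion required")


if __name__ == "__main__":
    clusters = {"music": [0, 1, 2, 3]}
    owner = {0: "B", 1: "B", 2: "C", 3: "C"}
    print(join("A", "music", clusters, owner))  # 3
    print(join("E", "music", clusters, owner))  # 1
```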
LEAVE Protocol • Leaving computer A finds the computer B that holds the root (smallest) address of the cluster • A donates its address (and any aliases) to computer B • Computer B notifies its neighbors that A has left
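A matching sketch of the LEAVE hand-off, using the same kind of hypothetical `owner` map. The slide does not cover the case where the leaving computer itself holds the root address; the sketch assumes that the holder of the smallest address not owned by the leaving computer takes over in that case.

```python
def leave(leaving: str, cluster: list[int], owner: dict[int, str]) -> str:
    """Hypothetical LEAVE step: donate every address of `leaving` to the root holder.

    The root is the cluster's smallest address. Assumption: if `leaving` holds
    the root itself, the holder of the smallest remaining address takes over.
    """
    remaining = sorted(a for a in cluster if owner[a] != leaving)
    if not remaining:
        raise RuntimeError("last computer in the cluster; no one can take over")
    heir = owner[remaining[0]]
    for a in cluster:
        if owner[a] == leaving:
            owner[a] = heir  # the donated address becomes an alias of the heir
    return heir


if __name__ == "__main__":
    owner = {0: "B", 1: "A", 2: "C", 3: "A"}
    print(leave("A", [0, 1, 2, 3], owner))  # B
    print(owner)  # {0: 'B', 1: 'B', 2: 'C', 3: 'B'}
```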
SEARCH Protocol • Searching computer A wants to find an item • A sends the query to computer B in the corresponding interest cluster whose address has the same postfix as A's address • Computer B broadcasts the query within its cluster • Computers in the queried cluster respond directly to A with relevant results
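Choosing computer B can be sketched as bit arithmetic, assuming each cluster is the subcube of n-bit addresses that begin with its prefix: B's address is the target prefix followed by the postfix bits of A's own address. Because the two addresses then agree on every postfix bit, routing from A only has to correct prefix bits. `search_entry` is a hypothetical name.

```python
def search_entry(src: int, prefix: str, n: int) -> int:
    """Address in the target cluster that shares `src`'s postfix (low-order) bits.

    The target cluster is the subcube of n-bit addresses that start with `prefix`.
    """
    suffix_len = n - len(prefix)
    suffix = src & ((1 << suffix_len) - 1)  # postfix bits of the searcher
    head = int(prefix, 2) if prefix else 0  # the target cluster's prefix bits
    return (head << suffix_len) | suffix


if __name__ == "__main__":
    dest = search_entry(0b01101, "11", 5)
    print(format(dest, "05b"))             # 11101
    print(bin(0b01101 ^ dest).count("1"))  # 1 (only a prefix bit differs)
```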
Cluster Expansion • Runs whenever a computer wants to join but its cluster is full • Query the utilization rates of neighboring clusters • Choose a neighboring cluster • The neighboring cluster splits and loans the upper half of its addresses • Computers holding upper-half addresses rejoin within the lower half
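The address arithmetic of the split can be sketched as follows. The slide only says a neighboring cluster is chosen after querying utilization rates; picking the least-utilized one is an assumed policy. Clusters are again assumed to be prefix-identified subcubes, so the upper half is the half whose first bit after the prefix is 1. The rejoin of displaced computers is not modeled, and all function names are hypothetical.

```python
def subcube(prefix: str, n: int) -> list[int]:
    """All n-bit addresses that start with `prefix`."""
    free_bits = n - len(prefix)
    base = (int(prefix, 2) if prefix else 0) << free_bits
    return list(range(base, base + (1 << free_bits)))


def choose_neighbor(utilization: dict[str, float]) -> str:
    """Assumed policy: borrow from the least-utilized neighboring cluster."""
    return min(utilization, key=utilization.__getitem__)


def split_cluster(prefix: str, n: int) -> tuple[list[int], list[int]]:
    """Split a cluster's subcube into (lower half kept, upper half loaned)."""
    if len(prefix) >= n:
        raise ValueError("a one-address cluster cannot be split")
    return subcube(prefix + "0", n), subcube(prefix + "1", n)


if __name__ == "__main__":
    utilization = {"10": 0.9, "11": 0.4}  # neighbor prefix -> fraction in use
    chosen = choose_neighbor(utilization)
    lower, upper = split_cluster(chosen, 5)
    print(chosen, lower, upper)  # 11 [24, 25, 26, 27] [28, 29, 30, 31]
```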
Cluster Expansion Issues • Expansion and splitting fragment clusters • A cluster may no longer form a single hypercube • System restoration consolidates clusters • If the cluster can't be expanded or the system is full, the system itself must be expanded
System Expansion • Easier than cluster expansion • Every address gains an additional bit, so the entire system doubles in size • Each node (address) becomes two • Each cluster doubles in size
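A sketch of system expansion, assuming the new bit is appended at the least-significant end of every address: address a becomes 2a and 2a + 1, both held by the computer that held a. With that choice every cluster keeps its prefix and doubles in size. `expand_system` and the `owner` map are hypothetical.

```python
def expand_system(owner: dict[int, str]) -> dict[int, str]:
    """Double the hypercube by appending a new low-order bit to every address.

    Assumption: the extra bit goes at the least-significant end, so each
    cluster keeps its prefix and its size doubles.
    """
    new_owner: dict[int, str] = {}
    for addr, computer in owner.items():
        new_owner[addr << 1] = computer        # old address followed by 0
        new_owner[(addr << 1) | 1] = computer  # old address followed by 1
    return new_owner


if __name__ == "__main__":
    print(expand_system({0: "A", 1: "B", 2: "C", 3: "C"}))
    # {0: 'A', 1: 'A', 2: 'B', 3: 'B', 4: 'C', 5: 'C', 6: 'C', 7: 'C'}
```

In the example, the 2-D cluster with prefix "1" (addresses 2 and 3) becomes the 3-D cluster with the same prefix (addresses 4 to 7).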
Performance Setup • Uses data from the Open Directory Project • Compares Gnutella and PeerCluster • Measured "query efficiency", the ratio of files found to query messages sent • Varied the Search Limit (SL), which acts like a TTL value • Also varied the number of interest clusters • Base 4 vs. base 2