Resource Discovery in Unstructured P2P Networks Distributed Systems Research Seminar on 22.3.2007

Mikko Vapa, research student, P2P Computing Group, Department of Mathematical Information Technology, http://www.mit.jyu.fi/cheesefactory

Resource Discovery

Resource Discovery Problem • In the peer-to-peer (P2P) resource discovery problem, any node in the network can possess resources and can also query other nodes for these resources [Figure: in a network of four nodes, Node 1 asks "Where is ?"]

A Simple Solution for the Problem • Gnutella, the most studied P2P network, used a Breadth-First Search (BFS) flooding algorithm that sends the query to all neighbors • Problems: all resources in the network can be found, but the network gets congested and many packets are useless [Figure: Node 1 floods the query "Where is ?" to its neighbors; Node 2 and Node 4 both reply "I have it!"]
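A minimal sketch of BFS flooding, assuming the overlay is given as an adjacency dict and that every transmission (including duplicates the receiver drops) counts as one query packet. The optional `ttl` parameter mimics a Gnutella-style hop limit; the names `bfs_flood`, `graph` and `holders` are illustrative and not taken from any real client.

```python
from collections import deque

def bfs_flood(graph, source, holders, ttl=None):
    """Flood a query from `source`; return (holders found, query packets sent)."""
    seen = {source}
    found = set()
    packets = 0
    queue = deque([(source, None, 0)])  # (node, node it came from, hops so far)
    while queue:
        node, sender, hops = queue.popleft()
        if ttl is not None and hops >= ttl:
            continue  # hop limit reached: do not forward further
        for neighbor in graph[node]:
            if neighbor == sender:
                continue  # never send the query straight back
            packets += 1  # every transmission costs a packet, duplicates included
            if neighbor in seen:
                continue  # duplicate query: the receiver drops it
            seen.add(neighbor)
            if neighbor in holders:
                found.add(neighbor)
            queue.append((neighbor, node, hops + 1))
    return found, packets

# Hypothetical four-node example: edges 1-2, 1-3, 2-4, 3-4; Nodes 2 and 4 hold the resource.
example = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
found, packets = bfs_flood(example, 1, {2, 4})
```

Both holders are found, but 2 of the 5 packets are duplicates that the receiving nodes simply discard, which is the waste the slide refers to.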

Near-Optimal Solution: Steiner Minimum Tree Problem • Optimal paths for resource discovery can be found with a non-distributed algorithm that requires global knowledge of the topology and of the resources • Precisely, this problem can be formulated as the task of finding a Steiner Minimum Tree (SMT) in a graph
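For reference, one common way to write the k-Steiner variant as an optimization problem, assuming unit edge costs (one query packet per edge) and using illustrative symbols: q is the querying node, H ⊆ V is the set of nodes holding the resource, and k is the number of resource instances to locate. This is the generic textbook form rather than the authors' exact notation.

```latex
\min_{T \subseteq G} \; |E(T)|
\quad \text{subject to} \quad
T \text{ is a tree}, \quad q \in V(T), \quad |V(T) \cap H| \ge k
```

With k = |H| the constraint forces every holder into the tree, and the problem reduces to the classic Steiner Minimum Tree with terminal set H ∪ {q}.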

MST k-Steiner Minimum Tree Algorithm • The MST k-Steiner Minimum Tree algorithm was developed to find an approximate solution
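The algorithm itself is not reproduced on this slide. As a stand-in, the following sketch shows the generic "MST over the metric closure" idea behind MST-based Steiner approximations, under hypothetical assumptions: unit edge weights, and the k terminals chosen as the k resource holders nearest to the query node. It is a simplified heuristic, not the authors' MST k-Steiner algorithm; the full Kou-Markowsky-Berman heuristic would additionally prune redundant edges from the expanded tree.

```python
from collections import deque
import heapq

def bfs_paths(graph, src):
    """Unit-weight shortest paths from src: returns (distances, parent map)."""
    dist, parent = {src: 0}, {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                parent[v] = u
                queue.append(v)
    return dist, parent

def path_edges(parent, target):
    """Edges on the BFS-tree path from the BFS root to target."""
    edges = set()
    while parent[target] is not None:
        edges.add(frozenset((target, parent[target])))
        target = parent[target]
    return edges

def mst_k_steiner(graph, query, holders, k):
    """Hypothetical heuristic: connect the query node to k nearby holders."""
    # 1. Assumed terminal choice: the k reachable holders closest to the query node.
    dist_q, _ = bfs_paths(graph, query)
    chosen = sorted((h for h in holders if h in dist_q), key=lambda h: (dist_q[h], h))[:k]
    terminals = [query] + [h for h in chosen if h != query]
    # 2. Metric closure: shortest paths from every terminal.
    bfs = {t: bfs_paths(graph, t) for t in terminals}
    # 3. Prim's MST on the closure, expanding each MST edge into a real graph path.
    root = terminals[0]
    in_tree, tree_edges = {root}, set()
    heap = [(bfs[root][0][t], root, t) for t in terminals[1:]]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(terminals):
        _, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue
        in_tree.add(v)
        tree_edges |= path_edges(bfs[u][1], v)
        for w in terminals:
            if w not in in_tree:
                heapq.heappush(heap, (bfs[v][0][w], v, w))
    return tree_edges  # each edge corresponds to one query packet

# Hypothetical example: path 1-2-3-4 with a branch 2-5; resources on 3, 4 and 5.
g = {1: [2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
tree = mst_k_steiner(g, query=1, holders={3, 4, 5}, k=2)
print(len(tree))  # 3 edges: 1-2, 2-3, 2-5
```

The number of returned edges is the number of query packets the tree would use, which is what makes the tree usable as a yardstick for local search algorithms.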

MST k-Steiner Minimum Tree Algorithm • Time complexity: [formula], where E = number of edges in graph G • Worst-case approximation ratio: [formula], where R = available resources

Efficiency = Found Replies / Query Packets • The MST k-Steiner Minimum Tree algorithm shows that current local search algorithms for peer-to-peer networks are far from following optimal paths
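The metric is a plain ratio. A small sketch, applied to the two example queries quoted in this talk (the k-Steiner tree locating 9 resource instances with 11 packets, and NeuroSearch locating 11 instances with 26 packets). Returning 0.0 when no packets were sent is an assumed convention, not part of the slide's definition.

```python
def efficiency(found_replies: int, query_packets: int) -> float:
    """Efficiency = found replies / query packets (higher is better)."""
    if query_packets == 0:
        return 0.0  # assumed convention: no packets sent, nothing found
    return found_replies / query_packets

ksteiner = efficiency(9, 11)   # k-Steiner tree example: 9 instances, 11 packets
neuro = efficiency(11, 26)     # NeuroSearch example: 11 instances, 26 packets
print(f"k-Steiner: {ksteiner:.2f}, NeuroSearch: {neuro:.2f}")
# prints: k-Steiner: 0.82, NeuroSearch: 0.42
```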

[Figure panels: Highest Degree Search; K-Steiner Minimum Tree] The k-Steiner tree algorithm locates 9 resource instances with 11 query packets. For this query the approximate solution is also the optimal solution. HDS (Highest Degree Search) uses almost twice as many query packets for this query.
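A minimal sketch of the one-search-direction idea behind Highest Degree Search, assuming each node knows its neighbors' degrees and forwards the query to the highest-degree neighbor that has not yet seen it. The dead-end handling (simply stop), the tie-breaking and the `max_packets` budget are assumptions of this sketch; the HDS variant used in the comparisons may differ.

```python
def highest_degree_search(graph, source, holders, k, max_packets=100):
    """Forward the query along one path, always to the best-connected unvisited neighbor."""
    visited = {source}
    found = set()
    packets = 0
    node = source
    while len(found) < k and packets < max_packets:
        candidates = [nb for nb in graph[node] if nb not in visited]
        if not candidates:
            break  # dead end: this sketch simply stops (no backtracking)
        # max() keeps the first neighbor in list order when degrees tie.
        node = max(candidates, key=lambda nb: len(graph[nb]))
        packets += 1  # one query packet per forwarding step
        visited.add(node)
        if node in holders:
            found.add(node)
    return found, packets

# Hypothetical graph in which Node 2 is a hub.
g = {1: [2, 3], 2: [1, 3, 4, 5], 3: [1, 2], 4: [2], 5: [2]}
print(highest_degree_search(g, 1, holders={2, 3}, k=2))  # ({2, 3}, 2)
print(highest_degree_search(g, 1, holders={4}, k=1))     # (set(), 2)
```

The second call shows the weakness of a single search direction: the walk passes through the hub, moves on to Node 3 and dead-ends without ever reaching Node 4.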

Hops • MST k-Steiner does not use the shortest paths to locate resources

Branching Factor = Average Number of Children of Each Node Having Children in a Search Tree • MST k-Steiner starts as a one-search-direction algorithm but changes to a multiple-search-direction algorithm as more resources are located
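The hop and branching-factor metrics can both be computed directly from a query's search tree. A sketch, assuming the tree is given as a hypothetical `children` mapping from each node to the nodes it forwarded the query to: maximum hops is the depth of the tree, and the branching factor averages the child count over nodes that have at least one child.

```python
def tree_metrics(children, root):
    """Return (maximum hops, branching factor) of a query search tree."""
    max_hops = 0
    stack = [(root, 0)]  # (node, hops from the querying node)
    while stack:
        node, hops = stack.pop()
        max_hops = max(max_hops, hops)
        for child in children.get(node, []):
            stack.append((child, hops + 1))
    # Branching factor: average child count over nodes that have children.
    parents = [node for node, kids in children.items() if kids]
    if not parents:
        return max_hops, 0.0
    branching = sum(len(children[node]) for node in parents) / len(parents)
    return max_hops, branching

chain = {1: [2], 2: [3], 3: [4]}    # one search direction
bushy = {1: [2, 3, 4], 2: [5, 6]}   # multiple search directions
print(tree_metrics(chain, 1))  # (3, 1.0)
print(tree_metrics(bushy, 1))  # (2, 2.5)
```

A one-search-direction tree always has a branching factor of 1.0, so its maximum hops grow with every packet; adding search directions raises the branching factor and keeps the tree shallow.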

MST k-Steiner Minimum Tree Algorithm • Ways to improve MST k-Steiner: • Conducting an extensive survey of related work in graph theory and k-Steiner Minimum Trees, and modifying the problem to support multiple resource instances on the same node (Prize-Collecting Steiner Tree problem with Quota) • Getting the results published: Vapa M., Auvinen A., Ivanchenko Y., Kotilainen N., Vuori J., "K-Steiner Minimum Tree Is An Upper Bound for Unstructured Peer-to-Peer Resource Discovery Algorithms", submitted to Euro-Par 2007 • All the tools are now available for discovering the theoretical limit of peer-to-peer technology, measured as the total traffic that a given peer-to-peer network induces on a telecommunication network compared to the client-server approach • However, a "Distributed k-Steiner minimum tree resource discovery algorithm" seems impossible to apply in the real world, because it would depend on cached knowledge of the network, and caching in P2P networks is likely to be useless (wide namespace, dynamic peers, dynamic topology and possibly changing content)

Distributed Resource Discovery

Distributed Resource Discovery • Distributed resource discovery must be solved with a distributed algorithm, so the k-Steiner Minimum Tree algorithm cannot be used directly • In distributed resource discovery, each node has to forward the query based on local knowledge only [Figure: Node 1 sends the query "Where is ?" to Node 2 in an unknown topology; Node 2 replies "I have it!" but asks: to whom should I forward this query next?]
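A sketch of what "forwarding on local knowledge" means in a simulator: the search loop never consults global topology or resource locations, it only asks a per-neighbor `decide` function. The local features passed to it (hops so far, own degree, neighbor's degree) are a hypothetical choice for illustration, as are `local_search` and `max_hops`.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical local view: (hops so far, my degree, neighbor's degree) -> forward?
Decide = Callable[[int, int, int], bool]

def local_search(graph: Dict[int, List[int]], source: int, holders: Set[int],
                 decide: Decide, max_hops: int = 7) -> Tuple[Set[int], int]:
    """Each node decides per neighbor, using only information available locally."""
    seen = {source}
    found: Set[int] = set()
    packets = 0
    queue = deque([(source, None, 0)])  # (node, sender, hops so far)
    while queue:
        node, sender, hops = queue.popleft()
        if hops >= max_hops:
            continue
        for nb in graph[node]:
            if nb == sender or not decide(hops, len(graph[node]), len(graph[nb])):
                continue  # local rule says: do not forward on this connection
            packets += 1
            if nb in seen:
                continue  # duplicate query is dropped by the receiver
            seen.add(nb)
            if nb in holders:
                found.add(nb)
            queue.append((nb, node, hops + 1))
    return found, packets

# Hypothetical four-node ring; forward only to neighbors at least as well connected.
ring = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
found, packets = local_search(ring, 1, {2, 4}, decide=lambda hops, mine, theirs: theirs >= mine)
```

Swapping `decide` changes the search algorithm without touching the loop, which is the slot a learned decision function can occupy.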

NeuroSearch

Our Solution: NeuroSearch • The NeuroSearch resource discovery algorithm uses neural networks and evolution to adapt its behavior to a given environment • a neural network decides whether or not to pass the query further down each connection • evolution breeds candidates and finds the best neural network within a large class of local search algorithms [Figure: a node that has received a query decides, for each neighbor node, whether to forward the query]

NeuroSearch’s Inputs • The internal structure of the NeuroSearch algorithm • Multiple layers enable the algorithm to express non-linear behavior • With enough neurons, the network can approximate any continuous decision function to arbitrary accuracy (universal approximation)
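A minimal sketch of a multilayer forwarding network in plain Python. The three inputs (hops so far, own degree, neighbor degree), the layer sizes [3, 4, 4, 1], tanh activations and the zero threshold are all assumptions for illustration; they are not NeuroSearch's actual input set or architecture.

```python
import math
import random
from typing import List

Matrix = List[List[float]]

def layer(inputs: List[float], weights: Matrix, biases: List[float]) -> List[float]:
    """Fully connected layer with tanh activation."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

class ForwardingNet:
    """Tiny multilayer perceptron: local inputs -> forward / don't forward."""

    def __init__(self, sizes: List[int], rng: random.Random):
        # sizes such as [3, 4, 4, 1]: 3 inputs, two hidden layers, one output neuron.
        self.weights = [[[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
                        for n_in, n_out in zip(sizes, sizes[1:])]
        self.biases = [[rng.gauss(0.0, 1.0) for _ in range(n_out)] for n_out in sizes[1:]]

    def output(self, inputs: List[float]) -> float:
        activations = inputs
        for w, b in zip(self.weights, self.biases):
            activations = layer(activations, w, b)
        return activations[0]

    def should_forward(self, inputs: List[float]) -> bool:
        # Forward the query on this connection if the output neuron is positive.
        return self.output(inputs) > 0.0

net = ForwardingNet([3, 4, 4, 1], random.Random(42))
# Hypothetical normalized inputs: hops so far, own degree, neighbor degree.
print(net.should_forward([0.1, 0.3, 0.8]))
```

The weights here are random; deciding which weight values produce a good search algorithm is the job of the training program.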

NeuroSearch’s Training Program • The neural network weights define how the neural network behaves, so they must be adjusted to the right values • This is done with an iterative optimization process based on evolution and Gaussian mutation [Figure: training flowchart: define the P2P network conditions; define the fitness requirements for the algorithm; create candidate algorithms randomly; iterate thousands of generations, each time selecting the best ones for the next generation and breeding a new population; finally select the best algorithm for these conditions; compare the best one against other local search algorithms]
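The loop in the flowchart can be sketched as a simple elitist evolutionary algorithm over weight vectors, where breeding is done by Gaussian mutation only (no crossover). The fitness function below is a toy stand-in: a real run would score each weight vector by simulating queries in the chosen P2P network conditions. Population size, parent count, generation count and `sigma` are illustrative values, not NeuroSearch's settings.

```python
import random
from typing import Callable, List, Tuple

Genome = List[float]

def evolve(fitness: Callable[[Genome], float], n_weights: int, pop_size: int = 20,
           n_parents: int = 10, generations: int = 300, sigma: float = 0.1,
           seed: int = 0) -> Tuple[Genome, float]:
    rng = random.Random(seed)
    # Create candidate algorithms (weight vectors) randomly.
    population = [[rng.gauss(0.0, 1.0) for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        # Select the best ones for the next generation (they survive unchanged).
        population.sort(key=fitness, reverse=True)
        parents = population[:n_parents]
        # Breed a new population: each child is a random parent plus Gaussian noise.
        children = [[w + rng.gauss(0.0, sigma) for w in rng.choice(parents)]
                    for _ in range(pop_size - n_parents)]
        population = parents + children
    # Finally select the best algorithm for these conditions.
    best = max(population, key=fitness)
    return best, fitness(best)

TARGET = [0.5, -0.25, 1.0]  # toy stand-in for "good weights" in some scenario

def toy_fitness(weights: Genome) -> float:
    return -sum((w - t) ** 2 for w, t in zip(weights, TARGET))

best, score = evolve(toy_fitness, n_weights=3)
```

Because the selected parents are carried over unchanged, the best fitness never decreases from one generation to the next; the cost is that every generation needs a full fitness evaluation of the population, which is what makes the real training expensive.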

An Example

Typical Query Pattern of NeuroSearch • NeuroSearch uses 26 query packets to locate 11 resource instances. There are 17 resource instances available in total, so locating 9 of them would have been enough to reach 50% of the resource instances.

Ranking List • Highest Degree Search is currently the best known local search algorithm for the power-law distributed scenario [Ranking list entries include NeuroSearch 2003 and NeuroSearch 2004]

Ideal Algorithm • NeuroSearch is close to HDS in performance but different in nature: • NeuroSearch's maximum number of hops is far lower than that of one-search-direction algorithms, which results in low search latency • Ideally, we would find an algorithm that: • Has a low maximum number of hops • Has high efficiency regardless of how many resources need to be located • Sustains these properties in many P2P scenarios

Future Work • The first versions of NeuroSearch are now ready and analyzed • Ways to enhance NeuroSearch include: • History-based inputs to allow more accurate decisions • Studying the scalability factors affecting NeuroSearch as the P2P network grows • Analyzing its behavior under dynamic conditions • Speeding up the optimization process by parallelizing the evolutionary algorithm with distributed computing • The computational cost is high, and replacing the optimization algorithm does not help (see: Neri, Kotilainen, Vapa, "An Adaptive Global-Local Memetic Algorithm to Discover Resources in P2P Networks", to be published in EvoCOMNET 2007) • A less flexible approximator could replace the neural network

References • M. Vapa, A. Auvinen, Y. Ivanchenko, N. Kotilainen, J. Vuori, "K-Steiner Minimum Tree Is An Upper Bound for Unstructured Peer-to-Peer Resource Discovery Algorithms", submitted to Euro-Par 2007. • F. Neri, N. Kotilainen, M. Vapa, "An Adaptive Global-Local Memetic Algorithm to Discover Resources in P2P Networks", to be published in EvoCOMNET 2007. • M. Vapa, N. Kotilainen, H. Kainulainen, J. Vuori, "Resource Discovery in P2P Networks Using Evolutionary Neural Networks", International Conference on Advances in Intelligent Systems – Theory and Applications (AISTA 2004), 15.-18.11.2004.
