Kaleidoscope – Adding Colors to Kademlia

Gil Einziger, Roy Friedman, Eyal Kibbar
Computer Science, Technion

Kademlia Overview
• Kademlia is implemented in many popular file-sharing applications, such as BitTorrent, Gnutella, and eMule.
• Applications built on Kademlia have hundreds of millions of users worldwide.
• Invented in 2002 by Petar Maymounkov and David Mazières.

Kademlia is good
Kademlia has a number of desirable features not simultaneously offered by any previous DHT.
• It minimizes the number of configuration messages nodes must send to learn about each other.
• Configuration information spreads automatically as a side effect of key lookups.
• Nodes have enough knowledge and flexibility to route queries through low-latency paths.
• Kademlia uses parallel, asynchronous queries to avoid timeout delays from failed nodes.
Easy to maintain. Fast log(N) lookups. Fault tolerant.
The “Problem” – one of the key advantages of Kademlia.

Many ways to reach the same value…
• There are k possible peers for the first step.
• The first peer returns k other peers that are closer to the value.
• Each of these peers returns other, closer peers, and so on…
• Until finally we reach the k-closest nodes. These nodes store the actual value!
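A minimal sketch of what “closer” and “k-closest” mean, assuming integer node IDs and Kademlia's XOR distance. The function names and the toy IDs are illustrative only and are not taken from the talk or from OpenKad.

```python
from typing import List


def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two IDs: their bitwise XOR, read as an integer."""
    return a ^ b


def k_closest(key: int, node_ids: List[int], k: int) -> List[int]:
    """Return the k node IDs closest to `key` under the XOR metric."""
    return sorted(node_ids, key=lambda n: xor_distance(n, key))[:k]


# Toy 4-bit ID space (hypothetical values).
nodes = [0b0001, 0b0100, 0b0111, 0b1010, 0b1100]
print(k_closest(0b0110, nodes, k=2))  # [7, 4]
```

Every lookup for the same key converges on this same set of k nodes, whichever first hop was chosen.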

All roads lead to Rome…
• Popular content: many users that love Fry…
• Many possible routing paths…
• But all of them lead to the same k-closest peers.
• “I can’t help you all, I am just a laptop!”

Caching to the rescue!
Motivation: If a value is popular, we should be able to hit a cached copy before reaching the k-closest nodes.
• Local Cache (LC) – After searching for an item, cache it locally (Guangmin, 2009).
• KadCache (KC) – After searching for an item, send it to the last peer along the path. A suggestion by the Kademlia authors that had not been evaluated until now.

The three rules of Kaleidoscope
• “Everything has a color” – We assign each key an additional secondary key called its color. The color is generated by hashing the Kademlia key and has a small domain (e.g., 17 colors).
• “Peers only perform lookups for keys of their own color” – If a node wants to find a (key, value) pair of a different color, it must first forward the request to a correctly colored node.
• “Only the peer that performed the lookup caches the result” – More efficient use of distributed cache content.
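A minimal sketch of the first two rules. The slide only says the color comes from hashing the Kademlia key into a small domain; SHA-1, the modulo reduction, and the names `color_of` and `may_lookup` are assumptions made for illustration. How a node gets its own color is not stated here, so it is passed in as a parameter.

```python
import hashlib

NUM_COLORS = 17  # example domain size from the slide


def color_of(key: bytes, num_colors: int = NUM_COLORS) -> int:
    """Map a key to a color in [0, num_colors). Hash choice is an assumption."""
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest, "big") % num_colors


def may_lookup(node_color: int, key: bytes, num_colors: int = NUM_COLORS) -> bool:
    """Rule 2: a peer only performs lookups for keys of its own color."""
    return node_color == color_of(key, num_colors)


key = b"some-popular-file"
c = color_of(key)
print(may_lookup(c, key))                      # True: colors match
print(may_lookup((c + 1) % NUM_COLORS, key))   # False: must forward first
```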

Kaleidoscope
Step 1: Forward the request to a correctly colored peer.
Step 2: Perform an iterative lookup that favors contacting correctly colored peers.
Step 3: Cache the result and forward it back to the initiator.

Forward the request (along the Kademlia lookup path)
“Looking for a value”
• Forward the request to one of the peers in the appropriate k-bucket.
• If there is a correctly colored peer, favor contacting that peer.
• If the receiving node is not correctly colored, it continues forwarding.
• The forward phase ends when we reach a correctly colored peer (or if we cannot advance in the XOR metric).
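The forward phase could be sketched roughly as follows. The `Peer` type and `next_forward_hop` are hypothetical names, and the tie-break (pick the XOR-closest peer within the preferred pool) is an assumption; the slide only says “one of the peers”.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Peer:
    node_id: int
    color: int


def next_forward_hop(current: Peer, bucket: List[Peer],
                     key: int, key_color: int) -> Optional[Peer]:
    """Return the peer to forward to, or None if the forward phase ends here."""
    if current.color == key_color:
        return None  # reached a correctly colored peer
    # Only peers that advance in the XOR metric are candidates.
    closer = [p for p in bucket if (p.node_id ^ key) < (current.node_id ^ key)]
    if not closer:
        return None  # cannot advance in the XOR metric
    colored = [p for p in closer if p.color == key_color]
    pool = colored or closer  # favor correctly colored peers if any exist
    return min(pool, key=lambda p: p.node_id ^ key)


me = Peer(0b1000, color=0)
bucket = [Peer(0b0011, 0), Peer(0b0101, 1), Peer(0b1100, 1)]
print(next_forward_hop(me, bucket, key=0b0001, key_color=1))
# Peer(node_id=5, color=1)
```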

Break symmetry using the color
“Looking for a value”
• There are k possibilities, but we favor correctly colored peers.
• We keep picking correctly colored peers as the next iterative step.
• If there is no correctly colored peer, we can still pick any of the other peers.
• We continue the lookup until we find the value.
• The value can be found either at the k-closest peers or in correctly colored peers.
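One possible way to “favor” correctly colored peers in each iterative step is to rank candidates by color match first and XOR distance second. This ordering, the `(node_id, color)` tuple layout, and the `alpha` parameter (borrowed from Kademlia's lookup concurrency) are assumptions for illustration, not the scheme from the paper.

```python
from typing import List, Tuple

Candidate = Tuple[int, int]  # (node_id, color), hypothetical layout


def pick_next(candidates: List[Candidate], key: int,
              key_color: int, alpha: int = 3) -> List[Candidate]:
    """Pick up to `alpha` peers to query: correctly colored first, then by XOR distance."""
    ranked = sorted(candidates,
                    key=lambda p: (p[1] != key_color, p[0] ^ key))
    return ranked[:alpha]


candidates = [(0b0010, 0), (0b0111, 1), (0b0100, 1), (0b0001, 0)]
print(pick_next(candidates, key=0b0000, key_color=1))
# [(4, 1), (7, 1), (1, 0)]
```

When no candidate has the right color, the ranking falls back to plain XOR order, matching the “we can still pick any of the peers” case.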

Forward the result backwards
“Looking for a value” … “Thanks!!!!”
• The value is cached at the node that performed the iterative lookup.
• Items are only cached on nodes of matching color, making each node an ‘expert’ for its own color.
• Also, cached content does not violate users’ privacy.

More colors = better cache hit rate:
• Higher cache capacity – items are only stored in correctly colored peers.
• Higher cache hit rate (mathematically analyzed in the paper).
However:
• It takes longer to reach a correctly colored peer.
• We encounter fewer correctly colored peers during the lookup.

Numerical example
Let’s assume that our value was previously requested by 10% of the nodes and that our caches are infinite. How likely are we to hit a cached value?
• In Local Cache, all nodes are symmetric. [Figure labels: 10% (×8), 57%]
• In Kaleidoscope, we can only hit when contacting correctly colored peers. However, when we do contact them, the likelihood is increased. [Figure labels: 0% (×7), 57%]
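A small calculation that reproduces the 57% figure under one plausible reading: independent requests, unbounded caches, and a count of eight (eight contacted peers for Local Cache, or eight nodes served by one colored peer in Kaleidoscope). The count of eight is inferred from the figure labels and is an assumption; `hit_probability` is an illustrative name.

```python
def hit_probability(p: float, n: int) -> float:
    """Chance that at least one of n independent requesters (each with
    probability p) has already asked for the value."""
    return 1.0 - (1.0 - p) ** n


# Local Cache: any single contacted peer holds the value with probability p.
print(hit_probability(0.10, 1))            # 0.1 (up to float rounding)
# Eight independent chances at 10% each:
print(round(hit_probability(0.10, 8), 2))  # 0.57
```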

Comparative results
• Emulation – We run the actual implementation, sending and receiving actual UDP packets (only the user is simulated).
• Scale – Different network sizes of up to 2,500 Kademlia peers.
• Experimental settings: each peer performs:
  • 200 warm-up requests.
  • 500 requests in the measurement interval.
  • (Up to 300K find-value requests in warm-up and 1.25 million requests in measurement.)
• Experiment generation: each peer receives a file with 700 requests from the appropriate workload. All users continuously play the requests.

• Wikipedia trace (Baaren & Pierre 2009)
  • “10% of all user requests issued to Wikipedia during the period from September 19th 2007 to October 31st.”
• YouTube trace (Cheng et al., QOS 2008)
  • Weekly measurement of ~160k newly created videos during a period of 21 weeks.
  • We directly created a synthetic distribution for each week.

Comparative results
Load is distributed more evenly than with Local Cache and KadCache.

Comparative results
• Average lookup cost is reduced.
• Kaleidoscope with a 100-item cache can do better than KadCache or Local Cache with an 800-item cache!

Conclusions
Our algorithm contributes in the following ways:
• Fewer messages per lookup – up to a 60% reduction compared with cache-less Kademlia.
• Better load distribution – using an overload protection algorithm that is not covered in this talk.
• Reproducibility – Kaleidoscope, KadCache, and Local Cache are released as part of the open-source project OpenKad: https://code.google.com/p/openkad/. Feel free to use them!

In the paper…
• Kaleidoscope’s performance is mathematically analyzed for infinite caches.
• An overload protection mechanism that helps with efficient load distribution.
In the near future
• Further evaluate Kaleidoscope according to more metrics, such as latency, success rate, and privacy.

The end: Any questions? Thanks for listening!

Kaleidoscope
Kaleidoscope caches enjoy a higher hit rate than Local Cache… why?
• Denote by C the number of colors and by N the number of nodes in the system.
• For each color, there are on average N/C nodes.
• These nodes perform requests for all nodes in the system, and therefore each one performs requests for C nodes on average (including itself).

Kaleidoscope
Some calculations…
Since any node performs requests of a certain color for C other nodes on average, the probability of a cache miss is the same as the probability that C different nodes did not request the value in the past. (*)
(*) We assume an unbounded cache in this calculation.
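Written out as a formula, under the added assumption that each node has requested the value independently with the same probability p (the symbol p is not on the slide; the numerical example uses 10%):

```latex
\[
P_{\text{miss}} = (1 - p)^{C},
\qquad
P_{\text{hit}} = 1 - (1 - p)^{C}
\]
```

With C = 1 this reduces to a hit probability of p, which is the Local Cache case where each node caches only its own requests.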
