
Distributed Hash Tables


Presentation Transcript


  1. Distributed Hash Tables Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems March 12, 2014 Some slides based on originals by Raghu Ramakrishnan

  2. Today • Recall HW1 Milestone 1 due Monday @ 11:59PM • For next time: please read the Google File System paper (Ghemawat et al.)

  3. A “Flatter” Scheme: Hashing Buckets • Start with a hash function with a uniform distribution of values: • h(name) → a value (e.g., a 32-bit integer) • Map from values to hash buckets • Generally using mod (# buckets) • Put items into the buckets • May have “collisions” and need to chain [Figure: items hashed by h(x) into buckets 0 … n-1; colliding items are linked on an overflow chain]
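
  As a minimal sketch of the scheme above (plain Java; the class and method names are illustrative, not from the lecture), hashing a name to a 32-bit value, mapping it to a bucket with mod, and chaining collisions might look like this:

      import java.util.*;

      class HashBuckets {
          private final int numBuckets;
          private final List<List<String>> buckets;   // each bucket is an overflow chain

          HashBuckets(int numBuckets) {
              this.numBuckets = numBuckets;
              this.buckets = new ArrayList<>();
              for (int i = 0; i < numBuckets; i++) buckets.add(new LinkedList<>());
          }

          // h(name) -> 32-bit value; here we simply reuse String.hashCode()
          private int bucketOf(String name) {
              return Math.floorMod(name.hashCode(), numBuckets);
          }

          void put(String name)         { buckets.get(bucketOf(name)).add(name); }
          boolean contains(String name) { return buckets.get(bucketOf(name)).contains(name); }
      }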

  4. Dividing Hash Tables Across Machines • Simple distribution – allocate some number of hash buckets to various machines • Can give this information to every client, or provide a central directory • Can evenly or unevenly distribute buckets • Lookup is very straightforward • A possible issue – data skew: some ranges of values occur frequently • Can use dynamic hashing techniques • Can use better hash function, e.g., SHA-1 (160-bit key)
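
  A small sketch of this allocation, assuming a client-side directory that maps each bucket number to the machine holding it (BucketDirectory and its fields are illustrative):

      import java.util.*;

      class BucketDirectory {
          private final int numBuckets;
          private final Map<Integer, String> bucketToMachine = new HashMap<>();  // bucket -> host:port

          BucketDirectory(int numBuckets, List<String> machines) {
              this.numBuckets = numBuckets;
              // Even allocation: round-robin buckets over machines (could be uneven to counter skew)
              for (int b = 0; b < numBuckets; b++)
                  bucketToMachine.put(b, machines.get(b % machines.size()));
          }

          // Lookup: hash the key, take mod (# buckets), consult the directory
          String machineFor(String key) {
              int bucket = Math.floorMod(key.hashCode(), numBuckets);
              return bucketToMachine.get(bucket);
          }
      }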

  5. Some Issues Not Solved with Conventional Hashing • What if the set of servers holding the inverted index is dynamic? • Our number of buckets changes • How much work is required to reorganize the hash table? • Solution: consistent hashing

  6. Consistent Hashing – the Basis of “Structured P2P” Intuition: we want to build a distributed hash table where the number of buckets stays constant, even if the number of machines changes • Requires a mapping from hash entries to nodes • Don’t need to re-hash everything if node joins/leaves • Only the mapping (and allocation of buckets) needs to change when the number of nodes changes Many examples: CAN, Pastry, Chord • For this course, you’ll use Pastry • But Chord is simpler to understand, so we’ll look at it

  7. Basic Ideas • We’re going to use a giant hash key space • SHA-1 hash: 20 bytes, or 160 bits • We’ll arrange it into a “circular ring” (it wraps around at 2^160 to become 0) • We’ll actually map both objects’ keys (in our case, keywords) and nodes’ IP addresses into the same hash key space • “abacus” → SHA-1 → k10 • 130.140.59.2 → SHA-1 → N12
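
  A sketch of this mapping in Java, hashing both a keyword and a node’s IP address with SHA-1 into the same 160-bit ID space (the class and method names are illustrative):

      import java.math.BigInteger;
      import java.nio.charset.StandardCharsets;
      import java.security.MessageDigest;

      class RingIds {
          // SHA-1 produces a 20-byte (160-bit) digest, so every ID already fits in [0, 2^160)
          static BigInteger id(String s) throws Exception {
              MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
              byte[] digest = sha1.digest(s.getBytes(StandardCharsets.UTF_8));
              return new BigInteger(1, digest);   // interpret the digest as a non-negative integer
          }

          public static void main(String[] args) throws Exception {
              System.out.println("key id  = " + id("abacus"));        // e.g., k10 on the slide
              System.out.println("node id = " + id("130.140.59.2"));  // e.g., N12 on the slide
          }
      }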

  8. Chord Hashes a Key to its Successor [Figure: circular hash ID space with node IDs N10, N32, N60, N80, N100 and key hashes k10, k11, k30, k33, k40, k52, k65, k70, k99, k112, k120 placed around the ring] • Nodes and blocks have randomly distributed IDs • Successor: node with next highest ID
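
  One way to sketch “a key maps to its successor” is a sorted ring of node IDs: find the first node ID at or above the key’s ID, wrapping around if necessary (SuccessorRing is an illustrative name, not Chord code):

      import java.math.BigInteger;
      import java.util.Map;
      import java.util.TreeMap;

      class SuccessorRing {
          // node ID -> node address (illustrative)
          private final TreeMap<BigInteger, String> nodes = new TreeMap<>();

          void addNode(BigInteger nodeId, String address) { nodes.put(nodeId, address); }

          // Successor: the node with the next-highest ID, wrapping around the ring
          String successor(BigInteger keyId) {
              Map.Entry<BigInteger, String> entry = nodes.ceilingEntry(keyId);
              if (entry == null) entry = nodes.firstEntry();  // wrapped past the largest node ID
              return entry.getValue();
          }
      }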

  9. Basic Lookup: Linear Time [Figure: a query “Where is k70?” is forwarded around the ring of nodes N5, N10, N20, N32, N40, N60, N80, N99, N110 until the answer “N80” is returned] • Lookups find the ID’s predecessor • Correct if successors are correct
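
  A sketch of this linear-time lookup, where each node knows only its successor and the query is forwarded around the ring until the key falls between a node and its successor (RingNode is illustrative, not Chord’s implementation):

      import java.math.BigInteger;

      class RingNode {
          BigInteger id;
          RingNode successor;

          RingNode(BigInteger id) { this.id = id; }

          // Does key fall in the half-open ring interval (id, successor.id] ?
          private boolean owns(BigInteger key) {
              if (id.compareTo(successor.id) < 0)
                  return key.compareTo(id) > 0 && key.compareTo(successor.id) <= 0;
              // interval wraps around the top of the ID space
              return key.compareTo(id) > 0 || key.compareTo(successor.id) <= 0;
          }

          // O(N) lookup: walk successors until we reach the key's predecessor;
          // its successor is the node responsible for the key
          RingNode lookup(BigInteger key) {
              RingNode current = this;
              while (!current.owns(key)) current = current.successor;
              return current.successor;
          }
      }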

  10. “Finger Table” Allows O(log N) Lookups [Figure: N80’s fingers point ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring] • Goal: shortcut across the ring – binary search • Reasonable lookup latency
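
  The shortcuts come from fingers: finger i of node n points to the successor of (n + 2^i) mod 2^m in an m-bit ID space, so the largest finger jumps halfway around the ring, the next one a quarter of the way, and so on. A sketch of computing the finger targets (illustrative code, not Chord’s):

      import java.math.BigInteger;

      class FingerTable {
          // IDs that node n's fingers should shortcut to; each finger stores the successor of its target
          static BigInteger[] fingerTargets(BigInteger n, int m) {
              BigInteger ringSize = BigInteger.ONE.shiftLeft(m);   // 2^m
              BigInteger[] targets = new BigInteger[m];
              for (int i = 0; i < m; i++)
                  targets[i] = n.add(BigInteger.ONE.shiftLeft(i)).mod(ringSize);  // (n + 2^i) mod 2^m
              return targets;
          }
      }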

  11. Node Joins [Figure: a new node N120 joins the ring of N5, N10, N20, N32, N40, N60, N80, N99, N110] • How does the node know where to go? (Suppose it knows 1 peer) • What would need to happen to maintain connectivity? • What data needs to be shipped around?

  12. A Graceful Exit: Node Leaves [Figure: a node leaves the ring of N5, N10, N20, N32, N40, N60, N80, N99, N110] • What would need to happen to maintain connectivity? • What data needs to be shipped around?

  13. What about Node Failure? • Suppose a node just dies? • What techniques have we seen that might help?

  14. Successor Lists Ensure Connectivity [Figure: each node on the ring is labeled with its successor list — N5: N10, N20, N32; N10: N20, N32, N40; N20: N32, N40, N60; N32: N40, N60, N80; N40: N60, N80, N99; N60: N80, N99, N110; N80: N99, N110, N5; N99: N110, N5, N10; N110: N5, N10, N20] • Each node stores r successors, r = 2 log N • Lookup can skip over dead nodes to find objects
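
  A sketch of how a successor list lets a lookup route around failures: take the first successor that still responds (the class and the isAlive() placeholder are illustrative):

      import java.math.BigInteger;
      import java.util.List;

      class NodeWithSuccessorList {
          BigInteger id;
          List<NodeWithSuccessorList> successorList;   // r = 2 log N successors

          boolean isAlive() { return true; }  // placeholder for a real failure detector (e.g., ping timeout)

          // First live successor; the linear or finger-table lookup then proceeds from it
          NodeWithSuccessorList liveSuccessor() {
              for (NodeWithSuccessorList s : successorList)
                  if (s.isAlive()) return s;
              throw new IllegalStateException("all " + successorList.size() + " successors failed");
          }
      }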

  15. Objects are Replicated as Well • When a “dead” peer is detected, repair the successor lists of those that pointed to it • Can take the same scheme and replicate objects on each peer in the successor list • Do we need to change lookup protocol to find objects if a peer dies? • Would there be a good reason to change lookup protocol in the presence of replication? • What model of consistency is supported here? Why?

  16. Stepping Back for a Moment: DHTs vs. Gnutella and Napster 1.0 • Napster 1.0: central directory; data on peers • Gnutella: no directory; flood peers with requests • Chord, CAN, Pastry: no directory; hashing scheme to look for data • Clearly, Chord, CAN, and Pastry have guarantees about finding items, and they are decentralized • But non-research P2P systems haven’t adopted this paradigm: • Kazaa, BitTorrent, … still use variations of the Gnutella approach • Why? There must be some drawbacks to DHTs…

  17. Distributed Hash Tables, Summarized • Provide a way of deterministically finding an entity in a distributed system, without a directory, and without worrying about failure • Can also be a way of dividing up work: instead of sending data to a node, might send a task • Note that it’s up to the individual nodes to do things like store data on disk (if necessary; e.g., using B+ Trees)

  18. Applications of Distributed Hash Tables • To build distributed file systems (CFS, PAST, …) • To distribute “latent semantic indexing” (U. Rochester) • As the basis of distributed data integration (U. Penn, U. Toronto, EPFL) and databases (UC Berkeley) • To archive library content (Stanford) • It can also be used as the basis of MapReduce-like operations, as we’ll discuss next time

  19. Distributed Hash Tables and Your Project If you’re building a mini-Google, how might DHTs be useful in: • Crawling + indexing URIs by keyword? • Storing and retrieving query results? The hard parts: • Coordinating different crawlers to avoid redundancy • Ranking different sites (often more difficult to distribute) • What if a search contains 2+ keywords? (You’ll initially get to test out DHTs in Homework 3)

  20. From Chord to Pastry • What we saw were the basic data algorithms of the Chord system • Pastry is slightly different: • It uses a different mapping mechanism • Object is located at the closest node in ID space, not the successor node • It doesn’t exactly use a hash table abstraction – instead there’s a notion of routing messages • It allows for replication of data and finds the closest replica • It’s written in Java, not C • … And you’ll be using it in your projects!

  21. Pastry API Basics (v 2.1) • See freepastry.org for details and downloads • Nodes have identifiers that will be hashed: interface rice.p2p.commonapi.Id • 2 main kinds of NodeIdFactories – IPNodeIdFactory for real nodes, RandomNodeIdFactory for virtual nodes • Nodes are logical entities: can have more than one virtual node • Several kinds of NodeFactories: create virtual Pastry nodes • All Pastry nodes have built-in functionality to manage routing • Implement the “common API” interface rice.p2p.commonapi.Application

  22. Creating a P2P Network • Example code in DistTutorial.java • Tutorial at http://freepastry.org/FreePastry/tutorial/ • Create a Pastry node:

      Environment env = new Environment();
      PastryNodeFactory d = new SocketPastryNodeFactory(new RandomNodeIdFactory(env), portNo, env);
      // Need to compute the InetSocketAddress of a known host to be addr
      NodeHandle aKnownNode = ((SocketPastryNodeFactory) d).getNodeHandle(addr);
      PastryNode pn = d.newNode(aKnownNode);
      MyApp app = new MyApp(pn);   // your application (implements the common API)

  23. Pastry Client APIs • Based on a model of routing messages • Derive your message from class rice.p2p.commonapi.Message • Every node has an Id (NodeId implementation) • Every message gets an Id corresponding to its key • Call endpoint.route(id, msg, hint) to send a message (endpoint is an instance of Endpoint) • The hint is the starting point, of type NodeHandle • At each intermediate point, Pastry calls a notification: • app.forward(msg) • At the end, Pastry calls a final notification: • app.deliver(id, msg)
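
  A minimal sketch of an application built on this routing model, following the calls named on this slide and the FreePastry tutorial; KeywordMessage and MyLookupApp are illustrative names, and the setup details may differ slightly across FreePastry versions:

      import rice.p2p.commonapi.*;

      class KeywordMessage implements Message {
          String keyword;                          // payload: the keyword being looked up
          KeywordMessage(String keyword) { this.keyword = keyword; }
          public int getPriority() { return 0; }   // default priority for this sketch
      }

      class MyLookupApp implements Application {
          private final Endpoint endpoint;

          MyLookupApp(Node node) {
              // Register this application under an instance name so Pastry can route messages to it
              this.endpoint = node.buildEndpoint(this, "my-lookup-app");
              this.endpoint.register();
          }

          // Send a message toward the node responsible for the given key Id
          void lookup(Id keyId, String keyword) {
              endpoint.route(keyId, new KeywordMessage(keyword), null);  // null hint: let Pastry pick the first hop
          }

          // Called at each intermediate node; return true to keep forwarding
          public boolean forward(RouteMessage msg) { return true; }

          // Called at the node closest to the message's Id
          public void deliver(Id id, Message msg) {
              System.out.println("Delivered " + msg + " for id " + id);
          }

          // Called when a nearby node joins or leaves
          public void update(NodeHandle handle, boolean joined) { }
      }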

  24. IDs • Pastry has mechanisms for creating node IDs itself • Obviously, we need to be able to create IDs for keys • Example: use java.security.MessageDigest:

      MessageDigest md = MessageDigest.getInstance("SHA");
      byte[] content = myString.getBytes();
      md.update(content);
      byte[] shaDigest = md.digest();
      rice.pastry.Id keyId = rice.pastry.Id.build(shaDigest);

  25. How Do We Create a Hash Table (Hash Map/Multiset) Abstraction? We want the following: • put (key, value) • remove (key) • valueSet = get (key) • How can we use Pastry to do this?
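
  One possible answer, sketched with the same routing calls as above: hash the key to an Id, route a put message to it, and let the node that receives deliver() store the entry locally. DhtApp and PutMessage are illustrative, not FreePastry API; get and remove would follow the same pattern, with the result routed back to the requester:

      import java.util.*;
      import rice.p2p.commonapi.*;

      class PutMessage implements Message {
          String key;
          String value;
          PutMessage(String key, String value) { this.key = key; this.value = value; }
          public int getPriority() { return 0; }   // default priority for this sketch
      }

      class DhtApp implements Application {
          private final Endpoint endpoint;
          // Local portion of the distributed table: key -> set of values
          private final Map<String, Set<String>> store = new HashMap<>();

          DhtApp(Node node) {
              this.endpoint = node.buildEndpoint(this, "dht-app");
              this.endpoint.register();
          }

          // put(key, value): route the entry toward the node responsible for keyId = SHA-1(key)
          void put(String key, String value, Id keyId) {
              endpoint.route(keyId, new PutMessage(key, value), null);
          }

          // The responsible node receives the message and stores the entry locally
          public void deliver(Id id, Message msg) {
              if (msg instanceof PutMessage) {
                  PutMessage put = (PutMessage) msg;
                  store.computeIfAbsent(put.key, k -> new HashSet<>()).add(put.value);
              }
              // get(key) and remove(key) would be routed the same way; the responsible node
              // would route the value set (or a deletion acknowledgment) back to the requester.
          }

          public boolean forward(RouteMessage msg) { return true; }
          public void update(NodeHandle handle, boolean joined) { }
      }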

  26. Next Time • Distributed file systems (GFS), databases (PNUTS)
