870 likes | 905 Views
Dive into the core principles of ZooKeeper, including its functioning, practical applications, coordination primitives, fault tolerance, and distributed system fallacies. Master the fundamentals with programming exercises and practical examples.
E N D
ZooKeeperTutorial Flavio Junqueira BenjaminReed Yahoo!Research hCps://cwiki.apache.org/confluence/display/ZOOKEEPER/EurosysTutorial Eurosys 2011 ‐Tutorial 1 Used for 14-848 Discussion, 11/6/2017 by Gregory Kesden
Plan fortoday • Firsthalf • Part1 • Motivation andbackground • Part2 • How ZooKeeper works onpaper • Secondhalf • Part3 • Share some practicalexperience • Programmingexercises • Part4 • Somecaveats • Wrapup Eurosys 2011 ‐Tutorial 2
ZooKeeperTutorial Part 1 Fundamentals
Yahoo!Portal Search E‐mail Finance Weather News Eurosys 2011 ‐Tutorial 4
Yahoo!: Workloadgenerated • Homepage • 38 million users a day(USA) • 2.5 billion users a month(USA) • Websearch • 3 billion queries amonth • E‐mail • 90 million actualusers • 10min/visit Eurosys 2011 ‐Tutorial 5
Yahoo!Infrastructure • Lots ofservers • Lots ofprocesses • High volumes ofdata • Highly complex soawaresystems • … and developers are meremortals • Yahoo! Lockport DataCenter Eurosys 2011 ‐Tutorial 6
Coordinationisimportant Eurosys 2011 ‐Tutorial 7
Coordinationprimitives • Semaphores • Queues • Leaderelection • Groupmembership • Barriers • Configuration Eurosys 2011 ‐Tutorial 8
Even small ishard… Eurosys 2011 ‐Tutorial 9
A simplemodel • Workassignment • Master assignswork • Workers execute tasks assigned bymaster Master Worker Worker Worker Worker Eurosys 2011 ‐Tutorial 10
Mastercrashes • Single point offailure • No work isassigned • Need to select a newmaster Master Worker Worker Worker Worker Eurosys 2011 ‐Tutorial
Workercrashes • Not as bad… Overall system stillworks • – Does not work if there aredependencies • Some tasks will never beexecuted • Need to detect crashedworkers • Master Worker Worker Worker Worker Eurosys 2011 ‐Tutorial
Worker does not receiveassignment • Same problem asbefore • Some tasks may not be executed • Need to guarantee that worker receives assignment • Master Worker Worker Worker Worker Eurosys 2011 ‐Tutorial
Fault‐tolerant distributedsystem Master CoordinationService Master Worker Worker Worker Worker Worker Worker Eurosys 2011 ‐Tutorial
Fault‐tolerant distributedsystem Master CoordinationService Master Worker Worker Worker Worker Worker Worker Eurosys 2011 ‐Tutorial
Fullydistributed CoordinationService Worker Worker Worker Worker Worker Worker Eurosys 2011 ‐Tutorial
Fallacies of distributedcomputing The network isreliable. Latency iszero. Bandwidth isinfinite. The network issecure. Topology doesn'tchange. There is oneadministrator. Transport cost iszero. The network is homogeneous. Peter Deutsch,http://blogs.sun.com/jag/resource/Fallacies.html Eurosys 2011 ‐Tutorial
One morefallacy • You know who is alive Eurosys 2011 ‐Tutorial
Why is itdifficult? • FLP impossibilityresult • Asynchronoussystems • Consensus is impossible if a single process cancrash • Fischer, Lynch, Paterson, ACM PODS,1983 • According to Herlihy, we do needconsensus • Wait‐free synchronization • Wait‐free: completion in a finite number ofsteps • Universal object: equivalent to solving consensus forn • processes • Herlihy, ACM TOPLAS,1991 Eurosys 2011 ‐Tutorial
Why is itdifficult? • CAPprinciple • – Can’t obtain availability, consistency, and parOtiontolerancesimultaneously Availability ParOtiontolerance Consistency Gilbert, Lynch, ACM SIGACT NEWS, 2002 Eurosys 2011 ‐Tutorial 20
The case for a coordinationservice • Many impossibilityresults • Many fallacies to stumbleupon • Several common requirements across applications • Duplicating isbad • Duplicating poorly is evenworse • Coordination service • Implement it once andwell • Share by a number ofapplications Eurosys 2011 ‐Tutorial
Currentsystems • Chubby,Google • Lockservice • Burrows, USENIX OSDI,2006 • Centrifuge,Microsoft • Leaseservice • Adya et al., USENIX NSDI,2010 • ZooKeeper,Yahoo! • Coordinationkernel • On Apache since2008 • Hunt et al., USENIX ATC,2010 Eurosys 2011 ‐Tutorial
Example – Bigtable,HBase • Sparse column‐oriented datastorage • Tablet: range ofrows • Unit ofdistribution • Architecture • Master • Tabletservers Eurosys 2011 ‐Tutorial
Example – Bigtable,HBase • Masterelection • Tolerate mastercrashes • Metadatamanagement • ACLs, Tabletmetadata • Rendezvous • Find tabletserver • Crashdetection • Live tabletservers Eurosys 2011 ‐Tutorial
Example – Webcrawling • Fetchingservice Master • – Fetch Web pages for search engine • Masterelection • Assignwork • Metadatamanagement • Politenessconstraints • Shards • Crashdetection • Liveworkers Fetcher Fetcher Fetcher Eurosys 2011 ‐Tutorial
And moreexamples… • GFS – Google FileSystem • Masterelection • File systemmetadata • KaCa ‐ Document indexingsystem • Shardinformation • Index version coordination • Hedwig – Pub‐Subsystem • Topicmetadata • Topicassignment Eurosys 2011 ‐Tutorial
Summary of Part1 • Large infrastructures requirecoordination • Fallacies of distributedcompuOng • Theory results: FLP,CAP • Coordination services • Examples • Websearch • Storagesystems Eurosys 2011 ‐Tutorial
ZooKeeperTutorial Part 2 Theservice
ZooKeeperIntroduction • Coordinationkernel • Does not export concreteprimitives • Recipes to implementprimitives • File system basedAPI • Manipulate small data nodes:znodes Eurosys 2011 ‐Tutorial 29
ZooKeeper:Overview Ensemble ClientApp Follower ZooKeeper ClientLib Session Leader atomically broadcast updates Leader ClientApp Follower ZooKeeper ClientLib Session Follower Replicated system ClientApp Follower ZooKeeper ClientLib Session Eurosys 2011 ‐Tutorial 30
ZooKeeper: Readoperations Ensemble Read operations processed locally ClientApp Read“x” X =10 Follower ZooKeeper ClientLib Leader ClientApp Follower ZooKeeper ClientLib Follower ClientApp Follower ZooKeeper ClientLib Eurosys 2011 ‐Tutorial 31
ZooKeeper: Writeoperations Ensemble ClientApp Write“x”,11 X =11 Follower ZooKeeper ClientLib X =11 Leader ClientApp X =11 Follower ZooKeeper ClientLib Follower ClientApp Follower ZooKeeper ClientLib Replicates across aquorum Eurosys 2011 ‐Tutorial 32
ZooKeeper: Semantics ofSessions • A prefix of operations submitted through a session areexecuted • Upondisconnection • Client lib tries to contact anotherserver • Before session expires: connect to newserver • Server must have seen a transaction id at least as large as thesession Eurosys 2011 ‐Tutorial 33
ZooKeeper:API • Create znodes:create • – Persistent, sequential,ephemeral • Read and modify data: setData,getData • Read the childrenofznode: getChildren • Check if znode exists:exists • Delete a znode:delete Eurosys 2011 ‐Tutorial 34
ZooKeeper:API • Order • Updates: Totally ordered,linearizable • FIFO order for clientoperations • Read: sequentiallyordered • write(x,10) • Client1: • write(x,11) • Client2: • write(x,10) write(x,11) • Sequen7al: Eurosys 2011 ‐Tutorial 35
ZooKeeper:API • Order • Updates: Totally ordered,linearizable • FIFO order for clientoperations • Read: sequentiallyordered write(x,10) read(x) Client1: read(x) Client2: write(x,10) read(x) read(x) Sequen7al: Eurosys 2011 ‐Tutorial 9
ZooKeeper:Example 1‐ create“/C‐”, “Ci”, sequential, ephemeral 2‐ getChildren“/” 3‐ If not leader, getData“first node” Client1 (C1) / C1/C‐1 Client2 (C2) C3 /C‐2 Client3 (C3) C2/C‐3 Eurosys 2011 ‐Tutorial 10
ZooKeeper: Znodechanges • Znodechanges • Data isset • Node is created ordeleted • Etc… • To learn of znodechanges • Set awatch • Upon change, client receives anotification • Notification ordered before newupdates Eurosys 2011 ‐Tutorial
ZooKeeper: Watches getData “/foo”,true Client return10 / 10 /foo Eurosys 2011 ‐Tutorial
ZooKeeper: Watches Client / 11 /foo setData “/foo”,11 Client returnok Eurosys 2011 ‐Tutorial
ZooKeeper: Watches Client / notification 11 /foo Client Eurosys 2011 ‐Tutorial
Watches, Locks, and the herdeffect • Herdeffect • Large number of clients wake upsimultaneously • Loadspikes • Undesirable Eurosys 2011 ‐Tutorial
Watches, Locks, and the herdeffect Client1 (C1) Client2 (C2) / C1 /C‐1 C2 /C‐2 Client3 (Cn) /C‐m Cn Eurosys 2011 ‐Tutorial
Watches, Locks, and the herdeffect Client1 (C1) Client2 (C2) / C2 /C‐2 Client3 (Cn) /C‐m Cn Eurosys 2011 ‐Tutorial
Watches, Locks, and the herdeffect Client1 (C1) / Client2 (C2) notification C2 /C‐2 Client3 (Cn) /C‐m Cn notification Eurosys 2011 ‐Tutorial
Watches, Locks, and the herdeffect • Asolution • Use order ofclients • Eachclient • Determines the znode z preceding its own znode in the sequentialorder • Watchz • A single notification is generated upon a crash • Disadvantage for leaderelection • One client is notified of a leaderchange Eurosys 2011 ‐Tutorial
Linearizability • Correctnesscondition • Informaldefinition • Order of operations is equivalent to a sequential execution • Equivalent order satisfies real time precedenceorder write(x,10) read(x) Client1: read(x) Client2: write(x,10) read(x) read(x) Sequential: Eurosys 2011 ‐Tutorial
Linearizability • Correctnesscondition • Informaldefinition • Order of operations is equivalent to a sequential execution • Equivalent order satisfies real time precedenceorder write(x,10) read(x) Client1: read(x) Client2: write(x,10) read(x) read(x) Sequen7al: Eurosys 2011 ‐Tutorial
Linearizability • Correctnesscondition • Informaldefinition • Order of operations is equivalent to a sequential execution • Equivalent order satisfies real time precedenceorder write(x,10) read(x) Client1: read(x) Client2: write(x,10) read(x) read(x) Sequen7al: Eurosys 2011 ‐Tutorial
Linearizability • Is it important? Itdepends… • Implements universal object • Herlihy’sresult • Implement consensus for nprocesses Eurosys 2011 ‐Tutorial