The Chubby Lock Service for Loosely-coupled Distributed Systems

Mike Burrows, Google Inc. Presented by Xin (Joyce) Zhan

Outline • Design • System structure • Locks, caching, failovers • Scaling mechanism • Use and observations • As name service • Failover problems

Lock service for distributed system • Synchronize access to shared resources • Other usage • Primary election, meta-data storage, name service • Reliability, availability

System Structure

System Structure • Set of replicas • Periodically elected master • Master lease • Paxos protocol • All client requests are directed to master • updates propagated to replicas • Replace failed replicas • master periodically polls DNS

Design • Store small files • Event notification mechanism • Consistent caching • Advisory locks (vs. mandatory) • conflict only when others attempt to acquire the same lock • Coarse-grained locks • survive lock server failures

Design - File Interface • Ease distribution • /ls/foo/wombat/pouch • Node meta-data includes Access Control Lists • Handle • analogous to UNIX file descriptors • support for use across master changes

Design - Sequencer for lock • Delayed / Out-of-order messages • introduce sequence numbers into interactions that use locks • lock holder requests a sequencer, pass it to file server to validate • Alternative • lock-delay
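The sequencer check above can be sketched roughly as follows. This is an illustration only, not Chubby's API: `Sequencer`, `FileServer`, `latest_generation`, `write`, and the lock path `/ls/cell/lock` are hypothetical names, and the sketch assumes a sequencer carries a generation number that grows each time the lock is newly acquired.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequencer:
    lock_name: str   # which lock this sequencer describes
    generation: int  # assumed to increase each time the lock is newly acquired


class FileServer:
    """Hypothetical resource server that rejects requests from stale lock holders."""

    def __init__(self):
        self.latest_generation = {}  # lock name -> highest generation seen so far
        self.data = {}

    def write(self, seq: Sequencer, key: str, value: str) -> bool:
        newest = self.latest_generation.get(seq.lock_name, 0)
        if seq.generation < newest:
            return False  # delayed or out-of-order message from an earlier holder
        self.latest_generation[seq.lock_name] = seq.generation
        self.data[key] = value
        return True


server = FileServer()
old_holder = Sequencer("/ls/cell/lock", generation=1)
new_holder = Sequencer("/ls/cell/lock", generation=2)

accepted_new = server.write(new_holder, "k", "from new holder")
accepted_old = server.write(old_holder, "k", "delayed write")  # arrives late
```

The late write from the earlier holder is refused because its generation is older than one the server has already seen. Lock-delay, the alternative on the slide, avoids changing the file server at the cost of keeping the lock unavailable for a while after its holder fails.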

Design - Events • Client subscribes when creating handle • Delivered async via up-call from client library • Event types • file contents modified • child node added / removed / modified • Chubby master failed over • handle / lock have become invalid • lock acquired / conflicting lock request (rarely used)

Design - Caching • Clients cache file data and meta-data • Consistent, write-through • Invalidation • master keeps list of what clients may have cached • master sends invalidations on top of KeepAlive • clients flush changed data, ack. with KeepAlive • master proceeds with the modification only after invalidations are acknowledged • Clients cache open handles and locks
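A toy model of the invalidation order described above, assuming a single process and direct method calls in place of KeepAlive RPCs. `Master`, `Client`, `cachers`, and the file name `/ls/cell/config` are hypothetical names chosen for the sketch.

```python
class Client:
    def __init__(self):
        self.cache = {}

    def invalidate(self, name):
        # Flush the entry; returning from this call stands in for the ack
        # that a real client would send on its next KeepAlive.
        self.cache.pop(name, None)


class Master:
    """Toy master: a write is applied only after all caching clients are invalidated."""

    def __init__(self):
        self.files = {}
        self.cachers = {}  # file name -> set of clients that may have cached it

    def read(self, client, name):
        self.cachers.setdefault(name, set()).add(client)
        value = self.files.get(name)
        client.cache[name] = value
        return value

    def write(self, name, value):
        # Step 1: invalidate every client that may hold the old value.
        for client in self.cachers.pop(name, set()):
            client.invalidate(name)
        # Step 2: only now apply the modification.
        self.files[name] = value


master = Master()
a, b = Client(), Client()
master.write("/ls/cell/config", "v1")
master.read(a, "/ls/cell/config")
master.read(b, "/ls/cell/config")
master.write("/ls/cell/config", "v2")
stale_in_a = "/ls/cell/config" in a.cache
stale_in_b = "/ls/cell/config" in b.cache
fresh = master.read(a, "/ls/cell/config")
```

Because the old value is flushed everywhere before the new one is stored, no client can read `v1` from its cache once `v2` exists.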

Design - Sessions • Session maintained through KeepAlives • handles, locks, cached data remain valid • lease • Lease timeout advanced when • creation of a session • master fail-over occurs • master responds to KeepAlive RPC

Design - KeepAlive • Master responds close to lease timeout • Client sends another KeepAlive immediately • Client maintains local lease timeout • conservative approximation • When local lease expires • disable cache • session in jeopardy, client waits in grace period • cache enabled on reconnect • Application informed about session changes • Jeopardy/safe/expired event
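One way to picture the client's conservative local lease and the jeopardy/safe/expired states is the sketch below. The function names, the `max_drift` allowance, and the example durations (a 12 s lease and a 45 s grace period) are assumptions for illustration, not values taken from the slides.

```python
def local_lease_expiry(sent_at: float, lease_seconds: float, max_drift: float = 0.01) -> float:
    """Conservative local estimate of when the master's lease runs out.

    The lease is counted from when the KeepAlive was sent (not when the reply
    arrived) and shortened by an assumed clock-drift allowance, so the client
    never believes the lease lasts longer than the master does.
    """
    return sent_at + lease_seconds * (1.0 - max_drift)


def session_state(now: float, lease_expiry: float, grace_seconds: float) -> str:
    """Classify the session as the client library would report it to the application."""
    if now < lease_expiry:
        return "safe"
    if now < lease_expiry + grace_seconds:
        return "jeopardy"  # cache disabled, client waits for a reconnect
    return "expired"


expiry = local_lease_expiry(sent_at=100.0, lease_seconds=12.0)
state_early = session_state(105.0, expiry, grace_seconds=45.0)
state_late = session_state(120.0, expiry, grace_seconds=45.0)
state_gone = session_state(200.0, expiry, grace_seconds=45.0)
```

Erring on the short side means the client may disable its cache slightly earlier than strictly necessary, but it never serves cached data after the master could have considered the session dead.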

Design – Failovers

Design - Failovers • In-memory state discarded • sessions, handles, locks, etc. • Lease timer “stops” • Fast master election • clients reconnect before lease expires • Slow master election • clients flush cache, enter grace period • New master reconstructs a conservative approximation of the previous master's in-memory state

Design - Failovers • Steps of newly-elected master: • Pick new epoch number • Respond only to master location requests • Build in-memory state for sessions / locks from database • Respond to KeepAlives • Emit fail-over events to sessions, flush caches • Wait for acknowledgements / session expiry • Allow all operations to proceed • Allow clients to use handles created before fail-over • Delete ephemeral files w/o open handles after an interval

Design - Backup and Mirroring • Master writes snapshots every few hours • GFS server in different building • Collection of files mirrored across cells • /ls/global/master mirrored to /ls/cell/slave • Mostly for configuration files • Chubby’s own ACLs • Files advertising presence / location • pointers to Bigtable cells

Design - Scaling Mechanisms • 90,000 clients communicate with one cell • Regulate the number of Chubby cells • clients use a nearby cell • Increase lease time • Client caching • Protocol-conversion servers

Scaling - Proxies • Proxies pass requests from clients to cell • Reduce traffic of KeepAlive and read requests • Not writes, but writes << 1% of workload • KeepAlive traffic by far most dominant • Overheads: • additional RPC for writes / first time reads • increased probability of unavailability

Scaling - Partitioning • Namespace of a cell partitioned between servers • N partitions, each with master and replicas • Node D/C stored on P(D/C) = hash(D) mod N • meta-data for D may be on a different partition • Little cross-partition communication • Reduces R/W traffic, not necessarily KeepAlive
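The rule P(D/C) = hash(D) mod N can be sketched as below. The slides do not say which hash function is used; SHA-256 is an assumption made here only to get a value that is stable across runs, and `partition_for` and the example paths are hypothetical.

```python
import hashlib


def partition_for(path: str, num_partitions: int) -> int:
    """Return the partition for node D/C, computed from its parent directory D."""
    directory, _, _child = path.rpartition("/")
    digest = hashlib.sha256(directory.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


N = 4
p_a = partition_for("/ls/cell/dir/a", N)
p_b = partition_for("/ls/cell/dir/b", N)
p_dir = partition_for("/ls/cell/dir", N)  # hashed on "/ls/cell", not "/ls/cell/dir"
```

All children of one directory land on the same partition because only D is hashed, while the node for D itself is placed by its own parent, which is why its meta-data may live elsewhere.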

Use and Observations • Many files for naming • Config, ACL, meta-data common • 10 clients use each cached file, on avg. • Few locks held, no shared locks • KeepAlives dominate RPC traffic

Use as Name Service • DNS uses TTL values • entries must be refreshed within that time • huge (and variable) load on DNS server • Chubby’s caching uses invalidations, no polling • client builds up needed entries in cache • name entries further grouped in batches

Failover problems • Master writes sessions to DB when created • Overload when many processes start at once • Instead, store session at first modification / lock acquisition etc. • Active sessions recorded with some probability on each KeepAlive • spreads out writes in time • young read-only sessions may be discarded in a fail-over
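The probabilistic recording of sessions can be sketched as follows. `maybe_record_session`, the `recorded` set standing in for the database, and the probability and session counts are all illustrative assumptions.

```python
import random


def maybe_record_session(session_id, recorded, probability, rng):
    """On a KeepAlive, persist a not-yet-recorded session with the given probability.

    Returns True if this call caused a database write.
    """
    if session_id in recorded:
        return False
    if rng.random() < probability:
        recorded.add(session_id)
        return True
    return False


rng = random.Random(0)
recorded = set()
sessions = range(1000)
writes_per_round = []
for _round in range(20):
    # Each round models one KeepAlive from every session.
    writes = sum(maybe_record_session(s, recorded, 0.1, rng) for s in sessions)
    writes_per_round.append(writes)
```

Instead of 1000 writes arriving together when the processes start, each KeepAlive round writes only a fraction of the still-unrecorded sessions. The cost is the one named on the slide: a session that has not yet been recorded is lost if the master fails over.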

Failover problems • New design – do not record sessions in database • recreate them like handles after fail-over • new master waits full lease time before operations proceed

Lessons learnt • Developers rarely consider availability • should plan for short Chubby outages • Fine-grained locking not essential • Poor API choices • handles acquiring locks cannot be shared • RPC use affects transport protocols • forced to send KeepAlives by UDP for timeliness

Q & A
