
The Chubby Lock Service for Loosely-coupled Distributed Systems

The Chubby Lock Service for Loosely-coupled Distributed Systems. Mike Burrows, Google Inc. Presented by Xin (Joyce) Zhan. Outline: design, system structure, locks, caching, failovers, scaling mechanism, use and observations, Chubby as a name service, failover problems.


Presentation Transcript


  1. The Chubby Lock Service for Loosely-coupled Distributed Systems Mike Burrows, Google Inc. Presented by Xin (Joyce) Zhan

  2. Outline • Design • System structure • Locks, caching, failovers • Scaling mechanism • Use and observations • As name service • Failover problems

  3. Lock service for distributed systems • Synchronize access to shared resources • Other uses • Primary election, meta-data storage, name service • Reliability, availability

  4. System Structure

  5. System Structure • Set of replicas • Periodically elected master • Master lease • Paxos protocol • All client requests are directed to master • updates propagated to replicas • Replace failed replicas • master periodically polls DNS

  6. Design • Store small files • Event notification mechanism • Consistent caching • Advisory locks (vs. mandatory) • conflict only when others attempt to acquire the same lock • Coarse-grained locks • survive lock server failures

  7. Design - File Interface • Ease distribution • /ls/foo/wombat/pouch • Node meta-data includes Access Control Lists • Handle • analogous to UNIX file descriptors • support for use across master changes
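As an illustration of the naming scheme on this slide, here is a minimal Python sketch that splits a Chubby-style name into its parts. It assumes the layout `/ls/<cell>/<path within the cell>`; the function name `split_chubby_path` is hypothetical and is not part of any Chubby client library.

```python
def split_chubby_path(path: str) -> tuple[str, str, str]:
    """Split a Chubby-style name into (prefix, cell, path within the cell).

    Assumes names look like "/ls/<cell>/<rest...>"; this helper is
    illustrative only.
    """
    parts = path.split("/")
    # A leading "/" makes parts[0] the empty string.
    if len(parts) < 3 or parts[0] != "" or parts[1] != "ls" or not parts[2]:
        raise ValueError(f"not a Chubby-style name: {path!r}")
    return parts[1], parts[2], "/" + "/".join(parts[3:])


print(split_chubby_path("/ls/foo/wombat/pouch"))
# ('ls', 'foo', '/wombat/pouch')
```

The second component names the cell, so a client can pick which cell to contact before it interprets the rest of the name.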

  8. Design - Sequencer for lock • Delayed / Out-of-order messages • introduce sequence numbers into interactions that use locks • lock holder requests a sequencer, pass it to file server to validate • Alternative • lock-delay
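The sequencer idea can be sketched in a few lines of Python. This is a simplified model, not Chubby's implementation: the `Sequencer` fields and the `FileServer` class are assumptions made for illustration, and the server here validates a request by comparing it with the newest generation it has seen for that lock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequencer:
    # Hypothetical fields: which lock, how it is held, and a number that
    # grows each time the lock changes hands.
    lock_name: str
    mode: str
    generation: int


class FileServer:
    """Toy server that rejects requests carrying a stale sequencer."""

    def __init__(self) -> None:
        self._latest: dict[str, int] = {}  # lock name -> newest generation seen
        self.log: list[str] = []

    def write(self, seq: Sequencer, data: str) -> bool:
        newest = self._latest.get(seq.lock_name, -1)
        if seq.generation < newest:
            return False  # delayed message from a former lock holder
        self._latest[seq.lock_name] = seq.generation
        self.log.append(data)
        return True


server = FileServer()
old = Sequencer("/ls/foo/lock", "exclusive", 1)
new = Sequencer("/ls/foo/lock", "exclusive", 2)
print(server.write(old, "a"))     # True
print(server.write(new, "b"))     # True
print(server.write(old, "late"))  # False: arrived after the lock moved on
```

A delayed or out-of-order write from the previous holder is dropped because its generation is lower than one the server has already accepted.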

  9. Design - Events • Client subscribes when creating handle • Delivered async via up-call from client library • Event types • file contents modified • child node added / removed / modified • Chubby master failed over • handle / lock have become invalid • lock acquired / conflicting lock request (rarely used)

  10. Design - Caching • Clients cache file data and meta-data • Consistent, write-through • Invalidation • master keeps a list of what each client may have cached • master sends invalidations on top of KeepAlive replies • clients flush changed data, acknowledge with their next KeepAlive • master proceeds with the modification only after invalidation completes (acknowledged, or the client's lease expired) • Clients also cache open handles and locks
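The invalidation order on this slide can be sketched as a toy Python model. It is a simplification under stated assumptions: calls are synchronous instead of being carried on KeepAlive RPCs, lease expiry is not modeled, and the `Master` and `Client` classes are hypothetical.

```python
class Client:
    def __init__(self, name: str) -> None:
        self.name = name
        self.cache: dict[str, str] = {}

    def read(self, master: "Master", path: str) -> str:
        # Serve from cache when possible; otherwise fetch and remember.
        if path not in self.cache:
            self.cache[path] = master.read(self, path)
        return self.cache[path]

    def invalidate(self, path: str) -> bool:
        self.cache.pop(path, None)
        return True  # stands in for the acknowledgement


class Master:
    def __init__(self) -> None:
        self.files: dict[str, str] = {}
        self.cachers: dict[str, set[Client]] = {}  # path -> clients that may cache it

    def read(self, client: Client, path: str) -> str:
        self.cachers.setdefault(path, set()).add(client)
        return self.files[path]

    def write(self, path: str, data: str) -> None:
        # Invalidate every possible cached copy first, then apply the write.
        pending = self.cachers.pop(path, set())
        if not all(c.invalidate(path) for c in pending):
            raise RuntimeError("invalidation not acknowledged")
        self.files[path] = data


master = Master()
master.write("/ls/foo/config", "v1")
reader = Client("reader")
print(reader.read(master, "/ls/foo/config"))  # v1
master.write("/ls/foo/config", "v2")
print(reader.read(master, "/ls/foo/config"))  # v2
```

Because the write is applied only after the cached copies are gone, a client never reads a stale value from its cache once the write has completed.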

  11. Design - Sessions • Session maintained through KeepAlives • handles, locks, cached data remain valid • lease • Lease timeout advanced when • creation of a session • master fail-over occurs • master responds to KeepAlive RPC

  12. Design - KeepAlive • Master responds close to lease timeout • Client sends another KeepAlive immediately • Client maintains local lease timeout • conservative approximation • When local lease expires • disable cache • session in jeopardy, client waits in grace period • cache enabled on reconnect • Application informed about session changes • Jeopardy/safe/expired event
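The client-side view of the lease can be sketched as a small state machine in Python. This is an illustrative model only: the class name, method names, and the lease and grace values in the usage lines are assumptions, and times are passed in explicitly so the behavior is easy to follow.

```python
class ClientSession:
    """Toy model of the safe / jeopardy / expired session states."""

    def __init__(self, now: float, lease: float, grace: float) -> None:
        self.lease_expiry = now + lease
        self.grace = grace
        self.expired = False

    def on_keepalive_reply(self, sent_at: float, lease: float) -> None:
        # Conservative: measure the new lease from when the request was
        # sent, not from when the reply arrived.
        if not self.expired:
            self.lease_expiry = max(self.lease_expiry, sent_at + lease)

    def state(self, now: float) -> str:
        if self.expired or now >= self.lease_expiry + self.grace:
            self.expired = True  # an expired session never recovers
            return "expired"
        return "safe" if now < self.lease_expiry else "jeopardy"

    def cache_enabled(self, now: float) -> bool:
        # The cache is usable only while the local lease is known to be valid.
        return self.state(now) == "safe"


session = ClientSession(now=0, lease=12, grace=45)
print(session.state(5))    # safe
print(session.state(13))   # jeopardy
session.on_keepalive_reply(sent_at=14, lease=12)
print(session.state(15))   # safe
print(session.state(71))   # expired
```

In jeopardy the client keeps trying to reach a master during the grace period; a successful KeepAlive returns it to the safe state, otherwise the session is treated as expired.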

  13. Design – Failovers

  14. Design - Failovers • In-memory state discarded • sessions, handles, locks, etc. • Lease timer “stops” • Fast master election • clients reconnect before their leases expire • Slow master election • clients flush caches, enter grace period • New master reconstructs a conservative approximation of the previous master's in-memory state

  15. Design - Failovers Steps of newly-elected master: • Pick new epoch number • Respond only to master location requests • Build in-memory state for sessions / locks from database • Respond to KeepAlives • Emit fail-over events to sessions, flush caches • Wait for acknowledgements / session expire • Allow all operations to proceed • Allow clients to use handles created before fail-over • Delete ephemeral files w/o open handles after an interval

  16. Design - Backup and Mirroring • Master writes snapshots every few hours • GFS server in different building • Collection of files mirrored across cells • /ls/global/master mirrored to /ls/cell/slave • Mostly for configuration files • Chubby’s own ACLs • Files advertising presence / location • pointers to Bigtable cells

  17. Design - Scaling Mechanisms • 90,000 clients communicate with one cell • Regulate the number of Chubby cells • clients use a nearby cell • Increase lease time • Client caching • Protocol-conversion servers

  18. Scaling - Proxies • Proxies pass requests from clients to cell • Reduce traffic of KeepAlive and read requests • Not writes, but writes << 1% of workload • KeepAlive traffic by far most dominant • Overheads: • additional RPC for writes / first time reads • increased probability of unavailability

  19. Scaling - Partitioning • Namespace of a cell partitioned between servers • N partitions, each with master and replicas • Node D/C stored on P(D/C) = hash(D) mod N • meta-data for D may be on a different partition • Little cross-partition communication • Reduces R/W traffic, not necessarily KeepAlive traffic
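The partitioning rule P(D/C) = hash(D) mod N can be sketched in Python as follows. The slide does not name a hash function, so SHA-256 is used here purely as a stable stand-in, and the function name `partition_for` is hypothetical.

```python
import hashlib


def partition_for(node_path: str, n_partitions: int) -> int:
    """Pick a partition from the hash of the node's parent directory D."""
    directory = node_path.rsplit("/", 1)[0]  # D in the name D/C
    digest = hashlib.sha256(directory.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_partitions


# Children of the same directory always land on the same partition.
a = partition_for("/ls/foo/wombat/pouch", 4)
b = partition_for("/ls/foo/wombat/tail", 4)
print(a == b)  # True
```

Since a node is placed by hashing its parent, the directory `/ls/foo/wombat` itself is placed by hashing `/ls/foo`, which is why the meta-data for D may live on a different partition from D's children.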

  20. Use and Observations • Many files used for naming • Config, ACL, meta-data files common • 10 clients use each cached file, on avg. • Few locks held, no shared locks • KeepAlives dominate RPC traffic

  21. Use as Name Service • DNS uses TTL values • entries must be refreshed within that time • huge (and variable) load on DNS server • Chubby’s caching uses invalidations, no polling • client builds up needed entries in cache • name entries further grouped in batches

  22. Failover problems • Master writes sessions to DB when created • Overload when many processes start at once • Instead, store session at first modification / lock acquisition etc. • Active sessions recorded with some probability on each KeepAlive • spreads out writes in time • young read-only sessions may be discarded in a fail-over
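Recording a session with some probability on each KeepAlive can be sketched in Python as below. The function name, the in-memory `recorded` set standing in for the database, and the probability value are all assumptions for illustration; the slide does not give the probability used.

```python
import random


def maybe_record_session(session_id: str, recorded: set[str],
                         p: float, rng: random.Random) -> bool:
    """On a KeepAlive, write the session to the database with probability p.

    Returns True only when this call performed the write.
    """
    if session_id not in recorded and rng.random() < p:
        recorded.add(session_id)  # stands in for the database write
        return True
    return False


rng = random.Random(0)
recorded: set[str] = set()
keepalives = 0
while "session-1" not in recorded:
    keepalives += 1
    maybe_record_session("session-1", recorded, p=0.1, rng=rng)
print(f"recorded after {keepalives} KeepAlives")
```

With probability p per KeepAlive, a session is written after about 1/p KeepAlives on average, so sessions created in the same burst reach the database at different times. A session that has not yet been written is the "young read-only session" that a fail-over may discard.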

  23. Failover problems • New design – do not record sessions in database • recreate them like handles after fail-over • new master waits full lease time before operations proceed

  24. Lessons learnt • Developers rarely consider availability • should plan for short Chubby outages • Fine-grained locking not essential • Poor API choices • handles that acquire locks cannot be shared • RPC use affects transport protocols • forced to send KeepAlives by UDP for timeliness

  25. Q & A
