Learn about Google Chubby, a library and infrastructure for synchronization in distributed systems, and its role in ordered communication. Explore the architecture, interfaces, and caching mechanisms of Chubby.
Distributed Systems CS 15-440 • Google Chubby and Message Ordering • Recitation 4, Sep 29, 2011 • Majd F. Sakr, Vinay Kolar, Mohammad Hammoud
Today… • Last recitation session: • Google Protocol Buffers and Publish-Subscribe • Today’s session: • Google Chubby • A Google library and infrastructure for synchronization • Ordered Communication • Ordering events and enforcing ordering while communicating • Announcement: • Project 1 due on Oct 3rd
Overview • Recap • Google Chubby • Ordered Communication
Recap: Google Physical Infrastructure • Google has created a large distributed system from commodity PCs • Hierarchy: commodity PC, rack, cluster, data center • Rack • Approx 40 to 80 PCs • One Ethernet switch (internal = 100 Mbps, external = 1 Gbps) • Cluster • Approx 30 racks (around 2400 PCs) • 2 high-bandwidth switches (each rack is connected to both switches for redundancy) • Placement and replication are generally done at the cluster level
Recap: Google Data center Architecture (To avoid clutter the Ethernet connections are shown from only one of the clusters to the external links)
Overview • Recap • Google Chubby • Ordered Communication
Google Chubby • Google Chubby offers coordination and storage services to other services (e.g., to the Google File System) • It provides coarse-grained distributed locks to synchronize distributed activities in a large-scale, asynchronous environment • It can be used to support the election of a primary in a set of replicas • It can be used as a name service within Google • It provides a file system offering reliable storage of small files • Chubby is an all-in-one package consisting of a file system, a locking service, a naming service and an election facilitator!
Chubby Interface • Chubby provides an abstraction based on a file system concept: every data object is a file • Files are organized into a hierarchical namespace • Example: /ls/chubby_cell/directory_name/…/file_name • ls: Lock Service • chubby_cell: An identifier for describing the name of the instance of Chubby
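The naming scheme above can be illustrated with a small parser. This is only a sketch: `ChubbyPath` and `parse_chubby_path` are hypothetical helpers invented for this example (they are not part of any Chubby library), and the cell name `cell_a` and the path components are made up.

```python
from typing import NamedTuple, Tuple


class ChubbyPath(NamedTuple):
    cell: str                    # name of the Chubby instance (cell)
    components: Tuple[str, ...]  # directory names followed by the file name


def parse_chubby_path(path: str) -> ChubbyPath:
    """Split a name of the form /ls/<cell>/<dir>/.../<file> (hypothetical helper)."""
    parts = path.split("/")
    # A well-formed name splits into ['', 'ls', <cell>, ...].
    if len(parts) < 3 or parts[0] != "" or parts[1] != "ls" or not parts[2]:
        raise ValueError(f"not a Chubby-style name: {path!r}")
    return ChubbyPath(cell=parts[2], components=tuple(p for p in parts[3:] if p))


example = parse_chubby_path("/ls/cell_a/services/gfs/master")
print(example.cell, example.components)
```

The first component after `ls` selects the cell; everything after it is interpreted inside that cell.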
Chubby as a file-system and a locking service • The interface provides an easy mechanism to store small files • Chubby provides the following interfaces • General Interfaces • File-System Interfaces • Locking Service Interfaces
Chubby – General Interfaces • Chubby provides interfaces for opening, closing and deleting a file in its namespace • Open call: Opens a file or directory and returns a handle • The client can specify whether the file is to be opened for reading, writing or locking • Close call: Relinquishes the handle • Delete call: Removes the file or directory
Chubby – File-System Interfaces • Chubby provides two services: • Whole-file reading and writing operations • Single atomic operations are provided to read and write the complete data in a file • Chubby can be used to store small files (but not large files) • Access control • A file is associated with an Access Control List (ACL) • The ACL can be read and set through interfaces
Chubby – Locking Service Interfaces • In Chubby, a file can be opened as a lock • The owner of the lock has the handle to the file • Chubby provides three interfaces • Acquire: This call gets a handle to the lock • Release: This call releases the lock • TryAcquire: This is a non-blocking variant of the Acquire call • Chubby provides advisory locks, not mandatory locks: holding a lock does not by itself prevent other clients from accessing the file • Advantage: Extra flexibility and resilience • Disadvantage: The programmer has to manage conflicts
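The difference between advisory and mandatory locks can be seen in a toy, in-memory model. This is a sketch under stated assumptions: `ToyLockService` and its method names are invented for illustration and are not Chubby's real client API; only the non-blocking acquire and release are modelled.

```python
class ToyLockService:
    """In-memory stand-in for a lock service (hypothetical, not Chubby's API)."""

    def __init__(self):
        self._owners = {}   # lock/file name -> client currently holding the lock
        self.files = {}     # file name -> contents

    def try_acquire(self, name, client):
        """Non-blocking acquire: True on success, False if someone holds the lock."""
        if name in self._owners:
            return False
        self._owners[name] = client
        return True

    def release(self, name, client):
        if self._owners.get(name) != client:
            raise RuntimeError(f"{client} does not hold {name}")
        del self._owners[name]

    def write(self, name, client, data):
        # Advisory locking: the service does NOT check lock ownership here.
        # Clients must cooperate by acquiring the lock before writing.
        self.files[name] = data


svc = ToyLockService()
got_a = svc.try_acquire("/ls/cell_a/config", "client-A")   # A takes the lock
got_b = svc.try_acquire("/ls/cell_a/config", "client-B")   # B is refused
svc.write("/ls/cell_a/config", "client-B", "B's data")     # ...but B can still write
```

The last call is the conflict the programmer has to manage: a well-behaved client B would skip the write after `try_acquire` returned `False`.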
Chubby Architecture • A Chubby instance (or Chubby cell) is the first level of the hierarchy under the ls prefix: /ls/chubby_cell/directory_name/…/file_name • A Chubby instance is implemented as a small number of replicated servers (typically 5) with one designated master • Clients access these replicas using the Chubby library • Uses Protocol Buffers to communicate • Replicas are placed at failure-independent sites • Typically, they are placed within the same cluster but not within the same rack
Chubby Namespace Architecture • The hierarchical namespace of directories and files/locks is maintained in a database at each replica • The consistency of the replicated database is ensured through a consensus protocol that uses operation logs • Logs can be used to reconstruct the state of the system • Problem: Logs can become too large over time • Solution: Chubby takes a snapshot of the system periodically, and erases the old logs
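The log-plus-snapshot idea can be sketched as follows. This is a minimal illustration, not Chubby's implementation: the class `ReplicaState`, the operation names `"write"` and `"delete"`, and the paths are assumptions made for the example, and the consensus protocol that keeps the logs identical across replicas is not modelled.

```python
class ReplicaState:
    """Toy replica database: a snapshot plus a log of later operations."""

    def __init__(self):
        self.snapshot = {}   # state as of the last snapshot
        self.log = []        # operations recorded since that snapshot

    def apply(self, op, path, value=None):
        self.log.append((op, path, value))

    def rebuild(self):
        """Reconstruct the current state by replaying the log over the snapshot."""
        state = dict(self.snapshot)
        for op, path, value in self.log:
            if op == "write":
                state[path] = value
            elif op == "delete":
                state.pop(path, None)
        return state

    def take_snapshot(self):
        """Fold the log into a new snapshot so the old log entries can be erased."""
        self.snapshot = self.rebuild()
        self.log = []


replica = ReplicaState()
replica.apply("write", "/ls/cell_a/a", "1")
replica.apply("write", "/ls/cell_a/b", "2")
before = replica.rebuild()
replica.take_snapshot()          # log is now empty, state is unchanged
after = replica.rebuild()
```

Replaying the log after a snapshot touches only the operations recorded since then, which is what keeps recovery cost bounded.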
Chubby Session • Chubby Session is the relationship between client and a Chubby cell • KeepAlive messages maintain the session
Client Caching and Consistency • Clients cache file data, metadata and open handles • Cache consistency • Whenever a mutation is to occur, the associated operation is blocked until all caches are invalidated • Invalidation messages are piggybacked on KeepAlive messages • Disadvantages: • Cached copies are only invalidated, not updated with the new data • The operation cannot progress until all cached copies are invalidated • Advantages: • Simple and elegant for small files and locks
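The invalidate-before-write rule can be sketched with a toy master and clients. The names `ToyMaster` and `ToyClient` are hypothetical, and the model is deliberately simplified: invalidation is a direct, synchronous call here, whereas the slide describes invalidations piggybacked on KeepAlive messages with the write blocked until they complete.

```python
class ToyClient:
    def __init__(self):
        self.cache = {}      # path -> cached file contents


class ToyMaster:
    """Toy master that tracks which clients cache which file (hypothetical)."""

    def __init__(self):
        self.files = {}
        self.cachers = {}    # path -> set of clients holding a cached copy

    def read(self, client, path):
        # Serve the read and remember that this client now caches the file.
        self.cachers.setdefault(path, set()).add(client)
        client.cache[path] = self.files.get(path)
        return client.cache[path]

    def write(self, path, data):
        # Invalidate every cached copy first; only then apply the mutation.
        for client in self.cachers.pop(path, set()):
            client.cache.pop(path, None)
        self.files[path] = data


master = ToyMaster()
reader = ToyClient()
master.write("/ls/cell_a/config", "v1")
first = master.read(reader, "/ls/cell_a/config")      # reader caches "v1"
master.write("/ls/cell_a/config", "v2")               # reader's copy is dropped
cached_after_write = "/ls/cell_a/config" in reader.cache
second = master.read(reader, "/ls/cell_a/config")     # refetched from the master
```

Note that the cache entry is removed rather than overwritten with `"v2"`; the client pays for a fresh read the next time it needs the file.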
Overview • Recap • Google Chubby • Ordered Communication
Ordered Communication • In several applications, ordering of events is vital • For example, consider a flight-booking system • [Figure: timeline between Client and Server showing a Reserve message followed by a Cancel message; figure labels: Reserve, Cancel, time, Prices 15% Off] • The server processes the cancellation before the booking, even when the messages are reliably delivered! • We will study how to ensure ordered delivery of events in group communication
Ordered Multicast – An Example • An example where total-ordering is necessary • In an eCommerce application, the bank database has been replicated across many servers • Let us consider a 2-replica scenario • Event 1 = Add $1000 • Event 2 = Add interest of 5% • Replica 1 applies Event 1, then Event 2: Bal=1000, Bal=2000, Bal=2100 • Replica 2 applies Event 2, then Event 1: Bal=1000, Bal=1050, Bal=2050 • The updates from Event 1 and Event 2 should be performed in the same order on every replicated server. Otherwise the data is inconsistent.
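The arithmetic of the two-replica example can be checked directly. The function names below are made up for this sketch; integer arithmetic is used so the balances match the slide exactly.

```python
def add_1000(balance):
    """Event 1: add $1000."""
    return balance + 1000


def add_interest_5_percent(balance):
    """Event 2: add interest of 5% (integer arithmetic, whole dollars)."""
    return balance + balance * 5 // 100


def apply_events(balance, events):
    """Apply the events to the balance in the given order."""
    for event in events:
        balance = event(balance)
    return balance


replica_1 = apply_events(1000, [add_1000, add_interest_5_percent])
replica_2 = apply_events(1000, [add_interest_5_percent, add_1000])
print(replica_1, replica_2)   # the two replicas disagree
```

Because the two updates do not commute, the replicas end with different balances (2100 versus 2050) unless both apply the events in the same order.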
Three Types of Ordering • FIFO Order • Causal Order • Total Order
FIFO Ordering • [Figure: messages T1, T2, F1, F2, F3, C1, C2, C3 exchanged among processes P1, P2, P3 over time] • FIFO Order • If a process multicasts a message m before m’, then no correct process delivers m’ if it has not already delivered m • In the example, • F1 and F2 are in FIFO Order • Drawback: • FIFO Order does not specify any order for the messages generated across different processes • e.g., F1 and F3 can be delivered in any order
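One common way to enforce FIFO order, not described on the slide, is to number each sender's messages and hold back any message that arrives ahead of its predecessors. The sketch below assumes that technique; the class name `FifoReceiver` is hypothetical, and attributing F1 and F2 to P1 and F3 to P3 is an assumption, since the figure is not recoverable.

```python
class FifoReceiver:
    """Delivers each sender's messages in send order using sequence numbers."""

    def __init__(self):
        self.next_seq = {}    # sender -> next sequence number expected
        self.held = {}        # sender -> {seq: payload} waiting for earlier ones
        self.delivered = []   # payloads handed to the application, in order

    def receive(self, sender, seq, payload):
        self.held.setdefault(sender, {})[seq] = payload
        expected = self.next_seq.get(sender, 1)
        # Deliver as long as the next expected message from this sender is here.
        while expected in self.held[sender]:
            self.delivered.append(self.held[sender].pop(expected))
            expected += 1
        self.next_seq[sender] = expected


receiver = FifoReceiver()
receiver.receive("P1", 2, "F2")   # arrives early, held back
receiver.receive("P3", 1, "F3")   # different sender, delivered at once
receiver.receive("P1", 1, "F1")   # releases F1 and then F2
print(receiver.delivered)
```

F3 is delivered before F1 here, which is allowed: FIFO order constrains only messages from the same sender.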
Causal Ordering • [Figure: messages T1, T2, F1, F2, F3, C1, C2, C3 exchanged among processes P1, P2, P3 over time] • Causal Order • If process Pi multicasts a message mi and Pj multicasts mj, and if mi → mj (operator ‘→’ is Lamport’s happened-before relation), then any correct process that delivers mj will deliver mi before mj • Relationship between FIFO and Causal order: • Causal Order implies FIFO Order, but FIFO Order does not imply Causal Order • In the example, C1 and C3 are in Causal Order • Drawback: • The happened-before relation between mi and mj must be established before communication
Total Ordering • [Figure: messages T1, T2, F1, F2, F3, C1, C2, C3 exchanged among processes P1, P2, P3 over time] • Total Order • If process Pi multicasts a message mi and Pj multicasts mj, and if one correct process delivers mi before mj then every correct process delivers mi before mj • In the example, T1 and T2 are in Total Order • Drawback: • Total order does not imply FIFO or causal orders
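The drawback above can be made concrete with two small checkers applied to per-process delivery logs. This is an illustrative sketch: the function names, the message names `a1`, `a2`, `b1`, and the logs are invented, and each message is assumed to be delivered at most once per process.

```python
from itertools import combinations


def is_total_order(logs):
    """True if every pair of processes delivers their common messages in the same order."""
    for log_a, log_b in combinations(logs.values(), 2):
        common = set(log_a) & set(log_b)
        if [m for m in log_a if m in common] != [m for m in log_b if m in common]:
            return False
    return True


def is_fifo_order(logs, sent):
    """sent maps each sender to its messages in send order."""
    for log in logs.values():
        for msgs in sent.values():
            delivered = [m for m in log if m in msgs]
            # FIFO: what was delivered from a sender must be a prefix of its send order.
            if delivered != msgs[:len(delivered)]:
                return False
    return True


sent = {"P1": ["a1", "a2"], "P2": ["b1"]}
# Every process delivers in the same order, but a2 comes before a1.
logs = {p: ["a2", "b1", "a1"] for p in ("P1", "P2", "P3")}
print(is_total_order(logs), is_fifo_order(logs, sent))
```

All processes agree on one order, so total order holds, yet P1's messages are delivered out of send order, so FIFO order is violated.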
Totally Ordered Multicast • Totally Ordered Multicast is a multicast communication paradigm that ensures that all messages are delivered in the same order at all the receivers • Approach: • Process Pi sends a timestamped multicast message msgi to all the receivers in the group • At the sender, the message is buffered in a local queue queuei • Any incoming message at Pj is queued in queuej, according to its timestamp, and acknowledged to every other process • [Figure: timestamped message queues at Process 1, Process 2 and Process 3]
Totally Ordered Multicast (cont’d) • A receiver will deliver the message to the application if • The message is at the head of the queue, and • The message has been acknowledged by every other process • Assumptions in Totally Ordered Multicast: • Communication is reliable • There is no out-of-order delivery of messages that are transmitted from the same sender
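The approach and delivery rule can be simulated in a few lines. This is a sketch under assumptions that go beyond the slides: timestamps are Lamport clock values with the process id as a tie-breaker, every process (including the sender) acknowledges each message to the whole group, and the network is modelled as reliable FIFO channels served in a random order. `Process` and `run` are hypothetical names.

```python
import random


class Process:
    def __init__(self, pid, n):
        self.pid, self.n = pid, n
        self.clock = 0       # Lamport clock
        self.queue = {}      # msg_id -> (lamport_time, sender_pid), the sort key
        self.acks = {}       # msg_id -> set of pids that acknowledged it
        self.delivered = []

    def new_message(self, msg_id):
        self.clock += 1
        return ("msg", msg_id, (self.clock, self.pid))

    def receive(self, packet, sender):
        """Handle one packet; return an ack to multicast, or None."""
        kind, msg_id, ts = packet
        reply = None
        if kind == "msg":
            self.clock = max(self.clock, ts[0]) + 1
            self.queue[msg_id] = ts
            reply = ("ack", msg_id, None)
        else:
            self.acks.setdefault(msg_id, set()).add(sender)
        # Deliver while the lowest-timestamp message has been acked by everyone.
        while self.queue:
            head = min(self.queue, key=self.queue.get)
            if len(self.acks.get(head, ())) < self.n:
                break
            del self.queue[head]
            self.delivered.append(head)
        return reply


def run(sends, n=3, seed=0):
    """sends: list of (sender_pid, msg_id). Returns each process's delivery log."""
    rng = random.Random(seed)
    procs = [Process(p, n) for p in range(n)]
    channels = {(s, d): [] for s in range(n) for d in range(n)}  # FIFO per pair

    def multicast(src, packet):
        for dst in range(n):
            channels[(src, dst)].append(packet)

    for src, msg_id in sends:
        multicast(src, procs[src].new_message(msg_id))
    while any(channels.values()):
        src, dst = rng.choice([c for c, q in channels.items() if q])
        reply = procs[dst].receive(channels[(src, dst)].pop(0), src)
        if reply:
            multicast(dst, reply)
    return [p.delivered for p in procs]


logs = run([(0, "a"), (1, "b"), (0, "c"), (2, "d")])
```

Whatever interleaving the random scheduler picks, all three logs come out identical. The FIFO channels matter: they correspond to the slide's assumption that messages from the same sender are not delivered out of order.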
Application of Vector Clocks: Causally Ordered Multicast • In Causally Ordered Communication, a message m is delivered to an application only if all messages that causally precede m have been received • Vector Clocks allow implementation of Causally Ordered Multicast • Here, a multicast message is delivered to an application in the causal order • Under some criteria, Causally Ordered Multicast is weaker than Totally Ordered Multicast • If two messages are not related to each other, it does not matter in which order they are delivered to the application
Causally Ordered Multicast – Approach • Clocks are adjusted only when sending and receiving messages • When sending a message m from process Pi: • VCi[i] = VCi[i] + 1 • ts(m) = VCi • When Pj delivers a message m with timestamp ts(m): • VCj[k] = max(VCj[k], ts(m)[k]) (for all k) • When Pj receives a message m (with timestamp ts(m)) from Pi, it will deliver the message to the application only if: • ts(m)[i] = VCj[i] + 1 • m is the next message that Pj was expecting from Pi • ts(m)[k] <= VCj[k] (for all k != i) • Pj has seen all the messages that had been seen by Pi when it sent the message m
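The send, delivery and clock-update rules above translate almost line for line into code. The sketch below is a minimal model: `CausalProcess` is a hypothetical name, process ids are 0-based list indices, and a sender does not deliver its own messages to itself.

```python
class CausalProcess:
    def __init__(self, pid, n):
        self.pid = pid
        self.vc = [0] * n     # vector clock VC
        self.pending = []     # received but not yet deliverable messages
        self.delivered = []

    def send(self, payload):
        self.vc[self.pid] += 1                    # VCi[i] = VCi[i] + 1
        return (self.pid, list(self.vc), payload)  # ts(m) = VCi

    def _deliverable(self, sender, ts):
        # ts(m)[i] = VCj[i] + 1 and ts(m)[k] <= VCj[k] for all k != i
        return (ts[sender] == self.vc[sender] + 1 and
                all(ts[k] <= self.vc[k] for k in range(len(ts)) if k != sender))

    def receive(self, message):
        self.pending.append(message)
        progress = True
        while progress:                 # a delivery may unblock held messages
            progress = False
            for msg in list(self.pending):
                sender, ts, payload = msg
                if self._deliverable(sender, ts):
                    self.pending.remove(msg)
                    # VCj[k] = max(VCj[k], ts(m)[k]) for all k
                    self.vc = [max(a, b) for a, b in zip(self.vc, ts)]
                    self.delivered.append(payload)
                    progress = True


p0, p1, p2 = (CausalProcess(i, 3) for i in range(3))
m1 = p0.send("m1")
p1.receive(m1)
m2 = p1.send("m2")      # m2 causally depends on m1
p2.receive(m2)          # held: p2 has not yet seen m1
held = list(p2.delivered)
p2.receive(m1)          # m1 is delivered, which then releases m2
```

At `p2`, `m2` arrives with timestamp `[1, 1, 0]` while `p2`'s clock is `[0, 0, 0]`, so the second condition fails until `m1` has been delivered.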