
Synchronization


Presentation Transcript


  1. Synchronization Tanenbaum Chapter 5

  2. Synchronization Multiple processes sometimes need to agree on the order of a sequence of events. This requires synchronization, which is more elaborate in distributed systems. Synchronization may be based on time (absolute or relative) or on leader election. The aim is to make the agreement global across the system…

  3. Clock Synchronization: Time • Execution of the make utility in a distributed system: the newly edited source file is stamped with an earlier time than the existing object file, because the editing machine's clock lags behind, so make does not recompile it. • When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time.

  4. Physical Clocks (1) • Computation of the mean solar day. • The period of the earth's rotation is not constant. • Starting in 1958, International Atomic Time (TAI) was adopted: one second is defined as 9,192,631,770 transitions of Cesium-133, the number of transitions in an average solar second (a solar second being 1/86,400 of a solar day, the interval between two successive peaks of the sun in the sky). TAI is averaged over about 50 laboratories. • The length of the solar day slowly changes because of atmospheric drag and tidal friction.

  5. Physical Clocks (2) • TAI seconds are of constant length, unlike solar seconds. However, leap seconds are introduced when necessary to keep TAI in phase with the sun: the solar day is currently about 3 msec longer than the nominal day, and a leap second is inserted whenever the accumulated discrepancy reaches 800 msec. Since 1958, about 30 leap seconds have been introduced… The resulting time scale is known as Universal Coordinated Time (UTC).

  6. Clock Synchronization Algorithms • The relation between clock time and UTC when clocks tick at different rates. • In a perfect world C(t) = t on all machines, where t is UTC and C(t) is the value of the local clock. With modern timer chips the relative error (maximum drift rate ρ) is about 10⁻⁵. • Two clocks need to be resynchronized at a rate determined by their maximum drift rates. • If the difference between two clocks must never exceed δ, they must be resynchronized at least every δ/(2ρ) seconds, where ρ is the maximum drift rate; the factor 2 covers the worst case of two clocks drifting in opposite directions.
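
As a quick worked example of the bound above, here is a small Python calculation; the specific drift rate and skew tolerance are illustrative assumptions, not values given on the slide.

```python
# Illustrative calculation of the resynchronization interval delta / (2 * rho).
# rho (max drift rate) and delta (max tolerated skew) are example values.
rho = 1e-5          # maximum drift rate of a typical timer chip (relative error)
delta = 1e-3        # maximum clock skew we are willing to tolerate, in seconds

# Two clocks drifting in opposite directions diverge at up to 2*rho seconds per
# second of real time, so they must be resynchronized at least this often:
resync_interval = delta / (2 * rho)
print(f"Resynchronize at least every {resync_interval:.0f} s")   # -> 50 s
```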

  7. Cristian's Algorithm • Getting the current time from a time server. • The clock should never be set backward, as that would cause consistency problems; a large discrepancy is instead consumed gradually, by adjusting the number of msec added per clock interrupt. • (T1 − T0 − I)/2 is the estimated one-way propagation time, where T0 and T1 are the local times at which the request is sent and the reply received, and I is the server's request (interrupt) handling time. Cristian suggests taking the average of several measured delays. Note that the time server is passive.
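
The following Python sketch shows the shape of the client-side computation, assuming a hypothetical request_server_time() helper (not part of any real library); the server itself stays passive and only answers requests.

```python
import time

def cristian_time(request_server_time, interrupt_handling_time=0.0):
    """Estimate the server's UTC time in the style of Cristian's algorithm.

    request_server_time() is an assumed helper that sends a request to the
    time server and returns the server's clock reading.
    """
    t0 = time.monotonic()                          # T0: request sent
    server_time = request_server_time()            # server's UTC reading
    t1 = time.monotonic()                          # T1: reply received

    # One-way propagation delay, discounting the server's handling time I.
    one_way_delay = (t1 - t0 - interrupt_handling_time) / 2
    return server_time + one_way_delay

# The local clock is never set backward; a negative correction would instead be
# spread over many clock interrupts (e.g. adding 9 msec instead of 10 per tick
# until the skew has been consumed).
```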

  8. The Berkeley Algorithm: the time server is active, polling the clients. • The time daemon sends its time and asks all the other machines for their clock discrepancies relative to it. • The answers are received and an average discrepancy is computed for each computer… • The time daemon then tells every machine how to adjust its clock. • The daemon's own time needs to be set periodically by the operator or from radio time servers…
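
A minimal sketch of one polling round, assuming the daemon already holds each client's reported offset; the dict-based interface and machine names are illustrative only.

```python
def berkeley_round(reported_offsets):
    """One polling round of the Berkeley algorithm, seen from the time daemon.

    reported_offsets maps a machine name to (client_clock - daemon_clock).
    """
    # Include the daemon's own offset (0) in the average.
    offsets = list(reported_offsets.values()) + [0.0]
    average = sum(offsets) / len(offsets)
    # Each machine is told a *relative* correction rather than an absolute time,
    # so fast clocks can be slowed down gradually instead of jumping backward.
    return {name: average - offset for name, offset in reported_offsets.items()}

# Classic example (offsets in minutes): A is 25 min fast, B is 10 min slow, the
# daemon is on time; everyone is steered toward the +5 min average.
print(berkeley_round({"A": 25.0, "B": -10.0}))   # {'A': -20.0, 'B': 15.0}
```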

  9. Distributed Clock Synchronization • Cristian's and Berkeley's algorithms are centralized. • In decentralized algorithms, every machine periodically broadcasts its time and collects the times of its peers. • Every peer computes an average using the same algorithm, run in a distributed fashion and taking communication latencies into account… • On the Internet, the Network Time Protocol (NTP) is used, which is assumed to achieve 1–50 msec accuracy.

  10. Network Time Protocol (NTP) • RFC 1305 defines NTP. • Recent implementations provide accuracy down to about 1 microsecond. • It is designed to run on top of IP and UDP. • NTP is organized into multiple tree structures, with primary servers at the roots and secondary servers at the internal nodes. • NTP design goals: accurate UTC synchronization, survival despite loss of connectivity, frequent resynchronization, and protection against malicious interference. • NTP communicates the clock offset (difference between two clocks), the round-trip delay, and the dispersion (maximum error). • A statistical technique is used, based on multiple comparisons of the exchanged timing information. • It may operate in three modes: multicast, client/server, and symmetric. • SNTP (Simple NTP) is defined in RFC 1769 and provides no fault tolerance.

  11. Use of Synchronized Clocks • Example: implementing at-most-once message delivery. • Every message is sent with a connection number and a timestamp. • For each connection, the most recent timestamp is recorded. • If a message arrives on a connection with a timestamp lower than the recorded one, it is discarded as a duplicate. • To remove old messages: • the server discards all messages with timestamps older than G = CurrentTime − MaxLifeTime − MaxClockSkew, • where MaxLifeTime is the maximum time a message can live in the system… • and MaxClockSkew is the maximum distance from UTC. • To recover from a crash, G needs to be written to disk every T, to be consulted during the recovery phase…
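
A minimal sketch of the timestamp-based duplicate filter described above; the function and parameter names are assumptions made for illustration.

```python
import time

def make_at_most_once_filter(max_lifetime, max_clock_skew, now=time.time):
    """At-most-once delivery based on synchronized clocks (sketch)."""
    last_ts = {}   # most recent accepted timestamp per connection

    def accept(connection_id, timestamp):
        # G: anything timestamped before this is either a duplicate or so old
        # that it can no longer be alive in the system.
        G = now() - max_lifetime - max_clock_skew
        if timestamp <= last_ts.get(connection_id, G):
            return False               # duplicate or expired: discard
        last_ts[connection_id] = timestamp
        # Purge per-connection state older than G; periodically (every T) G
        # would also be written to disk so duplicates can still be rejected
        # after a crash.
        for old in [c for c, t in last_ts.items() if t <= G]:
            del last_ts[old]
        return True

    return accept
```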

  12. Coordinator (Leader) Election Algorithms • Bully algorithm • A process holds an election for coordinator when it thinks the coordinator has failed: • it sends an ELECTION message to all processes with higher id numbers; • if no one responds, the process declares itself coordinator; • if one of the higher-ups answers, it withdraws from the contest. • Ring algorithm • The processes are logically or physically ordered in a ring. • The process detecting the missing coordinator sends an election message around the ring, each live process adding its id; when the message comes back to the sender, the highest id in it is declared coordinator…
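
A sketch of the bully election step from a single process's point of view, with the messaging collapsed into assumed helpers; it does not implement failure detection or the full COORDINATOR broadcast.

```python
def bully_election(my_id, all_ids, ask_is_alive, announce_coordinator):
    """One bully-election attempt by process my_id.

    ask_is_alive(pid) stands in for sending an ELECTION message to a
    higher-numbered process and returns True if it answers; both helpers are
    assumptions for illustration.
    """
    higher_ups = [pid for pid in all_ids if pid > my_id]
    if any(ask_is_alive(pid) for pid in higher_ups):
        # A higher-up answered: it takes over, so this process withdraws and
        # waits for the eventual COORDINATOR announcement.
        return None
    # Nobody bigger is alive: declare victory and tell everyone.
    announce_coordinator(my_id)
    return my_id

# Example from the slides: 7 (the old coordinator) is down, so 6 eventually wins.
alive = {0, 1, 2, 3, 4, 5, 6}
print(bully_election(6, range(8), lambda p: p in alive, lambda p: None))   # 6
```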

  13. The Bully Algorithm (1) • The bully election algorithm • Process 4 holds an election • Processes 5 and 6 respond, telling 4 to stop • Now 5 and 6 each hold an election

  14. The Bully Algorithm (2) • Process 6 tells 5 to stop • Process 6 wins and tells everyone

  15. A Ring Algorithm • Election algorithm using a ring. Processes 5 and 2 both decide, at about the same time, that the coordinator has failed. Both election messages make a full trip around the ring and yield the same result.
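
A sketch of the circulating-message idea, with the ring, the liveness check, and the message passing collapsed into plain function calls for illustration.

```python
def ring_election(initiator, ring, is_alive):
    """Ring election as described on the slides: the initiator sends an
    ELECTION message around the logical ring, every live process appends its
    id, and when the message returns the highest id becomes coordinator.
    `ring` is the ordered list of process ids; is_alive() is an assumed helper."""
    collected = [initiator]
    start = ring.index(initiator)
    for step in range(1, len(ring)):
        pid = ring[(start + step) % len(ring)]   # dead successors are skipped
        if is_alive(pid):
            collected.append(pid)
    return max(collected)                        # announced as the new COORDINATOR

# As in the figure: the old coordinator 7 is dead; 5 and 2 may both start an
# election at about the same time, but both circulating messages elect 6.
ring = [0, 1, 2, 3, 4, 5, 6, 7]
print(ring_election(5, ring, lambda p: p != 7),
      ring_election(2, ring, lambda p: p != 7))   # 6 6
```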

  16. Mutual Exclusion • Mutual exclusion means that critical sections are executed one at a time. • In centralized systems this is achieved using semaphores, monitors, and similar constructs… • How can mutual exclusion be established in distributed systems? • Centralized approach • Distributed approach

  17. Mutual Exclusion: A Centralized Algorithm • Process 1 asks the coordinator for permission to enter a critical region; permission is granted. • Process 2 then asks permission to enter the same critical region; the coordinator does not reply (the request is queued). • When process 1 exits the critical region, it tells the coordinator, which then replies to process 2…
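
A small sketch of the coordinator's bookkeeping for the scenario above; real systems would add timeouts and coordinator-failure handling, which the slide leaves out.

```python
from collections import deque

class MutexCoordinator:
    """Centralized mutual-exclusion coordinator (sketch): at most one process
    holds the critical region; the rest are queued."""

    def __init__(self):
        self.holder = None
        self.queue = deque()

    def request(self, pid):
        if self.holder is None:
            self.holder = pid
            return "GRANTED"
        self.queue.append(pid)
        return "NO REPLY (queued)"     # the requester blocks until granted

    def release(self, pid):
        assert pid == self.holder
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder             # next process to be granted, if any

coord = MutexCoordinator()
print(coord.request(1))   # GRANTED
print(coord.request(2))   # NO REPLY (queued)
print(coord.release(1))   # 2 -> the coordinator now replies GRANTED to process 2
```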

  18. MX: A Distributed Algorithm • Two processes want to enter the same critical region at the same moment: processes 0 and 2 contend for the CR, so each sends a timestamped request for the resource to everyone else. • Process 0 has the lower timestamp, so it wins. • When process 0 is done, it sends its OK to 2 as well, so 2 can now enter the critical region.
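
The "lowest timestamp wins" decision can be sketched as a reply rule in the Ricart–Agrawala style algorithm the slide describes; using the process id to break timestamp ties is an assumption, since the slide does not spell out tie-breaking.

```python
def should_defer_reply(my_state, my_timestamp, my_id, req_timestamp, req_id):
    """Decide whether to withhold the OK for an incoming request: defer while
    inside the critical region, or while also wanting it with the smaller
    (timestamp, id) pair, i.e. while winning the tie-break."""
    if my_state == "IN_CR":
        return True
    if my_state == "WANTED":
        return (my_timestamp, my_id) < (req_timestamp, req_id)
    return False   # not interested: send OK immediately

# Processes 0 and 2 both want the region; 0's request carries the lower
# timestamp (8 vs. 12), so 0 defers its OK to 2, while 2 answers 0 at once.
print(should_defer_reply("WANTED", 8, 0, 12, 2))   # True  (0 makes 2 wait)
print(should_defer_reply("WANTED", 12, 2, 8, 0))   # False (2 lets 0 go first)
```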

  19. MX: A Token Ring Algorithm • An unordered group of processes on a network, logically numbered. • A logical ring is constructed in software; a token is released by one of the nodes (initially node 0) and circulates around the ring. • Token loss must be handled properly, with a token-regeneration algorithm. • Node failures must be handled too…

  20. Comparison: number of messages per process to enter/exit a critical region • A comparison of the three mutual exclusion algorithms for n nodes, regarding message complexity and behavior under failure or message loss.

  21. The Transaction Model • The transaction model is an all-or-nothing model. • An analogy can be made with negotiations toward signing a contract for a project: until the contract is signed, any party can withdraw with no harm done. • Programming with transactions requires special primitives supplied by the OS, the language, or middleware. The exact list of primitives may differ between application and system environments.

  22. The Transaction Model (1) • Updating a daily master inventory tape is fault tolerant: if something goes wrong, everything is redone from the beginning, i.e. the tapes are rewound and the run is restarted. All or nothing.

  23. The Transaction Model (2) • Typical examples of primitives for transactions. Either everything between the begin and the end of the transaction is executed, or nothing is.

  24. The Transaction Model (3): reserving flights from New York to Malindi in Kenya, via the capital Nairobi • A transaction to reserve three flights commits, as three separate reservation operations. • The transaction aborts when the third flight is unavailable during the same booking; it is then as if nothing had happened.
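
A sketch of the booking example using hypothetical transaction primitives passed in as callables; the primitive names, the routing of the three legs, and the SeatsFull exception are all illustrative assumptions, not part of any real API.

```python
class SeatsFull(Exception):
    """Raised by the hypothetical reserve() primitive when a flight is full."""

def book_trip(begin_transaction, reserve, end_transaction, abort_transaction):
    """Either all three legs are reserved, or the abort leaves no trace at all."""
    begin_transaction()
    try:
        reserve("New York", "Amsterdam")   # illustrative routing of the
        reserve("Amsterdam", "Nairobi")    # three legs toward Malindi
        reserve("Nairobi", "Malindi")
        end_transaction()                  # commit: all three reservations hold
    except SeatsFull:
        abort_transaction()                # e.g. the third flight is full:
                                           # as if nothing had happened
```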

  25. The Transaction Model (4): Transaction properties • Atomicity: the transaction is indivisible. • Consistency: no violation of the system invariants. • Isolation: no interference between concurrent transactions. • Durability: once committed, changes are permanent. • …the ACID properties of transactions.

  26. Classification of Transactions • Flat transactions: transactions with the ACID properties discussed so far; not practical for most distributed transaction applications… • Nested transactions: a number of logically related, complementary sub-transactions together form one nested transaction. One problem is the level at which ACID holds: if the top-level parent aborts, every completed child must be undone; when a child commits, its results become part of the parent's universe… • Distributed transactions: flat, indivisible transactions that operate on data distributed across multiple computers.

  27. Nested and Distributed Transactions • A nested transaction • A distributed transaction

  28. Implementation • How is the all-or-nothing principle implemented in the case of distributed transactions? • Private workspace: updates are applied to a private copy, so that they can be undone without affecting the original data, depending on commit/abort. • Write-ahead log: a log of changes is maintained throughout execution, so that commit and abort can be handled by replaying or rolling back the log…

  29. Private Workspace • The file index and disk blocks for a three-block file • The situation after a transaction has modified block 0 and appended block 3 • After committing

  30. Write-ahead Log • a) An example transaction that changes x and y • b)–d) The log before each statement is executed. The first value is the one before the change, the second the one after.
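
A minimal sketch of write-ahead logging and rollback for a single, in-memory transaction; crash recovery and redo of committed work are left out, and all names are illustrative.

```python
def run_with_wal(statements, data, log):
    """Before a variable is changed, append its old and new values to the log,
    then apply the change in place. `statements` is a list of (variable,
    new_value) pairs."""
    for name, new_value in statements:
        log.append((name, data.get(name), new_value))   # (var, old, new)
        data[name] = new_value

def rollback(data, log):
    """On abort, replay the log backwards to restore the old values."""
    for name, old_value, _new in reversed(log):
        data[name] = old_value
    log.clear()

# A transaction like the slide's: x and y start at 0, then x = 1, y = 2, x = 4.
data, log = {"x": 0, "y": 0}, []
run_with_wal([("x", 1), ("y", 2), ("x", 4)], data, log)
rollback(data, log)
print(data)   # {'x': 0, 'y': 0} -- the abort left no trace
```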

  31. Concurrency Control (1) • General organization of the managers for handling transactions. The top level ensures atomicity, the middle level ensures a consistent (serializable) ordering, and the bottom level performs the actual execution.

  32. Concurrency Control (2) • General organization of managers for handling distributed transactions.

  33. Serializability • The final result of concurrent transaction execution should be the same across different runs, as if the transactions had been executed sequentially. Concurrency control algorithms synchronize the transactions' executions to guarantee this. • a)–c) Three transactions T1, T2, and T3 • d) Possible schedules

  34. Concurrency Control Methods • Two-phase locking • Pessimistic time-stamp ordering • Optimistic time-stamp ordering

  35. Two-Phase Locking (2PL) (1) • Acquire all the locks during the growing phase, release them during the shrinking phase. • On a conflict, the operation is delayed. • A lock is never released before the operation on the data for which the lock was set is complete. • Once a lock has been released on behalf of a transaction, no further lock can be granted to that transaction. • In strict 2PL, all acquired locks are released at the same time (at commit/abort); this avoids cascaded aborts. • 2PL can easily cause deadlocks. • Centralized and distributed versions of 2PL are possible.
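
A sketch of the 2PL growing/shrinking rule as per-transaction bookkeeping; lock conflicts between transactions and deadlock handling are deliberately not modelled, and all names are illustrative.

```python
class TwoPhaseLockingTx:
    """Per-transaction 2PL bookkeeping: once any lock has been released
    (shrinking phase), no further lock may be granted to the same transaction."""

    def __init__(self):
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after an unlock")
        self.held.add(item)            # growing phase

    def unlock(self, item):
        self.shrinking = True          # first release starts the shrinking phase
        self.held.discard(item)

    def release_all(self):
        """Strict 2PL: keep every lock until commit/abort, then drop them all
        at once, which avoids cascaded aborts."""
        self.shrinking = True
        self.held.clear()
```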

  36. Two-Phase Locking (2) • Two-phase locking.

  37. Two-Phase Locking (3) • Strict two-phase locking.

  38. Pessimistic time-stamp ordering-1 • Every operation of a transaction is timestamped ts by an appropriate algorithm (e.g. Lamport's algorithm). • Every data item in the system carries the timestamps of the last read (tsR) and the last write (tsW) performed on it. • If two operations on a data item x conflict, the data manager grants the operation of the transaction with the earlier timestamp.

  39. Pessimistic time-stamp ordering-2 • Read operation of a transaction with timestamp ts: • if ts < tsW, abort the transaction; • if ts > tsW, allow execution and set tsR to max(ts, tsR). • Write operation of a transaction with timestamp ts: • if ts < tsR, abort the transaction; • if ts > tsR, allow execution and set tsW to max(ts, tsW).
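
The two rules can be written down directly; the dict-based data item below is an illustrative stand-in for the data manager's per-item bookkeeping.

```python
def ts_read(ts, item):
    """Read rule from the slide: abort if a younger transaction already wrote."""
    if ts < item["tsW"]:
        return "ABORT"
    item["tsR"] = max(ts, item["tsR"])
    return "OK"

def ts_write(ts, item):
    """Write rule from the slide: abort if a younger transaction already read."""
    if ts < item["tsR"]:
        return "ABORT"
    item["tsW"] = max(ts, item["tsW"])
    return "OK"

x = {"tsR": 0, "tsW": 0}     # per-item read/write timestamps
print(ts_write(5, x))        # OK    -> tsW(x) = 5
print(ts_read(3, x))         # ABORT -> a transaction with ts 3 arrives too late
```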

  40. Pessimistic Timestamp Ordering-3 • Concurrency control using timestamps.

  41. Optimistic time-stamp ordering • Go ahead and do whatever you want; if there is a conflict, handle it at commit time. If conflicts are rare, most commits take place without any problem. • This requires recording all read and write timestamps on the data items, so that it can be checked whether any of the items has been changed while deciding on a commit… • Abort if a change is detected, commit otherwise. • This scheme has not been researched much for distributed systems…
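
A sketch of the commit-time validation step; the per-item version counters are an assumed bookkeeping device, since the slide only says that read/write timestamps are recorded and checked.

```python
def validate_at_commit(accessed_items, start_versions, current_versions):
    """The transaction ran without locks; only now, at commit time, check
    whether any item it read or wrote was changed by someone else meanwhile."""
    for item in accessed_items:
        if current_versions[item] != start_versions[item]:
            return "ABORT"            # conflict detected only at commit time
    return "COMMIT"

# No conflict: commit.  x changed underneath us: abort.
print(validate_at_commit({"x"}, {"x": 3}, {"x": 3}))   # COMMIT
print(validate_at_commit({"x"}, {"x": 3}, {"x": 4}))   # ABORT
```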

  42. Snapshot Protocols • Snapshot Protocol 2 • Process p0 sends "take a snapshot at time τ" to all processes and then sets its clock to τ. • When its local clock reaches τ, each process pi: • records its local state σi and immediately • sends an empty message along each outgoing channel, and • starts recording the messages received over each of its incoming channels. • pi stops recording messages from pj the first time a message with timestamp > τ is received from pj; pi declares the messages recorded from pj to be the channel state σji… • Instead of using a "take a snapshot at τ" message, a process can record its state the first time it receives a special empty message serving as a tag message. • This is Protocol 3…
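
A sketch of Protocol 2 from the point of view of one participant pi; clocks, channels, and message delivery are left abstract, and all names are illustrative.

```python
class SnapshotParticipant:
    """One process p_i in the clock-based snapshot protocol (sketch)."""

    def __init__(self, incoming_channels):
        self.local_snapshot = None                                   # sigma_i
        self.channel_state = {ch: [] for ch in incoming_channels}    # sigma_ji
        self.recording = set()

    def on_clock_reaches_tau(self, local_state, send_empty_on_all_outgoing):
        self.local_snapshot = local_state             # record sigma_i ...
        self.recording = set(self.channel_state)      # ... and start recording channels
        send_empty_on_all_outgoing()                  # empty message on each outgoing channel

    def on_message(self, channel, timestamp, tau, payload):
        if channel not in self.recording:
            return                                    # before our snapshot, or channel done
        if timestamp > tau:
            self.recording.discard(channel)           # sigma_ji for this channel is complete
        else:
            self.channel_state[channel].append(payload)   # was in transit at the snapshot
```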

  43. Supplementary material from Mullender's book • Snapshot Protocol 2 • Already covered!

  44. Snapshot Protocols • Snapshot Protocol 2 • The same clock-based protocol as on slide 42, together with its tag-message variant (Protocol 3).

  45. Properties of Snapshots • Any state constructed by the distributed snapshot algorithm is guaranteed to be consistent. • However, the actual run may not pass through the constructed state; still, there exists an equivalent run that does, so any relation established for the constructed state holds in general… • Such a run is obtained by swapping the order of independent events so that all pre-recording events come before all post-recording events.

  46. Properties of Global Predicates • Stability criterion: once a predicate becomes true, it remains true (figure 4.16).
