UBI529 Distributed Algorithms

UBI529 Distributed Algorithms 2. Time Synchronization in Distributed Systems

Assumptions • Process can perform three kinds of atomic actions/events • Local computation • Send a single message to a single process • Receive a single message from a single process • Communication model • Point-to-point • Error-free, infinite buffer • Potentially out of order

Motivation • Many protocols need to impose an ordering among events • If two players open the same treasure chest “almost” at the same time, who should get the treasure? • Physical clocks: • Seems to completely solve the problem • But what about theory of relativity? • Even without theory of relativity – efficiency problems • How accurate is sufficient? • Without out-of-band communication: Minimum message propagation delay • With out-of-band communication: distance/speed of light • In other words, some time it has to be “quite” accurate

Physical Clock Synchronization The relation between clock time and UTC when clocks tick at different rates.

Types of Synchronization External Synchronization : Keep as close to UTC as possible Internal Synchronization : Mutual synchronization is important Phase Synchronization Types of clocks Unbounded 0, 1, 2, 3, . . . Bounded 0,1, 2, . . . M-1, 0, 1, . . . Classification Unbounded clocks are not realistic, but are easier to deal with in algorithms. Real clocks are always bounded.

Drift rate  : Max rate two clocks can differ, < 10^-6 for Xtal clocks Clock skew  : Max. Allowed drift Resynchronization interval R : Depends on clock skew Challenges (Drift is unavoidable) Accounting for propagation delay Accounting for processing delay Faulty clocks Terminology

A simple averaging algorithm Assume n clocks, at most t are faulty Step 1. Read every clock in the system. Step 2. Discard outliers and substitute them by the value of the local clock. Step 3. Update the clock using the average of these values. Synchronization is maintained if n > 3t Why? Internal synchronization Some clocks may be faulty. Exhibits 2-faced behavior

A simple averaging algorithm The maximum difference between the averages computed by two non-faulty nodes is (3td / n) To keep the clocks synchronized, 3td /n < d So, 3t < n Internal synchronization

Tiered architecture Broadcast mode - least accurate Procedure call - medium accuracy Peer-to-peer mode - upper level servers use this for max accuracy Network Time Protocol (NTP) Time server Level 0 Level 1 Level 1 Level 1 Level 2 Level 2 Level 2 The tree can reconfigure itself if some node fails.

Let Q’s time be ahead of P’s time by . Then T2 = T1 + TPQ + T4 = T3 + TQP - y = TPQ + TQP = T2 +T4 -T1 -T3 (RTT)  = (T2 -T4 -T1 +T3) / 2 - (TPQ -TQP) / 2 P2P mode of NTP T2 T3 Q P T1 T4 x Between y/2 and -y/2 So, x- y/2 ≤  ≤ x+ y/2 Ping several times, and obtain the smallest value of y. Use it to calculate 

Cristian’s Algorithm • 1. Client gets data from a time server. • 2. It computes the round trip time (RTT), and compensates for this • delay while adjusting its own clock

Berkeley Time Protocol (FromTanenbaum 2007) a) The timedaemon asks all the other machines for their clockvalues. b) The machines answer The time daemon tells everyone how to adjust their clock.

Software “Clocks” • Software “clocks” can incur much lower overhead than maintaining (sufficiently accurate) physical clocks • Allows a protocol to infer ordering among events • Goal of software “clocks”: Capture event ordering that are visible to users • But what orderings are visible to users without physical clocks?

Visible Ordering to Users • A  B (process order) • B  C (send-receive order) • A  C (transitivity) • A ? D • B ? D • C ? D B A user1 (process1) C user2 (process2) user3 (process3) D

“Happened-Before” Relation • “Happened-before” relation captures the ordering that is visible to users when there is no physical clock • A partial order among events • Process-order, send-receive order, transitivity • First introduced by Lamport – Considered to be the first fundamental result in distributed computing • Goal of software “clock” is to capture the above “happened-before” relation. Formally, a logical clock C is a map from the set of events E to N (the set of natural numbers) with the following constraint:

1: Logical Clocks • Each event has a single integer as its logical clock value • Each process has a local counter C • Increment C at each “local computation” and “send” event • When sending a message, logical clock value V is attached to the message. At each “receive” event, C = max(C, V) + 1 3 = max(1,2)+1 4 = max(3,3)+1 1 2 3 4 user1 (process1) 1 4 5 3 user2 (process2) 1 2 5 user3 (process3)

Logical Clock Properties • Theorem: • Event s happens before t  the logical clock value of s is smaller than the logical clock value of t. • The reverse may not be true 1 2 3 4 user1 (process1) 1 4 5 3 user2 (process2) 1 2 5 user3 (process3)

Lamport’s Logical Clock Algorithm

Lamport's logical clock algorithm in Java publicclass LamportClock { int c; public LamportClock() { c = 1; } publicint getValue() { return c; } publicvoid tick() { // on internal actions c = c + 1; } publicvoid sendAction() { // include c in message c = c + 1; } publicvoid receiveAction(int src, int sentValue) { c = Util.max(c, sentValue) + 1; } }

2: Vector Clocks • Logical clock: • Event s happens before event t  the logical clock value of s is smaller than the logical clock value of t. • Vector clock: • Event s happens before event t  the vector clock value of s is “smaller” than the vector clock value of t. • Each event has a vector of n integers as its vector clock value • v1 = v2 if all n fields same • v1  v2 if every field in v1 is less than or equal to the corresponding field in v2 • v1 < v2 if v1  v2 and v1 ≠ v2 Relation “<“ here is not a total order

Vector Clock Protocol • Each process i has a local vector C • Increment C[i] at each “local computation” and “send” event • When sending a message, vector clock value V is attached to the message. At each “receive” event, C = pairwise-max(C, V); C[i]++; C = (0,1,0), V = (0,0,2) pairwise-max(C, V) = (0,1,2) (1,0,0) (2,0,0) (3,0,0) user1 (process1) (0,2,2) (2,3,2) (0,1,0) user2 (process2) (0,0,2) (2,3,3) (0,0,1) user3 (process3)

Vector Clock Properties • Event s happens before t  vector clock value of s < vector clock value of t: There must be a chain from s to t • Event s happens before t  vector clock value of s < vector clock value of t • If s and t on same process, done • If s is on p and t is on q, let VS be s’s vector clock and VT be t’s • VS < VT  VS[p] ≤ VT[p]  Must be a sequence of message from p to q after s and before t s p t q

Example Application of Vector Clock • Each user has an order placed by a customer • Want all users to know all orders • Each order has a vector clock value I have seen all orders whose vector clock is smaller than (2,3,2): I can avoid linear search for existence testing D1, (1,0,0) (2,0,0) (3,0,0) user1 (process1) (0,2,2) D1 (2,3,2) D2, (0,1,0) user2 (process2) D1, D2, D3 D3 D3, (0,0,1) (2,3,3) (0,0,2) user3 (process3)

Vector Clock Algorithm

A sample execution of the vector clock algorithm

Vector clock algorithm in Java publicclass VectorClock { publicint[] v; int myId; int N; public VectorClock(int numProc, int id) { myId = id; N = numProc; v = newint[numProc]; for (int i = 0; i < N; i++) v[i] = 0; v[myId] = 1; } publicvoid tick() { v[myId]++; } publicvoid sendAction() { //include the vector in the message v[myId]++; } publicvoid receiveAction(int[] sentValue) { for (int i = 0; i < N; i++) v[i] = Util.max(v[i], sentValue[i]); v[myId]++; } }

3: Matrix Clocks • Motivation • My vector clock describe what I “see” • In some applications, I also want to know what other people see • Matrix clock: • Each event has n vector clocks, one for each process • The i th vector on process i is called process i ’s principle vector • Principle vector is the same as vector clock before • Non-principle vectors are just piggybacked on messages to update “knowledge”

Matrix Clock Protocol • For principle vector C on process i • Increment C[i] at each “local computation” and “send” event • When sending a message, all n vectors are attached to the message • At each “receive” event, let V be the principle vector of the sender. C = pairwise-max(C, V); C[i]++; • For non-principle vector C on process i, suppose it corresponds to process j • At each “receive” event, let V be the vector corresponding to process j as in the received message. C = pairwise-max(C, V)

Matrix Clock Example (1,0,0) (0,0,0) (0,0,0) (2,0,0) (0,0,0) (0,0,0) (3,0,0) (0,0,0) (0,0,0) user1 (process1) (0,0,0) (0,1,0) (0,0,0) (0,0,0) (0,2,2) (0,0,2) (2,0,0) (2,3,2) (0,0,2) user2 (process2) (0,0,0) (0,0,0) (0,0,1) (0,0,0) (0,0,0) (0,0,2) (2,0,0) (2,3,2) (2,3,3) user3 (process3)

Matrix Clock Example (1,0,0) (0,0,0) (0,0,0) (2,0,0) (0,0,0) (0,0,0) (3,0,0) (0,0,0) (0,0,0) Theorem: On any given process, any non-principle vector is smaller than or equal to the principle vector. (0,0,0) (0,1,0) (0,0,0) (0,0,0) (0,2,2) (0,0,2) (2,0,0) (2,3,2) (0,0,2) (0,0,0) (0,0,0) (0,0,1) (0,0,0) (0,0,0) (0,0,2) (2,0,0) (2,3,2) (2,3,3)

Application of Matrix Clock (1,0,0) (0,0,0) (0,0,0) (2,0,0) (0,0,0) (0,0,0) (3,0,0) (0,0,0) (0,0,0) user1 (process1) gossip D1 = (1,0,0) D1 user3 now knows that all 3 users have seen D1 (0,0,0) (0,1,0) (0,0,0) (0,0,0) (0,2,2) (0,0,2) (2,0,0) (2,3,2) (0,0,2) user2 (process2) gossip D2 = (0,1,0) D1, D2, D3 D3 (0,0,0) (0,0,0) (0,0,1) (0,0,0) (0,0,0) (0,0,2) (2,0,0) (2,3,2) (2,3,3) user3 (process3) gossip D3 = (0,0,1)

Matrix Clock Algorithm

The matrix clock algorithm in Java publicclass MatrixClock { int[][] M; int myId; int N; public MatrixClock(int numProc, int id) { myId = id; N = numProc; M = newint[N][N]; for (int i = 0; i < N; i++) for (int j = 0; j < N; j++) M[i][j] = 0; M[myId][myId] = 1; } publicvoid tick() { M[myId][myId]++; } publicvoid sendAction() { //include the matrix in the message M[myId][myId]++; } publicvoid receiveAction(int[][] W, int srcId) { // component-wise maximum of matrices for (int i = 0; i < N; i++) if (i != myId) { for (int j = 0; j < N; j++) M[i][j] = Util.max(M[i][j], W[i][j]); } } }

Discussions • Vector clock tells me what I know • one-dimensional data structure • Matrix clock tells me what I know about what other people know • Two-dimensional data structure • ?? tells me what I know about what other people know about what other people know • ??-dimensional data structure

Summary • Goal : Define “time” in a distributed system • Logical clock • “happened before”  smaller clock value • Vector clock • “happened before”  smaller clock value • Matrix clock • Gives a process knowledge about what other processes know

Acknowledgements • This part is heavily dependent on the course : CS4231Parallel and Distributed Algorithms, NUS by Dr. Haifeng Yu and Dr. Vijay Gargh Elements of Distributed Computing Book and also Dr. Sukumar Ghosh Iowa University Distributed Systems course 22C:166

UBI529 Distributed Algorithms