Consistency & Replication II

CSE5306
Lecture Quiz 19 due at 5 PM Saturday, 11 October 2014

Monotonic Reads
[Figure: reads at SFO and at NYC; panels labeled "Old Plus New" and "Only New".]
• The monotonic-read consistency model assures that a client sees the same value (or a more recent value) every time she reads a data item.
• Emails are delivered in a lazy, on-demand fashion. For example, the same emails a client read in the morning in San Francisco can be reread in the evening in New York City, plus a few more (see figure).
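
The monotonic-read rule can be illustrated with a minimal Python sketch. The class names `Replica` and `Session`, the integer version counter, and the mailbox contents are all assumptions made for illustration; they are not from the slides. The session remembers the newest version it has read and refuses to read from a replica that is older.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.version = 0      # how many updates this replica has applied
        self.mailbox = []

    def apply(self, version, mailbox):
        # Accept an update only if it is newer than the local copy.
        if version > self.version:
            self.version, self.mailbox = version, list(mailbox)


class Session:
    """Hypothetical client session that tracks the newest version it has read."""

    def __init__(self):
        self.last_read_version = 0

    def read(self, replica):
        # Return the mailbox, or None if this replica is older than an earlier read.
        if replica.version < self.last_read_version:
            return None
        self.last_read_version = replica.version
        return list(replica.mailbox)


sfo, nyc = Replica("SFO"), Replica("NYC")
sfo.apply(1, ["mail-1", "mail-2"])
session = Session()
morning = session.read(sfo)                    # read in San Francisco
stale = session.read(nyc)                      # NYC has not received the update yet
nyc.apply(2, ["mail-1", "mail-2", "mail-3"])   # lazy propagation plus one new mail
evening = session.read(nyc)                    # old mail plus new mail
```

In this sketch a stale replica simply yields `None`; a real store would more likely fetch the missing updates before answering.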

R U O K ?
1. Describe the monotonic-read consistency model.
a) It assures that a client sees the same value (or a more recent value) every time she reads a data item.
b) For example, the same emails a client read in the morning in San Francisco can be reread in the evening in New York City, plus a few more.
c) All prior writes to the store are completed before each new write.
d) Both a and b above.
e) None of the above.

Monotonic Writes
• The monotonic-write consistency model assures that each write is finished before any successive write by the same process begins; i.e., a replica must be up to date before it is edited (this resembles FIFO consistency).
• Above left correctly shows a WS(x1) store update before the W(x2) edit. Above right incorrectly omits the update, violating monotonic-write consistency.
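
One possible way to enforce monotonic writes is sketched below in Python. It assumes each process numbers its own writes (the `seq` argument) and that a replica buffers any write that arrives ahead of its predecessors. The `Replica` class and its fields are hypothetical, not taken from the slides.

```python
class Replica:
    def __init__(self):
        self.applied = {}   # process id -> sequence number of its last applied write
        self.pending = {}   # (process id, seq) -> value, held until its turn comes
        self.x = None

    def write(self, pid, seq, value):
        self.pending[(pid, seq)] = value
        # Apply this process's writes strictly in sequence order.
        nxt = self.applied.get(pid, 0) + 1
        while (pid, nxt) in self.pending:
            self.x = self.pending.pop((pid, nxt))
            self.applied[pid] = nxt
            nxt += 1


replica = Replica()
replica.write("P1", 2, "x2")   # arrives first, but W(x1) has not been applied here
held = replica.x               # x2 is only buffered, so x is still unset
replica.write("P1", 1, "x1")   # the missing update arrives; x1, then x2, are applied
```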

R U O K ?
2. Describe the monotonic-write consistency model.
a) It assures that a client sees the same value (or a more recent value) every time she reads a data item.
b) For example, the same emails a client read in the morning in San Francisco can be reread in the evening in New York City, plus a few more.
c) All prior writes to the store are completed before each new write.
d) All of the above.
e) None of the above.

Read Your Writes
[Figure: left panel labeled "Yes"; right panel labeled "No, previous write operation didn’t propagate to L2".]
• A data store is said to provide read-your-writes consistency if the following condition holds:
  • The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.
• Positive example (above left): The write operation completes before the successive read operation, wherever the read takes place.
• Negative example (above right): A just-changed password still doesn’t let you in (security and app servers not collaborating).
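
The password example can be sketched in Python as follows. The `Server` and `Session` classes, the version counter, and the `propagate` helper are illustrative assumptions only. The session records the version produced by its own latest write and checks whether a server already reflects it.

```python
class Server:
    def __init__(self):
        self.version = 0
        self.password = "old-secret"


class Session:
    def __init__(self):
        self.my_last_write = 0   # version produced by this client's latest write

    def change_password(self, server, new_password):
        server.version += 1
        server.password = new_password
        self.my_last_write = server.version

    def can_read_from(self, server):
        # Safe only if the server already reflects this client's own writes.
        return server.version >= self.my_last_write


def propagate(src, dst):
    # Copy the newer state from one server to the other.
    if src.version > dst.version:
        dst.version, dst.password = src.version, src.password


security, app = Server(), Server()
s = Session()
s.change_password(security, "new-secret")
before = s.can_read_from(app)   # the app server has not seen the write yet
propagate(security, app)
after = s.can_read_from(app)    # now the write is visible at the app server
```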

R U O K ?
3. Define “read-your-writes” consistency in data stores.
a) The effect of a write operation by a process on data item x will not always be seen by a successive read operation on x by the same process.
b) A write operation completes before the successive read operation, wherever the read takes place.
c) A just-changed password still doesn’t let you in, because your security and app servers are not collaborating.
d) All of the above.
e) None of the above.

Writes Follow Reads
[Figure: left panel labeled "Yes"; right panel labeled "No".]
• A data store is said to provide writes-follow-reads consistency if the following holds:
  • A write operation by a process on a data item x following a previous read operation on x by the same process is guaranteed to take place on the same or a more recent value of x that was read.
• Positive example (above left): A reader’s reaction to an article is posted after the article is written.
• Negative example (above right): No guarantees on posting order (newsgroup editor is on vacation).
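
A minimal Python sketch of the article/reaction example follows. It assumes the client carries a "read set" of post identifiers, and that a replica accepts a reaction only once it holds everything the client has read. `NewsReplica`, `post_reaction`, and the post identifiers are hypothetical names.

```python
class NewsReplica:
    def __init__(self):
        self.posts = []   # post ids in the order this replica stores them

    def post_reaction(self, reaction_id, read_set):
        # Append the reaction only if every post the client read is already here.
        if not all(p in self.posts for p in read_set):
            return False
        self.posts.append(reaction_id)
        return True


l1, l2 = NewsReplica(), NewsReplica()
l1.posts.append("article")
read_set = {"article"}                               # the client read the article at L1
rejected = l2.post_reaction("reaction", read_set)    # L2 lacks the article
l2.posts.append("article")                           # the article propagates to L2
accepted = l2.post_reaction("reaction", read_set)
```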

R U O K ?
4. Define “writes-follow-reads” consistency in data stores.
a) A write operation by a process on a data item x following a previous read operation on x by the same process is not guaranteed to take place on the same or a more recent value of x that was read.
b) A reader’s reaction to an article is posted after the article is written.
c) There are no guarantees on posting order when the newsgroup editor is on vacation.
d) All of the above.
e) None of the above.

Replica Management
• When, where and by whom should replicas be placed?
  • Replicated servers: best locations?
  • Replicated data: best host for each data item?
• And after placement, how will the replicas be kept consistent?

Replica Server Placement
• Replica servers must be placed in real time because of “flash crowds”; i.e., bursts of requests for one popular Web site.
• Szymaniak proposes that…
  • Cluster similar-content users with little internode latency (above).
  • Compute the size of a box around each cluster, using the average distance between two nodes and the number of required replicas.
  • Choose as your replica server any of the nodes in the box that contains the greatest number of nodes.
• This places the 20 best replicas among 64,000 nodes in real time; i.e., 50,000 times as fast as competing brute-force methods.
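
The box-based idea above can be approximated with a short Python sketch. It is not Szymaniak's actual algorithm: the two-dimensional coordinates, the fixed `cell_size` argument (the slide says it is derived from the average node distance and the number of replicas, but gives no formula), and the tie-breaking rule are all assumptions.

```python
from collections import defaultdict


def place_replicas(nodes, cell_size, k):
    """nodes maps a node name to (x, y) coordinates in a latency space."""
    cells = defaultdict(list)
    for name, (x, y) in nodes.items():
        cells[(int(x // cell_size), int(y // cell_size))].append(name)
    # Densest boxes first; ties are broken by box index to stay deterministic.
    ranked = sorted(cells.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    # Any node in a chosen box would do; take the alphabetically first one.
    return [sorted(members)[0] for _, members in ranked[:k]]


nodes = {
    "a": (0.1, 0.2), "b": (0.4, 0.3), "c": (0.2, 0.8),
    "d": (5.1, 5.2), "e": (5.3, 5.9),
    "f": (9.0, 1.0),
}
chosen = place_replicas(nodes, cell_size=1.0, k=2)
```

Counting nodes per box takes one pass over the nodes, which suggests why such a scheme can run far faster than brute-force search over candidate placements.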

R U O K ?
5. How can a needed replica server be strategically placed in real time?
a) Cluster all similar-content users who have insignificant internode latency.
b) Compute the size of a box around each cluster, using the average distance between two nodes and the number of required replicas.
c) Choose as your replica server any of the nodes in the box that contains the greatest number of nodes.
d) All of the above.
e) None of the above.

Content Replication & Placement
• Three different types of content replicas appear in (already properly placed) servers:
  • Permanent replicas.
  • Server-initiated replicas.
  • Client-initiated replicas.

R U O K ?
6. What type(s) of content replicas appear in already properly placed servers?
a) Permanent replicas.
b) Server-initiated replicas.
c) Client-initiated replicas.
d) All of the above.
e) None of the above.

Permanent Replicas
• The initial set of replicas of a distributed data store:
  • A few servers at the same site.
  • Mirror sites geographically spread across the Internet.
• Shared-nothing architecture:
  • Distributed database replicated in a concentrated or geographically spread-out server cluster.
  • Processors do not share disks or main memory.

R U O K ?
7. Which of the following exemplifies replicas of a distributed data store?
a) A few servers at the same site.
b) Mirror sites geographically spread across the Internet.
c) Distributed database replicated in a concentrated or geographically spread-out server cluster.
d) All of the above.
e) None of the above.

Server-Initiated Replicas
• The owner’s data store replicas enhance its performance:
  • E.g., the owner installs several new servers in L.A. to handle a burst of L.A. Internet requests on behalf of an overloaded NYC Web server. Better yet, she sends her data store replica to a Web hosting service that already has many servers in L.A., to help handle the extra traffic.
• Rabinovich’s content placement algorithm:
  • When the count of requests cntQ(S,F) for server S’s file F falls below a predefined deletion threshold del(S,F), that file is deleted if it is not the last copy (see figure above).
  • When cntQ(S,F) exceeds the predefined replication threshold rep(S,F), that very popular file gets replicated at the server that relays most of S’s requests for F.
  • When cntQ(S,F) lies between del(S,F) and rep(S,F), the file simply moves to the server that relays most of S’s requests for F.
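
The three threshold rules above reduce to a small decision function. The Python sketch below is only an illustration: the function name, the returned action strings, and the handling of counts that land exactly on a threshold (treated as "migrate") are assumptions not stated on the slide.

```python
def placement_action(count, del_threshold, rep_threshold, is_last_copy):
    """Decide what server S does with file F, given its request count."""
    if count < del_threshold:
        # Unpopular file: drop it, unless that would lose the last copy.
        return "keep" if is_last_copy else "delete"
    if count > rep_threshold:
        # Very popular file: copy it to the server relaying most requests.
        return "replicate"
    # In between: move the file toward the server relaying most requests.
    return "migrate"


actions = [
    placement_action(2, del_threshold=5, rep_threshold=50, is_last_copy=False),
    placement_action(2, del_threshold=5, rep_threshold=50, is_last_copy=True),
    placement_action(80, del_threshold=5, rep_threshold=50, is_last_copy=False),
    placement_action(20, del_threshold=5, rep_threshold=50, is_last_copy=False),
]
```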

R U O K ?
8. Describe a widely used content placement algorithm.
a) When the count of requests for your server’s file falls below a predefined deletion threshold, you delete that file (if it is not the last copy).
b) When that count exceeds the predefined replication threshold, your very popular file gets replicated at the server that relays most of your server’s file requests.
c) When the count lies between those thresholds, your file simply moves to the server that relays most of your server’s file requests.
d) All of the above.
e) None of the above.

Client-Initiated Replicas
• Clients can cache recent Web pages to reduce their re-access time. Clients with similar interests (e.g., many whiteboard designers) also may share a level-2 cache in their LAN or WAN’s Web server.
• Web servers may tell the caches their data items are stale, but they do not attempt to enforce consistency policies upon clients’ caches.
• Web pages get purged from cache when they expire, to make space for other Web pages.
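
A client cache with expiry can be sketched in a few lines of Python. The `PageCache` class, the explicit `now` argument (used instead of a real clock so the example is deterministic), and the example URL are assumptions for illustration.

```python
class PageCache:
    def __init__(self):
        self.pages = {}   # url -> (body, expiry time)

    def put(self, url, body, now, ttl):
        self.pages[url] = (body, now + ttl)

    def get(self, url, now):
        entry = self.pages.get(url)
        if entry is None:
            return None               # cache miss
        body, expires_at = entry
        if now >= expires_at:
            del self.pages[url]       # purge the expired page to free space
            return None
        return body


cache = PageCache()
cache.put("http://example.org/", "<html>v1</html>", now=0, ttl=60)
fresh = cache.get("http://example.org/", now=30)     # still within its lifetime
expired = cache.get("http://example.org/", now=90)   # expired, so it is purged
```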

R U O K ?
9. How could you effectively manage client-based replicas?
a) Clients can cache recent Web pages to reduce their re-access time. Clients with similar interests (e.g., many whiteboard designers) also may share a level-2 cache in their LAN or WAN’s Web server.
b) Web servers may tell the caches their data items are stale, but they do not attempt to enforce consistency policies upon clients’ caches.
c) Web pages get purged from cache when they expire, to make space for other Web pages.
d) All of the above.
e) None of the above.

Content Distribution
• How are updated contents propagated to the relevant replica servers…?
• It depends…

State vs. Operations
• It depends on… what is to be propagated:
• Propagate only a notification of an update:
  • An “invalidation protocol” simply tells another replica that part or all of its copy is out of date (stock market traders don’t like inconsistencies).
  • This works best when data get stale quickly but are seldom read; i.e., high write/read ratio.
• Transfer data from one copy to another:
  • Server gets a partial or full update from its nearest neighbor.
  • This works best when data get stale slowly but are read often; i.e., low write/read ratio.
• Propagate the update operation to other copies:
  • “Active replication” tells each replica which of its preprogrammed update operations it should perform and what its parameters are.
  • This may reduce network bandwidth if the parameters are small. But it may require significant processing power if the operation is complex.
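
The three propagation choices above can be contrasted in a small Python sketch. The `Replica` class, its handler names, and the `deposit` operation on an account balance are hypothetical; the point is only what each kind of message carries.

```python
def deposit(replica, amount):
    replica.balance += amount


# Update operations that every replica is assumed to have preprogrammed.
OPERATIONS = {"deposit": deposit}


class Replica:
    def __init__(self, balance):
        self.balance = balance
        self.valid = True

    def on_invalidate(self):
        # Notification only: the copy is marked stale and no data is sent.
        self.valid = False

    def on_state(self, new_balance):
        # State transfer: the new value itself is shipped.
        self.balance = new_balance
        self.valid = True

    def on_operation(self, name, amount):
        # Active replication: only the operation name and parameters are shipped.
        OPERATIONS[name](self, amount)


r1, r2, r3 = Replica(100), Replica(100), Replica(100)
r1.on_invalidate()
r2.on_state(150)
r3.on_operation("deposit", 50)
```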

R U O K ?
10. Which of the following propagation strategies would work best when the read/write ratio is small?
a) Propagate only a notification of an update.
b) Transfer data from one copy to another.
c) Propagate the update operation to other copies.
d) All of the above.
e) None of the above.

Pull vs. Push Protocols
• It depends on… whether replicas are active or passive agents of change.
• Server-based protocols push updates to replicas:
  • A permanent replica pushes to a server-initiated replica, or the latter pushes to its clients.
  • Use this when a high degree of consistency is required among many clients; i.e., high read/update ratio.
  • The client must inform the server if it purges a page to make space.
• Client-based protocols pull updates from servers:
  • Upon a cache hit, ask the Web server if the cached copy is stale; if so, pull it.
  • Use this when the cache isn’t shared among clients; i.e., low read/update ratio. Don’t use it when the response must be quick.
• A “lease” is the server’s limited-time promise to push; once it expires, the client must pull:
  • Long-lasting leases work well on unchanging Web pages.
  • Frequently pulled pages (better customers) get longer-term leases.
  • Short-term leases give an overloaded server relief from pushing.
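
The lease idea can be sketched in Python as a hybrid of push and pull. The `Server` and `Client` classes, the explicit `now` timestamps, and the lease duration are assumptions for illustration: the server pushes updates only to clients whose lease is still valid, and a client without a valid lease has to pull.

```python
class Client:
    def __init__(self):
        self.cached = None

    def pull(self, server):
        # Client-based protocol: fetch the current page on demand.
        self.cached = server.page
        return self.cached


class Server:
    def __init__(self):
        self.page = "v1"
        self.leases = {}   # client -> lease expiry time

    def grant_lease(self, client, now, duration):
        self.leases[client] = now + duration

    def update(self, page, now):
        self.page = page
        # Server-based protocol: push only while the lease is still valid.
        for client, expires_at in self.leases.items():
            if now < expires_at:
                client.cached = page


server, client = Server(), Client()
client.pull(server)
server.grant_lease(client, now=0, duration=10)
server.update("v2", now=5)        # lease valid: the update is pushed
pushed = client.cached
server.update("v3", now=15)       # lease expired: nothing is pushed
stale = client.cached
pulled = client.pull(server)      # the client must now pull
```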

Unicasting vs. Multicasting
• It depends on… whether unicasting or multicasting works better:
  • Push-based updates can be multicast efficiently; the LAN does all the work.
  • Pull-based updates must be sent point-to-point (unicast).

R U O K ?
Classify each of the following as: a) push, b) pull or c) both.
11. Server-based protocols __
12. Client-based protocols __
13. Leases __
14. Multicasting __
15. Point-to-point unicasting __
