The Costs and Limits of Availability for Replicated Services

Presented by: Sarath Chandra Dorbala

Outline of presentation

  • Introduction

  • A word about consistency protocols

  • Focus of this article

  • Background

  • Details of TACT

  • System Model and Assumptions

  • Availability Upper bound theory

  • Derivation of Upper bound

  • Determining the serialization order

  • Simulation Results

  • Conclusion


Introduction

  • Raw system performance increases at exponential rates.

  • A service's utility is limited by availability rather than performance

  • Key approaches

    • Caching and replication

  • Problems with replication

    • Consistency

A word about consistency protocols

  • Types of consistency protocols

    • Strict consistency

      • Reduces availability

    • Optimistic consistency

      • The system may quickly reach an inconsistent state

    • Continuous consistency models

      • Somewhere in between optimistic and strict

        • The idea is: applications can set the consistency as a parameter

  • Basic idea

    • Decreasing consistency increases availability

Focus of this article

  • Evaluate the availability of a prototype replication system across the Internet as a function of

    • Consistency level

    • Consistency protocol

    • Failure characteristics

  • Simple optimizations to existing consistency protocols result in significant improvement in availability

  • Upper bound for availability of services

  • Shows that maximizing availability typically entails remaining as close to strong consistency as possible during times of good connectivity


Background

  • Surveys show that a 0.1% improvement in service availability corresponds to $1 billion in annual revenue

  • Goal: high availability

    • Trade consistency for availability

  • Strong consistency

    • Lowest availability, but no conflicting updates

  • Optimistic consistency

    • Good for availability, worst for consistency

  • Continuous consistency

    • Works well for availability, with a tunable consistency parameter

Details of TACT

  • TACT gradually reduces the amount of required synchronous communication among replicas in moving from strong to optimistic consistency.

  • At any replica, updates can be in either a tentative or a committed state

  • Three replica metrics

    • Numerical error

      • The maximum weight of writes not seen by a replica

    • Order error

      • The maximum weight of writes that have not established their commit order at the local replica

    • Staleness

      • The maximum amount of time before a replica is guaranteed to observe a write accepted by a remote replica

  • Setting these parameters to zero → strong consistency

  • Setting these parameters to infinity → optimistic consistency
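
The three metrics and their two extreme settings can be sketched as a small data structure. This is a minimal illustration, not TACT's actual interface: the class name ConsistencyBounds, its field names, and the admits method are hypothetical, and staleness is assumed to be measured in seconds.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConsistencyBounds:
    """Per-replica bounds on the three metrics (hypothetical container)."""
    numerical_error: float  # max weight of writes not yet seen by the replica
    order_error: float      # max weight of writes without an established commit order
    staleness: float        # max delay (assumed seconds) before a remote write is seen

    def _values(self):
        return (self.numerical_error, self.order_error, self.staleness)

    def is_strong(self) -> bool:
        # All bounds at zero corresponds to strong consistency.
        return all(b == 0 for b in self._values())

    def is_optimistic(self) -> bool:
        # All bounds at infinity corresponds to optimistic consistency.
        return all(math.isinf(b) for b in self._values())

    def admits(self, numerical_error, order_error, staleness) -> bool:
        # True if the observed metric values stay within every bound.
        observed = (numerical_error, order_error, staleness)
        return all(o <= b for o, b in zip(observed, self._values()))


STRONG = ConsistencyBounds(0, 0, 0)
OPTIMISTIC = ConsistencyBounds(math.inf, math.inf, math.inf)
```

Any setting between STRONG and OPTIMISTIC, for example ConsistencyBounds(2, 1, 30), represents a point on the continuous consistency spectrum.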

Example scenario

Replica A has accepted updates W1 and W2

Replica B has accepted updates W3 and W4.

Update W1 has been propagated from A to B.

The final serialization order of the four writes is W1, W2, W3, W4.
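
The scenario can be worked through in a few lines of Python. This sketch rests on assumptions that are not stated on the slide: every write has unit weight, the final serialization order is already known at each replica, and a write counts as committed once it and all writes before it in that order have been seen locally. The function names are illustrative only.

```python
ORDER = ["W1", "W2", "W3", "W4"]  # final serialization order from the scenario

# Writes each replica has seen: its own accepted writes plus propagated ones.
SEEN = {
    "A": {"W1", "W2"},
    "B": {"W1", "W3", "W4"},  # W1 was propagated from A to B
}


def numerical_error(seen, order=ORDER):
    """Number of writes (unit weight assumed) not yet seen by the replica."""
    return sum(1 for w in order if w not in seen)


def committed_prefix(seen, order=ORDER):
    """Longest prefix of the serialization order that the replica has fully seen."""
    prefix = []
    for w in order:
        if w not in seen:
            break
        prefix.append(w)
    return prefix


def order_error(seen, order=ORDER):
    """Number of seen writes that cannot be committed yet (still tentative)."""
    return len(seen) - len(committed_prefix(seen, order))


for name in ("A", "B"):
    print(name, numerical_error(SEEN[name]), order_error(SEEN[name]))
```

Under these assumptions replica A has numerical error 2 (it misses W3 and W4) and order error 0, while replica B has numerical error 1 (it misses W2) and order error 2, because W3 and W4 cannot commit until W2 arrives.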

System Model and Assumptions

  • Database is replicated in full at multiple replicas.

  • Each replica may accept reads and writes from clients, both called accesses

  • All replicas remain consistent at all times, that is, the numerical error, order error, and staleness on any replica are always within bounds.


  • An access is typically a read or write from a client to the network service

  • Each access is classified as:

    • a failed access if the request cannot reach any replica because of network failures

    • a rejected access if it is received by some replica but its acceptance would violate some consistency requirement

    • an accepted access otherwise.

  • Therefore,

    Avail_client = accepted accesses / submitted accesses
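
The classification and the availability ratio above can be sketched as follows. The names Outcome, classify, and client_availability are hypothetical, and the two boolean inputs stand in for whatever the network and the consistency protocol would actually report.

```python
from enum import Enum


class Outcome(Enum):
    FAILED = "failed"      # request could not reach any replica
    REJECTED = "rejected"  # reached a replica, but accepting it would violate consistency
    ACCEPTED = "accepted"  # everything else


def classify(reached_replica: bool, violates_consistency: bool) -> Outcome:
    """Classify one access according to the three cases in the list above."""
    if not reached_replica:
        return Outcome.FAILED
    if violates_consistency:
        return Outcome.REJECTED
    return Outcome.ACCEPTED


def client_availability(outcomes) -> float:
    """Accepted accesses divided by submitted accesses."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("no accesses were submitted")
    accepted = sum(1 for o in outcomes if o is Outcome.ACCEPTED)
    return accepted / len(outcomes)


# Five submitted accesses: three accepted, one failed, one rejected.
trace = [
    classify(True, False),
    classify(True, False),
    classify(True, False),
    classify(False, False),
    classify(True, True),
]
print(client_availability(trace))
```

Both failed and rejected accesses count against availability; only the cause differs (network failure versus a consistency bound).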

Availability Upper bound theory

  • Upper bound on service availability as a function of workload, faultload, and consistency.

    Avail_service ≤ F(consistency, workload, faultload)

  • Workload: describes the timestamped accesses reaching any of the replicas, that is, when and which access reaches which replica

  • Faultload: a trace of timestamped failure events and recovery events for replicas and the network; it fully specifies the failure pattern

Characteristics of F

  • Function F returns the availability upper bound, which is independent of the consistency maintenance protocol

  • Demonstrates the inherent effects of consistency, workload, and faultload on availability.

  • The availability achieved by any system will be less than or equal to this upper bound.

  • Computing F is an NP-hard problem

Derivation of Upper bound – Evolution Graph

  • The evolution graph of a faultload is a directed graph constructed as follows.

    • For each interval in the faultload, add a node to the graph for each network partition in that interval. Let node_{k,m} correspond to interval_k, partition_m.

    • An edge from node_{k,m} to node_{k',m'} is added if k' = k + 1 and partition_{m'} intersects with partition_m at one or more replicas.

    • A node in the evolution graph is an ancestor of another node if there is a path from the former to the latter.
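
The construction can be sketched in Python. The input format is an assumption made for illustration: the faultload is reduced to a list of intervals, each interval being a list of partitions, and each partition a set of replica names. Edges are assumed to point forward in time, from an interval to the one immediately after it. The function names are hypothetical.

```python
from collections import defaultdict


def build_evolution_graph(intervals):
    """Return edges {(k, m): {(k + 1, m2), ...}} between partition nodes.

    intervals[k][m] is the set of replicas in partition m during interval k.
    """
    edges = defaultdict(set)
    for k in range(len(intervals) - 1):
        for m, part in enumerate(intervals[k]):
            for m2, nxt in enumerate(intervals[k + 1]):
                if part & nxt:  # partitions share at least one replica
                    edges[(k, m)].add((k + 1, m2))
    return edges


def is_ancestor(edges, src, dst):
    """True if a directed path of length >= 1 leads from src to dst."""
    stack, visited = [src], set()
    while stack:
        node = stack.pop()
        for child in edges.get(node, ()):
            if child == dst:
                return True
            if child not in visited:
                visited.add(child)
                stack.append(child)
    return False


# Three intervals over replicas A, B, C with changing partitions.
intervals = [
    [{"A", "B"}, {"C"}],
    [{"A"}, {"B", "C"}],
    [{"A", "B", "C"}],
]
graph = build_evolution_graph(intervals)
print(is_ancestor(graph, (0, 1), (2, 0)))
```

In this example node (0, 1), the partition {C} in the first interval, is an ancestor of node (2, 0) but not of node (1, 0), since C and A share no partition until the last interval.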

Objective function

To compute the availability upper bound, we only need to focus on writes.

Let writes_{k,m} be the number of writes accepted by partition_m during interval_k.

Let wsubmit_{k,m} be the number of writes submitted by clients to partition_m during interval_k.
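
The formula itself did not survive in this transcript. As a hedged reading of the two definitions above, one plausible objective is to maximize the total of writes_{k,m} relative to the total of wsubmit_{k,m}, with each writes_{k,m} between 0 and wsubmit_{k,m} and subject to the consistency constraints. The sketch below only evaluates that ratio for one candidate assignment; it does not search for the maximum, and the function name and dictionary layout are assumptions.

```python
def write_availability(writes, wsubmit):
    """Ratio of accepted to submitted writes over all (interval, partition) nodes.

    Both arguments map a node (k, m) to a count. A node missing from
    `writes` is treated as having accepted nothing.
    """
    for node, submitted in wsubmit.items():
        accepted = writes.get(node, 0)
        # A partition cannot accept more writes than clients submitted to it.
        if not 0 <= accepted <= submitted:
            raise ValueError(f"infeasible count at node {node}")
    total_submitted = sum(wsubmit.values())
    if total_submitted == 0:
        raise ValueError("no writes were submitted")
    total_accepted = sum(writes.get(node, 0) for node in wsubmit)
    return total_accepted / total_submitted


wsubmit = {(0, 0): 10, (0, 1): 5, (1, 0): 5}
writes = {(0, 0): 10, (0, 1): 0, (1, 0): 5}
print(write_availability(writes, wsubmit))
```

Here the minority partition (0, 1) accepts none of its 5 submitted writes, so 15 of 20 writes are accepted in total.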

Additional consistency constraints

  • Constraints from Order Error

    • Order error is the number of writes that are out of order at each replica

    • A serialization order is any total order among all accepted writes as long as it is agreed upon by all replicas.

  • Details:

    • A write is either accepted or rejected by the replica that receives it (the originating replica)

    • After the write is accepted, the originating replica may apply the write to its local data store.

    • At the same time, the originating replica may propagate the write to other replicas, and the other replicas may then apply the write to their local data stores as well.

    • Finally, after the serialization order is determined, the write becomes committed if all writes before it in the serialization order have been seen and applied to the data store.

Determining the serialization order

  • At any stage in the system from a single replica’s point of view, there can be many serialization orders possible

  • We need to reduce the set of candidate serialization orders to a small size for practical problems
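
A small enumeration shows why the set of orders has to be reduced. With n writes there are n! total orders. The pruning rule used below, keeping only orders that preserve each replica's local acceptance order, is an illustrative assumption and not necessarily the reduction used in the article; the function name is hypothetical.

```python
from itertools import permutations


def candidate_orders(writes, local_orders):
    """Yield total orders of `writes` that keep every given local order intact.

    local_orders is a list of sequences; within each sequence, earlier
    writes must come before later ones in the total order.
    """
    for perm in permutations(writes):
        position = {w: i for i, w in enumerate(perm)}
        if all(
            position[a] < position[b]
            for seq in local_orders
            for a, b in zip(seq, seq[1:])
        ):
            yield perm


writes = ["W1", "W2", "W3", "W4"]
unconstrained = list(candidate_orders(writes, []))
constrained = list(candidate_orders(writes, [["W1", "W2"], ["W3", "W4"]]))
print(len(unconstrained), len(constrained))
```

For four writes the unconstrained set has 24 orders, and the assumed per-replica constraint cuts it to 6. The factorial growth is what makes exhaustive enumeration impractical for realistic workloads.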

Simulation Results

  • Simulation results show that

    • Simple optimizations to existing consistency protocols can greatly improve the availability of replicated services

    • Staying as close to strong consistency as possible during times of good connectivity allows services to approach the upper bound on availability

    • Of the order-error bounding algorithms considered, voting and primary copy generally achieve the best availability (using our optimizations) with voting achieving slightly better availability, while primary copy incurs significantly less communication overhead.

    • Results on availability as a function of the number of replicas quantify the intuition that additional replicas will not always improve service availability and can in fact reduce it.


Conclusion

  • Replication is a key approach for improving the availability of network services

  • Given the well-known trade-offs between strong and optimistic consistency models, this article explores the benefits of a continuous consistency model for improving service availability

  • The long-term goal of this work is to allow applications to dynamically set their consistency level, degree of replication, and placement of replicas based on changing network and service characteristics to achieve a target level of service availability.