The Costs and Limits of Availability for Replicated Services

Presented by: Sarath Chandra Dorbala

Outline of presentation • Introduction • A word about consistency protocols • Focus of this article • Background • Details of TACT • System Model and Assumptions • Availability Upper bound theory • Derivation of Upper bound • Determining the serialization order • Simulation Results • Conclusion

Introduction • Raw system performance increases at exponential rates • A service's utility is limited by availability rather than performance • Key approaches • Caching and replication • Problems with replication • Consistency

A word about consistency protocols • Types of consistency protocols • Strict consistency • Reduces availability • Optimistic consistency • Replicas may soon be in an inconsistent state • Continuous consistency models • Somewhere in between optimistic and strict • The idea is that applications can set the consistency level as a parameter • Basic idea • Decreasing consistency increases availability

Focus of this article • Evaluate the availability of a prototype replication system across the Internet as a function of • Consistency level • Consistency protocol • Failure characteristics • Simple optimizations to existing consistency protocols result in significant improvements in availability • Upper bound on the availability of services • Shows that maximizing availability typically entails remaining as close to strong consistency as possible during times of good connectivity

Background • Surveys show that a 0.1% improvement in service availability is worth $1 billion in annual revenue • Goal: high availability • Trade consistency for availability • Consistency protocols • Strict consistency • Lowest availability, but no conflicting updates • Optimistic consistency • Best for availability, worst for consistency • Continuous consistency • Works well for availability, with a tunable consistency parameter

Details of TACT • TACT gradually reduces the amount of required synchronous communication among replicas in moving from strong to optimistic consistency • At any replica, updates can be in either a tentative or a committed state • Three replica metrics • Numerical error • The maximum weight of writes not seen by a replica • Order error • The maximum weight of writes that have not established their commit order at the local replica • Staleness • The maximum amount of time before a replica is guaranteed to observe a write accepted by a remote replica • Setting these parameters to zero → strong consistency • Setting these parameters to infinity → optimistic consistency

Example scenario • Replica A has accepted updates W1 and W2 • Replica B has accepted updates W3 and W4 • Update W1 has been propagated from A to B • The final serialization order of the four writes is W1, W2, W3, W4
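To make the example scenario concrete, the following Python sketch computes numerical error and order error for replicas A and B. It is only an illustration under stated assumptions: every write has weight 1, the final serialization order is already known, and the function names (`numerical_error`, `order_error`) are hypothetical rather than part of TACT.

```python
# Illustrative sketch only: unit-weight writes, hypothetical helper names.
SERIALIZATION_ORDER = ["W1", "W2", "W3", "W4"]  # final order from the example


def numerical_error(seen, all_writes):
    """Count accepted writes that this replica has not seen (unit weights)."""
    return len(set(all_writes) - set(seen))


def order_error(seen, order):
    """Count writes the replica has seen but cannot commit yet.

    A write commits only when every write before it in the serialization
    order has also been seen, so committed writes form a prefix of the order.
    """
    committed = 0
    for write in order:
        if write not in seen:
            break
        committed += 1
    return len(seen) - committed


replica_a = {"W1", "W2"}        # A accepted W1 and W2
replica_b = {"W3", "W4", "W1"}  # B accepted W3 and W4, then received W1

print(numerical_error(replica_a, SERIALIZATION_ORDER))  # 2 (W3 and W4 unseen)
print(numerical_error(replica_b, SERIALIZATION_ORDER))  # 1 (W2 unseen)
print(order_error(replica_a, SERIALIZATION_ORDER))      # 0
print(order_error(replica_b, SERIALIZATION_ORDER))      # 2 (W3, W4 wait for W2)
```

Under these assumptions, A is behind on content but has nothing out of order, while B holds two writes that cannot commit until W2 arrives.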

System Model and Assumptions • Database is replicated in full at multiple replicas. • Each replica may accept reads and writes from clients, both called accesses • All replicas remain consistent at all times, that is, the numerical error, order error, and staleness on any replica are always within bounds.

System Model and Assumptions (contd.) • An access is typically a read or write sent from a client to the network service • Each access is classified as: • a failed access if the request cannot reach any replica because of network failures • a rejected access if it is received by some replica but its acceptance would violate some consistency requirement • an accepted access otherwise • Therefore, $Avail_{client} = \text{accepted accesses} / \text{submitted accesses}$
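The classification above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the boolean inputs `reached_replica` and `violates_consistency` are assumed stand-ins for what a real system would determine from the network and from its consistency bounds.

```python
from collections import Counter


def classify(reached_replica, violates_consistency):
    """Label one access as failed, rejected, or accepted."""
    if not reached_replica:
        return "failed"      # never reached any replica
    if violates_consistency:
        return "rejected"    # reached a replica, but would break a bound
    return "accepted"


def client_availability(accesses):
    """accesses: iterable of (reached_replica, violates_consistency) pairs."""
    counts = Counter(classify(r, v) for r, v in accesses)
    submitted = sum(counts.values())
    if submitted == 0:
        raise ValueError("no accesses submitted")
    return counts["accepted"] / submitted


# Hypothetical trace: two accepted, one rejected, one failed.
trace = [(True, False), (True, False), (True, True), (False, False)]
print(client_availability(trace))  # 0.5
```

Note that both failed and rejected accesses count against availability, so tightening consistency bounds and losing connectivity reduce the same ratio.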

Availability Upper bound theory • Upper bound on service availability as a function of workload, faultload, and consistency: $Avail_{service} \le F(\text{consistency}, \text{workload}, \text{faultload})$ • Workload: describes the timestamped accesses reaching any of the replicas, that is, when and which access reaches which replica • Faultload: a trace of timestamped failure events and recovery events for replicas and the network, which fully specifies the failure pattern

Characteristics of F • Function F returns the availability upper bound, which is independent of the consistency maintenance protocol • Demonstrates the inherent effects of consistency, workload, and faultload on availability • The availability achieved by any system will be less than or equal to this upper bound • Computing this upper bound is an NP-hard problem

Derivation of Upper bound: Evolution Graph • The evolution graph of a faultload is a directed graph constructed as follows • For each interval in the faultload, add a node to the graph for each network partition in that interval. Let $node_{k,m}$ correspond to $interval_k$, $partition_m$ • An edge from $node_{k,m}$ to $node_{k',m'}$ is added if $k' = k + 1$ and $partition_{m'}$ intersects with $partition_m$ at one or more replicas • A node in the evolution graph is an ancestor of another node if there is a path from the former to the latter
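The construction can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a faultload is represented as a list of intervals, each interval is a list of partitions, each partition is a set of replica names, indices start at 0, and edges point forward in time from interval k to interval k + 1. The function names are hypothetical.

```python
def build_evolution_graph(faultload):
    """Return {(k, m): set of successor nodes (k + 1, m')}.

    faultload: list of intervals; each interval is a list of partitions,
    and each partition is a set of replica names (assumed representation).
    """
    edges = {}
    for k, partitions in enumerate(faultload):
        for m, part in enumerate(partitions):
            edges[(k, m)] = set()
            if k + 1 < len(faultload):
                for m2, nxt in enumerate(faultload[k + 1]):
                    if part & nxt:  # partitions share at least one replica
                        edges[(k, m)].add((k + 1, m2))
    return edges


def is_ancestor(edges, src, dst):
    """True if a non-empty path leads from src to dst."""
    stack = list(edges[src])
    visited = set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in visited:
            visited.add(node)
            stack.extend(edges[node])
    return False


# Hypothetical faultload with three replicas and three intervals.
faultload = [
    [{"A", "B", "C"}],    # interval 0: fully connected
    [{"A"}, {"B", "C"}],  # interval 1: A is cut off
    [{"A", "B"}, {"C"}],  # interval 2: C is cut off
]
graph = build_evolution_graph(faultload)
print(sorted(graph[(0, 0)]))               # [(1, 0), (1, 1)]
print(is_ancestor(graph, (1, 0), (2, 0)))  # True: replica A links them
print(is_ancestor(graph, (1, 0), (2, 1)))  # False: A never meets C again
```

Intuitively, a node can only have seen writes accepted in its ancestors, which is why the graph is useful for expressing consistency constraints across partitions.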

Evolution graphs

Objective function • To compute the availability upper bound, we only need to focus on writes • Let $writes_{k,m}$ be the number of writes accepted by $partition_m$ during $interval_k$ • Let $wsubmit_{k,m}$ be the number of writes submitted from clients
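One plausible way to write the optimization these definitions set up is shown below. It is a sketch under assumptions, not the formula from the original slide: it assumes $wsubmit_{k,m}$ is counted per partition and interval in the same way as $writes_{k,m}$, and it leaves the consistency constraints abstract.

```latex
\begin{aligned}
\text{maximize}\quad   & \sum_{k,m} writes_{k,m} \\
\text{subject to}\quad & 0 \le writes_{k,m} \le wsubmit_{k,m} \quad \text{for all } k, m, \\
                       & \text{the numerical error, order error, and staleness bounds hold at every replica.}
\end{aligned}
```

The first constraint only says that a partition cannot accept more writes than clients submitted to it; the consistency bounds are what make the problem hard.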

Additional consistency constraints • Constraints from order error • Order error is the number of writes that are out of order at each replica • A serialization order is any total order among all accepted writes, as long as it is agreed upon by all replicas • Details: • A write is either accepted or rejected by the replica that receives it (the originating replica) • After the write is accepted, the originating replica may apply the write to its local data store • At the same time, the originating replica may propagate the write to other replicas, and the other replicas may then apply the write to their local data stores as well • Finally, after the serialization order is determined, the write becomes committed if all writes before it in the serialization order have been seen and applied to the data store

Table of notations

Determining the serialization order • At any point, from a single replica's point of view, many serialization orders are possible • For practical problems, the set of candidate serialization orders must be reduced to a small size

Hierarchy of Dominating serialization order sets

Final Objective function

Simulation Results • Simulation results show that • Simple optimizations to existing consistency protocols can greatly improve the availability of replicated services • Staying as close to strong consistency as possible during times of good connectivity allows services to approach the upper bound on availability • Of the order-error bounding algorithms considered, voting and primary copy generally achieve the best availability (using our optimizations), with voting achieving slightly better availability, while primary copy incurs significantly less communication overhead • Results on availability as a function of the number of replicas quantify the intuition that additional replicas will not always improve service availability and can in fact reduce it

Conclusions • Replication is a key approach for improving the availability of network services • Given the well-known trade-offs between strong and optimistic consistency models, this article explores the benefits of a continuous consistency model for improving service availability • The long-term goal of this work is to allow applications to dynamically set their consistency level, degree of replication, and placement of replicas based on changing network and service characteristics to achieve a target level of service availability.
