Adaptive Hybrid Quorums for Enhanced Consistency in Distributed Systems
This paper explores Adaptive Hybrid Quorums, a novel approach that balances efficiency and consistency in distributed systems such as Cassandra. By employing models like PBSPredictor and WARS, the authors present a mechanism to track latencies and workloads, enabling adaptive quorum selection based on current system performance. The study illustrates the significance of quorums in ensuring data replication while addressing challenges due to node failures. Furthermore, it highlights ongoing developments in modeling message drops and improving overall consistency in practical environments.
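The adaptive selection described above can be sketched as a small search over read/write quorum sizes. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names, the `predict_consistency` callback (a stand-in for whatever PBSPredictor and the WARS model report), the per-quorum latency callbacks, and the toy numbers are all hypothetical. The only fact relied on is the standard strict-quorum condition R + W > N, under which every read quorum intersects every write quorum.

```python
from itertools import product
from typing import Callable, Optional, Tuple


def is_strict_quorum(n: int, r: int, w: int) -> bool:
    """Read and write quorums are guaranteed to intersect when R + W > N."""
    return r + w > n


def choose_quorum(
    n: int,
    read_fraction: float,
    predict_consistency: Callable[[int, int], float],
    read_latency: Callable[[int], float],
    write_latency: Callable[[int], float],
    target: float,
) -> Optional[Tuple[int, int]]:
    """Return the (R, W) with the lowest workload-weighted latency whose
    predicted consistency meets `target`, or None if no pair qualifies.

    All callbacks are hypothetical stand-ins for live measurements.
    """
    best: Optional[Tuple[int, int]] = None
    best_cost = float("inf")
    for r, w in product(range(1, n + 1), repeat=2):
        if predict_consistency(r, w) < target:
            continue  # this pair is predicted to be too stale
        # Weight read and write latency by how read-heavy the workload is.
        cost = read_fraction * read_latency(r) + (1 - read_fraction) * write_latency(w)
        if cost < best_cost:
            best, best_cost = (r, w), cost
    return best


# Toy inputs (assumptions for illustration only).
N = 3


def toy_predictor(r: int, w: int) -> float:
    # Pretend strict quorums are always consistent and partial ones are 90%.
    return 1.0 if is_strict_quorum(N, r, w) else 0.9


def toy_latency(k: int) -> float:
    # Pretend waiting for k replicas costs k milliseconds.
    return float(k)


if __name__ == "__main__":
    # Read-heavy workload: a small R and large W is cheapest.
    print(choose_quorum(N, 0.9, toy_predictor, toy_latency, toy_latency, 0.99))
    # Write-heavy workload: the choice flips.
    print(choose_quorum(N, 0.1, toy_predictor, toy_latency, toy_latency, 0.99))
```

Re-running the search whenever the measured latencies or the read/write mix change is one simple way to make the quorum adaptive. A real system would replace the exhaustive loop and the toy callbacks with its own predictor and cost model.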
Presentation Transcript
Adaptive Hybrid Quorums in Practical Settings
Aaron Davidson, Aviad Rubinstein, Anirudh Todi, Peter Bailis and Shivaram Venkataraman

Introduction

What are Quorums?
• Replication → consistency + durability + availability.
• Quorum: a subset of the replicas.
• Write and read from a quorum → efficiency.
• Traditionally: full quorums; every 2 quorums intersect → consistency.
• In practice: partial quorums → sacrificing consistency for efficiency.

t-visibility
• Dynamo: send to N replicas (full quorum), wait for R/W (partial quorum).
• Eventually all replicas receive the request → eventual consistency.
• Bailis et al. [1]: bound the time for consistency.

Key Observation
Quorums are temporary → change them adaptively!

Implementation in Cassandra

Adaptive Hybrid Quorums
• Use PBSPredictor() and the WARS model [1] to track latencies and predict consistency.
• Track workload properties, e.g. is it read- or write-oriented?
• Optimization algorithm → adaptively find the best quorum for the current workload and system performance.

Failure Modeling
• Node failure model:
   Phase 1 – Node crashes but we don’t know it…
   Phase 2 – Writes go to hinted node; N-1 replicas remain.
   Phase 3 – Hinted handoff; recovering node is partially available.
• Incorporate model into adaptive hybrid consistency.
• Ongoing work: model message drops…
[Figure: failure timeline marking node fails, detection, recovery, node restarts; annotations (1,2) and (1,1)]

Evaluation

Experimental Consistency
Experimental Latency
[Figures: two plots, each annotated with node fails, Phase 2, recovery]

References
[1] Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, and Ion Stoica. Probabilistically bounded staleness for practical partial quorums. PVLDB, 5(8):776–787, 2012.