Dynamo: Amazon's Highly Available Key-Value Store

Offense: Jori and Ning

Outline • Presentation (Ning) • Symmetry (Jori) • WAN considerations (Ning) • Consistency (Jori) • Disaster Recovery (Ning) • Minor Quibbles (Jori, Ning)

Presentation (Ning) • Dynamo: • The basic functions are simple; • The system implementation could be very complex; • This leads to many gaps in the explanation. Things that are mentioned but not explained include: • overload handling • state transfer • concurrency • job scheduling • request marshalling • request routing • system monitoring • alarming • configuration management • If you don't want to talk about them, don't mention them.

Presentation contd. • Almost impossible to understand some concepts without reading the cited material. • Some concepts are used but not well explained: • the gossip protocol • vector clocks • Some concepts are not so important: SLAs • Too wordy: at least give a numbered list • No clear diagrams: please use a flow chart! • Despite the length and many cited resources, it is still very difficult to use the article as a design document. • Many open-source clones (Cassandra, Voldemort, Riak) have tried. • Many design concerns aren't touched upon • Why is the decentralized structure better? • The reader must be well-versed in distributed computing concepts in order to really understand what's going on during the first read-through.
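As an illustration of the vector clock concept the slide says is under-explained, here is a minimal sketch. It assumes a clock is a mapping from node name to counter; the node names (Sx, Sy, Sz) and the function names are illustrative, not Dynamo's actual code.

```python
from typing import Dict

# A vector clock maps a node name to the number of updates that node coordinated.
VectorClock = Dict[str, int]


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if clock `a` has seen every update recorded in clock `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify the causal relationship between two versions."""
    a_covers_b = descends(a, b)
    b_covers_a = descends(b, a)
    if a_covers_b and b_covers_a:
        return "equal"
    if a_covers_b:
        return "a supersedes b"
    if b_covers_a:
        return "b supersedes a"
    # Neither version has seen all of the other's updates: a real conflict.
    return "concurrent"


# Hypothetical history: Sx writes twice, then Sy and Sz each update
# the same version independently.
v1 = {"Sx": 1}
v2 = {"Sx": 2}
v3 = {"Sx": 2, "Sy": 1}
v4 = {"Sx": 2, "Sz": 1}

print(compare(v2, v1))  # v2 can replace v1
print(compare(v3, v4))  # both must be kept and reconciled by the client
```

Only the "concurrent" case needs application-level reconciliation; in the other cases the older version can simply be discarded.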

Symmetry (Jori) • There are direct contradictions in regard to symmetry: • In section 2.3: "Symmetry: Every node in Dynamo should have the same set of responsibilities as its peers; there should be no distinguished node or nodes that take special roles or extra set of responsibilities." • In section 4.8.2: "To prevent logical partitions, some Dynamo nodes play the role of seeds... Seeds can be obtained either from static configuration or from a configuration service. Typically seeds are fully functional nodes in the Dynamo ring."

Symmetry contd. • No justification for this design choice except that it "simplifies the process of system provisioning and maintenance." • Membership and failure detection are presented in a hand-wavy manner. • In this sort of system, specialization can simplify the overall design. Symmetry is not necessary for high availability. • Chubby/Paxos (Google-designed distributed storage system) uses a master coordinator approach which results in much simpler consistency algorithms. It allows updates to be serialized, which prevents conflicts. • A distributed directory service layer for lookup would fix Dynamo's scalability issue, since nodes would no longer have to gossip the entire routing table.

Symmetry contd. • Network connectivity is not symmetric, e.g. connections between nodes in the same data center differ from those between nodes in separate data centers. • The symmetric ring-based system does not reflect this inherent asymmetry. • Server hardware configurations are inherently asymmetric. By making a symmetric system, you rule out the advantages of specialization. One can no longer use different hardware for different components of a complex system.

WAN Considerations (Ning) • No clear introduction to the interactions between data centers. • When a Dynamo cluster spans a WAN, the odds of nodes rejoining the cluster and remaining out of date are significantly increased. • If a node goes down, ‘hinted handoff’ sends updates to the next node in the ring. Since nodes of two data centers alternate, the updates are sent to the remote data center. When the node rejoins the cluster, if the network is partitioned (which happens all the time), the node will not catch up on pending updates for a long time (until the network partition is healed). • Authentication and authorization are ignored in this paper. However, these could cause problems in ring membership management.
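The hinted-handoff concern above can be made concrete with a toy model. This is only a sketch under the slide's own assumption that nodes of two data centers strictly alternate around the ring; the node names, data center names, and the `write_targets` helper are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical six-node ring in which the two data centers alternate.
RING = [
    ("n0", "dc-east"), ("n1", "dc-west"), ("n2", "dc-east"),
    ("n3", "dc-west"), ("n4", "dc-east"), ("n5", "dc-west"),
]
DATA_CENTER = dict(RING)


def write_targets(start: int, n: int, down: Set[str]) -> List[Tuple[str, Optional[str]]]:
    """Return (node, hinted_for) pairs for one write.

    The first `n` nodes clockwise from `start` are the intended replicas.
    For each intended replica that is down, the next healthy node past the
    intended list stores the update with a hint naming the failed node.
    """
    size = len(RING)
    intended = [RING[(start + i) % size][0] for i in range(n)]
    missing = [node for node in intended if node in down]
    targets: List[Tuple[str, Optional[str]]] = [
        (node, None) for node in intended if node not in down
    ]
    offset = n
    while missing and offset < size:
        candidate = RING[(start + offset) % size][0]
        if candidate not in down:
            targets.append((candidate, missing.pop(0)))
        offset += 1
    return targets


for node, hinted_for in write_targets(start=0, n=3, down={"n0"}):
    if hinted_for is not None:
        print(f"{node} ({DATA_CENTER[node]}) holds a hint for "
              f"{hinted_for} ({DATA_CENTER[hinted_for]})")
```

In this configuration the hint for n0 lands on n3 in the other data center, so n0 can only receive it once the link between the data centers is healthy again.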

Consistency (Jori) • Principles of Symmetry and Decentralization • Centralization does not mean low availability, and consistency does not need to be sacrificed for high availability: BigTable+GFS • A decentralized architecture usually causes a lot of complexity • For handling transient failures, hinted handoff is complicated. • "0.06% of inconsistent values" • Amazon handles millions of transactions a day, so this ends up being a lot.
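To put the 0.06% figure in perspective, here is a small worked calculation. The daily request volumes are assumptions chosen only to match the slide's "millions of transactions a day"; they are not numbers from the paper.

```python
def divergent_per_day(requests_per_day: int) -> int:
    """Requests per day expected to see inconsistent values at a 0.06% rate.

    0.06% is 6 in 10,000; integer arithmetic avoids floating-point rounding.
    """
    return requests_per_day * 6 // 10_000


# Assumed volumes, for illustration only.
for volume in (1_000_000, 5_000_000, 10_000_000):
    print(f"{volume:>10,} requests/day -> {divergent_per_day(volume):,} inconsistent")
```

Even at one million requests a day, that is 600 requests a day that the application has to reconcile.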

Consistency contd. • Stale reads are possible and inconvenient • A node that has been down for a significant amount of time can rejoin a cluster completely out-of-date. There is no resynchronization barrier for reentry and no concept of how far behind it is. Merkle trees lead to slow catch-up. • Dynamo provides no bounds on stale reads, to the detriment of developers, e.g. a stale read could indirectly lead to an incorrect write, which is hard to track. • Practical implications: • Committed writes don't show up in subsequent reads. • Committed writes may show up in some subsequent reads, but then go missing. • There is no SLA for when writes are globally committed, i.e. when no nodes are still playing catch-up.
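The practical implications above can be reproduced with a deliberately simplified model. This sketch assumes three replicas, a write acknowledged by two of them (W = 2), and reads served by a single replica (R = 1); a plain integer version stands in for Dynamo's vector clocks, and all names are hypothetical.

```python
from typing import Dict, List, Tuple

# Three replicas, each holding key -> (value, version). All start in sync.
replicas: List[Dict[str, Tuple[str, int]]] = [
    {"cart": ("old", 1)} for _ in range(3)
]


def write(key: str, value: str, version: int, targets: List[int]) -> None:
    """Apply a write only to the replicas that acknowledged it."""
    for index in targets:
        replicas[index][key] = (value, version)


def read(key: str, targets: List[int]) -> str:
    """Return the newest value among the replicas that were contacted."""
    newest = max((replicas[index][key] for index in targets),
                 key=lambda pair: pair[1])
    return newest[0]


# The write is "committed" once replicas 0 and 1 acknowledge it (W = 2).
write("cart", "new", 2, targets=[0, 1])

print(read("cart", targets=[0]))     # sees the committed write
print(read("cart", targets=[2]))     # stale: replica 2 has not caught up
print(read("cart", targets=[1, 2]))  # R = 2 overlaps the write set
```

The same client can therefore see the new value on one read and the old value on the next, depending on which replica answers.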

Consistency contd. • Conflict Resolution • Dynamo exposes resolution logic to the developer, making application logic more complex. • Since there are no bounds for stale reads or any centralized commit logs, data returned may be woefully out-of-date. • As noted before, this data loss can lead to unexpected situations that are hard to predict. • If the returned object is a list, deleted objects may reemerge after a conflict (shopping cart example)
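The shopping cart problem can be shown in a few lines. This sketch assumes the application reconciles divergent cart versions by taking their union, which preserves every added item; the `merge_carts` helper and the item names are hypothetical.

```python
from typing import FrozenSet, List


def merge_carts(versions: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Reconcile divergent cart versions by taking the union of their items."""
    return frozenset().union(*versions)


base = frozenset({"book", "pen"})

# Two concurrent updates start from the same base version.
after_delete = base - {"pen"}   # one client removes the pen
after_add = base | {"lamp"}     # another client adds a lamp

merged = merge_carts([after_delete, after_add])
print(sorted(merged))
```

The union cannot tell "pen was deleted in one branch" from "pen was never seen by that branch", so the pen is back in the merged cart.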

Disaster Recovery (Ning) • Disaster: • Entire data center fails: there is no way to describe the state of the surviving data centers, so data loss is unbounded: • One cannot quantify exactly how much data was lost. • The lost data may be corrupted forever. • Lost data can result in stale reads: • these are transactional inconsistencies that most applications are ill-equipped to handle. • Recovery: • The paper does not outline how disk corruptions and failures are handled. • Standard log-shipping based replication: one can at least keep track of the replication log, and therefore have a general idea of how far behind a surviving cluster is.

Minor Quibbles • Amazon implemented the system in Java, but gave no justification as to why. If the concern is providing high-speed availability, why do it in a slow language like Java? • There are a few grammar and spelling mistakes throughout; it could have used a couple more read-throughs. • Wish there were comparisons of various (N,R,W) configuration schemes. • The size constraint on objects limits its applications. • End of section 4.4: "However, this problem has not surfaced in production and therefore this issue has not been thoroughly investigated."
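On the (N,R,W) quibble, the kind of comparison being asked for might look like the sketch below. It relies only on the rule that R + W > N makes read and write sets overlap, and it ignores hinted handoff; the `QuorumConfig` and `describe` names and the list of configurations are illustrative assumptions.

```python
from typing import Dict, NamedTuple, Union


class QuorumConfig(NamedTuple):
    n: int  # replicas per key
    r: int  # replicas that must answer a read
    w: int  # replicas that must acknowledge a write


def describe(cfg: QuorumConfig) -> Dict[str, Union[bool, int]]:
    """Summarize the trade-offs of one (N, R, W) choice."""
    return {
        # Read and write sets must intersect for a read to see the last write.
        "overlap": cfg.r + cfg.w > cfg.n,
        # Unreachable replicas an operation can tolerate (no hinted handoff).
        "read_failures_tolerated": cfg.n - cfg.r,
        "write_failures_tolerated": cfg.n - cfg.w,
    }


# Illustrative configurations to compare.
for cfg in (QuorumConfig(3, 2, 2), QuorumConfig(3, 1, 1),
            QuorumConfig(3, 1, 3), QuorumConfig(3, 3, 1)):
    print(cfg, describe(cfg))
```

For example, (3,1,3) gives overlapping quorums and cheap reads, but a write fails if any single replica is unreachable, while (3,1,1) tolerates two failures for either operation at the cost of possible stale reads.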
