Distributed Storage System Survey Yang Kun
Agenda • 1. History of DSS • 2. Definition & Terminology • 3. Basic Factors • 4. DSS Common Design • 5. Basic Theories • 6. Popular Algorithms • 7. Replication Strategies • 8. Implementations • 9. Open Source & Business
History of DSS • Network File System (1980s)
History of DSS • Storage Area Network (SAN) File System (1990s)
History of DSS • Object-oriented parallel file system (2000s)
History of DSS • Cloud Storage
Definition & Terminology • Transparency • network-transparency, user-mobility • Performance Measurement • The amount of time needed to satisfy service requests. • The performance should be comparable to that of a conventional file system.
Definition & Terminology • Fault Tolerance: The system should tolerate communication faults, machine failures (of the fail-stop type), storage device crashes, and decay of storage media. • Scalability: A scalable system should react more gracefully to increased load. • The performance should degrade more moderately than that of a non-scalable system. • The resources should reach a saturated state later than those of a non-scalable system.
Definition & Terminology • Consistency: Consistency requires that there must exist a total order on all operations such that each operation looks as if it were completed at a single instant. • Availability: Every request received by a non-failing node in the system must result in a response. • Reliability
Basic Factors • Location Transparency • User mobility • Security • Performance • Scalability • Availability • Failure Tolerance
DSS Common Design • Client: Writing • Client: Reading
Basic Theories • CAP Theory • ACID vs. BASE Model • Quorum NRW
CAP Theory • In a network subject to partitions (in both the asynchronous and the partially synchronous model), it is impossible for a web service to provide consistency, availability, and partition tolerance at the same time. • Consistency • Availability • Partition tolerance
CAP Theory • CP: All data resides on a single node, and the other nodes read from and write to that node. • CA: Database systems • AP: Every request is guaranteed to return a value, although the value may be stale. • Cassandra = A + P + Eventual Consistency
Quorum NRW • N: The number of replicas, that is, how many copies are kept of each data object. • R: The minimum number of replicas that must respond for a read to be considered successful. • W: The minimum number of replicas that must acknowledge for a write to be considered successful. • Together, these three parameters determine availability, consistency, and fault tolerance. Strong consistency can be guaranteed only if W + R > N.
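The W + R > N rule can be checked mechanically. The following is a minimal sketch, not tied to any particular storage system; the function names `is_strongly_consistent` and `tolerated_failures` are hypothetical and chosen only for illustration.

```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum must overlap every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    # If R + W > N, any R replicas and any W replicas share at least one
    # replica, so a read always touches a replica holding the latest write.
    return r + w > n


def tolerated_failures(n: int, r: int, w: int) -> dict:
    """Number of replicas that may be down while reads/writes still succeed."""
    return {"read": n - r, "write": n - w}


# Compare a balanced quorum, a fast-but-weak setting, and a read-optimized one.
for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
    print((n, r, w), is_strongly_consistent(n, r, w), tolerated_failures(n, r, w))
```

The three settings illustrate the trade-off: lowering R or W raises availability for that operation, but only combinations with W + R > N keep reads guaranteed to see the latest write.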
Popular Algorithms • Paxos Algorithm • Roles: Proposer, Acceptor, Learner • Phases: Prepare, Accept, Learn
Popular Algorithms • Consistent Hashing
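As a rough illustration of consistent hashing, the sketch below places nodes on a hash ring and assigns each key to the first node clockwise from the key's hash. The class name `ConsistentHashRing`, the use of MD5, and the default of 100 virtual nodes per physical node are assumptions made for this example, not details taken from the slides.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 is an arbitrary choice here)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes      # virtual nodes per physical node
        self._ring = []           # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Insert the node's virtual points into the sorted ring."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        """Drop all virtual points that belong to the node."""
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def get_node(self, key: str) -> str:
        """Return the first node at or after the key's position, wrapping around."""
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
keys = [f"key-{i}" for i in range(200)]
before = {k: ring.get_node(k) for k in keys}
ring.remove_node("node-b")
moved = sum(1 for k in keys if ring.get_node(k) != before[k])
print(f"{moved} of {len(keys)} keys moved after removing node-b")
```

Only the keys that were stored on the removed node are reassigned; every other key keeps its owner, which is the property that makes the scheme attractive when nodes join or leave.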
Popular Algorithms • Mutual Exclusion Algorithms • Lamport Algorithm (3*(n - 1) messages) • Improved Lamport Algorithm (3*(n - 1) messages) • Ricart–Agrawala Algorithm (2*(n - 1) messages) • Maekawa Algorithm • Roucairol-Carvalho Algorithm
Popular Algorithms • Election Algorithms • Chang-Roberts Algorithm (O(n log n) messages on average) • Garcia-Molina's Bully Algorithm • Non-comparison-based Algorithms
Popular Algorithms • Bidding Algorithms • Self Stabilization Algorithms
Replication Strategies • Asynchronous Master/Slave Replication: Log appends are acknowledged at the master in parallel with transmission to slaves. (Does not support ACID.) • Synchronous Master/Slave Replication: A master waits for changes to be mirrored to slaves before acknowledging them. (Requires timely failure detection.) • Optimistic Replication: Any member of a homogeneous replica group can accept mutations. (The order is not known at commit time, so transactions are impossible.)
CRAQ • Chain Replication with Apportioned Queries
Funnel Replication • Topology • Vector Clock • Total Order • Write Request (key, value, vector clock, originating head replica)
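The vector clock carried in each write request lets replicas decide whether two writes are causally ordered or concurrent. Below is a generic vector clock sketch, assuming clocks are plain dictionaries keyed by replica name; the helper names and the replica names `head-1` and `head-2` are hypothetical, and this is not claimed to be the exact mechanism used by funnel replication.

```python
def increment(clock: dict, replica: str) -> dict:
    """Return a copy of the clock with the replica's counter advanced by one."""
    new_clock = dict(clock)
    new_clock[replica] = new_clock.get(replica, 0) + 1
    return new_clock


def merge(a: dict, b: dict) -> dict:
    """Element-wise maximum: the smallest clock that dominates both inputs."""
    return {r: max(a.get(r, 0), b.get(r, 0)) for r in set(a) | set(b)}


def compare(a: dict, b: dict) -> str:
    """Classify a relative to b: 'before', 'after', 'equal', or 'concurrent'."""
    replicas = set(a) | set(b)
    a_le_b = all(a.get(r, 0) <= b.get(r, 0) for r in replicas)
    b_le_a = all(b.get(r, 0) <= a.get(r, 0) for r in replicas)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


v1 = increment({}, "head-1")    # first write accepted at head-1
v2 = increment(v1, "head-1")    # second write at head-1, causally after v1
v3 = increment(v1, "head-2")    # write at head-2 that has only seen v1
print(compare(v1, v2), compare(v2, v3), merge(v2, v3))
```

Vector clocks alone give only a partial order. Turning concurrent writes into a total order requires an additional deterministic tie-break, which this sketch deliberately leaves out.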
Atomic Commit Protocol • Two-Phase Commit (2PC) • 1. Voting phase: The coordinator requests all participating sites to prepare to commit. • 2. Decision phase: The coordinator either commits the transaction if all participants are prepared to commit (voted “yes”), or aborts the transaction if any participant has decided to abort (voted “no”).
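The two phases can be sketched as a small in-memory simulation. The `Participant` class and `two_phase_commit` function below are hypothetical names for illustration only; the sketch assumes reliable, synchronous calls and omits the logging, timeouts, and recovery that a real protocol needs.

```python
class Participant:
    def __init__(self, name: str, will_commit: bool = True):
        self.name = name
        self.will_commit = will_commit  # simulated local decision (assumption)
        self.state = "init"

    def prepare(self) -> bool:
        """Voting phase: vote yes (prepared) or no (abort locally)."""
        self.state = "prepared" if self.will_commit else "aborted"
        return self.will_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants) -> str:
    """Run both phases as the coordinator and return the global decision."""
    # Phase 1 (voting): ask every participant to prepare and collect the votes.
    votes = [p.prepare() for p in participants]

    # Phase 2 (decision): commit only if every vote was "yes", else abort all.
    if all(votes):
        for p in participants:
            p.commit()
        return "commit"
    for p in participants:
        p.abort()
    return "abort"


sites = [Participant("site-1"), Participant("site-2", will_commit=False)]
print(two_phase_commit(sites), [p.state for p in sites])
```

A single "no" vote is enough to abort the whole transaction, which is why every participant ends in the same state regardless of its own vote.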
Atomic Commit Protocol • Presumed Abort Protocol: It is designed to reduce the cost associated with aborting transactions. • Presumed Commit Protocol: It is designed to reduce the cost associated with committing transactions by interpreting missing information about transactions as commit decisions. • One-Phase Commit (1PC): The protocol consists of only a single phase, which is the decision phase of 2PC. • One-Two-PC
Implementations • BigTable • Windows Azure Storage • Google MegaStore • Chubby