Distributed Storage System Survey

Distributed Storage System Survey. Yang Kun. Agenda. 1. History of DSS 2. Definition & Terminology 3. Basic Factors 4. DSS Common Design 5. Basic Theories 6. Popular Algorithms 7. Replication Strategies 8. Implementations 9. Open Source & Business. History of DSS.

Presentation Transcript


  1. Distributed Storage System Survey Yang Kun

  2. Agenda • 1. History of DSS • 2. Definition & Terminology • 3. Basic Factors • 4. DSS Common Design • 5. Basic Theories • 6. Popular Algorithms • 7. Replication Strategies • 8. Implementations • 9. Open Source & Business

  3. History of DSS • Network File System (1980s)

  4. History of DSS • Storage Area Network (SAN) File System (1990s)

  5. History of DSS • Object-oriented parallel file system (2000s)

  6. History of DSS • Cloud Storage

  7. Definition & Terminology • Transparency • network-transparency, user-mobility • Performance Measurement • The amount of time needed to satisfy service requests. • The performance should be comparable to that of a conventional file system.

  8. Definition & Terminology • Fault Tolerance: Communication faults, machine failures (of the fail-stop type), storage device crashes, and decay of storage media. • Scalability: A scalable system should react more gracefully to increased load. • Its performance should degrade more moderately than that of a non-scalable system. • Its resources should reach a saturated state later than those of a non-scalable system.

  9. Definition & Terminology • Consistency: There must exist a total order on all operations such that each operation looks as if it were completed at a single instant. • Availability: Every request received by a non-failing node in the system must result in a response. • Reliability

  10. Basic Factors • Location Transparency • User mobility • Security • Performance • Scalability • Availability • Failure Tolerance

  11. DSS Common Design • Client: Writing • Client: Reading

  12. Basic Theories • CAP Theory • ACID vs. BASE Model • Quorum NRW

  13. CAP Theory

  14. CAP Theory • In a network subject to partitions (in both the asynchronous and the partially synchronous model), it is impossible for a web service to provide consistency, availability and partition tolerance at the same time. • Consistency • Availability • Partition tolerance

  15. CAP Theory • CP: All data lives on a single node, and the other nodes read from and write to that node. • CA: Database systems • AP: Every request is guaranteed to return a value. • Cassandra = A + P + Eventual Consistency

  16. ACID vs. BASE Model

  17. Quorum NRW • N: The number of replicas, that is, how many copies are kept of each data object. • R: The minimum number of replicas that must respond for a read operation to be considered successful. • W: The minimum number of replicas that must acknowledge for a write operation to be considered successful. • Together, the three factors determine availability, consistency and fault tolerance. Strong consistency can be guaranteed only if W + R > N.
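The W + R > N rule can be checked mechanically. The following is a minimal sketch, not taken from the presentation: the function names `is_strongly_consistent` and `tolerated_failures` are hypothetical, and it assumes every replica counts as exactly one vote.

```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Return True when every read quorum must overlap every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must each be between 1 and N")
    # If R + W > N, any R replicas and any W replicas share at least one member,
    # so a read always reaches a replica that saw the latest successful write.
    return r + w > n


def tolerated_failures(n: int, r: int, w: int) -> int:
    """Replicas that may be down while both reads and writes still reach quorum."""
    return n - max(r, w)


if __name__ == "__main__":
    # Illustrative configurations only (assumed values).
    for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
        print(n, r, w, is_strongly_consistent(n, r, w), tolerated_failures(n, r, w))
```

Lowering R and W improves availability and latency at the cost of the overlap guarantee; raising either one shrinks the number of failures the corresponding operation can survive.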

  18. Popular Algorithms • PAXOS Algorithm • Roles: Proposer, Acceptor, Learner • Phases: Prepare, Accept, Learn
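As an illustration of the proposer and acceptor roles, here is a minimal single-decree sketch of the commonly described prepare/accept exchange. It is an assumption-laden toy, not the presentation's own material: everything runs in one process, there is no networking, failure handling or learner, and the names `Acceptor` and `propose` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class Acceptor:
    promised: int = -1         # highest proposal number promised so far
    accepted_n: int = -1       # number of the proposal accepted so far (-1: none)
    accepted_value: Any = None

    def on_prepare(self, n: int) -> Tuple[bool, int, Any]:
        """Promise to ignore proposals numbered below n, if n is the highest seen."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_value
        return False, self.accepted_n, self.accepted_value

    def on_accept(self, n: int, value: Any) -> bool:
        """Accept the proposal unless a higher-numbered promise has been made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(acceptors: List[Acceptor], n: int, value: Any) -> Optional[Any]:
    """Run one proposer round; return the chosen value, or None if the round fails."""
    majority = len(acceptors) // 2 + 1
    replies = [a.on_prepare(n) for a in acceptors]
    granted = [(an, av) for ok, an, av in replies if ok]
    if len(granted) < majority:
        return None
    # A proposer must adopt the value of the highest-numbered proposal
    # that any promising acceptor has already accepted.
    prev_n, prev_value = max(granted, key=lambda pair: pair[0])
    if prev_n >= 0:
        value = prev_value
    acks = sum(a.on_accept(n, value) for a in acceptors)
    return value if acks >= majority else None


if __name__ == "__main__":
    group = [Acceptor() for _ in range(3)]
    print(propose(group, 1, "a"))
    print(propose(group, 2, "b"))  # a later proposer must re-propose the chosen value
```

The key safety property visible here is that once a value has been accepted by a majority, later rounds with higher numbers can only re-propose that same value.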

  19. PAXOS

  20. PAXOS

  21. Popular Algorithms • Consistent Hashing
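The slide names consistent hashing without showing it, so the following is a generic sketch of the technique rather than anything specific to the presentation. The class name `HashRing`, the `vnodes` parameter and the choice of SHA-256 are assumptions made for illustration.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes: Iterable[str] = (), vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._positions: List[int] = []   # sorted ring positions
        self._owner: Dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            if pos in self._owner:        # skip the (very unlikely) collision
                continue
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def remove(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = self._hash(f"{node}#{i}")
            if self._owner.get(pos) == node:
                del self._owner[pos]
                self._positions.remove(pos)

    def lookup(self, key: str) -> str:
        """Return the node owning the first ring position clockwise from the key."""
        if not self._positions:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._positions, self._hash(key))
        return self._owner[self._positions[idx % len(self._positions)]]


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c"])
    print(ring.lookup("user:42"))
```

Removing a node only remaps the keys that node owned; every other key keeps its owner, which is the property that makes the scheme attractive for storage clusters whose membership changes.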

  22. Popular Algorithms • Mutual Exclusion Algorithms • Lamport Algorithm (3*(n - 1) messages per critical-section entry) • Improved Lamport Algorithm (3*(n - 1)) • Ricart–Agrawala Algorithm (2*(n - 1)) • Maekawa Algorithm • Roucairol-Carvalho Algorithm
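The message counts in parentheses follow from the message types each algorithm exchanges per critical-section entry among n processes. A small sketch of that arithmetic, with hypothetical function names, covering only the two algorithms whose breakdown is standard:

```python
def lamport_messages(n: int) -> int:
    """(n - 1) REQUEST + (n - 1) REPLY + (n - 1) RELEASE messages."""
    return 3 * (n - 1)


def ricart_agrawala_messages(n: int) -> int:
    """(n - 1) REQUEST + (n - 1) REPLY; the release is folded into deferred replies."""
    return 2 * (n - 1)


if __name__ == "__main__":
    for n in (3, 10, 100):
        print(n, lamport_messages(n), ricart_agrawala_messages(n))
```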

  23. Popular Algorithms • Election Algorithms • Chang-Roberts Algorithm (O(n log n) messages on average) • Garcia-Molina's Bully Algorithm • Non-Comparison-Based Algorithms
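To make the Chang-Roberts entry concrete, here is a small simulation of the election on a unidirectional ring. It is a sketch under stated assumptions: all processes start the election at once, identifiers are unique, messages are delivered reliably, and the final round announcing the leader is not counted. The function name `chang_roberts` is hypothetical.

```python
from collections import deque
from typing import List, Tuple


def chang_roberts(ids: List[int]) -> Tuple[int, int]:
    """Simulate the election; process i sends to process (i + 1) % n.

    Returns (leader_id, number_of_election_messages).
    """
    n = len(ids)
    if n == 0 or len(set(ids)) != n:
        raise ValueError("ids must be non-empty and unique")
    # Every process initiates by sending its own id to its successor.
    in_flight = deque(((i + 1) % n, ids[i]) for i in range(n))
    messages = n
    leader = None
    while in_flight:
        dest, candidate = in_flight.popleft()
        if candidate > ids[dest]:
            # A larger id is forwarded around the ring.
            in_flight.append(((dest + 1) % n, candidate))
            messages += 1
        elif candidate == ids[dest]:
            # The id travelled the whole ring: its owner is the leader.
            leader = candidate
        # A smaller id is swallowed and never forwarded.
    return leader, messages


if __name__ == "__main__":
    print(chang_roberts([3, 7, 1, 9, 4]))
```

The ordering of identifiers around the ring determines the cost: when ids increase in the direction of travel only the largest id survives its first hop, while the reverse ordering forces the quadratic worst case.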

  24. Popular Algorithms • Bidding Algorithms • Self Stabilization Algorithms

  25. Replication Strategies • Asynchronous Master/Slave Replication: Log appends are acknowledged at the master in parallel with transmission to slaves. (Does not support ACID.) • Synchronous Master/Slave Replication: A master waits for changes to be mirrored to slaves before acknowledging them. (Requires timely failure detection.) • Optimistic Replication: Any member of a homogeneous replica group can accept mutations. (Order is not known in advance, so transactions are impossible.)
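The difference between the two master/slave modes is when the acknowledgement is sent relative to mirroring. The toy model below is an assumption for illustration only: it runs in a single process, models the asynchronous transmission as a deferred queue drained by `flush`, and uses the hypothetical class names `Master` and `Slave`.

```python
from typing import List


class Slave:
    def __init__(self) -> None:
        self.log: List[str] = []

    def apply(self, entry: str) -> None:
        self.log.append(entry)


class Master:
    def __init__(self, slaves: List[Slave], synchronous: bool) -> None:
        self.slaves = slaves
        self.synchronous = synchronous
        self.log: List[str] = []
        self.pending: List[str] = []  # entries acknowledged but not yet mirrored

    def append(self, entry: str) -> str:
        self.log.append(entry)
        if self.synchronous:
            # Wait until every slave has the entry before acknowledging.
            for slave in self.slaves:
                slave.apply(entry)
        else:
            # Acknowledge immediately; transmission happens later.
            self.pending.append(entry)
        return "ack"

    def flush(self) -> None:
        """Deliver deferred entries to the slaves (asynchronous mode)."""
        for entry in self.pending:
            for slave in self.slaves:
                slave.apply(entry)
        self.pending.clear()


if __name__ == "__main__":
    slave = Slave()
    master = Master([slave], synchronous=False)
    master.append("x=1")
    print(slave.log)   # the acknowledged write has not reached the slave yet
    master.flush()
    print(slave.log)
```

If the asynchronous master fails between `append` and `flush`, an acknowledged write is lost, which is why that mode cannot offer ACID durability.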

  26. Chain Replication

  27. CRAQ • Chain Replication with Apportioned Queries

  28. Funnel Replication • Topology • Vector Clock • Total Order • Write Request (key, value, vector clock, originating head replica)
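The write request on this slide carries a vector clock. The sketch below shows generic vector-clock operations only; it does not reproduce Funnel Replication's own protocol, and the function names and the replica names `h1` and `h2` are assumptions.

```python
from typing import Dict

VectorClock = Dict[str, int]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of the clock with the given node's counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(a: VectorClock, b: VectorClock) -> VectorClock:
    """Element-wise maximum, taken when a replica learns of another's history."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a: VectorClock, b: VectorClock) -> str:
    """Classify a relative to b: 'before', 'after', 'equal' or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


if __name__ == "__main__":
    base = increment({}, "h1")
    left = increment(base, "h2")   # a write accepted at head replica h2
    right = increment(base, "h1")  # an independent write accepted at h1
    print(compare(base, left), compare(left, right), merge(left, right))
```

Vector clocks give only a partial order; concurrent writes such as `left` and `right` need an additional tie-breaking rule before a total order can be imposed.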

  29. Atomic Commit Protocol • Two-Phase Commit (2PC) 1. Voting phase: The coordinator requests all participating sites to prepare to commit. 2. Decision phase: The coordinator either commits the transaction if all participants are prepared to commit (voted “yes”), or aborts the transaction if any participant has decided to abort (voted “no”).
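The two phases can be sketched as a coordinator loop. This is a minimal in-process illustration under strong assumptions: no timeouts, no logging to stable storage, and no coordinator or participant crashes. The names `Participant` and `two_phase_commit` are hypothetical.

```python
from typing import List


class Participant:
    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = "init"

    def prepare(self) -> bool:
        """Voting phase: vote yes (True) or no (False)."""
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> str:
    # Phase 1 (voting): ask every participant to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    # Phase 2 (decision): commit only if every vote was yes, otherwise abort.
    decision = "commit" if all(votes) else "abort"
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision


if __name__ == "__main__":
    sites = [Participant("a"), Participant("b", can_commit=False)]
    print(two_phase_commit(sites), [p.state for p in sites])
```

A single "no" vote is enough to abort everywhere, which is what makes the outcome atomic across sites.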

  30. Atomic Commit Protocol • Presumed Abort Protocol: Designed to reduce the cost associated with aborting transactions. • Presumed Commit Protocol: Designed to reduce the cost associated with committing transactions by interpreting missing information about transactions as commit decisions. • One-PC: The One-Phase Commit protocol consists of only a single phase, which is the decision phase of 2PC. • One-Two-PC

  31. Implementations • BigTable • Windows Azure Storage • Google MegaStore • Chubby

  32. Open Source & Business

  33. Thank you!!!
