
6.4 Data And File Replication



Presentation Transcript


  1. 6.4 Data And File Replication By Shruti Poundarik

  2. Data objects and files are replicated to increase system performance and availability. • Increased system performance is achieved through concurrent access to replicas. • High availability of data is due to the redundancy of data objects. • Parallelism and failure transparency are desirable in distributed systems. • Replication is not useful unless replication and concurrency transparency are provided.

  3. Atomicity • In database systems, atomicity is one of the ACID transaction properties. An atomic transaction is a series of database operations which either all occur or none occur [1]. • All or nothing. • In a DFS (Distributed File System), replicated objects (data or files) should follow the atomicity rule: either all copies are updated (synchronously or asynchronously) or none are.

  4. Goal • One-copy serializability: The effect of transactions performed by clients on replicated objects should be the same as if they had been performed one at a time on a single set of objects.[2]

  5. Architecture for Replica Management • The file service agent (FSA) is the client interface; replica managers (RMs) provide the replication functions [3]. • The client chooses one or more FSAs to access a data object. • The FSA acts as a front end to the RMs to provide replication transparency. • The FSA contacts one or more RMs for the actual reading and updating of data objects.
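As a rough sketch of this architecture, the FSA can be modeled as a thin front end that forwards client reads and writes to its replica managers. The `FileServiceAgent` class and the read-any/write-all policies below are illustrative assumptions, not something prescribed by the source:

```python
class FileServiceAgent:
    """Front end (FSA): hides replication from the client by forwarding
    reads and writes to one or more replica managers (RMs)."""

    def __init__(self, rms):
        # Each RM is modeled as a plain dict holding the object's value.
        self.rms = rms

    def read(self):
        # Simplest illustrative policy: read from any one RM.
        return self.rms[0]["value"]

    def write(self, value):
        # Simplest illustrative policy: write to all RMs.
        for rm in self.rms:
            rm["value"] = value
```

The client only ever talks to the FSA, which is what makes the replicas transparent.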

  6. Architecture[3]

  7. Read Operations [3] There are three options for read operations. • Read-one-primary: the FSA reads only from a primary RM, to enforce consistency. • Read-one: the FSA may read from any RM, for concurrency. • Read-quorum: the FSA must read from a quorum of RMs to determine the currency (most recent version) of the data.
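The three read options might be sketched in Python as follows. The `ReplicaManager` class and its version numbers are illustrative assumptions; the source does not prescribe a data structure:

```python
import random

class ReplicaManager:
    """A single RM holding a versioned copy of a data object."""
    def __init__(self, value, version=0):
        self.value = value
        self.version = version

def read_one_primary(primary):
    # Read-one-primary: always read the designated primary RM (consistency).
    return primary.value

def read_one(rms):
    # Read-one: read any RM (concurrency); the result may be stale.
    return random.choice(rms).value

def read_quorum(rms, r):
    # Read-quorum: query r RMs and return the copy with the highest
    # version number, i.e. the most current one in the quorum.
    sample = random.sample(rms, r)
    newest = max(sample, key=lambda rm: rm.version)
    return newest.value
```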

  8. Object access operations may be reads or updates. • In this architecture a read operation needs to be addressed to only one of the replicas. • Replicas are transparent to the client. • The file services invoked by the client may be required by the RM protocol to ensure that the data read is the most recent.

  9. Write Operations [3] • From the system's viewpoint, write operations should be addressed to all replicas atomically. Scenarios for writes: • Write-one-primary: write only to the primary RM; the primary RM updates all other RMs. • Write-all: update all RMs. • Write-all-available: write to all functioning RMs; a faulty RM must be resynchronized before being brought back online.

  10. Write-quorum: update a predefined quorum of RMs. • Write-gossip: update any RM and lazily propagate the change to the other RMs.
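A minimal sketch of write-all-available, assuming each RM is a dict with an `up` availability flag (an assumption made here for illustration):

```python
def write_all_available(rms, value):
    # Write-all-available: update every functioning RM. Faulty RMs are
    # skipped and fall behind until they are resynchronized.
    new_version = max(rm["version"] for rm in rms if rm["up"]) + 1
    for rm in rms:
        if rm["up"]:
            rm["value"] = value
            rm["version"] = new_version
    return new_version

def resync(faulty, healthy):
    # A recovered RM must be brought up to date from a current replica
    # before it is marked available again.
    faulty["value"] = healthy["value"]
    faulty["version"] = healthy["version"]
    faulty["up"] = True
```

The `resync` step is what the slide means by synchronizing a faulty RM before bringing it back online.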

  11. Read One Primary/Write One Primary • Both read and write operations must be directed to the primary replica manager. • No replication issues arise. • All operations are serialized by the primary RM. • Secondary RMs supply redundancy in case of primary failures. • Consistency is easy to achieve, but not concurrency.

  12. Read One, Write All [3] • To provide concurrency, a read operation may be performed at any RM site. • This leads to a coherency problem, since propagating an update from one RM to the other RMs incurs communication delay. • Therefore the propagation of updates must be made atomic. • Updates can be initiated at any RM, preferably the one closest to the requesting client. • Provides both concurrency and coherency.

  13. Achieves one-copy serializability: execution of a transaction on replicated objects is equivalent to execution of the same transaction on non-replicated objects. • Here, however, updates must reach all replicas, faulty and non-faulty alike. • This contradicts the purpose of replication, since atomic updates should only need to be made available to the non-faulty replicas.

  14. Read One, Write All Available • A variation of Read One, Write All. • Atomic updates need only be made available to the non-faulty replicas. • As a result, achieving one-copy serializability becomes slightly more complicated.

  15. Read Quorum, Write Quorum • Each read operation on a replicated data object d must obtain a read quorum R(d) before performing the read. • Each write operation needs a write quorum W(d) to complete the write. • The client identifies the most recently completed update by a version number attached to each replicated object. • A read operation queries all R(d) replicas; the copy with the highest version number is returned. • A write operation advances the version number by 1.
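The version-number rule can be sketched as follows. For quorums to work, R(d) + W(d) must exceed the total number of replicas, so that every read quorum overlaps every write quorum and therefore contains at least one current copy. The quorum members are chosen deterministically here purely to keep the sketch simple; a real system may pick any R or W replicas:

```python
def quorum_read(rms, r):
    # Query a read quorum of r replicas; the copy with the highest
    # version number is the most recently completed update.
    quorum = rms[:r]                      # deterministic choice (sketch only)
    newest = max(quorum, key=lambda rm: rm["version"])
    return newest["value"], newest["version"]

def quorum_write(rms, w, value):
    # Update a write quorum of w replicas, advancing the version by 1.
    quorum = rms[-w:]                     # deterministic choice (sketch only)
    version = max(rm["version"] for rm in quorum) + 1
    for rm in quorum:
        rm["value"] = value
        rm["version"] = version
    return version
```

With N = 3 replicas and R = W = 2, the quorums overlap (2 + 2 > 3), so a read always sees the latest write even though neither operation touches every replica.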

  16. Gossip Update [3] • Since updates are less frequent than reads, updates can be propagated lazily to the replicas. • This read-one/write-gossip approach is the gossip update propagation protocol. • Both read and update operations are directed by the FSA to any RM. • The FSA shields replication details from clients.
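A toy sketch of gossip propagation, in which a write is accepted at one RM and lazily pushed to a peer in the background. The `GossipRM` class and its `pending` list are illustrative assumptions, not the protocol's actual data structures:

```python
class GossipRM:
    """An RM that accepts writes locally and gossips them to peers later."""

    def __init__(self):
        self.value = None
        self.version = 0
        self.pending = []   # updates not yet pushed to peers

    def update(self, value):
        # Accept the write at this RM and record it for lazy propagation.
        self.version += 1
        self.value = value
        self.pending.append((self.version, value))

def gossip(sender, receiver):
    # Lazily push the sender's updates to one peer; the receiver only
    # applies updates newer than what it already holds.
    for version, value in sender.pending:
        if version > receiver.version:
            receiver.version = version
            receiver.value = value
```

Until `gossip` runs, a read at the peer may return stale data, which is the availability-for-consistency trade-off the slide describes.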

  17. Its main purpose is to support high availability in an environment where failures of replicas are likely. • Disadvantages of file replication: • The contents of the file need to be known before the replication operation takes place. • Existing systems cannot work in limited-bandwidth networks. • DFS replication does not work well when there are a large number of changes to replicate [4].

  18. Current Advancements • File replication is optimized over limited-bandwidth networks using remote differential compression [5]. • RDC (Remote Differential Compression) is a protocol that heuristically negotiates a set of differences between a recipient and a sender that hold two sufficiently similar versions of the same file. • RDC optimizes communication between sender and recipient by having both sides subdivide their files into chunks and compute strong checksums, or signatures, for each chunk.
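The chunk-and-signature idea can be illustrated as below. Note that real RDC uses content-defined (variable-size) chunk boundaries; this sketch uses fixed-size chunks and SHA-256 signatures purely for simplicity:

```python
import hashlib

def chunk_signatures(data, chunk_size=4):
    # Split the file into chunks and compute a strong checksum
    # ("signature") for each chunk.
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

def chunks_to_send(old, new, chunk_size=4):
    # The sender only needs to transfer chunks of the new file whose
    # signatures the recipient does not already have.
    old_sigs = set(chunk_signatures(old, chunk_size))
    return [i for i, sig in enumerate(chunk_signatures(new, chunk_size))
            if sig not in old_sigs]
```

If only one chunk of a large file changed, only that chunk (plus the signature exchange) crosses the network, which is how RDC saves bandwidth on limited links.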

  19. RDC can also be applied to the chunk signature files themselves. • Windows Server uses Remote Differential Compression to propagate only the changes, saving bandwidth [4].

  20. References
[1] Wikipedia, "Atomicity", http://en.wikipedia.org/wiki/Atomicity
[2] M. T. Harandi and J. Hou (modified: I. Gupta), "Transactions with Replication", http://www.crhc.uiuc.edu/~nhv/428/slides/repl-trans.ppt
[3] Randy Chow and Theodore Johnson, "Distributed Operating Systems & Algorithms", 1998.
[4] "Overview of the Distributed File System Solution in Microsoft Windows Server 2003", http://technet2.microsoft.com/WindowsServer/en/library/d3afe6ee-3083-4950-a093-8ab748651b761033.mspx?mfr=true
[5] "Optimizing File Replication over Limited-Bandwidth Networks using Remote Differential Compression", IEEE INFOCOM, 2006.
