
Database Replication Using Generalized Snapshot Isolation

Sameh Elnikety (EPFL), Fernando Pedone (USI), Willy Zwaenepoel (EPFL)


Presentation Transcript


  1. Database Replication Using Generalized Snapshot Isolation Sameh Elnikety, EPFL Fernando Pedone, USI Willy Zwaenepoel, EPFL

  2. Snapshot Isolation (SI) • Snapshot = committed state of database • On begin: • Snapshot(T) = latest snapshot at start(T) • On read or write operation: • T reads from and writes to its snapshot • On commit: • Read-only T commits immediately • Update T commits if no conflicting writes between its start & commit times
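The begin/read/write/commit rules above can be sketched as a toy in-memory store. This is only an illustration under assumptions: the names `SIDatabase` and `Txn` are hypothetical, every committed version is kept as a full copy of the state, and "conflicting writes" is taken to mean overlapping written keys.

```python
class SIDatabase:
    """Toy store illustrating snapshot isolation (hypothetical, not from the slides)."""

    def __init__(self):
        self.version = 0            # number of committed update transactions
        self.snapshots = {0: {}}    # version -> full committed state at that version
        self.log = []               # (commit_version, set of keys written)

    def begin(self):
        # On begin: the transaction gets the latest committed snapshot.
        return Txn(self, self.version)


class Txn:
    def __init__(self, db, start_version):
        self.db = db
        self.start_version = start_version
        self.writes = {}            # private writes, invisible to others until commit

    def read(self, key):
        # Reads see the transaction's own writes first, then its snapshot.
        if key in self.writes:
            return self.writes[key]
        return self.db.snapshots[self.start_version].get(key)

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        db = self.db
        if not self.writes:
            return True             # read-only transactions commit immediately
        # Abort if any transaction that committed after our start wrote one of our keys.
        my_keys = set(self.writes)
        for commit_version, keys in db.log:
            if commit_version > self.start_version and keys & my_keys:
                return False
        # No conflict: install a new committed snapshot.
        new_state = dict(db.snapshots[db.version])
        new_state.update(self.writes)
        db.version += 1
        db.snapshots[db.version] = new_state
        db.log.append((db.version, my_keys))
        return True
```

Two concurrent transactions that both write the same key show the rule: the first to commit succeeds and the second aborts, while a reader that started earlier keeps seeing its own snapshot.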

  3. Advantages of SI • Read-only T’s never block or abort • Read-only T’s never cause update T’s to block or abort • Compare to 2PL • No read-locks are used in SI • Important for read-dominated workloads

  4. Drawbacks of SI • Not serializable • Permits certain anomalies • But • Anomalies are rare in practice • Conditions on workload can identify and avoid them • Developers use SI serializably
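One well-known anomaly that SI permits is write skew. The sketch below is a hypothetical example (the on-call scenario and all names are assumptions, not taken from the slides): two concurrent transactions read the same snapshot and write disjoint keys, so the write-conflict check lets both commit, yet the result matches no serial order.

```python
def write_skew_under_si():
    # Committed state; intended invariant: at least one person stays on call.
    committed = {"alice_on_call": True, "bob_on_call": True}

    # T1 and T2 start concurrently, so both receive the same snapshot.
    snapshot_t1 = dict(committed)
    snapshot_t2 = dict(committed)

    # T1: Alice goes off call if her snapshot says Bob is still on call.
    writes_t1 = {}
    if snapshot_t1["bob_on_call"]:
        writes_t1["alice_on_call"] = False

    # T2: Bob goes off call if his snapshot says Alice is still on call.
    writes_t2 = {}
    if snapshot_t2["alice_on_call"]:
        writes_t2["bob_on_call"] = False

    # T1 commits first: nothing was committed since its start.
    committed.update(writes_t1)

    # T2 is checked only for write/write overlap with T1's committed writes.
    t2_commits = not (set(writes_t1) & set(writes_t2))
    if t2_commits:
        committed.update(writes_t2)
    return committed, t2_commits
```

In any serial execution the second transaction would see that the other person had already gone off call and would write nothing, so the invariant would hold. Under SI both commit and nobody is left on call.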

  5. Summary of SI • SI is here to stay • Used in several databases, e.g., • Oracle • PostgreSQL • Microsoft SQL Server ( 2PL & SI ) • Borland InterBase

  6. SI Replication • Replicate SI to scale performance for dynamic content Web servers • E.g., E-commerce, bulletin boards • Workload is suitable for SI • Read-only T’s dominate workload • Update T’s are short & few • How to maintain SI properties?

  7. SI in Replicated Database • On begin: • Snapshot(T) = latest snapshot at start(T) • On read or write operation: • T reads from and writes to its snapshot • On commit: • Read-only T commits immediately • Update T commits if no conflicting writes between its start & commit times

  8. Strict SI in Replicated Database • On begin: • Snapshot(T) = latest snapshot at start(T) • On read or write operation: • T reads from and writes to its snapshot • On commit: • Read-only T commits immediately • Update T commits if no conflicting writes between its start & commit times

  9. Generalized Snapshot Isolation (GSI) • On begin: • Snapshot(T) = (latest) older snapshot • At replica, use latest local snapshot • On read or write operation: • T reads from and writes to its snapshot • On commit: • Read-only T commits immediately • Update T commits if no conflicting writes between its (start) snapshot & commit times

  10. Generalized Snapshot Isolation (GSI) • On begin: • Snapshot(T) = (latest) older snapshot • At replica, use latest local snapshot • On read or write operation: • T reads from and writes to its snapshot • On commit: • Read-only T commits immediately • Update T commits if no conflicting writes between its (start) snapshot & commit times Certification for update T

  11. Advantages of GSI • All T’s reads and writes are local • Important for replicated databases • Read-only T’s never block or abort • Read-only T’s never cause update T’s to block or abort • Important for read-dominated workloads

  12. A - GSI Serializability • Not serializable • Permits certain anomalies, as in SI • But • Anomalies are rare in practice • Two serializability conditions (in the paper) • Static: examine transaction templates • Dynamic: at run time • Easy to verify workload is serializable • Easy to modify workload to be serializable

  13. A - GSI Serializability • Not serializable • Permits certain anomalies, as in SI • But • Anomalies are rare in practice • Two serializability conditions (in the paper) • Static: examine transaction templates • Dynamic: at run time • Easy to verify workload is serializable • Easy to modify workload to be serializable • Similar to what many Oracle DBAs already do

  14. B - GSI Older Snapshots • 1- On begin: Snapshot(T) = (latest) older snapshot • GSI uses older snapshots • But • Clear definition, always consistent data • No new anomalies (same as in SI) • In replicated database • Transparent: db appears as running SI • Efficient: reads are non-blocking • Staleness: can be bounded

  15. C - GSI Abort Rates • 3- On commit: • Read-only T commits immediately • Update T commits if no conflicting writes between its (start) snapshot & commit times (certification for update T) • Potentially higher abort rate for updates • But • Abort rates are small in target workloads • GSI abort rates can be higher or lower

  16. GSI in Replicated Databases • System consists of • Many SI replicas, full replication • Centralized certifier (distributed in the paper) • A client connects to one replica • Issues read and update transactions • Algorithm implements an instance of GSI • Snapshot(T) = latest local snapshot at replica

  17. Algorithm at Replica • On begin: • Provide T with a local Snapshot • Record T.version = Snapshot.version • On read or write operation: • Run transaction (reads/writes) locally • Record T.writeset • On commit: • IF ( T is read-only ) THEN { commit } • ELSE { Invoke certification ( T.version, T.writeset ). . . }

  18. Algorithm at Certifier • Check for conflicting writes from committed T’s with larger version number • IF ( yes ) THEN { Reply ( abort ) } • ELSE { Advance certifier-version; Record ( writeset, certifier-version ) to log; Reply ( 1 - commit, 2 - certifier-version, 3 - “missing” writesets ) }
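The certifier steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Certifier` class, the tuple-shaped reply, and the choice to define "missing" writesets as everything logged after the transaction's snapshot version are all assumptions.

```python
class Certifier:
    """Hypothetical centralized certifier; writesets are dicts of key -> value."""

    def __init__(self):
        self.version = 0    # certifier-version: count of certified update transactions
        self.log = []       # list of (version, writeset), in commit order

    def certify(self, txn_version, writeset):
        # Writesets committed after the snapshot the transaction read from.
        newer = [(v, ws) for v, ws in self.log if v > txn_version]

        # Conflict: some newer committed writeset wrote one of the same keys.
        for _, committed_ws in newer:
            if committed_ws.keys() & writeset.keys():
                return ("abort", None, [])

        # No conflict: advance the version, log the writeset, and reply.
        self.version += 1
        self.log.append((self.version, dict(writeset)))
        return ("commit", self.version, newer)
```

For example, a transaction that started at version 0 and writes a key nobody else touched still commits after other updates have been certified, and it receives those updates as "missing" writesets to apply locally.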

  19. Algorithm at Replica (cont.) • On begin: . . . • On read or write operation: . . . • On commit: • IF ( T is read-only ) THEN { commit } • ELSE { Invoke certification ( T.version, T.writeset ); on a commit reply: 1- Apply “missing” writesets, 2- Commit locally, 3- Advance local version }
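Putting the replica and certifier halves together, a rough end-to-end sketch might look like the following. Everything here is an assumption made for illustration: the `Replica` and `Certifier` classes, representing a transaction as a plain dict, copying the whole local state as the snapshot, and doing nothing on abort beyond returning `False`.

```python
class Certifier:
    """Compact hypothetical certifier (same idea as slide 18)."""

    def __init__(self):
        self.version = 0
        self.log = []   # (version, writeset) in commit order

    def certify(self, txn_version, writeset):
        newer = [(v, ws) for v, ws in self.log if v > txn_version]
        if any(ws.keys() & writeset.keys() for _, ws in newer):
            return ("abort", None, [])
        self.version += 1
        self.log.append((self.version, dict(writeset)))
        return ("commit", self.version, newer)


class Replica:
    """Hypothetical replica holding a full copy of the database."""

    def __init__(self, certifier):
        self.certifier = certifier
        self.version = 0    # version of the latest local snapshot
        self.data = {}

    def begin(self):
        # GSI: use the latest *local* snapshot, which may be older than the global one.
        return {"version": self.version, "snapshot": dict(self.data), "writeset": {}}

    def read(self, txn, key):
        if key in txn["writeset"]:
            return txn["writeset"][key]
        return txn["snapshot"].get(key)

    def write(self, txn, key, value):
        txn["writeset"][key] = value

    def commit(self, txn):
        if not txn["writeset"]:
            return True     # read-only: commit locally, no communication
        decision, new_version, missing = self.certifier.certify(
            txn["version"], txn["writeset"])
        if decision == "abort":
            return False
        # 1- Apply writesets this replica has not seen yet, in version order.
        for version, ws in missing:
            if version > self.version:
                self.data.update(ws)
        # 2- Commit locally, 3- advance the local version.
        self.data.update(txn["writeset"])
        self.version = new_version
        return True
```

With two replicas sharing one certifier, a transaction on the lagging replica reads an older snapshot yet still commits if its writes do not conflict, while a transaction whose snapshot predates a conflicting write is aborted at certification.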

  20. Performance Tradeoff GSI : SI • GSI • better response time • SI • “fresher” data (latest snapshot in the system) • lower abort rate for updates (?) • Analytical performance model • Model used by Jim Gray • Replicated database over WAN

  21. Analytical Model • GSI • Execute T immediately • Updates are certified remotely (communication) • SI • Block T to obtain latest version (communication) • Updates are certified remotely (communication) • Objective is to compare GSI : SI • Response time • Abort rate

  22. Analytical Equations • Parameters • x = round trip delay / transaction length • Response time ratio (GSI : SI), for read-only and for update transactions [equations not preserved in the transcript]

  23. Analytical Equations • Parameters • x = round trip delay / transaction length • t = snapshot age / transaction length • Response time ratio (GSI : SI), for read-only and for update transactions • Abort rate ratio (GSI : SI), for read-only (never aborted!) and for update transactions [equations not preserved in the transcript]

  24. Analytical Results • Parameters • x = round trip delay / transaction length • t = snapshot age / transaction length • X-axis • x = round trip delay / transaction length • x = 0 → centralized database • x is increasing as technology advances • Y-axis • Response time ratio (for reads & updates) • Abort ratio (updates)

  25. Response Time Ratio of GSI : SI • [Plot: response time ratio (GSI : SI) against x] • GSI is better

  26. Abort Ratio of GSI : SI for Updates • [Plot: abort ratio (GSI : SI) against x, with regions labelled “SI better” and “GSI better”] • Parameter t = snapshot age / transaction length

  27. Abort Ratio of GSI : SI for Updates • [Plot: abort ratio (GSI : SI) against x, with regions labelled “SI better” and “GSI better”] • t decreasing: fresher snapshot • Parameter t = snapshot age / transaction length

  28. GSI : SI - Summary • GSI response times are better • Read-only T’s ratio: significantly better • Update T’s ratio: reaches ½ • GSI abort rate • may be higher or lower • Cost: observing older data in GSI • Favorable trade-off • Distributed environments • Read-dominated workloads

  29. Conclusions • GSI is appealing for replication • All T’s read & write operations are local • Read-only T’s never block or abort • GSI can be made serializable • Algorithm for GSI in replicated databases • Analytical results are encouraging
