
Calvin: Fast Distributed Transactions for Partitioned Database Systems Thomson et al SIGMOD 2012






Presentation Transcript


  1. Calvin: Fast Distributed Transactions for Partitioned Database Systems • Thomson et al., SIGMOD 2012 • Presented by Dr. Greg Speegle, April 12, 2013

  2. Distributed Transactions • Two-phase commit is slow relative to local transaction processing • CAP Theorem • Option 1: Reduce availability • Option 2: Reduce consistency • Goal: Provide both availability and consistency by changing transaction semantics

  3. Deterministic Transactions • Normal transaction execution • Submit SQL statements • Subsequent operations depend on prior results • Deterministic transaction execution • Submit all requests before the transaction starts • Example: Auto-commit • Difficult for dependent transactions
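The contrast on this slide can be sketched in code. Below is a minimal, hypothetical model (not the paper's actual interface) of a deterministic transaction: its read and write sets are declared before it runs, and its logic is a pure function of the values read, so every replica executing it in the same global order produces the same result.

```python
# Hypothetical sketch: a deterministic transaction declares its full
# read and write sets up front; the logic is a pure function of reads.
from dataclasses import dataclass

@dataclass
class DeterministicTxn:
    txn_id: int
    read_set: frozenset    # keys the transaction will read
    write_set: frozenset   # keys the transaction may write
    logic: callable = None # reads dict -> writes dict

def execute(txn, store):
    """Run pre-declared logic against a key-value store (a plain dict)."""
    reads = {k: store.get(k) for k in txn.read_set}
    writes = txn.logic(reads)
    assert set(writes) <= txn.write_set  # may only touch declared keys
    store.update(writes)
    return writes

store = {"a": 5, "b": 7}
t = DeterministicTxn(1, frozenset({"a", "b"}), frozenset({"b"}),
                     logic=lambda r: {"b": r["a"] + r["b"]})
execute(t, store)
print(store["b"])  # 12
```

A dependent transaction breaks this model because its read set is not known before execution; slide 10 describes the reconnaissance workaround.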

  4. Architecture • Sequencing Layer • Per replica • Creates universal transaction execution order • Scheduling Layer • Per data store • Executes transactions consistently with order • Storage Layer • CRUD interface

  5. Architecture Overview

  6. Data Model • Dataset partitioned • Partitions are replicated • One copy of each partition forms replica • All replicas of one partition form replication group • Master/slave within replication group (for asynchronous replication)
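The replica / replication-group terminology on this slide can be made concrete with a toy layout (hypothetical names, 2 partitions × 3 replicas): a replica holds one copy of every partition, while a replication group is every copy of one partition.

```python
# Toy layout: a "replica" = one copy of each partition;
# a "replication group" = all copies of one partition.
partitions = ["P0", "P1"]
replicas = ["A", "B", "C"]

# replica -> the partition copies it holds
replica_view = {r: [f"{p}@{r}" for p in partitions] for r in replicas}
# partition -> its replication group
replication_group = {p: [f"{p}@{r}" for r in replicas] for p in partitions}

print(replica_view["A"])        # ['P0@A', 'P1@A']
print(replication_group["P0"])  # ['P0@A', 'P0@B', 'P0@C']
```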

  7. Sequencer • Requests (deterministic transactions) submitted locally • Epoch – a 10 ms group of requests • Asynchronous replication – master receives all requests & determines order • Synchronous replication – Paxos determines order • Batch sent to scheduler
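The epoch batching step can be sketched as follows (a simplified single-node view; real Calvin merges batches from many sequencers): requests are grouped into 10 ms epochs, and concatenating the per-epoch batches yields the global transaction order.

```python
# Sketch: group requests into 10 ms epochs; each epoch's batch is
# ordered, and batches concatenated in epoch order give the global order.
from collections import defaultdict

EPOCH_MS = 10

def assign_epochs(requests):
    """requests: list of (arrival_ms, txn_id) -> {epoch: ordered batch}."""
    batches = defaultdict(list)
    for arrival, txn in sorted(requests):
        batches[arrival // EPOCH_MS].append(txn)
    return dict(batches)

reqs = [(3, "t1"), (12, "t3"), (7, "t2"), (19, "t4")]
print(assign_epochs(reqs))  # {0: ['t1', 't2'], 1: ['t3', 't4']}
```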

  8. Scheduler • Logical concurrency control & recovery (e.g., no TIDs) • Lock manager distributed (lock only keys stored locally) • Strict 2PL with changes: • If t0 and t1 conflict and t0 precedes t1 in sequence order, t0 locks before t1 • All lock requests by transaction processed together in sequence order
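The ordered-locking rule above can be sketched with a toy lock manager (hypothetical, and ignoring shared vs. exclusive modes): because each transaction's lock requests are enqueued together in sequence order, conflicting transactions always acquire locks in the order the sequencer chose.

```python
# Sketch: per-key FIFO queues. request_all() is called in sequence
# order, so if t0 precedes t1 globally, t0 heads every shared queue.
from collections import defaultdict, deque

class OrderedLockManager:
    def __init__(self):
        self.queues = defaultdict(deque)  # key -> FIFO of waiting txns

    def request_all(self, txn, keys):
        """Enqueue txn on every key it touches; call in sequence order."""
        for k in sorted(keys):
            self.queues[k].append(txn)

    def runnable(self, txn, keys):
        """txn may execute once it heads every queue it waits on."""
        return all(self.queues[k][0] == txn for k in keys)

    def release_all(self, txn, keys):
        for k in keys:
            assert self.queues[k].popleft() == txn

lm = OrderedLockManager()
lm.request_all("t0", {"x", "y"})      # t0 precedes t1 in sequence order
lm.request_all("t1", {"y"})
print(lm.runnable("t0", {"x", "y"}))  # True
print(lm.runnable("t1", {"y"}))       # False until t0 releases
lm.release_all("t0", {"x", "y"})
print(lm.runnable("t1", {"y"}))       # True
```

Because every waits-for edge points from a later transaction to an earlier one in the global sequence, the waits-for graph is acyclic, which is why the next slide can claim deadlock freedom.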

  9. Scheduler II • Transaction executes after all locks acquired • Read/Write set analysis • Local vs Remote • Read-only nodes are passive participants • Write nodes are active participants • Local Reads • Distribute reads to active participants • Collect remote read results • Apply local writes
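The participant classification above reduces to a set computation over the read/write sets, sketched here with hypothetical key-to-node placement: nodes holding only read-set keys are passive (they just serve their local reads), nodes holding write-set keys are active.

```python
# Sketch: passive participants hold only read-set keys; active
# participants hold write-set keys and receive the remote reads.
def classify(read_set, write_set, key_to_node):
    nodes = {key_to_node[k] for k in read_set | write_set}
    active = {key_to_node[k] for k in write_set}
    passive = nodes - active
    return active, passive

key_to_node = {"a": "n1", "b": "n2", "c": "n3"}  # hypothetical placement
active, passive = classify({"a", "b"}, {"c"}, key_to_node)
print(sorted(active), sorted(passive))  # ['n3'] ['n1', 'n2']
```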

  10. Scheduler III • Deadlock Free (acyclic waits-for graph) • Dependent Transactions • Read-only reconnaissance query generates read set • Transaction executed with resulting read/write locks • Re-execute if changes • Maximum conflict footprint under 2PL
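The reconnaissance pattern for dependent transactions can be sketched as a predict-then-verify loop (hypothetical helper names; in this single-threaded demo the re-check never fails, but in a real system concurrent writes could change the read set between the reconnaissance pass and execution):

```python
# Sketch: a cheap read-only pass predicts the keys a dependent
# transaction will touch; the transaction is sequenced with locks on
# that prediction and re-submitted if the actual read set differs.
def reconnaissance(store, predicate):
    """Predict the key set a dependent transaction will touch."""
    return frozenset(k for k, v in store.items() if predicate(v))

def run_dependent(store, predicate, apply_fn, max_tries=5):
    for _ in range(max_tries):
        predicted = reconnaissance(store, predicate)
        # ... transaction is sequenced holding locks on `predicted` ...
        actual = reconnaissance(store, predicate)  # re-check at execution
        if actual == predicted:
            for k in actual:
                store[k] = apply_fn(store[k])
            return actual
    raise RuntimeError("read set kept changing")

store = {"a": 1, "b": 5, "c": 2}
touched = run_dependent(store, lambda v: v < 3, lambda v: v * 10)
print(sorted(touched), store)  # ['a', 'c'] {'a': 10, 'b': 5, 'c': 20}
```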

  11. Storage • Disk I/O problem • Pausing t0 when I/O is required means a t1 can “jump ahead” of t0 (acquire a conflicting lock before t0) • Solution: delay sequencing t0, but prefetch its data in the meantime • So t1 precedes t0 in both the sequence and the execution order
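The delay-and-prefetch idea can be sketched as a batch-splitting step (hypothetical interface): transactions whose data is memory-resident run now; the rest trigger asynchronous prefetches and are re-sequenced later, so their disk I/O never blocks the deterministic execution pipeline.

```python
# Sketch: split a batch into runnable txns and delayed ones, issuing a
# prefetch for every cold key so the delayed txn is warm when re-sequenced.
def schedule(batch, in_memory, prefetch):
    """batch: list of (txn_id, key_set). Returns (ready, delayed)."""
    ready, delayed = [], []
    for txn, keys in batch:
        if keys <= in_memory:
            ready.append(txn)
        else:
            for k in keys - in_memory:
                prefetch(k)       # issue async disk read for cold key
            delayed.append(txn)   # re-enters the sequence later
    return ready, delayed

in_memory = {"a", "b"}
fetched = []
ready, delayed = schedule([("t0", {"a", "z"}), ("t1", {"b"})],
                          in_memory, fetched.append)
print(ready, delayed, fetched)  # ['t1'] ['t0'] ['z']
```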

  12. Checkpointing • Logging requires only the ordered transaction input to restore after failure • At checkpoint time (a global epoch boundary) • Keep two versions of data, “before” & “after” • Transactions access the appropriate version • After all “before” transactions terminate, flush all data • Throw away the “before” version where an “after” version exists • 20% throughput impact
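The two-version scheme can be sketched as follows (a simplified, hypothetical model): transactions sequenced after the checkpoint epoch write an "after" copy, the checkpoint flushes the stable "before" view, and "before" copies shadowed by an "after" copy are then discarded.

```python
# Sketch: around a checkpoint, keep "before" and "after" versions per
# key so the checkpoint can flush a consistent pre-checkpoint snapshot
# while post-checkpoint transactions keep running.
class CheckpointStore:
    def __init__(self, data):
        self.before = dict(data)
        self.after = {}

    def write(self, key, value, post_checkpoint):
        (self.after if post_checkpoint else self.before)[key] = value

    def read(self, key, post_checkpoint):
        if post_checkpoint and key in self.after:
            return self.after[key]
        return self.before[key]

    def finish_checkpoint(self):
        """All pre-checkpoint txns done: flush snapshot, merge versions."""
        snapshot = dict(self.before)    # consistent view written to disk
        self.before.update(self.after)  # discard shadowed "before" copies
        self.after = {}
        return snapshot

s = CheckpointStore({"x": 1})
s.write("x", 2, post_checkpoint=True)
print(s.read("x", post_checkpoint=False))  # 1 (checkpoint view)
print(s.read("x", post_checkpoint=True))   # 2
snap = s.finish_checkpoint()
print(snap["x"], s.before["x"])            # 1 2
```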

  13. Performance • TPC-C benchmark (order placing) • Throughput scales linearly with the number of machines • Per-node throughput appears asymptotic • At high contention, outperforms a traditional RDBMS • At low contention, performs worse

  14. Conclusion • Adds ACID capability to any CRUD system • Performs nearly linear scale-up • Requires deterministic transactions
