EEC-681/781 Distributed Computing Systems

EEC-681/781 Distributed Computing Systems Lecture 12 Wenbing Zhao wenbing@ieee.org Cleveland State University

Outline
• Project report requirement
• Transaction processing concepts
• Distributed transaction and two phase commit
• Midterm #2
  • 12/6 Wednesday
EEC-681: Distributed Computing Systems

Project Report Requirement
• Theory track
  • Introduction: define the problem and explain why a solution is needed
  • Background: give readers enough context to understand the techniques used to solve the problem
  • Current state of the art: what are the fundamental techniques used to solve the problem? Ideally, provide a taxonomy of the techniques
  • Open issues and future research directions: what hard problems remain to be solved?

Project Report Requirement
• Implementation track
  • Introduction: define the problem domain and your implementation. Provide the motivation for your system
  • System model: assumptions, restrictions, models
  • Design: component diagram, class diagram, pseudo code, algorithms, header explanation
  • Implementation: which language, tools, and libraries you used, plus a simple user guide on how to use your system
  • Performance and testing: throughput, latency, test cases
  • Related work
  • Conclusion and future work

Project Requirement
• What you should NOT do
  • Take an application from the Internet or from a friend => F grade
  • Falsely claim a working prototype, or fabricate performance data and test cases => F grade
  • Use others’ slides for your presentation
• What you should do
  • If you use any open source code, acknowledge it in both your source code and your report, and provide a reference
  • Extensively comment your code
  • Follow good naming and coding conventions
  • Use a source version control system, such as cvs or svn
  • If your code does not work, acknowledge it in your report

Project Report Requirement
• Report format: IEEE Transactions format, 4-10 pages
  • MS Word template
    • http://www.ieee.org/portal/cms_docs/pubs/transactions/TRANS-JOUR.DOC
  • LaTeX template
    • http://www.ieee.org/portal/cms_docs/pubs/transactions/IEEEtran.zip (main text)
    • http://www.ieee.org/portal/cms_docs/pubs/transactions/IEEEtranBST.zip (bibliography)
• Report due: Dec 13, midnight (an electronic copy of the report and the source code is required)

Why Transaction Processing?
• To achieve a form of fault tolerance
• If something bad happens in the middle of a set of operations, we abort and roll back to the original state

Transaction and ACID Properties
A transaction is a collection of operations on the state of an object (database, object composition, etc.) that satisfies the following properties:
• Atomicity: Either all operations succeed, or all of them fail. When the transaction fails, the state of the object remains unaffected by the transaction.
• Consistency: A transaction establishes a valid state transition.
• Isolation: Concurrent transactions do not interfere with each other. It appears to each transaction T that other transactions occur either before T, or after T, but never both.
• Durability: After the execution of a transaction, its effects are made permanent: changes to the state survive failures.
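
Atomicity can be tried out with any transactional store. The sketch below uses Python's built-in sqlite3 module; the account table, the transfer function, and the "insufficient funds" rule are assumptions made up for illustration and are not part of the lecture.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; return True if committed, False if rolled back."""
    try:
        with conn:  # commits on normal exit, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # triggers the rollback
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current state as {name: balance}."""
    return dict(conn.execute("SELECT name, balance FROM account"))

# Hypothetical starting state: account a holds 100, account b holds 0.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

print(transfer(conn, "a", "b", 30), balances(conn))   # succeeds and commits
print(transfer(conn, "a", "b", 500), balances(conn))  # fails, state is unchanged
```

The second transfer debits the source account before the check fails, yet the debit is not visible afterwards: the partial update is undone as a unit.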

Primitives for Transactions
[Figure: example transactions]

Transaction Classification
• Flat transactions: a sequence of operations that satisfies the ACID properties (the most common kind)
• Nested transactions: a hierarchy of transactions that allows
  • Concurrent processing of subtransactions, and
  • Recovery per subtransaction
• Distributed transactions: a (flat) transaction that spans multiple databases distributed across the network

Implementation of Transactions
• Private workspace
• Writeahead log

Private Workspace
A transaction gets its own copy of (part of) the database. If things go wrong, delete the copy; otherwise, commit the changes to the original.
[Figure: (a) the file index and disk blocks for a three-block file; (b) the situation after a transaction has modified block 0 and appended block 3; (c) after committing]
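
A minimal sketch of the private workspace idea, assuming the database is just a dictionary from block index to contents. The PrivateWorkspace class and its method names are hypothetical; a real system would copy index blocks and disk blocks rather than dictionary entries.

```python
class PrivateWorkspace:
    """Copy-on-write view of a block store (a dict: block index -> contents)."""

    def __init__(self, store):
        self._store = store   # the shared original, never modified before commit
        self._shadow = {}     # private copies of modified or appended blocks

    def read(self, index):
        # Prefer the private copy; fall back to the original block.
        if index in self._shadow:
            return self._shadow[index]
        return self._store[index]

    def write(self, index, data):
        # Writes only touch the private copy.
        self._shadow[index] = data

    def commit(self):
        # Install the private blocks into the original.
        self._store.update(self._shadow)
        self._shadow.clear()

    def abort(self):
        # Throwing away the private copies is all that is needed.
        self._shadow.clear()

# A three-block file; the transaction modifies block 0 and appends block 3.
store = {0: "A", 1: "B", 2: "C"}
ws = PrivateWorkspace(store)
ws.write(0, "A'")
ws.write(3, "D")
print(store)       # the original is still unchanged
ws.commit()
print(store)       # the changes are now visible
```

Abort is cheap in this scheme because the original was never touched; the cost is paid at commit time.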

Writeahead Log
Use a writeahead log in which changes are recorded, allowing one to roll back when things go wrong.
[Figure: a transaction, and the log before and after each statement is executed]
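
The log-then-update discipline can be sketched as follows. The LoggedStore class, the variables x and y, and the three statements are illustrative assumptions; a real log would live on stable storage rather than in a Python list.

```python
class LoggedStore:
    """In-place updates guarded by an undo log of (name, old, new) records."""

    def __init__(self, values):
        self.values = dict(values)
        self.log = []

    def write(self, name, new_value):
        # Record the change BEFORE applying it (the "write ahead" part).
        self.log.append((name, self.values[name], new_value))
        self.values[name] = new_value

    def rollback(self):
        # Undo the changes in reverse order using the logged old values.
        while self.log:
            name, old_value, _new_value = self.log.pop()
            self.values[name] = old_value

    def commit(self):
        # The changes are already in place; the undo records are no longer needed.
        self.log.clear()

db = LoggedStore({"x": 0, "y": 0})
db.write("x", db.values["x"] + 1)                # x = x + 1
db.write("y", db.values["y"] + 2)                # y = y + 2
db.write("x", db.values["y"] * db.values["y"])   # x = y * y
print(db.log)      # one record per statement, oldest first
db.rollback()
print(db.values)   # back to the original state
```

Compared with the private workspace, commit is cheap here and abort does the work.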

Concurrency Control
• Goal: increase efficiency by allowing several transactions to execute at the same time
• Constraint: the effect should be the same as if the transactions were executed in some serial order
[Figure: general organization of managers for handling transactions]

Concurrency Control
[Figure: general organization of managers for handling distributed transactions]

Serializability
• Consider a collection E of transactions T1, … Tn
• The goal is to conduct a serializable execution of E:
  • Transactions in E are possibly concurrently executed according to some schedule S
  • Schedule S is equivalent to some totally ordered execution of T1, … Tn
• Two operations Op(Ti,x) and Op(Tj,x) on the same data item x, taken from the logs of two transactions, may conflict at a data manager:
  • read-write conflict (rw): one is a read operation while the other is a write operation on x
  • write-write conflict (ww): both are write operations on x
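
One common way to test a schedule against these definitions is a precedence graph: draw an edge Ti -> Tj whenever an operation of Ti precedes a conflicting operation of Tj, and accept the schedule if the graph has no cycle. This checks conflict serializability, which is a sufficient condition for equivalence to a serial order. The tuple encoding (transaction, "r" or "w", item) and the function names are assumptions for this sketch.

```python
from itertools import combinations

def conflicts(op_a, op_b):
    """rw or ww conflict: different transactions, same item, at least one write."""
    (txn_a, kind_a, item_a), (txn_b, kind_b, item_b) = op_a, op_b
    return txn_a != txn_b and item_a == item_b and "w" in (kind_a, kind_b)

def is_conflict_serializable(schedule):
    # Build the precedence graph. combinations() keeps schedule order,
    # so in every pair (a, b) operation a is executed before b.
    edges = {txn: set() for txn, _, _ in schedule}
    for a, b in combinations(schedule, 2):
        if conflicts(a, b):
            edges[a[0]].add(b[0])

    # Depth-first search for a cycle.
    state = {}

    def has_cycle(node):
        state[node] = "visiting"
        for nxt in edges[node]:
            if state.get(nxt) == "visiting":
                return True
            if nxt not in state and has_cycle(nxt):
                return True
        state[node] = "done"
        return False

    return not any(txn not in state and has_cycle(txn) for txn in edges)

serial = [("T1", "r", "x"), ("T1", "w", "x"), ("T2", "r", "x"), ("T2", "w", "x")]
interleaved = [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")]
print(is_conflict_serializable(serial))
print(is_conflict_serializable(interleaved))
```

In the interleaved schedule the rw conflict orders T1 before T2 while the ww conflict orders T2 before T1, so no serial order is consistent with both.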

Basic Scheduling Theorem
• Concurrency control: process conflicting reads and writes in certain relative orders
• Read-write and write-write conflicts can be synchronized independently, as long as we stick to a total ordering of transactions that is consistent with both types of conflicts

Synchronization Techniques
• Two-phase locking: Before reading or writing a data item, a lock must be obtained. After a lock is released, the transaction is not allowed to acquire any more locks
• Timestamp ordering: Operations in a transaction are timestamped, and data managers are forced to handle operations in timestamp order
• Optimistic control: Don’t prevent things from going wrong, but correct the situation if conflicts actually did happen
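
Timestamp ordering is the least obvious of the three, so here is a rough sketch of the basic rule for a single data item: an operation is rejected (and its transaction aborted) if a transaction with a later timestamp has already performed a conflicting operation. The class names and the choice to signal an abort with an exception are assumptions for illustration.

```python
class Abort(Exception):
    """Raised when an operation arrives too late in timestamp order."""

class TimestampedItem:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0    # largest timestamp that has read the item
        self.write_ts = 0   # largest timestamp that has written the item

    def read(self, ts):
        # A read conflicts with writes: reject it if a later write already happened.
        if ts < self.write_ts:
            raise Abort(f"read at {ts} is older than write at {self.write_ts}")
        self.read_ts = max(self.read_ts, ts)
        return self.value

    def write(self, ts, value):
        # A write conflicts with reads and writes: reject it if either is later.
        if ts < self.read_ts or ts < self.write_ts:
            raise Abort(f"write at {ts} arrives after a later operation")
        self.write_ts = ts
        self.value = value

item = TimestampedItem("initial")
item.write(2, "written by T2")
print(item.read(3))            # allowed: timestamp 3 is later than the write
try:
    item.write(2, "late")      # rejected: timestamp 3 has already read the item
except Abort as reason:
    print("abort:", reason)
```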

Two-phase Locking
• There are only READ and WRITE operations within transactions
• Locks are granted and released only by the scheduler
• The locking policy is to avoid conflicts between operations

Two-phase Locking
• Rule 1: When a client submits Op(Ti,x), the scheduler tests whether it conflicts with an operation Op(Tj,x) from some other client. If there is no conflict, grant Op(Ti,x); otherwise, delay execution of Op(Ti,x)
  • Conflicting operations are executed in the same order in which the locks are granted
• Rule 2: If Op(Ti,x) has been granted, do not release the lock until Op(Ti,x) has been executed by the data manager
  • Guarantees LOCK => Op => RELEASE order
• Rule 3: If RELEASE(Ti,x) has taken place, no more locks for Ti may be granted
  • Combined with rule 1, guarantees that all pairs of conflicting operations of two transactions are done in the same order
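
The rules above can be sketched as a small lock table. This is a simplification under stated assumptions: the class and method names are hypothetical, a delayed operation is reported by returning False instead of being queued, and rule 2 (execute before release) is left to the caller.

```python
class TwoPhaseLockScheduler:
    def __init__(self):
        self.locks = {}          # item -> {transaction: "r" or "w"}
        self.shrinking = set()   # transactions that have released a lock

    def acquire(self, txn, item, mode):
        """Return True if the lock is granted, False if the operation must wait."""
        if txn in self.shrinking:
            # Rule 3: after the first release, no more locks may be granted.
            raise RuntimeError(f"{txn} has already released a lock")
        holders = self.locks.setdefault(item, {})
        for other, other_mode in holders.items():
            # Rule 1: rw and ww conflicts with another transaction cause a delay.
            if other != txn and "w" in (mode, other_mode):
                return False
        if holders.get(txn) != "w":   # keep a write lock that is already held
            holders[txn] = mode
        return True

    def release(self, txn, item):
        self.locks.get(item, {}).pop(txn, None)
        self.shrinking.add(txn)       # txn enters its shrinking phase

sched = TwoPhaseLockScheduler()
print(sched.acquire("T1", "x", "r"))   # granted
print(sched.acquire("T2", "x", "r"))   # granted: two reads do not conflict
print(sched.acquire("T2", "x", "w"))   # delayed: T1 still holds a read lock
sched.release("T1", "x")
print(sched.acquire("T2", "x", "w"))   # granted now
```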

Two-Phase Locking
• Centralized 2PL: A single site handles all locks
• Primary 2PL: Each data item is assigned a primary site to handle its locks. Data is not necessarily replicated
• Distributed 2PL: Assumes data can be replicated. Each primary is responsible for handling locks for its data, which may reside at remote data managers

Two-phase Locking: Problems
• Problem 1: The system can enter a deadlock. How?
  • Practical solution: put a timeout on locks and abort the transaction on expiration.
• Problem 2: When should the scheduler actually release a lock:
  • (1) when the operation has been executed
  • (2) when it knows that no more locks will be requested
  • There is no good way of testing condition (2) unless the transaction has been committed or aborted
• Moreover: Assume the following execution sequence takes place: RELEASE(Ti,x) => LOCK(Tj,x) => ABORT(Ti).
  • Consequence: the scheduler will have to abort Tj as well (cascaded aborts)
  • Solution: Release all locks only at commit/abort time (strict two-phase locking)
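
As one possible answer to the "How?" under Problem 1: T1 locks x and then asks for y while T2 locks y and then asks for x, so each waits for the other. The sketch below only makes that cycle visible in a wait-for map; the function name and the map encoding are assumptions, and the practical remedy on the slide remains the lock timeout.

```python
def find_deadlock(waits_for):
    """waits_for maps a blocked transaction to the transaction it is waiting on.

    Returns the list of transactions forming a cycle, or None if there is none.
    """
    for start in waits_for:
        seen = []
        node = start
        # Follow the chain of waits until it ends or revisits a transaction.
        while node in waits_for and node not in seen:
            seen.append(node)
            node = waits_for[node]
        if node in seen:
            return seen[seen.index(node):]
    return None

# T1 holds x and waits for y (held by T2); T2 holds y and waits for x (held by T1).
print(find_deadlock({"T1": "T2", "T2": "T1"}))
# T1 merely waits for T2, which is not blocked: no deadlock.
print(find_deadlock({"T1": "T2"}))
```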

Strict Two-Phase Locking

Two-Phase Commit – Achieving Atomicity in Distributed Transactions
• Model: The client who initiated the computation acts as the coordinator; the processes required to commit are the participants
• Phase 1a: The coordinator sends VOTE_REQUEST to the participants (also called a pre-write)
• Phase 1b: When a participant receives VOTE_REQUEST, it returns either YES or NO to the coordinator. If it sends NO, it aborts its local computation
• Phase 2a: The coordinator collects all votes; if all are YES, it sends COMMIT to all participants, otherwise it sends ABORT
• Phase 2b: Each participant waits for COMMIT or ABORT and handles it accordingly
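
The four steps can be traced in a failure-free, single-process sketch. Message passing is replaced by direct method calls, and the state names INIT, READY, COMMIT, and ABORT as well as the class and function names are assumptions for illustration; timeouts and crashes are deliberately left out.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # hypothetical flag: is the local work OK?
        self.state = "INIT"

    def on_vote_request(self):
        # Phase 1b: vote YES and wait, or vote NO and abort locally.
        if self.can_commit:
            self.state = "READY"
            return "YES"
        self.state = "ABORT"
        return "NO"

    def on_decision(self, decision):
        # Phase 2b: follow the coordinator's decision.
        self.state = decision

def two_phase_commit(participants):
    # Phase 1a: the coordinator asks every participant for its vote.
    votes = [p.on_vote_request() for p in participants]
    # Phase 2a: COMMIT only if every vote is YES, otherwise ABORT.
    decision = "COMMIT" if all(vote == "YES" for vote in votes) else "ABORT"
    for p in participants:
        p.on_decision(decision)
    return decision

print(two_phase_commit([Participant("A"), Participant("B")]))
print(two_phase_commit([Participant("A"), Participant("B", can_commit=False)]))
```

A single NO vote is enough to abort everywhere, which is what makes the outcome atomic across participants.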

Two-Phase Commit
[Figure: (a) the finite state machine for the coordinator in 2PC; (b) the finite state machine for a participant]

2PC – Failing Participant
Consider a participant that crashes in one of its states and subsequently recovers to that state:
• Initial state: No problem, as the participant was unaware of the protocol
• Ready state: The participant is waiting to either commit or abort. After recovery, the participant needs to know which state transition it should make => log the coordinator’s decision
• Abort state: Need to make entry into the abort state idempotent
• Commit state: Also make entry into the commit state idempotent
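
A rough sketch of the recovery cases, assuming a Python list stands in for the log on stable storage and that a crash is simulated by constructing a new object from the surviving log. The class, its method names, and the times_applied counter are hypothetical.

```python
class RecoverableParticipant:
    def __init__(self, log):
        self.log = log                             # stand-in for stable storage
        self.state = log[-1] if log else "INIT"    # recover to the last logged state
        self.times_applied = 0                     # how often a decision took effect

    def vote_yes(self):
        self.log.append("READY")
        self.state = "READY"

    def needs_decision(self):
        # Only a participant in READY depends on the coordinator's decision.
        return self.state == "READY"

    def on_decision(self, decision):
        # Idempotent entry: repeating a decision already in force changes nothing.
        if self.state == decision:
            return
        self.log.append(decision)
        self.state = decision
        self.times_applied += 1

log = []
p = RecoverableParticipant(log)
p.vote_yes()
p = RecoverableParticipant(log)   # simulated crash and recovery in READY
print(p.needs_decision())         # the decision must still be obtained
p.on_decision("COMMIT")
p.on_decision("COMMIT")           # a duplicate delivery is harmless
print(log, p.times_applied)
```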

2PC – Failing Coordinator
• If the coordinator fails, the final decision is not available until the coordinator recovers
• Alternative: Let a participant P in the ready state time out when it hasn’t received the coordinator’s decision
  • P tries to find out what other participants know
• Question: Can P not succeed in getting the required information?

2PC – Failing Coordinator
• Question: Can P not succeed in getting the required information?
• Observation: The essence of the problem is that a recovering participant cannot make a local decision: it is dependent on other (possibly failed) processes
  • There might exist one participant that has received a COMMIT decision from the coordinator and subsequently failed (more or less concurrently with the coordinator)
  • The remaining participants cannot unilaterally decide to abort the transaction
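
What P can conclude from the participants it manages to reach might be sketched as below. The rule that a peer still in INIT allows an abort is an assumption taken from the usual description of cooperative termination (such a peer has not voted, so the coordinator cannot have decided to commit); it is not stated on the slides. The function name and state strings are hypothetical.

```python
def decide_from_peers(peer_states):
    """Decide for a participant stuck in READY, given the states of reachable peers.

    Returns "COMMIT", "ABORT", or None when P has to keep waiting.
    """
    if "COMMIT" in peer_states:
        return "COMMIT"   # some peer already saw the coordinator's COMMIT
    if "ABORT" in peer_states or "INIT" in peer_states:
        return "ABORT"    # a COMMIT decision cannot have been taken
    return None           # every reachable peer is READY as well: P is blocked

print(decide_from_peers(["READY", "COMMIT"]))
print(decide_from_peers(["READY", "INIT"]))
print(decide_from_peers(["READY", "READY"]))   # the blocking case from the slide
```

The last call is the situation described above: a crashed participant may have seen COMMIT, so the survivors can neither commit nor abort on their own.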
