Introduction to
Download
1 / 124

Introduction to Database Management Systems (DBMS) - PowerPoint PPT Presentation


  • 159 Views
  • Uploaded on

Introduction to Database Management Systems (DBMS). Database Management System (DBMS). Definitions: Data: Known facts that can be recorded and that have implicit meaning Database: Collection of related data Ex. the names, telephone numbers and addresses of all the people you know

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Introduction to Database Management Systems (DBMS)' - halima


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript


Database management system dbms
Database Management System (DBMS)

Definitions:

  • Data: Known facts that can be recorded and that have implicit meaning

  • Database: Collection of related data

    • Ex. the names, telephone numbers and addresses of all the people you know

  • Database Management System: A computerized record-keeping system


Dbms contd
DBMS (Contd.)

  • Goals of a Database Management System:

    • To provide an efficient as well as convenient environment for accessing data in a database

    • Enforce information security: database security, concurrence control, crash recovery

  • It is a general purpose facility for:

    • Defining database

    • Constructing database

    • Manipulating database


Benefits of database approach
Benefits of database approach

  • Redundancy can be reduced

  • Inconsistency can be avoided

  • Data can be shared

  • Standards can be enforced

  • Security restrictions can be applied

  • Integrity can be maintained

  • Data independence can be provided


Dbms functions
DBMS Functions

  • Data Definition

  • Data Manipulation

  • Data Security and Integrity

  • Data Recovery and Concurrency

  • Data Dictionary

  • Performance


Database system

Users

DATABASE

SYSTEM

Application Programs/Queries

DBMS

Software

Software to process queries/programs

Software to access stored data

Stored Data Defn.

Stored Database

Database System

(META-DATA).


Database system1

user query Q1

Database scheme

Application program query Q2

Query processor

DDL compiler

Compiled query Q2

Database description

Database manager

File manager

Physical database

Database System


Data model
Data Model

  • A set of concepts used to describe the structure of a database

  • By structure, we mean the data types, relationships, and constraints that should holds for the data

Categories of Data Models

Conceptual

Physical

Representational


Database architecture

External level(individual user

views)

Conceptual level(community user view)

Internal level(storage view)

Database

Database Architecture


An example of the three levels
An example of the three levels

SNo

FName

LName

Age

Salary

BranchNo

Conceptual View

Internal View

struct STAFF {

int staffNo;

int branchNo;

char fName[15];

char lName[15];

struct date dateOfBirth;

float salary;

struct STAFF *next;

/* pointer to next Staff record */

};

index staffNo; index branchNo;

/* define indexes for staff */

FName

Salary

SNo

LName

Age

External View1

LName

SNo

BranchNo

External View2


Schema
Schema

  • Schema: Description of data in terms of a data model

  • Three-level DB Architecture defines following schemas:

    • External Schema (or sub-schema)

      • Written using external DDL

    • Conceptual Schema (or schema)

      • Written using conceptual DDL

    • Internal Schema

      • Written using internal DDL or storage structure definition


Data independence
Data Independence

  • Change the schema at one level of a database system without a need to change the schema at the next higher level

    • Logical data independence: Refers to the immunity of the external schemas to changes in the conceptual schema e.g., add new record or field

    • Physical data independence: Refers to the immunity of the conceptual schema to changes in the internal schema e.g., adding new index should not void existing ones


Types of database models

COLUMN

TABLE

ROW

VALUE

TYPES OF DATABASE MODELS

HIERARCHICAL

NETWORK

RELATIONAL


Database design phases
DATABASE DESIGN PHASES

DATA ANALYSIS

Entities - Attributes - Relationships - Integrity Rules

LOGICAL DESIGN

Tables - Columns - Primary Keys - Foreign Keys

PHYSICAL DESIGN

DDL for Tablespaces, Tables, Indexes


Introduction to relational databases rdbms

Introduction to Relational Databases:RDBMS


Definition rdbms
Definition : RDBMS

  • It is a system in which, at a minimum :

    • The data is perceived by the user as tables ( and nothing but tables ); and

    • The operators at the user’s disposal - e.g., for data retrieval - are operators that generate new tables from old, and those include at least SELECT, PROJECT, and JOIN.


Features of an rdbms
Features of an RDBMS

  • The ability to create multiple relations (tables) and enter data into them

  • An interactive query language

  • Retrieval of information stored in more than one table

  • Provides a Catalog or Dictionary, which itself consists of tables ( called system tables )


Some important terms
Some Important Terms

  • Relation : a table

  • Tuple : a row in a table

  • Attribute : a Column in a table

  • Degree : number of attributes

  • Cardinality : number of tuples

  • Primary Key : a unique identifier for the table

  • Domain : a pool of values from which specific attributes of specific relations draw their values


Properties of relations tables
Properties of Relations (Tables)

  • There are no duplicate rows (tuples)

  • Tuples are unordered, top to bottom

  • Attributes are unordered, left to right

  • All attribute values are atomic ( or scalar )

  • Relational databases do not allow repeating groups


Keys

  • Key

  • Super Key

  • Candidate Keys

    • Primary Key

    • Alternate Key

  • Secondary Keys


Keys and referential integrity

sid

cid

grade

sid

name

login

age

gpa

53666

carnatic101

C

53666

Jones

[email protected]

18

3.4

53688

reggae203

B

53688

Smith

[email protected]

18

3.2

53650

topology112

A

53650

Smith

[email protected]

19

3.8

53666

history105

B

Keys and Referential Integrity

Enrolled

Student

Primary key

Foreign key referring to

sid of STUDENT relation



Relational query languages
Relational Query Languages

  • Query languages: Allow manipulation and retrieval of data from a database.

  • Relational model supports simple, powerful QLs:

    • Strong formal foundation based on logic.

    • Allows for much optimization.

  • Query Languages != programming languages!


Example instances

sid

bid

day

22

101

10/10/99

58

103

11/12/99

sid

sname

rating

age

22

Deepa

7

45.0

31

Laxmi

8

55.5

58

Roopa

10

35.0

Example Instances

R1

S1

sid

sname

rating

age

28

Yamuna

9

35.0

S2

31

Laxmi

8

55.5

44

Geeta

5

35.0

58

Roopa

10

35.0


Relational algebra1
Relational Algebra

  • Basic operations:

    • Selection ( )

    • Projection ()

    • Cross- product ( )

    • Set- difference ( –)

    • Union ( )


Projection

sname

rating

Yamuna

9

Laxmi

8

Geeta

5

Roopa

10

Projection

sname, rating(S2)

age

35.0

age(S2)

55.5


Selection
Selection

sid

sname

rating

age

rating > 8(S2)

28

Yamuna

9

35.0

58

Roopa

10

35.0

sname

rating

Yamuna

9

sname, rating(S2) (rating > 8(S2))

Roopa

10


Union intersection set difference

sid

sname

rating

age

22

Deepa

7

45.0

sid

sname

rating

age

31

Laxmi

8

55.5

58

Roopa

10

35.0

Union, Intersection, Set Difference

sid

sname

rating

age

S1  S2

22

Deepa

7

45.0

31

Laxmi

8

55.5

58

Roopa

10

35.0

44

Geeta

5

35.0

28

Yamuna

9

35.0

S1  S2

S1  S2


Cross product
Cross- Product

(sid)

sname

rating

age

(sid)

bid

day

22

Deepa

7

45.0

22

101

10/10/99

22

Deepa

7

45.0

58

103

11/12/99

31

Laxmi

8

55.5

22

101

10/10/99

31

Laxmi

8

55.5

58

103

11/12/99

22

101

10/10/99

58

Roopa

10

35.0

58

Roopa

10

35.0

58

103

11/12/99


Joins

(sid)

sname

rating

age

(sid)

bid

day

22

Deepa

7

45.0

22

101

10/10/99

31

Laxmi

8

55.5

58

103

11/12/99

Joins

Condition Join :


Equi join
Equi-Join

(sid)

sname

rating

age

bid

day

22

Deepa

7

45.0

101

10/10/99

58

Roopa

10

35.0

103

11/12/99


Division

sno

pno

pno

pno

pno

s1

p1

p2

p2

p1

s1

p2

p4

p2

s1

p3

p4

sno

s1

p4

s1

s2

p1

sno

s2

s2

p2

sno

s1

s3

s3

p2

s1

s4

s4

s4

p2

s4

p4

Division

  • Not supported as a primitive operator, but useful for expressing queries like:

    • Find sailors who have reserved all boats .

A

B1

B2

B3

A/B1

A/B2

A/B3



Processing a high level query

Query in a high level language

SCANING, PARSING AND VALIDATING

Intermediate form of query

QUERY OPTIMIZER

Execution plan

QUERY CODE GENERATOR

Code to execute the query

RUNTIME DATABASE PROCESSOR

Result of query

Processing A High-level Query

Typical steps when processing a high level query.


Two main techniques for query optimization
Two Main Techniques for QueryOptimization

  • Heuristic Rules: A heuristic is a rule that works well in most of cases, but not always. General Idea:

    • Many different relational algebra expressions (and thus query trees) are equivalent.

    • Transform the initial query tree of a query into an equivalent final query tree that is efficient to execute.

  • Cost based query optimization

    • Estimate the cost for each execution plan, and choose the one with the lowest cost.

  • Can we get the best execution plan?


Motivating example
Motivating Example

select *

from R1, R2, R3

where R1.r2no=R2.r2no

and R2.r3no=R3.r3no

and R1.a=5000

NLJ

NLJ

SS(R1, “a=5000”)

SS(R2)

SS(R3)


Alternative plans 1 no indexes
Alternative Plans 1(No Indexes)

select *

from R1, R2, R3

where R1.r2no=R2.r2no

and R2.r3no=R3.r3no

and R1.a=5000

NLJ

NLJ

SS(R3)

SS(R1, “a=5000”)

SS(R2)


Alternative plans 2 with indexes
Alternative Plans 2 (With Indexes)

select *

from R1, R2, R3

where R1.r2no=R2.r2no

and R2.r3no=R3.r3no

and R1.a=5000

NLJ

NLJ

SS(R3)

IS(R1, “a=5000”)

SS(R2)


Conceptual design using the entity relationship model

Conceptual Design Using theEntity- Relationship Model


Overview of database design
Overview of Database Design

  • Conceptual design : (ER Model is used at this stage.)

  • Schema Refinement : (Normalization)

  • Physical Database Design and Tuning


E r modeling

Predicate Calculus

Domain Calculus

SQL / Tuple Based

Query By Examples

E R Modeling

  • Conceptual Schema Design

  • Relational Calculus - Formal Language for Relational D/B.

Relational Calculus


Design phases
Design Phases…

Requirements Collection& Analysis

Data Requirements

Functional Requirements

Conceptual Design

User Defined Operations

Data Flow DiagramsSequence Diagrams, Scenarios

Entity Types, Constraints , RelationshipsNo Implementation Details.

Ensures Requirements Meets the Design

Logical Design

Data Model Mapping – Type of Database is identified

Physical Design

Internal Storage Structures / Access Path / File Organizations


E r modeling1
E-R Modeling

  • Entity

    • is anything that exists and is distinguishable

  • Entity Set

    • a group of similar entities

  • Attribute

    • properties that describe an entity

  • Relationship

    • an association between entities


Notations
Notations

ENTITY TYPE ( REGULAR )

WEAK ENTITY TYPE

RELATIONSHIP TYPE

WEAK RELATIONSHIP TYPE


SSN

NAME

LOT

123- 22- 3666

Attishoo

48

231- 31- 5368

Smiley

22

131- 24- 3650

Smethurst

35

Entity

Attributes

ssn

name

lot

Employee

Entity Set

CREATE TABLE Employees

(ssn CHAR (11),

name CHAR (20),

lot INTEGER,

PRIMARY KEY (ssn))


ER Model

since

ssn

name

lot

budget

did

dname

Works_in

Employee

Department

supervisor

Sub-

ordinate

Reports_To


SSN

DID

SINCE

123-22-3666

51

1/1/91

123-22-3666

56

3/3/93

231-31-5368

51

2/2/92

ER Model (Contd.)

Works_ In

CREATE TABLE Works_ In(

ssn CHAR (11),

did INTEGER,

since DATE,

PRIMARY KEY (ssn, did),

FOREIGN KEY (ssn)

REFERENCES Employees,

FOREIGN KEY (did)

REFERENCES Departments)


since

ssn

name

lot

budget

did

dname

Employee

Department

Manages

Key Constraints


Key Constraints for Ternary Relationships

since

ssn

name

lot

did

dname

budget

Works_in

Employee

Department

Location

capacity

address


Participation Constraints

since

ssn

name

lot

budget

did

dname

Manages

Employee

Department

Works_in

since


Weak Entities

pname

cost

ssn

name

lot

age

Dependent

Employee

policy


ISA (‘is a’) Hierarchies

ssn

name

lot

Employee

Hrly_wages

IsA

Hrs_worked

contractid

Hourly_Emp

Contract_Emp


name

ssn

lot

Employee

monitors

until

pid

pbudget

Started on

did

dname

budget

sponsors

project

department

Aggregation


from

to

ssn

lot

name

budget

did

dname

Employee

Department

Works_in

Entity vs. Attribute

Works_ In does not allow an employee to work in a department for two or more periods (why?)


ssn

lot

name

budget

did

dname

Employee

Department

Works_in

from

Duration

to

Entity vs. Attribute (Contd.)


since

DB

ssn

lot

name

budget

did

dname

Employee

Department

manages

Entity vs. Relationship

DB - Dbudget


Entity vs. Relationship

ssn

lot

name

did

dname

budget

Employee

Department

manages

since

Appt num

Mgr_appt

DBudget


Binary vs. Ternary Relationships

age

ssn

name

lot

pname

Employee

Dependent

covers

Policy

policyid

cost


Binary vs. Ternary Relationships

Better Design

pname

ssn

name

lot

age

Dependent

Employee

Beneficiary

purchaser

Policy

policyid

cost


Constraints Beyond the ER Model

  • Some constraints cannot be captured in ER diagrams:

    • Functional dependencies

    • Inclusion dependencies

    • General constraints


E r diagram

DEPARTMENT

1

SUPPLIER

DEPT_EMP

M

PROJ_WORK

SUPP_PART_PROJ

M

M

M

M

M

PROJECT

EMPLOYEE

SUPP_PART

PROJ_MGR

M

1

1

M

EMP_DEP

PART

M

M

M

M

PART_STRUCTURE

DEPENDENT

E-R Diagram


Example to start with
Example to Start with ….

  • An Example Database Application called COMPANY which serves to illustrate the ER Model concepts and their schema design.The following are collection from the Client.


Analysis
Analysis…

  • Company :Organized into Departments, Each Department has a name, no and manager who manages the department. The Company keeps track of the date that employee managing the department. A Department may have a Several locations.


Analysis1
Analysis…

  • Department :A Department controls a number of Projects each of which has a unique name , no and a single Location.

  • Employee :Name, Age, Gender, BirthDate, SSN, Address, Salary. An Employee is assigned to one department, may work on several projects which are not controlled by the department. Track of the number of hours per week is also controlled.


Analysis2
Analysis….

  • Keep track of the dependents of each employee for insurance policies : We keep each dependant first name, gender, Date of birth and relationship to the employee.


Now to our company
Now to our Company…

DEPARTMENT ( Name , Number , { Locations } , Manager, Start Date )PROJECT

( Name, Number, Location , Controlling Department )

EMPLOYEE

(Name (Fname, Lname) , SSN , Gender, Address, Salary Birthdate, Department , Supervisor , (Workson ( Project , Hrs))

DEPENDENT

( Employee, Name, Gender, Birthdate , Relationship )


Example
Example …

  • Manage:

    • Department and Employee

    • Partial Participation

    • Relation Attribute : StartDate.

  • Works For:

    • Department and Employee

    • Total Participation


Example1
Example…

  • Control :

    • Department , Project

    • Partial Participation from Department

    • Total Participation from Project

    • Control Department is a RKA.

  • Supervisor :

    • Employee, Employee

    • Partial and Recursive


Example2
Example …

  • Works – On :

    • Project , Employee

    • Total Participation

    • Hours Worked is a RKA.

  • Dependants of:

    • Employee , Dependant

    • Dependant is a Weaker

    • Dependant is Total , Employee is Partial.


One possible mapping of the problem statement

Lname

Fname

Hours

Bdate

Bdate

Name

Name

Name

Sdate

Name

SSN

Sex

Loc

Sex

Loc

Sal

No

No

Address

Dependent

Relationship

One Possible mapping of the Problem Statement

Works For

Department

Controls

Employee

manages

WorksOn

Project

Supervises

Depend On


Schema refinement and normalization

Schema Refinement andNormalization


Normalization and normal forms
Normalization and Normal Forms

  • Normalization:

    • Decomposing a larger, complex table into several smaller, simpler ones.

    • Move from a lower normal form to a higher Normal form.

  • Normal Forms:

    • First Normal Form (1NF)

    • Second Normal Form (2NF)

    • Third Normal Form (3NF)

    • *Higher Normal Forms (BCNF, 4NF, 5NF ....)

  • In practice, 3NF is often good enough.


Why normal forms
Why Normal Forms

  • The first question to ask is whether any refinement is needed!

  • If a relation is in a certain normal form (BCNF, 3NF etc.), it is known that certain kinds of problems are avoided/ minimized. This can be used to help us decide whether decomposing the relation will help.


The evils of redundancy
The Evils of Redundancy

  • Redundancy is at the root of several problems associated with relational schemas

  • More seriously, data redundancy causes several anomalies: insert, update, delete

  • Wastage of storage.

  • Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).


Refining an er diagram before
Refining an ER Diagram - Before

since

budget

ssn

name

lot

did

dname

Works_in

Employee

Department


Refining an er diagram after

since

budget

ssn

name

did

dname

lot

Employee

Department

Works_in

Refining an ER Diagram - After


First normal form
First Normal Form

  • A table is in 1NF, if every row contains exactly one value for each attribute.

  • Disallow multivalued attributes, composite attributes and their combinations.

  • 1NF states that :

    • domains of attributes must include only atomic (simple, indivisible) values and that value of any attribute in a tuple must be a single value from the domain of that attribute.

  • By definition, any relational table must be in 1NF.


Functional dependencies fds
Functional Dependencies (FDs)

  • Provide a formal mechanism to express constraints between attributes

  • Given a relation R, attribute Y of R is functionally dependent on the attribute X of R if & only if each X-value in R has associated with it precisely one Y-value in R.


Full dependency
Full Dependency

  • Concept of full functional dependency

    • A FD x  y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.


Partial dependency
Partial Dependency

  • An F.D. x  y is a partial dependency if there is some attribute A  X that can be removed from X and the dependency will still hold.


Example constraints on entity set

S

N

L

R

W

H

123- 22- 3666

Attishoo

48

8

10

40

231- 31- 5368

Smiley

22

8

10

30

131- 24- 3650

Smethurst

35

5

7

30

434- 26- 3751

Guldu

35

5

7

32

612- 67- 4134

Madayan

35

8

10

40

S

N

L

R

W

H

123- 22- 3666

Attishoo

48

5

7

40

231- 31- 5368

Smiley

22

8

10

30

131- 24- 3650

Smethurst

35

30

434- 26- 3751

Guldu

35

32

612- 67- 4134

Madayan

35

40

Example: Constraints on Entity Set

R

8

8

5

5

8


Second normal form 2nf
Second Normal Form (2NF)

  • A relation schema R is in 2NF if:

    • it is in 1NF and

    • every non-prime attribute A in R is fully functionally dependent on the primary key of R.

  • 2NF prohibits partial dependencies.


2nf an example
2NF: An Example

  • Emp{Eno, Dept, ProjCode, Hours}

    • Primary key: {Eno, ProjCode}

    • {Eno} -> {Dept}, {Eno, ProjCode} -> {Hours}

  • Test of 2NF

    • {Eno} -> {Dept}: partial dependency.

    • Emp is in 1NF, but not in 2NF.

  • Decomposition:

    • Emp {Eno, Dept}

    • Proj {Eno, ProjCode, Hours}


Transitive dependency
Transitive Dependency

  • An FD X  Y in a relation schema R is a transitive dependency if

    • there is a set of attributes Z that is not a subset of any key of R, and

    • both X  Z and Z  Y hold.


Third normal form
Third Normal Form

  • A relation schema R is in 3NF if

    • It is in 2NF and

    • No nonprime attribute of R is transitively dependent on the primary key.

  • 3NF means that each non-key attribute value in any tuple is truly dependent on the Primary Key and not even partially on other attributes.

  • 3NF prohibits transitive dependencies.


3nf an example
3NF: An Example

  • Emp{Eno, Dept, Dept_Head}

    • Primary key: {Eno}

    • {Eno} -> {Dept}, {Dept} -> {Dept_Head}

  • Test of 3NF

    • {Eno} -> {Dept} -> {Dept_Head}: Transitive dependency.

    • Emp is in 2NF, but not in 3NF.

  • Decomposition:

    • Emp {Eno, Dept}

    • Dept {Dept, Dept_Head}


Boyce codd normal form
Boyce –Codd Normal Form

  • The intention of BCNF is that- 3NF does not satisfactorily handle the case of a relation processing two or more composite or overlapping candidate keys


Bcnf boyce codd normal form
BCNF ( Boyce Codd Normal Form)

  • A Relation is said to be in Boyce Codd Normal Form (BCNF) if and only if every determinant is a candidate key.


Decomposition of a relation scheme
Decomposition of a Relation Scheme

  • Suppose that relation R contains attributes A1 ... An. A decomposition of R consists of replacing R by two or more relations such that:

    • Each new relation scheme contains a subset of the attributes of R (and no attributes that do not appear in R), and

    • Every attribute of R appears as an attribute of one of the new relations.



Transaction
Transaction

  • A sequence of many actions which are considered to be one atomic unit of work.

    • Read, write, commit, abort

  • Governed by four ACID properties:

    • Atomicity, Consistency, Isolation, Durability

  • Has a unique starting point, some actions and one end point


The acid properties
The ACID Properties

  • A tomicity: All actions in the transaction happen, or none happen.

  • C onsistency: If each transaction is consistent, and the DB starts consistent, it ends up consistent.

  • I solation: Execution of one transaction is isolated from that of other transactions.

  • D urability: If a transaction commits, its effects persist.


Automicity
Automicity

  • All-or-nothing, no partial results. An event either happens and is committed or fails and is rolled back.

    • e.g. in a money transfer, debit one account, credit the other. Either both debiting and crediting operations succeed, or neither of them do.

    • Transaction failure is called Abort

  • Commit and abort are irrevocable actions. There is no undo for these actions.

  • An Abort undoes operations that have already been executed

    • For database operations, restore the data’s previous value from before the transaction (Rollback-it); a Rollback command will undo all actions taken since the last commit for that user.

    • But some real world operations are not undoable.Examples - transfer money, print ticket, fire missile


Consistency
Consistency

  • Every transaction should maintain DB consistency

    • Referential integrity - e.g. each order references an existing customer number and existing part numbers

    • The books balance (debits = credits, assets = liabilities)

  • Consistency preservation is a property of a transaction, not of the database mechanisms for controlling it (unlike the A, I, and D of ACID)

  • If each transaction maintains consistency, then a serial execution of transactions does also


Isolation
Isolation

Intuitively, the effect of a set of transactions should

be the same as if they ran independently.

  • Formally, an interleaved execution of transactions is serializable if its effect is equivalent to a serial one.

  • Implies a user view where the system runs each user’s transaction stand-alone.

  • Of course, transactions in fact run with lots of concurrency, to use device parallelism – this will be covered later.

  • Transactions can use common data (shared data)

  • They can use the same data processing mechanisms

    (time sharing)


Durability
Durability

  • When a transaction commits, its results will survive failures (e.g. of the application, OS, DB system … even of the disk).

  • Makes it possible for a transaction to be a legal contract.

  • Implementation is usually via a log

    • DB system writes all transaction updates to a log file

    • to commit, it adds a record “commit(Ti)” to the log

    • when the commit record is on disk, the transaction is committed.

    • system waits for disk ack before acknowledging to user


Transaction processing
Transaction processing

Can be automatic (controlled by the RDBMS) or programmatic (programmed using SQL or other supported programming languages, like PL/SQL)


Why have concurrent processes
Why Have Concurrent Processes?

  • Better transaction throughput

  • Improved response time

  • Done via better utilization of resources:

    • While one processes is doing a disk read, another can be using the CPU or reading another disk.


Typical situations requiring concurrency control
Typical situations requiring concurrency control

  • Exclusive access to an external device or shared service (e.g., managing printer queues)

  • Coordination of applications which process parallel data (e.g. parallel DB servers)

  • Disabling or enabling execution of the client programs in a specific moment (typically for database administration - e.g. database backups, enforcing resource occupation, etc.)

  • Detection of transaction ends when managing multiple sessions for connection to the database (client/server architectures, Web access)


Problems with concurrency in absence of locking
Problems with Concurrency (in absence of locking)

  • Lost Update problem - losing values due to intervention of write operation from other overlapping transactions

  • Temporary Update problem - discarding previous changes made by overlapping transaction after rollback

  • Incorrect Summary problem - overwriting of certain

    values used for calculation by write operations from other transactions


Lost update problem
Lost Update Problem

Transaction A

Time

Value

Transaction B

T0

Start A

6

  • What should the final Order Value be?

  • Which Update has been lost?

6

Start B

T1

Read Value (6)

T2

Add 2 (6+2=8)

6

Read Value

(6)

T3

Write Value (8)

8

Add 3 (6+3=9)

T4

End A

9

Write Value (9)

T5

9

End B


Temporary update problem
Temporary Update Problem

Time

Transaction A

Value

Transaction B

T0

Start A

6

T1

Read Value (6)

6

  • What should the final Order Value be?

  • Where is the temporary update?

T2

Add 2 (8)

6

T3

Write Value (8)

8

Start B

T4

Failure: Rollback!

8

Read Value (8)

T5

Write Value (6)

6

Add 3 (8+3=11)

End A

T6

11

Write Value (11)

T5

11

End B


Incorrect summary problem
Incorrect Summary Problem

Time

Values

Transaction B

Transaction A

6

3

T0

Read 1stValue (6)

  • What should the total Order Value be?

  • Which order was accumulated before update, and which after?

6

3

T1

Add 2 (6+2=8)

Write 1stValue (8)

T2

8

3

8

3

Read 1stValue (8)

Read 2nd Value (3)

T3

8

3

Read 2nd Value (3)

T4

Add 2 (3+2 = 5)

8

5

Total Sum = 11

T5

Write 2nd Value (5)


3.1 Database State and Changes

D1, D2 - Logically consistent states of the database data

T - Transaction for changing the database

t1, t2 - Absolute time before and after the transaction


3.2 Transaction State and Progress

A transaction reaches its commit point when all operations accessing the database are completed and the result has been recorded in the log. It then writes a [commit, <transaction-id>] and terminates.

BEGIN

END

COMMIT

partially

active

committed

committed

ROLLBACK

,

WRITE

READ

ROLLBACK

terminated

aborted

  • When a system failure occurs, search the log file for entries

    • [start, <transaction-id>]

  • and if there are no logged entries [commit, <transaction-id>]

  • then undo all operations that have logged entries

    • [write, <transaction-id>, X, old_value, new_value]


Schedules

T1

T2

R(A)

W(A)

R(B)

W(B)

R(C)

W(C)

Schedules

  • Schedule: Actions of transactions as seen by the DBMS


Serializable schedule
Serializable Schedule

  • A schedule whose effect on the DB “state” is the same as that of some serial schedule

  • All serial schedules are serializable

    • But the reverse may not be true


Serializability violations

Transfer Rs.10,000 from A to B

Add 6% interest to A & B

T1

T2

R(A)

W(A)

R(A)

W(A)

R(B)

W(B)

Database is inconsistent!

commit

R(B)

W(B)

commit

Serializability Violations


Cascading aborts

T1

T2

R(A)

W(A)

R(A)

W(A)

abort

Cascading Aborts


Recoverable schedules
Recoverable Schedules

Unrecoverable Schedule

Recoverable Schedule

T1

T2

T1

T2

R(A)

R(A)

W(A)

W(A)

R(A)

R(A)

W(A)

W(A)

commit

commit

abort

commit


Locking
Locking

  • The concept of locking data items is one of the main techniques for controlling the concurrent execution of transactions.

  • A lock is a variable associated with a data item in the database.

    • Generally there is a lock for each data item in the database.

  • A lock describes the status of the data item with respect to possible operations that can be applied to that item

    • used for synchronising the access by concurrent transactions to the database items.

  • A transaction locks an object before using it

  • When an object is locked by another transaction, the requesting transaction must wait


Locking granularity
Locking Granularity

  • A database item which can be locked could be

    • a database record

    • a field value of a database record

    • the whole database

  • Trade-offs

    • coarse granularity

      • the larger the data item size, the lower the degree of concurrency

    • fine granularity

      • the smaller the data item size, the more locks to be managed and stored, and the more lock/unlock operations needed.


Locking a technique for concurrency control

--

S

X

--

S

X

Locking: A Technique for Concurrency Control

  • Locks are automatically obtained by DBMS.

  • Guarantees serializability!

Compatibility matrix for lock types X and S

S: Shared lock

X: Exclusive lock

-- No lock


Two phase locking 2pl
Two- Phase Locking (2PL)

  • Strict 2PL:

  • – If T wants to read an object, first obtains an S lock.

  • – If T wants to modify an object, first obtains X lock.

  • – Hold all locks until end of transaction.

  • – Guarantees serializability, and recoverable schedule, too!

    • also avoids WW problems!

  • 2PL:

  • – Slight variant of strict 2PL

  • – transactions can release locks before the end (commit or abort)

    • But after releasing any lock it can acquire no new locks

  • – Guarantees serializability


Handling a lock request
Handling a Lock Request

Lock Request (XID, OID, Mode)

Mode==X

Mode==S

Empty Wait Queue?

Currently Locked?

No

Yes

Yes

Currently X-locked?

Yes

No

Put on Queue

No

Grant Lock


Recovery
Recovery

  • Occurs in case of transaction failures.

  • Database (DB) is restored to the most recent consistent state just before the time of failure.

  • To do this, the DB system needs information about changes applied by various transactions. It is the system log.


Recovery motivation
Recovery: Motivation

crash

T1

T2

T3

T4

T5

  • Atomicity: Undoing actions of transaction that do not commit

  • Durability: Making sure all actions of committed transactions survive system crashes

  • The Recovery Manager guarantees Atomicity & Durability.


Recovery outline
Recovery Outline

  • Restore to most recent “consistent” state just before time of failure

    • Use data in the log file

  • Catastrophic Failure

    • Restore database from backup

    • Replay transactions from log file

  • Database becomes inconsistent (non-catastrophic errors)

    • Undo or Redo last transactions until consistent state is restored


Logging
Logging

  • Record REDO and UNDO information, for every update, in a log.

    – Sequential writes to log (put it on a separate disk).

    – Minimal info (diff) written to log, so multiple updates fit in a single log page.


Handling the buffer pool

Trivial

Desired

Handling the Buffer Pool

  • When is buffer written back to disk?

    • Steal/No-steal

      • Can it be written before commit? (steal)

      • Or does it have to wait till after commit? (no-steal)

    • Force/No-force

      • Is it written “immediately” after commit? (force)

      • Or can it remain in memory? (no-force)

NoSteal

Steal

Force

NoForce


Write ahead logging wal
Write- Ahead Logging (WAL)

  • The Write- Ahead Logging Protocol:

    • Must force the log record for an update before the corresponding data page gets to disk.

    • Must write all log records for a transaction before commit .

  • What goes into log:

    • BFIM needed for UNDO type algorithms

    • AFIM needed for REDO type algorithms


Checkpoints in the system log
Checkpoints in the System Log

  • Checkpoint record written in log when all updated DB buffers written out to disk

  • Any committed transaction occurring before checkpoint in log can be considered permanent (won’t have to be redone after crash)

  • Actions

    • suspend execution of all transactions

    • force-write all modified buffers to disk

    • write checkpoint entry in log and force write log

    • resume transactions

  • Fuzzy checkpointing

    • resume transactions as soon as buffers written


ad