Database system concepts
1 / 327

Database System Concepts - PowerPoint PPT Presentation

  • Uploaded on

Database System Concepts. Instructor: Jin Yuanping [email protected] Professor of College of Software Engineering, Southeast University The reference book: Database System Concepts (4th Edition), A. Silberschatz , et al, McGraw-Hill, 2002. The Content of the Course 1 Introduction

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about ' Database System Concepts' - chandler-sanders

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Database system concepts

Database System Concepts


Jin Yuanping [email protected]

Professor of College of Software Engineering, Southeast University

The reference book:

Database System Concepts (4th Edition),

A. Silberschatz , et al, McGraw-Hill, 2002

The Content of the Course

1 Introduction

2 Entity-Relationship Model

3 Relational Model


5 Integrity Constraints

6 Transactions

7 Concurrency Control

8 Database System Architecture

9 New Applications

10 XML

Topic 1 introduction
Topic 1: Introduction

  • Database System

  • View of Data

  • Data Models

  • Data Definition Language

  • Data Manipulation Language

  • Transaction Management

  • Storage Management

  • Database Administrator

  • Database Users

  • Overall System Structure

A database system
A Database System

  • A collection of interrelated data (DB)

  • A set of programs to access the data (DBMS)

  • A set of application programs

  • A group of people to administrate it (DBA)

  • DB contains information about a particular enterprise

  • DBMS provides an environment that is both convenient and efficient to use.

View of data
View of Data

An architecture for a database system

Levels of abstraction
Levels of Abstraction

  • Physical level describes how a record (e.g., customer) is stored.

  • Logical level: describes data stored in database, and the relationships among the data.

    type customer = recordname : string;street : string;city : integer;end;

  • View level: application programs hide details of data types. Views can also hide information (e.g., salary) for security purposes.

Instances and schemas
Instances and Schemas

  • Similar to types, variables and values in programming languages

  • Schema – the logical structure of the database (e.g., set of customers and accounts and the relationship between them)

  • Instance – the actual content of the database at a particular point in time

Data independence
Data Independence

  • Ability to modify a schema definition in one level without affecting a schema definition in the next higher level.

  • The interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.

  • Two levels of data independence:

    • Physical data independence

    • Logical data independence

Data models
Data Models

  • A collection of concepts and tools for describing

    • data

    • data relationships

    • data semantics

    • data constraints

  • Entity-relationship model

  • Object-oriented model

  • Relational model (Oracle, DB2)

  • Semi-structured model (XML)

Entity relationship model
Entity-Relationship Model

Example of entity-relationship model

Relational model
Relational Model

  • Example of tabular data in the relational model

















Xml data
XML Data



<account-number> A-101 </account-number>

<branch-name> Downtown </branch-name>

<balance> 500 </balance>



<account-number> A-101 </account-number>

<customer-name> Johnson </customer-name>



Data definition language ddl
Data Definition Language (DDL)

  • Specification notation for defining the database schema

  • DDL compiler generates a set of tables

  • Data dictionary contains metadata (i.e., data about data)

  • Data storage and definition language – special type of DDL in which the storage structure and access methods used by the database system are specified

Data manipulation language dml
Data Manipulation Language (DML)

  • Language for accessing and manipulating the data organized by the appropriate data model

  • Two classes of languages

    • Procedural – user specifies what data is required and how to get those data

    • Nonprocedural – user specifies what data is required without specifying how to get those data

Transaction management
Transaction Management

  • A transaction is a collection of operations that performs a single logical function in a database application

  • Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and transaction failures.

  • Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database.

Storage management
Storage Management

  • Storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.

  • The storage manager is responsible for the following tasks:

    • interaction with the file manager

    • efficient storing, retrieving and updating of data

Database administrator
Database Administrator

  • Coordinates all the activities of the database system; the database administrator has a good understanding of the enterprise’s information resources and needs.

  • Database administrator's duties include:

    • Schema definition

    • Storage structure and access method definition

    • Schema and physical organization modification

    • Granting user authority to access the database

    • Specifying integrity constraints

    • Acting as liaison with users

    • Monitoring performance and responding to changes in requirements

Database users
Database Users

  • Users are differentiated by the way they expect to interact with the system

  • Application programmers – interact with system through DML calls

  • Sophisticated users – form requests in a database query language

  • Specialized users – write specialized database applications that do not fit into the traditional data processing framework

  • Naïve users – invoke one of the permanent application programs that have been written previously

Overall system structure cont
Overall System Structure (Cont.)

  • The storage manager components also include:

    Authorization and integrity manager

Topic 2 entity relationship model
Topic 2: Entity-Relationship Model

  • Entity Sets

  • Relationship Sets

  • Mapping Constraints

  • Keys

  • E-R Diagram

  • Design of an E-R Database Schema

  • Reduction of an E-R Schema to Tables

Entity sets
Entity Sets

  • A database can be modeled as:

    • a collection of entities,

    • relationship among entities.

  • An entity is an object that exists and is distinguishable from other objects.

    Example: specific person, company, event, plant

  • An entity set is a set of entities of the same type that share the same properties.

    Example: set of all persons, companies, trees, holidays


  • An entity is represented by a set of attributes, that is descriptive properties possessed by all members of an entity set.


    customer = (customer-name, social-security, customer-street, customer-city) account = (account-number, balance)

  • Domain – the set of permitted values for each attribute

  • Attribute types:

    • Simple and composite attributes.

    • Single-valued and multi-valued attributes.

    • Null attributes.

    • Derived attributes.

Composite attributes customer name and customer address
Composite Attributes customer-name and customer-address

Relationship sets
Relationship Sets

  • A relationship is an association among several entitiesExample:

    Hayes (Hayes, A-102) A-102customerdepositor account

    entity set relationship set entity set

  • A relationship set is a mathematical relation among n  2 entity sets. {(e1, e2, … en) | e1  E1, e2  E2, …, en  En}where (e1, e2, …, en) is a relationship

    • Example:

      (Hayes, A-102)  depositor

Relationship sets cont
Relationship Sets (Cont.)

  • an attribute can also be property of a relationship set. For instance, the depositor relationship set between entity sets customer and account may have the attribute access-date

    E-R Diagram with an Attribute Attached to a Relationship

Degree of a relationship set
Degree of a Relationship Set

  • Refers to number of entity sets that participate in a relationship set.

  • Relationship sets that involve two entity sets are binary (or degree two). Generally, most relationship sets in a database system are binary.

  • Relationship sets may involve more than two entity sets. The entity sets customer, loan, and branch may be linked by the ternary (degree three) relationship set CLB.


Entity sets of a relationship need not be distinct

  • The labels “manager” and “worker” are called roles; they specify how employee entities interact via the works-for relationship set.

  • Roles are indicated in E-R diagrams by labeling the lines that connect diamonds to rectangles.

  • Role labels are optional, and are used to clarify semantics of the relationship

Mapping cardinalities
Mapping Cardinalities

  • Express the number of entities to which another entity can be associated via a relationship set.

  • Most useful in describing binary relationship sets.

  • For a binary relationship set the mapping cardinality must be one of the following types:

    • One to one

    • One to many

    • Many to one

    • Many to many

  • We distinguish among these types by drawing either a directed line (), signifying “one,” or an undirected line (—), signifying “many,” between the relationship set and the entity set.

Mapping cardinalities1
Mapping Cardinalities

One to one

One to many

Mapping cardinalities2
Mapping Cardinalities

Many to one

Many to many

One to one relationship
One-To-One Relationship

  • A customer is associated with at most one loan via the relationship borrower

  • A loan is associated with at most one customer via borrower

One to many and many to one cont
One-To-Many and Many-To-One (Cont.)

  • In the one-to-many relationship (a), a loan is associated with at most one customer via borrower, a customer is associated with several (including 0) loans via borrower

  • In the many-to-one relationship (b), a loan is associated with several (including 0) customers via borrower, a customer is associated with at most one loan via borrower

Many to many relationship
Many-To-Many Relationship

  • A customer is associated with several (possibly 0) loans via borrower

  • A loan is associated with several (possibly 0) customers via borrower

Existence dependencies




Existence Dependencies

  • If the existence of entity x depends on the existence of entity y, then x is said to be existence dependent on y.

    • y is a dominant entity (in example below, loan)

    • x is a subordinate entity (in example below, payment)

If a loan entity is deleted, then all its associated payment entities must be deleted also.


  • A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity.

  • A candidate key of an entity set is a minimal super key

    • social-security is candidate key of customer

    • account-number is candidate key of account

  • Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.

  • The combination of primary keys of the participating entity sets forms a candidate key of a relationship set.

    • must consider the mapping cardinality and the semantics of the relationship set when selecting the primary key.

    • (social-security, account-number) is the primary key of depositor)

E r diagram components
E-R Diagram Components

  • Rectangles represent entity sets.

  • Ellipses represent attributes

  • Diamonds represent relationship sets.

  • Lines link attributes to entity sets and entity sets to relationship sets.

  • Double ellipses represent multivalued attributes.

  • Dashed ellipses denote derived attributes.

  • Primary key attributes are underlined.

Weak entity sets
Weak Entity Sets

  • An entity set that does not have a primary key is referred to as a weak entity set.

  • The existence of a weak entity set depends on the existence of a strong entity set, it must relate to the strong set via a one-to-many relationship set.

  • The discriminator (or partial key) of a weak entity set is the set of attributes that distinguishes among all the entities of a weak entity set that depend on one particular strong entity.

  • The primary key of a weak entity set is formed by the primary key of the strong entity set on which the weak entity set is existence dependent, plus the weak entity set’s discriminator.

Weak entity sets cont
Weak Entity Sets (Cont.)

  • We depict a weak entity set by double rectangles.

  • We underline the discriminator of a weak entity set with a dashed line.

  • payment-number – discriminator of the payment entity set

  • Primary key for payment – (loan-number, payment-number)


  • Top-down design process, we designate subgroupings within an entity set that are distinctive from other entities in the set.

  • These subgroupings become lower-level entity sets that have attributes or participate in relationships that do not apply to the higher-level entity set.

  • Depicted by a triangle component labeled ISA (i.e., saving-account “is an” account).


  • A bottom-up design process – combine a number of entity sets that share the same features into a higher-level entity set.

  • Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in the same way.

  • Attribute inheritance – a lower-level entity set inherits all the attributes and relationship participation of the higher-level entity set to which it is linked.

Design constraints on a generalization
Design Constraints on a Generalization

  • Constraint on which entities can be members of a given lower-level entity set.

    • condition-defined

    • user-defined

  • Constraint on whether or not entities may belong to more than one lower-level entity set within a single generalization.

    • disjoint

    • overlapping

  • Completeness constraint – specifies whether or not an entity in the higher-level entity set must belong to at least one of the lower-level entity sets within a generalization.

    • total

    • partial

Reduction of an e r schema to tables
Reduction of an E-R Schema to Tables

  • Primary keys allow entity sets and relationship sets to be expressed uniformly as tables which represent the contents of the database.

  • A database which conforms to an E-R diagram can be represented by a collection of tables.

  • For each entity set and relationship set there is a unique table which is assigned the name of the corresponding entity set or relationship set.

  • Each table has a number of columns (generally corresponding to attributes), which have unique names.

  • Converting an E-R diagram to a table format is the basis for deriving a relational database design from an E-R diagram.

Representing entity sets as tables
Representing Entity Sets as Tables

  • A strong entity set reduces to a table with the same attributes.

















  • A weak entity set becomes a table that includes a column for the primary key of the identifying strong entity set.











10 May 1996

17 May 1996

12 May 1996




Representing relationship sets as tables
Representing Relationship Sets as Tables

  • A many-to-many relationship set is represented as a table with columns for the primary keys of the two participating entity sets, and any descriptive attributes of the relationship set.







The depositor table

  • The table corresponding to a relationship set linking a weak entity set to its identifying strong entity set is redundant.The payment table already contains the information that would appear in the loan-payment table (i.e., the columns loan-number and payment-number).

Representing generalization as tables
Representing Generalization as Tables

  • Method 1: Form a table for the generalized entity account. Form a table for each entity set that is specialized (include primary key of generalized entity set) table table attributesaccount account-number, balance, account-type savings-account account-number, interest-rate checking-account account-number, overdraft-amount

  • Method 2: If the generalization is disjoint and total, form a table for each entity set that is specialized table table attributessavings-account account-number, balance, interest-rate checking-account account-number, balance, overdraft-amountMethod 2 has no table for generalized entity account

Topic 3 relational model
Topic 3: Relational Model

  • Structure of Relational Databases

  • Relational Algebra

  • Extended Relational-Algebra-Operations

  • Modification of the Database

  • Views

Basic structure
Basic Structure

  • Given sets A1, A2, …. An, a relation r is a subset of

    A1 x A2 x … x An

    Thus a relation is a set of n-tuples (a1, a2, …, an)

    where ai Ai

  • Example: if

    customer-name = {Jones, Smith, Curry, Lindsay}customer-street = {Main, North, Park}customer-city = {Harrison, Rye, Pittsfield}Then

    r = {(Jones, Main, Harrison), (Smith, North, Rye), (Curry, North, Rye),

    (Lindsay, Park, Pittsfield)}

    is a relation over

    customer-name x customer-street x customer-city

Relation schema
Relation Schema

  • A1, A2, …, Anare attributes and also as their domains

  • R = (A1, A2, …, An ) is a relation schema

    Customer-schema = (customer-name, customer-street, customer-city)

  • r(R) is a relation on the relation schema R

    customer (Customer-schema)

Relation instance
Relation Instance

  • The current values (relation instance) of a relation are specified by a table

  • An element t of r is a tuple, represented by a row in a table


















  • Let K  R

  • K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation r(R) by “possible r” we mean a relation r that could exist in the enterprise we are modeling.Example: {customer-name, customer-street} and {customer-name} are both superkeys of Customer, if no two customers can possibly have the same name.

  • K is a candidate key if K is minimalExample: {customer-name} is a candidate key for Customer, since it is a superkey (assuming no two customers can possibly have the same name), and no subset of it is a superkey.

Determining keys from e r sets
Determining Keys from E-R Sets

  • Strong entity set. The primary key of the entity set becomes the primary key of the relation.

  • Weak entity set. The primary key of the relation consists of the union of the primary key of the strong entity set and the discriminator of the weak entity set.

  • Relationship set. The union of the primary keys of the related entity sets becomes a super key of the relation.For binary many-to-one relationship sets, the primary key of the “many” entity set becomes the relation’s primary key.For one-to-one relationship sets, the relation’s primary key can be that of either entity set.

Query languages
Query Languages

  • Language in which user requests information from the database.

  • Typically a level higher than a standard programming language.

  • Categories of languages

    • procedural

    • non-procedural

  • “Pure” languages---concise:

    • Relational Algebra

    • Tuple Relational Calculus

    • Domain Relational Calculus

  • Pure languages form underlying basis of query languages that people use.

Relational algebra
Relational Algebra

  • Procedural language

  • Six basic operators

    • select

    • project

    • union

    • set difference

    • Cartesian product

    • rename

  • The operators take one or two relations as inputs and give a new relation as a result.

Select operation
Select Operation

  • Notation: p(r)

  • Defined as:

    p(r) = {t | t  r andp(t)}

    Where P is a formula in propositional calculus, dealing with terms of the form:

    <attribute> = <attribute> or <constant>

     >


    “connected by” :  (and),  (or),  (not)

Select operation example
Select Operation – Example





  • Relation r









  • A=B ^ D > 5 (r)









Project operation
Project Operation

  • Notation:A1, A2, …, Ak (r)

    where A1, A2 … Ak are attribute names and r is a relation name.

  • The result is defined as the relation of k columns obtained by erasing the columns that are not listed

  • Duplicate rows removed from result, since relations are sets

Project operation example
Project Operation – Example

  • Relation r:
























Union operation
Union Operation

  • Notation: r s

  • Defined as:

    r s = {t | t  r or t  s}

  • For r s to be valid.

    1. r,s must have the same number of attributes

    2. The attribute domains must be compatible (e.g., 2nd column of r deals with the same type of values as does the 2nd column of s)

Union operation example
Union Operation – Example

  • Relations r, s:












r  s:







Set difference operation
Set Difference Operation

  • Notation r – s

  • Defined as:

    r – s = {t | t r and t  s}

  • Set differences must be taken between compatible relations.

    • r and s must have the same number of attributes

    • attribute domains of r and s must be compatible

Set difference operation example
Set Difference Operation – Example

  • Relations r, s:












r – s:





Cartesian product operation
Cartesian-Product Operation

  • Notation r x s

  • Defined as:

    r x s = {t q| t  r and q  s}

  • Assume that attributes of r(R) and s(S) are disjoint. (That is, R  S = ).

  • If attributes of r(R) and s(S) are not disjoint, then renaming must be used.

Cartesian product operation example
Cartesian-Product Operation – Example






Relations r, s:











r xs:


























Composition of operations
Composition of Operations

  • Can build expressions using multiple operations

  • Example: A=C(r x s)

  • Notation: rs (natural join)

    • Let r and s be relations on schemas R and S respectively. The result is a relation on schema R  S which is obtained by considering each pair of tuples tr from r and tsfrom s.

    • If tr and ts have the same value on each of the attributes in R  S, a tuple t is added to the result, where

      • t has the same value as tron r

      • t has the same value as ts on s

Composition of operations cont
Composition of Operations (Cont.)


R = (A, B, C, D)

S = (E, B, D)

  • Result schema = (A, B, C, D, E)

  • rs is defined as:

    r.A, r.B, r.C ,r.D, s.E (c.B=s.B ^ r.D=s.D(r x s))

Natural join operation example

r s

Natural Join Operation – Example

  • Relations r, s:













































Banking example
Banking Example

branch (branch-name, branch-city, assets)

customer (customer-name, customer-street, customer-city)

account (branch-name, account-number, balance)

loan (branch-name, loan-number, amount)

depositor (customer-name, account-number)

borrower (customer-name, loan-number)

Extended relational algebra operations
Extended Relational-Algebra-Operations

  • Generalized Projection

  • Outer Join

  • Aggregate Functions

Generalized projection
Generalized Projection

  • Extends the projection operation by allowing arithmetic functions to be used in the projection list.PF1, F2, …, Fn(E)

  • E is any relational-algebra expression

  • Each of F1, F2, …, Fn are arithmetic expressions involving constants and attributes in the schema of E.

  • Given relation credit-info(customer-name, limit, credit-balance), find how much more each person can spend:

    P customer-name, limit – credit-balance (credit-info)

Outer join
Outer Join

  • An extension of the join operation that avoids loss of information.

  • Computes the join and then adds tuples form one relation that does not match tuples in the other relation to the result of the join.

  • Uses null values:

    • null signifies that the value is unknown or does not exist

    • All comparisons involving null are false by definition.

Outer join example
Outer Join – Example

  • Relation loan













  • Relation borrower









Outer join example1

loan  borrower

Outer Join – Example

  • loan borrower





























Outer join example2

loan   borrower

Outer Join – Example

  • loan  borrower





































Aggregate functions
Aggregate Functions

  • Aggregation operator  takes a collection of values and returns a single value as a result.

    avg: average valuemin: minimum valuemax: maximum valuesum: sum of valuescount: number of values

    G1, G2, …, GnF1, A1, F2, A2, …, Fm, Am (E)

    • E is any relational-algebra expression

    • G1, G2 …, Gn is a list of attributes on which to group

    • Fiis an aggregate function

    • Aiis an attribute name

Aggregate function example
Aggregate Function – Example

  • Relation r:











Aggregate function example1
Aggregate Function – Example

  • Relation account grouped by branch-name:



















branch-name sum balance (account)


sum of balance







Modification of the database
Modification of the Database

  • The content of the database may be modified using the following operations:

    • Deletion

    • Insertion

    • Updating

  • All these operations are expressed using the assignment operator().


  • A delete request is expressed similarly to a query, except instead of displaying tuples to the user, the selected tuples are removed from the database.

  • Can delete only whole tuples; cannot delete values on only particular attributes

  • A deletion is expressed in relational algebra by:

    r r – E

    where r is a relation and E is a relational algebra query.

Deletion examples
Deletion Examples

  • Delete all account records in the Perryridge branch.

    account account – branch-name = “Perryridge” (account)

  • Delete all loan records with amount in the range of 0 to 50

    loan  loan – amount 0and amount  50 (loan)

  • Delete all accounts at branches located in Needham.

    r1  branch-city = “Needham” (account branch)

    r2  branch-name, account-number, balance (r1)

    r3   customer-name, account-number (r2 depositor)

    account  account – r2

    depositor  depositor – r3


  • To insert data into a relation, we either:

    • specify a tuple to be inserted

    • write a query whose result is a set of tuples to be inserted

  • in relational algebra, an insertion is expressed by:

    r  r  E

    where r is a relation and E is a relational algebra expression.

  • The insertion of a single tuple is expressed by letting E be a constant relation containing one tuple.

Insertion examples
Insertion Examples

  • Insert information in the database specifying that Smith has $1200 in account A-973 at the Perryridge branch.

    account  account  {(“Perryridge”, A-973, 1200)}

    depositor  depositor  {(“Smith”, A-973)}

  • Provide as a gift for all loan customers in the Perryridge branch, a $200 savings account. Let the loan number serve as the account number for the new savings account.

    r1  (branch-name = “Perryridge” (borrower loan))

    account  account  (branch-name, loan-number(r1) x{(200)})

    depositor  depositor   customer-name, loan-number(r1)


  • A mechanism to change a value in a tuple without changing all values in the tuple

  • Use the generalized projection operator to do this task

    r F1, F2, …, Fn(r)

  • Each Fi, is either the ith attribute of r, if the ith attribute is not updated, or, if the attribute is to be updated Fi is an expression, involving only constants and the attributes of r, which gives the new value for the attribute

Update examples
Update Examples

  • Make interest payments by increasing all balances by 5 percent.

    account   branch-name, account-number, balance * 1.05(account)

  • Pay all accounts with balances over $10,0006 percent interest and pay all others 5 percent

    account   branch-name, account-number, balance * 1.06( balance 10000 (account))   branch-name, account-number, balance * 1.05( balance 10000 (account))


  • In some cases, it is not desirable for all users to see the entire logical model (i.e., all the actual relations stored in the database.)

  • Consider a person who needs to know a customer’s loan number and branch name but has no need to see the loan amount. This person should see a relation described, in the relational algebra, by

    customer-name, loan-number, branch-name (borrower loan)

  • Any relation that is not of the conceptual model but is made visible to a user as a “virtual relation” is called a view.

View definition
View Definition

  • A view is defined using the create view statement which has the form

    create view v as <query expression>

    where <query expression> is any legal relational algebra query expression. The view name is represented by v.

  • Once a view is defined, the view name can be used to refer to the virtual relation that the view generates.

  • View definition is not the same as creating a new relation by evaluating the query expression. Rather, a view definition causes the saving of an expression to be substituted into queries using the view.

View examples
View Examples

  • Consider the view (named all-customer) consisting of branches and their customers.

    create view all-customer as

    branch-name, customer-name (depositor account)

      branch-name, customer-name (borrowerloan)

  • We can find all customers of the Perryridge branch by writing:

     customer-name( branch-name = “Perryridge” (all-customer))

Updates through view
Updates Through View

  • Database modifications expressed as views must be translated to modifications of the actual relations in the database.

  • Consider the person who needs to see all loan data in the loan relation except amount. The view given to the person, branch-loan, is defined as:

    create view branch-loan as

    branch-name, loan-number (loan)

  • Since we allow a view name to appear wherever a relation name is allowed, the person may write:

    branch-loan  branch-loan  {(“Perryridge”, L-37)}

Updates through views cont
Updates Through Views (Cont.)

  • The previous insertion must be represented by an insertion into the actual relation loan from which the view branch-loan is constructed.

  • An insertion into loan requires a value for amount. The insertion can be dealt with by either.

    • rejecting the insertion and returning an error message to the user.

    • inserting a tuple (“Perryridge”, L-37, null) into the loan relation

Views defined using other views
Views Defined Using Other Views

  • One view may be used in the expression defining another view

  • A view relation v1 is said to depend directly on a view relation v2 if v2 is used in the expression defining v1 , this is depicted by a directed edge < v2v1 >

  • A view relation v1 is said to depend on view relation v2, if there is a path from v2 to v1 .

  • A view relation v is said to be recursive if it depends on itself.

View expansion
View Expansion

  • A way to define the meaning of views defined in terms of other views.

  • Let view v1 be defined by an expression e1 that may itself contain uses of view relations.

  • View expansion of an expression repeats the following replacement step:

    repeatFind any view relation vi in e1 Replace the view relation vi by the expression defining viuntil no more view relations are present in e1

  • As long as the view definitions are not recursive, this loop will terminate

Topic 4 sql
Topic 4: SQL

SQL: Structured Query Language for commercial DBMS---more user-friendly

  • Basic Structure

  • Set Operations

  • Aggregate Functions

  • Null Values

  • Nested Subqueries

  • Derived Relations

  • Views

  • Modification of the Database

  • Joined Relations

  • Data Definition Language

  • Embedded SQL

Basic structure1
Basic Structure

  • SQL is based on set and relational operations with certain modifications and enhancements

  • A typical SQL query has the form:select A1, A2, ..., Anfromr1, r2, ..., rmwhere P

    • Ais represent attributes

    • ris represent relations

    • P is a predicate.

  • This query is equivalent to the relational algebra expression.

    A1, A2, ..., An (P (r1 x r2 x ... x rm))

  • The result of an SQL query is a relation.

The select clause
The select Clause

  • The select clause corresponds to the projection operation of the relational algebra. It is used to list the attributes desired in the result of a query.

  • Q1:Find the names of all branches in the loan relation

    select branch-namefrom loan

    the query is equivalent to:


  • An asterisk in the select clause denotes “all attributes”

    select *from loan

The select clause cont
The select Clause (Cont.)

  • SQL allows duplicates in relations as well as in query results.

  • To force the elimination of duplicates, insert the keyword distinct after select.

  • Q2: Find the names of all branches in the loan relations, and remove duplicates

    select distinct branch-namefrom loan

  • The keyword all specifies that duplicates not be removed.

    select allbranch-namefrom loan

The select clause cont1
The select Clause (Cont.)

  • The select clause can contain arithmetic expressions involving the operation +, –, , and /, and operating on constants or attributes of tuples---generalized projection.

  • Q3:

    selectbranch-name, loan-number, amount  100from loan

    would return a relation which is the same as the loan relations, except that the attribute amount is multiplied by 100.

The where clause
The where Clause

  • The where clause corresponds to the selection predicate of the relational algebra. It consists of a predicate involving attributes of the relations that appear in the from clause.

  • Q4: Find all loan number for loans made at the Perryridge branch with loan amounts greater than $1300.

    select loan-numberfrom loanwhere branch-name = “ Perryridge” and amount > 1300

  • SQL uses the logical connectives and, or, and not. It allows the use of arithmetic expressions as operands to the comparison operators.

The where clause cont
The where Clause (Cont.)

  • SQL Includes a between comparison operator in order to simplify where clauses that specify that a value be less than or equal to some value and greater than or equal to some other value.

  • Q5:Find the loan number of those loans with loan amounts between $900 and $1000 (that is, $900 and $1000)

    select loan-numberfrom loanwhere amountbetween 900 and 1000

The from clause
The from Clause

  • The from clause corresponds to the Cartesian product operation of the relational algebra. It lists the relations to be scanned in the evaluation of the expression.

  • Q6:Find the Cartesian product borrower x loan

    select from borrower, loan

  • Q7:Find the name and loan number of all customers having a loan at the Perryridge branch.

    select distinct customer-name, borrower, = andbranch-name = “Perryridge”

The rename operation
The Rename Operation

  • The SQL mechanism for renaming attributes is accomplished through the as clause:old-name as new-name

  • Q8: Find the name and loan number of all customers having a loan at the Perryridge branch; replace the column name loan-number with the name loan-id.

    select distinct customer-name, as loan-idfrom borrower, loan

    where = and branch-name = “Perryridge”

Tuple variables
Tuple Variables

  • Tuple variables are defined in the from clause via the use of the as clause.

  • Q9:Find the customer names, their loan numbers and amount for all customers having a loan at some branch.

    select distinct customer-name,, S.amountfrom borrower as T, loan as Swhere =

  • Q10:Find the names of all branches that have greater assets than some branch located in distinct T.branch-namefrom branch as T, branch as SWhere T.assets > S.assets and S.branch-city = “Brooklyn”

String operations
String Operations

  • SQL includes a string-matching operator for comparisons on character strings. Patterns are described using two special characters:

    • percent (%). The % character matches any substring.

    • underscore (_). The _ character matches any character.

  • Find the names of all customers whose street includes the substring “Main”.

    select customer-namefrom customerwherecustomer-street like “%Main%”

  • Match the name “Main%”

    like “Main\%” escape “\”

  • Note in ACCESS, “%” is replaced by “*”, see query X1.

Ordering the display of tuples
Ordering the Display of Tuples

  • Q11:List in alphabetic order the names of all customers having a loan in Perryridge branch

    select distinct customer-namefrom borrower, loanwhere = and branch-name = “Perryridge”order by customer-name

  • We may specify desc for descending order or asc for ascending order, for each attribute; ascending order is the default.

  • SQL must perform a sort to fulfil an order by request. Since sorting a large number of tuples may be costly. It is desirable to sort only when necessary.


  • In relations with duplicates, SQL can define how many copies of tuples appear in the result.

  • Multiset versions of some of the relational algebra operators – given multiset relations r1 and r2:

    1. If there are c1 copies of tuple t1 in r1, and t1 satisfies selection , then there are c1 copies of t1 in  (r1).

    2. For each copy of tuple t1in r1, there is a copy of tupleA(t1) in A(r1) where A(t1) denotes the projection of the single tuple t1.

    3. If there are c1 copies of tuple t1in r1 and c2 copies of tuple t2 in r2, there are c1 x c2 copies of the tuple t1. t2 in r1 x r2

Duplicates cont
Duplicates (Cont.)

  • Suppose relations r1 with schema (A, B) and r2 with schema (C) are the following multisets:

    r1 = {(1, a) (2,a)} r2 = {(2), (3), (3)}

  • Then B(r1) would be {(a), (a)}, while B(r1) x r2 would be

    {(a,2), (a,2), (a,3), (a,3), (a,3), (a,3)}

  • SQL duplicate semantics:

    select A1,, A2, ..., Anfrom r1, r1, ..., rmwhere P

    is equivalent to the multiset version of the expression:

     A1,, A2, ..., An(P(r1 x r2 x ... x rm))

Set operations
Set Operations

  • The set operations union, intersect, and except operate on relations and correspond to the relational algebra operations 

  • Each of the above operations automatically eliminates duplicates; to retain all duplicates use the corresponding multiset versions union all, intersect all and except all. Suppose a tuple occurs m times in r and n times in s, then, it occurs:

    • m + n times in r union all s

    • min (m, n) times in rintersect all s

Set operations1
Set Operations

  • Q12: Find all customers who have a loan, an account, or both:

    (selectcustomer-name from depositor)union (selectcustomer-name from borrower)

  • Find all customers who have both a loan and an account.

    (selectcustomer-name from depositor)intersect (selectcustomer-name from borrower)

  • Find all customers who have an account but no loan.

    selectcustomer-name from depositor)except (selectcustomer-name from borrower)

Aggregate functions1
Aggregate Functions

  • These functions operate on the multiset of values of a column of a relation, and return a value

    avg: average valuemin: minimum valuemax: maximum valuesum: sum of valuescount: number of values

Aggregate functions cont
Aggregate Functions (Cont.)

  • Q13:Find the average account balance at the Brighton branch.

    select avg (balance)from accountwhere branch-name = “Brighton”

  • Q14:Find the number of tuples in the customer relation.

    select count (*)from customer

  • Find the number of depositors in the bank.

    select count (distinct customer-name)

    from depositor

Aggregate functions group by
Aggregate Functions – Group By

  • Find the number of depositors for each branch.

    select branch-name, count (distinctcustomer-name)from depositor, accountwhere depositor.account-number = account.account-numbergroup by branch-name

    Note: Attributes in select clause outside of aggregate functions must appear in group by list

Aggregate functions having clause
Aggregate Functions – Having Clause

  • Q15: Find the names of all branches where the average account balance is more than $400.

    select branch-name, avg (balance)from accountgroup by branch-namehaving avg (balance) > 400

    Note: predicates in the having clause are applied after the formation of groups.

Null values
Null Values

  • It is possible for tuples to have a null value, denoted by null, for some of their attributes, null signifies an unknown value or that a value does not exist.

  • The result of any arithmetic expression involving null is null.

  • Roughly speaking, all comparisons involving null return false. More precisely.

    • Any comparison with null returns unknown

    • (unknown or unknown) = unknown,(true and unknown) = unknown, (false and unknown) = false, (unknown and unknown) = unknown

    • Result of where clause predicate is treated as false if it evaluates to unknown

    • “P is unknown” evaluates to true if predicate P evaluates to unknown

Null values cont
Null Values (Cont.)

  • Find all loan number which appear in the loan relation with nulll values for amount.

    select loan-numberfrom loanwhere amount is null

  • Total all loan amounts

    select sum (amount)from loan

    Above statement ignores null amounts; result is null if there is no non-null amount.

  • All aggregate operations except count(*) ignore tuples with null values on the aggregated attributes.

Nested subqueries
Nested Subqueries

  • SQL provides a mechanism for the nesting of subqueries.

  • A subquery is a select-from-where expression that is nested within another query.

  • A common use of subqueries is to perform tests for set membership, set comparisons, and set cardinality.

Set membership










Set Membership

  • F in r  t  r (t = F)

(5 in

) = true

(5 in

) = false

(5 notin

) = true

Example query
Example Query

  • Q16:Find all customers who have both an account and a loan at the bank.

    select distinct customer-namefrom borrowerwhere customer-name in (select customer-namefromdepositor)

  • Q17:Find all customers who have a loan at the bank but do not have an account at the bank

    select distinct customer-namefrom borrowerwhere customer-name not in (select customer-namefrom depositor)

Example query1
Example Query

  • Find all customers who have both an account and a loan at the Perryridge branch.

    select distinctcustomer-namefrom borrower, loanwhere = andbranch-name = “Perryridge” and(branch-name, customer-name) in (select branch-name, customer-namefrom depositor, accountwhere depositor.account-number = account.account-number)

The some clause




The Some Clause

  • F <comp> some r t(t r [F <comp> t])Where <comp> can be: 

(5< some

) = true

(read: 5 < some tuple in the relation)


) = false

(5< some



) = true

(5 = some



(5 some

) = true (since 0  5)


(= some)  in

However, ( some)  not in

Example query2
Example Query

  • Q18:Find all branches that have greater assets than some branch located in Brooklyn.

    select branch-namefrom branchwhere assets >some (select assetsfrom branchwhere branch-city = “Brooklyn”)

The all clause




The All Clause

  • F <comp> all r t(t r [F <comp>t])

(5< all

) = false


) = true

(5< all



) = false

(5 = all



(5 all

) = true (since 5  4 and 5  6)


(all)  not in

However, (= all)  in

Example query3
Example Query

  • Q19:Find the names of all branches that have greater assets than all branches located in Horsenneck.

    select branch-namefrom branchwhere assets > all (select assetsfrom branchwhere branch-city = “Horsenneck”)

Test for empty relations
Test for Empty Relations

  • The exists construct returns the value true if the argument subquery is nonempty.

  • exists r  r  Ø

  • not exists r  r = Ø

Example query4
Example Query

  • Find all customers who have an account at all branches located in Brooklyn.

    select distinct S.customer-namefrom depositor as Swhere not exists ( (select branch-namefrom branchwhere branch-city = “Brooklyn”)except (select R.branch-namefrom depositor as T, account as Rwhere T.account-number = R.account-number andS.customer-name=T.customer-name))

  • Note that X – Y = Ø  X Y

Test for absence of duplicate tuples
Test for Absence of Duplicate Tuples

  • The unique construct tests whether a subquery has any duplicate tuples in its result.

  • Q20:Find all customers who have only one account at the Perryridge branch.

    select T.customer-name

    from depositor as T

    where unique (

    select R.customer-namefrom account, depositor as Rwhere T.customer-name = R.customer-name andR.account-number = account.account-number

    andaccount.branch-name = “Perryridge”)

    Note: in ACCESS, it does not work when the subquery return more than one tuple.

Example query5
Example Query

  • Find all customers who have at least two accounts at the Perryridge branch.

    select distinct T.customer-name

    from depositor T

    where not unique (

    select R.customer-name

    from account, depositor as R

    where T.customer-name = R.customer-name and

    R.account-number - account.account-number and

    account.branch-name = “Perryridge”)

Derived relations
Derived Relations

  • Find the average account balance of those branches where the average account balance is greater than $1200.

    select branch-name, avg-balancefrom (select branch-name, avg (balance)from accountgroup by branch-name)as result (branch-name, avg-balance)where avg-balance > 1200

    Note that we do not need to use the having clause, since we compute in the from clause the temporary relation result, and the attributes of result can be used directly in the where clause.


  • Provide a mechanism to hide certain data from the view of certain users. To create a view we use the command:

    create view v as<query expression>


    • <query expression> is any legal expression

    • The view name is represented by v

Example queries
Example Queries

  • A view consisting of branches and their customers

    create view all-customer as(select branch-name, customer-namefrom depositor, accountwhere depositor.account-number = account.account-number)

    union(select branch-name, customer-namefrom borrower, loanwhere =

  • Find all customers of the Perryridge branch

    select customer-namefrom all-customerwhere branch-name = “Perryridge”

Modification of the database deletion
Modification of the Database – Deletion

  • Delete all account records at the Perryridge branch

    delete fromdepositorwhere account-number in (select account-numberfrom accountwhere branch-name = “Perryridge”)

    delete from accountwhere branch-name = “Perryridge”

Example deletion
Example Deletion

  • Delete all accounts at every branch located in Needham.

    delete fromdepositorwhere account-number in (select account-numberfrom branch, accountwhere branch-city = “Needham”and branch.branch-name = account.branch-name)

    delete from accountwhere branch-name in (select branch-namefrom branchwhere branch-city = “Needham”)

Example deletion cont
Example Deletion (Cont.)

  • Delete the record of all accounts with balances below the average at the bank.

    delete from accountwhere balance < (select avg (balance)from account)

    • Problem: as we delete tuples from account, the average balance changes

    • Solutions used in SQL:

      1. First, compute avg balance and find all tuples to delete

      2. Next, delete all tuples found above (without recomputing avg or retesting the tuples)

Modification of the database insertion
Modification of the Database – Insertion

  • Add a new tuple to account

    insert into accountvalues (“Perryridge”, “A-9732”, 1200)or equivalentlyinsert into account (branch-name, balance, account-number)values (“Perryridge”, 1200, “A-9732”)

  • Add a new tuple to account with balance set to null

    insert into accountvalues (“Perryridge”, “A-777”, null)

Modification of the database insertion1
Modification of the Database – Insertion

  • Provide as a gift for all loan customers of the Perryridge branch, a $200 savings account. Let the loan number serve as the account number for the new savings account

    insert into accountselect branch-name, loan-number, 200from loanwhere branch-name = “Perryridge”insert into depositorselect customer-name, loan-numberfrom loan, borrowerwhere branch-name = “Perryridge”and loan.account-number = borrower.account-number

Modification of the database updates
Modification of the Database – Updates

  • Increase all accounts with balances over $10,000 by 6%, all other accounts receive 5%.

    • Write two update statements:

      update accountset balance = balance  1.06where balance > 10000

      update accountset balance = balance  1.05where balance  10000

    • The order is important

Update of a view
Update of a View

  • Create a view of all loan data in loan relation, hiding the amount attribute

    create view branch-loan as select branch-name, loan-numberfrom loan

  • Add a new tuple to branch-loan

    insert into branch-loanvalues (“Perryridge”,” L-307”)

    This insertion must be represented by the insertion of the tuple

    (“Perryridge”, “L-307”, null)

    into the loan relation

  • Updates on more complex views are difficult or impossible to translate, and hence are disallowed.

Joined relations
Joined Relations

  • Join operations take two relations and return as a result another relation.

  • These additional operations are typically used as subquery expressions in the from clause.

  • Join condition – defines which tuples in the two relations match, and what attributes are present in the result of the join.

  • Join type – defines how tuples in each relation that do not match any tuple in the other relation (based on the join condition) are treated.

Join Types

Join Conditions

inner join

left outer join

right outer join

full outer join


on <predicate>

using (A1, A2, ..., An)

Joined relations datasets for examples
Joined Relations – Datasets for Examples

  • Relation loan













Relation borrower









Joined relations examples
Joined Relations – Examples

  • Q21:loan inner join borrower =
















  • Q22: loan left outer join borrower =





















Joined relations examples1
Joined Relations – Examples

  • loan natural inner joinborrower













  • loan natural right outer join borrower

















Joined relations examples2
Joined Relations – Examples

  • loan natural full outer join borrower using (loan-number)





















  • Find all customers who have either an account or a loan (but not both) at the bank.

    select customer-namefrom (depositor natural full outer join borrower)where account-number is null or loan-number is null

Data definition language ddl1
Data Definition Language (DDL)

  • The schema for each relation.

  • The domain of values associated with each attribute.

  • Integrity constraints

  • The set of indices to be maintained for each relations.

  • Security and authorization information for each relation.

  • The physical storage structure of each relation on disk.

Allows the specification of not only a set of relations but also information about each relation, including:

Domain types in sql
Domain Types in SQL

  • char(n). Fixed length character string, with user-specified length n.

  • varchar(n). Variable length character strings, with user-specified maximum length n.

  • integer. Integer (a finite subset of the integers that is machine-dependent).

  • smallint. Small integer (a machine-dependent subset of the integer domain type).

  • numeric(p,d). Fixed point number, with user-specified precision of p digits, with d digits to the right of decimal point.

Domain types in sql cont
Domain Types in SQL (Cont.)

  • real, double precision. Floating point and double-precision floating point numbers, with machine-dependent precision.

  • float(n). Floating point number, with user-specified precision of at least n digits.

  • date. Dates, containing a (4 digit) year, month and date.

  • time. Time of day, in hours, minutes and seconds.

    • Null values are allowed in all the domain types. Declaring an attribute to be not null prohibits null values for that attribute.

    • create domain construct in SQL-92 creates user-defined domain types

      create domain person-name char(20) not null

Create table construct
Create Table Construct

  • An SQL relation is defined using the create table command:

    create table r (A1D1, A2D2, ..., An Dn,(integrity-constraint1), ..., (integrity-constraintk))

    • r is the name of the relation

    • each Ai is an attribute name in the schema of relation r

    • Di is the data type of values in the domain of attribute Ai

  • Example:

    create table branch (branch-name char(15) not null,branch-city char(30),assets integer)

Integrity constraints in create table
Integrity Constraints in Create Table

  • not null

  • primary key (A1, ..., An)

  • check (P), where P is a predicate

Example: Declare branch-name as the primary key for branch and ensure that the values of assets are non-negative.

create table branch (branch-name char(15) not null,branch-city char(30),assets integer,primary key (branch-name),check(assets >=0))

primary key declaration on an attribute automatically ensures not null in SQL-92

Drop and alter table constructs
Drop and Alter Table Constructs

  • The drop table command deletes all information about the dropped relation from the database.

  • The alter table command is used to add attributes to an existing relation. All tuples in the relation are assigned null as the value for the new attribute. The form of the alter table command is

    alter table r add A D

    where A is the name of the attribute to be added to relation r and D is the domain of A.

  • The alter table command can also be used to drop attributes of a relationalter table r drop Awhen A is the name of an attribute of relation r.

Embedded sql
Embedded SQL

  • The SQL standard defines embeddings of SQL in a variety of programming languages such as Pascal, PL/I, Fortran, C, and Cobol.

  • A language to which SQL queries are embedded is referred to as a host language, and the SQL structures permitted in the host language comprise embedded SQL.

  • The basic form of these languages follows that of the system R embedding of SQL into PL/I.

  • EXEC SQL statement is used to identify embedded SQL request to the preprocessor

    EXEC SQL <embedded SQL statement > END-EXEC

Example query6
Example Query

  • Specify the query in SQL and declare a cursor for it


    declare c cursor for select customer-name, account-numberfrom depositor, accountwhere depositor account-number = account.account-numberand account.balance > :amount


From within a host language, find the names and account numbers of customers with more than the variable amount dollars in some account.

Embedded sql cont
Embedded SQL (Cont.)

  • The open statement causes the query to be evaluated


  • The fetch statement causes the values of one tuple in the query result to be placed on host language variables.

    EXEC SQL fetch c into :cn , :an END-EXECRepeated calls to fetch get successive tuples in the query result; a variable in the SQL communication area indicates when end-of-file is reached.

  • The close statement causes the database system to delete the temporary relation that holds the result of the query.

    EXEC SQL closec END-EXEC

Dynamic sql
Dynamic SQL

  • Allows programs to construct and submit SQL queries at run time.

  • Example of the use of dynamic SQL from within a C program.char * sqlprog = “ update account set balance = balance * 1.05where account-number = ? ”EXEC SQL prepare dynprog from :sqlprog;char account [10] = “A-101”;EXEC SQL execute dynprog using :account;

  • The dynamic SQL program contains a ?, which is a place holder for a value that is provided when the SQL program is executed.

Other sql features
Other SQL Features

  • Fourth-generation languages – special language to assist application programmers in creating templates on the screen for a user interface, and in formatting data for report generation, available in most commercial database products

  • SQL sessions – provide the abstraction of a client and a server (possibly remote)

    • client connects to an SQL server, establishing a session

    • executes a series of statements

    • disconnects the session

    • can commit or rollback the work carried out in the session

  • An SQL environment contains several components, including a user identifier, and a schema, which identifies which of several schemas a session is using.

  • the instructions appear in each individual transaction.

  • serializable schedules to be generated, while others allow view-serializable schedules that are not conflict-serializable.

Topic 5 integrity constraints
Topic 5: Integrity Constraints user identifier, and a

  • Domain Constraints

  • Referential Integrity

  • Assertions

  • Triggers

  • Functional Dependencies

Domain constraints
Domain Constraints user identifier, and a

  • Integrity constraints guard against accidental damage to the database, by ensuring that authorized changes to the database do not result in a loss of data consistency.

  • Domain constraints are the most elementary form of integrity constraint.

  • They test values inserted in the database, and test queries to ensure that the comparisons make sense.

Domain constraints cont
Domain Constraints (Cont.) user identifier, and a

  • The check clause in SQL-92 permits domains to be restricted:

    • Use check clause to ensure that an hourly-wage domain allows only values greater than a specified value.

      create domain hourly-wage numeric(5,2) constraintvalue-test check(value> = 4.00)

    • The domain hourly-wage is declared to be a decimal number with 5 digits, 2 of which are after the decimal point

    • The domain has a constraint that ensures that the hourly-wage is greater than or equal to 4.00

    • The clause constraintvalue-test is optional; useful to indicate which constraint an update violated.

Referential integrity
Referential Integrity user identifier, and a

  • Ensures that a value that appears in one relation for a given set of attributes also appears for a certain set of attributes in another relation.

    • Example: If “Perryridge” is a branch name appearing in one of the tuples in the account relation, then there exists a tuple in the branch relation for branch “Perryridge”.

  • Formal Definition

    • Let r1(R1) and r2(R2) be relations with primary keys K1 and K2 respectively.

    • The subset  of R2 is a foreign key referencing K1in relation r1, if for every t2 in r2 there must be a tuple t1 in r1 such that t1[K1] = t2[].

    • Referential integrity constraint:  (r2)  K1 (r1)

Referential integrity in the e r model
referential Integrity in the E-R Model user identifier, and a

  • Consider relationship set R between entity sets E1 and E2. The relational schema for R includes the primary keys K1 of E1 and K2 of E2.Then K1 and K2 form foreign keys on the relational schemas for E1and E2 respectively.

  • Weak entity sets are also a source of referential integrity constraints. For the relation schema for a weak entity set must include the primary key of the entity set on which it depends.

Database modification
Database Modification user identifier, and a

  • The following tests must be made in order to preserve the following referential integrity constraint:

     (r2)  K1 (r1)

  • Insert. If a tuple t2 is inserted into r2, the system must ensure that there is a tuple t1 in r1 such that t1[K1] = t2[]. That is

    t2 []  K1 (r1)

  • Delete. If a tuple, t1 is deleted from r1, the system must compute the set of tuples in r2 that reference t1:

     = t1[K1] (r2)

    If this set is not empty, either the delete command is rejected as an error, or the tuples that reference t1 must themselves be deleted (cascading deletions are possible).

Database modification cont
Database Modification (Cont.) user identifier, and a

  • Update. There are two cases:

    • If a tuple t2 is updated in relation r2and the update modifies values for foreign key , then a test similar to the insert case is made. Let t2’ denote the new value of tuple t2. The system must ensure that

      t2’[]  K1(r1)

    • If a tuple t1 is updated in r1, and the update modifies values for the primary key (K1), then a test similar to the delete case is made. The system must compute

       = t1[K1] (r2)

      using the old value of t1 (the value before the update is applied). If this set is not empty, the update may be rejected as an error, or the update may be cascaded to the tuples in the set, or the tuples in the set may be deleted.

Referential integrity in sql
Referential Integrity in SQL user identifier, and a

  • Primary and candidate keys and foreign keys can be specified as part of the SQL create table statement:

    • The primary key clause of the create table statement includes a list of the attributes that comprise the primary key.

    • The unique key clause of the create table statement includes a list of the attributes that comprise a candidate key.

    • The foreign key clause of the create table statement includes both a list of the attributes that comprise the foreign key and the name of the relation referenced by the foreign key.

Referential integrity in sql example
Referential Integrity in SQL – Example user identifier, and a

create table customer(customer-name char(20) not null,customer-street char(30),customer-city char(30),primary key (customer-name))

create table branch(branch-name char(15) not null,branch-city char(30),assets integer,primary key(branch-name))

Referential integrity in sql example cont
Referential Integrity in SQL – Example (Cont.) user identifier, and a

create table account(branch-name char(15),account-number char(10) not null,balance integer,primary key (account-number), foreign key (branch-name) references branch)

create table depositor(customer-name char(20) not null,account-number char(10) not null,primary key (customer-name, account-number),foreign key (account-number) references account,foreign key (customer-name) references customer)

Cascading actions in sql
Cascading Actions in SQL user identifier, and a

create table account

. . .foreign key(branch-name) references branchon delete cascade on update cascade.. . . )

  • Due to the on delete cascade clauses, if a delete of a tuple in branchresults in referential-integrity constraint violation, the delete “cascades” to the account relation, deleting the tuple that refers to the branch that was deleted.

  • Cascading updates are similar.

Cascading actions in sql cont
Cascading Actions in SQL (Cont.) user identifier, and a

  • If there is a chain of foreign-key dependencies across multiple relations, with on delete cascade specified for each dependency, a deletion or update at one end of the chain can propagate across the entire chain.

  • If a cascading update to delete causes a constraint violation that cannot be handled by a further cascading operation, the system aborts the transaction. As a result, all the changes caused by the transaction and its cascading actions are undone.

Assertions user identifier, and a

  • An assertion is a predicate expressing a condition that we wish the database always to satisfy.

  • An assertion in SQL-92 takes the form

    create assertion <assertion-name> check <predicate>

  • When an assertion is made, the system tests it for validity. This testing may introduce a significant amount of overhead; hence assertions should be used with great care.

Assertion example
Assertion Example user identifier, and a

  • The sum of all loan amounts for each branch must be less than the sum of all account balances at the branch.

    create assertion sum-constraint check(not exists (select * from branchwhere (select sum(amount) from loanwhere loan.branch-name = branch.branch-name) >=(select sum(balance) from accountwhere account.branch-name = branch.branch-name)))

Assertion example1
Assertion Example user identifier, and a

  • Every loan has at least one customer who maintains an account with a minimum balance of $1000.

    create assertion balance-constraint check(not exists (select * from loanwhere not exists ( select * from borrower, depositor, accountwhere = borrower.customer-name = depositor.customer-nameand depositor.account-number = account.account-numberand account.balance >=1000)))

    Note if a loan has no customer who maintains an account with a minimum balance of $1000, the sub-query will evaluate to .

Assertion example2
Assertion Example user identifier, and a

  • All borrower of every loan should maintain an account with a minimum balance of $1000.

    create assertion balance-constraint check(not exists (select * from loanwhere exists ( select * from borrower, depositor, accountwhere = borrower.customer-name = depositor.customer-nameand depositor.account-number = account.account-numberand account.balance <1000))

    and not exists ( select * from loan

    where not exists ( select *

    from borrower, depositor

    whereborrower.customer-name = depositor.customer-name

    and =

Triggers user identifier, and a

  • A trigger is a statement that is executed automatically by the system as a side effect of a modification to the database.

  • To design a trigger mechanism, we must:

    • Specify the conditions under which the trigger is to be executed.

    • Specify the actions to be taken when the trigger executes.

  • The SQL-92 standard does not include trigger, but many implementations support triggers.

Trigger example
Trigger Example user identifier, and a

  • Suppose that instead of allowing negative account balances, the bank deals with overdrafts by

    • setting the account balance to zero

    • creating a loan in the amount of the overdraft

    • giving this loan a loan number identical to the account number of the overdrawn account

  • The condition for executing the trigger is an update to the account relation that results in a negative balance value.

Trigger example cont
Trigger Example (Cont.) user identifier, and a

define trigger overdrafton update of account T(if new T.balance < 0then (insert into loan values(T.branch-name, T.account-number, –new T.balance)insert into borrower(select customer-name, account-numberfrom depositorwhere T.account-number = depositor account-number)update account Sset S.balance = 0where S.account-number = T.Account-number))

The keyword new used before T.balance indicates that the value of T.balance after the update should be used; if it is omitted, the value before the update is used.

Functional dependencies
Functional Dependencies user identifier, and a

  • Constraints on the set of legal relations.

  • Require that the value for a certain set of attributes determines uniquely the value for another set of attributes.

  • A functional dependency is a generalization of the notion of a key.

Functional dependencies cont
Functional Dependencies (Cont.) user identifier, and a

  • Let R be a relation schema

      R   R

  • The functional dependency

     holds on R if and only if for any legal relations r(R), whenever any two tuples t1and t2 of r agree on the attributes , they also agree on the attributes . That is,

    t1[] = t2 []  t1[] = t2 []

  • K is a superkey for relation schema R if and only if K R

  • K is a candidate key for R if and only if

    • K R, and

    • There is no   K,  R

Functional dependencies cont1
Functional Dependencies (Cont.) user identifier, and a

  • Functional dependencies allow us to express constraints that cannot be expressed using superkeys. Consider the schema:

    Loan-info-schema = (branch-name, loan-number, customer-name, amount).

    We expect this set of functional dependencies to hold:

    loan-numberamount loan-number  branch-name

    but would not expect the following to hold:

    loan-number customer-name

Use of functional dependencies
Use of Functional Dependencies user identifier, and a

  • We use functional dependencies to:

    • test relations to see if they are legal under a given set of functional dependencies. If a relation r is legal under a set F of functional dependencies, we say that r satisfies F.

    • specify constraints on the set of legal relations; we say that F holds on R if all legal relations on R satisfy the set of functional dependencies F.

  • Note: A specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances. For example, a specific instance of Loan-schema may, by chance, satisfy loan-number  customer-name.

Topic 6 transactions
Topic 6: Transactions user identifier, and a

  • Transaction Concept

  • Transaction State

  • Implementation of Atomicity and Durability

  • Concurrent Executions

  • Serializability

  • Recoverability

  • Implementation of Isolation

  • Transaction Definition in SQL

  • Testing for Serializability.

Transaction concept
Transaction Concept user identifier, and a

  • A transaction is a unit of program execution that accesses and possibly updates various data items.

  • A transaction must see a consistent database.

  • During transaction execution the database may be inconsistent.

  • When the transaction is committed, the database must be consistent.

  • Two main issues to deal with:

    • Failures of various kinds, such as hardware failures and system crashes

    • Concurrent execution of multiple transactions

Acid properties
ACID Properties user identifier, and a

To preserve integrity of data, the database system must ensure:

  • Atomicity. Either all operations of the transaction are properly reflected in the database or none are.

  • Consistency. Execution of a transaction in isolation preserves the consistency of the database.

  • Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executed transactions. That is, for every pair of transactions Tiand Tj, it appears to Tithat either Tjfinished execution before Ti started, or Tj started execution after Ti finished.

  • Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.

Example of fund transfer
Example of Fund Transfer user identifier, and a

  • Transaction to transfer $50 from account A to account B:

    1. read(A)

    2. A := A – 50

    3. write(A)

    4. read(B)

    5. B := B + 50

    6. write(B)

  • Consistency requirement — the sum of A and B is unchanged by the execution of the transaction.

  • Atomicity requirement — if the transaction fails after step 3 and before step 6, the system should ensure that its updates are not reflected in the database, otherwise an inconsistency will result.

Example of fund transfer cont
Example of Fund Transfer (Cont.) user identifier, and a

  • Durability requirement — once the user has been notified that the transaction has completed (i.e., the transfer of the $50 has taken place), the updates to the database by the transaction must persist despite failures.

  • Isolation requirement — if between steps 3 and 6, another transaction is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be).Can be ensured trivially by running transactions serially, that is one after the other. However, executing multiple transactions concurrently has significant benefits, as we will see.

Transaction state
Transaction State user identifier, and a

  • Active, the initial state; the transaction stays in this state while it is executing

  • Partially committed, after the final statement has been executed.

  • Failed, after the discovery that normal execution can no longer proceed.

  • Aborted, after the transaction has been rolled back and the database restored to its state prior to the start of the transaction.

  • Committed, after successful completion.

Transaction state cont
Transaction State (Cont.) user identifier, and a

Implementation of atomicity and durability
Implementation of Atomicity and Durability user identifier, and a

  • The recovery-management component of a database system implements the support for atomicity and durability.

Concurrent executions
Concurrent Executions user identifier, and a

  • Multiple transactions are allowed to run concurrently in the system. Advantages are:

    • increased processor and disk utilization, leading to better transaction throughput: one transaction can be using the CPU while another is reading from or writing to the disk

    • reduced average response time for transactions: short transactions need not wait behind long ones.

  • Concurrency control schemes – mechanisms to control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database

Schedules user identifier, and a

  • Schedules – sequences that indicate the chronological order in which instructions of concurrent transactions are executed

    • a schedule for a set of transactions must consist of all instructions of those transactions

    • must preserve the order in which the instructions appear in each individual transaction.

Example schedules
Example Schedules user identifier, and a

  • Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. The following is a serial schedule (Schedule 1), in which T1 is followed by T2.

    T1T2read(A)A := A – 50write(A)read(B)B := B +50write(B)read(A)temp := A*0.1;A := A – tempwrite(A)read(B)B := B + tempwrite(B)

  • Let user identifier, and a T1 and T2 be the transactions defined previously. The following schedule (Schedule 3) is not a serial schedule, but it is equivalent to Schedule 1.

    T1T2read(A)A := A – 50write(A)read (A)temp := A*0.1A = A – tempwrite(A)

    read(B)B := B +50write(B)read(B)B := B + tempwrite(B)

In both Schedule 1 and 3, the sum A + B is preserved.

Example schedules cont
Example Schedules (Cont.) user identifier, and a

  • The following concurrent schedule (Schedule 4) does not preserve the value of the sum A + B.

    T1T2read(A) A := A – 50 read (A)temp := A*0.1A = A – tempwrite(A)read(B)write(A)read(B)B := B +50write(B)

    B := B + tempwrite(B)

A=90 A=40

temp=4 A=36





Serializability user identifier, and a

  • Basic Assumption – Each transaction preserves database consistency.

  • Thus serial execution of a set of transactions preserves database consistency.

  • A (possibly concurrent) schedule is serializable if it is equivalent to a serial schedule. Different forms of schedule equivalence give rise to the notions of:

    1. conflict serializability

    2. view serializability

  • We ignore operations other than read and write instructions, and we assume that transactions may perform arbitrary computations on data in local buffers in between reads and writes. Our simplified schedules consist of only read and write instructions.

Conflict serializability
Conflict Serializability user identifier, and a

  • Instructions li and lj of transactions Ti and Tj respectively, conflict if and only if there exists some item Q accessed by both li and lj, and at least one of these instructions wrote Q.

    1. li = read(Q), lj = read(Q). li and ljdon’t conflict.2. li = read(Q), lj = write(Q). They conflict.3. li = write(Q), lj = read(Q). They conflict4. li = write(Q), lj = write(Q). They conflict

  • Intuitively, a conflict between liand lj forces a (logical) temporal order between them. If li and lj are consecutive in a schedule and they do not conflict, their results would remain the same even if they had been interchanged in the schedule.

Conflict serializability cont
Conflict Serializability (Cont.) user identifier, and a

  • If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting instructions, we say that S and S´ are conflict equivalent.

  • We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule

  • Example of a schedule that is not conflict serializable:

    T3T4read(Q)write(Q)write(Q)We are unable to swap instructions in the above schedule to obtain either the serial schedule < T3, T4 >, or the serial schedule < T4, T3 >.

Conflict serializability cont1
Conflict Serializability (Cont.) user identifier, and a

  • Schedule 3 below can be transformed into Schedule 1, a serial schedule where T2 follows T1, by series of swaps of non-conflicting instructions. Therefore Schedule 3 is conflict serializable.

    T1T2read(A)write(A)read (A)write(A)read (B)write(B)read (B)write(B)

View serializability
View Serializability* user identifier, and a

  • Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the following three conditions are met:

    1. For each data item Q, if transaction Tireads the initial value of Q in schedule S, then transaction Ti must, in schedule S´, also read the initial value of Q.

    2. For each data item Q if transaction Tiexecutes read(Q) in schedule S, and that value was produced by transaction Tj (if any), then transaction Ti must in schedule S´ also read the value of Q that was produced by transaction Tj.

    3. For each data item Q, the transaction (if any) that performs the final write(Q) operation in schedule S must perform the finalwrite(Q) operation in schedule S´.

    As can be seen, view equivalence is also based purely on reads and writes alone.

View serializability cont
View Serializability *(Cont.) user identifier, and a

  • A schedule S is view serializable if it is view equivalent to a serial schedule.

  • Every conflict serializable schedule is also view serializable.

  • Schedule 9 — a schedule which is view-serializable but not conflict serializable.

    T3 T4T6read(Q) write(Q)write(Q)write (Q)

  • Every view serializable schedule which is not conflict serializable has bind writes ( no read before write).

Recoverability user identifier, and a

Need to address the effect of transaction failures on concurrently running transactions.

  • Recoverable schedule — if a transaction Tj reads a data items previously written by a transaction Ti , the commit operation of Ti appears before the commit operation of Tj.

  • The following schedule (Schedule 11) is not recoverable if T9commits immediately after the readT8T9read(A)write(A)read(A)read(B)If T8should abort, T9 would have read (and possibly shown to the user) an inconsistent database state. Hence database must ensure that schedules are recoverable.

Recoverability cont
Recoverability (Cont.) user identifier, and a

  • Cascading rollback – a single transaction failure leads to a series of transaction rollbacks. Consider the following schedule where none of the transactions has yet committed (so the schedule is recoverable)T10T11T12read(A)read(B)write(A)read(A)write(A)read(A) If T10 fails, T11 and T12 must also be rolled back.

  • Can lead to the undoing of a significant amount of work

Recoverability cont1
Recoverability (Cont.) user identifier, and a

  • Cascadeless schedules — cascading rollbacks cannot occur; for each pair of transactions Tiand Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj.

  • Every cascadeless schedule is also recoverable

  • It is desirable to restrict the schedules to those that are cascadeless

Implementation of isolation
Implementation of Isolation user identifier, and a

  • Schedules must be conflict or view serializable, and recoverable, for the sake of database consistency, and preferably cascadeless.

  • A policy in which only one transaction can execute at a time generates serial schedules, but provides a poor degree of concurrency.

  • Concurrency-control schemes tradeoff between the amount of concurrency they allow and the amount of overhead that they incur.

  • Some schemes allow only conflict-serializable schedules to be generated, while others allow view-serializable schedules that are not conflict-serializable.

Transaction definition in sql
Transaction Definition in SQL user identifier, and a

  • Data manipulation language must include a construct for specifying the set of actions that comprise a transaction.

  • In SQL, a transaction begins implicitly.

  • A transaction in SQL ends by:

    • Commit work commits current transaction and begins a new one.

    • Rollback work causes current transaction to abort.

  • Levels of consistency specified by SQL-92:

    • Serializable — default

    • Repeatable read

    • Read committed

    • Read uncommitted

Levels of consistency in sql 92
Levels of Consistency in SQL-92 user identifier, and a

  • Serializable — default

  • Repeatable read —only committed records to be read, repeated reads of same record must return same value. However, a transaction may not be serializable – it may find some records inserted by a transaction but not find others that have not yet been commited.

  • Read committed —only committed records can be read, but successive reads of record may return different (but committed) values.

  • Read uncommitted —even uncommitted records may be read.

Lower degrees of consistency useful for gathering approximateinformation about the database, e.g., statistics for query optimizer.

Testing for serializability
Testing for Serializability user identifier, and a

  • Consider some schedule of a set of transactions T1, T2, ..., Tn

  • Precedence graph — a direct graph where the vertices are the transactions (names).

  • We draw an arc from Tito Tjif the two transaction conflict, and Tiaccessed the data item on which the conflict arose earlier.

  • We may label the arc by the item that was accessed.

  • Example 1



Example schedule schedule a
Example Schedule (Schedule A) user identifier, and a

T1 T2 T3 T4 T5 read(X)read(Y)read(Z) read(V) read(W) read(W) read(Y) write(Y) write(Z)read(U) read(Y) write(Y) read(Z) write(Z)


Precedence graph for schedule a
Precedence Graph for Schedule A user identifier, and a






Test for conflict serializability
Test for Conflict Serializability user identifier, and a

  • A schedule is conflict serializable if and only if its precedence graph is acyclic.

  • Cycle-detection algorithms exist which take order n2 time, where n is the number of vertices in the graph. (Better algorithms take order n + e where e is the number of edges.)

  • If precedence graph is acyclic, the serializability order can be obtained by a topological sorting of the graph. This is a linear order consistent with the partial order of the graph.For example, a serializability order for Schedule A would beT5T1T3T2T4.

Test for view serializability
Test for View Serializability user identifier, and a

  • The precedence graph test for conflict serializability must be modified to apply to a test for view serializability.

  • Construct a labeled precedence graph. Look for an acyclic graph which is derived from the labeled precedence graph. Schedule is view serializable if and only if such an acyclic graph can be found.

  • The problem of looking for such an acyclic graph falls in the class of NP-complete problems. Thus existence of an efficient algorithm is unlikely.

    However practical algorithms that just check some sufficient conditions for view serializability can still be used.

Concurrency control vs serializability tests
Concurrency Control vs. Serializability Tests user identifier, and a

  • Testing a schedule for serializability after it has executed is a little too late!

  • Goal – to develop concurrency control protocols that will assure serializability. They will generally not examine the precedence graph as it is being created; instead a protocol will impose a discipline that avoids nonseralizable schedules.Will study such protocols later.

  • Tests for serializability help understand why a concurrency control protocol is correct.

Topic 7 concurrency control

Lock-Based Protocols user identifier, and a

Multiple Granularity

Deadlock Handling

Insert and Delete Operations

Topic 7: Concurrency Control

Lock based protocols

A lock is a mechanism to control concurrent access to a data item

Data items can be locked in two modes :

1. exclusive (X) mode. Data item can be both read as well as

written. X-lock is requested using lock-X instruction.

2. shared (S) mode. Data item can only be read. S-lock is

requested using lock-S instruction.

Lock requests are made to concurrency-control manager. Transaction can proceed only after request is granted.

Lock-Based Protocols

Lock based protocols cont

Lock-compatibility matrix item

A transaction may be granted a lock on an item if the requested lock is compatible with locks already held on the item by other transactions

The matrix allows any number of transactions to hold shared locks on an item, but if any transaction holds an exclusive on the item no other transaction may hold any lock on the item.

Lock-Based Protocols (Cont.)









Lock based protocols cont1

If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks held by other transactions have been released. The lock is then granted.

Lock-Based Protocols (Cont.)

Lock based protocols cont2

Example of a transaction performing locking: made to wait till all incompatible locks held by other transactions have been released. The lock is then granted.

T2: lock-S(A);

read (A);



read (B);



Locking as above is not sufficient to guarantee serializability — if A and B get updated in-between the read of A and B, the displayed sum would be wrong.

A locking protocol is a set of rules followed by all transactions while requesting and releasing locks. Locking protocols restrict the set of possible schedules.

Lock-Based Protocols (Cont.)

Pitfalls of lock based protocols

Consider the partial schedule made to wait till all incompatible locks held by other transactions have been released. The lock is then granted.

Neither T3 nor T4 can make progress — executing lock-S(B) causes T4 to wait for T3 to release its lock on B, while executing lock-X(A) causes T3 to wait for T4 to release its lock on A.

Such a situation is called a deadlock. To handle a deadlock one of T3 or T4 must be rolled back and its locks released.

Pitfalls of Lock-Based Protocols





B:= B-50






Pitfalls of lock based protocols cont

The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil.

Starvation is also possible if concurrency control manager is badly designed. For example:

A transaction may be waiting for an X-lock on an item, while a sequence of other transactions request and are granted an S-lock on the same item.

The same transaction is repeatedly rolled back due to deadlocks.

Concurrency control manager can be designed to prevent starvation.

Pitfalls of Lock-Based Protocols (Cont.)

The two phase locking protocol

This is a protocol which ensures conflict-serializable schedules.

Phase 1: Growing Phase

transaction may obtain locks

transaction may not release locks

Phase 2: Shrinking Phase

transaction may release locks

transaction may not obtain locks

The protocol assures serializability. It can be proved that the transactions can be serialized in the order of their lock points (i.e. the point where a transaction acquired its final lock).

The Two-Phase Locking Protocol

The two phase locking protocol cont

Two-phase locking schedules.does not ensure freedom from deadlocks

Cascading roll-back is possible under two-phase locking. To avoid this, follow a modified protocol called strict two-phase locking. Here a transaction must hold all its exclusive locks till it commits/aborts.

Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol transactions can be serialized in the order in which they commit.

The Two-Phase Locking Protocol (Cont.)

The two phase locking protocol cont1

There can be conflict serializable schedules that cannot be obtained if two-phase locking is used.


The Two-Phase Locking Protocol (Cont.)




read (X)



write (Y)



write (X)


The two phase locking protocol cont2

However, in the absence of extra information (e.g., ordering of access to data), two-phase locking is needed for conflict serializability in the following sense:

Given a transaction Ti that does not follow two-phase locking, we can find a transaction Tj that uses two-phase locking, and a schedule for Ti and Tj that is not conflict serializable.

The Two-Phase Locking Protocol (Cont.)

Lock conversions

Two-phase locking with lock conversions: of access to data), two-phase locking is needed for conflict serializability in the following sense:

– First Phase:

can acquire a lock-S on item

can acquire a lock-X on item

can convert a lock-S to a lock-X (upgrade)

– Second Phase:

can release a lock-S

can release a lock-X

can convert a lock-X to a lock-S (downgrade)

This protocol assures serializability. But still relies on the programmer to insert the various locking instructions.

Lock Conversions

Automatic acquisition of locks

A transaction of access to data), two-phase locking is needed for conflict serializability in the following sense:Ti issues the standard read/write instruction, without explicit locking calls.

The operation read(D) is processed as:

if Ti has a lock on D





if necessary wait until no other

transaction has a lock-X on D

grant Ti a lock-S on D;



Automatic Acquisition of Locks

Automatic acquisition of locks cont

write of access to data), two-phase locking is needed for conflict serializability in the following sense:(D) is processed as:

if Ti has a lock-X on D





if necessary wait until no other trans. has any lock on D,

if Ti has a lock-S on D


upgrade lock on D to lock-X


grant Ti a lock-X on D



All locks are released after commit or abort

Automatic Acquisition of Locks (Cont.)

Multiple granularity

Allow data items to be of various sizes and define a hierarchy of data granularities, where the small granularities are nested within larger ones

Can be represented graphically as a tree

when a transaction locks a node in the tree explicitly, it implicitly locks all the node's descendents in the same mode.

Granularity of locking (level in tree where locking is done):

fine granularity (lower in tree): high concurrency, high locking overhead

coarse granularity (higher in tree): low locking overhead, low concurrency

Multiple Granularity

Example of granularity hierarchy

The highest level in the example hierarchy is the entire database.

The levels below are of type area, file and record in that order.

Example of Granularity Hierarchy

Intention lock modes

In addition to S and X lock modes, there are three additional lock modes with multiple granularity:

intention-shared (IS): indicates explicit locking at a lower level of the tree but only with shared locks.

intention-exclusive (IX): indicates explicit locking at a lower level with exclusive or shared locks

shared and intention- exclusive (SIX): the subtree rooted by that node is locked explicitly in shared mode and indicates explicit locking at a lower level with exclusive-mode locks.

intention locks allow a higher level node to be locked in S or X mode without having to check all descendent nodes.

Intention Lock Modes

Compatibility matrix with intention lock modes

The compatibility matrix for all lock modes is: additional lock modes with multiple granularity:

Compatibility Matrix with Intention Lock Modes











Multiple granularity locking scheme

Transaction additional lock modes with multiple granularity:Ti can lock a node Q, using the following rules:

1. The lock compatibility matrix must be observed.

2. The root of the tree must be locked first, and may be locked in

any mode.

3. A node Q can be locked by Ti in IS or S mode only if the parent

of Q is currently locked by Ti in either IX or IS mode.

4. A node Q can be locked by Ti in X, SIX, or IX mode only if the

parent of Q is currently locked by Ti in either IX or SIX mode.

5. Ti can lock a node only if it has not previously unlocked any node

(that is, Tiis two-phase).

6. Tican unlock a node Q only if none of the children of Q are

currently locked by Ti.

Observe that locks are acquired in root-to-leaf order, whereas they are released in leaf-to-root order.

Multiple Granularity Locking Scheme

Deadlock handling

Consider the following two transactions: additional lock modes with multiple granularity:

T1: write (X) T2: write(Y)

write(Y) write(X)

Schedule with deadlock

Deadlock Handling



lock-X on X

write (X)

lock-X on Y

write (Y)

wait for lock-X on X

wait for lock-X on Y

Deadlock handling1

System is deadlocked if there is a set of transactions such that every transaction in the set is waiting for another transaction in the set.

Deadlock prevention protocols ensure that the system will never enter into a deadlock state. Some prevention strategies :

Require that each transaction locks all its data items before it begins execution (predeclaration).

Impose partial ordering of all data items and require that a transaction can lock data items only in the order specified by the partial order (graph-based protocol).

Deadlock Handling

More deadlock prevention strategies

Following schemes use transaction timestamps for the sake of deadlock prevention alone.

wait-die scheme — non-preemptive

older transaction may wait for younger one to release data item. Younger transactions never wait for older ones; they are rolled back instead.

a transaction may die several times before acquiring needed data item

wound-wait scheme — preemptive

older transaction wounds (forces rollback) younger transaction instead of waiting for it. Younger transactions may wait for older ones.

may be fewer rollbacks than wait-die scheme.

More Deadlock Prevention Strategies

Deadlock prevention cont

Both in deadlock prevention alone.wait-die and in wound-wait schemes, a rolled back transactions is restarted with its original timestamp. Older transactions thus have precedence over newer ones, and starvation is hence avoided.

Timeout-Based Schemes :

a transaction waits for a lock only for a specified amount of time. After that, the wait times out and the transaction is rolled back.

thus deadlocks are not possible

simple to implement; but starvation is possible. Also difficult to determine good value of the timeout interval.

Deadlock prevention (Cont.)

Deadlock detection

Deadlocks can be described as a deadlock prevention alone.wait-for graph, which consists of a pair G = (V,E),

V is a set of vertices (all the transactions in the system)

E is a set of edges; each element is an ordered pair TiTj.

If Ti Tjis in E, then there is a directed edge from Ti to Tj, implying that Ti is waiting for Tj to release a data item.

When Ti requests a data item currently being held by Tj, then the edge TiTj is inserted in the wait-for graph. This edge is removed only when Tj is no longer holding a data item needed by Ti.

The system is in a deadlock state if and only if the wait-for graph has a cycle. Must invoke a deadlock-detection algorithm periodically to look for cycles.

Deadlock Detection*

Deadlock detection cont
Deadlock Detection* (Cont.) deadlock prevention alone.

Wait-for graph without a cycle

Wait-for graph with a cycle

Deadlock recovery

When deadlock is detected : deadlock prevention alone.

Some transaction will have to rolled back (made a victim) to break deadlock. Select that transaction as victim that will incur minimum cost.

Rollback -- determine how far to roll back transaction

Total rollback: Abort the transaction and then restart it.

More effective to roll back transaction only as far as necessary to break deadlock.

Starvation happens if same transaction is always chosen as victim. Include the number of rollbacks in the cost factor to avoid starvation

Deadlock Recovery*

Topic 8 database system architectures

Centralized Systems deadlock prevention alone.

Client--Server Systems

Distributed Systems

Topic 8: Database System Architectures

Centralized systems

Run on a single computer system and do not interact with other computer systems.

General-purpose computer system: one to a few CPUs and a number of device controllers that are connected through a common bus that provides access to shared memory.

Single-user system (e.g., personal computer or workstation): desk-top unit, single user, usually has only one CPU and one or two hard disks; the OS may support only one user.

Multi-user system: more disks, more memory, multiple CPUs, and a multi-user OS. Serve a large number of users who are connected to the system via terminals. Often called server systems.

Centralized Systems

Client server systems

Server systems satisfy requests generated at other computer systems.m client systems, whose general structure is shown below:

Client-Server Systems








Client server systems cont

Database functionality can be divided into: other computer systems.

Back-end: manages access structures, query evaluation and optimization, concurrency control and recovery.

Front-end: consists of tools such as forms, report-writers, and graphical user interface facilities.

The interface between the front-end and the back-end is through SQL or through an application program interface.

Client-Server Systems (Cont.)

Client server systems cont1

Advantages of replacing mainframes with networks of workstations or personal computers connected to back-end server machines:

better functionality for the cost

flexibility in locating resources and expanding facilities

better user interfaces

easier maintenance

One kind of server systems is transaction servers which are widely used in relational database systems.

Client-Server Systems (Cont.)

Transaction servers

Also called workstations or personal computers connected to back-end server machines:query server systems or SQL server systems; clients send requests to the server system where the transactions are executed, and results are shipped back to the client.

Requests specified in SQL, and communicated to the server through a remote procedure call (RPC) mechanism.

Transactional RPC allows many RPC calls to collectively form a transaction.

Open Database Connectivity (ODBC) is an application program interface standard from Microsoft for connecting to a server, sending SQL requests, and receiving results.

Transaction Servers

Distributed systems

Data spread over multiple machines (also referred to as workstations or personal computers connected to back-end server machines:sites or nodes.

Network interconnects the machines

Data shared by users on multiple machines

Differentiate between local and global transactions

A local transaction accesses data in the single site at which the transaction was initiated.

A global transaction either accesses data in a site different from the one at which the transaction was initiated or accesses data in several different sites.

Distributed Systems

Trade offs in distributed systems

Sharing data – users at one site able to access the data residing at some other sites.

Autonomy – each site is able to retain a degree of control over data stored locally.

Higher system availability through redundancy — data can be replicated at remote sites, and system can function even if a site fails.

Disadvantage: added complexity required to ensure proper coordination among sites.

Software development cost.

Greater potential for bugs.

Increased processing overhead.

Trade-offs in Distributed Systems

Topic 9 new applications

Decision-Support Systems residing at some other sites.

Data Analysis

Data Mining

Data Warehousing

Spatial and Geographic Databases

Multimedia Databases

Mobility and Personal Databases

Information-Retrieval Systems

Topic 9: New Applications

Decision support system

Decision-Support systems are used to make business decisions often based on data collected by on-line transaction-processing systems.

Examples of business decisions:

what items to stock?

What insurance premium to change?

Who to send advertisements to?

Examples of data used for making decisions

Retail sales transaction details

Customer profiles (income, age, sex, etc.)

Decision Support System

Decision support systems overview

Data analysis tasks are simplified by SQL extensions. often based on data collected by on-line transaction-processing systems.

Statistical analysis packages (e.g., S++) can be interfaced with databases

Statistical analysis is a large field, will not study it here

Data mining seeks to discover knowledge automatically in the form of statistical rules and patterns from Large databases.

A data warehouse archives information gathered from multiple sources, and stores it under a unified schema, at a single site.

Decision-Support Systems: Overview

Data analysis

Aggregate functions summarize large volumes of data often based on data collected by on-line transaction-processing systems.

A histogram partitions the values taken by an attribute into ranges, and computes an aggregate over the values in each range; cumbersome to use standard SQL to construct a histogram. Extension proposed by Red Brick:

select percentile, avg (balance)

from account

group by N_tile (balance, 10) as percentile

Here, N_tile(balance,10) divides the values for balance into 10 consecutive ranges, with an equal number of values in each range, duplicates are not eliminated.

Data Analysis

Data Analysis (Cont.) often based on data collected by on-line transaction-processing systems.




















  • Cross-tabulation of number by size and color of sample relation sales with the schema Sales(color, size, number).

Data analysis cont

Can represent subtotals in relational form by using the value all

E.g. : obtain (Light, all, 53) and (Dark, all, 35) by aggregating individual tuples with different values for size for each color.

Data Analysis (Cont.)








































Data analysis cont1

Rollup value all: Moving from finer-granularity data to a coarser granularity by means of aggregation.

Drill down: Moving from coarser-granularity data to finer-granularity data.

Proposed extensions to SQL, such as the cube operation help to support generation of summary data

The following query generates the previous table.

selectcolor, size, sum (number)


groupbycolor, sizewith cube

Data Analysis (Cont.)

Data analysis cont2

Figure shows the combinations of dimensions size, color, price

In general computing cube operation with n groupby columns gives 2n different groupby combinations.

Data Analysis (Cont.)

Data mining

Like knowledge discovery in artificial intelligence data mining discovers statistical rules and patterns it differs from machine learning in that it deals with large volumes of data stored primarily on disk.

Knowledge discovered from a database can be represented by a set of rules.

e.g.,: “Young women with annual incomes greater than $50,000 are most likely to buy sports cars”

Discover rules using one of two models:

1. The user is involved directly in the process of knowledge discovery.

2. The system is responsible for automatically discovering knowledge from the database by detecting patterns and correlation's in the data.

Data Mining

Knowledge representation using rules

General form of rules mining discovers statistical rules and patterns it differs from machine learning in that it deals with large volumes of data stored primarily on disk. X antecedent  consequent

X is a list of one or more variables with associated ranges.

The rule  transactions T, buys (T, bread)  buys(T, milk) states: if there is a tuple (t1, bread) in the relation buys, there must also be a tuple (t1, milk) in the relation buys.

Population: Cross-product of the ranges of the variables in the rule.

In the above example, the set of all transactions.

Support: Measure of what fraction of the population satisfies both the antecedent and the consequent of the rule.

e.g., 10% of transactions buy bread and milk.

Confidence : Measure of how often the consequent is true when the antecedent is true.

e.g., 80% of transactions that buy bread also buy milk

Knowledge Representation Using Rules

Some classes of data mining problems

Classification : Finding rules that partition the given data into disjoint groups (classes) that are relevant for making a decision

(e.g.,: which of several factors help classify a person’s credit worthiness).

Associations: Useful to determine associations between different items

(e.g.,: someone who buys bread is quite likely also to buy milk).

Sequence correlations: determine correlations between related sequence data.

(e.g., : when bond rates go up stock prices go down within two days.)

Some Classes of Data-Mining Problems

User guided data mining

In user-guided data mining, primary responsibility for discovering rules is with the user.

User may runs tests on the database to verify or refute a hypothesis. Confidence and support for rules expressing a hypothesis are derived from the database.

An iterative process of forming and refining rules is used. Example: Test the hypothesis “People who hold master’s degrees are the most likely to have an excellent credit rating.” If confidence of rule is low, may refine it into the rule:

 peopleP, = MastersandP.income  75, 000

 = excellent

Data-visualization though graphical representations like maps, charts, and color-coding, helps detect patterns in data

User-Guided Data Mining

Classification rules

Classification rules help assign new objects to a set of classes. E.g., given a new automobile insurance applicant, should he or she be classified as low risk, medium risk or high risk?

Classification rules for above example could use a variety of knowledge, such as educational level of applicant, salary of applicant, age of applicant, etc.

Classification rules can be compactly shown as a Classification tree.

Classification Rules

Part of credit risk classification tree
Part of Credit Risk Classification Tree classes. E.g., given a new automobile insurance applicant, should he or she be classified as low risk, medium risk or high risk?

Discovery of classification rules

Training set: a data sample in which the grouping for each tuple is already known.

Top down generation of classification tree.

Each internal node of the tree partitions the data into groups based on the attribute.

The data at a node is not partitioned further if either all (or most) of the items at the node belong to the same class, or all attributes have been considered. Such a node is a leaf node.

Otherwise the data at the node is partitioned further by picking an attribute for partitioning data at the node.

Discovery of Classification Rules*

Discovery of classification rules cont

Consider credit risk example: Suppose degree is chosen to partition the data at the root. Since degree has a small number of possible values, one child is created for each value.

At each child node of the root, further classification is done if required. Here, partitions are defined by income. Since income is a continuous attribute, some number of intervals are chosen, and one child created for each interval.

Different classification algorithms use different ways of choosing which attribute to partition on at each node, and what the intervals, if any, are.

In general, different branches of the tree could grow to different levels. Different nodes at the same level may use different partitioning attributes.

Discovery of Classification Rules *(Cont.)

Discovery of association rules

Example: partition the data at the root. Since degree has a small number of possible values, one child is created for each value. transactions T, buys (T, bread)  buys (T, milk)

In general: notion of transaction , and its itemset, the set of items contained in the transaction

General form of rule:

 transactions T, c(T, i1) and . . . and c(T, in)  c(T, i0)

where c(T, ik) denotes that transaction T contains item ik.

Above can be represented as A  b where A = {i1, i2, . . . , in}

and b = i0.

Support of rule = number of transactions whose itemsets contain A  {b}

Usually desire rules with strong support, which will involve only items purchased in a significant percentage of the transactions.

Discovery of Association Rules

Discovery of association rules cont

Consider all possible sets of relevant items. partition the data at the root. Since degree has a small number of possible values, one child is created for each value.

For each set find its support (i.e. , how many transactions purchase all items in the set).

Use sets with sufficiently high support to generate association rules.

From set A generate the rules A - {b} b for each b  A.

Support of each of the rules is support of A.

Confidence of a rule is support of A divided by support of A - {b}.

Discovery of Association Rules (Cont.)

Finding support

Few sets: Determine level of support via a single pass. partition the data at the root. Since degree has a small number of possible values, one child is created for each value.

A count is maintained for each set, initially set to 0.

When a transaction is fetched, the count is incremented for each set of items which contained in the itemset of the transaction.

Sets with a high count at the end of the pass correspond to items with a high degree of association.

Many sets: If memory not enough to hold all counts for all sets Use multiple passes, considering only some sets in each pass.

Optimization: Once a set is eliminated because it occurs in too small a fraction of the transactions, none of its supersets needs to be considered.

Finding Support

Data warehousing

A data warehouse is a repository of information gathered from multiple sources.

Data Warehousing

Data warehousing cont

Provides a single consolidated interface to data from multiple sources.

Data stored for an extended period, providing access to historical data

Data/updates are periodically downloaded from online transaction processing (OLTP) systems.

Typically, download happens each night.

Data may not be completely up-to-date, but is recent enough for analysis.

Running large queries at the warehouse ensures that OLTP systems are not affected by the decision-support workload.

Data Warehousing (Cont.)

Issues in building a warehouse

When and how to gather data. from multiple sources.

Source driven: data source initiates data transfer

Destination driven: warehouse initiates data transfer

What schema to use.

Schema integration

Cleaning and conversion of incoming data

What data to summarize.

Raw data may be too large to store on-line

Aggregate values (totals/subtotals) often suffice

Queries on raw data can often be transformed by query optimizer to use aggregate values

How to propagate updates.

Date at warehouse is a view on source data

Efficient view maintenance techniques required

Issues in Building a Warehouse

Spatial and geographic databases

Spatial databases store information related to spatial locations, and support efficient storage, indexing and querying of spatial data.

Special purpose index structures are important for accessing spatial data, and for processing spatial join queries.

Design databases (or CAD databases) store design information about how objects are constructed e.g.: designs of buildings, aircraft, layouts of integrated-circuits

Geographic databases store geographic information (e.g., maps): often called geographic information systems or GIS.

Spatial and Geographic Databases

Representation of geometric information

Various geometric constructs can be represented in a database in a normalized fashion.

Represent a line segment by the coordinates of its endpoints.

Approximate a curve by partitioning it into a sequence of segments; represents each segment as a separate tuple that also carries with it the identifier of the curve.

Closed polygons: list its vertices in order, starting vertex is the same as the ending vertex.

Alternative: triangulation — divide polygon into triangles; give the polygon an identifier with each of its triangles.

Representation of Geometric Information

Representation of geometric constructs cont
Representation of Geometric Constructs (Cont.) database in a normalized fashion.

Representation of geometric information cont

Representation of points and line segment in 3-D similar to 2-D, except that points have an extra z component

Represent arbitrary polyhedra by dividing them into tetrahedrons, like triangulating polygons.

Alternative: List their faces, each of which is a polygon, along with an indication of which side of the face is inside the polyhedron.

Representation of Geometric Information (Cont.)

Design databases

Represent design components as objects (generally geometric objects); the connections between the objects indicate how the design is structured.

Simple two-dimensional objects: points, lines, triangles, rectangles, polygons.

Complex two-dimensional objects: formed from simple objects via union, intersection, and difference operations.

Complex three-dimensional objects: formed from simpler objects such as spheres, cylinders, and cuboids, by union, intersection, and difference operations.

Wireframe models represent three-dimensional surfaces as a set of simpler objects.

Design Databases

Representation of geometric constructs

Design databases also store non-spatial information about objects (e.g., construction material, color, etc.)

Spatial integrity constraints are important.

E.g., pipes should not intersect, wires should not be too close to each other, etc.

Representation of Geometric Constructs

(a) Difference of cylinders

(b) Union of cylinders

Geographic data

Raster data consist of bit maps or pixel maps, in two or more dimensions.

Example 2-D raster image: satellite image of cloud cover, where each pixel stores the cloud visibility in a particular area.

Additional dimensions might include the temperature at different altitudes at different regions, or measurements taken at different points in time.

Design databases generally do not store raster data.

Geographic Data

Geographic data cont

Vector data are constructed from basic geometric objects: points, line segments, triangles, and other polygons in two dimensions, and cylinders, speheres, cuboids, and other polyhedrons in three dimensions.

Vector format often used to represent map data.

Roads can be considered as two-dimensional and represented by lines and curves.

Some features, such as rivers, may be represented either as complex curves or as complex polygons, depending on whether their width is relevant.

Features such as regions and lakes can be depicted as polygons.

Geographic Data (Cont.)

Applications of geographic data

Examples of geographic data points, line segments, triangles, and other polygons in two dimensions, and cylinders, speheres, cuboids, and other polyhedrons in three dimensions.

map data for vehicle navigation

distribution network information for power, telephones, water supply, and sewage

Vehicle navigation systems store information about roads and services for the use of drivers:

Spatial data: e.g, road/restaurant/gas-station coordinates

Non-spatial data: e.g., one-way streets, speed limits, traffic congestion

Global Positioning System or GPS unit - utilizes information broadcast from GPS satellites to find the current location of user with an accuracy of tens of meters.

increasingly used in vehicle navigation systems as well as utility maintenance applications.

Applications of Geographic Data

Spatial queries

Nearness queries request objects that lie near a specified location.

Nearest neighbor queries, given a point or an object, find the nearest object that satisfies given conditions.

Region queries deal with spatial regions. e.g., ask for objects that lie partially or fully inside a specified region.

Queries that compute intersections or unions of regions.

Spatial join of two spatial relations with the location playing the role of join attribute.

Spatial Queries

Spatial queries cont

Spatial data is typically queried using a graphical query language; results are also displayed in a graphical manner.

Graphical interface constitutes the front-end

Extensions of SQL with abstract data types, such as lines, polygons and bit maps, have been proposed to interface with back-end.

allows relational databases to store and retrieve spatial information

queries can mix spatial and nonspatial conditions

extensions also include and allowing spatial conditions (contains or overlaps).

Spatial Queries (Cont.)

Multimedia databases

To provide such database functions as indexing and consistency, it is desirable to store multimedia data in a database (rather than storing them outside the database, in a file system).

The database must handle large object representation.

Similarity-based retrieval must be provided by special index structures.

Must provide guaranteed steady retrieval rates for continuos-media data.

Multimedia Databases

Similarity based retrieval

Pictorial data - Two pictures or images that are slightly different as represented in the database may be considered the same by a user.

E.g., identify similar designs for registering a new trademark.

Audio data- Speech-based user interfaces allow the user to give a command or identify a data item by speaking.

E.g., test user input against stored commands.

Handwritten data -- Identify a handwritten data item or command stored in the database (requires similarity testing).

Similarity-Based Retrieval

Continuos media data

Most important types are video and audio data. different as represented in the database may be considered the same by a user.

Characterized by high data volumes and real-time information-delivery requirements.

Data must be delivered sufficiently fast that there are no gaps in the audio or video.

Data must be delivered at a rate that does not cause overflow of system buffers.

Synchronization among distinct data streams must be maintained (video of a person speaking must show lips moving synchronously with the audio).

Continuos-Media Data

Multimedia data formats

Store and transmit multimedia data in compressed form;JPEG is the most widely used format for image data.

Encoding each frame of a video using JPEG is wasteful since, successive frames of a video are often nearly the same.

MPEG standards use commonalties among a sequence of frames to achieve a greater degree of compression.

MPEG-1 stores a minute of 30-frame-per-second video and audio in approximately 12.5 MB (compares with 75 MB for video using only JPEG); quality comparable to VHS video tape.

MPEG-2 designed for digital broadcast systems and DVD; negligible loss of video quality. Compresses 1 minute of audio-video to approximately 17 MB.


Multimedia Data Formats

Video servers

Current video-on-demand servers are based on file systems; existing database systems do not meet real-time response requirements.

Multimedia data are stored on several disks (RAID configuration), or on tertiary storage for less frequently accessed data.

Head-end terminals - used to view multimedia data; PCs or TVs attached to a small, inexpensive computer called a set-top box.

Network - Transmission of multimedia data from a server to multiple head-end terminals requires a high-capacity network, such as asynchronous-transfer-mode (ATM) network.

Video Servers

Mobility and personal databases

A mobile computing environment consists of existing database systems do not meet real-time response computers, referred to as mobile hosts, and a wired network of computers.

Mobile host may be able to communicate with wired network through a wireless digital communication network.

A model for mobile communication

Mobile hosts communicate to the wired network via computers referred to as mobile support stations.

Each mobile support station manages those mobile hosts within its cell.

When mobile hosts move between cells, there is a handoff of control from one mobile support station to another.

Mobility and Personal Databases

Database issues in mobile computing

New issues for query optimization. existing database systems do not meet real-time response requirements.

energy (battery power) is a scarce resource and its usage must be minimized

mobile user’s locations may be a parameter of the query.

Users may need to be able to perform database updates even while the mobile computer is disconnected.

e.g., mobile salesman records sale of products on (local copy of) database.

Can result in conflicts detected on reconnection, which may need to be resolved manually.

Database Issues in Mobile Computing

Disconnectivity and consistency

A mobile host may remain in operation during periods of disconnection.

Problems created if the user of the mobile host issues queries and updates on data that resides or is cached locally:

Recoverability: Updates entered on a disconnected machine may be lost if the mobile host fails. Since the mobile host represents a single point of failure, stable storage cannot be simulated well.

Consistency : Cached data may become out of date, but the mobile host cannot discover this until it is reconnected.

Disconnectivity and Consistency

Information retrieval systems

Information retrieval disconnection. (IR) systems use a simpler data model than database systems, but provide more powerful querying capabilities within the restricted model.

Queries attempt to locate documents that are of interest by specifying, for example, sets of keywords.

e.g., find documents containing the words “database systems”

Information retrieval systems order answers based on their estimated relevance.

e.g., user may really only want documents about database systems, but the system may retrieve all documents that mention the phrase “database systems”.

Documents may be ordered by, for example, how many times the phrase appears in the document.

Information Retrieval Systems

Differences from database systems

Information retrieval systems, unlike traditional database systems, handle:

Unstructured documents

Searching using keywords and relevance ranking

Most information retrieval systems do not handle:

High update rates

Concurrency control

Data structured using more complex data models (e.g., relational or object oriented data models)

Complex queries written in, e.g., SQL

Differences From Database Systems

Topic 10 xml
Topic 10: XML systems, handle:

  • XML: Extensible Markup Language

  • Defined by the WWW Consortium (W3C)

  • Originally intended as a document markup language not a database language

    • Documents have tags giving extra information about sections of the document

      • E.g. <title> XML </title> <slide> Introduction …</slide>

    • Derived from SGML (Standard Generalized Markup Language), but simpler to use than SGML

    • Extensible, unlike HTML

      • Users can add new tags, and separately specify how the tag should be handled for display

    • Goal was (is?) to replace HTML as the language for publishing documents on the Web

Xml introduction cont
XML Introduction (Cont.) systems, handle:

  • The ability to specify new tags, and to create nested tag structures made XML a great way to exchange data, not just documents.

    • Much of the use of XML has been in data exchange applications, not as a replacement for HTML

  • Tags make data (relatively) self-documenting

    • E.g.<bank>


      <account-number> A-101 </account-number>

      <branch-name> Downtown </branch-name>

      <balance> 500 </balance>



      <account-number> A-101 </account-number>

      <customer-name> Johnson </customer-name>



Xml motivation
XML: Motivation systems, handle:

  • Data interchange is critical in today’s networked world

    • Examples:

      • Banking: funds transfer

      • Order processing (especially inter-company orders)

      • Scientific data

        • Chemistry: ChemML, …

        • Genetics: BSML (Bio-Sequence Markup Language), …

    • Paper flow of information between organizations is being replaced by electronic flow of information

  • Each application area has its own set of standards for representing information

  • XML has become the basis for all new generation data interchange formats

Xml motivation cont
XML Motivation (Cont.) systems, handle:

  • Earlier generation formats were based on plain text with line headers indicating the meaning of fields

    • Similar in concept to email headers

    • Does not allow for nested structures, no standard “type” language

    • Tied too closely to low level document structure (lines, spaces, etc)

  • Each XML based standard defines what are valid elements, using

    • XML type specification languages to specify the syntax

      • DTD (Document Type Descriptors)

      • XML Schema

    • Plus textual descriptions of the semantics

  • XML allows new tags to be defined as required

    • However, this may be constrained by DTDs

  • A wide variety of tools is available for parsing, browsing and querying XML documents/data

Structure of xml data
Structure of XML Data systems, handle:

  • Tag: label for a section of data

  • Element: section of data beginning with <tagname> and ending with matching </tagname>

  • Elements must be properly nested

    • Proper nesting

      • <account> … <balance> …. </balance> </account>

    • Improper nesting

      • <account> … <balance> …. </account> </balance>

    • Formally: every start tag must have a unique matching end tag, that is in the context of the same parent element.

  • Every document must have a single top-level element

Example of nested elements

<bank-1> systems, handle:<customer>

<customer-name> Hayes </customer-name>

<customer-street> Main </customer-street>

<customer-city> Harrison </customer-city>


<account-number> A-102 </account-number>

<branch-name> Perryridge </branch-name>

<balance> 400 </balance>




</customer> . .


Example of Nested Elements

Motivation for nesting
Motivation for Nesting systems, handle:

  • Nesting of data is useful in data transfer

    • Example: elements representing customer-id, customer name, and address nested within an order element

  • Nesting is not supported, or discouraged, in relational databases

    • With multiple orders, customer name and address are stored redundantly

    • normalization replaces nested structures in each order by foreign key into table storing customer name and address information

    • Nesting is supported in object-relational databases

  • But nesting is appropriate when transferring data

    • External application does not have direct access to data referenced by a foreign key

Structure of xml data cont

Mixture of text with sub-elements is legal in XML. systems, handle:



This account is seldom used any more.

<account-number> A-102</account-number>

<branch-name> Perryridge</branch-name>

<balance>400 </balance></account>

Useful for document markup, but discouraged for data representation

Structure of XML Data (Cont.)


Elements can have systems, handle:attributes

<account acct-type = “checking” >

<account-number> A-102 </account-number>

<branch-name> Perryridge </branch-name>

<balance> 400 </balance>


Attributes are specified by name=value pairs inside the starting tag of an element

An element may have several attributes, but each attribute name can only occur once

<account acct-type = “checking” monthly-fee=“5”>


Attributes vs subelements
Attributes Vs. Subelements systems, handle:

  • Distinction between subelement and attribute

    • In the context of documents, attributes are part of markup, while subelement contents are part of the basic document contents

    • In the context of data representation, the difference is unclear and may be confusing

      • Same information can be represented in two ways

        • <account account-number = “A-101”> …. </account>

        • <account> <account-number>A-101</account-number> … </account>

    • Suggestion: use attributes for identifiers of elements, and use subelements for contents

More on xml syntax
More on XML Syntax systems, handle:

  • Elements without subelements or text content can be abbreviated by ending the start tag with a /> and deleting the end tag

    • <account number=“A-101” branch=“Perryridge” balance=“200 />

  • To store string data that may contain tags, without the tags being interpreted as subelements, use CDATA as below

    • <![CDATA[<account> … </account>]]>

      • Here, <account> and </account> are treated as just strings

Namespaces systems, handle:

  • XML data has to be exchanged between organizations

  • Same tag name may have different meaning in different organizations, causing confusion on exchanged documents

  • Specifying a unique string as an element name avoids confusion

  • Better solution: use unique-name:element-name

  • Avoid using long unique names all over document by using XML Namespaces

    <bank Xmlns:FB=‘’> …



    <FB:branchcity> Brooklyn </FB:branchcity>



Xml document schema
XML Document Schema systems, handle:

  • Database schemas constrain what information can be stored, and the data types of stored values

  • XML documents are not required to have an associated schema

  • However, schemas are very important for XML data exchange

    • Otherwise, a site cannot automatically interpret data received from another site

  • Two mechanisms for specifying XML schema

    • Document Type Definition (DTD)

      • Widely used

    • XML Schema

      • Newer, increasing use

Document type definition dtd
Document Type Definition (DTD) systems, handle:

  • The type of an XML document can be specified using a DTD

  • DTD constraints structure of XML data

    • What elements can occur

    • What attributes can/must an element have

    • What subelements can/must occur inside each element, and how many times.

  • DTD does not constrain data types

    • All values represented as strings in XML

  • DTD syntax

    • <!ELEMENT element (subelements-specification) >

    • <!ATTLIST element (attributes) >

Element specification in dtd
Element Specification in DTD systems, handle:

  • Subelements can be specified as

    • names of elements, or

    • #PCDATA (parsed character data), i.e., character strings

    • EMPTY (no subelements) or ANY (anything can be a subelement)

  • Example

    <! ELEMENT depositor (customer-name account-number)>

    <! ELEMENT customer-name (#PCDATA)>

    <! ELEMENT account-number (#PCDATA)>

  • Subelement specification may have regular expressions

    <!ELEMENT bank ( ( account | customer | depositor)+)>

    • Notation:

      • “|” - alternatives

      • “+” - 1 or more occurrences

      • “*” - 0 or more occurrences

Bank dtd

<!DOCTYPE bank [ systems, handle:

<!ELEMENT bank ( ( account | customer | depositor)+)>

<!ELEMENT account (account-number branch-name balance)>

<! ELEMENT customer(customer-name customer-street customer-city)>

<! ELEMENT depositor (customer-name account-number)>

<! ELEMENT account-number (#PCDATA)>

<! ELEMENT branch-name (#PCDATA)>

<! ELEMENT balance(#PCDATA)>

<! ELEMENT customer-name(#PCDATA)>

<! ELEMENT customer-street(#PCDATA)>

<! ELEMENT customer-city(#PCDATA)>


Bank DTD

Attribute specification in dtd

Attribute specification : for each attribute systems, handle:


Type of attribute


ID (identifier) or IDREF (ID reference) or IDREFS (multiple IDREFs)

more on this later


mandatory (#REQUIRED)

has a default value (value),

or neither (#IMPLIED)


<!ATTLIST account acct-type CDATA “checking”>

<!ATTLIST customer

customer-id ID # REQUIRED

accounts IDREFS # REQUIRED >

Attribute Specification in DTD

Ids and idrefs
IDs and IDREFs systems, handle:

  • An element can have at most one attribute of type ID

  • The ID attribute value of each element in an XML document must be distinct

    • Thus the ID attribute value is an object identifier

  • An attribute of type IDREF must contain the ID value of an element in the same document

  • An attribute of type IDREFS contains a set of (0 or more) ID values. Each ID value must contain the ID value of an element in the same document

Bank dtd with attributes
Bank DTD with Attributes systems, handle:

  • Bank DTD with ID and IDREF attribute types.

    <!DOCTYPE bank-2[

    <!ELEMENT account (branch, balance)>

    <!ATTLIST account

    account-number ID # REQUIRED

    owners IDREFS # REQUIRED>

    <!ELEMENT customer(customer-name, customer-street,


    <!ATTLIST customer

    customer-id ID # REQUIRED

    accounts IDREFS # REQUIRED>

    … declarations for branch, balance, customer-name, customer-street and customer-city]>

Xml data with id and idref attributes

<bank-2> systems, handle:

<account account-number=“A-401” owners=“C100 C102”>

<branch-name> Downtown </branch-name>

<balance> 500 </balance>


<customer customer-id=“C100” accounts=“A-401”>

<customer-name>Joe </customer-name>

<customer-street> Monroe </customer-street>

<customer-city> Madison</customer-city>


<customer customer-id=“C102” accounts=“A-401 A-402”>

<customer-name> Mary </customer-name>

<customer-street> Erin </customer-street>

<customer-city> Newark </customer-city>



XML data with ID and IDREF attributes

Limitations of dtds
Limitations of DTDs systems, handle:

  • No typing of text elements and attributes

    • All values are strings, no integers, reals, etc.

  • Difficult to specify unordered sets of subelements

    • Order is usually irrelevant in databases

    • (A | B)* allows specification of an unordered set, but

      • Cannot ensure that each of A and B occurs only once

  • IDs and IDREFs are untyped

    • The owners attribute of an account may contain a reference to another account, which is meaningless

      • owners attribute should ideally be constrained to refer to customer elements

Xml schema
XML Schema systems, handle:

  • XML Schema is a more sophisticated schema language which addresses the drawbacks of DTDs. Supports

    • Typing of values

      • E.g. integer, string, etc

      • Also, constraints on min/max values

    • User defined types

    • Is itself specified in XML syntax, unlike DTDs

      • More standard representation, but verbose

    • Is integrated with namespaces

    • Many more features

      • List types, uniqueness and foreign key constraints, inheritance ..

  • BUT: significantly more complicated than DTDs, not yet widely used.

Xml schema version of bank dtd

<xsd:schema xmlns:xsd= systems, handle:>

<xsd:element name=“bank” type=“BankType”/>

<xsd:element name=“account”><xsd:complexType> <xsd:sequence> <xsd:element name=“account-number” type=“xsd:string”/> <xsd:element name=“branch-name” type=“xsd:string”/> <xsd:element name=“balance” type=“xsd:decimal”/> </xsd:squence></xsd:complexType>


…..definitions of customer and depositor ….


<xsd:element ref=“account” minOccurs=“0” maxOccurs=“unbounded”/>

<xsd:element ref=“customer” minOccurs=“0” maxOccurs=“unbounded”/>

<xsd:element ref=“depositor” minOccurs=“0” maxOccurs=“unbounded”/>




XML Schema Version of Bank DTD

Querying and transforming xml data
Querying and Transforming XML Data systems, handle:

  • Translation of information from one XML schema to another

  • Querying on XML data

  • Above two are closely related, and handled by the same tools

  • Standard XML querying/translation languages

    • XPath

      • Simple language consisting of path expressions

    • XQuery

      • An XML query language with a rich set of features

  • Wide variety of other languages have been proposed, and some served as basis for the Xquery standard

    • XML-QL, Quilt, XQL, …

Tree model of xml data
Tree Model of XML Data systems, handle:

  • Query and transformation languages are based on a tree model of XML data

  • An XML document is modeled as a tree, with nodes corresponding to elements and attributes

    • Element nodes have children nodes, which can be attributes or subelements

    • Text in an element is modeled as a text node child of the element

    • Children of a node are ordered according to their order in the XML document

    • Element and attribute nodes (except for the root node) have a single parent, which is an element node

    • The root node has a single child, which is the root element of the document

  • We use the terminology of nodes, children, parent, siblings, ancestor, descendant, etc., which should be interpreted in the above tree model of XML data.

XPath systems, handle:

  • XPath is used to address (select) parts of documents usingpath expressions

  • A path expression is a sequence of steps separated by “/”

    • Think of file names in a directory hierarchy

  • Result of path expression: set of values that along with their containing elements/attributes match the specified path

  • E.g. /bank-2/customer/customer-name evaluated on the bank-2 data we saw earlier returns



  • E.g. /bank-2/customer/customer-name/text( )

    returns the same names, but without the enclosing tags

Xpath cont

The initial “/” denotes root of the document (above the top-level tag)

Path expressions are evaluated left to right

Each step operates on the set of instances produced by the previous step

Selection predicates may follow any step in a path, in [ ]

E.g. /bank-2/account[balance > 400]

returns account elements with a balance value greater than 400

/bank-2/account[balance] returns account elements containing a balance subelement

Attributes are accessed using [email protected]

E.g. /bank-2/account[balance > 400][email protected]

returns the account numbers of those accounts with balance > 400

IDREF attributes are not dereferenced automatically (more on this later)

XPath (Cont.)

Functions in xpath
Functions in XPath top-level tag)

  • XPath provides several functions

    • The function count() at the end of a path counts the number of elements in the set generated by the path

      • E.g. /bank-2/account[customer/count() > 2]

        • Returns accounts with > 2 customers

    • Also function for testing position (1, 2, ..) of node w.r.t. siblings

  • Boolean connectives and and or and function not() can be used in predicates

  • IDREFs can be referenced using function id()

    • id() can also be applied to sets of references such as IDREFS and even to strings containing multiple references separated by blanks

    • E.g. /bank-2/account/id(@owner)

      • returns all customers referred to from the owners attribute of account elements.

More xpath features

Operator “|” used to implement union top-level tag)

E.g. /bank-2/account/id(@owner) |/bank-2/loan/id(@borrower)

gives customers with either accounts or loans

However, “|” cannot be nested inside other operators.

“//” can be used to skip multiple levels of nodes

E.g. /bank-2//customer-name

finds any customer-name element anywhere under the /bank-2 element, regardless of the element in which it is contained.

A step in the path can go to:

parents, siblings, ancestors and descendants

of the nodes generated by the previous step, not just to the children

“//”, described above, is a short from for specifying “all descendants”

“..” specifies the parent.

We omit further details,

More XPath Features

XQuery top-level tag)

  • XQuery is a general purpose query language for XML data

  • Currently being standardized by the World Wide Web Consortium (W3C)

    • The textbook description is based on a March 2001 draft of the standard. The final version may differ, but major features likely to stay unchanged.

  • Alpha version of XQuery engine available free from Microsoft

  • XQuery is derived from the Quilt query language, which itself borrows from SQL, XQL and XML-QL

  • XQuery uses a for … let … where .. result … syntaxfor SQL fromwhere  SQL whereresult  SQL selectlet allows temporary variables, and has no equivalent in SQL

Flwr syntax in xquery

For clause uses XPath expressions, and variable in for clause ranges over values in the set returned by XPath

Simple FLWR expression in XQuery

find all accounts with balance > 400, with each result enclosed in an <account-number> .. </account-number> tagfor$x in /bank-2/account let $acctno := [email protected] where $x/balance > 400 return <account-number> $acctno </account-number>

Let clause not really needed in this query, and selection can be done In XPath. Query can be written as:

for $x in /bank-2/account[balance>400] return <account-number> [email protected] </account-number>

FLWR Syntax in XQuery

Path expressions and functions
Path Expressions and Functions clause ranges over values in the set returned by XPath

  • Path expressions are used to bind variables in the for clause, but can also be used in other places

    • E.g. path expressions can be used in let clause, to bind variables to results of path expressions

  • The function distinct( ) can be used to removed duplicates in path expression results

  • The functiondocument(name)returns root of named document

    • E.g. document(“bank-2.xml”)/bank-2/account

  • Aggregate functions such as sum( ) and count( ) can be applied to path expression results

  • XQuery does not support group by, but the same effect can be got by nested queries, with nested FLWR expressions within a result clause

    • More on nested queries later

Joins clause ranges over values in the set returned by XPath

  • Joins are specified in a manner very similar to SQLfor $a in /bank/account,

    $c in/bank/customer,

    $d in /bank/depositor

    where $a/account-number = $d/account-number and $c/customer-name = $d/customer-name

    return <cust-acct> $c $a </cust-acct>

  • The same query can be expressed with the selections specified as XPath selections:

    for $a in /bank/account $c in /bank/customer $d in /bank/depositor[ account-number = $a/account-number and customer-name = $c/customer-name]

    return <cust-acct> $c $a</cust-acct>

Changing nesting structure

The following query converts data from the flat structure for bank information into the nested structure used in bank-1


for $c in /bank/customer




for $d in /bank/depositor[customer-name = $c/customer-name],

$a in /bank/account[account-number=$d/account-number]

return $a



$c/* denotes all the children of the node to which $c is bound, without the enclosing top-level tag

Exercise for reader: write a nested query to find sum of accountbalances, grouped by branch.

Changing Nesting Structure

Xquery path expressions
XQuery Path Expressions for

  • $c/text() gives text content of an element without any subelements/tags

  • XQuery path expressions support the “–>” operator for dereferencing IDREFs

    • Equivalent to the id( ) function of XPath, but simpler to use

    • Can be applied to a set of IDREFs to get a set of results

    • June 2001 version of standard has changed “–>” to “=>”

Sorting in xquery

Sortby for clause can be used at the end of any expression. E.g. to return customers sorted by namefor $c in /bank/customerreturn <customer> $c/* </customer> sortby(name)

Can sort at multiple levels of nesting (sort by customer-name, and by account-number within each customer)

<bank-1>for $c in /bank/customerreturn <customer> $c/*for $d in /bank/depositor[customer-name=$c/customer-name], $a in /bank/account[account-number=$d/account-number]return <account> $a/* </account> sortby(account-number) </customer> sortby(customer-name)


Sorting in XQuery

Functions and other xquery features
Functions and Other XQuery Features for

  • User defined functions with the type system of XMLSchemafunction balances(xsd:string $c) returns list(xsd:numeric) {for $d in /bank/depositor[customer-name = $c], $a in /bank/account[account-number=$d/account-number]return $a/balance


  • Types are optional for function parameters and return values

  • Universal and existential quantification in where clause predicates

    • some $e inpathsatisfiesP

    • every $e inpathsatisfiesP

  • XQuery also supports If-then-else clauses

Application program interface
Application Program Interface for

  • There are two standard application program interfaces to XML data:

    • SAX (Simple API for XML)

      • Based on parser model, user provides event handlers for parsing events

        • E.g. start of element, end of element

        • Not suitable for database applications

    • DOM (Document Object Model)

      • XML data is parsed into a tree representation

      • Variety of functions provided for traversing the DOM tree

      • E.g.: Java DOM API provides Node class with methodsgetParentNode( ), getFirstChild( ), getNextSibling( ) getAttribute( ), getData( ) (for text node) getElementsByTagName( ), …

      • Also provides functions for updating DOM tree

The end of the course thank you for your cooperation
The end of the course for Thank you for your cooperation!