- By
**glyn** - Follow User

- 66 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about ' CS 319: Theory of Databases: C4' - glyn

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

(provisionary) Content

- Generalities DB
- Temporal Data
- Integrity constraints (FD revisited)
- Relational Algebra (revisited)
- Query optimisation
- Tuple calculus
- Domain calculus
- Query equivalence
- LLJ, DP and applications
- The Askew Wall
- Datalog

Relational model

- E.F. Codd: check wikipedia!
- Papers:
- http://www.cl.cam.ac.uk/Teaching/2003/Databases/codd.pdf
- Reprint at: A relational Model of Data for Large Shared Data Banks
- Relational Completeness of Data Base Sublanguages

Relational Model*

- Structure of Relational Databases
- Fundamental Relational-Algebra-Operations
- Additional Relational-Algebra-Operations

*based on Silberschatz, Korth, Sudarshan Database System Concepts

5th Edition, Chapter 2

Basic Structure

- Formally, given sets D1, D2, …. Dn a relation r is a subset of D1 x D2 x … x DnThus, a relation is a set of n-tuples (a1, a2, …, an) where each ai Di

Example

If

- customer_name = {Jones, Smith, Curry, Lindsay, …} /* Set of all customer names */
- customer_street = {Main, North, Park, …} /* set of all street names*/
- customer_city = {Harrison, Rye, Pittsfield, …} /* set of all city names */

Then r = { (Jones, Main, Harrison), (Smith, North, Rye), (Curry, North, Rye), (Lindsay, Park, Pittsfield) } is a relation over

customer_name x customer_street x customer_city

Attribute Types

- Each attribute of a relation has a name
- The set of allowed values for each attribute is called the domain of the attribute
- Attribute values are (normally) required to be atomic; that is, indivisible
- E.g. the value of an attribute can be an account number, but cannot be a set of account numbers
- Domain is said to be atomic if all its members are atomic
- The special value null is a member of every domain
- The null value causes complications in the definition of many operations
- We shall ignore the effect of null values in our main presentation and consider their effect later

Relation Schema

- A1, A2, …, Anare attributes
- R = (A1, A2, …, An ) is a relation schema

Example:

Customer_schema = (customer_name, customer_street, customer_city)

- r(R) denotes a relationr on the relation schema R

Example:

customer (Customer_schema)

Relation Instance

- The current values (relation instance) of a relation are specified by a table
- An element t of r is a tuple, represented by a row in a table

attributes

(or columns)

customer_name

customer_street

customer_city

Jones

Smith

Curry

Lindsay

Main

North

North

Park

Harrison

Rye

Rye

Pittsfield

tuples

(or rows)

customer

Relations are Unordered

- Order of tuples is irrelevant (tuples may be stored in an arbitrary order)
- Example: account relation with unordered tuples

Database

- A database consists of multiple relations
- Information about an enterprise is broken up into parts, with each relation storing one part of the information

Storing all information as a single relation such as bank(account_number, balance, customer_name, ..)results in

- repetition of information
- e.g.,if two customers own an account (What gets repeated?)
- the need for null values
- e.g., to represent a customer without an account
- Normalization theory deals with how to design relational schemas

Keys

- Let K R
- K is a superkeyof R if values for K are sufficient to identify a unique tuple of each possible relation r(R)

Example: {customer_name, customer_street} and {customer_name} are both superkeys of Customer, if no two customers can possibly have the same name

Keys (Cont.)

- K is a candidate key if K is minimal
- Example: {customer_name} : superkey + no subset of it is a superkey.
- Primary key: a candidate key chosen as the principal means of identifying tuples within a relation

Foreign Keys

- A relation schema may have an attribute that corresponds to the primary key of another relation. The attribute is called a foreign key.

Query Languages

- Language in which user requests information from the database.
- Categories of languages
- Procedural
- Non-procedural, or declarative
- “Pure” languages:
- Relational algebra
- Tuple relational calculus
- Domain relational calculus
- Pure languages form underlying basis of query languages that people use.

Relational Algebra

- Procedural language
- Six basic operators
- select:
- project:
- union:
- set difference: –
- Cartesian product: x
- rename:
- The operators take one or two relations as inputs and produce a new relation as a result.

Select Operation

- Notation: p(r)
- p is called the selection predicate
- Defined as:p(r) = {t | t rand p(t)}

Where p is a formula in propositional calculus consisting of termsconnected by : (and), (or), (not)Each term is one of:

<attribute> op <attribute> or <constant>

where op is one of: =, , >, . <.

- Example of selection:branch_name=“Perryridge”(account)

Project Operation

- Notation: where A1, A2 are attribute names and r is a relation name.
- The result is defined as the relation of k columns obtained by erasing the columns that are not listed
- Duplicate rows removed from result, since relations are sets
- Example: To eliminate the branch_name attribute of accountaccount_number, balance (account)

Union Operation

- Notation: r s
- Defined as: r s = {t | t r or t s}
- For r s to be valid.

1. r,s must have the same arity (same number of attributes)

2. The attribute domains must be compatible (example: 2nd column of r deals with the same type of values as does the 2nd column of s)

- Example: to find all customers with either an account or a loan

customer_name (depositor) customer_name (borrower)

Set Difference Operation

- Notation r – s
- Defined as:

r – s = {t | t rand t s}

- Set differences must be taken between compatible relations.
- r and s must have the same arity
- attribute domains of r and s must be compatible

Cartesian-Product Operation – Example

A

B

C

D

E

- Relations r, s:

1

2

10

10

20

10

a

a

b

b

r

s

A

B

C

D

E

- rxs:

1

1

1

1

2

2

2

2

10

10

20

10

10

10

20

10

a

a

b

b

a

a

b

b

Cartesian-Product Operation

- Notation r x s
- Defined as:

r x s = {t q | t r and q s}

- Assume that attributes of r(R) and s(S) are disjoint. (That is, R S = ).
- If attributes of r(R) and s(S) are not disjoint, then renaming must be used.

Composition of Operations

- Can build expressions using multiple operations
- Example: A=C(r x s)
- r x s
- A=C(r x s)

A

B

C

D

E

1

1

1

1

2

2

2

2

10

10

20

10

10

10

20

10

a

a

b

b

a

a

b

b

A

B

C

D

E

10

10

20

a

a

b

1

2

2

Rename Operation

- Allows us to refer to results of RA expressions & to refer to a relation by more than one name.
- Example: x (E)

returns the expression E under the name X

- expression E under name X, with attributes renamed to A1 , A2 , …., An .

Relational Algebra Operators thus …

- Procedural language with six basic operators
- select:
- project:
- union:
- set difference: –
- Cartesian product: x
- rename:
- The operators take one or two relations as inputs and produce a new relation as a result.

Banking Example

branch (branch_name, branch_city, assets)

customer (customer_name, customer_street, customer_city)

account (account_number, branch_name, balance)

loan (loan_number, branch_name, amount)

depositor (customer_name, account_number)

borrower(customer_name, loan_number)

Example Queries

Find all loans of over $1200

amount> 1200 (loan)

Find the loan number for each loan of an amount greater than $1200

loan_number (amount> 1200 (loan))

Find the names of all customers who have a loan, an account, or both, from the bank

customer_name (borrower) customer_name (depositor)

Example Queries

Find the names of all customers who have a loan at the Perryridge branch.

customer_name (branch_name=“Perryridge”

(borrower.loan_number = loan.loan_number(borrower x loan)))

Find the names of all customers who have a loan at the Perryridge branch but do not have an account at any branch of the bank.

customer_name (branch_name = “Perryridge”

(borrower.loan_number = loan.loan_number(borrower x loan))) – customer_name(depositor)

Example Queries

Find the names of all customers who have a loan at the Perryridge branch.

Query 1

customer_name (branch_name = “Perryridge”

(borrower.loan_number = loan.loan_number (borrower x loan)))

Query 2

customer_name(loan.loan_number = borrower.loan_number ( (branch_name = “Perryridge” (loan)) x borrower))

Example Queries

- Find the largest account balance
- Strategy:
- Find those balances that are not the largest
- Rename account relation as d so that we can compare each account balance with all others
- Use set difference to find those account balances that were not found in the earlier step.
- The query is:

balance(account) - account.balance

(account.balance < d.balance(accountxrd(account)))

Formal Definition

- A basic expression in the relational algebra consists of either one of the following:
- A relation in the database
- A constant relation
- Let E1 and E2 be relational-algebra expressions; the following are all relational-algebra expressions:
- E1 E2
- E1–E2
- E1 x E2
- p (E1), P is a predicate on attributes in E1
- s(E1), S is a list consisting of some of the attributes in E1
- x(E1), x is the new name for the result of E1

Additional Operations

We define additional operations that do not add any power to the

relational algebra, but that simplify common queries.

- Set intersection
- Natural join
- Division
- Assignment

Set-Intersection Operation

- Notation: rs
- Defined as:
- rs = { t | trandts }
- Assume:
- r, s have the same arity
- attributes of r and s are compatible
- Note: rs = r – (r – s)

Natural-Join Operation

- Notation: r s

- Let r and s be relations on schemas R and S respectively. Then, r s is a relation on schema R S obtained as follows:
- Consider each pair of tuples tr from r and ts from s.
- If tr and ts have the same value on each of the attributes in RS, add a tuple t to the result, where
- t has the same value as tr on r
- t has the same value as ts on s

Natural joint example

- Example:

R = (A, B, C, D)

S = (E, B, D)

- Result schema = (A, B, C, D, E)
- rs is defined as:

r.A, r.B, r.C, r.D, s.E (r.B = s.B r.D = s.D (r x s))

Natural Join Operation – Example

- Relations r, s:

B

D

E

A

B

C

D

1

3

1

2

3

a

a

a

b

b

1

2

4

1

2

a

a

b

a

b

r

s

A

B

C

D

E

1

1

1

1

2

a

a

a

a

b

Division Operation

r s

- Notation:
- Suited to queries that include the phrase “for all”*.
- Let r and s be relations on schemas R and S respectively where
- R = (A1, …, Am , B1, …, Bn )
- S = (B1, …, Bn)

The result of r s is a relation on schema

R – S = (A1, …, Am)

r s = { t | t R-S (r) u s ( tu r ) }

Where tu means the concatenation of tuples t and u to produce a single tuple

Another Division Example

- Relations r, s:

A

B

C

D

E

D

E

a

a

a

a

a

a

a

a

a

a

b

a

b

a

b

b

1

1

1

1

3

1

1

1

a

b

1

1

s

r

- r s:

A

B

C

a

a

Division Operation (Cont.)

- Property
- Let q = r s
- Then q is the largest relation satisfying q x s r
- Definition in terms of the basic algebra operationLet r(R) and s(S) be relations, and let S R

r s = R-S (r ) – R-S ( ( R-S(r ) x s ) – R-S,S(r ))

To see why

- R-S,S (r) simply reorders attributes of r
- R-S (R-S(r ) x s ) – R-S,S(r) ) gives those tuples t in R-S(r ) such that for some tuple u s, tu r.

Assignment Operation

- The assignment operation () provides a convenient way to express complex queries.
- Write query as a sequential program consisting of
- a series of assignments
- followed by an expression whose value is displayed as a result of the query.
- Assignment must always be made to a temporary relation variable.
- Example: Write r s as

temp1 R-S (r )temp2 R-S ((temp1 x s ) – R-S,S (r ))result = temp1 – temp2

- The result to the right of the is assigned to the relation variable on the left of the .
- May use variable in subsequent expressions.

Bank Example Queries

- Find the names of all customers who have a loan and an account at the bank.

- Find the name of all customers who have a loan at the bank and the loan amount

customer_name (borrower) customer_name (depositor)

customer_name, loan_number, amount (borrower loan)

Bank Example Queries

- Find all customers who have an account from at least the “Downtown” and the “Uptown” branches.

- Query 1

customer_name (branch_name = “Downtown” (depositoraccount ))

customer_name (branch_name = “Uptown” (depositoraccount))

- Query 2

customer_name, branch_name(depositoraccount) temp(branch_name)({(“Downtown” ), (“Uptown” )})

Note that Query 2 uses a constant relation.

Bank Example Queries

- Find all customers who have an account at all branches located in Brooklyn city.

customer_name, branch_name(depositoraccount) branch_name (branch_city = “Brooklyn” (branch))

Library Case

reservation

book

author

name

department

Initials

title

name

publisher

date

year

cancelled

copy

borrow

name

department

department

from

cpYear

to

present

Library database

book ( ISBN, title, publisher, year )

author (ISBN, initials, name )

copy (barcode, ISBN, department, cpYear, present )

reservation (name, department, ISBN, date, cancelled )

borrow (name, department, barcode, from, to )

Library Questions (RA)

- List all the authors of books of which the library has a copy that has never been borrowed.
- List all the authors of books that have never been borrowed.
- List all the authors of which no book has ever been borrowed.

(simple) Employee database

- employee(person_name, street, city)
- works(person_name, company_name, salary)
- company(company_name, city)
- manages(person_name, manager_name)

Exercise 2.1.c

- Find the names of all employees who earn more than every employee of “SBC”.
- The “more than every...” clause cannot be expressed directly in the relational algebra.
- For all other employees there is an employee of “SBC” earning more. This can be described using a selection on a cartesian product (of works with works).
- We then use the difference to find the correct result.

Solution 2.1.c

person-name (W) –

- person-name( (w.salary<=w2.salary ^ w2.company-name = ‘SBC’ (W xW2(W) ))

Exercise 2.5.e

- Find all companies located in every city in which “SBC” is located.

Case 1: 1 city per company

C.company-name (

(C.city=SBC.city

(C (company )x (SBC.company-name=‘SBC’ (SBC (company ) ) )

Trivial!

Case 2: multiple cities per company

- Find all companies located in every city in which “SBC” is located.
- The “every city...” clause cannot be expressed directly in the relational algebra (except for division).
- Let’s see …

Case 2a: multiple cities: division

- All cities where SBC is located:
- SBCCity = city(company-name=‘SBC’ (company ) )
- Is SBCCity constant?
- Yes!
- So we can use it on the right hand side of the division.
- companies located in every city in which “SBC” is located:
- company SBCCity
- Still quite neat!!

Case 2b: multiple cities: no division

- Find all companies located in every city in which “SBC” is located.
- The “every city...” clause cannot be expressed directly in the relational algebra (except for division).
- For the other companies there is a city in which “SBC” is located and the other company is not.
- The cities where a company is not located are all the cities except the ones where the company is located.
- We thus need 2 times a difference in this query.
- Can also be formulated using a difference operator

Case 2b: multiple cities: no division

- For the other companies there is a city in which “SBC” is located and the other company is not = IntRes.
- E = company-name (company )x SBCCity
- IntRes = E – company
- We want not (IntRes):
- company-name (company ) - company-name (IntRes)

What is the meaning of:

SBCCity

Recognizing Types of Queries

- Identify the type of the following queries, and afterwards also translate them to the algebra (PSJ, union, intersection, difference, division):
- Give the name of customers that have a loan with a branch where they also have an account.
- Give the name of customers who have a loan at a branch where they do not have an account.
- Give the name of customers who have a loan at every branch where they have an account.
- Give the name of customers who have loans only at branches where they have an account.

Reading Queries

5. customer-name(balance>amount(

account depositor borrower loan))

6. branch-name(branch)

branch-name(branch-city customer-city(

branch account depositor customer))

7. X.customer-name (X.account-number = Y.account-number X.customer-name Y.customer-name (X(depositor) Y(depositor)))

Beer Database

- visits(drinker, bar)
- serves(bar, beer)
- likes(drinker, beer).

Beer questions with a difference

- Give all drinkers that visit bars that don’t serve any beer they like
- Give all drinkers that only visit bars that serve a beer they like
- Give all drinkers that only visit bars that serve no beer they like
- Give all drinkers that only visit bars that serve all beers they like (and maybe other beers as well)
- Give all drinkers that only visit bars that only serve beers they like (and thus serve nothing else)

Summary

- We have learned RA
- We have learned to perform simple and more complex queries in RA

Query optimisation

Download Presentation

Connecting to Server..