lecture 3 relational algebra and sql l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Lecture 3: Relational Algebra and SQL PowerPoint Presentation
Download Presentation
Lecture 3: Relational Algebra and SQL

Loading in 2 Seconds...

play fullscreen
1 / 90

Lecture 3: Relational Algebra and SQL - PowerPoint PPT Presentation


  • 311 Views
  • Uploaded on

Lecture 3: Relational Algebra and SQL. Tuesday, March 25, 2008. Outline. Relational Algebra: 4.2 (except 4.2.5) SQL: 5.2, 5.3, 5.4. Querying the Database. Goal: specify what we want from our database Find all the employees who earn more than $50,000 and pay taxes in New Jersey.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Lecture 3: Relational Algebra and SQL' - colby


Download Now An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
outline
Outline
  • Relational Algebra: 4.2 (except 4.2.5)
  • SQL: 5.2, 5.3, 5.4
querying the database
Querying the Database
  • Goal: specify what we want from our database
    • Find all the employees who earn more than $50,000 and pay taxes in New Jersey.
  • Could write in C++/Java, but bad idea
  • Instead use high-level query languages:
    • Theoretical: Relational Algebra, Datalog
    • Practical: SQL
  • Relational algebra: a basic set of operations on relations that provide the basic principles.
relational algebra
Relational Algebra
  • Operators: relations as input, new relation as output
  • Five basic RA operators:
    • Set Operators
      • union, difference
      • Selection: s
    • Projection: p
    • Cartesian Product: X
  • Derived operators:
    • Intersection, complement
    • Joins (natural,equi-join, theta join, semi-join)
  • When our relations have attribute names:
    • Renaming: r
set operations union
Set Operations: Union
  • Union: all tuples in R1 or R2
  • Notation: R1 U R2
  • R1, R2 must have the same schema
  • R1 U R2 has the same schema as R1, R2
  • Example:
    • ActiveEmployees U RetiredEmployees
union
Union
  • R  S
    • Union of two relations R and S defines a relation that contains all the tuples of R, or S, or both R and S, duplicate tuples being eliminated.
    • R and S must be union-compatible.
  • If R and S have I and J tuples, respectively, union is obtained by concatenating them into one relation with a maximum of (I + J) tuples.

Example: List all cities where there is either a branch office or a property for rent.

  • Pcity(Branch) union Pcity(PropertyForRent) or R = Branch[city] union PropertyForRent[city]
intersection
Intersection
  • R  S
    • Defines a relation consisting of the set of all tuples that are in both R and S.
    • R and S must be union-compatible.
  • Expressed using basic operations:

R  S = R – (R – S)

Example: List all cities where there is both a branch office and at least one property for rent.

city(Branch) city(PropertyForRent)

difference minus
Difference (Minus)
  • R – S
    • Defines a relation consisting of the tuples that are in relation R, but not in S.
    • R and S must be union-compatible.
  • List all cities where there is a branch office but no properties for rent.

city(Branch) – city(PropertyForRent)

Or R = Branch [city] - PropertyForRent [city]

set operations difference
Set Operations: Difference
  • Difference: all tuples in R1 and not in R2
  • Notation: R1 – R2
  • R1, R2 must have the same schema
  • R1 - R2 has the same schema as R1, R2
  • Example
    • AllEmployees - RetiredEmployees
set operations selection
Set Operations: Selection
  • Returns all tuples which satisfy a condition
  • Notation: sc(R)
  • c is a condition: =, <, >, and, or, not
  • Output schema: same as input schema
  • Find all employees with salary > $40,000:
    • s Salary > 40000(Employee)
restriction or selection
Restriction (or Selection)
  • Works on a single relation R and defines a relation that contains only those tuples (rows) of R that satisfy the specified condition (predicate).

Example: List all staff with a salary greater than US$10,000.

salary > 10000 (Staff) or R = STAFF Where Salary > 10000

projection
Projection
  • Unary operation: returns certain columns
  • Eliminates duplicate tuples !
  • Notation: P A1,…,An(R)
  • Input schema R(B1,…,Bm)
  • Condition: {A1, …, An}  {B1, …, Bm}
  • Output schema S(A1,…,An)
  • Example: project social-security number and names:
    • PSSN, Name (Employee)
projection16
Projection
  • col1, . . . , coln(R)
    • Works on a single relation R and defines a relation that contains a vertical subset of R, extracting the values of specified attributes and eliminating duplicates.
  • Produce a list of salaries for all staff, showing only staffNo, fName, lName, and salary details.

staffNo, fName, lName, salary(Staff) or Staff [staffNo, fName, lName, salary]

cartesian product
Cartesian Product
  • Each tuple in R1 with each tuple in R2
  • Notation: R1 x R2
  • Input schemas R1(A1,…,An), R2(B1,…,Bm)
  • Condition: {A1,…,An} ∩ {B1,…Bm} = F
  • Output schema is S(A1, …, An, B1, …, Bm)
  • Notation: R1 x R2
  • Example: Employee x Dependents
  • Very rare in practice; but joins are very often
cartesian product multiplication
Cartesian Product (Multiplication)
  • R X S
    • Defines a relation that is the concatenation of every tuple of relation R with every tuple of relation S.
  • List the names and comments of all clients who have viewed a property for rent.

(clientNo, fName, lName(Client)) X (clientNo, propertyNo, comment (Viewing)) or

Client [clientNo, fName, lName] x Viewing [clientNo, propertyNo, comment ]

example cartesian product and selection
Example - Cartesian product and Selection
  • Use selection operation to extract those tuples where Client.clientNo = Viewing.clientNo.

sClient.clientNo = Viewing.clientNo((ÕclientNo,fName,lName(Client))  (ÕclientNo,propertyNo,comment(Viewing)))

  • Cartesian product and Selection can be reduced to a single operation called a Join.
renaming
Renaming
  • Does not change the relational instance
  • Changes the relational schema only
  • Notation: rB1,…,Bn (R)
  • Input schema: R(A1, …, An)
  • Output schema: S(B1, …, Bn)
  • Example:
      • LastName, SocSocNo (Employee)
renaming example
Renaming Example

Employee

Name

SSN

John

999999999

Tony

777777777

  • LastName, SocSocNo (Employee)

LastName

SocSocNo

John

999999999

Tony

777777777

derived operations
Derived Operations
  • Intersection can be derived:
    • R1 ∩ R2 = R1 – (R1 – R2)
    • There is another way to express it (later)
  • Most importantly: joins, in many variants
slide25
Join
  • Join is a derivative of Cartesian product.
  • Equivalent to performing a Selection, using join predicate as selection formula, over Cartesian product of the two operand relations.
  • One of the most difficult operations to implement efficiently in an RDBMS and one reason why RDBMSs have intrinsic performance problems.
slide26
Join
  • Join is a derivative of Cartesian product.
  • Equivalent to performing a Selection, using join predicate as selection formula, over Cartesian product of the two operand relations.
  • One of the most difficult operations to implement efficiently in an RDBMS and one reason why RDBMSs have intrinsic performance problems.
  • Various forms of join operation
    • Natural join (defined by Codd)
    • Outer join
    • Theta join
    • Equijoin (a particular type of Theta join)
    • Semijoin
theta join join
Theta join (-join)
  • R FS
    • Defines a relation that contains tuples satisfying the predicate F from the Cartesian product of R and S.
    • The predicate F is of the form R.ai S.bi where  may be one of the comparison operators (<, , >, , =, ).
theta join join28
Theta join (-join)
  • Can rewrite Theta join using basic Selection and Cartesian product operations.

R FS = F(R  S)

  • Degree of a Theta join is sum of degrees of the operand relations R and S. If predicate F contains only equality (=), the term Equijoin is used.
example equijoin
Example - Equijoin
  • List the names and comments of all clients who have viewed a property for rent.

(clientNo, fName, lName(Client)) Client.clientNo = Viewing.clientNo (clientNo, propertyNo, comment(Viewing))

equijoin
EQUIJOIN
  • SELECT table.column, table.column FROM table1, table2 WHERE table1.column1 = table2.column2;
natural join
Natural Join
  • Notation: R1 R2
  • Input Schema: R1(A1, …, An), R2(B1, …, Bm)
  • Output Schema: S(C1,…,Cp)
    • Where {C1, …, Cp} = {A1, …, An} U {B1, …, Bm}
  • Meaning: combine all pairs of tuples in R1 and R2 that agree on the attributes:
    • {A1,…,An} ∩ {B1,…, Bm} (called the join attributes)
  • Equivalent to a cross product followed by selection
  • Example Employee Dependents
slide32

Employee Dependents =

PName, SSN, Dname(sSSN=SSN2(Employee x rSSN2, Dname(Dependents))

Natural Join Example

Employee

Name

SSN

John

999999999

Tony

777777777

Dependents

SSN

Dname

999999999

Emily

777777777

Joe

Name

SSN

Dname

John

999999999

Emily

Tony

777777777

Joe

natural join33
Natural Join
  • List the names and comments of all clients who have viewed a property for rent.

(clientNo, fName, lName(Client)) Join (clientNo, propertyNo, comment(Viewing))

Or Client [clientNo, fName, lName] Join Viewing [clientNo, propertyNo, comment ]

natural join35
Natural Join
  • Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R S ?
  • Given R(A, B, C), S(D, E), what is R S ?
  • Given R(A, B), S(A, B), what is R S ?
theta join
Theta Join
  • A join that involves a predicate
  • Notation: R1 q R2 where q is a condition
  • Input schemas: R1(A1,…,An), R2(B1,…,Bm)
  • {A1,…An} ∩ {B1,…,Bm} = f
  • Output schema: S(A1,…,An,B1,…,Bm)
  • Derived operator:

R1 q R2 = sq (R1 x R2)

semijoin
Semijoin
  • R S = PA1,…,An (R S)
  • Where the schemas are:
    • Input: R(A1,…An), S(B1,…,Bm)
    • Output: T(A1,…,An)
outer join
Outer join
  • To display rows in the result that do not have matching values in the join column, use Outer join.
  • R S
    • (Left) outer join is join in which tuples from R that do not have matching values in common columns of S are also included in result relation.
outer join39
Outer Join
  • To display rows in the result that do not have matching values in the join column, use Outer join.
  • R Left Outer Join S
    • (Left) outer join is join in which tuples from R that do not have matching values in common columns of S are also included in result relation.

Example:

  • Produce a status report on property viewings.

propertyNo, street, city(PropertyForRent)

Left Outer Join Viewing

types of joins
Types of Joins
  • Left Outer Join
    • keep all of the tuples from the “left” relation
    • join with the right relation
    • pad the non-matching tuples with nulls
  • Right Outer Join
    • same as the left, but keep tuples from the “right” relation
  • Full Outer Join
    • same as left, but keep all tuples from both relations
left outer join
Left Outer Join

name phone

name email

R=

S=

  • If we do a left outer join on R and S, and we match on the first column, the result is:

name phone email

example left outer join
Example - Left Outer join
  • Produce a status report on property viewings.

propertyNo, street, city(PropertyForRent)

Viewing

right outer join
Right Outer Join

name email

name phone

R=

S=

  • If we do a right outer join on R and S, and we match on the first column, the result is:

name phone email

full outer join
Full Outer Join

name email

name phone

R=

S=

  • If we do a full outer join on R and S, and we match on the first column, the result is:

name phone email

division
Division
  • Identify all clients who have viewed all properties with three rooms.

(clientNo, propertyNo(Viewing))  (propertyNo(rooms = 3 (PropertyForRent)))

natural join48
Natural join
  • R S
    • An Equijoin of the two relations R and S over all common attributes x. One occurrence of each common attribute is eliminated from the result.
example natural join
Example - Natural join
  • List the names and comments of all clients who have viewed a property for rent.

(clientNo, fName, lName(Client))

(clientNo, propertyNo, comment(Viewing))

semijoin50
Semijoin
  • R F S
    • Defines a relation that contains the tuples of R that participate in the join of R with S.
  • Can rewrite Semijoin using Projection and Join:
    • R F S = A(R F S)
example semijoin
Example - Semijoin
  • List complete details of all staff who work at the branch in Glasgow.

Staff Staff.branchNo = Branch.branchNo and Branch.city = ‘Glasgow’ Branch

division52
Division
  • R  S
    • Defines a relation over the attributes C that consists of set of tuples from R that match combination of every tuple in S.
  • Expressed using basic operations:

T1C(R)

T2C((S X T1) – R)

T  T1 – T2

example division
Example - Division
  • Identify all clients who have viewed all properties with three rooms.

(clientNo, propertyNo(Viewing))  (propertyNo(rooms = 3 (PropertyForRent)))

summary of relational algebra
Summary of Relational Algebra
  • Five basic operators, many derived
  • Combine operators in order to construct queries: relational algebra expressions, usually shown as trees
ra has limitations
RA has Limitations !
  • Cannot compute “transitive closure”
  • Find all direct and indirect relatives of Fred
  • Cannot express in RA !!! Need to write C program
equivalences
Equivalences

The same relational algebraic expression can be written in many different ways. The order in which tuples appear in relations is never significant.

  • A  B <=> B  A
  • A  B <=> B  A
  • A  B <=> B  A
  • (A - B) is not the same as (B - A)
  •  c1 ( c2 (A)) <=>  c2 ( c1 (A)) <=>  c1 ^ c2 (A)
  •  a1(A) <=>  a1( a1,etc(A)) , where etc is any attributes of A.
  • ...
operations on bags and why we care
Operations on Bags (and why we care)
  • Union: {a,b,b,c} U {a,b,b,b,e,f,f} = {a,a,b,b,b,b,b,c,e,f,f}
    • add the number of occurrences
  • Difference: {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b,d}
    • subtract the number of occurrences
  • Intersection: {a,b,b,b,c,c}∩{b,b,c,c,c,c,d} = {b,b,c,c}
    • minimum of the two numbers of occurrences
  • Selection: preserve the number of occurrences
  • Projection: preserve the number of occurrences (no duplicate elimination)
  • Cartesian product, join: no duplicate elimination
sql introduction
SQL Introduction
  • Standard language for querying and manipulating data

Structured Query Language

  • Many standards out there: SQL92, SQL2, SQL3, SQL99
  • Vendors support various subsets of these, but all of what we’ll be talking about.
  • Works on bags, rather than sets
  • Basic construct:SELECT …FROM …WHERE …
selections
Selections

Company(sticker, name, country, stockPrice)

Find all US companies whose stock is > 50:

SELECT *

FROM Company

WHERE country=“USA” AND stockPrice > 50

Output schema: R(sticker, name, country, stockPrice)

selections65
Selections

What you can use in WHERE:

  • attribute names of the relation(s) used in the FROM.
  • comparison operators: =, <>, <, >, <=, >=
  • apply arithmetic operations: stockprice*2
  • operations on strings (e.g., “||” for concatenation).
  • Lexicographic order on strings.
  • Pattern matching: s LIKE p
  • Special stuff for comparing dates and times.
the like operator
The LIKE operator
  • s LIKE p: pattern matching on strings
  • p may contain two special symbols:
    • % = any sequence of characters
    • _ = any single character

Company(sticker, name, address, country, stockPrice)

Find all US companies whose address contains “Mountain”:

SELECT *

FROM Company

WHERE country=“USA” AND

address LIKE “%Mountain%”

  • Needed in the 1st assignment !
projections
Projections

Select only a subset of the attributes

SELECT name, stockPrice

FROM Company

WHERE country=“USA” AND stockPrice > 50

Input schema: Company(sticker, name, country, stockPrice)

Output schema: R(name, stock price)

projections with renamings
Projections with Renamings

Rename the attributes in the resulting table

SELECT name AS company, stockprice AS price

FROM Company

WHERE country=“USA” AND stockPrice > 50

Input schema: Company(sticker, name, country, stockPrice)

Output schema: R(company, price)

eliminating duplicates
Eliminating Duplicates

SELECTDISTINCT country

FROM Company

WHERE stockPrice > 50

Without DISTINCT the result is a bag

ordering the results
Ordering the Results

SELECT name, stockPrice

FROM Company

WHERE country=“USA” AND stockPrice > 50

ORDERBY country, name

Ordering is ascending, unless you specify the DESC keyword.

Ties are broken by the second attribute on the ORDERBY list, etc.

joins
Joins

Product ( pname, price, category, maker)

Purchase (buyer, seller, store, product)

Company (cname, stockPrice, country)

Person( per-name, phoneNumber, city)

Find names of people living in Seattle that bought gizmo products, and the names of the stores they bought from

SELECT per-name, store

FROM Person, Purchase

WHERE per-name=buyer AND city=“Seattle”

AND product=“gizmo”

disambiguating attributes
Disambiguating Attributes

Find names of people buying telephony products:

SELECT Person.name

FROM Person, Purchase, Product

WHERE Person.name=buyer

AND product=Product.name

AND Product.category=“telephony”

Product (name, price, category, maker)

Purchase (buyer, seller, store, product)

Person(name, phoneNumber, city)

tuple variables
Tuple Variables

Find pairs of companies making products in the same category

SELECT product1.maker, product2.maker

FROM Product AS product1, Product AS product2

WHERE product1.category=product2.category

AND product1.maker <> product2.maker

Product ( name, price, category, maker)

tuple variables74
Tuple Variables
  • Tuple variables introduced automatically by the system:
  • Product ( name, price, category, maker)
  • SELECT name
  • FROM Product
  • WHERE price > 100
  • Becomes:
  • SELECT Product.name
  • FROM Product AS Product
  • WHERE Product.price > 100
  • Doesn’t work when Product occurs more than once.
meaning semantics of sql queries
Meaning (Semantics) of SQL Queries

SELECT a1, a2, …, ak

FROM R1 AS x1, R2 AS x2, …, Rn AS xn

WHERE Conditions

1. Nested loops:

Answer = {}

for x1 in R1 do

for x2 in R2 do

…..

for xn in Rn do

if Conditions

then Answer = Answer U {(a1,…,ak)}

return Answer

meaning semantics of sql queries76
Meaning (Semantics) of SQL Queries

SELECT a1, a2, …, ak

FROM R1 AS x1, R2 AS x2, …, Rn AS xn

WHERE Conditions

2. Parallel assignment

Answer = {}

for all assignments x1 in R1, …, xn in Rn do

if Conditions then Answer = Answer U {(a1,…,ak)}

return Answer

Doesn’t impose any order !

meaning semantics of sql queries77
Meaning (Semantics) of SQL Queries

SELECT a1, a2, …, ak

FROM R1 AS x1, R2 AS x2, …, Rn AS xn

WHERE Conditions

3. Translation to Relational algebra:

Pa1,,…,ak ( sConditions (R1 x R2 x … x Rn))

Select-From-Where queries are precisely Select-Project-Join

first unintuitive sqlism
First Unintuitive SQLism

SELECT R.A

FROM R, S, T

WHERE R.A=S.A OR R.A=T.A

Looking for R ∩ (S U T)

But what happens if T is empty?

union intersection difference
Union, Intersection, Difference

(SELECT name

FROM Person

WHERE City=“Seattle”)

UNION

(SELECT name

FROM Person, Purchase

WHERE buyer=name AND store=“The Bon”)

Similarly, you can use INTERSECT and EXCEPT.

You must have the same attribute names (otherwise: rename).

conserving duplicates
Conserving Duplicates

The UNION, INTERSECTION and EXCEPT operators

operate as sets, not bags.

(SELECT name

FROM Person

WHERE City=“Seattle”)

UNION ALL

(SELECT name

FROM Person, Purchase

WHERE buyer=name AND store=“The Bon”)

subqueries
Subqueries

A subquery producing a single tuple:

SELECT Purchase.product

FROM Purchase

WHERE buyer =

(SELECT name

FROM Person

WHERE ssn = “123456789”);

In this case, the subquery returns one value.

If it returns more, it’s a run-time error.

slide82
Can say the same thing without a subquery:

SELECT Purchase.product

FROM Purchase, Person

WHERE buyer = name AND ssn = “123456789”

Is this query equivalent to the previous one ?

subqueries returning relations
Subqueries Returning Relations

Find companies who manufacture products bought by Joe Blow.

SELECT Company.name

FROM Company, Product

WHERE Company.name=maker

AND Product.name IN

(SELECT product

FROM Purchase

WHERE buyer = “Joe Blow”);

Here the subquery returns a set of values

subqueries returning relations84
Subqueries Returning Relations

Equivalent to:

SELECT Company.name

FROM Company, Product, Purchase

WHERE Company.name=maker

AND Product.name = product

AND buyer = “Joe Blow”

Is this query equivalent to the previous one ?

subqueries returning relations85
Subqueries Returning Relations

You can also use: s > ALL R

s > ANY R

EXISTS R

Product ( pname, price, category, maker)

Find products that are more expensive than all those produced

By “Gizmo-Works”

SELECT name

FROM Product

WHERE price > ALL (SELECT price

FROM Purchase

WHERE maker=“Gizmo-Works”)

question for database fans
Question for Database Fans
  • Can we express this query as a single SELECT-FROM-WHERE query, without subqueries ?
  • Hint: show that all SFW queries are monotone (figure out what this means). A query with ALL is not monotone
conditions on tuples
Conditions on Tuples

SELECT Company.name

FROM Company, Product

WHERE Company.name=maker

AND (Product.name,price) IN

(SELECT product, price)

FROM Purchase

WHERE buyer = “Joe Blow”);

correlated queries
Correlated Queries

Movie (title, year, director, length)

Find movies whose title appears more than once.

correlation

SELECT title

FROM Movie AS x

WHERE year < ANY

(SELECT year

FROM Movie

WHERE title = x.title);

Note (1) scope of variables (2) this can still be expressed as single SFW

complex correlated query
Complex Correlated Query

Product ( pname, price, category, maker, year)

  • Find products (and their manufacturers) that are more expensive than all products made by the same manufacturer before 1972

SELECT pname, maker

FROM Product AS x

WHERE price > ALL (SELECT price

FROM Product AS y

WHERE x.maker = y.maker AND y.year < 1972);

Powerful, but much harder to optimize !

exercises write ra and sql expressions
Exercises: write RA and SQL expressions

Product ( pname, price, category, maker)

Purchase (buyer, seller, store, product)

Company (cname, stock price, country)

Person( per-name, phone number, city)

Ex #1: Find people who bought telephony products.

Ex #2: Find names of people who bought American products

Ex #3: Find names of people who bought American products and did

not buy French products

Ex #4: Find names of people who bought American products and they

live in Seattle.

Ex #5: Find people who bought stuff from Joe or bought products

from a company whose stock prices is more than $50.