slide1
Download
Skip this Video
Download Presentation
Database Systems – Set Theory

Loading in 2 Seconds...

play fullscreen
1 / 71

Database Systems – Set Theory - PowerPoint PPT Presentation


  • 149 Views
  • Uploaded on

Database Systems – Set Theory. website. organization. first-year. category. www.zojjed.com. Walking Promotions. 2006. Fiction. www.racewalk.com. Walking Promotions. 1995. Health. www.greattreks.com. Walking Promotions. 2006. Travel. www.twofeetgallery.com. Walking Promotions.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Database Systems – Set Theory' - odessa


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
slide1

Database Systems – Set Theory

website

organization

first-year

category

www.zojjed.com

Walking Promotions

2006

Fiction

www.racewalk.com

Walking Promotions

1995

Health

www.greattreks.com

Walking Promotions

2006

Travel

www.twofeetgallery.com

Walking Promotions

2004

Photograph

www.walkinghealthy.com

Walking Promotions

2002

www.cs.drexel.edu/~jsalvage

Drexel University

2005

Education

RELATIONS

  • A relational database consists of tables, each of which is assigned a unique name.
  • A row in a table represents a relationship among a set of values.
  • A table is a collection of such relationships.
  • Column Headers are commonly referred to as attributes

Websites-Schema=(website, organization, first-year, category)‏

websites relation:

Health

slide2

Database Systems – Set Theory

DOMAINS

  • A Domain is the set of permitted values for a column/attribute.
  • The domain can be any positive number as in the case with first year
  • The domain can be a series of letters up to a maximum number of letters as in the case with organization.
  • The domain can be valid web addresses, whose rules might be slightly more complicated.

If

D1 denotes the set of all websites

D2 denotes the set of all organizations

D3 denotes the set of all first years

D4 denotes the set of all categories

Any row of websites must contain a 4-tuple(v1,v2,v3, v4) where

v1 is a website in the domain D1

v2 is a organization in the domain D2

v3 is year in the domain D3

V4 is a category in the domain D4

Therefore, account is a subset of D1xD2xD3xD4.

slide3

Database Systems – Set Theory

DOMAINS

In general a table must be a subset of D1xD2x…xDn-1xDn

Tables vs. Relations

There exists a close relationship between this language and the terminology used in databases.

Instead of numbers DB’s use names.

Relation -> table

tuple -> row

Websites table has 6 tuples.

slide4

Database Systems – Set Theory

TUPLE NOTATION

If t is a variable denoting the first tuple relationship, then t[website] denotes the website attribute of the tuple t.

t[website] = “www.zojjed.com”

t[organization]=”Walking Promotions”

t[first-year] = 2006

t[category] = “Fiction”

Alternatively, using numbered attributes:

t[1] = “www.zojjed.com”

t[2]= ”Walking Promotions”

t[3] = 2006

t[4] = “Fiction”

t  r, indicate the tuple t is in the relation r

slide5

Database Systems – Set Theory

DOMAINS

It is possible for several attributes to have the same domain, i.e. several attributes can contain the same domain of possible values.

Later we will introduce a customer relation. It has a customer name; if Ialso had an employee table with the field employee name, technically they both have the same domain.

It depends upon how you look at it. If the domain is the set of all possible names, this is true.

What about the domains website and first-year. They are incompatible.

What about website and category? While they both store character strings, the website domain has a more limited domain.

In a set, an attribute may contain the value Null.

For now we will assume they do not.

slide6

Database Systems – Set Theory

DATABASE SCHEMAS

  • Logical design of the database
  • defines the type definition of a variable

DATABASE INSTANCE

  • Snapshot of the database at a given time
  • an instance of a variable

A database schema in relations is defined by using a capitalized name for the relationship-schema and a lowercase name of each attribute. An instance of a relation is represented by a lowercase name.

Websites-schema(website, organization, first-year, category)‏

A relation on the Website-schema is as follows:

websites(Website-schema)‏

Side notes, very important:

A relation has no order

A relation cannot contain duplicate tuples

slide7

Database Systems – Set Theory

website

first-name

last-name

www.zojjed.com

Derek

Jeter

www.zojjed.com

Chase

Utley

www.drexel.edu/~jsalvage

Jeremy

Johnson

www.racewalk.com

Ryan

Howard

www.zojjed.com

Ryan

Howard

Customers-Schema=(website, first-name, last-name)‏

customers Relation

Notice the website attribute appears in both the customers relation and Websites relation.

This is not a coincidence, fields often are repeated.

This allows distinct relations to be related.

If we wanted to gather website details for customer websites we need information from both relations

slide8

Database Systems – Set Theory

website

category

first-name

last-name

www.zojjed.com

Fiction

Derek

Jeter

www.zojjed.com

Fiction

Chase

Utley

www.cs.drexel.edu/~jsalvage

Education

Jeremy

Johnson

www.racewalk.com

Health

Ryan

Howard

www.zojjed.com

Fiction

Ryan

Howard

Combined information from website and customers relations

In real databases, unique id fields would be used to identify the customer and the website so the website name would not be repeated in both tables.

slide9

Database Systems – Set Theory

Instead of having two schemas, it’s possible to have one schema as follows:

WebsiteCustomers(website, organization, first-year, category, first-name, last-name)‏

What is wrong with this?

slide10

Database Systems – Set Theory

website

organization

first-year

category

first-name

last-name

www.zojjed.com

Walking Promotions

2006

Fiction

Derek

Jeter

www.racewalk.com

Walking Promotions

1995

Health

Ryan

Howard

www.greattreks.com

Walking Promotions

2006

Travel

Null

Null

www.twofeetgallery.com

Walking Promotions

2004

Photographs

Null

Null

www.walkinghealthy.com

Walking Promotions

2002

Health

Null

Null

www.zojjed.com

Walking Promotions

2006

Fiction

Ryan

Howard

www.zojjed.com

Walking Promotions

2006

Fiction

Chase

Utley

www.cs.drexel.edu/~jsalvage

Drexel University

2005

Education

Jeremy

Johnson

Redundant data as well as null fields.

slide11

Database Systems – Set Theory

website

date

hit-count

www.zojjed.com

5/20/2007

5

www.racewalk.com

5/20/2007

2019

www.greattreks.com

5/20/2007

1050

www.twofeetgallery.com

5/20/2007

32

www.walkinghealthy.com

5/20/2007

159

www.zojjed.com

5/21/2007

6

www.zojjed.com

5/22/2007

5

www.cs.drexel.edu/~jsalvage

5/20/2007

376

www.racewalk.com

5/21/2007

2099

hit-counts-Schema= (website, date, hit-count)‏

hit-counts relation

Is there anything wrong with the above relation?

slide12

Database Systems – Set Theory

website

date

hit-count

www.zojjed.com

5/20/2007

5

www.racewalk.com

5/20/2007

2019

www.greattreks.com

5/20/2007

1050

www.twofeetgallery.com

5/20/2007

32

www.walkinghealthy.com

5/20/2007

159

www.zojjed.com

5/21/2007

6

www.zojjed.com

5/22/2007

5

www.cs.drexel.edu/~jsalvage

5/20/2007

376

www.racewalk.com

5/21/2007

2099

hit-counts-Schema= (website, date, hit-count)‏

hit-counts relation

Is there anything wrong with the above relation?

No, there is no reason why we cannot list a website more than once.

slide13

Database Systems – Set Theory

website

hit-count

www.zojjed.com

5

www.racewalk.com

2019

www.greattreks.com

1050

www.twofeetgallery.com

32

www.walkinghealthy.com

159

www.zojjed.com

6

www.zojjed.com

5

www.cs.drexel.edu/~jsalvage

376

www.racewalk.com

2099

If we did not care about the date and only cared about the hit count, could we define the hit-counts Schema as follows:

hit-counts-Schema= (website, hit-count)‏

hit-counts relation:

slide14

Database Systems – Set Theory

website

hit-count

www.zojjed.com

5

www.racewalk.com

2019

www.greattreks.com

1050

www.twofeetgallery.com

32

www.walkinghealthy.com

159

www.zojjed.com

6

www.zojjed.com

5

www.cs.drexel.edu/~jsalvage

376

www.racewalk.com

2099

If we did not care about the date and only cared about the hit count, could we define the hit-counts Schema as follows:

hit-counts-Schema= (website, hit-count)‏

hit-counts relation:

In real databases there would be no problem, but we said that you cannot repeat tuples in a relation. So the answer is no, we cannot use this schema because of duplicates.

slide15

Database Systems – Set Theory

QUERY LANGUAGES

A query language is a language in which the user requests information from the database.

Can be procedural or non-procedural.

We will study Relational Algebra

It Is a procedural language consisting of sets of operations that take one or two relations as input and output a relation. Operations include:

  • select
  • project
  • union
  • set difference
  • Cartesian product
  • Rename
  • Intersection
  • Aggregate functions

We will also study various forms of joining relations.

slide16

Database Systems – Set Theory

Unary- operates on one relation

Binary – operates on a pair of relations

The Select Operation

Unary operation

Selects tuples that satisfy a given predicate

 - represents a select operation - sigma

<select condition>(R)‏

<selection condition> = <attribute name> <comparison op> <constant value> or

<selection condition> = <attribute name> <comparison op> <attribute name>

comparison operators are: =, <>, <, <=, >, >=

equal, not equal, less than, less than or equal to, greater than, greater than or equal to

slide17

Database Systems – Set Theory

website

date

hit-count

website

date

hit-count

www.zojjed.com

5/20/2007

5

www.zojjed.com

5/20/2007

5

www.zojjed.com

5/21/2007

6

www.racewalk.com

5/20/2007

2019

www.zojjed.com

5/22/2007

5

www.greattreks.com

5/20/2007

1050

www.twofeetgallery.com

5/20/2007

32

www.walkinghealthy.com

5/20/2007

159

www.zojjed.com

5/21/2007

6

www.zojjed.com

5/22/2007

5

www.cs.drexel.edu/~jsalvage

5/20/2007

376

www.racewalk.com

5/21/2007

2099

To select those tuples of the hit-counts relation where the website is “www.zojjed.com” we write.

website = “www.zojjed.com”(hit-counts)

This returns the relation:

hit-counts relation

slide18

Database Systems – Set Theory

website

date

hit-count

website

date

hit-count

www.zojjed.com

5/20/2007

5

www.racewalk.com

5/20/2007

2019

www.racewalk.com

5/20/2007

2019

www.greattreks.com

5/20/2007

1050

www.greattreks.com

5/20/2007

1050

ww.racewalk.com

5/21/2007

2099

www.twofeetgallery.com

5/20/2007

32

www.walkinghealthy.com

5/20/2007

159

www.zojjed.com

5/21/2007

6

www.zojjed.com

5/22/2007

5

www.cs.drexel.edu/~jsalvage

5/20/2007

376

www.racewalk.com

5/21/2007

2099

To select those tuples of the hit-counts relation where the hit-count is greater than 1000 we write.

hit-count > 1000 (hit-counts)

This returns the relation:

hit-counts relation

slide19

Database Systems – Set Theory

website

date

hit-count

website

date

hit-count

www.racewalk.com

5/20/2007

2019

www.zojjed.com

5/21/2007

6

www.greattreks.com

5/20/2007

1050

www.twofeetgallery.com

5/20/2007

32

www.walkinghealthy.com

5/20/2007

159

website

date

hit-count

www.zojjed.com

5/21/2007

6

www.zojjed.com

5/20/2007

5

www.cs.drexel.edu/~jsalvage

5/20/2007

376

www.zojjed.com

5/21/2007

6

www.racewalk.com

5/21/2007

2099

www.zojjed.com

5/22/2007

5

Can combine predicates with and, or, and not

To select those tuples of the hit-counts relation where the hit-count is greater than 5 and the website is www.zojjed.com, we write.

hit-count > 5 and website = “www.zojjed.com”(hit-counts)

This returns the relation:

hit-count > 5

website = “www.zojjed.com”

slide20

Database Systems – Set Theory

website

category

www.zojjed.com

Fiction

www.racewalk.com

Health

www.greattreks.com

Travel

www.twofeetgallery.com

Photographs

www.walkinghealthy.com

Health

www.cs.drexel.edu/~jsalvage

Education

The Project Operation

unary

returns arguments in relation without all attributes

duplicates are removed

 - represent project operation - pi

 <attribute list> (R)

website, category(Websites)‏

slide21

Database Systems – Set Theory

Composition of Relational Operations

Often we need to combine operations. Often we wish to select a set of tuples and limit the relation returned to a few attributes.

What if we want to find out only the websites that have had greater than 1000 hits in a given day?

First we must find out what tuples have hit counts greater than 1000.

We can accomplish this with the following relational query:

hit-count>1000 (hit-counts)‏

By using the Project operation we can remove the extra attributes such as hit-count and date and only return the values in the website column.

website(hit-count>1000 (hit-counts))‏

What is the relation that is returned?

slide22

Database Systems – Set Theory

website

date

hit-count

website

www.zojjed.com

5/20/2007

5

www.racewalk.com

www.racewalk.com

5/20/2007

2019

www.greattreks.com

www.greattreks.com

5/20/2007

1050

www.twofeetgallery.com

5/20/2007

32

www.walkinghealthy.com

5/20/2007

159

www.zojjed.com

5/21/2007

6

www.zojjed.com

5/22/2007

5

www.cs.drexel.edu/~jsalvage

5/20/2007

376

www.racewalk.com

5/21/2007

2099

What if we want to find out only the websites that have had greater than 1000 hits in a single day?

website(hit-count>1000 (hit-counts))‏

What is the relation that is returned?

hit-counts relation

slide23

Database Systems – Set Theory

Union Operator

binary

 - union operator

It is often useful to combine the results of queries.

Again, remember that set theory removes duplicates.

Relation 1  Relation 2 = Result Set

slide24

Database Systems – Set Theory

website

first-name

last-name

website

www.zojjed.com

Derek

Jeter

www.drexel.edu/~jsalvage

www.zojjed.com

Chase

Utley

www.racewalk.com

www.drexel.edu/~jsalvage

Jeremy

Johnson

www.zojjed.com

www.racewalk.com

Ryan

Howard

www.zojjed.com

Ryan

Howard

What is a query that returns all websites that have customers OR a hit count greater than 1000?

We need information from both the customers relation as well as the hit count relation.

First we need the names of all websites that have customers

website(customers)‏

customers Relation

slide25

Database Systems – Set Theory

website

date

hit-count

website

www.zojjed.com

5/20/2007

5

www.racewalk.com

www.racewalk.com

5/20/2007

2019

www.greattreks.com

www.greattreks.com

5/20/2007

1050

www.twofeetgallery.com

5/20/2007

32

www.walkinghealthy.com

5/20/2007

159

www.zojjed.com

5/21/2007

6

www.zojjed.com

5/22/2007

5

www.cs.drexel.edu/~jsalvage

5/20/2007

376

www.racewalk.com

5/21/2007

2099

Then we need the names of the websites that have a hit count greater than 1000:

website(hit-count>1000 (hit-counts))

hit-counts relation

slide26

Database Systems – Set Theory

website

www.drexel.edu/~jsalvage

www.racewalk.com

www.greattreks.com

www.zojjed.com

website

website

www.racewalk.com

www.drexel.edu/~jsalvage

www.greattreks.com

www.racewalk.com

www.zojjed.com

Combine the results using a union operation

website(customers) website(hit-count>1000 (hit-counts))

website(hit-count>1000 (hit-counts))‏

website(customers)‏

  • Remember, order is not important!
  • Unions MUST be of similar types
  • They MUST have the same number of attributes
  • The domains of the attributes MUST be the same
slide27

Database Systems – Set Theory

Intersection Operator

binary

∩ - intersection operator

Returns all tuples contained within both relations

Relation 1 ∩ Relation 2 = Result Set

slide28

Database Systems – Set Theory

website

www.drexel.edu/~jsalvage

www.racewalk.com

www.zojjed.com

website

www.racewalk.com

www.greattreks.com

What is a query that returns all websites that have customers AND a hit count greater than 1000?

First we need the names of all websites that have customers

website(customers)‏

Then we need the names of the websites that have a hit count greater than 1000:

website(hit-count>1000 (hit-counts))

slide29

Database Systems – Set Theory

website

www.racewalk.com

website

website

www.racewalk.com

www.drexel.edu/~jsalvage

www.greattreks.com

www.racewalk.com

www.zojjed.com

What is a query that returns all websites that have customers AND a hit count greater than 1000?

website(customers) ∩ website(hit-count>1000 (hit-counts))

website(hit-count>1000 (hit-counts))‏

website(customers)‏

slide30

Database Systems – Set Theory

The Set Difference Operation (MINUS)‏

binary

-, denotes set difference

Relation 1 - Relation 2 = Result Set

Finds tuples in one set but not in another

r – s, produces a set containing those tuples in r but not in s, operand order is significant.

slide31

Database Systems – Set Theory

website

www.drexel.edu/~jsalvage

www.racewalk.com

www.zojjed.com

website

www.racewalk.com

www.greattreks.com

Produce a list of websites who have a hit count > 1000 and do not have a customer.

We need the names of all websites that have customers

website(customers)‏

Then we need the names of the websites that have a hit count greater than 1000:

website(hit-count>1000 (hit-counts))

website(hit-count>1000 (hit-counts)) - website(customers)‏

slide32

Database Systems – Set Theory

website

www.greattreks.com

website

website

www.racewalk.com

www.drexel.edu/~jsalvage

www.greattreks.com

www.racewalk.com

www.zojjed.com

Produce a list of websites who have a hit count > 1000 and do not have a customer.

website(hit-count>1000 (hit-counts)) - website(customers)‏

website(hit-count>1000 (hit-counts))‏

website(customers)‏

Notice the attributes in R1 that are not in R2 are included in the result set, but the attributes in R2 that are not in R1 are not included in the result set, i.e. operand order is significant.

slide33

Database Systems – Set Theory

website

organization

first-year

category

www.zojjed.com

Walking Promotions

2006

Fiction

www.racewalk.com

Walking Promotions

1995

Health

www.greattreks.com

Walking Promotions

2006

Travel

www.twofeetgallery.com

Walking Promotions

2004

Photographs

www.walkinghealthy.com

Walking Promotions

2002

Health

www.cs.drexel.edu/~jsalvage

Drexel University

2005

Education

Given the relation websites below:

What is the result of the following?

website(empty set) - website(websites)‏

slide34

Database Systems – Set Theory

website

organization

first-year

category

www.zojjed.com

Walking Promotions

2006

Fiction

www.racewalk.com

Walking Promotions

1995

Health

www.greattreks.com

Walking Promotions

2006

Travel

www.twofeetgallery.com

Walking Promotions

2004

Photographs

www.walkinghealthy.com

Walking Promotions

2002

Health

www.cs.drexel.edu/~jsalvage

Drexel University

2005

Education

Given the relation websites below:

What is the result of the following?

website(empty set) - website(websites)‏

The result is the empty set, because no records are contained in “R1” and only the records in R1 that are not in R2 are returned from the MINUS operator.

slide35

Database Systems – Set Theory

website

organization

first-year

category

www.zojjed.com

Walking Promotions

2006

Fiction

www.racewalk.com

Walking Promotions

1995

Health

www.greattreks.com

Walking Promotions

2006

Travel

www.twofeetgallery.com

Walking Promotions

2004

Photographs

www.walkinghealthy.com

Walking Promotions

2002

Health

www.cs.drexel.edu/~jsalvage

Drexel University

2005

Education

Given the relation websites below:

What is the result of the following?

website(websites) - website(empty set)‏

slide36

Database Systems – Set Theory

website

www.zojjed.com

www.racewalk.com

www.greattreks.com

www.twofeetgallery.com

www.walkinghealthy.com

www.cs.drexel.edu/~jsalvage

The result of website(websites) - website(empty set) is the complete relation websites, since no records are contained in the empty set all records from the websites relation are included in the result set.

slide37

Database Systems – Set Theory

The Cartesian-Product Operation

binary

x – combines information in two relations

Relation 1 x Relation 2 = Result Set

because attributes can be repeated in different relations, a notation relation.attribute will be used.

Therefore, the resulting schema of r = websites x customers is:

(websites.website, websites.organization, websites.first-year, websites.category, customers.website, customers.first-name, customers.last-name)‏

Note this does not support cases where you wish to use the same relation twice; we will address this with the rename operation shortly.

What tuples exist in r if r = websites x customers?

The combination of all tuples in websites with every tuple in customers.

Given r1 with n1 tuples and r2 with n2 tuples then r1xr2 has n1*n2 tuples

slide38

Database Systems – Set Theory

R1.Value 1

R2.Value 2

Value1

1

A

1

1

B

2

1

C

3

2

A

2

B

2

C

Value2

3

A

A

B

3

B

C

3

C

Let’s look at a simplified example first.

If relation R1 contains the following:

Then R1 x R2 contains the following:

and if relation R2 contains the following:

slide39

Database Systems – Set Theory

websites.website

organization

first-year

category

customers.website

first-name

last-name

www.zojjed.com

Walking Promotions

2006

Fiction

www.zojjed.com

Derek

Jeter

www.zojjed.com

Walking Promotions

2006

Fiction

www.zojjed.com

Chase

Utley

www.zojjed.com

Walking Promotions

2006

Fiction

www.drexel.edu/~jsalvage

Jeremy

Johnson

www.zojjed.com

Walking Promotions

2006

Fiction

www.racewalk.com

Ryan

Howard

www.zojjed.com

Walking Promotions

2006

Fiction

www.zojjed.com

Ryan

Howard

www.racewalk.com

Walking Promotions

1995

Health

www.zojjed.com

Derek

Jeter

www.racewalk.com

Walking Promotions

1995

Health

www.zojjed.com

Chase

Utley

www.racewalk.com

Walking Promotions

1995

Health

www.drexel.edu/~jsalvage

Jeremy

Johnson

www.racewalk.com

Walking Promotions

1995

Health

www.racewalk.com

Ryan

Howard

www.racewalk.com

Walking Promotions

1995

Health

www.zojjed.com

Ryan

Howard

www.greattreks.com

Walking Promotions

2006

Travel

www.zojjed.com

Derek

Jeter

www.greattreks.com

Walking Promotions

2006

Travel

www.zojjed.com

Chase

Utley

www.greattreks.com

Walking Promotions

2006

Travel

www.drexel.edu/~jsalvage

Jeremy

Johnson

www.greattreks.com

Walking Promotions

2006

Travel

www.racewalk.com

Ryan

Howard

www.greattreks.com

Walking Promotions

2006

Travel

www.zojjed.com

Ryan

Howard

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

Similarly, websites x customers appears as follows:

slide40

Database Systems – Set Theory

websites.website

organization

first-year

category

customers.website

first-name

last-name

www.racewalk.com

Walking Promotions

1995

Health

www.zojjed.com

Derek

Jeter

www.racewalk.com

Walking Promotions

1995

Health

www.zojjed.com

Chase

Utley

www.racewalk.com

Walking Promotions

1995

Health

www.drexel.edu/~jsalvage

Jeremy

Johnson

www.racewalk.com

Walking Promotions

1995

Health

www.racewalk.com

Ryan

Howard

www.racewalk.com

Walking Promotions

1995

Health

www.zojjed.com

Ryan

Howard

What if we want to find all the customers who bought from a website created before the year 2000?

We could try the following:

first-year <2000 (websites x customers)‏

Note that we are not using a projection to reduce the number of attributes to show what is really happening. In the end, you would use a projection to show only the fields requested by the query.

Oops, too many tuples!

slide41

Database Systems – Set Theory

websites.website

organization

first-year

category

customers.website

first-name

last-name

www.racewalk.com

Walking Promotions

1995

Health

www.zojjed.com

Derek

Jeter

www.racewalk.com

Walking Promotions

1995

Health

www.zojjed.com

Chase

Utley

www.racewalk.com

Walking Promotions

1995

Health

www.drexel.edu/~jsalvage

Jeremy

Johnson

www.racewalk.com

Walking Promotions

1995

Health

www.racewalk.com

Ryan

Howard

www.racewalk.com

Walking Promotions

1995

Health

www.zojjed.com

Ryan

Howard

Because the Cartesian-product pairs all possible tuples, all tuples from websites are combined with all tuples from customers. While only those with the first-year < 2000 are selected, it still returns 5 tuples.

Of those sets, we only want the ones where the websites relation’s website attribute equals the customers relation’s website attribute.

The only tuple we truly want is the highlighted tuple.

we can write this as follows:

websites.website = customers.website(first-year <2000 (websites x customers))‏

slide42

Database Systems – Set Theory

websites.website

organization

first-year

category

customers.website

first-name

last-name

www.racewalk.com

Walking Promotions

1995

Health

www.racewalk.com

Ryan

Howard

Since,

websites.website = customers.website(first-year <2000 (websites x customers))‏

Returns the following tuple with too many attributes, we must also use a projection to remove the excessive attributes.

Applying the projection of first-name, last name to the previous query gives us the following query:

first-name, last-name(websites.website = customers.website(first-year <2000 (websites x customers)))‏

slide43

Database Systems – Set Theory

  • The Assignment Operator
  • unary
  • allows an expression to be assigned to a variable
  • NewRelation  OldRelation

For example:

  • 1200loans amount > 1200(loan)‏
  • or
  • result loan-number(1200loans)‏
  • Or
  • The Rename Operation
  • Unary

x(E) renames the expression E to x.

  • Relational-algebra expressions do not have a name that we can refer to them by using the rename operator,

 is roh.

slide44

Database Systems – Set Theory

Example, without using an aggregation function (not yet shown), find the largest hit count of any website for a single day. If the same max hit count exists more than once, you are only allowed to return a single tuple containing the answer.

We accomplish this in two steps:

First, compute a temporary relation consisting of hit counts less than the largest hit count.

Second, take the set difference (minus) between the relation hit-count(hit-counts) and the temporary relation

To accomplish the first step, compute all the websites hit counts compared to all the websites hit counts, in other words compute the Cartesian product of the relation hit-counts with itself.

hit-counts x hit-counts

However, we must rename one of the hit-counts relations so that we can identify the balance distinctly

slide45

Database Systems – Set Theory

hit-count

5

2019

1050

32

159

6

376

2099

Given the projection of only the hit-count field from the relation hit-counts via

hit-count(hit-counts)‏

We have:

If we rename the result of this projection

d (hit-count(hit-counts))‏

And thus create a cross product of the two relations as

hit-count(hit-counts) x d (hit-count(hit-counts))‏

slide46

Database Systems – Set Theory

hit-count

d(hit-count)‏

hit-count

d(hit-count)‏

hit-count

d(hit-count)‏

5

1050

5

159

5

5

2019

1050

2019

159

2019

5

1050

1050

1050

159

1050

5

32

1050

32

159

32

5

159

1050

159

159

159

5

6

1050

6

159

6

5

376

1050

376

159

376

5

2099

1050

2099

159

2099

5

5

32

5

6

5

2019

2019

32

2019

6

2019

2019

1050

32

1050

6

1050

2019

32

32

32

6

32

2019

159

32

159

6

159

2019

6

32

6

6

6

2019

376

32

376

6

376

2019

2099

32

2099

6

2099

2019

hit-count(hit-counts) x d (hit-count(hit-counts))‏

slide47

Database Systems – Set Theory

hit-count

d(hit-count)‏

5

376

2019

376

1050

376

32

376

159

376

6

376

376

376

2099

376

5

2099

2019

2099

1050

2099

32

2099

159

2099

6

2099

376

2099

2099

2099

hit-count(hit-counts) x d (hit-count(hit-counts))‏

slide48

Database Systems – Set Theory

hit-count

d(hit-count)‏

hit-count

d(hit-count)‏

hit-count

d(hit-count)‏

5

1050

5

159

5

5

2019

1050

2019

159

2019

5

1050

1050

1050

159

1050

5

32

1050

32

159

32

5

159

1050

159

159

159

5

6

1050

6

159

6

5

376

1050

376

159

376

5

2099

1050

2099

159

2099

5

5

32

5

6

5

2019

2019

32

2019

6

2019

2019

1050

32

1050

6

1050

2019

32

32

32

6

32

2019

159

32

159

6

159

2019

6

32

6

6

6

2019

376

32

376

6

376

2019

2099

32

2099

6

2099

2019

Now we select only those tuples that have the first attribute containing a value less than the second attribute, we do so with the following query:

hitcounts.hit-count < d.hit-count(hit-count(hit-counts) x d (hit-count(hit-counts)))‏

slide49

Database Systems – Set Theory

hit-count

d(hit-count)‏

5

376

2019

376

1050

376

32

376

159

376

6

376

376

376

2099

376

5

2099

2019

2099

1050

2099

32

2099

159

2099

6

2099

376

2099

2099

2099

hit-count(hit-counts) x d (hit-count(hit-counts))‏

slide50

Database Systems – Set Theory

hit-count

5

2019

1050

32

159

6

376

This certainly gives us a lot of tuples, but if we then project just the hit-count from the first column and remove the duplicates, we are left with the following:

This is the set containing most hit counts, but it does not include the largest hit count.

slide51

Database Systems – Set Theory

hit-count

2099

To get just the largest hit count we now simply subtract our result set from the projection of the original hit count relation as follows:

hit-count(hit-counts) - hit-count(hitcounts.hit-count < d.hit-count (hit-count(hit-counts) x d (hit-count(hit-counts))))‏

slide52

Database Systems – Set Theory

We need a better way to represent certain queries because the notation for joining two relations and only selecting records where the attributes match is too cumbersome. Therefore we have:

The Natural Join Operation

Binary

Result Set = R1 |x| R2

The natural join operation finds the Cartesian product of two relations, but only returns tuples where the attributes whose names are the same in both relations contain the same values.

slide53

Database Systems – Set Theory

Value1

Value2

Value2

Value3

1

X

X

A

2

Y

Z

B

3

Z

A

C

Let’s look at a simplified example first.

If relation R1 contains the following:

and if relation R2 contains the following:

slide54

Database Systems – Set Theory

R1.Value 1

R1.Value 2

R2.Value2

R2.Value3

1

X

X

A

1

X

Z

B

1

X

A

C

2

Y

X

A

2

Y

Z

B

2

Y

A

C

3

Z

X

A

3

Z

Z

B

3

Z

A

C

R1.Value 1

R1.Value 2

R2.Value2

R2.Value3

1

X

X

A

3

Z

Z

B

Then R1 x R2 contains the following:

Then R1 |x| R2 contains the following:

slide55

Database Systems – Set Theory

  • Example:
  • Find the names of all customers who have made a purchase from a health or travel website. Return the name of the customer, the website, and the category of the website.
  • The old way:
  • Form a Cartesian product of the websites and customers relations.
  • Select the tuples of the same website as well as a category equal to “health” or “travel.”
  • Project the first-name, last-name, website, and category
  • first-name, last-name, website, category

(websites.website = customers.website and

category = “Health” or category = “Travel”(websites x customers))‏

slide56

Database Systems – Set Theory

Another example:

Find all the names of websites and the dates they have a hit count for web sites that are in the health category.

slide57

Database Systems – Set Theory

Another example:

Find all the web site names and the dates that have hit counts for web sites in the health category.

website, date (category = “Health” (websites |x| hit-counts))‏

slide58

Database Systems – Set Theory

product

first-name

last-name

tax

total-cost

Zojjed!

Derek

Jeter

1.00

17.95

Zojjed!

Chase

Utley

1.00

17.95

VB .NET Coach

Jeremy

Johnson

0

54.95

Race Walk Like A Champion

Ryan

Howard

1.25

25.95

Zojjed!

Ryan

Howard

0

16.95

Generalized Projections

Allows basic arithmetic operations within fields of a tuple

Observe the Sales relation:

What was the price of the product sold minus the tax paid?

product, first-name, last-name, (total-cost – tax) as net-pay (Sales)‏

slide59

Database Systems – Set Theory

Aggregate Functions

takes a collection of values and returns a single value as a result

i.e

sum {1, 1, 3, 4, 4, 11} returns the value 24.

avg {1, 1, 3, 4, 4, 11} returns the value 4.

count {1, 1, 3, 4, 4, 11} returns the value 6.

min {1, 1, 3, 4, 4, 11} returns the value 1.

max {1, 1, 3, 4, 4, 11} returns the value 11.

count-distinct {1, 1, 3, 4, 4, 11} returns the value 4.

Ignore the fact that we said sets can’t contain duplicate values

slide60

Database Systems – Set Theory

Operations to Modify the Contents of Relations

Deletion

r  r – E

This uses set difference (minus) and variable assignment.

Delete all of the sale of “Zojjed!” from the Sales relation

sales  sales - product = “Zojjed!” (sales)‏

Delete all sales with no tax collected

sales  sales - tax = 0 (sales)‏

Insertion

r  r  E

This uses union and variable assignment.

Add a record to the sales relation

sales  sales  {(“I Walk to Eat”, “Chase”, “Utley”, 0, 15.95)}

Add a record to the websites relation

websites  websites  {(“www.mediatitan.net”, “Walking Promotions”, 2006, “Fiction”)}

You can also insert touples based on the result of another query.

slide61

Database Systems – Set Theory

Updating

Zero all tax attributes in the sales relation

sales  product, first-name, last-name, 0, total-cost(sales)‏

slide62

Database Systems – Set Theory

employee-name

city

employee-name

team

Jeter

New York City

Glavine

Mets

Howard

Philadelphia

Howard

Phillies

Utley

Philadelphia

Bonds

Giants

Schilling

Boston

Schilling

Choke Sox

employee-name

city

employee-name

team

Howard

Philadelphia

Howard

Phillies

Schilling

Boston

Schilling

Choke Sox

Joins

There are other forms of joins. Let’s look at the following two simple relations:

cities relation

teams relation

Natural Join -> cities |x| teams

The natural join omits records that do not match, so we do not have records for Jeter, Utley, Glavine, or Bonds.

slide63

Database Systems – Set Theory

employee-name

city

employee-name

team

Jeter

New York City

Glavine

Mets

Howard

Philadelphia

Howard

Phillies

Utley

Philadelphia

Bonds

Giants

Schilling

Boston

Schilling

Choke Sox

employee-name

city

employee-name

team

Jeter

New York City

Null

Null

Howard

Philadelphia

Howard

Phillies

Utley

Philadelphia

Null

Null

Schilling

Boston

Schilling

Choke Sox

cities relation

teams relation

Left Outer Join -> cities LOJ teams

Includes all records from the left and only those records on the right that match

slide64

Database Systems – Set Theory

Value1

Value2

Value2

Value3

1

X

X

A

2

Y

Z

B

3

Z

Z

C

A subtle note for Left Outer Join.

If relation R1 contains the following:

and if relation R2 contains the following:

slide65

Database Systems – Set Theory

R1.Value 1

R1.Value 2

R2.Value2

R2.Value3

1

X

X

A

2

Y

Null

Null

3

Z

Z

B

3

Z

Z

C

Then R1 LOJ R2 contains the following:

slide66

Database Systems – Set Theory

employee-name

city

employee-name

team

Jeter

New York City

Glavine

Mets

Howard

Philadelphia

Howard

Phillies

Utley

Philadelphia

Bonds

Giants

Schilling

Boston

Schilling

Choke Sox

employee-name

city

employee-name

team

Null

Null

Glavine

Mets

Howard

Philadelphia

Howard

Phillies

Null

Null

Bonds

Giants

Schilling

Boston

Schilling

Choke Sox

cities relation

teams relation

Right Outer Join -> cities ROJ teams

Includes all records from the right and only those records on the left that match

slide67

Database Systems – Set Theory

employee-name

city

employee-name

team

Jeter

New York City

Glavine

Mets

Howard

Philadelphia

Howard

Phillies

Utley

Philadelphia

Bonds

Giants

Schilling

Boston

Schilling

Choke Sox

employee-name

city

employee-name

team

Null

Null

Glavine

Mets

Howard

Philadelphia

Howard

Phillies

Null

Null

Bonds

Giants

Schilling

Boston

Schilling

Choke Sox

Jeter

New York City

Null

Null

Utley

Philadelphia

Null

Null

cities relation

teams relation

Full Outer Join -> cities FOJ teams

Includes all records from the right and left, tuples that do not match have nulls in their place.

slide68

Database Systems – Set Theory

NULLS

And:

true and unknown = unknown

false and unknown = false

unknown and unknown = unknown

Or

true or unknown = true

false or unknown = unknown

unknown or unknown = unknown

Not

not unknown = unknown

slide69

Database Systems – Set Theory

  • REFERENTIAL INTEGRITY
  • Superkey of R
  • A unique identifier for a tuple.
  • A set of attributes SK of R such that no two tuples in any valid relation instance r(R) will have the same value for SK. That is, for any distinct tuples t1 and t2 in r(R), t1[SK]  t2[SK].
  • Key of R
  • A "minimal" superkey; that is, a superkey K such that removal of any attribute from K results in a set of attributes that is not a superkey, i.e. every attribute of K is needed to maintain its superkey status.
    • Example: The CAR relation schema
    • CAR(State, Reg#, SerialNo, Make, Model, Year) has two keys
    • Key1 = {State, Reg#}
    • Key2 = {SerialNo}
    • Both are superkeys. {SerialNo, Make} is a superkey but not a key.
  • If a relation has several candidate keys, one is chosen arbitrarily to be the primary key. The primary key attributes are underlined.
slide70

Database Systems – Set Theory

REFERNTIAL INTEGRITY

Relational Database Schema

A set S of relation schemas that belong to the same database. S is the name of the database.

S = {R1, R2, ..., Rn}

Entity Integrity:

The primary key attributes PK of each relation schema R in S cannot have null values in any tuple of r(R). This is because primary key values are used to identify the individual tuples.

t[PK]  null for any tuple t in r(R)‏

Note, other attributes of R may be similarly constrained to disallow null values, even though they are not members of the primary key.

slide71

Database Systems – Set Theory

Referential Integrity

A constraint involving two relations (the previous constraints involve a single relation).

Used to specify a relationship among tuples in two relations: the referencing relation and the referenced relation.

Tuples in the referencing relation R1 have attributes FK (called foreign key attributes) that reference the primary key attributes PK of the referenced relation R2.

A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK].

A referential integrity constraint can be displayed in a relational database schema as a directed arc from R1.FK to R2.

ad