- 65 Views
- Uploaded on
- Presentation posted in: General

IS698: Database Management

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

IS698: Database Management

Min Song

IS

NJIT

The Relational Data Model

- Data is actually stored as bits, but it is difficult to work with data at this level.
- It is convenient to view data at different levels of abstraction.
- Schema: Description of data at some level. Each level has its own schema.

billing

payroll

records

External

schemas

View 1

View 2

View 3

Conceptual schema

Physical schema

- A particular way of structuring data (relations)
- Simple
- Mathematically based
- Expressions (queries) can be analyzed by DBMS
- Transformed to equivalent expressions automatically (query optimization)

- Relation is a set of tuples
- Tuple ordering immaterial
- No duplicates
- Cardinality of relation = number of tuples

- All tuples in a relation have the same structure; constructed from the same set of attributes

Id Name Address Status

Student

- Mathematical entity corresponding to a table
- row ~ tuple
- column ~ attribute

- Values in a tuple are related to each other
- John lives at 123 Main

- Relation R can be thought of as predicate R
- R(x,y,z) is true iff tuple (x,y,z) is in R

- Relation name
- Attribute names and domains
- Integrity constraints - e.g.,:
- The values of a particular attribute in all tuples are unique
- The values of a particular attribute in all tuples are greater than 0

- Finite set of relations
- Each relation consists of a schema and an instance
- Database schema = set of relation schemas (and other things)
- Database instance = set of (corresponding) relation instances

- Student (Id: INT, Name: STRING, Address: STRING, Status: STRING)
- Professor (Id: INT, Name: STRING, DeptId: DEPTS)
- Course (DeptId: DEPTS, CrsName: STRING, CrsCode: COURSES)
- Transcript (CrsCode: COURSES, StudId: INT, Grade: GRADES, Semester: SEMESTERS)
- Department(DeptId: DEPTS, Name: STRING)

- Part of schema
- Restriction on state (or sequence of states) of data base
- Enforced by DBMS
- Intra-relational - involve only one relation
- Part of relation schema
- e.g., all Ids are unique

- Inter-relational - involve several relations
- Part of relation schema or database schema

- Values in a column (or columns) of a relation are unique: at most one row in a relation instance can contain a particular value(s)
- Key - set of attributes satisfying key constraint
- e.g., Id in Student,
- e.g., (StudId, CrsCode, Semester) in Transcript

- Minimality - no subset of a key is a key
- (StudId, CrsCode) is not a key of Transcript

- Superkey - set of attributes containing key
- (Id, Name) is a superkey of Student

- Every relation has a key
- Relation can have several keys:
- primary key (Id in Student) – (cannot be null)
- candidate key ((Name, Address) in Student)

- Referential integrity => Item named in one relation must correspond to tuple(s) in another that describes the item
- Transcript (CrsCode) references Course(CrsCode )
- Professor(DeptId) references Department(DeptId)

- a1 is a foreign key of R1 referring to a2 in R2 => if v is a value of a1, there is a unique tuple of R2 in which a2 has value v
- This is a special case of referential integrity: a2 must be a candidate key of R2 (CrsCode is a key of Course)
- If no row exists in R2 => violation of referential integrity
- Not all rows of R2 need to be referenced.: relationship is not symmetric (some course might not be taught)
- Value of a foreign key might not be specified (DeptId column of some professor might be null)

a2

v3

v5

v1

v6

v2

v7

v4

a1

v1

v2

v3

v4

--

v3

R1

R2

Foreign key

Candidate key

- Names of a1 and a2 need not be the same.
- With tables:
ProfId attribute of Teaching references Id attribute of Professor

- With tables:
- R1 and R2 need not be distinct.
- Employee(Id:INT, MgrId:INT, ….)
- Employee(MgrId) references Employee(Id)

- Every manager is also an employee and hence has a unique row in Employee

- Employee(Id:INT, MgrId:INT, ….)

Teaching(CrsCode: COURSES, Sem: SEMESTERS, ProfId: INT)

Professor(Id: INT, Name: STRING, DeptId: DEPTS)

Relational Algebra

Operators

Expression Trees

- Mathematical system consisting of:
- Operands --- variables or values from which new values can be constructed.
- Operators --- symbols denoting procedures that construct new values from given values.

- An algebra whose operands are relations or variables that represent relations.
- Operators are designed to do the most common things that we need to do with relations in a database.
- The result is an algebra that can be used as a query language for relations.

- There is a core relational algebra that has traditionally been thought of as the relational algebra.
- But there are several other operators we shall add to the core in order to model better the language SQL --- the principal language used in relational database systems.

- Union, intersection, and difference.
- Usual set operations, but require both operands have the same relation schema.

- Selection: picking certain rows.
- Projection: picking certain columns.
- Products and joins: compositions of relations.
- Renaming of relations and attributes.

- R1 := SELECTC (R2)
- C is a condition (as in “if” statements) that refers to attributes of R2.
- R1 is all those tuples of R2 that satisfy C.

JoeMenu := SELECTbar=“Joe’s”(Sells):

barbeerprice

Joe’sBud2.50

Joe’sMiller2.75

Relation Sells:

barbeerprice

Joe’sBud2.50

Joe’sMiller2.75

Sue’sBud2.50

Sue’sMiller3.00

- R1 := PROJL (R2)
- L is a list of attributes from the schema of R2.
- R1 is constructed by looking at each tuple of R2, extracting the attributes on list L, in the order specified, and creating from those components a tuple for R1.
- Eliminate duplicate tuples, if any.

Prices := PROJbeer,price(Sells):

beerprice

Bud2.50

Miller2.75

Miller3.00

Relation Sells:

barbeerprice

Joe’sBud2.50

Joe’sMiller2.75

Sue’sBud2.50

Sue’sMiller3.00

- R3 := R1 * R2
- Pair each tuple t1 of R1 with each tuple t2 of R2.
- Concatenation t1t2 is a tuple of R3.
- Schema of R3 is the attributes of R1 and R2, in order.
- But beware attribute A of the same name in R1 and R2: use R1.A and R2.A.

R3(A,R1.B,R2.B,C )

1256

1278

12910

3456

3478

34910

R1(A,B )

12

34

R2(B,C )

56

78

910

- R3 := R1 JOINC R2
- Take the product R1 * R2.
- Then apply SELECTC to the result.

- As for SELECT, C can be any boolean-valued condition.
- Historic versions of this operator allowed only A theta B, where theta was =, <, etc.; hence the name “theta-join.”

BarInfo(bar,beer,price,name,addr )

Joe’sBud2.50Joe’sMaple St.

Joe’sMiller2.75Joe’sMaple St.

Sue’sBud2.50Sue’sRiver Rd.

Sue’sCoors3.00Sue’sRiver Rd.

Sells(bar,beer,price )Bars(name,addr )

Joe’sBud2.50Joe’sMaple St.

Joe’sMiller2.75Sue’sRiver Rd.

Sue’sBud2.50

Sue’sCoors3.00

BarInfo := Sells JOIN Sells.bar = Bars.name Bars

- A frequent type of join connects two relations by:
- Equating attributes of the same name, and
- Projecting out one copy of each pair of equated attributes.

- Called natural join.
- Denoted R3 := R1 JOIN R2.

BarInfo(bar,beer,price,addr )

Joe’sBud2.50Maple St.

Joe’sMilller2.75Maple St.

Sue’sBud2.50River Rd.

Sue’sCoors3.00River Rd.

Sells(bar,beer,price )Bars(bar,addr )

Joe’sBud2.50Joe’sMaple St.

Joe’sMiller2.75Sue’sRiver Rd.

Sue’sBud2.50

Sue’sCoors3.00

BarInfo := Sells JOIN Bars

Note Bars.name has become Bars.bar to make the natural

join “work.”

- The RENAME operator gives a new schema to a relation.
- R1 := RENAMER1(A1,…,An)(R2) makes R1 be a relation with attributes A1,…,An and the same tuples as R2.
- Simplified notation: R1(A1,…,An) := R2.

R(bar, addr )

Joe’sMaple St.

Sue’sRiver Rd.

Bars(name, addr )

Joe’sMaple St.

Sue’sRiver Rd.

R(bar, addr) := Bars

- Algebras allow us to express sequences of operations in a natural way.
- Example: in arithmetic --- (x + 4)*(y - 3).

- Relational algebra allows the same.
- Three notations, just as in arithmetic:
- Sequences of assignment statements.
- Expressions with several operators.
- Expression trees.

- Create temporary relation names.
- Renaming can be implied by giving relations a list of attributes.
- Example: R3 := R1 JOINC R2 can be written:
R4 := R1 * R2

R3 := SELECTC (R4)

- Example: the theta-join R3 := R1 JOINC R2 can be written: R3 := SELECTC (R1 * R2)
- Precedence of relational operators:
- Unary operators --- select, project, rename --- have highest precedence, bind first.
- Then come products and joins.
- Then intersection.
- Finally, union and set difference bind last.

- Leaves are operands --- either variables standing for relations or particular, constant relations.
- Interior nodes are operators, applied to their child or children.

- Using the relations Bars(name, addr) and Sells(bar, beer, price), find the names of all the bars that are either on Maple St. or sell Bud for less than $3.

UNION

RENAMER(name)

PROJECTname

PROJECTbar

SELECTaddr = “Maple St.”

SELECT price<3 AND beer=“Bud”

Bars

Sells

- Using Sells(bar, beer, price), find the bars that sell two different beers at the same price.
- Strategy: by renaming, define a copy of Sells, called S(bar, beer1, price). The natural join of Sells and S consists of quadruples (bar, beer, beer1, price) such that the bar sells both beers at this price.

PROJECTbar

SELECTbeer != beer1

JOIN

RENAMES(bar, beer1, price)

Sells

Sells

- A bag is like a set, but an element may appear more than once.
- Multiset is another name for “bag.”

- Example: {1,2,1,3} is a bag. {1,2,3} is also a bag that happens to be a set.
- Bags also resemble lists, but order in a bag is unimportant.
- Example: {1,2,1} = {1,1,2} as bags, but [1,2,1] != [1,1,2] as lists.

- SQL, the most important query language for relational databases is actually a bag language.
- SQL will eliminate duplicates, but usually only if you ask it to do so explicitly.

- Some operations, like projection, are much more efficient on bags than sets.

- Selection applies to each tuple, so its effect on bags is like its effect on sets.
- Projection also applies to each tuple, but as a bag operator, we do not eliminate duplicates.
- Products and joins are done on each pair of tuples, so duplicates in bags have no effect on how we operate.

SELECTA+B<5(R) =AB

12

12

R(A,B )S(B,C )

1234

5678

12

R(A,B )S(B,C )

1234

5678

12

PROJECTA(R) =A

1

5

1

R * S =AR.BS.BC

1234

1278

5634

5678

1234

1278

R(A,B )S(B,C )

1234

5678

12

R JOIN R.B<S.B S =AR.BS.BC

1234

1278

5678

1234

1278

R(A,B )S(B,C )

1234

5678

12

- Union, intersection, and difference need new definitions for bags.
- An element appears in the union of two bags the sum of the number of times it appears in each bag.
- Example: {1,2,1} UNION {1,1,2,3,1} = {1,1,1,1,1,2,2,3}

- An element appears in the intersection of two bags the minimum of the number of times it appears in either.
- Example: {1,2,1} INTER {1,2,3} = {1,2}.

- An element appears in the difference A – B of bags as many times as it appears in A, minus the number of times it appears in B.
- But never less than 0 times.

- Example: {1,2,1} – {1,2,3} = {1}.

- Not all algebraic laws that hold for sets also hold for bags.
- For one example, the commutative law for union (R UNION S = S UNION R ) does hold for bags.
- Since addition is commutative, adding the number of times x appears in R and S doesn’t depend on the order of R and S.

- Set union is idempotent, meaning that S UNION S = S.
- However, for bags, if x appears n times in S, then it appears 2n times in S UNION S.
- Thus S UNION S != S in general.

- DELTA = eliminate duplicates from bags.
- TAU = sort tuples.
- Extendedprojection : arithmetic, duplication of columns.
- GAMMA = grouping and aggregation.
- OUTERJOIN: avoids “dangling tuples” = tuples that do not join with anything.

- R1 := DELTA(R2).
- R1 consists of one copy of each tuple that appears in R2 one or more times.

DELTA(R) =AB

12

34

R =AB

12

34

12

- Using the same PROJL operator, we allow the list L to contain arbitrary expressions involving attributes, for example:
- Arithmetic on attributes, e.g., A+B.
- Duplicate occurrences of the same attribute.

PROJA+B,A,A (R) =A+BA1A2

311

733

R =AB

12

34

- Aggregation operators are not operators of relational algebra.
- Rather, they apply to entire columns of a table and produce a single result.
- The most important examples: SUM, AVG, COUNT, MIN, and MAX.

R =AB

13

34

32

SUM(A) = 7

COUNT(A) = 3

MAX(B) = 4

AVG(B) = 3

- R1 := GAMMAL (R2). L is a list of elements that are either:
- Individual (grouping ) attributes.
- AGG(A ), where AGG is one of the aggregation operators and A is an attribute.

- Group R according to all the grouping attributes on list L.
- That is, form one group for each distinct list of values for those attributes in R.

- Within each group, compute AGG(A ) for each aggregation on list L.
- Result has grouping attributes and aggregations as attributes. One tuple for each list of values for the grouping attributes and their group’s aggregations.

Then, average C within

groups:

ABAVG(C)

12 4

45 6

First, group R :

ABC

123

125

456

R =ABC

123

456

125

GAMMAA,B,AVG(C) (R) = ??

- Suppose we join R JOINC S.
- A tuple of R that has no tuple of S with which it joins is said to be dangling.
- Similarly for a tuple of S.

- Outerjoin preserves dangling tuples by padding them with a special NULL symbol in the result.

R OUTERJOIN S =ABC

123

45NULL

NULL67

R = ABS =BC

1223

4567

(1,2) joins with (2,3), but the other two tuples

are dangling.