1 / 51

Relational Model

Relational Model. Prof. Sharad Mehrotra Information and Computer Science Department University of California at Irvine Chapter 3 and 6 from SKS. Outline. Relational model basic modeling concepts: relation or a table Integrity constraints in the relational model:

vidah
Download Presentation

Relational Model

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Relational Model Prof. Sharad Mehrotra Information and Computer Science Department University of California at Irvine Chapter 3 and 6 from SKS

  2. Outline • Relational model • basic modeling concepts: • relation or a table • Integrity constraints in the relational model: • referential integrity constraints • key constraints • functional dependencies • multivalued dependencies • join dependencies • Relational Languages • relational algebra (algebraic) • basic operators • expressions in relational algebra • relational calculus (logic based) /*will not be covered in class */ • safety

  3. Relational Model -- Quick Example • A relational schema consists of a set of tables and constraints. • Tables: customer, account • Constraints: • Key constraints: • ssno is the key for customer table • both accountno and custid are keys for account table • Referential Integrity constraints: (foreign keys) • the custid attribute in account table takes values from ssno in customer table Null Constraint: customer name cannot take null values

  4. Relational Model A database schema consists of a set of relation schema and a set of constraints over the relation schema Relational Schema: • denoted by R(A,B,C). Graphically drawn as a table. • R = relation name • Recall: • relation is a subset of cartesian product of sets • relation is a set of n-tuples where n = degree of the relation • A, B, C are attribute names • With each attribute a domain is specified • In relational model attributes are atomic: • the values are not divisible. That is, we cannot refer to or directly see a subpart of the value. • an attribute can take a special null value • Null value represents either attributes whose value is not known, or do not exist

  5. Example of a Relation diagnosis --- an example relation/table Patient Disease JimSchizophrenic Jane Obsessive-Comp Jerry Manic joe null null in this case may mean that diagnosis is not complete and disease has not been identified. Notice possibility of confusion that null means that the patient has no disease! This is one of the reasons why using nulls is not a great idea! We will see other reasons as well later

  6. Constraints • What are they? • represent the semantics of the miniworld being modeled. • restrict the set of possible database states • Why do we want to specify the constraints? • Useful information to application programmers. They can write programs to prevent constraints violation • constraint -- acct balance should not fall below 0 dollars • programmer writing code for debit application should check at the end if the account gets overdrawn! • DBMS might enforce specified constraints directly making task of application writer easier • If DBMS guarantees that account does not get overdrawn, then debit application programmer need not worry about checking for overdrawn account condition.

  7. Constraints • Why specify constraints? • knowledge of some type of constraints enables us to identify redundancy in schemas and hence specification of constraints helps in database design (we will see this later) • Knowledge of some type of constraints can also help the DBMS in query processing • Constraint specification in Data Models: • ER model • domain and key constraints over entities • participation and cardinality constraints over relations • Relational Model • domain constraints, entity identity, key constraint, functional dependencies -- generalization of key constraints, referential integrity, inclusion dependencies -- generalization of referential integrity.

  8. Domain Constraint • In the schema, every attribute is declared to have a type --- integer, float, date, boolean, string, etc. • An insertion request can violate the domain constraint. • DBMS can check if insertion violates domain constraint and reject the insertion.

  9. Disallowing null Values Some fields of a relation are too important to contain null values. For example, in sales(customer, salesman, date, amount, saleID) we do not want ‘customer’ to contain a null value, because then we cannot send a bill to the customer for the sale! Similarly a primary key should not contain a null value since then it may not be possible to identify some tuples. If more than one tuple has a null value in its primary key, we may not be able to distinguish them (entity integrity constraint)

  10. Key Constraint Superkey for relation: set of attributes such that if two tuples agree on those attributes, then they agree on all the attributes of the relation Note: the set of all the attributes of a relation is always a superkey. Candidate key: superkey no subset of which i s a superkey Primary key: one of the candidate keys • A set of candidate keys can be associated by user with the relations. • insertion and modification may violate key constraint. • DBMS can check if the the tuple being inserted matches an existing tuple on the attributes in the key and disallow such insertions

  11. Foreign Key and Referential Integrity Constraint • Consider following 2 relation schemas: • R1(A1, A2, …An) • R2(B1, B2, … Bm) • Let PK be subset of {A1, A2, …,An} be primary key of R1 • A set of attributes FK of relation scheme R2 is a foreign key of R2 if the following 2 holds: • attributes in FK have the same domain as the attributes in PK • For all tuples t2 in R2, there exists a tuple t1in R1 such that t2[FK] = t1[PK]. • A referential integrity constraint from the attributes FK of R2 to R1 means that FK is a foreign that refers to the primary key of R1. • Insertion, deletion, and modification may violate referential integrity

  12. Example of Referential Integrity G student C# Semester grade Susan CS101 1-91 A Jane CS101 1-91 B John CS101 1-91 Q yping error! should be an ‘A’. If we keep a list of legal grades, LegalGrades Grade A and have a referential integrity B constraint saying that C every value of G.grade D must also be a value of F LegalGrades.grade, Audit Ex ... then we can reject the request insert into G values (“John”, “CS1010”, “1-91”, “Q”);

  13. Inclusion Dependencies • Generalization of referential integrity constraint. • Inclusion dependency R1[A1,...,An] Í R2 [B1,...,Bn] means that the values in the first relation R1 refer to the values in the second relation • Formally, R1[A1,...,An] Í R2 [B1,...,Bn] iff the following holds: • for all t1 in R1, there exists a t2 in R2 such that t1[A1, …, An] = t2[B1, …, Bn] • Notice that referential integrity constraint is an inclusion dependency in which {B1, .. Bn} is the primary key of R2. • Examples: • G[Grade] Í LegalGrade[Grade] • CourseOffering[C#] Í Courses[C#] • Takes[S#] Í Students[S#] • CourseOffering[Professor] Í UIEmployee[E#]

  14. Inclusion dependencies (specially referential integrity constraints) are very common • Modification, insertion and deletion requests can lead to violations of Ids -- must check for violation at end of each operation • Example: cancelling a course. Delete from courseOfferings where c# = “CS101” AND Semester = “2-91”; BUT TakesS# C# Semester Grade 1001 CS101 2-91 ^ 1002 CS101 2-91 ^ 1003 CS101 1-91 A Solution: 1) reject update - or - 2) delete tuples from ‘Takes’ that refer to ‘CS101, 2-91’ and then check integrity again.

  15. Functional Dependencies FDs is a generalization of concept of keys. Given a relation R with attributes A1,...,An,B1,...,Bm,C1,...,Cl, we say that A1,...,Anfunctionally determine B1,...,Bm (A1,...,An B1,...,Bm) if whenever two tuples agree on their values for A1,...,An, they agree on B1,…,Bm The key of a relation functionally determines all the attributes of the relation. (by definition of a key) Examples: Takes(C#, S#, semester, grade) C# S# Semester grade CS101 13146 1-91 A CS101 13146 1-91 B illegal since it violates FD that C#,S#,Semester functionally determine grade

  16. Logical Implication of Functional Dependencies • Consider R(A,B,C) • Let the following functional dependencies hold: • A B (f1) • B C (f2) • We can show that f1 and f2 together logically imply that the following functional dependency also holds: • A C (f3) • Proof: • say f3 does not hold. • then there exists tuples t1, t2 in R such that t1[A] = t2[A] and t1[C] is not equal to t2[C] • Since f1 holds and since t1[A] = t2[A], it must be the case that t1[B] = t2[B] • Hence since t1[B] = t2[B] and f2 holds, it must be the case that t1[C] = t2[C] • This is a contradition! • Hence, f3 must also hold!

  17. Closure of Functional Dependency Set Definition: Let R be a relation scheme, and F be the set of functional dependencies defined over R. F+ denotes the set of all functional dependencies that hold over R. That is, F+ = { X Y | F logically implies X Y} Example: Let F = {A B, B C} then A C is in F+ F+ is called the closure of F

  18. Inferencing Functional Dependencies Given a set of fds F over a relation scheme R, how to compute F+ ? 1. Reflexivity: If Y is a subset of X, then X Y Examples: AB A, ABC AB, etc. These 3 rules are called ARMSTRONGS AXIOMS!! 2. Augmentation: If X Y, then XZ YZ Examples: If A B, then AC BC 3: Transitivity: If X Y, and Y Z, then X Z.

  19. Using AA to infer dependencies • Union Rule: • if X Y, X Z, then X YZ • Proof: • Since X Y, using augmentation, X XY (1) • Since X Z, using augmentation, XY XZ (2) • Using (1) and (2) and transitivity we get X YZ Q.E.D. • Pseudo-Transitivity Rule: • If X Y, WY Z, then WX Z • Proof: • Since X Y, using augment XW YW (1) • Given WY Z, and (1) using transitivity • WX Z Q.E.D.

  20. Armstrongs Axioms Armstrongs Axioms are sound: If X Y is deduced using AA from F, then X Y holds over any relation r in which fd’s in F hold. Armstrongs Axioms are complete: If a functional dependency X Y is in the closure of F (that is, in F+), then it can be deduced using Armstrongs Axioms over F. Since AAs are sound and complete, they provide a mechanism to us to compute F+ from F. Even though we defined F+ as the set of all fds that are logical implication of F, since AA are sound and complete, we can redefine F+ as the fds that follow from F using AA.

  21. Superkeys and FDs • Using FDs we can formally define the notion of keys • Let R(A1, A2, …,An) be a relation • Let X be a subset of {A1, A2, …An} • X is a superkey of R if and only if the functional dependency X A1,A2, …,An is in F+ • Naïve Algorithm to test for superkey • compute F+ using AA • If X -----> A1,A2,…,An is in F+ • X is a superkey

  22. Cardinality of Closure of F Let F = {A B1, A B2, ..., A Bn} (cardinality of F = n) then {A Y | Y is a subset of {B1, B2, ..., Bn}} is a subset of F+ (cardinality of F+ is more than 2^n). So computing F+ may take exponential time! Fortunately, membership in F+ can be tested without computing F+.

  23. Closure of a set of Attributes • Let • R(A1, …, An) be a relation scheme • F be the functional dependency set on R. • X be a subset of {A1,…,An} • Defn: Closureof X with respect to F (denoted by X+) is the set of attributes Ai of R such that X Ai can be derived using Armstrong Axioms. • Theorem: X Y holds over R if and only if Y is a subset of X+. • Proof: • (if) Since Y is a subset of X+, by defn, for all Ai in Y, • X--->Ai. Repeated usage of union rule implies X--->Y. • (only if) Since X---> Y holds, then X-->Ai for each Ai in Y must hold. As a result, each element of Y is in X+. Hence Y is a subset of X+

  24. Strategy to test membership of X---> Y in F+ Compute X+ IF Y is a subset of X+ X---> Y holds in R Else X--->Y does not hold in R

  25. Membership of F+ Computing X+ X+ = X repeat oldX+ = X+ for each fd Y ---> Z in F do if (Y is a subset of oldX+) then X+ = X+ union Z endif endfor until (oldX+ == X+) maximum number of iterations = cardinality of F times the number of attributes in R! (polynomial time)

  26. Example • Let the set F contain the following fds: • AB C, D EG, C A, BE C, BC D • CG BD, ACD B, CE AG Let X = BD. Compute X+. iteration 1: X+ = {BD} iteration 2: X+ = {BDEG} (due to dependency 2) iteration 3: X+ = {BDCEG} (due to dependency 3) iteration 4: X+ = {BCDEGA} (due to dependency 8) iteration 5: X+ = {BCDEGA} Algorithm exits the loop since no new attribute added in last iteration and (BD)+ = {ABCDEG}

  27. Candidate Keys • X is a candidate key if: • X is a superkey • no subset of X is a superkey • Algorithm for test if X is a candidate key of R • test if X is a superkey • test if no subset of X is a superkey

  28. General Integrity Constraints Besides the basic integrity constraints discussed above, SQL allows a variety of other integrity constraints to be specified. These include assertion, triggers and general integrity constraints. We will discuss these when we study SQL Takes(C#, S#, semester, grade) courses(C#, UnitsOfCredit, ...) Rule: Every student must enroll for at least three credits every semester. Most commercial systems have been extended to support some of the general constraints (also referred to as the semantic integrity constraints) DBMS provides a language to specify these constraints and provides mechanisms to enforce them.

  29. Relational Query Languages RelationalAlgebra (procedural) Relational Calculus (non-procedural)

  30. Relational Languages • Relational Algebra (procedural) • defines operations on tables • Relational Calculus (declarative) • based on first-order predicate calculus • Every relational algebra query can be translated to relational calculus • Every safe relational calculus query can be translated to relational algebra. • Any language that is atleast as expressive as relational algebra is said to be relationally complete.

  31. Relational Algebra Operators • Select: given a relation R and a predicate P, select tuples from R that satisfy P. • Project: given a relation R and a subset of its attributes X, return a relation which is the same as R except that all columns not in X are left out. • Rename: given a relation R and a name N, return a relation that is exactly the same as R except that it has a name N. • Cartesian Product: Given 2 relations R1and R2,.return a relation R3 whose tuples are the concatenation of tuples in R1 and R2 • Union: Given relations R1 and R2, return a relation R3 which contains all tuples in R1 and R2 • Set Difference: Given relations R1 and R2, return a relation R3 containing all tuples in R1 that are not in R2

  32. Selection Operation • selection cond(R)or select[selection cond]R • Example: Employee(name, dept, sal) Employee select [sal > 20,000] Employee name dept sal jane pharmacy 30,000 jack hardware 30,000 jill pharmacy 75,000 select [(dept = toy) or (sal < 20,000)] Employee name dept sal joe toy 20,000 bill toy 12,000 s name dept sal jane pharmacy 30,000 jack hardware 30,000 jill pharmacy 75,000 joe toy 20,000 bill toy 12,000

  33. Projection • Proj [list of attr of R] (R ) or P list of attr of R (R) R A B C S A C Jane Toy 10,000 Jane Toy Jim Toy 20,000 John Complaint June Complaint 20,000 Proj[A]R Proj[CB]R A C B Jane 10,000 Toy Jim 20,000 Complaint June 20,000 Toy

  34. Cartesian Product • Denoted by R x S R: A B C S: A D joe toy 10K joe jill jack com 20K jack jill RxS: R.A B C S.A D joe toy 10K joe jill joe toy 10K jack jill jack com 20K joe jill jack com 20K jack jill • Notice attribute naming strategy to disambiguate attribute names attributes get the name, R.A, where A is attrib name, and R is the relation name from which attrib originates. If there is no possible ambiguity, relation name is dropped!

  35. Union Operator Denoted by: R U S (Illegal if R & S have different numbers of attributes or if domains of respective attributes dont match!) R A B S A C Jane Toy Jane Toy Jim Toy John Complaint June Complaint R U S = A B Jane Toy Jim Toy June Complaint John Complaint • Attributes of resulting relation take the names from the first relation

  36. Set Difference • Denoted by R - S ( Illegal if R & S have different numbers of attributes or if respective domains mismatch!) R A B S A C Jane Toy Jane Toy Jim Toy John Complaint June Complaint R - S = A B Jim Toy June Complaint Note attributes in resulting relation take name from the first relation

  37. Rename Operator • Strategy used to disambiguate attribute names: • For union and set difference operators, the resulting relation takes the attribute names of the first relation • For cartesian product, attributes are named as Relation-name.attribute-name, where Relation name refers to the relation from which the attribute originally came. • Strategy will not disambiguate when the same relation appears multiple times in the relational query. • Let R(A,B) be a relation. Consider R x R ---- what to name the attributes of the resulting relation? • Define a rename operator: • denoted by rename[N]R or by rN(R) • returns a relation which is exactly same as R except it has the name N

  38. Rename Operator • Consider relation Employee(name, dept, sal) • List all the employees who work in the same department as Jill • We first find the department(s) for which Jill works • Proj[dept](select[name = Jill] Employee) ---list of departments for which Jill works • To find out about all Employees working for this department, we need to reference the Employee table again: • select[P] ( Employee x Proj[dept](select[name = Jill] Employee) ) • where P is a selection predicate which requires dept values to be equal. • If we use Employee.dept in P it is ambiguous which instance of Employee relation in the query the attribute refers to. • To disambiguate, use rename operator: • Proj[Employee.name](select[Employee.dept = Employee2.dept] Employee x (Proj[dept] (select[name = Jill]( rename[Employee2](Employee))))

  39. Formal Definition of Relational Algebra • Basic Expressions: • Relation in a database • Constant Relations • General Expressions: constructed from basic ones. Let E1 and E2 be relational algebra expressons. Then the following are also expressions: • E1 U E2, if arity of E1 = E2 and corresponding attributes are of the same type • E1 - E2, if arity of E1 = E2 and corresponding attributes are of the same type • E1 x E2, if attributes in E1 and E2 have different names • Select[P](E1), where P is a predicate on attributes in E1 • Proj[S](E1), where S is a list of some attributes in E1 • rename[X](E1), where X is a new name for relation E1

  40. AdditionalOperators Basic Relational Algebra operators are like assembly language. Define more powerful operators that make the task of writing relational algebra queries easier Each of these operators can be expressed in relational algebra and do not increase the expressibility of the language Example: Intersection R Ç S = R - (R - S) = { t | t Î R & t Î S }

  41. Joins R join condition S = select[ join condition] (R x S) join condition is of the form: <condition> AND <condition> AND <condition> where each condiition is of the form Ai q Bj, where • Ai is attribute of R • Bj is attribute of S • q is a comparison operator {=, <, >, <=, >=, <>} • Example: • E(emp, dept) M(dept, mgr) • List all employees and their managers. • Proj[emp, mgr](select[E.dept = M.dept] (ExM)) • can be represented as: • Proj[emp,mgr] ( E E.dept = M.dept M )

  42. Types of Joins • Theta-Join: if a join condition uses some comparison operator other than equality. • E.g., list names of all managers who manage departments other than Jill’s • Proj[mgr]( select[emp = Jill](E ) (E.dept ¹ M.dept) M) • Equi-Join: if join conditions use only equality operator. • E.g., list the manager’s name of Jill’s department • Proj[mgr]( select[emp = Jill](E ) (E.dept = M.dept) M) • Natural Join: special type of equi-join.. • Let R and S be relations. Let attributes of R and S be denoted by R and S respectively. • Denote by R U S the union of the list of attributes of R and S • Let list of attributes common to both R and S be {A1, A2, …, An} • Natural join of R and S (denoted R S) is: • Proj[R U S ] (R R.A1 = S.A1 and R.A2 = S.A2 and … and R.An = S.An S) • E.g., Proj[mgr]( select[emp = Jill](E ) M)

  43. Assignment Operator • Lots of time convenient to write relational algebra expressions in parts using assignment to temporary relational variables. • For this purpose use assignment operator, denoted by := • E.g., Who makes more than their manager? E(emp, dept, sal) M(mgr, dept) ESM(emp, sal, mgr) := Proj[emp, sal, mgr] (E M) (Proj[ESM.emp](ESM [mgr = E.emp & ESM.sal >E.sal] E) ) • With the assignment operator, a query can be written as a sequential program consisting of a series of assignments followed by an expression whose value is the result of the query.

  44. Examples • A query is a composition of basic relational algebra operators • Consider relations: • customer(ssno, name, street, city) • account(acctno, custid, balance) • list account balance of Sharad • list names of all customers who have a higher balance than all customers in Champaign

  45. ToDiag Dis Test Strep A Mono B Meningitis C Hepatitis D Encehhalitis E Meningitis F Meningitis G Diag Pat Dis Winslett Strep Liu Mono Harandi Meningitis Harandi Hepatitis Liu Hepatitis Outcome Pat Test Outcome Winslett a T Winslett b F Liu b T Harandi f T Winslett e F Harandi e F Harandi g F Winslett e T

  46. 1. Who has what disease? Diag 2. Who has a disease they have been tested for? • Proj[pat](Diag | | ToDiag | | Outcome) 3. Who has a disease they tested positively for? Proj[pat](Diag | | ToDiag | | (select[outcome = ‘T’])Outcome)) 4. Who has a disease that they tested both positively & negatively for? Temp1(pat, dis) := Proj[pat,dis](Diag | | ToDiag select[outcome = ‘T’])Outcome) Temp2(pat, dis) : = Proj[pat,dis](Diag | | ToDiag select[outcome = ‘T’])Outcome) Proj[pat](Temp1 Ç Temp2) Use better names!!

  47. Example of Queries in Relational Algebra 5. Who tested both positively and negatively for a disease, whether or not they have it? Testpos(pat, dis) = Proj[pat,dis](ToDiag | | select[outcome = ‘T’]) Outcome) Testneg(pat, dis) = Proj[pat,dis](ToDiag | | select[outcome = ‘T’]) Outcome) (Testpos Ç Testneg)[pat] 6. Who tested both positively & negatively for the same test? (Winslett) Proj[pat](Outcome | | condition (rename[Outcome2](Outcome)) where condition is: [Outcome.pat = Outcome2.pat & Outcome.test = Outcome2.test & Outcome.outcome = Outcome2. outcome]

  48. 7. What testable disease does no one have? (encephalitis) Proj[dis]ToDiag - Proj[dis]Diag Note technique: compute opposite of what you want, take a difference. Use in hard queries with negation (‘no one’) 8. What disease does more than one person have? Proj[dis](Diag condition rename[Diag2](Diag)) where, condition is [Diag.pat ¹ Diag2.pat & Diag.dis = Diag2dis] 9. What disease does everyone have? clue that query is very hard Disease(dis) := diag[dis] Patients(pat) := diag[pat] DiseasesNotEveryoneHas(dis) := Proj[dis]((Patients x Disease) - Diag) Disease - Diseases Not Everyone Has Note technique used! A very hard query might require taking the difference several times.

  49. Outer Joins E emp dept sal M mgr dept Jones Missiles 10K Mendez Tanks Chu Tanks 20K Frank Explosive Swami Tanks 50K Jones Missiles Barth Revolver 100K Right outer join of E with M emp dept sal mgr Jones Missiles 10K Jones Chu Tanks 20K Mendez null Explosives null Frank left outer join of E with M emp dept sal mgr Jones Missiles 10K Jones Chu Tanks 20K Mendez Swami Tanks 50K Mendez Barth Revolver 100K null Full outer join of E with M emp dept sal mgr Jones Missiles 10K Jones Chu Tanks 20K Mendez Swami Tanks 50K Mendez Barth Revolver 100K null null Explosives null Frank

  50. Aggretate Functions • Mathematical aggregate functions (e.g., sum, count, average, min, max) on a collection of values from the database. • [grouping attributes] F [function list] (<relation expression>) where grouping attributes specify how to group the tuples function list specify the aggregate functions to be applied to the individual groups relation expression is the expression that results in a relation. • Example: Consider a relation Employee(dno, ssno, sal). An expression [dno] F [COUNT ssno, AVERAGE salary] (Employee) results in a relation: dno #employee average_salary 1 10 10,000 2 23 8000 Queries involving aggregate functions cannot be expressed in pure relational algebra!

More Related