Comp 231 Database Management Systems - PowerPoint PPT Presentation

hchen
comp 231 database management systems n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Comp 231 Database Management Systems PowerPoint Presentation
Download Presentation
Comp 231 Database Management Systems

Loading in 2 Seconds...

play fullscreen
1 / 59
Download Presentation
Comp 231 Database Management Systems
0 Views
Download Presentation

Comp 231 Database Management Systems

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Comp 231 Database Management Systems 3. Relational Data Model Department of Computer Science, HKUST 1

  2. Basic Concepts • Entities and relationships are stored in tables • Relationships are captured by including key of one table into another • Languages for manipulating the tables • All popular DBMSs today are based on relational data model (or an extension of it, e.g., object-relational data model) Department of Computer Science, HKUST 2

  3. Why is it so good? • Simplicity, everybody knows how to manipulate tables • Tables are simple enough so that solutions to complicated problems such as concurrency control and query optimization can be obtained • It has a theoretical basis for the studying of database design problems • Tables are logical concepts; physically tables can be stored in different ways  support data independence Department of Computer Science, HKUST 3

  4. Terminology • Relation  table; denoted by R(A1, A2, ..., An) • Attribute  column; denoted by Ai • Tuple  row • Attribute value  value stored in a table cell • Domain  legal type and range of values of an attributedenoted by dom(Ai) • Attribute: Age Domain: [0-100] • Attribute: EmpName Domain: 50 alphabetic chars • Attribute: Salary Domain: non-negative integer Ideally, a domain can be defined in terms of another domain; e.g., the domain of EmpName is PersonName. However, this is NOT allowed in most basic DBMSs. Department of Computer Science, HKUST 4

  5. An Example Relation Relation/Table Name Attributes/Columns Tuples/Rows Department of Computer Science, HKUST 5

  6. Some Formal Definitions • A relation is denoted by: R(A1, A2, ..., An) • STUDENT(Name, Student-id, Age, CGA) • Degree of a relation: the number of attributes n in the relation. • Tuple t of R(A1, A2, ..., An): An ordered set of values <v1,v2,...,vn> where each vi is an element of dom(Ai). • Relation instance r(R): A set of tuples in R • r(R) = {t1, t2, ..., tm}, or alternatively r(R)  dom(A1)  dom(A2)  ...  dom(An) Department of Computer Science, HKUST 6

  7. Relation and Cartesian Product • A relation is the Cartesian product of sets of valuesName = { Lee, Cheung }Grade = { A, B, C } • A relation StudentGrade (Name, Grade) can be defined as any subsetof the following maximal extent:Name  Grade = { Lee, A, Lee, B , Lee, C, Cheung, A, Cheung, B, Cheung C } • r(StudentGrade)  Name  Grade = { Lee, A Cheung C } Department of Computer Science, HKUST 7

  8. After all, why don’t we call them tables? • Observe which of the following would you consider a table: ??? • A empty table/relation is still a table/relation (can be denoted by R = {}) • A null table/relation is still a null table/relation (can be denoted by R = ) Department of Computer Science, HKUST 8

  9. Characteristics of Relations • Tuples in a relation are not considered to be ordered, even though they appear to be in a tabular form. • Ordering of attributes in a relation schema R are significant because the mathematical definition of a relation doesn’t include attribute names. • Values in a tuple: All values are considered atomic. A special null value is used to represent values that are unknown or inapplicable to certain tuples. Department of Computer Science, HKUST 9

  10. Keys • Let K R (I.e., K is a set of attributes which is a subset of R) • K is a superkey of R if K can identify a unique tuple of each possible relationr(R). Customer(CusNo, Name, Address, …)where customers have unique customer numbers and unique names. superkeys: CusNo Name {CusNo, Name} {CusNo, Name, Address} plus many others • K is a candidate key if K is minimal There are two candidate keys: CusNo and Name • Every relation is guarantee to (must) have at least one key. Why? Department of Computer Science, HKUST 10

  11. Relational Algebra Department of Computer Science, HKUST 11

  12. Relational Algebra • It is a procedural language, I.e., sequence of operations is explicitly specified in the language • Six basic operations: select, project, union, set difference, Cartesian product, renameE.g., R = S  T  U • Each operation takes one or more relations as input and give a relation as output (a property known as closure allowing us to nest relational algebra expressions) • Contrast relational algebra expressions with arithmetic expressions; the former operates on relations and the latter on numbers Department of Computer Science, HKUST 12

  13. Select Operation • Notation: p (r) • Defined as: p (r) = { t | t  r  P(t) } • Read as: t is a tuple of r and t satisfies the predicate P. • P is a formula of the form: attribute name op attribute nameor attribute name op constantwhere op is one of       • Predicates can be connected by  (and)  (or)  (not) Department of Computer Science, HKUST 13

  14. Select Operation - Example • Relation r: • A  B  D  5 (r) : Department of Computer Science, HKUST 14

  15. Project Operation • Notation:  A1, A2, ..., Ak (r)where A1, A2, ..., An are attributes defined in r • After the projection, duplicates are assumed to be eliminated automatically since a relation is a set!Not true in real DMBSs: duplicates are not eliminated until explicitly told to do so • Order of attributes in the projection list can be arbitrary. (One use of projection is to rearrange attributes) Department of Computer Science, HKUST 15

  16. Project Operation - Example • Relation r: •  A,C (r): Department of Computer Science, HKUST 16

  17. Union Operation • Notation: r  s • Defined as: r  s = { t | t  r  t  s } • For r  s to be valid (r and s must be compatible), I.e., • r, s must have the same number of attributes • domains must be compatible pair-wise between r and s. • R(EmpName, Age) S(StudName, Age)T(ProdID, Name) U(StudName, Age, CGA) • Compatible relations: R and S, R and  StudName, AgeU, etc. • Incompatible relations: R and T, R and U, etc. Department of Computer Science, HKUST 17

  18. A B A B • r  s : A B  1  2  1  2  3  2  1  1  3 Union Operation - Example • Relation r, s: r s Department of Computer Science, HKUST 18

  19. Set Difference Operation • Notation: r  s • Defined as: r  s = { t | t  r  t  s } • For r  s to be valid: • r, s must have the same number of attributes • domains must be compatible pair-wise between r and s. Department of Computer Science, HKUST 19

  20. A B  1  2  1 A B  1 A B  1  2  3 Set Difference Operation - Example r • Relation r, s: s • r  s : Department of Computer Science, HKUST 20

  21. Cartesian-Product Operation • Notation: rs • Defined as: rs = { tq | t  r  q  s } • Assume that attributes of r and s are disjoint (I.e., r and s don’t have attributes with the same name) • If they are not disjoint, then make them disjoint by renaming the attributes concerned Department of Computer Science, HKUST 21

  22. Relation r, s: r s • r  s : Cartesian-Product Operation - Example Department of Computer Science, HKUST 22

  23. Join Operation • Cartesian product is rarely used by its own; it is the basis of the join operation, which is more popular • rs = A  C (rs) • A and C are sets of attributes which have the same attribute names in r and s. Department of Computer Science, HKUST 23

  24. B D B D Join Operation - Example • R = ( A, B, C, D ) • S = ( E, B, D ) • R joins S has the result schema: T(A, B, C, D, E) and is defined by:rs =  r.A,r.B,r.C,r.D,s.E(r.B=s.B  r.D=s.D(rs)) • This is also called natural join Department of Computer Science, HKUST 24

  25. A B C D B D E A B C D E  1  a 1 a   1  a   2 γ a 3 a   1  a γ γ 4  b 1 a γ  1 γ a   1 γ a 2 b   1 γ a γ  2  b 3 b   2  b  Natural Join Operation - Example r s • Relation r, s: • rs : Department of Computer Science, HKUST 25

  26. Assignment Operation • The assignment operation() provides a convenient way to express complex queries by storing intermediate results into temporary relations • Assignment must always be made to a temporary relation variable. • Example: Write r  s astemp1  R-S (r ) temp2  R-S((temp1  s) - R-S,S( r)) result = temp1 - temp2 • The result to the right of the  is assigned to the relationvariable on the left of the . • May use variable in subsequent expressions. Department of Computer Science, HKUST 26

  27. Example Queries • Find all customers who have an account from both the “Downtown” and “Uptown” branches.depositor (customer-name, account-no)account (account-no, branch-name, branch-city) • Query 1CN(BN=“Downtown”(depositor account))  CN(BN=“Uptown”(depositor account))where CN denotes customer-name and BN denotes branch-name. • Query 2 (for the purpose of illustrating division only) temp  {“Downtown”, “Uptown”}[ customer-name, branch-name (depositor account) ]  temp Department of Computer Science, HKUST 27

  28. given a customer x and the set of branches a, b, c, check x,a x,b and x,c are in … Returns all branches of a customer all branches in Brooklyn Example Queries • Find all customers who have an account at all branches located in Brooklyn.depositor (customer-name, account-no)account (account-no, branch-name, branch-city) customer-name, branch-name (depositor account)  branch-name (branch-city=“Brooklyn” (branch)) • Dividend contains all customer-branch information • Divisor contains all branches located in Brooklyn Department of Computer Science, HKUST 28

  29. Tuple Relational Calculus Department of Computer Science, HKUST 29

  30. Tuple Relational Calculus • A nonprocedural query language, where each query is of the form { t | P(t) } • It is the set of all tuples t such that predicate P is true for t • t is a tuple variable; t[A] denotes the value of tuple t on attribute A • t  r denotes that tuple t is in relation r • P is a formula similar to that of the predicate calculus Department of Computer Science, HKUST 30

  31. Predicate Calculus Formula 1. Set of attributes and constants 2. Set of comparison operators: (e.g., <, , =, , >, ) 3. Set of connectives: and (), or (), not () 4. Implication (): x  y, if x is true, then y is true and if x is NOT true, then y could be either true OR false x  y   x  y 5. Set of quantifiers:  t  r (Q(t))  “there exists” a tuple t in relation r such that predicate Q(t) is true  t  r (Q(t))  Q is true “for all” tuples t in relation r Department of Computer Science, HKUST 31

  32. Banking example • branch(branch-name, branch-city, assets) • customer (customer-name, customer-street, customer-city) • account (branch-name, account-number, balance) • loan (branch-name, loan-number, amount) • depositor (customer-name, account-number) • borrower (customer-name, loan-number) Department of Computer Science, HKUST 32

  33. t[amount] t A Simple Selection Query • Find the branch-name, loan-number, and amount for loans of over $1200{t | t  loan  t[amount]  1200} all relevant attributes are contained in:loan (branch-name, loan-number, amount) all attributes in tuple t is returned; {t[name] | … …} to get only the name attribute Intuitively, you can consider t as a variable (or pointer) which iteratively loops through each tuple in loan t is called a tuple variable loan table amount Department of Computer Science, HKUST 33

  34. t s s[amount] A Join Query • Find the loan number for each loan of an amount greater than $1200 and return the customer info{t | t customer  ( s  loan) ( t[loan-number] = s[loan-number]  s[amount]  1200 ) } customer loan-no loan loan-no amt t[loan-no] s[loan-no] It is a join operation A customer may have many loans, but as long as one of his loans exceeds 1200, his information will be output again, all attributes in tuple t is returned Department of Computer Science, HKUST 34

  35. Peculiarity from the Textbook • borrower (customer-name, loan-number)loan (branch-name, loan-number, amount){t |  s loan (t[loan-number] = s[loan-number]  s[amount]  1200)} • Same query as the previous one, except that t is not defined as a tuple in borrower, so t is an arbitrary tuple variable • The whole expression only mentions “loan-number” as an attribute of t, so we can only assume that t has only one attribute, namely, “loan-number” • This query only returns loan numbers • Let’s make it simpler (same as the first example!){s[loan-number] | s loan  s[amount]  1200} • It is better to define the scope of a tuple variable first! You need thet borrower part if you want to return the customer-street or city anyway! Department of Computer Science, HKUST 35

  36. t’s name can be found in the borrower table or t’s name can be found in the depositor table Example Queries • Find the names of all customers having a loan, an account, or both at the bank{ t | ( s  borrower) ( t[customer-name] =s[customer-name] )  ( u depositor) (t[customer-name] = u[customer-name]) } • t has NOT been explicitly defined as a tuple in the customer relation • only attribute customer-name of tuple t is returned • consider adding t  customer Question: can I uses depositor instead of u depositor ? Department of Computer Science, HKUST 36

  37. Example Queries • Find the names of all customers who have a loan and an account at the bank.{t | s  borrower (t[customer-name] =s[customer-name])   u depositor (t[customer-name] = u[customer-name])} Alternative: {t [customer-name] | t customer ( s  borrower (t[customer-name] = s[customer-name])   u depositor (t[customer-name] = u[customer-name]) )} Department of Computer Science, HKUST 37

  38. t s Perryridge u Example Queries • Find the names of all customers having a loan at the Perryridge branch{ t[name] | t  customer  s  borrower (t[customer-name] = s[customer-name]  u loan (u[branch-name] = “Perryridge”  u[loan-number] = s[loan-number] ))} customer borrower loan { t | s  borrower (t[customer-name] = s[customer-name] u  loan (u[branch-name] = “Perryridge”  u[loan-name] = s[loan-number]))} [textbook] Department of Computer Science, HKUST 38

  39. t customer borrower s loan s Perryridge u where is “s” defined? Example Queries • The inner u loan ()cannot be taken out of the s  borrower ( ) quantifier. • {t | t customer   s  borrower (t[customer-name] = s[customer-name] )   u loan (u[branch-name] = “Perryridge”  u[loan-number] = s[loan-number] )} • {t | t customer   s  borrower (t[customer-name] = s[customer-name] )   u loan s  borrower (u[branch-name] = “Perryridge”  u[loan-number] = s[loan-number] )} • The two “s” in the two unnested existential quantifiers are not the same. Department of Computer Science, HKUST 39

  40. t is a borrower at Perryridge branch t customer depositor loan borrower s {t | s borrower ( t[customer-name] = s[customer-name] u loan ( u[branch-name] = “Perryridge”  u[loan-number] = s[loan-number] )   v  depositor (v[customer-name] = t[customer-name] ) ) } but t is not a depositor anywhere in the bank Perryridge v u NOT-EXIST Example • Find the names of all customers who have a loan at the Perryridge branch, but no account at any branch of the bank Department of Computer Science, HKUST 40

  41. t customer a b c EXIST vs. NOT-EXIST • v  depositor (v[customer-name] = t[customer-name] ) Consider a customer t, there does not exist any depositor whose name is the same as t’s name. • v  depositor (v[customer-name]  t[customer-name] ) Consider a customer t, there exist (any)one depositor whose name is not the same as t’s name. First query returns a and b Second query returns a, b, and c (c is qualified because you can find e in depositor which is  c) First query is the same as: vdepositor v[cust-nm]  t[cust-nm] depositor c v e Department of Computer Science, HKUST 41

  42. this part is for finding t’s city;not needed if you use t customer Example Queries • Find the names of all customers having a loan from the Perryridge branch, and also return the cities they live in {t | s  loan (s[branch-name] =“Perryridge”   u  borrower (u[loan-number] = s[loan-number]  t[customer-name] = u[customer-name]  v  customer (u[customer-name] =v[customer-name]  t[customer-city] = v[customer-city])))} the borrower has a loan and the loan is from Perryridge t is a borrower Department of Computer Science, HKUST 42

  43. some loan issued from Perryridge t has borrowed the loan t lives in the same city as the Perryridge branch is located s[branch-name] is already bound to “Perryridge”; want to find the city where Perryridge is in Example Queries • Find the names of all customers having a loan from the Perryridge branch and live in the the city where Perryridge branch is located. { t | t customer  s  loan ( s[branch-name] =“Perryridge”   u  borrower ( u[loan-number] = s[loan-number]  t[customer-name] = u[customer-name]   v  branch ( v[branch-name] = s[branch-name]  t[customer-city] = v[branch-city] ) )) } branch(branch-name, branch-city, assets) Department of Computer Science, HKUST 43

  44. such that t is the depositor of account u Query with Universal Quantifier • Find the names of all customers who have an account at all branches located in Brooklyn: {t | s  branch ( s[branch-city] =“Brooklyn”  u  account ( s[branch-name] = u[branch-name]  s  depositor ( t[customer-name] = s[customer-name] s[account-number] = u[account-number]) ) )} if s is a branch in Brooklyn then there is an account u in from that branch Department of Computer Science, HKUST 44

  45. Other Relational Algebra Operators Department of Computer Science, HKUST 45

  46. Outer Join • An extension of the join operation that avoids loss of information. • Computes the join and then adds tuples from one relation that do not match tuples in the other relation to the result of the join. • Uses null values: • null signifies that the value is unknown or does not exist. • All comparisons involving null are false by definition. Department of Computer Science, HKUST 46

  47. branch-name loan-number amount Downtown L-170 3000 Perryridge L-260 1700 Redwood L-230 4000 cust-name loan-number Jones L-170 Smith L-230 Hayes L-155 Outer Join - Example Relation loan Relation borrower Department of Computer Science, HKUST 47

  48. loan branch-name loan-number amount Downtown L-170 3000 Perryridge L-260 1700 borrower LoanBorrower branch-name loan-number amount cust-name Redwood L-230 4000 Downtown L-170 3000 Jones cust-name loan-number Redwood L-230 4000 Smith Jones L-170 Smith L-230 Hayes L-155 Outer Join - Example Join returns only the matching (or “good”) tuples The fact that loan L-260 has no borrower is not explicit in the result Hayes has borrowed an non-existent loan L-155 is also undetected Department of Computer Science, HKUST 48

  49. Left outer join: Loan borrower branch-name loan-number amount cust-name Downtown L-170 3000 Jones Perryridge L-260 1700 null Redwood L-230 4000 Smith Left Outer Join -Example Keep the entire left relation (Loan) and fill in informationfrom the right relation, use null if information is missing. Department of Computer Science, HKUST 49

  50. Loan Borrower branch-name amount Downtown 3000 Redwood 4000 null null branch-name amount cust-name loan-number Loan borrower Downtown 3000 Jones L-170 cust-name loan-number Redwood 4000 Smith L-230 Jones L-170 Perryridge 1700 null L-260 Smith L-230 null null L-155 Hayes Hayes L-155 Right and Full Outer Join - example Department of Computer Science, HKUST 50