1 / 68

The Relational Model

The Relational Model. CS145 Lecture 2 September 24, 2014. Announcements. Piazza works! There are links on the page Please please sign up. Almost all course announcements on Piazza.

agatha
Download Presentation

The Relational Model

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The Relational Model CS145 Lecture 2 September 24, 2014

  2. Announcements • Piazza works! There are links on the page • Please please sign up. Almost all course announcements on Piazza. • Students with documented disabilities should send in their accommodation letter from O.A.E. (Office of Accessible Education) by the end of this week to Firas at fabuzaid@stanford.edu. • All quizzes for EdX are online • They may cover slightly different material… Take them! • Quizzes are due either before midterm or final day. • Problem sets will be posted on the main web page. • Direct link to this class video is up • Firas is doing a great job.

  3. Outline • Motivation and History • The Relational (data) model • Relational Algebra

  4. Motivation Relational model is precise, implementable, and we can operate on it (query/update, etc.) It also happens to be the dominant paradigm for managing data in the world.

  5. A little history Relational model due to Edgar “Ted” Codd, a mathematician at IBM in 1970 A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387 Won Turing award 1981 IBM didn’t want to use relational model (take money from IMS)

  6. Outline • Motivation and History • The Relational (data) model • Relational Algebra

  7. The Relational Model: Schemata • Part (I) Relational Schema: • Students(sid: string, name: string, gpa: float) Attributes Relation name string, float, int are the domains of the attributes

  8. The Relational Model II: Data • Students(sid: string, name: string, gpa: float) Attributes # of attributes is the arity NB: In practice, DB systems relax the set requirement. Why? Tuples / Records # of tuples is the cardinality A relational instance is a set of tuples all conforming to the same schema

  9. To reiterate • A relational schema describes the data that is contained in a relational instance A relational instance is a set of tuples of the same type. Let R(f1:D1,…,fm:Dm) be a relational schemathen, an instance of R is a subset of Dom1 x Dom2 x … x Domn A tuple viewed as a total function from attribute names to types (names important)

  10. One more time • A relational schema describes the data that is contained in a relational instance • A relation R is of arity t is a function: • R : D1 x … x Dt {0,1} The schema is simply the signature of the function. In this model : order matters, but attribute name doesn’t. In other model: names matter, but order doesn’t

  11. A relational database • A relational database schema is a set of relational schemata, one for each relation • A relational database instance is a set of relational instances, one for each relation • Two conventions: • We call instances as simply databases • We assume all instances are valid, i.e., satisfy the domain constraints

  12. Remember the CMS Domain Constraints: “Apple” is not a valid GPA; no instance contains it. • Relation DB Schema • Students(sid: string, name: string, gpa: float) • Courses(cid: string, cname: string, credits: int) • Enrolled(sid: string, cid: string, grade: string) Relation Instances Students Courses Enrolled

  13. Ok, but you said this was operational… • SQL Data Definition Language (DDL) is one method of defining relations • In SQL, Relation → table • Students(sid: string, name: string, gpa: float) CREATE TABLE Students ( sid CHAR(20), Name CHAR(50), Gpa REAL); This awesome prehistoric syntax says same thing!

  14. 2nd Part of the Model: Querying SELECT S.name FROM Students S WHERE S.GPA > 3.5 “Find names of all students with GPA > 3.5” We will spend more time on this! We don’t tell the system how or where to get the data. The query is declarative Actually, I showed how to do this translation for a much richer language!

  15. Virtues of the model • Physical independence (logical too), Declarative • Simple, elegant clean: Everything is a relation • Why did it take multiple years? • Doubted it could be done efficiently.

  16. Relational Algebra

  17. RDBMS Architecture How does a SQL engine work ? • SQL query  relational algebra plan • Relational algebra plan  Optimized plan • Execute each operator of the plan This week we go top-down to learn how this works We cover Ch. 4 and start Ch.14

  18. Relational Algebra • Formalism for creating new relations from existing ones • Its place in the big picture: Declartivequerylanguage Algebra Implementation Relational algebraRelational bag algebra SQL,relational calculus

  19. Relational Algebra • Five operators: • Union:  • Difference: - • Selection:s • Projection: P • Cartesian Product:  • Derived or auxiliary operators: • Intersection, complement • Joins (natural,equi-join, theta join, semi-join) • Renaming:r • Division

  20. 1. Union and 2. Difference • R1  R2 • Example: • ActiveEmployees  RetiredEmployees • R1 – R2 • Example: • AllEmployees -- RetiredEmployees

  21. What about Intersection ? • It is a derived operator • R1  R2 = R1 – (R1 – R2) • Also expressed as a join (will see later) • Example • UnionizedEmployees  RetiredEmployees

  22. 3. Selection • Returns all tuples which satisfy a condition • Notation: sc(R) • Examples • sSalary > 40000(Employee) • sname = “Smith”(Employee) • The condition c can be =, <, , >,, <>

  23. sSalary > 40000(Employee)

  24. 4. Projection • Eliminates columns, then removes duplicates • Notation: P A1,…,An(R) • Example: project social-security number and names: • PSSN, Name (Employee) • Output schema: Answer(SSN, Name)

  25. PName,Salary (Employee)

  26. 5. Cartesian Product • Each tuple in R1 with each tuple in R2 • Notation: R1  R2 • Example: • Employee  Dependents • Very rare in practice; mainly used to express joins

  27. Relational Algebra • Five operators: • Union:  • Difference: - • Selection: s • Projection: P • Cartesian Product:  • Derived or auxiliary operators: • Intersection, complement • Joins (natural,equi-join, theta join, semi-join) • Renaming: r

  28. Renaming • Changes the schema, not the instance • Notation: rB1,…,Bn (R) • Example: • rLastName, SocSocNo (Employee) • Output schema: Answer(LastName, SocSocNo)

  29. Renaming Example Employee Name SSN John 999999999 Tony 777777777 • LastName, SocSocNo (Employee) LastName SocSocNo John 999999999 Tony 777777777

  30. Natural Join • Notation: R1 || R2 • Meaning: R1 || R2 = PA(sC(R1  R2)) • Where: • The selection sCchecks equality of all common attributes • The projection eliminates the duplicate common attributes

  31. Employee Dependents = PName, SSN, Dname(sSSN=SSN2(Employee xrSSN2, Dname(Dependents)) Natural Join Example Employee Name SSN John 999999999 Tony 777777777 Dependents SSN Dname 999999999 Emily 777777777 Joe Name SSN Dname John 999999999 Emily Tony 777777777 Joe

  32. Natural Join • R= S= • R || S=

  33. Natural Join • Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R || S ? • Given R(A, B, C), S(D, E), what is R || S ? • Given R(A, B), S(A, B), what is R || S ?

  34. Theta Join • A join that involves a predicate • R1 ||q R2 = sq (R1  R2) • Here q can be any condition

  35. Eq-join • A theta join where q is an equality • R1 ||A=B R2 = s A=B (R1  R2) • Example: • Employee ||SSN=SSN Dependents • Most useful join in practice

  36. Semijoin • R | S = PA1,…,An (R || S) • Where A1, …, An are the attributes in R • Example: • Employee | Dependents

  37. Semijoins in Distributed Databases • Semijoins are used in distributed databases Dependents Employee network Employee ||ssn=ssn (s age>71 (Dependents)) T = PSSNs age>71 (Dependents) R = Employee | T Answer = R || Dependents

  38. seller-ssn=ssn pid=pid buyer-ssn=ssn Complex RA Expressions Pname Person Purchase Person Product Pssn Ppid sname=fred sname=gizmo

  39. Division • Not a primitive operator: “find sailors who have reserved all boats” • Let A have 2 fields x,y and B have one field then y then • A/B = { x | (x,y) in A . Forally in B } • A/B = C implies BC <= A (and the largest)

  40. Expressing A/B using other ops • Idea: For A/B, compute all x values that are disqualified by some y value in B πx((πx(A) times B)) – A Then remove these from the set.

  41. Quiz (Not really)Express these in RA • “Find names of Sailors who’ve reserved boat #103” • “Find names of sailors who’ve reserved a red boat” • “Find sailors who reserved a red or green boat” Sailors(sid,sname,rating,age) Reserves(sid,bid,date) Boats(bid,bname,color)

  42. Operations on Bags A bag = a set with repeated elements All operations need to be defined carefully on bags • {a,b,b,c}{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f} • {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b} • sC(R): preserve the number of occurrences • PA(R): no duplicate elimination • Cartesian product, join: no duplicate elimination Important ! Relational Engines work on bags, not sets !

  43. Finally: RA has Limitations ! • Cannot compute “transitive closure” • Find all direct and indirect relatives of Fred • Cannot express in RA !!! • Need to write C program, use a graph engine, or modern SQL…

  44. Relational Algebra • Five operators: • Union:  • Difference: - • Selection:s • Projection: P • Cartesian Product:  • Derived or auxiliary operators: • Intersection, complement • Joins (natural, equi-join, theta join, semi-join) • Renaming:r • Division Maps sets of tuples to sets of tuples Why are joins typically included?

  45. Operations on Bags A bag = a set with repeated elements All operations need to be defined carefully on bags • {a,b,b,c}{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f} • {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b} • sC(R): preserve the number of occurrences • PA(R): no duplicate elimination • Cartesian product, join: no duplicate elimination Important ! Relational Engines work on bags, not sets !

  46. Logical Equivalence of Plans R(A,B) S(B,C) s[A=5] (p[A] R) = p[A] (s[A=5] R) Here, projection and selection commute p[B] (s[A=5] R)? Can we play the same game here?

  47. A simple plan p[B] R(A,B) S(B,C) What SQL query does this correspond to?

  48. Pushing down projection p[B] p[B] p[B] R(A,B) S(B,C) R(A,B) S(B,C) Why might we prefer this plan?

  49. Takeaways • This process is called logical optimization • Many equivalent plans used to search for “good plans” (more later) • Relational Algebra is an important abstraction.

More Related