1 / 56

Relational Algebra Theory Review

This lecture covers the basics of relational algebra, including the five operators and examples of their usage. It also introduces first-order logic as a query language. The lecture further discusses the syntax and semantics of first-order logic, as well as safe and unsafe queries.

carolperry
Download Presentation

Relational Algebra Theory Review

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CSE 544: Lecture 8 Theory

  2. Review of Relational Algebra • Five basic RA operators: • , -, s, p, 

  3. Find all employees with salary more than $40,000. s Salary > 40000(Employee)

  4. PSSN, Name (Employee)

  5. Renaming Example Employee Name SSN John 999999999 Tony 777777777 • LastName, SocSocNo (Employee) LastName SocSocNo John 999999999 Tony 777777777

  6. Employee Dependents = PName, SSN, Dname(sSSN=SSN2(Employee rSSN2, Dname(Dependents)) Natural Join Example Employee Name SSN John 999999999 Tony 777777777 Dependents SSN Dname 999999999 Emily 777777777 Joe Name SSN Dname John 999999999 Emily Tony 777777777 Joe

  7. Natural Join • R= S= • R S=

  8. Natural Join • Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R S ? • Given R(A, B, C), S(D, E), what is R S ? • Given R(A, B), S(A, B), what is R S ?

  9. Today’s Outline • First Order Logic as a Query Language • Reading assignment: 4.3 • Query languages and Complexity Classes • Conjunctive queries

  10. First Order Logic: Syntax • Given: • A vocabulary: R1, …, Rk • An arity, ar(Ri), for each i=1,…,k • An infinite supply of variables x1, x2, x3, … • Constants: c1, c2, c3, ... • FO formulas, , are:  ::= R(t1, ..., tar(R)) | ti = tj |   ’ |   ’ | ’ x. | x. t ::= x | c

  11. Examples of Formulas Most interesting case: Vocabulary = one binary relation R (encodes a graph) 1 4 2 3 R=

  12. Examples of Closed Formulas • Does there exists a loop in the graph ? • Are there paths of length >2 ? • Is there a “sink” node ?   x.R(x,x)   x.y.z.u.(R(x,y)  R(y,z)  R(z,u))   x.y.R(x,y)

  13. Examples of Closed Formulas • Is there a clique of size 4 ? • Here x1x2 stands for x1=x2, etc. • Is the graph transitively closed ? • Here A  B stands for A  B   x1. x2. x3. x4. (x1x2  x1x3  ...  x3x4  R(x1,x2)  R(x1,x3)  ...  R(x3,x4))   x.y.z.(R(x,y)  R(y,z)  R(x,z))

  14. Examples of Open Formulas • Find all nodes connected by a path of length 2: • Find all nodes without outgoing edges: (x,y)  u.(R(x,u)  R(u,y)) (x)  y.R(x,y)

  15. More Examples • Vocabulary (= schema): Employee(name, office, mgr) Manager(name, office) • Queries: • Find offices: • Find offices with at least two employees: • Find managers that share office with all their employees: [to do in class] (y)  (x.z.Employee(x,y,z)  x.Manager(x,y)) (y)  x.z.x’.z’.(Employee(x,y,z)  Employee(x’,y,z’)  xx’)

  16. First Order Logic: Semantics • Given a vocabulary R1, …, Rk • A model is D = (D, R1D, …, RkD) • D = a set, called domain, or universe • RiD D  D  ...  D, (ar(Ri) times) i = 1,...,k

  17. First Order Logic: Semantics • Given: • A model D = (D, R1D, ..., RkD) • A formula  • A substitution s : {x1, x2, ...}  D • We define next the relation:meaning “D satisfies with s” D |= [s]

  18. First Order Logic: Semantics If (s(t1), ..., s(tn))  RD D |= (R(t1, ..., tn)) [s] D |= (t= t’) [s] If s(t) = s(t’)

  19. First Order Logic: Semantics D |= (  ’) [s] If D |= ()[s] and D |= (’) [s] D |= (  ’) [s] If D |= ()[s] or D |= (’) [s] D |= (’) [s] If not D |= ()[s]

  20. First Order Logic: Semantics If for all s’ s.t. s(y) = s’(y) for all variables y other than x, D |= ()[s’] D |= (x.) [s] D |= (x.) [s] If for some s’ s.t. s(y) = s’(y) for all variables y other than x, D |= ()[s’]

  21. FO and Databases

  22. FO and Databases • FO:a closed formula  is true in D if D |=  • Databases:a formula  with free variables x1, ..., xn defines the query:(D) = {(s(x1), ..., s(xn)) | D |= [s]}

  23. FO and Databases • The Relational Calculus • The query language consisting of FO • The Tuple Calculus • A minor variation on the relational calculus • Uses tuple variables instead of atomic variables • Reading assignment 4.3 • But some “queries” in these languages make no sense • Define safe queries next

  24. Safe Queries A model D = (D, R1D, …, RkD) • In FO: • both D and R1D, …, RkD may be infinite • In databases: • D may infinite (int, string, etc) • R1D, …, RkD are always finite • We call this a finite model

  25. Safe Queries •  is a finite query if for every finite model D, (D) is finite •  is a domain independent query if for every two finite models D, D’ having the same relations:D = (D, R1D, …, RkD), D’ = (D’, R1D, …, RkD)we have (D) = (D’) • Domain independent query aka safe query • Notice: book has different but equivalent definition

  26. Unsafe Relational Queries • Find all nodes that are not in the graph: • Find all nodes that are connected to “everything”: • Find all pairs of employees or offices: • We don’t want such queries ! • Finite, but not safe

  27. Safe Queries • Definition. Given D = (D, R1D, …, RkD), the active domain is Da = the set of all constants in R1D, …, RkD • Example. Given a graph D = (D, R) Da = { x | y.R(x,y) z.R(z,x)} • Property. If a query is safe, it suffices to range quantifiers only over the active domain (why ?) • Hence we can compute safe queries

  28. Safe Queries • The safe relational calculus consists only of safe queries. However: • Theorem It is undecidable if a given a FO query is safe. • Work around: • Define a subset of FO queries that we know are safe = range restricted queries (rr-query) • Any safe query is equivalent to some rr-query

  29. Range-restriction • A syntactic condition on queries • See [AHU] for a definition. The intuition:

  30. Range-restriction • Theorem. Every safe query is equivalent to a range-restricted query • “Proof”. Translate as follows: R(x, y, ...)  x  Da  y  Da  ....  R(x, y, ...) x.  x.(x  Da  ) x.  x.(x  Da  ) • From now on we assume that all queries are safe

  31. FO = Relational Algebra • Recall the 5 operators in the relational algebra: • , -, , s, P • Theorem. A query can be defined in the safe relational calculus iff it can be defined in the relational algebra

  32. FO = Relational Algebra • Proof [in class]

  33. Limited Expressive Power • Vocabulary: binary relation R • The following queries cannot be expressed in FO: • Transitive closure: • x.y. there exists x1, ..., xn s.t.R(x,x1)  R(x1,x2)  ...  R(xn-1,xn)  R(xn,y) • Parity: the number of edges in R is even

  34. Extensions of FO FO(LFP) = FO extended with least fixpoint: • Example: define transitive closure like: T(x,y) = R(x,y)  z.(R(x,z)  T(z,y)) • Meaning. Define: T0 :=  Tn+1 = {(x,y) | R(x,y)  z.(R(x,z)  Tn(z,y))} • T0  T1  T2  . . .  Tk-1 = Tk stop. • The answer is: Tk • Q: How many steps do we need in the iteration ?

  35. ComputationalComplexity Classes Recall computational complexity classes: • AC0 • LOGSPACE • NLOGSPACE • PTIME • NP • PSPACE • EXPTIME • EXPSPACE • (Kalmar) Elementary Functions • Turing Computable functions We care mostly about these

  36. Query Languages andComplexity Classes Paper: On the Unusual Effectiveness of Logic in Computer Science PSPACE FO(PFP) PTIME FO(LFP AC0 FO Important: the more complex a QL, the harder it is to optimize

  37. Conjunctive Queries • Definition A conjunctive query is a FO restricted to R(t1, ..., tn), , (missing are , ,) • CQ = all conjunctive queries • Any CQ query can be written as:x1. x2... xn.(R1(t11,...,t1m)  ...  Rk(tk1,...,tkm)) • Same in Datalog notation:A(x1,...,xn) :- R1(t11,...,t1m), ... , Rk(tk1,...,tkm)

  38. Examples Employee(x), ManagedBy(x,y), Manager(y) • Find all employees having the same manager as Smith:A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y)

  39. Examples Employee(x), ManagedBy(x,y), Manager(y) • Find all employees having the same director as Smith:A(x) :- ManagedBy(“Smith”,y), ManagedBy(y,z), ManagedBy(x,u), ManagedBy(u,z)

  40. Equivalent Formulations Relational Algebra: • Conjunctive queries correspond precisely to sC, PA,  (missing: , –) • A(x) :- ManagedBy(“Smith”,y), ManagedBy(x,y) P$2.name $1.manager=$2.manager sname=“Smith” ManagedBy ManagedBy

  41. Equivalent Formulations SQL: • Conjunctive queries correspond to single select-disticnt-from-where blocks with equality conditions in the WHERE clause selectdistinct m2.namefrom ManagedBy m1, ManagedBy m2where m1.name=“Smith” AND m1.manager=m2.manager

  42. Conjunctive Queries • Most useful class of queries • Also enjoys remarkable, positive properties • Focus of research during 70’s, 80’s • Still focus of research in the 00’s • We discuss the most celebrated property of conjunctive queries: containment is decidable

  43. Query Containment • Definition Given two queries q1, q2, we say that q1 is contained in q2 if for every database D, q1(D)  q2(D). • Notation: q1 q2 • Obviously: if q1 q2 and q2 q1 then q1 = q2.

  44. Examples of Query Containments q1(x) :- R(x,u), R(u,v), R(v,w) q2(x) :- R(x,u), R(u,v) q1(x) :- R(x,u), R(u,”Smith”) q2(x) :- R(x,u), R(u,v) q1(x) :- R(x,u), R(u,u) q2(x) :- R(x,u), R(u,v) In all cases: q1 q2

  45. Examples of Query Containments q1(x,y) :- R(x,u),R(v,u),R(v,y)q2(x,y) :- R(x,u),R(v,u),R(v,w),R(t,w),R(t,y)Then q1 q2 (why ?)

  46. Examples of Query Containments q1(x) :- R(x,u), R(u,”Smith”), R(u,”Fred”), R(u, u) q2(x) :- R(x,u), R(u,v), R(u,”Smith”), R(w,u) Then q1 q2 (why ?)

  47. Query Containment • Theorem Query containment for FO is undecidable • Theorem Query containment for CQ is decidable and NP-complete.

  48. Query Containment The most interesting part: how we check q1 q2 • The canonical database and the canonical tuple for q1: • Canonical database: Dq1 = (D, R1, …, Rk) where: • D = all variables and constants in q1 • R1, …, Rk = the body of q1 • Canonical tuple: tq1 = the head of q1

  49. Examples of Canonical Databases • q1(x,y) :- R(x,u),R(v,u),R(v,y) • Dq1 = (D, R) • D={x,y,u,v} • R = • tq1 = (x,y)

More Related