1 / 21

Data Models and Query Languages

Data Models and Query Languages. CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999. NewMovies Name Rating Director The Phantom Menace PG George Lucas Patch Adams PG-13 Tom Shadyac Stepmom PG-13 Chris Columbus. The Relational Model.

ksharp
Download Presentation

Data Models and Query Languages

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Models and Query Languages CSE 590DB, Winter 1999 Theory of Databases Zack Ives January 10, 1999

  2. NewMovies Name Rating Director The Phantom Menace PG George Lucas Patch Adams PG-13 Tom Shadyac Stepmom PG-13 Chris Columbus The Relational Model • Database consists of relations, where columns are attributes and rows are tuples • Declarative queries using an algebraic or logic language (or SQL, which we shall ignore) attribute tuple NewMovies(Name, Rating, Director) arity = 3, cardinality = 3

  3. Schema • Each attribute in a relation has a type: NewMovies(Name: String, Rating: RatingDomain, Director: String), where RatingDomain = {G, PG, PG-13, R, NC-17} (In many cases, we will omit the type, and assume the type most appropriate to the expression) • Relation schema: relation name, attributes, and types • Relation instance: a set of tuples for a given schema • Database schema: set of relation schemas • Database instance: relation instance for every relation schema

  4. More on Tuples • Formally, a tuple is a mapping from attribute names to (correctly typed) values: • Director -> “George Lucas” • Name -> “Stepmom” • May specify a tuple using values & schema notation: NewMovies(“The Phantom Menace”, “PG”, “George Lucas”) • Can have constraints on attribute values: • Integrity constraints • Keys • Foreign keys • Functional dependencies

  5. Relational Algebra All operators take one or more sets of tuples as input, and produce a new set as output • Basic set operators (, , —, but normally no complement) • Selection () • Projection () • Cartesian Product () • Join () • Division ()

  6. Set Operations • Sets must be compatible! • Same number of attributes • Same types/domains • R1  R2 = {all tuples in either R1 or R2} • R1  R2 = {all tuples in both R1 and R2} • Complement??? • Negation: R1 — R2 = {all tuples in R1 not in R2} • Query without unions is conjunctive

  7. NewMovies Name Rating Director The Phantom Menace PG George Lucas Patch Adams PG-13 Tom Shadyac Stepmom PG-13 Chris Columbus Name Rating Director Patch Adams PG-13 Tom Shadyac Stepmom PG-13 Chris Columbus Selection Chooses a subset of tuples satisfying a predicate Rating = “PG-13” (NewMovies)

  8. NewMovies Name Rating Director The Phantom Menace PG George Lucas Patch Adams PG-13 Tom Shadyac Stepmom PG-13 Chris Columbus Name, Director (NewMovies) Projection Chooses a subset of attributes, removes duplicates Name Director The Phantom Menace George Lucas Patch Adams Tom Shadyac Stepmom Chris Columbus

  9. NewMovie Name Director Phantom Menace George Lucas Patch Adams Tom Shadyac Stepmom Chris Columbus MetroShows Name Time Patch Adams 7:00 Patch Adams 9:40 Stepmom 6:50 Stepmom 9:00 NewMovie  MetroShows Name Director Name Time Phantom Menace George Lucas Patch Adams 7:00 Phantom Menace George Lucas Patch Adams 9:40 Phantom Menace George Lucas Stepmom 6:50 Phantom Menace George Lucas Stepmom 9:00 Patch Adams Tom Shadyac Patch Adams 7:00 … … … ... Cartesian Product • Combine two relations - all pairings of tuples

  10. NewMovie Name Director Phantom Menace George Lucas Patch Adams Tom Shadyac Stepmom Chris Columbus MetroShows Name Time Patch Adams 7:00 Patch Adams 9:40 Stepmom 6:50 Stepmom 9:00 NewMovie  NewMovie.Name = MetroShows.Name MetroShows Name Director Time Patch Adams Tom Shadyac 7:00 Patch Adams Tom Shadyac 9:40 Stepmom Chris Columbus 6:50 Stepmom Chris Columbus 9:00 Join • Combine tuples from two relations by predicate. If predicate is equality, remove duplicate attribute. Equivalent to: Name, Director, Time ( NewMovie.Name = MetroShows.Name (NewMovie  MetroShows))

  11. Division • Only returns results from dividend which match attributes of all of tuples in divisor (“for all”) Customers name branch Johnson Downtown Smith Brighton Lindsay Redwood Green Brighton Green Downtown Branches branch Brighton Downtown Customers  Branches name Green

  12. Logic-Based Query Languages • Tuple-relational calculus • Based on first-order predicate calculus, but imposes restrictions (e.g., no function symbols) • Equivalent in power to relational algebra • Datalog • More powerful than tuple-relational calculus Supports recursion and thus transitive closure • Equivalent to set of Horn clauses

  13. Predicates & Atoms • Relations represented by predicates • Tuples represented by atoms: Flight(“Alon”, “British Airways”, 1234, “Seattle”, “Israel”, “1/9/1999”, “10:00”) • Arithmetic atoms: X < 100, X + Y + 5 > Z * 2 • In certain cases, negated atoms: not Flight(“Zack”, “British Airways”, 1234, “Seattle”, “Israel”, “1/9/1999”, “10:00”)

  14. Datalog Rules & Programs • “Pure” datalog rules: head :- atom1, atom2, atom3 where all atoms are non-negated and relational • Datalog program is set of datalog rules • Conjunctive query has single rule: HanksDirector(D) :- ActorIn(“Hanks”, M) & DirectedBy(D, M) • Disjunctive query: HanksDirector(D) :- ActorIn(“Hanks”, M) & DirectedBy(D, M) HanksDirector(D) :- ProducedBy(“Hanks”, M) & DirectedBy(D, M)

  15. Intension & Extension • Distinction between EDB & IDB: • Extensional DB • what’s stored in the database • can only be in the body of a Datalog rule • Intensional DB • result of a datalog rule • can be in head or body of a Datalog rule

  16. Evaluating Rules • Evaluating rules: • Start with EDB, iteratively derive facts for IDB’s • Repeat until cannot derive any new facts: • Consider every possible assignment from database to variables in body • If each atom in body is made true by assignment, add tuple for the head into the head’s relation • Conjunctive queries are inexpensive • Disjunctive are more expensive (even though they’re Horn clauses)

  17. Transitive Closure • Suppose we to know all possible destination airports reachable from SeaTac, given an EDB Flight(From, To): SFO SEA LAX LGA DEN JFK We need to express the query: Find all airports reachable from SEA

  18. Recursion in Datalog • Program: Dest(X, Y) :- Flight(X, Y) Dest(X, Y) :- Dest(X, Z), Flight(Z, Y) • Evaluate unknown # of times until fixpoint • Flight: {(SEA,SFO), (SEA,LAX), (SEA,JFK), (LAX,LGA), (JFK,LGA),(LGA,DEN)}Dest0: {} • Dest1: Dest0  {(SEA,SFO), (SEA,LAX), (SEA,JFK)} • Dest2: Dest1  {(SEA,LGA),(LAX,DEN),(JFK,DEN)} • Dest3: Dest2  {(SEA,DEN)} • No more changes, so stop (minimum fixpoint)

  19. Built-in Predicates • Rules may include atoms with built-in predicates: Expensive(X) :- Product(X, Y, P) & P > 100 • But need to restrict use of built-in atoms: P(X) :- R(X) & X < Y • When is X < Y? • We require that every variable from a built-in atom appear in a relational atom

  20. Negated Subgoals • Restrict forms: P(X, Y) :- Between(X, Y, Z) & NOT Direct(X,Z) • Bad: Q(X, Y) :- From(X) & NOT To(Y) • Bad but fixable: P(X) :- From(X) & NOT Connects(X, Y) • Rewrite as: Connects’(X) :- Connects(X,Y) P(X) :- End(X) & NOT Connects’(X)

  21. Stratified/Stratisfiable Rules • Predicate P depends on predicate Q if Q appears negated in a rule defining P • If cycle in dependency graph, program is not stratifiable: p(X) :- r(X) & NOT q(X) q(X) :- r(X) & NOT p(X) • Suppose r has tuple {1} Intuitively, stratification provides a way of executing program P as sequence of subprograms P1 … Pn defining IDB relations w/o “forward references”. Results independent of stratification.

More Related