a brief history of db theory n.
Skip this Video
Loading SlideShow in 5 Seconds..
A Brief History of DB Theory PowerPoint Presentation
Download Presentation
A Brief History of DB Theory

Loading in 2 Seconds...

play fullscreen
1 / 47

A Brief History of DB Theory - PowerPoint PPT Presentation

  • Uploaded on

A Brief History of DB Theory. Functional dependencies --- Normalization More dependencies: multivalued, general Universal relations Acyclic hypergraphs Logical query languages: Datalog. Functional Dependencies.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'A Brief History of DB Theory' - ila-ross

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
a brief history of db theory
A Brief History of DB Theory
  • Functional dependencies --- Normalization
  • More dependencies: multivalued, general
  • Universal relations
  • Acyclic hypergraphs
  • Logical query languages: Datalog
functional dependencies
Functional Dependencies
  • Setting: A relation = table with column headers (schema ) and a finite number of rows (tuples ).


0 1 2

0 3 4

fd x y
FD X ->Y
  • A statement about which instances (finite sets of tuples) are “legal” for a given relation.
  • If two tuples agree in all the attributes of set X , then then must also agree in all attributes of set Y.
    • Common case: X is the key of the relation, Y is the other attributes.
issue 1 inference
Issue #1: Inference
  • Armstrong’s Axioms
    • X ->Y if Y is a subset of X
    • X ->Y implies XZ ->YZ (XZ , etc. = “union”)
    • X ->Y and YZ ->W implies XZ ->W
  • Closure test (Bernstein): given FD’s and a possible consequent X ->Y , use the FD’s to see what X determines, and then see if Y is contained therein.
issue 2 redundancy
Issue #2: Redundancy
  • Certain combinations of FD’s lead to redundancy and other “anomalies.”
  • Example: B ->C is the only FD:


0 1 2

3 1 ?

  • The FD lets us predict the value of ?.
issue 3 normalization
Issue #3: Normalization
  • Eliminate redundancy by splitting schemas.
  • A FD X ->Y allows us to split schema XYZ into XY and XZ .
  • Without the FD, the decomposition would lack a lossless join : the ability to reconstruct the original XYZ relation from the decomposed relations.
how i met moshe learned database theory
How I Met Moshe & Learned Database Theory


Bernstein Beeri

Ullman Vardi

why i like working with vardi
Why I Like Working With Vardi
  • Taste, selection of issues.
  • Best inventor of constructions.
  • Name comes after mine in alphabet.
lossless joins
Lossless Joins
  • When relations are decomposed, do the pieces allow reconstruction of the original?
    • Only way: join the projected relations.
    • You always get back what you started with.
    • Bad case is when you get more.


0 1 2

3 1 4


0 1 0 1 4 1 2

3 1 3 1 2 1 4

0 1 2

3 1 4

more kinds of dependencies
More Kinds of Dependencies
  • In essence, a dependency is any predicate that tells whether a given set of tuples is OK for a given schema.
  • The language chosen determines what constraints can be expressed and what we can decide about relations.
    • FD’s are just one, simple example
multivalued dependencies
Multivalued Dependencies
  • These occur when a relation tries to connect one class of objects to independent sets from two other classes..
  • Notation: X ->->Y means:
    • If two tuples agree on X , then we may swap the Y components and get two tuples that are also in the relation.
  • EmpID ->-> Phone

EmpID Addr Phone Project

If: 123 a1 p1 j1

123 a2 p2 j2

Then: 123 a1 p2 j1

123 a2 p1 j2

mvd origins
MVD Origins
  • Independent work of Delobel, Fagin, Zaniolo
  • Inference of MVD’s + FD’s: Beeri, Fagin, and Howard.
generalized dependencies
Generalized Dependencies
  • Equality-generating dependencies:
    • If tuples with this pattern of equal symbols appear in a relation instance, then certain symbols must also be equal.
    • Generalizes FD’s.
  • Tuple-generating dependencies:
    • If tuples with this pattern of equal symbols appear in a relation instance, then a certain tuple must also appear.
    • Generalizes MVD’s.
example egd
Example: EGD
  • FD A ->B in schema ABC :


a b1 c1 (Hypotheses)

a b2 c2

b1 = b2 (Conclusion)

example tgd
Example: TGD
  • MVD A ->->B in schema ABC :


a b1 c1 (Hypotheses)

a b2 c2

a b1 c2 (Conclusion)

full versus embedded gd s
Full Versus Embedded GD’s
  • Full GD’s have a conclusion with only symbols that appear in the hypotheses.
  • Embedded GD’s may have new symbols in the conclusion.
    • Existentially quantified.
  • Embedded EGD makes no sense, but embedded TGD’s are quite interesting.
the chase for inferring gd s
The Chase for Inferring GD’s
  • Test whether a GD G follows from given GD’s.
  • Start with R = the hypotheses of G .
  • Apply GD H by mapping the hypotheses of H to some rows of R .
    • If so, infer the (mapped) conclusion of H --- equate two symbols of R or add a tuple to R .
  • If you eventually infer the conclusion of G , then G follows, else it does not.
example if a b then a b
Example: If A ->B Then A ->->B
  • R has tuples (a,b1,c1) and (a,b2,c2). Does it have (a,b1,c2)?
  • Apply A ->B .
    • It’s hypotheses are the same as the two tuples of R , so surely they map.
    • Lets us conclude b1=b2.
    • Thus, since R has (a,b2,c2), It surely has (a,b1,c2).
decision properties of gd s
Decision Properties of GD’s
  • If all dependencies are full, no new symbols ever appear, so the chase must terminate.
    • Either we prove the desired conclusion, or R becomes a relation that provides a counterexample --- it satisfies all the given GD’s, but not the one we were trying to infer.
undecidability of embedded gd s
Undecidability of Embedded GD’s
  • First undecidability results involved the inference of untyped GD’s (symbols may appear in several columns).
    • Beeri and Vardi
    • Chandra, Lewis, and Makoswky
  • Tough result is undecidability even when GD’s are typed (one column per symbol).
    • Vardi
    • Gurevich and Lewis
history of gd s
History of GD’s
  • Similar ideas were developed independently by several people:
    • Yannakakis and Papadimitriou
    • Beeri and Vardi
    • Fagin
    • Sadri and Ullman
the universal relation
The Universal Relation
  • Idea: attributes carry information regardless of how they are placed in relation schemas.
  • One of the cool things about Moshe Vardi is the collection he keeps of re-inventions of this universal relation concept.
    • Typically touted by some HCI type as a “wonderful new interface to databases.”
ur query systems
UR Query Systems





  • Stored relations: ES(Emp,Sal), ED(Emp,Dept), DM(Dept,Mgr)
  • Universal relation: U(Emp,Sal, Dept,Mgr)
  • UR query:


  • Translated to:


  • U != join(ES, ED, DM); empty ES proves it.
theory of universal relations
Theory of Universal Relations
  • Some were quite upset by the UR idea.
    • Objection: the informal or ad-hoc way queries were translated from UR to stored relations.
  • Codd: “Neither the collection of all base relations nor the collection of all views should be cast by the DBMS in the form of a `universal relation’ (in the Stanford University sense) [Vardi, 1988].”
some early approaches to the ur
Some Early Approaches to the UR
  • Selected joins of stored relations.
  • Representative instance = pad stored relations with nulls in missing attributes; then chase with given dependencies.
    • Contributions by Vassiliou, Honeyman, Mendelzon, Sagiv, Yannakakis.
  • Maximal objects = union of joins within “acyclic hypergraphs.”
    • Maier, Ullman
window functions
Window Functions
  • Two stage process:

1. If query involves set of attributes X , compute the window function [X ] = some relation with schema X derived from the UR.

2. Apply the query to [X ].

  • Different UR definitions give different window functions, e.g., join-based, rep.-instance, maximal-object.
  • From Maier, Ullman, Vardi [1984].
acyclic hypergraphs
Acyclic Hypergraphs
  • Several different applications led to the identification of a class of database schemas (collection of relation schemas) with useful properties.
    • Sensible connections among stored relations in maximal-object theory of UR.
    • Joins of many relations in time polynomial in the size of the input and output (Yannakakis).
    • Etc., etc.
  • Nodes (typically attributes)
  • (Hyper)edges = sets of any number of nodes.



gyo reduction
GYO Reduction
  • Graham, Yu-Ozsuyoglu, independently defined acyclic hypergraphs thusly:
  • Two transformations:

1. Delete a node that is in only one hyperedge.

2. Delete a hyperedge contained in another.

  • Hypergraph is acyclic iff the result is a single, empty edge.
    • Limit is unique for any hypergraph.
  • GYO steps: delete C; delete D; delete {B}; delete F; delete {B,E}; delete A; delete B; delete E;



  • Acyclic hypergraph idea from Bernstein and Goodman.
    • But the definition is rather different.
  • Hypergraph version from Fagin, Mendelzon, Beeri, Maier, others.
    • And don’t forget the GYO folks.
  • Fagin defined several related-but-distinct concepts of “acyclic.”
    • This one is “alpha-acyclic.”
logical query languages
Logical Query Languages
  • Collections of Horn-clause rules without function symbols behave almost like SQL, but can do recursion.
  • Conjunctive queries = single Horn-clause rules w/o function symbols have decidable containment (Chandra and Merlin).
conjunctive queries
Conjunctive Queries
  • Example: path(X,Y) :- edge(X,Z) & edge(Z,Y)
  • Atoms in the body (hypothesis) refer to stored relations (EDB or “Extensional Database”).
  • Atom in head (conclusion) is the result.
  • Q1 is contained in Q2 if for every database D, Q1(D ) is a subset of Q2(D ).
containment mappings
Containment Mappings
  • Mapping from the variables of Q2 to the variables of Q1 that:

1. Turns the head of Q2 into the head of Q1.

2. Turns every atom in the body of Q2 into some atom in the body of Q1.

  • Containment mapping exists iff Q1 is contained in Q2.

Q1: answer(X,Y) :- e(X,Y) & e(Y,X)

Q2: answer(X,Y) :- e(X,W) & e(W,Z) & e(Z,Y)

datalog programs
Datalog Programs
  • Collection of conjunctive queries.
  • Some predicates are EDB (stored).
  • Other predicates are IDB (intensional database; defined by the CQ’s only).
  • One IDB predicate is the answer = least fixedpoint of the CQ’s.

path(X,Y) :- edge(X,Y)

path(X,Y) :- path(X,Z) & edge(Z,Y)

  • Value of path is all pairs of nodes such that there is a path from the first to the second according to EDB predicate edge .
optimization of datalog programs
Optimization of Datalog Programs
  • Problem: often the query asks for only a fraction of the answer.
    • e.g., find path(0,Y) for fixed node 0.
  • Bottom-up methods (“seminaive”) evaluate the whole least-fixedpoint, throw most away.
  • Top-down (goal seeking) can get stuck in left-recursion.
linear recursion methods
Linear-Recursion Methods
  • Recursion is linear if at most one IDB atom in the body.
    • Henschen-Naqvi.
    • Magic-sets for linear (Bancilhon, Maier, Sagiv, Ullman).
    • Left- and right-linear special cases (Naughton).
    • Conversion of nonlinear to linear (Ramakrishnan, Sagiv, Ullman, Vardi).
magic sets
Magic Sets
  • Several similar techniques for rewriting or executing Datalog programs in a way that avoids generation of useless facts.
    • Rohmer, Lescoeur, and Kerisit (earliest exposition).
    • Beeri and Ramakrishnan (reordering of atoms).
    • Sacca and Zaniolo (simplification of rules).
    • Dietrich and Warren (tabulation).
    • Vielle (query-subquery).
bounded recursion
Bounded Recursion
  • Sometimes a recursive Datalog program is equivalent to a nonrecursive program.
  • Sufficient conditions (Naughton, Ioannidis, Naughton and Sagiv).
  • Undecidability (Gaifman, Mairson, Sagiv, and Vardi).
  • Decidable cases (Vardi, Cosmodakis, Gaifman, Kanellakis, and Vardi)
containment for datalog
Containment for Datalog
  • Undecidable if one Datalog program is contained in another (Shmueli).
  • NP-complete whether a CQ is contained in a Datalog program.
  • Triply exponential whether a Datalog program is contained in a CQ (Chaudhuri and Vardi).
what is it all good for
What Is It All Good For?
  • Dependencies and normalization are now familiar to most CS graduates.
    • Difference between BCNF and 3NF.
    • Tradeoffs: lossless joins versus maintaining dependencies.
  • Magic-sets, recursion used in IBM’s DB/2.
  • Universal-relation interfaces despite Codd.
more what is it all good for
More “What Is It All Good For?”
  • Information integration . While logical query languages have not caught on, logic has been used to specify how legacy databases are combined into a uniform whole.
    • Tsimmis (Papakonstantinou, Vassalos).
    • Information Manifold (Levy).
    • Infomaster (Duschka, Genesereth).