- 123 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about 'A Brief History of DB Theory' - ila-ross

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

A Brief History of DB Theory

- Functional dependencies --- Normalization
- More dependencies: multivalued, general
- Universal relations
- Acyclic hypergraphs
- Logical query languages: Datalog

Functional Dependencies

- Setting: A relation = table with column headers (schema ) and a finite number of rows (tuples ).

A B C

0 1 2

0 3 4

FD X ->Y

- A statement about which instances (finite sets of tuples) are “legal” for a given relation.
- If two tuples agree in all the attributes of set X , then then must also agree in all attributes of set Y.
- Common case: X is the key of the relation, Y is the other attributes.

Issue #1: Inference

- Armstrong’s Axioms
- X ->Y if Y is a subset of X
- X ->Y implies XZ ->YZ (XZ , etc. = “union”)
- X ->Y and YZ ->W implies XZ ->W
- Closure test (Bernstein): given FD’s and a possible consequent X ->Y , use the FD’s to see what X determines, and then see if Y is contained therein.

Issue #2: Redundancy

- Certain combinations of FD’s lead to redundancy and other “anomalies.”
- Example: B ->C is the only FD:

A B C

0 1 2

3 1 ?

- The FD lets us predict the value of ?.

Issue #3: Normalization

- Eliminate redundancy by splitting schemas.
- A FD X ->Y allows us to split schema XYZ into XY and XZ .
- Without the FD, the decomposition would lack a lossless join : the ability to reconstruct the original XYZ relation from the decomposed relations.

Why I Like Working With Vardi

- Taste, selection of issues.
- Best inventor of constructions.
- Name comes after mine in alphabet.

Lossless Joins

- When relations are decomposed, do the pieces allow reconstruction of the original?
- Only way: join the projected relations.
- You always get back what you started with.
- Bad case is when you get more.

More Kinds of Dependencies

- In essence, a dependency is any predicate that tells whether a given set of tuples is OK for a given schema.
- The language chosen determines what constraints can be expressed and what we can decide about relations.
- FD’s are just one, simple example

Multivalued Dependencies

- These occur when a relation tries to connect one class of objects to independent sets from two other classes..
- Notation: X ->->Y means:
- If two tuples agree on X , then we may swap the Y components and get two tuples that are also in the relation.

Example

- EmpID ->-> Phone

EmpID Addr Phone Project

If: 123 a1 p1 j1

123 a2 p2 j2

Then: 123 a1 p2 j1

123 a2 p1 j2

MVD Origins

- Independent work of Delobel, Fagin, Zaniolo
- Inference of MVD’s + FD’s: Beeri, Fagin, and Howard.

Generalized Dependencies

- Equality-generating dependencies:
- If tuples with this pattern of equal symbols appear in a relation instance, then certain symbols must also be equal.
- Generalizes FD’s.
- Tuple-generating dependencies:
- If tuples with this pattern of equal symbols appear in a relation instance, then a certain tuple must also appear.
- Generalizes MVD’s.

Full Versus Embedded GD’s

- Full GD’s have a conclusion with only symbols that appear in the hypotheses.
- Embedded GD’s may have new symbols in the conclusion.
- Existentially quantified.
- Embedded EGD makes no sense, but embedded TGD’s are quite interesting.

The Chase for Inferring GD’s

- Test whether a GD G follows from given GD’s.
- Start with R = the hypotheses of G .
- Apply GD H by mapping the hypotheses of H to some rows of R .
- If so, infer the (mapped) conclusion of H --- equate two symbols of R or add a tuple to R .
- If you eventually infer the conclusion of G , then G follows, else it does not.

Example: If A ->B Then A ->->B

- R has tuples (a,b1,c1) and (a,b2,c2). Does it have (a,b1,c2)?
- Apply A ->B .
- It’s hypotheses are the same as the two tuples of R , so surely they map.
- Lets us conclude b1=b2.
- Thus, since R has (a,b2,c2), It surely has (a,b1,c2).

Decision Properties of GD’s

- If all dependencies are full, no new symbols ever appear, so the chase must terminate.
- Either we prove the desired conclusion, or R becomes a relation that provides a counterexample --- it satisfies all the given GD’s, but not the one we were trying to infer.

Undecidability of Embedded GD’s

- First undecidability results involved the inference of untyped GD’s (symbols may appear in several columns).
- Beeri and Vardi
- Chandra, Lewis, and Makoswky
- Tough result is undecidability even when GD’s are typed (one column per symbol).
- Vardi
- Gurevich and Lewis

History of GD’s

- Similar ideas were developed independently by several people:
- Yannakakis and Papadimitriou
- Beeri and Vardi
- Fagin
- Sadri and Ullman

The Universal Relation

- Idea: attributes carry information regardless of how they are placed in relation schemas.
- One of the cool things about Moshe Vardi is the collection he keeps of re-inventions of this universal relation concept.
- Typically touted by some HCI type as a “wonderful new interface to databases.”

Example

- Stored relations: ES(Emp,Sal), ED(Emp,Dept), DM(Dept,Mgr)
- Universal relation: U(Emp,Sal, Dept,Mgr)
- UR query:

SELECT Mgr WHERE Emp=a

- Translated to:

SELECT Mgr FROM ED, DM WHERE Emp=a AND ED.Dept=DM.Dept

- U != join(ES, ED, DM); empty ES proves it.

Theory of Universal Relations

- Some were quite upset by the UR idea.
- Objection: the informal or ad-hoc way queries were translated from UR to stored relations.
- Codd: “Neither the collection of all base relations nor the collection of all views should be cast by the DBMS in the form of a `universal relation’ (in the Stanford University sense) [Vardi, 1988].”

Some Early Approaches to the UR

- Selected joins of stored relations.
- Representative instance = pad stored relations with nulls in missing attributes; then chase with given dependencies.
- Contributions by Vassiliou, Honeyman, Mendelzon, Sagiv, Yannakakis.
- Maximal objects = union of joins within “acyclic hypergraphs.”
- Maier, Ullman

Window Functions

- Two stage process:

1. If query involves set of attributes X , compute the window function [X ] = some relation with schema X derived from the UR.

2. Apply the query to [X ].

- Different UR definitions give different window functions, e.g., join-based, rep.-instance, maximal-object.
- From Maier, Ullman, Vardi [1984].

Acyclic Hypergraphs

- Several different applications led to the identification of a class of database schemas (collection of relation schemas) with useful properties.
- Sensible connections among stored relations in maximal-object theory of UR.
- Joins of many relations in time polynomial in the size of the input and output (Yannakakis).
- Etc., etc.

GYO Reduction

- Graham, Yu-Ozsuyoglu, independently defined acyclic hypergraphs thusly:
- Two transformations:

1. Delete a node that is in only one hyperedge.

2. Delete a hyperedge contained in another.

- Hypergraph is acyclic iff the result is a single, empty edge.
- Limit is unique for any hypergraph.

Example

- GYO steps: delete C; delete D; delete {B}; delete F; delete {B,E}; delete A; delete B; delete E;

A B C D

E F

History

- Acyclic hypergraph idea from Bernstein and Goodman.
- But the definition is rather different.
- Hypergraph version from Fagin, Mendelzon, Beeri, Maier, others.
- And don’t forget the GYO folks.
- Fagin defined several related-but-distinct concepts of “acyclic.”
- This one is “alpha-acyclic.”

Logical Query Languages

- Collections of Horn-clause rules without function symbols behave almost like SQL, but can do recursion.
- Conjunctive queries = single Horn-clause rules w/o function symbols have decidable containment (Chandra and Merlin).

Conjunctive Queries

- Example: path(X,Y) :- edge(X,Z) & edge(Z,Y)
- Atoms in the body (hypothesis) refer to stored relations (EDB or “Extensional Database”).
- Atom in head (conclusion) is the result.
- Q1 is contained in Q2 if for every database D, Q1(D ) is a subset of Q2(D ).

Containment Mappings

- Mapping from the variables of Q2 to the variables of Q1 that:

1. Turns the head of Q2 into the head of Q1.

2. Turns every atom in the body of Q2 into some atom in the body of Q1.

- Containment mapping exists iff Q1 is contained in Q2.

Datalog Programs

- Collection of conjunctive queries.
- Some predicates are EDB (stored).
- Other predicates are IDB (intensional database; defined by the CQ’s only).
- One IDB predicate is the answer = least fixedpoint of the CQ’s.

Example

path(X,Y) :- edge(X,Y)

path(X,Y) :- path(X,Z) & edge(Z,Y)

- Value of path is all pairs of nodes such that there is a path from the first to the second according to EDB predicate edge .

Optimization of Datalog Programs

- Problem: often the query asks for only a fraction of the answer.
- e.g., find path(0,Y) for fixed node 0.
- Bottom-up methods (“seminaive”) evaluate the whole least-fixedpoint, throw most away.
- Top-down (goal seeking) can get stuck in left-recursion.

Linear-Recursion Methods

- Recursion is linear if at most one IDB atom in the body.
- Henschen-Naqvi.
- Magic-sets for linear (Bancilhon, Maier, Sagiv, Ullman).
- Left- and right-linear special cases (Naughton).
- Conversion of nonlinear to linear (Ramakrishnan, Sagiv, Ullman, Vardi).

Magic Sets

- Several similar techniques for rewriting or executing Datalog programs in a way that avoids generation of useless facts.
- Rohmer, Lescoeur, and Kerisit (earliest exposition).
- Beeri and Ramakrishnan (reordering of atoms).
- Sacca and Zaniolo (simplification of rules).
- Dietrich and Warren (tabulation).
- Vielle (query-subquery).

Bounded Recursion

- Sometimes a recursive Datalog program is equivalent to a nonrecursive program.
- Sufficient conditions (Naughton, Ioannidis, Naughton and Sagiv).
- Undecidability (Gaifman, Mairson, Sagiv, and Vardi).
- Decidable cases (Vardi, Cosmodakis, Gaifman, Kanellakis, and Vardi)

Containment for Datalog

- Undecidable if one Datalog program is contained in another (Shmueli).
- NP-complete whether a CQ is contained in a Datalog program.
- Triply exponential whether a Datalog program is contained in a CQ (Chaudhuri and Vardi).

What Is It All Good For?

- Dependencies and normalization are now familiar to most CS graduates.
- Difference between BCNF and 3NF.
- Tradeoffs: lossless joins versus maintaining dependencies.
- Magic-sets, recursion used in IBM’s DB/2.
- Universal-relation interfaces despite Codd.

More “What Is It All Good For?”

- Information integration . While logical query languages have not caught on, logic has been used to specify how legacy databases are combined into a uniform whole.
- Tsimmis (Papakonstantinou, Vassalos).
- Information Manifold (Levy).
- Infomaster (Duschka, Genesereth).

Download Presentation

Connecting to Server..