The History of Datalog

Origins

Failure

Resurrection


An Odd Encounter

  • Several years ago, I met a colleague, Monica Lam, in the hallway at Stanford.

  • “I hear you were involved in the early work on Datalog.”

  • She had discovered this work and used it in her system for large-scale data-flow analysis.


Odd Encounter – (2)

  • The application is naturally recursive.

  • Very large-scale (analyzed code of 800K lines).

  • They (Monica and her student John Whaley) had an implementation, bddbddb, that compiled Datalog rules into BDDs (binary decision diagrams).


Where Did Datalog Come From?

  • Codd’s tuple and domain calculus (1972).

  • Gallaire and Minker’s “Logic and Databases” (1978).

  • Prolog (1976).


Codd’s Logics

  • TRC. { t | R(r) and S(s) and t.A = r.A and r.B = s.B and t.C = s.C }

    • Implemented by Stonebraker as QUEL.

  • DRC. { ac | R(ab) and S(bc) }

    • Implemented by Zloof as Query-by-Example.
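Both expressions above describe the same query: join R(A,B) with S(B,C) on B and keep A and C. As a rough illustration only (the sample tuples and the Python names are made up for this sketch and are not from the talk), the tuple-variable and domain-variable readings can be mimicked with set comprehensions:

```python
from collections import namedtuple

# Hypothetical sample relations R(A, B) and S(B, C).
RRow = namedtuple("RRow", ["A", "B"])
SRow = namedtuple("SRow", ["B", "C"])
R = {RRow(1, "x"), RRow(2, "y")}
S = {SRow("x", 10), SRow("x", 20), SRow("z", 30)}

# TRC style: the variables r and s range over whole tuples.
trc = {(r.A, s.C) for r in R for s in S if r.B == s.B}

# DRC style: the variables a, b, c range over individual values.
drc = {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

print(sorted(trc))
print(trc == drc)
```

The two comprehensions differ only in what the variables stand for, which is the distinction between the two calculi.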


“Logic and Databases”

  • Viewed queries as the result of an entire logical theory.

  • Thus allows recursion, negation, theories with multiple minimal models.

  • Closed/open-world evaluations.


Prolog

  • A conventional programming language with predicates as function calls.

  • Bizarre execution rule.

  • Example: you have to write transitive closure (TC) as:

    path(X,Y) :- arc(X,Y).

    path(X,Y) :- arc(X,Z), path(Z,Y).
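The ordering matters in Prolog because its depth-first, left-to-right execution can loop forever on a left-recursive variant such as path(X,Y) :- path(X,Z), arc(Z,Y). For contrast, here is a minimal sketch of bottom-up evaluation of the same two rules, where the order of rules and subgoals does not change the answer. The function name and the sample graph are assumptions for illustration, and the graph is deliberately cyclic.

```python
def naive_path(arcs):
    """Apply both path rules repeatedly until no new fact appears."""
    path = set()
    while True:
        # path(X,Y) :- arc(X,Y).
        derived = set(arcs)
        # path(X,Y) :- arc(X,Z), path(Z,Y).
        derived |= {(x, y)
                    for (x, z) in arcs
                    for (z2, y) in path
                    if z == z2}
        if derived <= path:      # fixed point reached
            return path
        path |= derived

arcs = {("a", "b"), ("b", "c"), ("c", "a")}   # a cycle
print(sorted(naive_path(arcs)))
```

The loop stops even on the cycle, because the set of possible path facts is finite and each round can only add facts.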


Implementation of Logical Query Languages for Databases

  • In 1984 I took a sabbatical at Hebrew University and wrote a paper with the above title.

  • It has some crazy stuff that makes me wonder “what was I thinking?”

  • Much was fixed by others, later.

  • Published in SIGMOD (no real theorems!).


Implementation – (2)

  • Key idea: Prolog notation + Horn-clause, unique fixedpoint semantics.

  • Key idea: It’s about algorithms for query execution, not logical models.

    • Original thought in that direction was really by Henschen and Naqvi.


Enter “Datalog”

  • The term “Datalog” to refer to positive Horn clauses without function symbols was first proposed by Dave Maier and David S. (“the other”) Warren.

  • Appears in their book Computing with Logic (1988), but in common use before that.


Good Implementation Ideas

  • Seminaive evaluation (Bancilhon and Ramakrishnan, 1986 – also in SIGMOD).

  • Specialized linear-recursion implementations (many people including Naughton, Ramakrishnan, Sagiv, Vardi,…).

  • Magic sets (Beeri and Ramakrishnan, 1987 – finally something got into PODS).
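To make the first idea concrete, here is a minimal sketch of seminaive evaluation for the transitive-closure rules path(X,Y) :- arc(X,Y) and path(X,Y) :- arc(X,Z), path(Z,Y). The function name and data layout are assumptions for illustration, not the formulation from the cited paper. Each round joins arc only with the facts discovered in the previous round, rather than with all of path.

```python
def seminaive_path(arcs):
    """Transitive closure where each round joins only the newest facts."""
    path = set(arcs)      # facts from path(X,Y) :- arc(X,Y).
    delta = set(arcs)     # facts that were new in the previous round
    while delta:
        # path(X,Y) :- arc(X,Z), delta_path(Z,Y).
        new = {(x, y)
               for (x, z) in arcs
               for (z2, y) in delta
               if z == z2}
        delta = new - path    # keep only facts not seen before
        path |= delta
    return path

print(sorted(seminaive_path({(1, 2), (2, 3), (3, 4)})))
```

The result matches what repeated full re-evaluation would produce; the saving is that old facts are not re-joined in every round.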


Magic Sets

  • A query-rewriting scheme.

  • Similar in effect to a number of query-execution ideas such as

    • Query-Subquery (Rohmer, Lescoeur, and Kerisit, 1986).

    • Memoing (Dietrich and Warren, 1985).
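As a rough sketch of the rewriting idea, consider the query path(source, Y) over the transitive-closure rules. A hand-written rewrite in the magic-sets style first computes a "magic" set of nodes relevant to the query, then evaluates the original rules restricted to that set. This is a hand rewrite for one query form, not the general algorithm; the function name and the predicate name magic are assumptions for illustration.

```python
def magic_path(arcs, source):
    """Answer path(source, Y) using a hand-written magic-sets style rewrite."""
    # magic(source).
    # magic(Z) :- magic(X), arc(X,Z).
    magic = {source}
    frontier = {source}
    while frontier:
        frontier = {z for (x, z) in arcs if x in frontier} - magic
        magic |= frontier

    # path(X,Y) :- magic(X), arc(X,Y).
    # path(X,Y) :- magic(X), arc(X,Z), path(Z,Y).
    relevant = {(x, z) for (x, z) in arcs if x in magic}
    path = set(relevant)
    changed = True
    while changed:
        new = {(x, y) for (x, z) in relevant for (z2, y) in path if z == z2}
        changed = not new <= path
        path |= new
    return {y for (x, y) in path if x == source}

arcs = {("a", "b"), ("b", "c"), ("d", "e")}
print(sorted(magic_path(arcs, "a")))
```

The arc from d to e is never touched when the query starts at a, which is the point of the rewrite: bottom-up evaluation only derives facts relevant to the query's bound argument.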


Negation

  • With negated subgoals in Datalog, you run the risk of multiple minimal models.

    • Example: bachelor(X) :- male(X), NOT married(X,Y)

  • Stratified model (Chandra-Harel, 1982; Apt, Blair, Walker, 1985).

  • Well-founded semantics (Van Gelder, Ross, Schlipf, 1988).
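A minimal sketch of stratified evaluation for the bachelor example follows. The negated predicate is computed completely in a lower stratum before any rule that negates it is applied. Because Y appears only under the negation in the rule as written, this sketch reads it as "X has no spouse" through a helper predicate has_spouse; that helper and the sample facts are assumptions for illustration.

```python
# Hypothetical sample facts.
male = {"al", "bob", "carl"}
married = {("al", "ann")}        # married(X, Y): X is married to Y

# Stratum 0: has_spouse(X) :- married(X, Y).   (assumed helper predicate)
has_spouse = {x for (x, _y) in married}

# Stratum 1: bachelor(X) :- male(X), NOT has_spouse(X).
# Safe to evaluate now, because has_spouse can no longer grow.
bachelor = {x for x in male if x not in has_spouse}

print(sorted(bachelor))
```

Fixing the order of the strata is what selects one model out of the several minimal ones.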


The Death of Datalog

  • Recursion turned out not to be all that important in the world of the 1980s.

  • In the AI community, where logic was taken more seriously than in DB, the emphasis was on expressiveness, not tractability.


The Rebirth

  • Datalog slept, but nothing could take away its important virtues:

    • Simplicity and declarativeness.

    • Tractability.

    • Simple execution engine.

  • While “rule-based systems” were long an AI staple, they never got these features of Datalog.


bddbddb

  • Why did Monica Lam think of Datalog for data-flow analysis?

  • Classical DFA was for code optimization.

    • Only inner loops are important, so data never needed to get really large.


bddbddb – (2)

  • Monica was looking at a different application: software security.

    • Example: can a string read at one point be passed to a SQL call without first being the argument of a function that checks safety?

  • Entire program analyzed as a whole.

    • Example: 800K lines of Apache.

    • Now it’s a database problem.


Overlog and Dedalus

  • At about the same time, Joe Hellerstein was experimenting with Datalog, first for prototyping and later for the real implementation.

  • General direction: protocols for distributed systems.


Overlog and Dedalus – (2)

  • Two important additions: time and space as first-class concepts.

  • Example (space): Assume each node has a table of arcs out.

    • arc(@n, h) means the table at node n contains an arc to node h.


Example – Continued

  • Each node n computes the set of nodes it can reach by consulting the reach sets for the nodes to which n has arcs.

    reach(@n, m) :- arc(@n, h), reach(@h, m).
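The rule above can be simulated in a single process, with one table per node standing in for that node's local storage. This is only a sketch: it is not Overlog or Dedalus code, the function name is hypothetical, and it assumes a base rule reach(@n, h) :- arc(@n, h) that does not appear in the text (without some base case the recursive rule alone derives nothing).

```python
def distributed_reach(arcs_at):
    """arcs_at[n] holds every h such that arc(@n, h) is stored at node n."""
    # Assumed base rule, not in the text: reach(@n, h) :- arc(@n, h).
    reach_at = {n: set(out) for n, out in arcs_at.items()}
    changed = True
    while changed:
        changed = False
        for n, out in arcs_at.items():
            for h in out:
                # reach(@n, m) :- arc(@n, h), reach(@h, m).
                # Reading reach_at[h] stands in for a message sent from h to n.
                incoming = reach_at.get(h, set())
                if not incoming <= reach_at[n]:
                    reach_at[n] |= incoming
                    changed = True
    return reach_at

arcs_at = {"n1": {"n2"}, "n2": {"n3"}, "n3": set()}
print(distributed_reach(arcs_at))
```

Each node only ever reads its own arc table and the reach tables of its direct successors, which mirrors the location specifiers in the rule.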


Some Other Datalog Directions

  • Webdamlog (Abiteboul et al., these proceedings).

    • Adds creation of rules at remote sites.

  • PrPl (Lam et al.).

    • Social networking in Datalog.

  • SecPAL (Becker et al.).

    • Microsoft authorization language translated to Datalog.


Other Directions – (2)

  • LogicBlox (Molham Aref, CEO).

    • Startup in Atlanta GA.

      • One of several Datalog-based startups.

    • Uses Datalog for customized decision-support systems.

    • Many extensions, including controlled second-order predicates.

    • Still has a tractable, straightforward execution model.


Conclusions

  • Too early to tell how important Datalog will be.

    • Will simplicity and tractability beat expressiveness?

  • But moving in the right direction(s) now.

  • From Datalog 2.0 Workshop: needs an open-source standard, like MySQL.

