The History of Datalog

Origins, Failure, Resurrection




An Odd Encounter

  • Several years ago, I met a colleague, Monica Lam, in the hallway at Stanford.

  • “I hear you were involved in the early work on Datalog.”

  • She had discovered this work and used it in her system for large-scale data-flow analysis.

Odd Encounter – (2)

  • The application is naturally recursive.

  • Very large-scale (analyzed code of 800K lines).

  • They (Monica and her student John Whaley) had an implementation, bddbddb, that compiled Datalog rules into BDDs (binary decision diagrams).

Where Did Datalog Come From?

  • Codd’s tuple and domain calculus (1972).

  • Gallaire and Minker’s “Logic and Databases” (1978).

  • Prolog (1976).

Codd’s Logics

  • TRC. { t | R(r) and S(s) and t.A = r.A and r.B = s.B and t.C = s.C }

    • Implemented by Stonebraker as QUEL.

  • DRC. { ac | R(ab) and S(bc) }

    • Implemented by Zloof as Query-by-Example.

“Logic and Databases”

  • Viewed queries as the result of an entire logical theory.

  • Thus allows recursion, negation, theories with multiple minimal models.

  • Closed/open-world evaluations.


Prolog

  • A conventional programming language with predicates as function calls.

  • Bizarre execution rule.

  • Example: you have to write TC as:

    path(X,Y) :- arc(X,Y).

    path(X,Y) :- arc(X,Z), path(Z,Y).


Implementation of Logical Query Languages for Databases

  • In 1984 I took a sabbatical at Hebrew University and wrote a paper with the above title.

  • It has some crazy stuff that makes me wonder “what was I thinking?”

  • Much was fixed by others, later.

  • Published in SIGMOD (no real theorems!).

Implementation – (2)

  • Key idea: Prolog notation + Horn-clause, unique fixedpoint semantics.

  • Key idea: It’s about algorithms for query execution, not logical models.

    • Original thought in that direction was really by Henschen and Naqvi.
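
To make "unique fixedpoint semantics" concrete, here is a minimal Python sketch of naive bottom-up evaluation for the two path rules from the Prolog slide. The function name `naive_path` and the sample graph are illustrative assumptions, not anything from the paper; the point is only that the rules are applied repeatedly until nothing new appears, so the answer does not depend on rule or subgoal order.

```python
def naive_path(arcs):
    """Least fixpoint of:
         path(X,Y) :- arc(X,Y).
         path(X,Y) :- arc(X,Z), path(Z,Y).
    `arcs` is a set of (source, target) pairs.
    """
    path = set()
    while True:
        # Apply both rules to everything known so far.
        derived = set(arcs)
        derived |= {(x, y) for (x, z) in arcs for (z2, y) in path if z == z2}
        new_path = path | derived
        if new_path == path:      # nothing new: fixpoint reached
            return path
        path = new_path


# Hypothetical example graph with a cycle a -> b -> c -> a and an exit c -> d.
arcs = {("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")}
paths = naive_path(arcs)
```

The loop terminates even though the graph is cyclic, because the set of possible path facts is finite and only grows.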

Enter “Datalog”

  • The term “Datalog” to refer to positive Horn clauses without function symbols was first proposed by Dave Maier and David S. (“the other”) Warren.

  • Appears in their book Computing with Logic (1988), but in common use before that.

Good Implementation Ideas

  • Seminaive evaluation (Bancilhon and Ramakrishnan, 1986 – also in SIGMOD).

  • Specialized linear-recursion implementations (many people including Naughton, Ramakrishnan, Sagiv, Vardi,…).

  • Magic sets (Beeri and Ramakrishnan, 1987 – finally something got into PODS).
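
As a rough illustration of the seminaive idea listed above, the following Python sketch evaluates the transitive-closure rules while joining only against the facts that were new in the previous round (the "delta"). It is a hand-written instance for this one program, under the assumption that arcs are given as a set of pairs; `seminaive_path` and `preds` are hypothetical names, and this is not the general algorithm from the cited paper.

```python
from collections import defaultdict


def seminaive_path(arcs):
    """Seminaive evaluation of:
         path(X,Y) :- arc(X,Y).
         path(X,Y) :- arc(X,Z), path(Z,Y).
    """
    # Index arcs by target so arc(X,Z) can be joined with path(Z,Y) quickly.
    preds = defaultdict(set)          # z -> {x : arc(x, z)}
    for x, z in arcs:
        preds[z].add(x)

    path = set(arcs)                  # base rule
    delta = set(arcs)                 # facts that are new this round
    while delta:
        # Recursive rule, but only joined with last round's new path facts.
        derived = {(x, y) for (z, y) in delta for x in preds[z]}
        delta = derived - path        # keep only genuinely new facts
        path |= delta
    return path


# Hypothetical chain a -> b -> c -> d.
chain = {("a", "b"), ("b", "c"), ("c", "d")}
chain_paths = seminaive_path(chain)
```

Compared with re-applying the rules to all of `path` each round, each path fact takes part in the join only once, in the round after it is discovered.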

Magic Sets

  • A query-rewriting scheme.

  • Similar in effect to a number of query-execution ideas such as

    • Query-Subquery (Rohmer, Lescoeur, and Kerisit, 1986).

    • Memoing (Dietrich and Warren, 1985).
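
A small Python sketch may help show what the rewriting buys. It assumes the query path(a, Y) against the transitive-closure rules, and hand-writes the rewritten program for that one binding pattern: magic(a); magic(Z) :- magic(X), arc(X,Z); and each path rule guarded by magic(X). The names `magic_nodes` and `paths` and the sample graph are illustrative assumptions; this is one worked instance, not the general transformation.

```python
def magic_nodes(arcs, source):
    """magic(source).  magic(Z) :- magic(X), arc(X,Z)."""
    magic = {source}
    frontier = {source}
    while frontier:
        frontier = {z for (x, z) in arcs if x in frontier} - magic
        magic |= frontier
    return magic


def paths(arcs, guard=None):
    """Bottom-up path evaluation. With `guard`, each rule body gets an
    extra magic(X) subgoal, so only relevant path facts are derived."""
    path = set()
    while True:
        new = {(x, y) for (x, y) in arcs if guard is None or x in guard}
        new |= {(x, y)
                for (x, z) in arcs if guard is None or x in guard
                for (z2, y) in path if z2 == z}
        if new <= path:
            return path
        path |= new


# Hypothetical graph: a small chain from a, plus an unrelated 3-cycle.
arcs = {("a", "b"), ("b", "c"), ("x", "y"), ("y", "z"), ("z", "x")}
full = paths(arcs)                                   # original program
restricted = paths(arcs, guard=magic_nodes(arcs, "a"))  # rewritten program
answers_full = {y for (x, y) in full if x == "a"}
answers_magic = {y for (x, y) in restricted if x == "a"}
```

Both programs give the same answers for the query, but the rewritten one never derives path facts about the unrelated cycle.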


Negation

  • With negated subgoals in Datalog, you run the risk of multiple minimal models.

    • Example: bachelor(X) :- male(X), NOT married(X,Y)

  • Stratified model (Chandra-Harel, 1982; Apt, Blair, Walker, 1985).

  • Well-founded semantics (Van Gelder, Ross, Schlipf, 1988).
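
The multiple-minimal-models problem can be checked by brute force on a tiny grounded version of the bachelor rule. The Python sketch below assumes a single fact male(bob) and one possible spouse, alice; both constants, the atom strings, and the function names are hypothetical choices made for illustration. It enumerates every set of atoms that satisfies the program read as a logical implication, keeps the minimal ones, and then shows which one a stratified evaluation selects.

```python
from itertools import combinations

# Ground atoms for the assumed two-constant example.
ATOMS = ["male(bob)", "married(bob,alice)", "bachelor(bob)"]


def is_model(m):
    """True if the atom set m satisfies the fact and the grounded rule."""
    if "male(bob)" not in m:                      # fact: male(bob).
        return False
    # Rule as implication: male(bob) and not married(bob,alice) -> bachelor(bob)
    if "married(bob,alice)" not in m and "bachelor(bob)" not in m:
        return False
    return True


models = [frozenset(c)
          for r in range(len(ATOMS) + 1)
          for c in combinations(ATOMS, r)
          if is_model(frozenset(c))]
# A model is minimal if no other model is a proper subset of it.
minimal = [m for m in models if not any(o < m for o in models)]


def stratified():
    """Evaluate stratum by stratum: married first, then bachelor."""
    facts = {"male(bob)"}
    # Stratum 0: married has no facts or rules here, so it stays empty.
    # Stratum 1: bachelor may now safely test the completed married relation.
    if "male(bob)" in facts and "married(bob,alice)" not in facts:
        facts.add("bachelor(bob)")
    return frozenset(facts)
```

Two minimal models appear: one where bob is a bachelor, and one where he is married instead. Stratified evaluation picks the first, because married is fully computed before the negation is tested.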

The Death of Datalog

  • Recursion turned out not to be all that important in the world of the 1980s.

  • In the AI community, where logic was taken more seriously than in DB, the emphasis was on expressiveness, not tractability.

The Rebirth

  • Datalog slept, but nothing could take away its important virtues:

    • Simplicity and declarativeness.

    • Tractability.

    • Simple execution engine.

  • While “rule-based systems” were long an AI staple, they never got these features of Datalog.


bddbddb

  • Why did Monica Lam think of Datalog for data-flow analysis?

  • Classical DFA was for code optimization.

    • Only inner loops are important, so data never needed to get really large.

bddbddb – (2)

  • Monica was looking at a different application: software security.

    • Example: can a string read at one point be passed to a SQL call without first being the argument of a function that checks safety?

  • Entire program analyzed as a whole.

    • Example: 800K lines of Apache.

    • Now it’s a database problem.

Overlog and Dedalus

  • At about the same time, Joe Hellerstein was experimenting with Datalog, first for prototyping and later for the real implementation.

  • General direction: protocols for distributed systems.

Overlog and Dedalus – (2)

  • Two important additions: time and space as first-class concepts.

  • Example (space): Assume each node has a table of arcs out.

    • arc(@n, h) means the table at node n contains an arc to node h.

Example – Continued

  • Each node n computes the set of nodes it can reach by consulting the reach sets for the nodes to which n has arcs.

    reach(@n, m) :- arc(@n, h),

    reach(@h, m).
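
The located rule above can be simulated in a few lines of Python, with each node holding only its own arc table and its own reach set. Two things here are assumptions rather than part of the slide: a base rule reach(@n, h) :- arc(@n, h) to seed the computation, and the use of a plain dictionary lookup to stand in for the network message a real Overlog or Dedalus runtime would send. The name `distributed_reach` is hypothetical.

```python
def distributed_reach(arc_tables):
    """arc_tables maps each node n to its local table {h : arc(@n, h)}.
    Returns a map from each node to its locally stored reach set."""
    # Assumed base rule (not on the slide): reach(@n, h) :- arc(@n, h).
    reach = {n: set(hs) for n, hs in arc_tables.items()}
    changed = True
    while changed:
        changed = False
        for n, hs in arc_tables.items():
            for h in hs:
                # reach(@n, m) :- arc(@n, h), reach(@h, m).
                # Node n consults node h's reach set; in a real system this
                # lookup would be a message between the two nodes.
                remote = reach.get(h, set())
                new = remote - reach[n]
                if new:
                    reach[n] |= new
                    changed = True
    return reach


# Hypothetical three-node network: a -> b -> c.
tables = {"a": {"b"}, "b": {"c"}, "c": set()}
reach_sets = distributed_reach(tables)
```

Each node's result lives at that node, which is what the @ location specifier expresses: the first argument says where the tuple is stored.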

Some Other Datalog Directions

  • Webdamlog (Abiteboul et al., these proceedings).

    • Adds creation of rules at remote sites.

  • PrPl (Lam et al.).

    • Social networking in Datalog.

  • SecPAL (Becker et al.).

    • Microsoft authorization language translated to Datalog.

Other Directions – (2)

  • LogicBlox (Molham Aref, CEO).

    • Startup in Atlanta GA.

      • One of several Datalog-based startups.

    • Uses Datalog for customized decision-support systems.

    • Many extensions, including controlled second-order predicates.

    • Still has a tractable, straightforward execution model.


  • Too early to tell how important Datalog will be.

    • Will simplicity and tractability beat expressiveness?

  • But moving in the right direction(s) now.

  • From the Datalog 2.0 Workshop: needs an open-source standard, like MySQL.