# The History of Datalog

Origins

Failure

Resurrection

• Several years ago, I met a colleague, Monica Lam, in the hallway at Stanford.

• “I hear you were involved in the early work on Datalog.”

• She had discovered this work and used it in her system for large-scale data-flow analysis.

• The application is naturally recursive.

• Very large-scale (analyzed code of 800K lines).

• They (Monica and her student John Whaley) had an implementation, bddbddb, that compiled Datalog rules into BDDs (binary decision diagrams).

• Codd’s tuple and domain calculus (1972).

• Gallaire and Minker’s “Logic and Data Bases” (1978).

• Prolog (1976).

• TRC. { t | R(r) and S(s) and t.A = r.A and r.B = s.B and t.C = s.C }

• Implemented by Stonebraker as QUEL.

• DRC. { ac | R(ab) and S(bc) }

• Implemented by Zloof as Query-by-Example.
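Both calculus expressions above describe the same query: join R(A,B) with S(B,C) on B and keep columns A and C. A minimal Python sketch of the two styles, using made-up sample relations (the tuples are illustrative and not from the talk):

```python
# Hypothetical sample relations: R has columns (A, B), S has columns (B, C).
R = {(1, 10), (2, 20), (3, 30)}
S = {(10, "x"), (20, "y"), (40, "z")}

# Tuple-calculus flavor: variables r and s range over whole tuples,
# and the condition compares their components (r.B = s.B).
trc = {(r[0], s[1]) for r in R for s in S if r[1] == s[0]}

# Domain-calculus flavor: variables a, b, c range over values,
# and the shared value b is what performs the join.
drc = {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

print(sorted(trc))
```

Both comprehensions produce the same set of (A, C) pairs; only the way the variables range differs.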

• Viewed queries as the result of an entire logical theory.

• Thus allows recursion, negation, theories with multiple minimal models.

• Closed/open-world evaluations.

• A conventional programming language with predicates as function calls.

• Bizarre execution rule.

• Example: you have to write transitive closure (TC) as:

path(X,Y) :- arc(X,Y).

path(X,Y) :- arc(X,Z), path(Z,Y).
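For contrast with Prolog's order-sensitive top-down execution, the sketch below evaluates the same two rules bottom-up, applying them repeatedly until no new facts appear. This is a naive fixed-point loop written for illustration, assuming a small in-memory set of arcs; the sample graph is hypothetical.

```python
def transitive_closure(arc):
    """Naive bottom-up evaluation: apply both rules until nothing new appears."""
    path = set()
    while True:
        # Rule 1: path(X,Y) :- arc(X,Y).
        derived = set(arc)
        # Rule 2: path(X,Y) :- arc(X,Z), path(Z,Y).
        derived |= {(x, y) for (x, z) in arc for (z2, y) in path if z == z2}
        if derived <= path:  # fixed point reached: no new facts
            return path
        path |= derived


# Hypothetical graph containing a cycle 1 -> 2 -> 3 -> 1 plus an exit arc 3 -> 4.
arc = {(1, 2), (2, 3), (3, 1), (3, 4)}
paths = transitive_closure(arc)
print(len(paths))
```

Because the loop only ever adds facts to a finite set, it terminates even on the cyclic graph, and the result does not depend on the order in which the rules or subgoals are written.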

• In 1984 I took a sabbatical at Hebrew University and wrote a paper with the above title.

• It has some crazy stuff that makes me wonder “what was I thinking?”

• Much was fixed by others, later.

• Published in SIGMOD (no real theorems!).

• Key idea: Prolog notation + Horn-clause, unique fixedpoint semantics.

• Key idea: It’s about algorithms for query execution, not logical models.

• Original thought in that direction was really by Henschen and Naqvi.

• The term “Datalog” to refer to positive Horn clauses without function symbols was first proposed by Dave Maier and David S. (“the other”) Warren.

• Appears in their book Computing with Logic (1988), but in common use before that.

• Seminaive evaluation (Bancilhon and Ramakrishnan, 1986 – also in SIGMOD).

• Specialized linear-recursion implementations (many people including Naughton, Ramakrishnan, Sagiv, Vardi,…).

• Magic sets (Beeri and Ramakrishnan, 1987 – finally something got into PODS).

• A query-rewriting scheme.

• Similar in effect to a number of query-execution ideas such as

• Query-Subquery (Rohmer, Lescoeur, and Kerisit, 1986).

• Memoing (Dietrich and Warren, 1985).
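To make the seminaive idea from the list above concrete: instead of re-joining against every known fact in each round, the recursive rule is joined only with the facts that were new in the previous round. The following is a minimal sketch for the transitive-closure rules, not the formulation from the cited papers; the function name and the sample chain graph are assumptions for illustration.

```python
def seminaive_tc(arc):
    """Seminaive evaluation of path: join arc only with last round's new facts."""
    path = set(arc)   # round 0: path(X,Y) :- arc(X,Y).
    delta = set(arc)  # facts discovered in the previous round
    while delta:
        # Index the delta by its first column so the join is a lookup.
        by_source = {}
        for (z, y) in delta:
            by_source.setdefault(z, set()).add(y)
        # path(X,Y) :- arc(X,Z), delta_path(Z,Y).
        new = {(x, y) for (x, z) in arc for y in by_source.get(z, ())}
        delta = new - path  # keep only facts not seen before
        path |= delta
    return path


# Hypothetical chain graph 1 -> 2 -> 3 -> 4.
chain = {(1, 2), (2, 3), (3, 4)}
closure = seminaive_tc(chain)
print(sorted(closure))
```

The result is the same set a naive loop would compute; the saving is that each derived fact takes part in the join only once, in the round after it is discovered.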

• With negated subgoals in Datalog, you run the risk of multiple minimal models.

• Example: bachelor(X) :- male(X), NOT married(X,Y)

• Stratified model (Chandra-Harel, 1982; Apt, Blair, Walker, 1985).

• Well-founded semantics (Van Gelder, Ross, Schlipf, 1988).
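A small sketch of how the stratified approach picks one model for the bachelor rule: the relation under the NOT is computed completely first, and only then is the rule with the negated subgoal applied. The facts below are hypothetical, and reading NOT married(X,Y) as "there is no Y with married(X,Y)" is an assumption about the intended meaning.

```python
# Hypothetical stored facts.
male = {"al", "bo"}
married = {("bo", "cy")}

# Stratum 0: everything the negated subgoal depends on is fully known first.
has_spouse = {x for (x, _) in married}

# Stratum 1: bachelor(X) :- male(X), NOT married(X,Y).
bachelor = {x for x in male if x not in has_spouse}

print(sorted(bachelor))
```

Read purely as a logical implication, the rule is also satisfied by a model that leaves bachelor empty and instead adds a married fact for "al"; both models are minimal. Stratification selects the one in which married contains only the stored facts.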

• Recursion turned out not to be all that important in the world of the 1980s.

• In the AI community, where logic was taken more seriously than in DB, the emphasis was on expressiveness, not tractability.

• Datalog slept, but nothing could take away its important virtues:

• Simplicity and declarativeness.

• Tractability.

• Simple execution engine.

• While “rule-based systems” were long an AI staple, they never got these features of Datalog.

• Why did Monica Lam think of Datalog for data-flow analysis?

• Classical DFA was for code optimization.

• Only inner loops are important, so data never needed to get really large.

• Monica was looking at a different application: software security.

• Example: can a string read at one point be passed to a SQL call without first being the argument of a function that checks safety?

• Entire program analyzed as a whole.

• Example: 800K lines of Apache.

• Now it’s a database problem.
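The security question above can be phrased as recursive reachability over stored facts. The sketch below is an illustration only and is not the rule set used by bddbddb: the relations flow, source, sanitizer, and sink, and all program-point names, are hypothetical. It evaluates rules of roughly this shape: tainted(P) :- source(P). tainted(Q) :- tainted(P), flow(P,Q), NOT sanitizer(Q). violation(K) :- tainted(K), sink(K).

```python
def find_violations(flow, sources, sanitizers, sinks):
    """Return the sinks reachable from a source without passing a sanitizer."""
    tainted = set(sources)   # tainted(P) :- source(P).
    frontier = set(sources)  # facts that were new in the previous round
    while frontier:
        # tainted(Q) :- tainted(P), flow(P,Q), NOT sanitizer(Q).
        step = {q for (p, q) in flow if p in frontier and q not in sanitizers}
        frontier = step - tainted
        tainted |= frontier
    # violation(K) :- tainted(K), sink(K).
    return tainted & set(sinks)


# Hypothetical program points: one unchecked route and one checked route.
flow = {
    ("read", "concat"), ("concat", "sql_exec"),
    ("read", "check"), ("check", "safe_sql"),
}
violations = find_violations(
    flow, sources={"read"}, sanitizers={"check"}, sinks={"sql_exec", "safe_sql"}
)
print(sorted(violations))
```

Only the route that bypasses the check is reported. On a whole program the flow relation becomes very large, which is what turns the analysis into a database problem.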

• At about the same time, Joe Hellerstein was experimenting with Datalog, first for prototyping and later for the real implementation.

• General direction: protocols for distributed systems.

• Two important additions: time and space as first-class concepts.

• Example (space): Assume each node has a table of arcs out.

• arc(@n, h) means the table at node n contains an arc to node h.

Example – Continued

• Each node n computes the set of nodes it can reach by consulting the reach sets for the nodes to which n has arcs.

reach(@n, m) :- arc(@n, h), reach(@h, m).
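A single-process simulation of this rule can make the location annotations concrete. In the sketch below each node holds its own arc table and reach table, and reading a neighbor's reach table stands in for a network message. The base rule reach(@n, h) :- arc(@n, h) is an assumption, since the slide shows only the recursive rule; the function name and the three-node network are hypothetical.

```python
def distributed_reach(arcs_at):
    """arcs_at[n] is the local arc table at node n: the set of nodes n points to."""
    # Assumed base rule: reach(@n, h) :- arc(@n, h).
    reach_at = {n: set(out) for n, out in arcs_at.items()}
    changed = True
    while changed:
        changed = False
        for n, out in arcs_at.items():
            for h in out:
                # reach(@n, m) :- arc(@n, h), reach(@h, m).
                # Reading reach_at[h] simulates node h sending its table to n.
                incoming = reach_at.get(h, set()) - reach_at[n]
                if incoming:
                    reach_at[n] |= incoming
                    changed = True
    return reach_at


# Hypothetical three-node network: a -> b -> c.
tables = distributed_reach({"a": {"b"}, "b": {"c"}, "c": set()})
print({n: sorted(r) for n, r in sorted(tables.items())})
```

Each node only ever consults its own arcs and the reach tables of its direct neighbors, which is the property the @ annotations express.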

• Webdamlog (Abiteboul et al., these proceedings).

• Adds creation of rules at remote sites.

• PrPl (Lam et al.).

• Social networking in Datalog.

• SecPAL (Becker et al.).

• Microsoft authorization language translated to Datalog.

• LogicBlox (Molham Aref, CEO).

• Startup in Atlanta GA.

• One of several Datalog-based startups.

• Uses Datalog for customized decision-support systems.

• Many extensions, including controlled second-order predicates.

• Still has a tractable, straightforward execution model.

• Too early to tell how important Datalog will be.

• Will simplicity and tractability beat expressiveness?

• But moving in the right direction(s) now.

• From the Datalog 2.0 Workshop: needs an open-source standard, like MySQL.