Reasoning with Logic Programming

Luís Moniz Pereira, José Júlio Alferes
Utrecht, August 1999

Course Aim
To provide an integrated logic programming framework and working system for representing knowledge and reasoning about classical AI topics and applications, such as:
• taxonomies
• hypothetical reasoning
• abduction
• paraconsistent reasoning
• revision
• diagnosis
• updating
• actions

Course material
• The course comprises research of the last decade on the use of Logic Programming for Knowledge Representation and Reasoning (KRR)
• This is now a mature area of research
• The course presents the basic formalism, proof procedures, and application domains
• It articulates work on semantics, methodologies for KRR, belief revision, and knowledge updates
• It shows open areas with research opportunities
• At the end, pointers to bibliography are provided

Course topics
• Overview of Logic Programs semantics
• Explicit negation semantics
• Paraconsistency
• Methodology for KR in LP
• Proof procedures
• Belief revision and abduction
• Updates of Logic Programs
• Modeling actions

LP for Knowledge Representation
• Due to its declarative nature, LP has become a prime candidate for Knowledge Representation and Reasoning
• This has been more noticeable since its relations to other non-monotonic reasoning (NMR) formalisms were established
• For this usage of LP, a precise declarative semantics was in order

Language
• A Normal Logic Program P is a set of rules:
  H ← A1, …, An, not B1, …, not Bm   (n, m ≥ 0)
  where H, A_i and B_j are atoms
• Literals not B_j are called default literals
• When no rule in P has a default literal, P is called definite
• The Herbrand base H_P is the set of all instantiated atoms from program P
• We will consider programs as possibly infinite sets of instantiated rules

Declarative Programming
• A logic program can be an executable specification of a problem:
  member(X,[X|Y]).
  member(X,[Y|L]) ← member(X,L).
• Easier to program, compact code
• Adequate for building prototypes
• Given efficient implementations, why not use it to “program” directly?

LP and Deductive Databases
• In a database, tables are viewed as sets of facts:
• Other relations are represented with rules:

LP and Deductive DBs (cont)
• LP allows storing, besides relations, rules for deducing other relations
• Note that default negation cannot be classical negation in:
• A form of Closed World Assumption (CWA) is needed for inferring non-availability of connections

Default Rules
• Default rules, such as “All birds fly”, can be represented via the non-monotonic operator not

The need for a semantics
• In all the previous examples, classical logic is not an appropriate semantics:
  • In the 1st, it does not derive not member(3,[1,2])
  • In the 2nd, it never concludes choosing another company
  • In the 3rd, all abnormalities must be expressed
• The precise definition of a declarative semantics for LPs is recognized as an important issue for their use in KRR

2-valued Interpretations
• A 2-valued interpretation I of P is a subset of H_P
  • A is true in I (i.e. I(A) = 1) iff A ∈ I
  • Otherwise, A is false in I (i.e. I(A) = 0)
• Interpretations can be viewed as representing possible states of knowledge
• If knowledge is incomplete, in some states there might be atoms that are neither true nor false

3-valued Interpretations
• A 3-valued interpretation I of P is a set I = T ∪ not F, where T and F are disjoint subsets of H_P
  • A is true in I iff A ∈ T
  • A is false in I iff A ∈ F
  • Otherwise, A is undefined (I(A) = 1/2)
• 2-valued interpretations are a special case, where H_P = T ∪ F

Models
• Models can be defined via an evaluation function Î:
  • For an atom A, Î(A) = I(A)
  • For a formula F, Î(not F) = 1 - Î(F)
  • For formulas F and G:
    • Î((F,G)) = min(Î(F), Î(G))
    • Î(F ← G) = 1 if Î(G) ≤ Î(F), and 0 otherwise
• I is a model of P iff, for every rule H ← B of P, Î(H ← B) = 1
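
The evaluation function above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a rule is encoded as a hypothetical triple (head, positive body atoms, atoms under not), truth values are the numbers 0, 0.5 and 1, and an empty body is taken to evaluate to 1. None of these names come from the course material.

```python
# Truth values: 0.0 (false), 0.5 (undefined), 1.0 (true).
def atom_value(atom, true_atoms, false_atoms):
    """I(A) for the 3-valued interpretation T U not F."""
    if atom in true_atoms:
        return 1.0
    if atom in false_atoms:
        return 0.0
    return 0.5

def body_value(pos, neg, true_atoms, false_atoms):
    """Value of a conjunction: the minimum over its literals.
    Assumption: an empty body evaluates to 1 (true)."""
    values = [atom_value(a, true_atoms, false_atoms) for a in pos]
    values += [1.0 - atom_value(a, true_atoms, false_atoms) for a in neg]
    return min(values, default=1.0)

def is_model(rules, true_atoms, false_atoms):
    """I is a model iff every rule H <- B satisfies value(B) <= value(H)."""
    return all(
        body_value(pos, neg, true_atoms, false_atoms)
        <= atom_value(head, true_atoms, false_atoms)
        for head, pos, neg in rules
    )

# Hypothetical one-rule program: p <- not q.
program = [("p", [], ["q"])]
```

With this encoding, both {p, not q} and the everywhere-undefined interpretation are models of p ← not q, while {not p, not q} is not, since the body has value 1 and the head has value 0.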

Minimal Models Semantics
• The idea of this semantics is to minimize positive information. What is implied as true by the program is true; everything else is false.
• {pr(s), pr(e), ph(s), ph(e), aM(s), aM(e)} is a model
• The lack of information that sampaio is a physicist should indicate that he isn’t
• The minimal model is: {pr(s), ph(e), aM(e)}

Minimal Models Semantics (cont)
• [Truth ordering] For interpretations I and J, I ≤ J iff for every atom A, I(A) ≤ J(A), i.e. T_I ⊆ T_J and F_I ⊇ F_J
• Every definite logic program has a least (truth ordering) model
• [Minimal models semantics] An atom A is true in (definite) P iff A belongs to its least model. Otherwise, A is false in P.

T_P operator
• The least model of a definite P can be computed (bottom-up) via the operator T_P
• [T_P] Let I be an interpretation of definite P. T_P(I) = {H : (H ← Body) ∈ P and Body ⊆ I}
• If P is definite, T_P is monotone and continuous. Its least fixpoint can be built by:
  • I_0 = {} and I_n = T_P(I_{n-1})
• The least model of definite P is T_P↑ω({})
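
The bottom-up construction can be sketched as follows, assuming a finite ground definite program so that the iteration stops after finitely many steps. The encoding of a rule as a pair (head, list of body atoms) and the function names are illustrative choices, not part of the course material.

```python
def tp(rules, interp):
    """One application of the immediate consequence operator:
    heads of all rules whose body is contained in interp."""
    return {head for head, body in rules if set(body) <= interp}

def least_model(rules):
    """Iterate I_0 = {}, I_n = T_P(I_{n-1}) until a fixpoint is reached.
    Terminates only for finite ground definite programs."""
    interp = set()
    while True:
        next_interp = tp(rules, interp)
        if next_interp == interp:
            return interp
        interp = next_interp

# Definite program: p <- p.  a <- b.  b.
program = [("p", ["p"]), ("a", ["b"]), ("b", [])]
model = least_model(program)
```

For this program the iteration produces {}, {b}, {a, b} and then stabilizes; p never enters the model because its only rule depends on p itself.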

On Minimal Models
• SLD can be used as a proof procedure for the minimal models semantics:
  • If there is an SLD-derivation for A, then A is true
  • Otherwise, A is false
• The semantics does not apply to normal programs:
  • p ← not q has two minimal models: {p} and {q}. There is no least model!

The idea of completion
• In LP one uses “if” but means “iff” [Clark78]
• This doesn’t imply that -1 is not a natural number!
• With this program we mean:
• This is the idea of Clark’s completion:
  • Syntactically transform if’s into iff’s
  • Use classical logic in the transformed theory to provide the semantics of the program

Program completion
• The completion of P is the theory comp(P) obtained by:
  • Replace p(t) ← φ by p(X) ← X = t, φ
  • Replace p(X) ← φ by p(X) ← ∃Y φ, where Y are the original variables of the rule
  • Merge all rules with the same head into a single one: p(X) ← φ1 ∨ … ∨ φn
  • For every q(X) without rules, add q(X) ← ⊥
  • Replace p(X) ← φ by ∀X (p(X) ⇔ φ)

Completion Semantics
• Let comp(P) be the completion of P, where not is interpreted as classical negation:
  • A is true in P iff comp(P) ⊨ A
  • A is false in P iff comp(P) ⊨ not A
• Though completion’s definition is not that simple, the idea behind it is quite simple
• Also, it defines a non-classical semantics by means of classical inference on a transformed theory

SLDNF proof procedure
• By adopting completion, procedurally we have: not is “negation as finite failure”
• In SLDNF, proceed as in SLD. To prove not A:
  • If there is a finite derivation for A, fail not A
  • If, after a finite number of steps, all derivations for A fail, remove not A from the resolvent (i.e. succeed not A)
• SLDNF can be efficiently implemented (cf. Prolog)

SLDNF example
  p ← p.
  q ← not p.
  a ← not b.
  b ← not c.
[Figure: SLDNF derivation trees. ← a calls ← not b, which calls ← b, then ← not c, then ← c, which fails (X); hence ← a fails (X). ← q calls ← not p, which calls ← p, ← p, …: no success nor finite failure.]
• According to completion:
  • comp(P) ⊨ {not a, b, not c}
  • comp(P) ⊭ p, comp(P) ⊭ not p
  • comp(P) ⊭ q, comp(P) ⊭ not q

Problems with completion
• Some consistent programs may become inconsistent: p ← not p becomes p ⇔ not p
• It does not correctly deal with deductive closures:
  edge(a,b). edge(c,d). edge(d,c).
  reachable(a).
  reachable(A) ← edge(A,B), reachable(B).
• Completion doesn’t conclude not reachable(c), due to the circularity caused by edge(c,d) and edge(d,c)
• Circularity is a procedural concept, not a declarative one

Completion Problems (cont)
• Difficulty in representing equivalences:
  abnormal(B) ← irregular(B)
  irregular(B) ← abnormal(B)
  bird(tweety).
  fly(B) ← bird(B), not abnormal(B).
• Completion doesn’t conclude fly(tweety)!
  • Without the first two rules, fly(tweety) is true
  • An explanation for this would be: “the first two rules cause a loop”
  • Again, looping is a procedural concept, not a declarative one
• When defining declarative semantics, procedural concepts should be rejected

Program stratification
• Minimal models don’t have “loop” problems
• But they are only applicable to definite programs
• Generalize Minimal Models to Normal LPs:
  • Divide the program into strata
  • The 1st is a definite program. Compute its minimal model
  • Eliminate all nots whose truth value was thus obtained
  • The 2nd becomes definite. Compute its minimal model
  • …

Stratification example
  P1:  p ← p    a ← b    b
  P2:  c ← not p    d ← c, not a
  P3:  e ← a, not d    f ← not c
• Least(P1) = {a, b, not p}
• Processing this, P2 becomes: c ← true    d ← c, false
• Its minimal model, together with that of P1, is: {a, b, c, not d, not p}
• Processing this, P3 becomes: e ← a, true    f ← false
• The (desired) semantics for P is then: {a, b, c, not d, e, not f, not p}

Stratification
• Let S1; …; Sn be such that S1 ∪ … ∪ Sn = H_P, all the S_i are disjoint, and for every rule of P
  A ← B1, …, Bm, not C1, …, not Ck
  if A ∈ S_i then:
  • {B1, …, Bm} ⊆ S1 ∪ … ∪ S_i
  • {C1, …, Ck} ⊆ S1 ∪ … ∪ S_{i-1}
• Let P_i contain all rules of P whose head belongs to S_i. Then P1; …; Pn is a stratification of P

Stratification (cont)
• A program may have several stratifications:
  [Figure: the program a. b ← a. c ← not a. divided into strata in two different ways, one with strata P1, P2 and one with strata P1, P2, P3]
• Or it may have no stratification:
  b ← not a
  a ← not b
• A Normal Logic Program is stratified iff it admits (at least) one stratification
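
One way to test whether a finite ground program admits a stratification is to push each atom to the lowest stratum compatible with the two conditions of the definition. The sketch below does this by repeated relaxation; the triple encoding (head, positive body, negated body) and the name stratify are assumptions made for the example, and this is only one of several possible ways to perform the check.

```python
def stratify(rules):
    """Return a dict atom -> stratum number (starting at 1),
    or None if the program has no stratification.
    Assumes a finite ground normal program."""
    atoms = set()
    for head, pos, neg in rules:
        atoms.add(head)
        atoms.update(pos)
        atoms.update(neg)
    level = {atom: 1 for atom in atoms}
    changed = True
    while changed:
        changed = False
        for head, pos, neg in rules:
            # Positive body atoms may share the head's stratum;
            # negated atoms must lie in a strictly lower one.
            needed = max([level[b] for b in pos]
                         + [level[c] + 1 for c in neg]
                         + [level[head]])
            if needed > len(atoms):
                # More strata than atoms: some atom depends
                # negatively on itself, so no stratification exists.
                return None
            if needed > level[head]:
                level[head] = needed
                changed = True
    return level

# a.  b <- a.  c <- not a.
stratified = [("a", [], []), ("b", ["a"], []), ("c", [], ["a"])]
# b <- not a.  a <- not b.
unstratified = [("b", [], ["a"]), ("a", [], ["b"])]
```

For the first program the function places a and b in stratum 1 and c in stratum 2, which is one of the admissible stratifications; for the second it returns None.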

Semantics of stratified LPs
• Let I|R be the restriction of interpretation I to the atoms in R, and P1; …; Pn be a stratification of P. Define the sequence:
  • M1 = least(P1)
  • M_{i+1} is the minimal model of P_{i+1} such that M_{i+1}|(S1 ∪ … ∪ S_i) = M_i
• Mn is the standard model of P
  • A is true in P iff A ∈ Mn
  • Otherwise, A is false
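
The stratum-by-stratum construction can be sketched as below, assuming the strata are already given as lists of ground rules in order and that each rule is a triple (head, positive body, negated body). The function name standard_model and the encoding are illustrative. The example program is the three-strata one used earlier in the stratification example.

```python
def standard_model(strata):
    """Compute the standard model of a stratified program.
    strata: list of rule lists P1, ..., Pn, lowest stratum first.
    Returns the set of true atoms; every other atom is false."""
    model = set()
    for stratum in strata:
        # Eliminate the nots: their atoms belong to lower strata,
        # so their truth value is already fixed by `model`.
        definite = [
            (head, pos)
            for head, pos, neg in stratum
            if not any(atom in model for atom in neg)
        ]
        # Least model of the now-definite stratum, extending `model`.
        changed = True
        while changed:
            changed = False
            for head, pos in definite:
                if head not in model and all(a in model for a in pos):
                    model.add(head)
                    changed = True
    return model

p1 = [("p", ["p"], []), ("a", ["b"], []), ("b", [], [])]
p2 = [("c", [], ["p"]), ("d", ["c"], ["a"])]
p3 = [("e", ["a"], ["d"]), ("f", [], ["c"])]
result = standard_model([p1, p2, p3])
```

The true atoms obtained are a, b, c and e, so d, f and p are false, in agreement with the semantics {a, b, c, not d, e, not f, not p} derived by hand for that example.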

Properties of Standard Model
Let M_P be the standard model of stratified P
• M_P is unique (does not depend on the stratification)
• M_P is a minimal model of P
• M_P is supported
• A model M of program P is supported iff:
  A ∈ M ⇒ ∃ (A ← Body) ∈ P : Body ⊆ M
  (true atoms must have a rule in P with true body)
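
The supportedness condition is easy to check mechanically for a 2-valued model of a finite ground program. In the sketch below, Body ⊆ M is read as: every positive body atom is in M and no atom under not is in M. That reading, the triple encoding (head, positive body, negated body) and the example program are assumptions made for the illustration.

```python
def is_supported(rules, model):
    """True iff every atom of `model` is the head of some rule
    whose body is true in `model` (2-valued reading)."""
    def body_true(pos, neg):
        return set(pos) <= model and not (set(neg) & model)

    return all(
        any(head == atom and body_true(pos, neg) for head, pos, neg in rules)
        for atom in model
    )

# a <- not b.  b <- not a.  c <- a.  c <- b.
program = [
    ("a", [], ["b"]),
    ("b", [], ["a"]),
    ("c", ["a"], []),
    ("c", ["b"], []),
]
```

Here {a, c} is supported, whereas {c} alone is not: both rules for c need a or b to be true.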

Perfect models
• The original definition of stratification (Apt et al.) was made on predicate names rather than atoms
• By abandoning the restriction to a finite number of strata, the definitions of Local Stratification and Perfect Models (Przymusinski) are obtained. This enlarges the scope of application:
  even(0)
  even(s(X)) ← not even(X)
  P1 = {even(0)}
  P2 = {even(1) ← not even(0)}
  ...
• The program isn’t stratified (even/1 depends negatively on itself) but is locally stratified
• Its perfect model is: {even(0), not even(1), even(2), …}

Problems with stratification
• Perfect models are adequate for stratified LPs
  • Newer semantics are generalizations of them
• But there are (useful) non-stratified LPs:
  even(X) ← zero(X)
  zero(0)
  even(Y) ← suc(X,Y), not even(X)
  suc(X,s(X))
  • It is not stratified because (even(0) ← suc(0,0), not even(0)) ∈ P
• No stratification is possible if P has:
  pacifist(X) ← not hawk(X)
  hawk(X) ← not pacifist(X)
• This is useful in KR: “X is pacifist if it cannot be assumed that X is hawk, and vice-versa. If nothing else is said, it is undefined whether X is pacifist or hawk”

SLS procedure
• In perfect models, not includes infinite failure
• SLS is a (theoretical) procedure for perfect models based on possibly infinite failure
• No complete implementation is possible (how to detect infinite failure?)
• Sound approximations exist:
  • based on loop checking (with ancestors)
  • based on tabulation techniques (cf. XSB-Prolog implementation)

Stable Models Idea
• The construction of perfect models can be done without stratifying the program. Simply guess the model, process it into P and see if its least model coincides with the guess.
• If the program is stratified, the results coincide:
  • A correct guess must coincide on the 1st stratum;
  • and on the 2nd (given the 1st), and on the 3rd …
• But this can be applied to non-stratified programs…

Stable Models Idea (cont)
• “Guessing a model” corresponds to “assuming default negations not”. This type of reasoning is usual in NMR:
  • Assume some default literals
  • Check in P the consequences of such assumptions
  • If the consequences completely corroborate the assumptions, they form a stable model
• The stable models semantics is defined as the intersection of all the stable models (i.e. what follows, no matter what stable assumptions)

SMs: preliminary example
  a ← not b    c ← a    p ← not q
  b ← not a    c ← b    q ← not r    r
• Assume, e.g., not r and not p as true, and all others as false. By processing this into P:
  a ← false    c ← a    p ← false
  b ← false    c ← b    q ← true    r
• Its least model is {not a, not b, not c, not p, q, r}
• So, it isn’t a stable model:
  • By assuming not r, r becomes true
  • not a is not assumed and a becomes false

SMs example (cont)
  a ← not b    c ← a    p ← not q
  b ← not a    c ← b    q ← not r    r
• Now assume, e.g., not b and not q as true, and all others as false. By processing this into P:
  a ← true    c ← a    p ← true
  b ← false    c ← b    q ← false    r
• Its least model is {a, not b, c, p, not q, r}
• It is a stable model
• The other one is {not a, b, c, p, not q, r}
• According to Stable Model Semantics:
  • c, r and p are true and q is false
  • a and b are undefined

Stable Models definition
• Let I be a (2-valued) interpretation of P. The definite program P/I is obtained from P by:
  • deleting every rule whose body contains not A with A ∈ I
  • deleting from the bodies all the remaining default literals
• Γ_P(I) = least(P/I)
• M is a stable model of P iff M = Γ_P(M)
  • A is true in P iff A belongs to all SMs of P
  • A is false in P iff A doesn’t belong to any SM of P (i.e. not A “belongs” to all SMs of P)

Properties of SMs
• Stable models are minimal models
• Stable models are supported
• If P is locally stratified then its single stable model is the perfect model
• Stable models semantics assigns meaning to (some) non-stratified programs
  • E.g. the one in the example before

Importance of Stable Models
Stable Models were an important contribution:
• Introduced the notion of default negation (versus negation as failure)
• Allowed important connections to NMR. Started the area of LP&NMR
• Allowed for a better understanding of the use of LPs in Knowledge Representation
They are considered THE semantics of LPs by a significant part of the community. But...

Cumulativity
• A semantics Sem is cumulative iff for every P:
  if A ∈ Sem(P) and B ∈ Sem(P) then B ∈ Sem(P ∪ {A})
  (i.e. all derived atoms can be added as facts, without changing the program’s meaning)
• This property is important for implementation:
  • without cumulativity, tabling methods cannot be used

Relevance
• A directly depends on B if B occurs in the body of some rule with head A. A depends on B if A directly depends on B or there is a C such that A directly depends on C and C depends on B.
• A semantics Sem is relevant iff for every P:
  A ∈ Sem(P) iff A ∈ Sem(Rel_A(P))
  where Rel_A(P) contains all rules of P whose head is A or some B on which A depends
• Only this property allows for the usual top-down execution of logic programs

Problems with SMs
• They don’t provide a meaning to every program:
  • P = {a ← not a} has no stable models
• The semantics is non-cumulative and non-relevant. Consider P:
  a ← not b    c ← not a
  b ← not a    c ← not c
  • The only SM of P is {not a, c, b}
• However, b is not true in P ∪ {c} (non-cumulativity)
  • P ∪ {c} has 2 SMs: {not a, b, c} and {a, not b, c}
• b is not true in Rel_b(P) (non-relevance)
  • The rules in Rel_b(P) are a ← not b and b ← not a
  • Rel_b(P) has 2 SMs: {not a, b} and {a, not b}

Problems with SMs (cont)
• Their computation is NP-complete
• The intersection of SMs is non-supported: in the program below, c is true but neither a nor b is true
  a ← not b    c ← a
  b ← not a    c ← b
• Note that the perfect model semantics:
  • is cumulative
  • is relevant
  • is supported
  • its computation is polynomial

Well Founded Semantics
• Defined in [GRS90], generalizes SMs to 3-valued models
• Note that:
  • there are programs with no fixpoints of Γ
  • but all have fixpoints of Γ²
• P = {a ← not a}
  • Γ({a}) = {} and Γ({}) = {a}
  • There are no stable models
  • But: Γ²({}) = {} and Γ²({a}) = {a}

Partial Stable Models
• A 3-valued interpretation (T ∪ not F) is a PSM of P iff:
  • T = Γ²(T)
  • T ⊆ Γ(T)
  • F = H_P - Γ(T)
• The 2nd condition guarantees that no atom is both true and false: T ∩ F = {}
• P = {a ← not a} has a single PSM: {}
• The program
  a ← not b    c ← not a
  b ← not a    c ← not c
  has 3 PSMs: {}, {a, not b} and {c, b, not a}
• The 3rd corresponds to the single SM

WFS definition
• [WF Model] Every P has a least PSM in the knowledge ordering (i.e. wrt ⊆), obtainable by the transfinite sequence:
  • T_0 = {}
  • T_{i+1} = Γ²(T_i)
  • T_δ = ∪_{α<δ} T_α, for limit ordinals δ
• Let T be the least fixpoint obtained. M_P = T ∪ not (H_P - Γ(T)) is the well founded model of P.
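
For a finite ground program the sequence reaches its fixpoint after finitely many steps, so no limit ordinals are needed and the construction reduces to a simple loop. The sketch below makes that assumption; the triple encoding (head, positive body, negated body) and the function names are illustrative, and H_P is approximated by the atoms occurring in the program.

```python
def least_model(definite_rules):
    """Least model of a finite ground definite program."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in definite_rules:
            if head not in model and set(pos) <= model:
                model.add(head)
                changed = True
    return model

def gamma(rules, interp):
    """Gamma_P(I) = least(P/I)."""
    reduct = [(head, pos) for head, pos, neg in rules
              if not (set(neg) & interp)]
    return least_model(reduct)

def well_founded_model(rules):
    """Return (T, F): the true and false atoms of the well founded model.
    Atoms in neither set are undefined.
    Assumption: H_P is the set of atoms occurring in `rules`."""
    atoms = set()
    for head, pos, neg in rules:
        atoms.add(head)
        atoms.update(pos)
        atoms.update(neg)
    true_atoms = set()
    while True:
        next_true = gamma(rules, gamma(rules, true_atoms))  # Gamma squared
        if next_true == true_atoms:
            break
        true_atoms = next_true
    false_atoms = atoms - gamma(rules, true_atoms)
    return true_atoms, false_atoms

# a <- not b.  b <- not a.  c <- a.  c <- b.
# p <- not q.  q <- not r.  r.
program = [
    ("a", [], ["b"]), ("b", [], ["a"]),
    ("c", ["a"], []), ("c", ["b"], []),
    ("p", [], ["q"]), ("q", [], ["r"]),
    ("r", [], []),
]
wfm_true, wfm_false = well_founded_model(program)
```

On this program the loop yields T = {p, r} and F = {q}, leaving a, b and c undefined. For {a ← not a} it returns two empty sets, i.e. a is undefined.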

Well Founded Semantics (cont)
• Let M be the well founded model of P:
  • A is true in P iff A ∈ M
  • A is false in P iff not A ∈ M
  • Otherwise (i.e. A ∉ M and not A ∉ M) A is undefined in P

WFS Properties
• Every program is assigned a meaning
• Every SM extends one PSM
• If the WFM is total, it coincides with the single SM
• It is sound wrt the SMs semantics:
  • If P has stable models and A is true (resp. false) in the WFM, it is also true (resp. false) in the intersection of SMs
• The WFM coincides with the perfect model in locally stratified programs (and with the least model in definite programs)
