Extending Bayesian Logic Programs for Plan Recognition and Machine Reading Sindhu V. Raghavan Advisor: Raymond Mooney PhD Proposal May 12th, 2011
Plan Recognition in Intelligent User Interfaces $ cd my-dir $ cp test1.txt test-dir $ rm test1.txt What task is the user performing? move-file Which files and directories are involved? test1.txt and test-dir Can the task be performed more efficiently? Data is relational in nature - several files and directories, and several relations between them
Characteristics of Real World Data • Relational or structured data • Several entities in the domain • Several relations between entities • Does not always follow the i.i.d. assumption • Presence of noise or uncertainty • Uncertainty in the types of entities • Uncertainty in the relations Traditional approaches like first-order logic or probabilistic models can handle either structured data or uncertainty, but not both.
Statistical Relational Learning (SRL) • Integrates first-order logic and probabilistic graphical models [Getoor and Taskar, 2007] • Combines strengths of both first-order logic and probabilistic models • SRL formalisms • Stochastic Logic Programs (SLPs) [Muggleton, 1996] • Probabilistic Relational Models (PRMs) [Friedman et al., 1999] • Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001] • Markov Logic Networks (MLNs) [Richardson and Domingos, 2006]
Bayesian Logic Programs (BLPs) • Integrate first-order logic and Bayesian networks • Why BLPs? • Efficient grounding mechanism that includes only those variables that are relevant to the query • Easy to extend by incorporating any type of logical inference to construct networks • Well suited for capturing causal relations in data
Objectives • BLPs for Plan Recognition – plan recognition involves predicting the top-level plan of an agent based on its actions • BLPs for Machine Reading – machine reading involves automatic extraction of knowledge from natural language text
Outline • Motivation • Background • First-order logic • Logical Abduction • Bayesian Logic Programs (BLPs) • Completed Work • Part 1 – Extending BLPs for Plan Recognition • Part 2 – Extending BLPs for Machine Reading • Proposed Work • Conclusions
First-order Logic • Terms • Constants – individual entities like anna, bob • Variables – placeholders for objects like X, Y • Predicates • Relations over entities like worksFor, capitalOf • Literal – predicate or its negation applied to terms • Atom – Positive literal like worksFor(X,Y) • Ground literal – literal with no variables like worksFor(anna,bob) • Clause – disjunction of literals • Horn clause has at most one positive literal • Definite clause has exactly one positive literal
First-order Logic • Quantifiers • Universal quantification - true for all objects in the domain • Existential quantification - true for some objects in the domain • Logical Inference • Forward Chaining – For every implication p → q, if p is true, then q is concluded to be true • Backward Chaining – For a query literal q, if an implication p → q is present and p is true, then q is concluded to be true; otherwise backward chaining tries to prove p
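The forward-chaining procedure above can be sketched in a few lines of Python. This is a propositional sketch with made-up atoms (`p`, `q`, `r`, `s`), not code from the talk; a full first-order chainer would also need unification.

```python
# Minimal forward chaining over propositional Horn rules.
# Each rule is (body, head): if all body atoms are known, conclude head.
def forward_chain(facts, rules):
    """Repeatedly apply rules p1,...,pn -> q until no new facts appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and all(p in known for p in body):
                known.add(head)
                changed = True
    return known

rules = [(("p",), "q"), (("q", "r"), "s")]
derived = forward_chain({"p", "r"}, rules)
# derived now also contains q (from p) and s (from q and r)
```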
Logical Abduction • Abduction • Process of finding the best explanation for a set of observations • Given • Background knowledge, B, in the form of a set of (Horn) clauses in first-order logic • Observations, O, in the form of atomic facts in first-order logic • Find • A hypothesis, H, a set of assumptions (atomic facts) that logically entail the observations given the theory: B ∪ H ⊨ O • Best explanation is the one with the fewest assumptions
Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001] • Set of Bayesian clauses a | a1, a2, …, an • Definite clauses that are universally quantified • Head of the clause - a • Body of the clause - a1, a2, …, an • Range-restricted, i.e., variables(head) ⊆ variables(body) • Associated conditional probability table (CPT) • P(head|body) • Bayesian predicates a, a1, a2, …, an have finite domains • Combining rule like noisy-or for mapping multiple CPTs into a single CPT
Logical Inference in BLPs SLD Resolution Proof alarm(james) BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Example from Ngo and Haddawy, 1997
Logical Inference in BLPs SLD Resolution Proof alarm(james) BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). burglary(james) Query alarm(james) Example from Ngo and Haddawy, 1997
Logical Inference in BLPs SLD Resolution Proof alarm(james) BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). burglary(james) Query alarm(james) neighborhood(james) Example from Ngo and Haddawy, 1997
Logical Inference in BLPs SLD Resolution Proof alarm(james) BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). burglary(james) lives(james,Y) tornado(Y) Query alarm(james) neighborhood(james) Example from Ngo and Haddawy, 1997
Logical Inference in BLPs SLD Resolution Proof alarm(james) BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). burglary(james) lives(james,Y) tornado(Y) Query alarm(james) lives(james,yorkshire) neighborhood(james) Example from Ngo and Haddawy, 1997
Logical Inference in BLPs SLD Resolution Proof alarm(james) BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). burglary(james) lives(james,Y) tornado(Y) Query alarm(james) lives(james,yorkshire) neighborhood(james) tornado(yorkshire) Example from Ngo and Haddawy, 1997
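The SLD resolution proof of `alarm(james)` above can be sketched as backward chaining in Python. To keep the sketch short the clauses are grounded by hand to the constants in the example; a real BLP engine would perform unification on the variables X and Y.

```python
# Backward chaining on the alarm example [Ngo and Haddawy, 1997],
# with clauses grounded by hand (a real engine would unify variables).
facts = {"lives(james,yorkshire)", "lives(stefan,freiburg)",
         "neighborhood(james)", "tornado(yorkshire)"}
rules = {  # head -> list of alternative bodies
    "burglary(james)": [["neighborhood(james)"]],
    "alarm(james)": [["burglary(james)"],
                     ["lives(james,yorkshire)", "tornado(yorkshire)"]],
}

def prove(goal):
    """True if goal is a fact or some clause body for it is provable."""
    if goal in facts:
        return True
    return any(all(prove(g) for g in body) for body in rules.get(goal, []))

result = prove("alarm(james)")
# result is True: both clauses for alarm(james) succeed on these facts
```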
Bayesian Network Construction neighborhood(james) burglary(james) lives(james,yorkshire) tornado(yorkshire) alarm(james) Each ground atom becomes a node (random variable) in the Bayesian network Edges are added from ground atoms in the body of a clause to the ground atom in the head Specify probabilistic parameters using the CPTs associated with Bayesian clauses
Bayesian Network Construction ✖ neighborhood(james) lives(stefan,freiburg) burglary(james) lives(james,yorkshire) tornado(yorkshire) alarm(james) Each ground atom becomes a node (random variable) in the Bayesian network Edges are added from ground atoms in the body of a clause to the ground atom in the head Specify probabilistic parameters using the CPTs associated with Bayesian clauses Use combining rule to combine multiple CPTs into a single CPT
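The construction rule above (ground atoms become nodes, edges run from body atoms to the head) can be sketched directly. The `ground_clauses` list below is the set of ground clauses from the example proof, written out by hand; note that the irrelevant fact `lives(stefan,freiburg)` never enters the network.

```python
# Sketch: turn the ground clauses from a proof into a directed graph.
# Nodes are ground atoms; edges run from each body atom to the head.
ground_clauses = [
    ("burglary(james)", ["neighborhood(james)"]),
    ("alarm(james)", ["burglary(james)"]),
    ("alarm(james)", ["lives(james,yorkshire)", "tornado(yorkshire)"]),
]

nodes, edges = set(), set()
for head, body in ground_clauses:
    nodes.add(head)
    for atom in body:
        nodes.add(atom)
        edges.add((atom, head))  # parent -> child

parents = {n: sorted(a for a, h in edges if h == n) for n in nodes}
# alarm(james) ends up with three parents, whose clause CPTs a combining
# rule (e.g. noisy-or) merges into one CPT
```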
Probabilistic Inference and Learning • Probabilistic inference • Marginal probability given evidence • Most Probable Explanation (MPE) given evidence • Learning [Kersting and De Raedt, 2008] • Parameters • Expectation Maximization • Gradient-ascent based learning • Structure • Hill climbing search through the space of possible structures
Part 1: Extending BLPs for Plan Recognition [Raghavan and Mooney, 2010]
Plan Recognition • Predict an agent’s top-level plans based on the observed actions • Abductive reasoning involving inference of cause from effect • Applications • Story Understanding • Recognize a character’s motives or plans based on its actions to answer questions about the story • Strategic planning • Predict other agents’ plans so as to work cooperatively • Intelligent User Interfaces • Predict the task that the user is performing so as to provide valuable tips to perform the task more efficiently
Related Work • First-order logic based approaches [Kautz and Allen, 1986; Ng and Mooney, 1992] • Knowledge base of plans and actions • Default reasoning or logical abduction to predict the best plan based on the observed actions • Unable to handle uncertainty in data or estimate likelihood of alternative plans • Probabilistic graphical models [Charniak and Goldman, 1989; Huber et al., 1994; Pynadath and Wellman, 2000; Bui, 2003; Blaylock and Allen, 2005] • Encode the domain knowledge using Bayesian networks, abstract hidden Markov models, or statistical n-gram models • Unable to handle relational/structured data
Related Work • Markov Logic based approaches [Kate and Mooney, 2009; Singla and Mooney, 2011] • Logical inference in MLNs is deductive in nature • MLN-PC [Kate and Mooney, 2009] • Add reverse implications to handle abductive reasoning • Does not scale to large domains • MLN-HC [Singla and Mooney, 2011] • Improves over MLN-PC by adding hidden causes • Still does not scale to large domains • MLN-HCAM [Singla and Mooney, 2011] uses logical abduction to constrain grounding • Incorporates ideas from the BLP approach for plan recognition [Raghavan and Mooney, 2010] • MLN-HCAM performs better than MLN-PC and MLN-HC
BLPs for Plan Recognition • Why BLPs ? • Directed model captures causal relationships well • Efficient grounding process results in smaller networks, unlike in MLNs • SLD resolution is deductive inference, used for predicting observed actions from top-level plans • Plan recognition is abductive in nature and involves predicting the top-level plan from observed actions BLPs cannot be used as is for plan recognition
Extending BLPs for Plan Recognition BLPs + Logical Abduction = BALPs BALPs – Bayesian Abductive Logic Programs
Logical Abduction in BALPs • Given • A set of observation literals O = {O1, O2, …, On} and a knowledge base KB • Compute a set of abductive proofs of O using Stickel’s abduction algorithm [Stickel, 1988] • Backchain on each Oi until it is proved or assumed • A literal is said to be proved if it unifies with a fact or the head of some rule in KB; otherwise it is said to be assumed • Construct a Bayesian network using the resulting set of proofs, as in BLPs
Example – Intelligent User Interfaces • Top-level plan predicates • copy-file, move-file, remove-file • Action predicates • cp, rm • Knowledge Base (KB) • cp(filename,destdir) | copy-file(filename,destdir) • cp(filename,destdir) | move-file(filename,destdir) • rm(filename) | move-file(filename,destdir) • rm(filename) | remove-file(filename) • Observed actions • cp(Test1.txt, Mydir) • rm(Test1.txt)
Abductive Inference Assumed literal copy-file(Test1.txt,Mydir) cp(Test1.txt,Mydir) cp(filename,destdir) | copy-file(filename,destdir)
Abductive Inference Assumed literal copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) cp(Test1.txt,Mydir) cp(filename,destdir) | move-file(filename,destdir)
Abductive Inference Match existing assumption copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) rm(Test1.txt) cp(Test1.txt,Mydir) • rm(filename) | move-file(filename,destdir)
Abductive Inference Assumed literal copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) remove-file(Test1.txt) rm(Test1.txt) cp(Test1.txt,Mydir) • rm(filename) | remove-file(filename)
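The backchain-and-assume steps above can be sketched as a toy Python program. The rule variables are bound by hand to the observed constants, so `explanations` lists the ground plan atoms that could explain each observed action; a set naturally captures "match existing assumption", since assuming `move-file(...)` twice adds only one hypothesis.

```python
# Toy sketch of Stickel-style abduction on the UI example.
# Each observed action maps to the ground plan atoms that explain it
# (rule variables pre-bound to the observed constants for brevity).
explanations = {
    "cp(Test1.txt,Mydir)": ["copy-file(Test1.txt,Mydir)",
                            "move-file(Test1.txt,Mydir)"],
    "rm(Test1.txt)": ["move-file(Test1.txt,Mydir)",
                      "remove-file(Test1.txt)"],
}

def abduce(observations):
    """Backchain on each observation; plan atoms that are not known
    facts are assumed, reusing assumptions made for earlier goals."""
    assumed = set()
    for obs in observations:
        for plan in explanations[obs]:
            assumed.add(plan)  # either matches an assumption or adds one
    return assumed

hypotheses = abduce(["cp(Test1.txt,Mydir)", "rm(Test1.txt)"])
# move-file(...) is shared: it explains both observations at once
```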
Structure of Bayesian network copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) remove-file(Test1.txt) rm(Test1.txt) cp(Test1.txt,Mydir)
Probabilistic Inference • Specifying probabilistic parameters • Noisy-and • Specify the CPT for combining the evidence from conjuncts in the body of the clause • Noisy-or • Specify the CPT for combining the evidence from disjunctive contributions from different ground clauses with the same head • Models “explaining away” • Noisy-and and noisy-or models reduce the number of parameters learned from data
Probabilistic Inference copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) remove-file(Test1.txt) rm(Test1.txt) cp(Test1.txt,Mydir) 4 parameters 4 parameters
Probabilistic Inference copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) remove-file(Test1.txt) θ4 θ2 θ3 θ1 rm(Test1.txt) cp(Test1.txt,Mydir) 2 parameters 2 parameters Noisy models require parameters linear in the number of parents
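The noisy-or parameterization above needs one θ per parent rather than a full 2^n CPT. A minimal sketch, with illustrative θ values (not parameters from the talk):

```python
# Noisy-or: the child fails to fire only if every active parent's
# independent cause fails. theta_i = P(child fires | only parent i active).
def noisy_or(thetas, active):
    """P(child = true) given which parents are active."""
    p_none = 1.0
    for theta, on in zip(thetas, active):
        if on:
            p_none *= (1.0 - theta)
    return 1.0 - p_none

# Two parents, e.g. copy-file and move-file both explaining cp(...)
p_both = noisy_or([0.9, 0.8], [True, True])   # 1 - 0.1*0.2 = 0.98
p_one  = noisy_or([0.9, 0.8], [True, False])  # 0.9
```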
Probabilistic Inference • Most Probable Explanation (MPE) • For multiple plans, compute the MPE, the most likely combination of truth values to all unknown literals given the evidence • Marginal Probability • For single top-level plan prediction, compute the marginal probability for all instances of the plan predicate and pick the instance with maximum probability • When exact inference is intractable, SampleSearch [Gogate and Dechter, 2007], an approximate inference algorithm for graphical models with deterministic constraints, is used
Probabilistic Inference copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) remove-file(Test1.txt) Noisy-or Noisy-or rm(Test1.txt) cp(Test1.txt,Mydir)
Probabilistic Inference copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) remove-file(Test1.txt) Noisy-or Noisy-or rm(Test1.txt) cp(Test1.txt,Mydir) Evidence
Probabilistic Inference Query variables copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) remove-file(Test1.txt) Noisy-or Noisy-or rm(Test1.txt) cp(Test1.txt,Mydir) Evidence
Probabilistic Inference MPE Query variables FALSE FALSE TRUE copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) remove-file(Test1.txt) Noisy-or Noisy-or rm(Test1.txt) cp(Test1.txt,Mydir) Evidence
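On a network this small, the MPE above can be computed by brute-force enumeration. The prior and noisy-or parameters below are illustrative numbers, not learned values from the talk; with them, the single plan move-file explains both observed actions and wins over any two-plan assignment.

```python
# Brute-force MPE over the three plan variables of the UI example.
# Evidence: cp(Test1.txt,Mydir) and rm(Test1.txt) are observed true.
from itertools import product

def noisy_or(thetas, active):
    p_none = 1.0
    for theta, on in zip(thetas, active):
        if on:
            p_none *= (1.0 - theta)
    return 1.0 - p_none

PRIOR, THETA = 0.3, 0.9  # illustrative parameters, not from the talk

best, best_score = None, -1.0
for copy, move, remove in product([False, True], repeat=3):
    prior = 1.0
    for plan in (copy, move, remove):
        prior *= PRIOR if plan else (1.0 - PRIOR)
    # cp is a noisy-or child of {copy-file, move-file};
    # rm is a noisy-or child of {move-file, remove-file}
    lik = (noisy_or([THETA, THETA], [copy, move]) *
           noisy_or([THETA, THETA], [move, remove]))
    score = prior * lik
    if score > best_score:
        best, best_score = (copy, move, remove), score

# best == (False, True, False): move-file alone explains both actions
```

Assignments that leave an observed action unexplained get likelihood zero, and two-plan assignments pay an extra prior penalty, so the shared cause wins. This is the "explaining away" behavior the noisy-or model captures.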