Extending Bayesian Logic Programs for Plan Recognition and Machine Reading Sindhu V. Raghavan Advisor: Raymond Mooney PhD Proposal May 12th, 2011
Plan Recognition in Intelligent User Interfaces $ cd my-dir $ cp test1.txt test-dir $ rm test1.txt What task is the user performing? move-file Which files and directories are involved? test1.txt and test-dir Can the task be performed more efficiently? Data is relational in nature - several files and directories, and several relations between them
Characteristics of Real World Data • Relational or structured data • Several entities in the domain • Several relations between entities • Does not always follow the i.i.d. assumption • Presence of noise or uncertainty • Uncertainty in the types of entities • Uncertainty in the relations Traditional approaches like first-order logic or probabilistic models can handle either structured data or uncertainty, but not both.
Statistical Relational Learning (SRL) • Integrates first-order logic and probabilistic graphical models [Getoor and Taskar, 2007] • Combines strengths of both first-order logic and probabilistic models • SRL formalisms • Stochastic Logic Programs (SLPs) [Muggleton, 1996] • Probabilistic Relational Models (PRMs) [Friedman et al., 1999] • Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001] • Markov Logic Networks (MLNs) [Richardson and Domingos, 2006]
Bayesian Logic Programs (BLPs) • Integrate first-order logic and Bayesian networks • Why BLPs? • Efficient grounding mechanism that includes only those variables that are relevant to the query • Easy to extend by incorporating any type of logical inference to construct networks • Well suited for capturing causal relations in data
Objectives • BLPs for Plan Recognition – plan recognition involves predicting the top-level plan of an agent based on its actions • BLPs for Machine Reading – machine reading involves automatic extraction of knowledge from natural language text
Outline • Motivation • Background • First-order logic • Logical Abduction • Bayesian Logic Programs (BLPs) • Completed Work • Part 1 – Extending BLPs for Plan Recognition • Part 2 – Extending BLPs for Machine Reading • Proposed Work • Conclusions
First-order Logic • Terms • Constants – individual entities like anna, bob • Variables – placeholders for objects like X, Y • Predicates • Relations over entities like worksFor, capitalOf • Literal – predicate or its negation applied to terms • Atom – Positive literal like worksFor(X,Y) • Ground literal – literal with no variables like worksFor(anna,bob) • Clause – disjunction of literals • Horn clause has at most one positive literal • Definite clause has exactly one positive literal
First-order Logic • Quantifiers • Universal quantification - true for all objects in the domain • Existential quantification - true for some objects in the domain • Logical Inference • Forward Chaining – For every implication p → q, if p is true, then q is concluded to be true • Backward Chaining – For a query literal q, if an implication p → q is present and p is true, then q is concluded to be true; otherwise backward chaining tries to prove p
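The forward-chaining procedure above can be sketched in a few lines of Python. This is a propositional sketch with made-up atoms (`p`, `q`, `r`, `s`), not code from the talk; a full first-order chainer would also need unification.

```python
# Minimal forward chaining over propositional Horn rules.
# Each rule is (body, head): if all body atoms are known, conclude head.
def forward_chain(facts, rules):
    """Repeatedly apply rules p1,...,pn -> q until no new facts appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and all(p in known for p in body):
                known.add(head)
                changed = True
    return known

rules = [(("p",), "q"), (("q", "r"), "s")]
derived = forward_chain({"p", "r"}, rules)
# derived now also contains q (from p) and s (from q and r)
```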
Logical Abduction • Abduction • Process of finding the best explanation for a set of observations • Given • Background knowledge, B, in the form of a set of (Horn) clauses in first-order logic • Observations, O, in the form of atomic facts in first-order logic • Find • A hypothesis, H, a set of assumptions (atomic facts) that logically entail the observations given the theory: B ∪ H ⊨ O • Best explanation is the one with the fewest assumptions
Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001] • Set of Bayesian clauses a | a1, a2, …, an • Definite clauses that are universally quantified • Head of the clause - a • Body of the clause - a1, a2, …, an • Range-restricted, i.e., variables(head) ⊆ variables(body) • Associated conditional probability table (CPT) • P(head|body) • Bayesian predicates a, a1, a2, …, an have finite domains • Combining rule like noisy-or for mapping multiple CPTs into a single CPT
Logical Inference in BLPs SLD Resolution Proof alarm(james) BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). Query alarm(james) Example from Ngo and Haddawy, 1997
Logical Inference in BLPs SLD Resolution Proof alarm(james) BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). burglary(james) Query alarm(james) Example from Ngo and Haddawy, 1997
Logical Inference in BLPs SLD Resolution Proof alarm(james) BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). burglary(james) Query alarm(james) neighborhood(james) Example from Ngo and Haddawy, 1997
Logical Inference in BLPs SLD Resolution Proof alarm(james) BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). burglary(james) lives(james,Y) tornado(Y) Query alarm(james) neighborhood(james) Example from Ngo and Haddawy, 1997
Logical Inference in BLPs SLD Resolution Proof alarm(james) BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). burglary(james) lives(james,Y) tornado(Y) Query alarm(james) lives(james,yorkshire) neighborhood(james) Example from Ngo and Haddawy, 1997
Logical Inference in BLPs SLD Resolution Proof alarm(james) BLP lives(james,yorkshire). lives(stefan,freiburg). neighborhood(james). tornado(yorkshire). burglary(X) | neighborhood(X). alarm(X) | burglary(X). alarm(X) | lives(X,Y), tornado(Y). burglary(james) lives(james,Y) tornado(Y) Query alarm(james) lives(james,yorkshire) neighborhood(james) tornado(yorkshire) Example from Ngo and Haddawy, 1997
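The SLD resolution proof of `alarm(james)` above can be sketched as backward chaining in Python. To keep the sketch short the clauses are grounded by hand to the constants in the example; a real BLP engine would perform unification on the variables X and Y.

```python
# Backward chaining on the alarm example [Ngo and Haddawy, 1997],
# with clauses grounded by hand (a real engine would unify variables).
facts = {"lives(james,yorkshire)", "lives(stefan,freiburg)",
         "neighborhood(james)", "tornado(yorkshire)"}
rules = {  # head -> list of alternative bodies
    "burglary(james)": [["neighborhood(james)"]],
    "alarm(james)": [["burglary(james)"],
                     ["lives(james,yorkshire)", "tornado(yorkshire)"]],
}

def prove(goal):
    """True if goal is a fact or some clause body for it is provable."""
    if goal in facts:
        return True
    return any(all(prove(g) for g in body) for body in rules.get(goal, []))

result = prove("alarm(james)")
# result is True: both clauses for alarm(james) succeed on these facts
```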
Bayesian Network Construction neighborhood(james) burglary(james) lives(james,yorkshire) tornado(yorkshire) alarm(james) Each ground atom becomes a node (random variable) in the Bayesian network Edges are added from ground atoms in the body of a clause to the ground atom in the head Specify probabilistic parameters using the CPTs associated with Bayesian clauses
Bayesian Network Construction ✖ neighborhood(james) lives(stefan,freiburg) burglary(james) lives(james,yorkshire) tornado(yorkshire) alarm(james) Each ground atom becomes a node (random variable) in the Bayesian network Edges are added from ground atoms in the body of a clause to the ground atom in the head Specify probabilistic parameters using the CPTs associated with Bayesian clauses Use combining rule to combine multiple CPTs into a single CPT
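The construction rule above (ground atoms become nodes, edges run from body atoms to the head) can be sketched directly. The `ground_clauses` list below is the set of ground clauses from the example proof, written out by hand; note that the irrelevant fact `lives(stefan,freiburg)` never enters the network.

```python
# Sketch: turn the ground clauses from a proof into a directed graph.
# Nodes are ground atoms; edges run from each body atom to the head.
ground_clauses = [
    ("burglary(james)", ["neighborhood(james)"]),
    ("alarm(james)", ["burglary(james)"]),
    ("alarm(james)", ["lives(james,yorkshire)", "tornado(yorkshire)"]),
]

nodes, edges = set(), set()
for head, body in ground_clauses:
    nodes.add(head)
    for atom in body:
        nodes.add(atom)
        edges.add((atom, head))  # parent -> child

parents = {n: sorted(a for a, h in edges if h == n) for n in nodes}
# alarm(james) ends up with three parents, whose clause CPTs a combining
# rule (e.g. noisy-or) merges into one CPT
```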
Probabilistic Inference and Learning • Probabilistic inference • Marginal probability given evidence • Most Probable Explanation (MPE) given evidence • Learning [Kersting and De Raedt, 2008] • Parameters • Expectation Maximization • Gradient-ascent based learning • Structure • Hill climbing search through the space of possible structures
Part 1: Extending BLPs for Plan Recognition [Raghavan and Mooney, 2010]
Plan Recognition • Predict an agent’s top-level plans based on the observed actions • Abductive reasoning involving inference of cause from effect • Applications • Story Understanding • Recognize a character’s motives or plans based on its actions to answer questions about the story • Strategic planning • Predict other agents’ plans so as to work cooperatively • Intelligent User Interfaces • Predict the task that the user is performing so as to provide valuable tips to perform the task more efficiently
Related Work • First-order logic based approaches [Kautz and Allen, 1986; Ng and Mooney, 1992] • Knowledge base of plans and actions • Default reasoning or logical abduction to predict the best plan based on the observed actions • Unable to handle uncertainty in data or estimate likelihood of alternative plans • Probabilistic graphical models [Charniak and Goldman, 1989; Huber et al., 1994; Pynadath and Wellman, 2000; Bui, 2003; Blaylock and Allen, 2005] • Encode the domain knowledge using Bayesian networks, abstract hidden Markov models, or statistical n-gram models • Unable to handle relational/structured data
Related Work • Markov Logic based approaches [Kate and Mooney, 2009; Singla and Mooney, 2011] • Logical inference in MLNs is deductive in nature • MLN-PC [Kate and Mooney, 2009] • Add reverse implications to handle abductive reasoning • Does not scale to large domains • MLN-HC [Singla and Mooney, 2011] • Improves over MLN-PC by adding hidden causes • Still does not scale to large domains • MLN-HCAM [Singla and Mooney, 2011] uses logical abduction to constrain grounding • Incorporates ideas from the BLP approach for plan recognition [Raghavan and Mooney, 2010] • MLN-HCAM performs better than MLN-PC and MLN-HC
BLPs for Plan Recognition • Why BLPs ? • Directed model captures causal relationships well • Efficient grounding process results in smaller networks, unlike in MLNs • SLD resolution is deductive inference, used for predicting observed actions from top-level plans • Plan recognition is abductive in nature and involves predicting the top-level plan from observed actions BLPs cannot be used as is for plan recognition
Extending BLPs for Plan Recognition BLPs + Logical Abduction = BALPs BALPs – Bayesian Abductive Logic Programs
Logical Abduction in BALPs • Given • A set of observation literals O = {O1, O2, …, On} and a knowledge base KB • Compute a set of abductive proofs of O using Stickel’s abduction algorithm [Stickel, 1988] • Backchain on each Oi until it is proved or assumed • A literal is said to be proved if it unifies with a fact or the head of some rule in KB; otherwise it is said to be assumed • Construct a Bayesian network using the resulting set of proofs, as in BLPs
Example – Intelligent User Interfaces • Top-level plan predicates • copy-file, move-file, remove-file • Action predicates • cp, rm • Knowledge Base (KB) • cp(filename,destdir) | copy-file(filename,destdir) • cp(filename,destdir) | move-file(filename,destdir) • rm(filename) | move-file(filename,destdir) • rm(filename) | remove-file(filename) • Observed actions • cp(Test1.txt, Mydir) • rm(Test1.txt)
Abductive Inference Assumed literal copy-file(Test1.txt,Mydir) cp(Test1.txt,Mydir) cp(filename,destdir) | copy-file(filename,destdir)
Abductive Inference Assumed literal copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) cp(Test1.txt,Mydir) cp(filename,destdir) | move-file(filename,destdir)
Abductive Inference Match existing assumption copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) rm(Test1.txt) cp(Test1.txt,Mydir) • rm(filename) | move-file(filename,destdir)
Abductive Inference Assumed literal copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) remove-file(Test1.txt) rm(Test1.txt) cp(Test1.txt,Mydir) • rm(filename) | remove-file(filename)
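The backchain-and-assume steps above can be sketched as a toy Python program. The rule variables are bound by hand to the observed constants, so `explanations` lists the ground plan atoms that could explain each observed action; a set naturally captures "match existing assumption", since assuming `move-file(...)` twice adds only one hypothesis.

```python
# Toy sketch of Stickel-style abduction on the UI example.
# Each observed action maps to the ground plan atoms that explain it
# (rule variables pre-bound to the observed constants for brevity).
explanations = {
    "cp(Test1.txt,Mydir)": ["copy-file(Test1.txt,Mydir)",
                            "move-file(Test1.txt,Mydir)"],
    "rm(Test1.txt)": ["move-file(Test1.txt,Mydir)",
                      "remove-file(Test1.txt)"],
}

def abduce(observations):
    """Backchain on each observation; plan atoms that are not known
    facts are assumed, reusing assumptions made for earlier goals."""
    assumed = set()
    for obs in observations:
        for plan in explanations[obs]:
            assumed.add(plan)  # either matches an assumption or adds one
    return assumed

hypotheses = abduce(["cp(Test1.txt,Mydir)", "rm(Test1.txt)"])
# move-file(...) is shared: it explains both observations at once
```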
Structure of Bayesian network copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) remove-file(Test1.txt) rm(Test1.txt) cp(Test1.txt,Mydir)
Probabilistic Inference • Specifying probabilistic parameters • Noisy-and • Specify the CPT for combining the evidence from conjuncts in the body of the clause • Noisy-or • Specify the CPT for combining the evidence from disjunctive contributions from different ground clauses with the same head • Models “explaining away” • Noisy-and and noisy-or models reduce the number of parameters learned from data
Probabilistic Inference copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) remove-file(Test1.txt) rm(Test1.txt) cp(Test1.txt,Mydir) 4 parameters 4 parameters
Probabilistic Inference copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) remove-file(Test1.txt) θ4 θ2 θ3 θ1 rm(Test1.txt) cp(Test1.txt,Mydir) 2 parameters 2 parameters Noisy models require parameters linear in the number of parents
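The noisy-or parameterization above needs one θ per parent rather than a full 2^n CPT. A minimal sketch, with illustrative θ values (not parameters from the talk):

```python
# Noisy-or: the child fails to fire only if every active parent's
# independent cause fails. theta_i = P(child fires | only parent i active).
def noisy_or(thetas, active):
    """P(child = true) given which parents are active."""
    p_none = 1.0
    for theta, on in zip(thetas, active):
        if on:
            p_none *= (1.0 - theta)
    return 1.0 - p_none

# Two parents, e.g. copy-file and move-file both explaining cp(...)
p_both = noisy_or([0.9, 0.8], [True, True])   # 1 - 0.1*0.2 = 0.98
p_one  = noisy_or([0.9, 0.8], [True, False])  # 0.9
```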
Probabilistic Inference • Most Probable Explanation (MPE) • For multiple plans, compute the MPE, the most likely combination of truth values to all unknown literals given the evidence • Marginal Probability • For single top-level plan prediction, compute the marginal probability for all instances of the plan predicate and pick the instance with maximum probability • When exact inference is intractable, SampleSearch [Gogate and Dechter, 2007], an approximate inference algorithm for graphical models with deterministic constraints, is used
Probabilistic Inference copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) remove-file(Test1.txt) Noisy-or Noisy-or rm(Test1.txt) cp(Test1.txt,Mydir)
Probabilistic Inference copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) remove-file(Test1.txt) Noisy-or Noisy-or rm(Test1.txt) cp(Test1.txt,Mydir) Evidence
Probabilistic Inference Query variables copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) remove-file(Test1.txt) Noisy-or Noisy-or rm(Test1.txt) cp(Test1.txt,Mydir) Evidence
Probabilistic Inference MPE Query variables FALSE FALSE TRUE copy-file(Test1.txt,Mydir) move-file(Test1.txt,Mydir) remove-file(Test1.txt) Noisy-or Noisy-or rm(Test1.txt) cp(Test1.txt,Mydir) Evidence
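On a network this small, the MPE above can be computed by brute-force enumeration. The prior and noisy-or parameters below are illustrative numbers, not learned values from the talk; with them, the single plan move-file explains both observed actions and wins over any two-plan assignment.

```python
# Brute-force MPE over the three plan variables of the UI example.
# Evidence: cp(Test1.txt,Mydir) and rm(Test1.txt) are observed true.
from itertools import product

def noisy_or(thetas, active):
    p_none = 1.0
    for theta, on in zip(thetas, active):
        if on:
            p_none *= (1.0 - theta)
    return 1.0 - p_none

PRIOR, THETA = 0.3, 0.9  # illustrative parameters, not from the talk

best, best_score = None, -1.0
for copy, move, remove in product([False, True], repeat=3):
    prior = 1.0
    for plan in (copy, move, remove):
        prior *= PRIOR if plan else (1.0 - PRIOR)
    # cp is a noisy-or child of {copy-file, move-file};
    # rm is a noisy-or child of {move-file, remove-file}
    lik = (noisy_or([THETA, THETA], [copy, move]) *
           noisy_or([THETA, THETA], [move, remove]))
    score = prior * lik
    if score > best_score:
        best, best_score = (copy, move, remove), score

# best == (False, True, False): move-file alone explains both actions
```

Assignments that leave an observed action unexplained get likelihood zero, and two-plan assignments pay an extra prior penalty, so the shared cause wins. This is the "explaining away" behavior the noisy-or model captures.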