
Bayesian Logic Programs for Plan Recognition and Machine Reading



  1. Bayesian Logic Programs for Plan Recognition and Machine Reading Sindhu Raghavan Advisor: Raymond Mooney PhD Oral Defense Nov 29th, 2012

  2. Outline • Motivation • Background • Bayesian Logic Programs (BLPs) • Plan Recognition • Machine Reading • BLPs for inferring implicit facts • Online Rule Learning • Scoring Rules using WordNet • Future Work • Conclusions

  4. Machine Reading Machine reading involves the automatic extraction of knowledge from natural language text Example: “Barack Obama is the current President of the USA... Obama was born on August 4, 1961, in Hawaii, USA...” Extracted facts: nationState(usa) person(barackobama) isLedBy(usa,barackobama) hasBirthPlace(barackobama,usa) employs(usa,barackobama) Data is relational in nature: several entities and several relations between them

  5. Characteristics of Real World Data • Relational or structured data • Several entities in the domain • Several relations between entities • Not always independent and identically distributed (i.i.d.) • Presence of noise or uncertainty • Uncertainty in the types of entities • Uncertainty in the relations Traditional approaches like first-order logic or probabilistic models can handle either structured data or uncertainty, but not both.

  6. Statistical Relational Learning (SRL) • Integrates first-order logic and probabilistic graphical models [Getoor and Taskar, 2007] • Overcomes limitations of traditional approaches • SRL formalisms • Stochastic Logic Programs (SLPs) [Muggleton, 1996] • Probabilistic Relational Models (PRMs) [Friedman et al., 1999] • Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001] • Markov Logic Networks (MLNs) [Richardson and Domingos, 2006]

  8. Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001] • Integrate first-order logic and Bayesian networks • Why BLPs? • Efficient grounding mechanism that includes only those variables that are relevant to the query • Easy to extend by incorporating any type of logical inference to construct networks • Well suited for capturing causal relations in data

  9. Objectives • Plan recognition involves predicting the top-level plan of an agent based on its observed actions • Machine reading involves the automatic extraction of knowledge from natural language text

  12. Common characteristics • Inference and learning from partially observed or incomplete data • Plan recognition • Top-level plan is not observed • Some of the executed actions can be unobserved • Machine Reading • Information that is implicit is rarely observed in data • Common sense knowledge is not always explicitly stated

  13. Thesis Contributions • Plan Recognition • Bayesian Abductive Logic Programs (BALPs) [ECML 2011] • Machine Reading • BLPs for learning to infer implicit facts from natural language text [ACL 2012] • Online rule learner for learning common sense knowledge from natural language extractions [In Submission] • Approach to scoring first-order rules (common sense knowledge) using WordNet [In Submission]

  15. Outline • Motivation • Background • Bayesian Logic Programs (BLPs) • Plan Recognition • Machine Reading • BLPs for inferring implicit facts • Online Rule Learning • Scoring Rules using WordNet • Future Work • Conclusions

  16. Bayesian Logic Programs (BLPs) [Kersting and De Raedt, 2001] • Set of Bayesian clauses a | a1, a2, ..., an • Definite clauses that are universally quantified • Range-restricted, i.e., variables(head) ⊆ variables(body) • Associated conditional probability table (CPT) • P(head | body) • Bayesian predicates a, a1, a2, ..., an have finite domains • Combining rule like noisy-or for mapping multiple CPTs into a single CPT • Given a set of Bayesian clauses and a query, SLD resolution is used to construct ground Bayesian networks for probabilistic inference
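To make the clause-plus-CPT structure concrete, the following is a minimal Python sketch; the class and method names are hypothetical (actual BLP systems are logic-programming engines), so treat it as illustration only:

```python
# Minimal sketch of a Bayesian clause "a | a1, ..., an" paired with a CPT.
# All names are hypothetical; real BLP systems are logic-programming based.
from dataclasses import dataclass, field

@dataclass
class BayesianClause:
    head: str                 # e.g. "hasCitizenship(A,B)"
    body: list                # e.g. ["nationState(B)", "isLedBy(B,A)"]
    cpt: dict = field(default_factory=dict)  # (body values, head value) -> prob

    def prob(self, body_values, head_value):
        """Return P(head = head_value | body = body_values)."""
        return self.cpt[(tuple(body_values), head_value)]

# One of the clauses used later in the talk, with an illustrative CPT entry.
clause = BayesianClause(
    head="hasCitizenship(A,B)",
    body=["nationState(B)", "isLedBy(B,A)"],
    cpt={((True, True), True): 0.9, ((True, True), False): 0.1},
)
print(clause.prob((True, True), True))   # -> 0.9
```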

  17. Probabilistic Inference and Learning • Probabilistic inference • Marginal probability • Exact inference • SampleSearch [Gogate and Dechter, 2007] • Learning [Kersting and De Raedt, 2008] • Parameters • Expectation Maximization • Gradient-ascent based learning

  18. Outline • Motivation • Background • Bayesian Logic Programs (BLPs) • Plan Recognition • Machine Reading • BLPs for inferring implicit facts • Online Rule Learning • Scoring Rules using WordNet • Future Work • Conclusions

  19. Plan Recognition • Predict an agent’s top-level plan based on its observed actions • Abductive reasoning involving inference of cause from effect • Since the SLD resolution used in BLPs is deductive in nature, BLPs cannot be used as is for plan recognition

  20. Extending BLPs for Plan Recognition BLPs + Logical Abduction → BALPs (Bayesian Abductive Logic Programs)

  21. Extending BLPs for Plan Recognition BLPs + Stickel’s Abduction Algorithm → BALPs (Bayesian Abductive Logic Programs)
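The deductive-vs-abductive distinction can be illustrated with a toy sketch (hypothetical code, not the BALP system, and much simpler than Stickel’s full algorithm): backward-chain from an observed action and assume any literal that cannot be proven, so each assumption set is a candidate explanation, i.e., a top-level plan:

```python
# Toy illustration of logical abduction (not the actual BALP implementation).
# A rule (head, body) reads "head holds if every literal in body holds";
# abduction backward-chains from an observation and *assumes* any literal
# that has no proof, yielding candidate explanations.

rules = [
    ("goto(store)", ["shopping"]),   # a shopping plan explains going to a store
    ("goto(store)", ["robbing"]),    # a robbing plan also explains it
    ("get(weapon)", ["robbing"]),
]
facts = set()  # ground facts known to be true besides the observations

def abduce(goal, assumptions=frozenset()):
    """Return a list of assumption sets, each sufficient to explain goal."""
    if goal in facts:
        return [assumptions]
    explanations = []
    for head, body in rules:
        if head == goal:
            partial = [assumptions]
            for literal in body:
                partial = [e for a in partial for e in abduce(literal, a)]
            explanations.extend(partial)
    if not explanations:                 # unprovable: assume the goal itself
        explanations = [assumptions | {goal}]
    return explanations

print(abduce("goto(store)"))
# -> [frozenset({'shopping'}), frozenset({'robbing'})]
```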

  22. Experimental Evaluation • Data • Monroe [Blaylock and Allen, 2005] • Linux [Blaylock and Allen, 2005] • Story Understanding [Ng and Mooney, 1992] • Systems compared • BALPs • MLN-HCAM [Singla and Mooney, 2011] • Blaylock and Allen’s system [Blaylock and Allen, 2005] • ACCEL-Simplicity [Ng and Mooney, 1992] • ACCEL-Coherence [Ng and Mooney, 1992]

  23. Summary of Results • Monroe and Linux • BALPs outperform both MLN-HCAM and the system by Blaylock and Allen • Story Understanding • BALPs outperform both MLN-HCAM and ACCEL-Simplicity • ACCEL-Coherence outperforms BALPs and other systems • Specifically developed for text interpretation • Automatic learning of model parameters using EM

  24. Outline • Motivation • Background • Bayesian Logic Programs (BLPs) • Plan Recognition • Machine Reading • BLPs for inferring implicit facts • Online Rule Learning • Scoring Rules using WordNet • Future Work • Conclusions

  25. Machine Reading • Natural language text is typically “incomplete” • Some information is always implicit • Common sense information is not always explicitly stated • Grice’s maxim of quantity [1975] • Information extraction (IE) systems extract information that is explicitly stated [Cowie and Lehnert, 1996; Sarawagi, 2008] • Cannot extract information that is implicit

  26. Example Natural language text “Barack Obama is the President of the United States of America.” Query “Barack Obama is a citizen of what country?” IE systems cannot answer this query since citizenship information is not explicitly stated.

  27. Objective • Infer implicit facts from explicitly stated information • Extract explicitly stated facts using an off-the-shelf IE system • Learn common sense knowledge in the form of first-order rules to deduce additional facts • Use BLPs for inference of additional facts

  28. Related Work • Logical deduction based approaches • Learning propositional rules [Nahm and Mooney, 2000] • Purely logical deduction is brittle since it cannot assign probabilities to inferences • Learning probabilistic first-order rules using FOIL and FARMER [Carlson et al., 2010; Doppa et al., 2010] • Probabilities are not computed using well-founded probabilistic graphical models • MLN based approaches for inferring additional facts [Schoenmackers et al., 2010; Sorower et al., 2011] • “Brute force” inference could result in intractably large networks for large domains • Scaling of MLNs to large domains [Schoenmackers et al., 2010; Niu et al., 2012]

  29. Objectives • BLPs for learning to infer implicit facts from natural language text • Online rule learner for learning common sense knowledge from natural language extractions • Approach to scoring first-order common sense knowledge using WordNet

  30. Outline • Motivation • Background • Bayesian Logic Programs (BLPs) • Plan Recognition • Machine Reading • BLPs for inferring implicit facts • Online Rule Learning • Scoring Rules using WordNet • Future Work • Conclusions

  31. System Architecture (with running example) • Pipeline: Training Documents → Information Extractor (IBM SIRE) → Extracted Facts → Rule Learner → First-Order Logical Rules → BLP Weight Learner → Bayesian Logic Program (BLP); Test Document Extractions → BLP Inference Engine → Inferences with probabilities • Training text: “Barack Obama is the current President of USA... Obama was born on August 4, 1961, in Hawaii, USA.” • Extracted facts: nationState(USA), person(BarackObama), isLedBy(USA,BarackObama), hasBirthPlace(BarackObama,USA), hasCitizenship(BarackObama,USA) • Learned rules: nationState(B) ∧ isLedBy(B,A) → hasCitizenship(A,B); nationState(B) ∧ employs(B,A) → hasCitizenship(A,B) • BLP clauses with noisy-or parameters: hasCitizenship(A,B) | nationState(B), isLedBy(B,A) [0.9]; hasCitizenship(A,B) | nationState(B), employs(B,A) [0.6] • Test extractions: nationState(malaysia), person(mahathir-mohamad), isLedBy(malaysia,mahathir-mohamad), employs(malaysia,mahathir-mohamad) • Inference with probability: hasCitizenship(mahathir-mohamad,malaysia) 0.75
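A hypothetical Python skeleton of this pipeline, purely to show the data flow; every name below is an illustrative placeholder for the components named on the slide (IBM SIRE, LIME or the online rule learner, the BLP engine):

```python
# Hypothetical skeleton of the pipeline above; all names are placeholders.
from typing import List, Tuple

Fact = str                      # e.g. "nationState(malaysia)"
Rule = Tuple[List[str], str]    # (body literals, head), read "body -> head"

def extract_facts(document: str) -> List[Fact]:
    """Information extraction step (IBM SIRE in the talk)."""
    raise NotImplementedError

def learn_rules(training_facts: List[List[Fact]]) -> List[Rule]:
    """First-order rule learning step (LIME / online rule learner)."""
    raise NotImplementedError

def learn_weights(rules: List[Rule],
                  training_facts: List[List[Fact]]) -> List[float]:
    """Estimate one noisy-or parameter per rule, e.g. with EM."""
    raise NotImplementedError

def infer(rules: List[Rule], weights: List[float],
          test_facts: List[Fact]) -> dict:
    """Ground the rules against test_facts and compute marginals."""
    raise NotImplementedError
```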

  32. System Architecture • Training: Training Documents → Information Extractor (IBM SIRE) → Extracted Facts → Inductive Logic Programming (LIME) → First-Order Logical Rules → BLP Weight Learner → Bayesian Logic Program (BLP) • Testing: Test Document Extractions → BLP Inference Engine → Inferences with probabilities

  33. Inductive Logic Programming (ILP) for learning first-order rules • Target relation: hasCitizenship(X,Y) • Positive instances: hasCitizenship(BarackObama,USA), hasCitizenship(GeorgeBush,USA), hasCitizenship(IndiraGandhi,India), ... • Negative instances (generated using the closed-world assumption): hasCitizenship(BarackObama,India), hasCitizenship(GeorgeBush,India), hasCitizenship(IndiraGandhi,USA), ... • KB: hasBirthPlace(BarackObama,USA), person(BarackObama), nationState(USA), nationState(India), ... • ILP Rule Learner output: nationState(Y) ∧ person(X) ∧ isLedBy(Y,X) → hasCitizenship(X,Y), ...
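A toy sketch of the closed-world negative generation mentioned above (illustrative code, not LIME itself): any pairing of seen arguments that is not a stated positive is assumed false:

```python
# Toy closed-world negative generation (illustrative, not LIME itself).
positives = {("BarackObama", "USA"), ("GeorgeBush", "USA"),
             ("IndiraGandhi", "India")}
people    = {p for p, _ in positives}
countries = {c for _, c in positives}

# Under the closed-world assumption, any unstated (person, country) pair
# is assumed to be a negative instance of hasCitizenship.
negatives = {(p, c) for p in people for c in countries} - positives
print(sorted(negatives))
# -> [('BarackObama', 'India'), ('GeorgeBush', 'India'), ('IndiraGandhi', 'USA')]
```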

  34. Inference using BLPs Test document “Barack Obama is the current President of the USA... Obama was born on August 4, 1961, in Hawaii, USA...” Extracted facts nationState(usa) person(barackobama) isLedBy(usa,barackobama) hasBirthPlace(barackobama,usa) employs(usa,barackobama) Learned rules nationState(B) ∧ person(A) ∧ isLedBy(B,A) → hasCitizenship(A,B) nationState(B) ∧ person(A) ∧ employs(B,A) → hasCitizenship(A,B)

  35. Logical Inference - Proof 1 • Rule: nationState(B) ∧ person(A) ∧ isLedBy(B,A) → hasCitizenship(A,B) • Matching facts: nationState(usa), person(barackobama), isLedBy(usa,barackobama) • Conclusion: hasCitizenship(barackobama,usa)

  36. Logical Inference - Proof 2 • Rule: nationState(B) ∧ person(A) ∧ employs(B,A) → hasCitizenship(A,B) • Matching facts: nationState(usa), person(barackobama), employs(usa,barackobama) • Conclusion: hasCitizenship(barackobama,usa)
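The two proofs above can be reproduced with a small deduction sketch (hypothetical code; the real system grounds rules via SLD resolution in a logic-programming engine):

```python
# Toy deduction over the extracted facts (illustrative only).
facts = {"nationState(usa)", "person(barackobama)",
         "isLedBy(usa,barackobama)", "employs(usa,barackobama)"}

# (body, head) pairs with variables A and B, matching the learned rules above.
rules = [
    (["nationState({B})", "person({A})", "isLedBy({B},{A})"],
     "hasCitizenship({A},{B})"),
    (["nationState({B})", "person({A})", "employs({B},{A})"],
     "hasCitizenship({A},{B})"),
]

entities = ["usa", "barackobama"]
for body, head in rules:
    for A in entities:
        for B in entities:
            ground = [lit.format(A=A, B=B) for lit in body]
            if all(g in facts for g in ground):  # every antecedent extracted
                print(head.format(A=A, B=B), "via", ground)
# Both rules fire with A=barackobama, B=usa, giving the two proofs above.
```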

  37. Bayesian Network Construction • Nodes: employs(usa,barackobama), person(barackobama), isLedBy(usa,barackobama), nationState(usa), hasCitizenship(barackobama,usa)

  40. Bayesian Network Construction • Evidence nodes: employs(usa,barackobama), person(barackobama), isLedBy(usa,barackobama), nationState(usa) • Each ground rule contributes a logical-AND node (dummy1, dummy2) over its body literals • A noisy-or node combines dummy1 and dummy2 to give hasCitizenship(barackobama,usa) • Query: marginal probability of hasCitizenship(barackobama,usa)
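A worked noisy-or calculation, using the illustrative weights 0.9 and 0.6 from the architecture example; note that the 0.75 marginal reported there also reflects learned CPTs and priors over the evidence nodes, so this deterministic-evidence case is a simplification:

```python
# Noisy-or combination of the two ground rules: the head fails only if
# every active cause independently fails.
weights_of_satisfied_rules = [0.9, 0.6]  # illustrative noisy-or parameters

p_all_fail = 1.0
for w in weights_of_satisfied_rules:
    p_all_fail *= (1.0 - w)       # this cause fails with probability 1 - w
marginal = 1.0 - p_all_fail
print(marginal)                    # -> 0.96 when both rule bodies are certain
```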

  41. Experimental Evaluation • Data • DARPA’s intelligence community (IC) data set from the Machine Reading Project (MRP) • Consists of news articles on politics, terrorism, and other international events • 10,000 documents in total • Perform 10-fold cross validation

  42. Experimental Evaluation • Learning first-order rules using LIME [McCreath and Sharma, 1998] • Learn rules for 13 target relations • Learn rules using both positive and negative instances and using only positive instances • Include all unique rules learned from different models • Learning BLP parameters • Learn noisy-or parameters using Expectation Maximization (EM) • Set priors to maximum likelihood estimates

  43. Experimental Evaluation • Performance evaluation • Lack of ground truth for evaluation • Manually evaluated inferred facts from 40 documents, randomly selected from each test set • Compute precision • Fraction of inferences that are correct • Compute two precision scores • Unadjusted (UA) – does not account for the extractor’s mistakes • Adjusted (AD) – accounts for the extractor’s mistakes • Rank inferences using marginal probabilities and evaluate top-n
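One plausible reading of the two scores as code (hypothetical field names): the adjusted score discounts inferences that are wrong only because the underlying extraction was wrong:

```python
# Illustrative computation of the two precision scores; field names are
# hypothetical. "extractor_error" marks inferences that are incorrect only
# because the IE system extracted a wrong fact.
inferences = [
    {"correct": True,  "extractor_error": False},
    {"correct": False, "extractor_error": True},    # wrong due to extraction
    {"correct": False, "extractor_error": False},   # wrong due to a bad rule
    {"correct": True,  "extractor_error": False},
]

# Unadjusted: fraction of all inferences that are correct.
ua = sum(i["correct"] for i in inferences) / len(inferences)

# Adjusted: drop inferences whose only fault is the extractor's.
pool = [i for i in inferences if not i["extractor_error"]]
ad = sum(i["correct"] for i in pool) / len(pool)
print(ua, ad)   # -> 0.5 0.666...
```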

  44. Experimental Evaluation • Systems compared • BLP Learned Weights • Noisy-or parameters learned using EM • BLP Manual Weights • Noisy-or parameters set to 0.9 • Logical Deduction • MLN Learned Weights • Learn weights using generative online weight learner • MLN Manual Weights • Assign a weight of 10 to all rules and MLE priors to all predicates

  45. Unadjusted Precision

  46. Inferior performance of EM • Insufficient training data • Lack of ground truth information for relations that can be inferred • Implicit relations seen less frequently in training data • EM learns lower weights for rules corresponding to implicit relations

  47. Performance of MLNs • Inferior performance of MLNs • Insufficient training data for learning • Use of closed world assumption for inference and learning • Lack of strictly typed ontology • GeopoliticalEntity could be an Agent as well as Location • Improvements to MLNs • Integrity constraints to avoid inference of spurious facts like employs(a,a) • Incorporate techniques proposed by Sorower et al. [2011]

  48. Outline • Motivation • Background • Bayesian Logic Programs (BLPs) • Plan Recognition • Machine Reading • BLPs for inferring implicit facts • Online Rule Learning • Scoring Rules using WordNet • Future Work • Conclusions

  49. Limitations of LIME • Assumes data is accurate • Artificially generated negative instances are usually noisy and inaccurate • Extraction errors result in noisy data • Does not scale to large corpora Goal: develop an approach that can learn first-order rules from noisy and incomplete IE extractions

  50. Online Rule Learning • Accounts for the incomplete nature of natural language text • Body consists of relations that are explicitly stated • Head is a relation that can be inferred • Relations that are implicit occur less frequently than those that are explicitly stated • Use frequency of occurrence as a heuristic to distinguish the two types of relations • Process examples in an online manner to scale to large corpora
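A simplified sketch of the frequency heuristic (hypothetical code; the actual learner also generalizes constants to variables, while this sketch stays at the ground level):

```python
# Simplified online rule learning sketch (not the thesis implementation).
# Rarely seen relations are treated as inferable heads, frequent ones as
# explicit body relations, updated one document at a time.
from collections import Counter

relation_counts = Counter()    # corpus-wide frequency of each relation name
rule_counts = Counter()        # candidate (body, head) rules with counts

def relation_name(fact):       # "isLedBy(usa,barackobama)" -> "isLedBy"
    return fact.split("(")[0]

def process_document(facts, threshold=0.3):
    """One online update: count relations, then propose body -> head rules."""
    for f in facts:
        relation_counts[relation_name(f)] += 1
    total = sum(relation_counts.values())
    for head in facts:
        # Heuristic: relatively rare relations are likely implicit.
        if relation_counts[relation_name(head)] / total < threshold:
            body = tuple(sorted(f for f in facts if f != head))
            rule_counts[(body, head)] += 1

docs = [
    ["nationState(usa)", "isLedBy(usa,barackobama)"],
    ["nationState(india)", "isLedBy(india,indiragandhi)",
     "hasCitizenship(indiragandhi,india)"],
]
for d in docs:
    process_document(d)
print(rule_counts)   # proposes: nationState ∧ isLedBy -> hasCitizenship (ground)
```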
