## Implicit learning of common sense for reasoning


Brendan Juba, Harvard University

**A convenient example**

- "Thomson visited Cooper's grave in 1765. At that date, he had been traveling [resp.: dead] for five years."
- "Who had been traveling [resp.: dead]?"
- (The Winograd Schema Challenge, [Levesque, Davis, and Morgenstern, 2012])
- Our approach: learn sufficient knowledge to answer such queries from examples.

**The task**

- The examples may be incomplete (a * in the table)
- Given In_grave(Cooper), we wish to infer ¬Traveling(Cooper)
- This follows from In_grave(x) ⇒ ¬Alive(x) and Traveling(x) ⇒ Alive(x)
- These two rules can be learned from the data
- Challenge: how can we tell which rules to learn?

**This work**

Given: examples, a KB, and a query…

- Proposes a criterion for learnability of rules in reasoning: "witnessed evaluation"
- Presents a simple algorithm for efficiently considering all such rules for reasoning in any "natural" (tractable) fragment
  - "Natural" was defined previously by Beame, Kautz, and Sabharwal (JAIR 2004)
- Tolerant to counterexamples, as appropriate for application to "common sense" reasoning

**This work**

- Only concerns learned "common sense"
  - Cf. Spelke's "core knowledge": naïve theories, etc.
  - But: the use of logical representations provides a potential "hook" into traditional KR
- Focuses on confirming or refuting query formulas on a domain (distribution)
  - As opposed to: predicting missing attributes in a given example (cf. past work on PAC-Semantics)

**Why not use…**

Bayes nets / Markov Logic / etc.?
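Returning to the task slide above: the idea of a rule holding on the *visible* part of incomplete examples can be illustrated with a toy sketch. The attribute names, the tiny table of partial examples, and all helper functions below are our own illustration, not from the talk.

```python
# Toy illustration: partial examples (None plays the role of the "*" entries),
# and a check of whether a candidate rule is verified on visible values alone.
# All names and data here are illustrative stand-ins.

MASK = None  # the "*" entries of the table

examples = [
    {"in_grave": 1, "alive": 0, "traveling": MASK},
    {"in_grave": 1, "alive": MASK, "traveling": 0},
    {"in_grave": 0, "alive": 1, "traveling": 1},
]

def grave_rule_witnessed(ex):
    """In_grave(x) => not Alive(x): observed to hold when the antecedent is
    visibly false or the consequent is visibly true."""
    return ex["in_grave"] == 0 or ex["alive"] == 0

def traveling_rule_witnessed(ex):
    """Traveling(x) => Alive(x)."""
    return ex["traveling"] == 0 or ex["alive"] == 1

def witness_rate(rule, examples):
    """Fraction of partial examples on which the rule is visibly satisfied."""
    return sum(rule(ex) for ex in examples) / len(examples)

grave_rate = witness_rate(grave_rule_witnessed, examples)      # 2/3 here
travel_rate = witness_rate(traveling_rule_witnessed, examples) # 2/3 here
```

Note that a masked value (the second example's `alive`) makes neither comparison true, so the rule is not counted as witnessed there; this is the conservatism that "witnessed evaluation" formalizes later in the talk.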
- Learning is the Achilles' heel of these approaches: even if the distribution is described by a simple network, how do we find the dependencies?

**Outline**

- PAC-Semantics: a model for learned knowledge
  - Suitable for capturing learned common sense
- Witnessed evaluation: a learnability criterion under partial information
- "Natural" fragments of proof systems
- The algorithm and its guarantee

**PAC Semantics (for propositional logic)** (Valiant, AIJ 2000)

- Recall: propositional logic consists of formulas built from variables x1, …, xn and connectives, e.g., ∧ (AND), ∨ (OR), ¬ (NOT)
- Defined with respect to a background probability distribution D over {0,1}^n (Boolean assignments to x1, …, xn)
- Definition. A formula φ(x1, …, xn) is (1−ε)-valid under D if Pr_D[φ(x1, …, xn) = 1] ≥ 1−ε.
- Informally, a (1−ε)-valid formula is "a rule of thumb…"

**Examples**

- In_grave(x) ⇒ ¬Alive(x): appears to be ≈86%-valid on the examples (the rare "buried alive!" case is a counterexample)

**Examples**

- Traveling(x) ⇒ Alive(x)
- Note: agreeing with all observed examples does not imply 1-validity; rare counterexamples may exist. We only get (1−ε)-validity with probability 1−δ.

**The theorem, informally**

Theorem. For every natural tractable proof system, there is an algorithm that efficiently simulates access during proof search to all rules that can be verified (1−ε)-valid on examples.

- We can't afford to explicitly consider all rules!
- We won't even be able to identify the rules simulated
- Thus: the rules are "learned implicitly"

**Outline**

- PAC-Semantics: a model for learned knowledge
- Witnessed evaluation: a learnability criterion under partial information
- "Natural" fragments of proof systems
- The algorithm and its guarantee

**Masking processes** (Michael, AIJ 2010)

- A masking function m : {0,1}^n → {0,1,*}^n takes an example (x1, …, xn) to a partial example by replacing some values with *
- A masking process M is a masking-function-valued random variable
- NOTE: the choice of attributes to hide may depend on the example!

**Restricting formulas**

Given a formula φ and a masked example ρ, the restriction of φ under ρ, written φ|ρ, is obtained by "plugging in" the value ρi for xi whenever ρi ≠ * and recursively simplifying (using game-tree evaluation). That is, φ|ρ is a formula in the unknown values. E.g., under ρ: x = 0, y = 0, the disjunction ¬x ∨ y restricts to 1 (since ¬x|ρ = 1), so a conjunction (¬x ∨ y) ∧ ¬z restricts to ¬z.

**Witnessed formulas**

We will learn rules that can be observed to hold under the given partial information:

- Definition. ψ is (1−ε)-witnessed under a distribution M(D) over partial examples if Pr_{ρ∈M(D)}[ψ|ρ = 1] ≥ 1−ε
- We will aim to succeed whenever there exists a (1−ε)-witnessed formula that completes a simple proof of the query formula…
- Remark: "ψ|ρ = 1" is equivalent to "ψ is a tautology given ρ" in standard cases where the latter is tractable, e.g., CNFs and intersections of halfspaces; it remains tractable in cases where the latter is not, e.g., 3-DNFs

**Outline**

- PAC-Semantics: a model for learned knowledge
- Witnessed evaluation: a learnability criterion under partial information
- "Natural" fragments of proof systems
- The algorithm and its guarantee

**Example: Resolution ("RES")**

- A proof system for refuting CNFs (ANDs of ORs); equivalently, for proving DNFs (ORs of ANDs)
- Operates on clauses: given a set of clauses {C1, …, Ck}, it may derive
  - ("weakening") Ci ∨ l from any Ci, where l is any literal (a variable or its negation)
  - ("cut") C′i ∨ C′j from Ci = C′i ∨ x and Cj = C′j ∨ ¬x
- Refute a CNF by deriving the empty clause ∅ from it

**Tractable fragments of RES**

- Bounded width
- Treelike, bounded clause space
  - Space 2 ≡ "unit propagation," which simulates chaining
- Applying a restriction to every step of a proof of one of these forms yields a proof of the same form (from a refutation of φ, we obtain a refutation of φ|ρ of the same syntactic form)
- Definition (BKS '04): such fragments are "natural"

**Other "natural" fragments…**

- Bounded-width k-DNF resolution
- L1-bounded, sparse cutting planes
- Degree-bounded polynomial calculus
- (more?)
- The requirement is that restrictions preserve the special syntactic form

**Outline**

- PAC-Semantics: a model for learned knowledge
- Witnessed evaluation: a learnability criterion under partial information
- "Natural" fragments of proof systems
- The algorithm and its guarantee

**The basic algorithm**

- Given a query DNF φ and masked examples {ρ1, …, ρk}
- For each ρi, search for a refutation of ¬φ|ρi
- If the fraction of successful refutations is greater than (1−ε), accept φ; otherwise, reject
- A KB CNF Φ can be incorporated: refute [Φ ∧ ¬φ]|ρi

**Example: space-2 treelike RES refutation**

Refuting ¬φ = Traveling ∧ In_grave from the given clause In_grave and the supporting "common sense" premises ¬In_grave ∨ ¬Alive and ¬Traveling ∨ Alive: cut In_grave with ¬In_grave ∨ ¬Alive to derive ¬Alive, cut ¬Alive with ¬Traveling ∨ Alive to derive ¬Traveling, and cut with Traveling to reach the empty clause ∅.

**Example: [Traveling ∧ In_grave]|ρ1**

Under example ρ1 (In_grave = 0, Alive = 1), the restriction yields a trivial refutation: the given unit clause In_grave restricts to the empty clause ∅, while the supporting premises restrict to T.

**Example: [Traveling ∧ In_grave]|ρ2**

Under example ρ2 (Traveling = 0, Alive = 0), the restriction again yields a trivial refutation: the unit clause Traveling restricts to ∅.

**The theorem, formally**

The algorithm uses (1/γ²)·log(1/δ) partial examples to distinguish the following cases with probability 1−δ:

- The query φ is not (1−ε−γ)-valid
- There exists a (1−ε+γ)-witnessed formula ψ for which there exists a proof of the query φ from ψ

The algorithm learns any ψ that helps validate the query φ.
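The basic algorithm, instantiated with unit propagation (the space-2 treelike fragment) on the Cooper example, can be sketched as follows. The clause encoding, function names, and the three partial examples are our own illustration of the slides' construction, not the paper's code.

```python
# Sketch of the basic algorithm with unit propagation as the "natural" tractable
# fragment. A literal is (variable, polarity); a clause is a frozenset of literals.

def restrict(clauses, rho):
    """Plug the partial example rho into each clause: drop satisfied clauses,
    delete falsified literals (the clause-level restriction C|rho)."""
    out = []
    for clause in clauses:
        kept, satisfied = set(), False
        for var, pol in clause:
            if var in rho:
                if rho[var] == pol:
                    satisfied = True
                    break
                # literal falsified under rho: it is simply deleted
            else:
                kept.add((var, pol))
        if not satisfied:
            out.append(frozenset(kept))
    return out

def unit_refutes(clauses):
    """Try to derive the empty clause by repeatedly propagating unit clauses
    (space-2 treelike resolution)."""
    clauses = list(clauses)
    while True:
        unit = None
        for c in clauses:
            if not c:
                return True          # empty clause derived: refuted
            if len(c) == 1:
                unit = next(iter(c))
        if unit is None:
            return False             # no unit to propagate: give up
        var, pol = unit
        clauses = restrict(clauses, {var: pol})

def accept_query(neg_query, kb, partial_examples, eps):
    """Accept the query phi when [KB ∧ ¬phi]|rho_i is refuted on at least a
    (1 - eps) fraction of the partial examples."""
    hits = sum(unit_refutes(restrict(kb + neg_query, rho))
               for rho in partial_examples)
    return hits / len(partial_examples) >= 1 - eps

# The Cooper example. KB = the supporting "common sense" premises;
# query phi = ¬In_grave ∨ ¬Traveling, so ¬phi = In_grave ∧ Traveling.
KB = [frozenset({("in_grave", False), ("alive", False)}),   # In_grave => ¬Alive
      frozenset({("traveling", False), ("alive", True)})]   # Traveling => Alive
neg_query = [frozenset({("in_grave", True)}), frozenset({("traveling", True)})]

rhos = [
    {"in_grave": False, "alive": True},    # rho1 of the slides: trivial refutation
    {"traveling": False, "alive": False},  # rho2 of the slides: trivial refutation
    {},                                    # everything masked: real propagation
]
accepted = accept_query(neg_query, KB, rhos, eps=0.1)  # True on all three rhos
```

Note the design point the slides emphasize: the KB clauses that were restricted away were never named or output; they participate only through the restricted proofs, i.e., they are "implicitly learned."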
N.B.: ψ may not be 1-valid.

**Analysis**

- Note that resolution is sound…
- So, whenever a proof of φ|ρi exists, φ was satisfied by the underlying example drawn from D
- If φ is not (1−ε−γ)-valid, tail bounds imply that it is unlikely that a (1−ε) fraction of the examples satisfied φ
- On the other hand, consider the proof of φ from the (1−ε+γ)-witnessed CNF ψ…
  - With probability (1−ε+γ), all of the clauses of ψ simplify to 1
  - The restricted proof then does not require the clauses of ψ: they are "implicitly learned"

**Recap: this work…**

- Proposed a criterion for learnability of common sense rules in reasoning: "witnessed evaluation"
- Presented a simple algorithm for efficiently considering all such rules as premises for reasoning in any "natural" (tractable) fragment
  - "Natural," as defined by Beame, Kautz, and Sabharwal (JAIR 2004), means "closed under plugging in partial information"
- Tolerant to counterexamples, as appropriate for application to "common sense" reasoning

**Prior work: Learning to Reason**

- Khardon & Roth (JACM 1997) showed that O(log n)-CNF queries could be efficiently answered using complete examples
  - No mention of theorem proving whatsoever!
  - Could only handle low-width queries under incomplete information (Mach. Learn. 1999)
- Noise-tolerant learning captures (some kinds of) common sense (Roth, IJCAI '95)

**Work in progress**

- Further integration of learning and reasoning
  - Deciding general RES for limited learning problems in quasipolynomial time: arXiv:1304.4633
  - Limits of this approach: ECCC TR13-094
- Integration with "fancier" semantics (e.g., negation as failure)
- The point: we want to consider proofs using such "implicitly learned" facts and rules

**Future work**

- Empirical validation
  - What is a good domain?
- Explicit learning of premises
  - Not hard for our fragments under "bounded concealment" (Michael, AIJ 2010)
  - But: this won't tolerate counterexamples!
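As a closing back-of-the-envelope check, the sample bound from "The theorem, formally" above is easy to evaluate numerically. The Hoeffding-style form (1/γ²)·log(1/δ), with constants suppressed as on the slide, and the helper name below are our own simplification.

```python
import math

def num_examples(gamma, delta):
    """The (1/gamma^2) * log(1/delta) partial examples used by the algorithm
    to separate the two cases with probability 1 - delta (constants omitted,
    as on the slide)."""
    return math.ceil((1.0 / gamma**2) * math.log(1.0 / delta))

# E.g., accuracy slack gamma = 0.05 at 99% confidence (delta = 0.01):
m = num_examples(gamma=0.05, delta=0.01)
```

The bound is independent of the number of variables n; only the slack γ and confidence δ matter, which is what makes the "implicit learning" step cheap relative to proof search.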