
Learning Markov Logic Network Structure Via Hypergraph Lifting


Presentation Transcript


  1. Learning Markov Logic Network Structure Via Hypergraph Lifting
Stanley Kok, Dept. of Computer Science and Eng., University of Washington, Seattle, USA
Joint work with Pedro Domingos

  2. Goal of LHL / Synopsis of LHL
• Input: Relational DB — tables of true ground atoms for Teaches(professor, course), Advises(professor, student), and TAs(student, course) over constants such as Pete, Paul, Pat, Phil, Sam, Sara, Saul, Sue, and CS1–CS8
• Output: Probabilistic KB — a set of weighted first-order formulas, e.g.:
2.7 Teaches(p, c) ∧ TAs(s, c) ⇒ Advises(p, s)
1.4 Advises(p, s) ⇒ Teaches(p, c) ∧ TAs(s, c)
-1.1 TAs(s, c) ⇒ Advises(s, p)
…
[Figure: sample DB tables and the resulting weighted formulas]

  3. Experimental Results
[Bar charts comparing LHL, BUSL, and MSL on Conditional Log-Likelihood (CLL) and Area under Prec-Recall Curve (AUC)]

  4. Outline
• Background
• Learning via Hypergraph Lifting
• Experiments
• Future Work

  5. Markov Logic
• A logical KB is a set of hard constraints on the set of possible worlds
• Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
• Give each formula a weight (higher weight → stronger constraint)

  6. Markov Logic
• A Markov logic network (MLN) is a set of pairs (F, w)
• F is a formula in first-order logic
• w is a real number
• Probability of a world x (a vector of truth assignments to ground atoms):
P(x) = (1/Z) exp( Σᵢ wᵢ nᵢ(x) )
where nᵢ(x) is the number of true groundings of the ith formula, wᵢ is the weight of the ith formula, and Z is the partition function.
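The probability formula can be made concrete with a tiny sketch. This is illustrative only: `mln_probability`, the lambda-based grounding counter, and the toy formula Smokes(A) ⇒ Cancer(A) are hypothetical stand-ins, not part of the talk or of Alchemy's API.

```python
import math
from itertools import product

def mln_probability(world, formulas):
    """P(x) = exp(sum_i w_i * n_i(x)) / Z for a toy MLN.

    `world` maps ground atoms to True/False; `formulas` is a list of
    (weight, count_fn) pairs where count_fn(world) returns n_i(x),
    the number of true groundings of the ith formula in that world.
    """
    def score(w):
        return math.exp(sum(wt * n(w) for wt, n in formulas))

    # Z sums the unnormalized score over every possible world;
    # enumerating worlds is exponential, which is why real systems
    # never compute Z exactly on nontrivial domains.
    atoms = sorted(world)
    z = sum(score(dict(zip(atoms, bits)))
            for bits in product([False, True], repeat=len(atoms)))
    return score(world) / z

# One ground formula Smokes(A) => Cancer(A) with weight 1.5; its
# single grounding is "true" whenever the implication holds.
formulas = [(1.5, lambda w: int((not w["Smokes(A)"]) or w["Cancer(A)"]))]
p = mln_probability({"Smokes(A)": True, "Cancer(A)": True}, formulas)
```

Worlds satisfying the formula get the factor e^1.5 and so come out more probable than the one violating world, which is exactly the "soft constraint" behavior described on the previous slide.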

  7. MLN Structure Learning
• Challenging task
• Few approaches to date [Kok & Domingos, ICML'05; Mihalkova & Mooney, ICML'07; Biba et al., ECAI'08; Huynh & Mooney, ICML'08]
• Most MLN structure learners greedily and systematically enumerate formulas
• Computationally expensive; large search space
• Susceptible to local optima

  8. MSL [Kok & Domingos, ICML'05]
While beam not empty:
  Add unit clauses to beam
  While beam has changed:
    For each clause c in beam:
      c′ ← add a literal to c
      newClauses ← newClauses ∪ {c′}
    beam ← k best clauses in beam ∪ newClauses
  Add best clause in beam to MLN
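As a rough illustration, the beam-search loop above might be sketched as follows; the clause representation, `extend` (literal addition), and `score` (e.g. a likelihood-based measure) are hypothetical placeholders, not MSL's actual code.

```python
def beam_step(beam, extend, score, k):
    """One MSL-style refinement round: extend every clause in the beam
    by one literal, then keep the k best of the old and new clauses."""
    new_clauses = [c2 for c in beam for c2 in extend(c)]
    return sorted(set(beam) | set(new_clauses), key=score, reverse=True)[:k]

def msl_search(unit_clauses, extend, score, k=10):
    """Refine the beam until it stops changing, then return the best
    clause found (the one MSL would add to the MLN)."""
    beam = list(unit_clauses)
    while True:
        new_beam = beam_step(beam, extend, score, k)
        if new_beam == beam:
            break
        beam = new_beam
    return max(beam, key=score)
```

The nested loops mirror the slide: the inner loop grows clauses literal by literal until the beam stabilizes; an outer driver would then add the winner to the MLN and restart.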

  9. Relational Pathfinding [Richards & Mooney, AAAI'92]
• Find paths of linked ground atoms → formulas
• Path ≡ conjunction that is true at least once
• Exponential search space of paths; restricted to short paths
• Example: the ground path Advises(Pete, Sam) ∧ Teaches(Pete, CS1) ∧ TAs(Sam, CS1) generalizes to Advises(p, s) ∧ Teaches(p, c) ∧ TAs(s, c)
[Figure: ground atoms linked across the Teaches, Advises, and TAs relations]

  10. BUSL [Mihalkova & Mooney, ICML'07]
• Finds short paths with a form of relational pathfinding
• Path → Boolean variable → node in Markov network
• Greedily tries to link the nodes with edges
• Cliques → clauses: form disjunctions of the atoms in a clique's nodes, e.g. Advises(p, s) ∨ Teaches(p, c) ∨ TAs(s, c), ¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c), …
• Greedily adds clauses to an empty MLN

  11. Outline
• Background
• Learning via Hypergraph Lifting
• Experiments
• Future Work

  12. Learning via Hypergraph Lifting (LHL)
• Uses relational pathfinding to fuller extent
• Induces a hypergraph over clusters of constants
[Figure: a ground hypergraph over constants is "lifted" into a hypergraph over clusters]

  13. Learning via Hypergraph Lifting (LHL)
• Uses a hypergraph (V, E): V is a set of nodes; E is a set of labeled, non-empty, ordered subsets of V
• Finds paths in a hypergraph
• Path: a set of hyperedges s.t. for any two hyperedges e₀ and eₙ, ∃ a sequence of hyperedges in the set that leads from e₀ to eₙ
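The path condition — every pair of hyperedges in the set connected through shared nodes by a sequence of hyperedges — can be checked with a small connectivity sketch. The data layout (label plus ordered node tuple) is an assumption for illustration, not LHL's implementation.

```python
from collections import defaultdict

def is_path(hyperedges):
    """True iff the hyperedge set is connected: between any two edges
    there is a sequence of edges in the set, each sharing a node with
    the next. Each hyperedge is (label, (node, node, ...))."""
    if not hyperedges:
        return False
    by_node = defaultdict(list)          # node -> edges touching it
    for e in hyperedges:
        for node in e[1]:
            by_node[node].append(e)
    # BFS over edges: two edges are adjacent if they share a node.
    seen, frontier = {hyperedges[0]}, [hyperedges[0]]
    while frontier:
        e = frontier.pop()
        for node in e[1]:
            for nxt in by_node[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return len(seen) == len(set(hyperedges))
```

For example, {Advises(Pete, Sam), Teaches(Pete, CS1), TAs(Sam, CS1)} is a path because consecutive atoms share Pete, CS1, and Sam.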

  14. Learning via Hypergraph Lifting (LHL)
• A relational DB can be viewed as a hypergraph
• Nodes ≡ constants
• Hyperedges ≡ true ground atoms
[Figure: DB tables for Advises, Teaches, and TAs drawn as a hypergraph over their constants]
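This view is straightforward to sketch: each true ground atom becomes a labeled, ordered hyperedge over its constant arguments. Function and variable names here are illustrative assumptions.

```python
from collections import defaultdict

def db_to_hypergraph(ground_atoms):
    """Turn true ground atoms (pred, arg1, arg2, ...) into a hypergraph:
    nodes are constants; each atom is a labeled, ordered hyperedge."""
    nodes, edges = set(), []
    incident = defaultdict(list)         # node -> hyperedges touching it
    for pred, *args in ground_atoms:
        nodes.update(args)
        edge = (pred, tuple(args))
        edges.append(edge)
        for a in args:
            incident[a].append(edge)
    return nodes, edges, incident

# A three-tuple fragment of the talk's running example.
db = [("Teaches", "Pete", "CS1"),
      ("Advises", "Pete", "Sam"),
      ("TAs", "Sam", "CS1")]
nodes, edges, incident = db_to_hypergraph(db)
```

The `incident` index is what pathfinding needs: from any hyperedge it lists, per constant, which other hyperedges can extend the path.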

  15. Learning via Hypergraph Lifting (LHL)
• LHL "lifts" the hypergraph into a more compact representation
• Jointly clusters nodes into higher-level concepts
• Clusters hyperedges
• Traces paths in the lifted hypergraph
• LHL = Clustering + Relational Pathfinding
[Figure: ground hypergraph over constants "lifted" into a clustered hypergraph]

  16. Learning via Hypergraph Lifting
LHL has three components:
• LiftGraph: lifts the hypergraph
• FindPaths: finds paths in the lifted hypergraph
• CreateMLN: creates rules from paths, and adds good ones to an empty MLN

  17. LiftGraph
• Defined using Markov logic
• Jointly clusters constants in a bottom-up agglomerative manner
• Allows information to propagate from one cluster to another
• Ground atoms are also clustered
• #Clusters need not be specified in advance
• Each lifted hyperedge contains ≥ one true ground atom

  18. Learning Problem in LiftGraph
• Find the cluster assignment C that maximizes the posterior probability P(C | D) ∝ P(D | C) P(C), where D is the truth values of the ground atoms
• P(D | C) is defined with an MLN; P(C) is defined with another MLN

  19. LiftGraph's P(D|C) MLN
• For each predicate r and each cluster combination containing a true ground atom of r, we have an atom prediction rule
[Figure: constants grouped into Professor, Student, and Course clusters, with lifted Teaches, Advises, and TAs hyperedges]

  20. LiftGraph's P(D|C) MLN
• For each predicate r and each cluster combination containing a true ground atom of r, we have an atom prediction rule, e.g.:
p ∈ [Professor cluster] ∧ c ∈ [Course cluster] ⇒ Teaches(p, c)

  21. LiftGraph's P(D|C) MLN
• For each predicate r, we have a default atom prediction rule covering the default cluster combination (cluster combinations with no true ground atom of r), e.g.:
x ∈ [Professor cluster] ∧ y ∈ [Student cluster] ⇒ Teaches(x, y)
…

  22. LiftGraph's P(C) MLN
• Each symbol belongs to exactly one cluster (infinite weight)
• Exponential prior on #cluster combinations (negative weight −λ)

  23. LiftGraph
• Hard assignments of constants to clusters
• Weights and log-posterior computed in closed form
• Searches for the cluster assignment with the highest log-posterior

  24. LiftGraph's Search Algorithm
[Figure: constants Pete, Paul, Sam, Sara, and CS1–CS3 with their Teaches and Advises hyperedges, before clustering]

  25. LiftGraph's Search Algorithm
[Figure: the same hypergraph after agglomerative merging — {Pete, Paul}, {Sam, Sara}, and {CS1, CS2, CS3} form clusters with lifted Teaches and Advises hyperedges]
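The greedy bottom-up search sketched on these two slides might look like the following, with `log_posterior` standing in for LiftGraph's closed-form log P(D|C) + log P(C); everything else here is an illustrative assumption.

```python
from itertools import combinations

def agglomerative_lift(constants, log_posterior):
    """Greedy bottom-up clustering: start from singleton clusters and
    repeatedly apply the merge that most improves the log-posterior;
    stop when no merge helps."""
    clusters = [frozenset([c]) for c in constants]
    best = log_posterior(clusters)
    while len(clusters) > 1:
        candidates = []
        for a, b in combinations(clusters, 2):
            merged = [c for c in clusters if c not in (a, b)] + [a | b]
            candidates.append((log_posterior(merged), merged))
        top_score, top = max(candidates, key=lambda t: t[0])
        if top_score <= best:
            break                        # no merge improves the posterior
        best, clusters = top_score, top
    return clusters
```

With a scoring function that rewards putting similarly-behaving constants together (as LiftGraph's atom prediction rules do), constants like Pete and Paul end up in one cluster and Sam and Sara in another, matching the figure.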

  26. FindPaths
Paths found in the lifted hypergraph:
Advises(·, ·)
Advises(·, ·) ∧ Teaches(·, ·)
Advises(·, ·) ∧ Teaches(·, ·) ∧ TAs(·, ·)
[Figure: paths traced over the lifted Teaches, Advises, and TAs hyperedges]

  27. Clause Creation
• Replace each cluster in a path with a variable, e.g. Advises(p, s) ∧ Teaches(p, c) ∧ TAs(s, c)
• Form clauses from each conjunction by disjoining its atoms under all sign combinations, e.g.:
¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ ¬TAs(s, c)
Advises(p, s) ∨ ¬Teaches(p, c) ∨ ¬TAs(s, c)
Advises(p, s) ∨ Teaches(p, c) ∨ ¬TAs(s, c)
…
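Generating the candidate clauses from a variabilized path — one disjunction per sign assignment of its atoms — can be sketched as below; the string encoding with `!` for negation is an illustrative convention.

```python
from itertools import product

def clauses_from_path(atoms):
    """Given the atoms of a variabilized path, e.g.
    ["Advises(p,s)", "Teaches(p,c)", "TAs(s,c)"], emit every clause
    obtained by disjoining the atoms under all sign combinations."""
    clauses = []
    for signs in product([True, False], repeat=len(atoms)):
        clauses.append(" v ".join(a if pos else "!" + a
                                  for a, pos in zip(atoms, signs)))
    return clauses

cands = clauses_from_path(["Advises(p,s)", "Teaches(p,c)", "TAs(s,c)"])
```

A path of n atoms yields 2ⁿ candidate clauses, which is why the next step, pruning, is needed.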

  28. Clause Pruning
Score   Clause
-1.15   ¬Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)
-1.17   Advises(p, s) ∨ ¬Teaches(p, c) ∨ TAs(s, c)
…
-2.21   ¬Advises(p, s) ∨ ¬Teaches(p, c)
-2.23   ¬Advises(p, s) ∨ TAs(s, c)
-2.03   ¬Teaches(p, c) ∨ TAs(s, c)
…
-3.13   ¬Advises(p, s)
-2.93   ¬Teaches(p, c)
-3.93   TAs(s, c)

  29. Clause Pruning
• Compare each clause against its sub-clauses (taken individually)
[Same score table as the previous slide]
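A minimal sketch of this check: a clause survives only if it scores better than each of its sub-clauses taken individually. The frozenset-of-literals encoding and the abbreviated literal names are illustrative; the scores are those shown on the slide.

```python
def prune_clauses(scored):
    """`scored` maps a clause (frozenset of literals) to its score.
    Keep a clause only if it beats every strict sub-clause that was
    also scored."""
    kept = []
    for clause, score in scored.items():
        sub_scores = [s for c2, s in scored.items()
                      if c2 < clause]     # strict subset = sub-clause
        if all(score > s for s in sub_scores):
            kept.append(clause)
    return kept

# Slide's scores, with literals abbreviated.
scores = {
    frozenset({"!Adv", "!Tea", "TAs"}): -1.15,
    frozenset({"!Adv", "!Tea"}): -2.21,
    frozenset({"!Adv"}): -3.13,
    frozenset({"!Tea"}): -2.93,
}
kept = prune_clauses(scores)
```

Here the three-literal clause survives because -1.15 beats every sub-clause score; a longer clause that scored worse than one of its parts would be dropped.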

  30. MLN Creation
• Add clauses to an empty MLN in order of decreasing score
• Retrain the weights of the clauses each time a clause is added
• Retain the clause in the MLN if the overall score improves
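This greedy forward-selection step might look like the following sketch, with `mln_score` abstracting away the weight retraining and rescoring of the whole MLN (both hypothetical stand-ins for what LHL actually computes).

```python
def create_mln(scored_clauses, mln_score):
    """`scored_clauses`: list of (clause, score) pairs from pruning.
    `mln_score(mln)` rescores the whole candidate MLN, standing in
    for weight retraining plus overall score evaluation."""
    mln = []
    best = mln_score(mln)
    # Consider clauses best-first; keep each only if the overall
    # MLN score improves when it is added.
    for clause, _ in sorted(scored_clauses, key=lambda t: t[1], reverse=True):
        candidate = mln + [clause]
        s = mln_score(candidate)
        if s > best:
            mln, best = candidate, s
    return mln
```

Note that a clause with a high individual score can still be rejected if it adds nothing once earlier clauses are in the MLN, which is the point of rescoring the whole model at each step.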

  31. Outline
• Background
• Learning via Hypergraph Lifting
• Experiments
• Future Work

  32. Datasets
• IMDB: created from the IMDB.com DB; movies, actors, etc., and their relationships; 17,793 ground atoms, 1,224 true ones
• UW-CSE: describes an academic department; students, faculty, etc., and their relationships; 260,254 ground atoms, 2,112 true ones

  33. Datasets
• Cora: citations to computer science papers; papers, authors, titles, etc., and their relationships; 687,422 ground atoms, 42,558 true ones

  34. Methodology
• Five-fold cross-validation
• Inferred the probability of truth for groundings of each predicate, with the groundings of all other predicates as evidence
• Evaluation measures: area under the precision-recall curve (AUC) and average conditional log-likelihood (CLL)

  35. Methodology
• MCMC inference algorithms in Alchemy to evaluate the test atoms
• 1 million samples; 24 hours

  36. Methodology
• Compared with MSL [Kok & Domingos, ICML'05] and BUSL [Mihalkova & Mooney, ICML'07]
• Lesion study:
  – NoLiftGraph: LHL with no hypergraph lifting; finds paths directly from the unlifted hypergraph
  – NoPathFinding: LHL with no pathfinding; uses the MLN representing LiftGraph

  37. LHL vs. BUSL vs. MSL: Area under Prec-Recall Curve
[Bar charts for IMDB, UW-CSE, and Cora comparing LHL, BUSL, and MSL]

  38. LHL vs. BUSL vs. MSL: Conditional Log-likelihood
[Bar charts for IMDB, UW-CSE, and Cora comparing LHL, BUSL, and MSL]

  39. LHL vs. BUSL vs. MSL: Runtime
[Bar charts for IMDB (min), UW-CSE (hr), and Cora (hr) comparing LHL, BUSL, and MSL]

  40. LHL vs. NoLiftGraph: Area under Prec-Recall Curve
[Bar charts for IMDB, UW-CSE, and Cora comparing LHL and NoLiftGraph]

  41. LHL vs. NoLiftGraph: Conditional Log-likelihood
[Bar charts for IMDB, UW-CSE, and Cora comparing LHL and NoLiftGraph]

  42. LHL vs. NoLiftGraph: Runtime
[Bar charts for IMDB (min), UW-CSE (hr), and Cora (hr) comparing LHL and NoLiftGraph]

  43. LHL vs. NoPathFinding
[Bar charts for IMDB and UW-CSE comparing LHL and NoPathFinding on AUC and CLL]

  44. Examples of Rules Learned
• If a is an actor and d is a director, and they both worked in the same movie, then a probably worked under d
• If p is a professor, and p co-authored a paper with s, then s is likely a student
• If papers x and y have the same author, then x and y are likely the same paper

  45. Outline
• Motivation
• Background
• Learning via Hypergraph Lifting
• Experiments
• Future Work

  46. Future Work
• Integrate the components of LHL
• Integrate LHL with lifted inference [Singla & Domingos, AAAI'08]
• Construct an ontology simultaneously with the probabilistic KB
• Further scale LHL up
• Apply LHL to larger, richer domains, e.g., the Web

  47. Conclusion
• LHL = Clustering + Relational Pathfinding
• "Lifts" data into a more compact form — essential for speeding up relational pathfinding
• LHL outperforms state-of-the-art structure learners
