
Overview


Presentation Transcript


  1. Overview • Introduction to PLL • Foundations of PLL • Logic Programming, Bayesian Networks, Hidden Markov Models, Stochastic Grammars • Frameworks of PLL • Independent Choice Logic, Stochastic Logic Programs, PRISM, Probabilistic Logic Programs, Probabilistic Relational Models, Bayesian Logic Programs • Relational Hidden Markov Models • Learning • Applications

  2. Probabilistic Logic Programs (PLPs) [Haddawy, Ngo]
  [Figure: alarm Bayesian network; Earthquake and Burglary are parents of Alarm, which is parent of JohnCalls and MaryCalls, with CPT P(A | B,E)]
  • Atoms = set of similar RVs; the first argument(s) identify the RV, the last argument its state
  • Clause = CPD entry
  • Probability distribution over Herbrand interpretations
  0.1 : burglary(true).     0.9 : burglary(false).
  0.01 : earthquake(true).  0.99 : earthquake(false).
  0.9 : alarm(true) :- burglary(true), earthquake(true).
  ...
  Could burglary(true) and burglary(false) be true in the same interpretation? Integrity constraints rule this out:
  false :- burglary(true), burglary(false).
  burglary(true) ; burglary(false) :- true.
  false :- earthquake(true), earthquake(false).
  ...
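Read as a Bayesian network, the annotated facts and clauses induce a joint distribution over Herbrand interpretations (worlds). A minimal Python sketch of this semantics, using the probabilities from the slide; the rows of the alarm CPT other than (burglary=true, earthquake=true) are elided on the slide, so the values used for them here are made-up placeholders:

```python
from itertools import product

# Prior probabilities from the annotated facts on the slide.
P_burglary = {True: 0.1, False: 0.9}
P_earthquake = {True: 0.01, False: 0.99}

# CPD entries for alarm(true); only the (True, True) row (0.9) is
# given on the slide -- the remaining rows are hypothetical.
P_alarm = {
    (True, True): 0.9,
    (True, False): 0.8,
    (False, True): 0.2,
    (False, False): 0.01,
}

def p_world(b, e, a):
    """Probability of one interpretation (one truth assignment)."""
    pa = P_alarm[(b, e)] if a else 1.0 - P_alarm[(b, e)]
    return P_burglary[b] * P_earthquake[e] * pa

# The worlds sum to 1, and we can marginalise, e.g. P(alarm = true).
total = sum(p_world(b, e, a) for b, e, a in product([True, False], repeat=3))
p_alarm_true = sum(p_world(b, e, True) for b, e in product([True, False], repeat=2))
print(total, p_alarm_true)
```

The integrity constraints on the slide do the work of the `if a else 1 - ...` line: they make the two states of each RV mutually exclusive and exhaustive within one interpretation.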

  3. Probabilistic Logic Programs (PLPs) [Haddawy, Ngo]
  Context:
  father(rex,fred).   mother(ann,fred).
  father(brian,doro). mother(utta,doro).
  father(fred,henry). mother(doro,henry).
  Probabilities:
  1.0 : mc(P,a) :- mother(M,P), pc(M,a), mc(M,a).
  0.0 : mc(P,b) :- mother(M,P), pc(M,a), mc(M,a).
  ...
  0.5 : pc(P,a) :- father(F,P), pc(F,0), mc(F,a).
  0.5 : pc(P,0) :- father(F,P), pc(F,0), mc(F,a).
  ...
  1.0 : bt(P,a) :- mc(P,aa), pc(P,aa).
  Constraints:
  false :- pc(P,a), pc(P,b), pc(P,0).
  pc(P,a) ; pc(P,b) ; pc(P,0) :- person(P).
  ...

  4. Probabilistic Logic Programs (PLPs) [Haddawy, Ngo]
  The same program, annotated: the clause structure forms the qualitative part (each head atom names an RV and a state, with a dependency on the body atoms), while the probability labels form the quantitative part.

  5. Probabilistic Logic Programs (PLPs) [Haddawy, Ngo]
  The same program again, highlighting variable binding: the logical variables P, M and F tie the RVs in a clause head and body to the same individuals.

  6. Probabilistic Logic Programs (PLPs) [Haddawy, Ngo]
  father(rex,fred). mother(ann,fred). father(brian,doro). mother(utta,doro). father(fred,henry). mother(doro,henry).
  Grounding the mc/pc/bt clauses and constraints with these facts yields a Bayesian network:
  [Figure: ground network with nodes mc(X), pc(X) and bt(X) for each of ann, rex, utta, brian, fred, doro and henry; a child's mc/pc depend on the mother's/father's mc and pc, and bt(X) depends on mc(X) and pc(X)]
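The grounding step can be made concrete: instantiate each clause with the mother/father facts and record, for every ground atom, the atoms it depends on. A rough sketch of just the graph construction (names as on the slide; CPDs and constraints omitted):

```python
# Pedigree facts from the slide, as (parent, child) pairs.
father = [("rex", "fred"), ("brian", "doro"), ("fred", "henry")]
mother = [("ann", "fred"), ("utta", "doro"), ("doro", "henry")]
persons = {p for pair in father + mother for p in pair}

# parents[node] = the nodes it depends on in the ground network.
parents = {}
for p in persons:
    parents[f"bt({p})"] = [f"mc({p})", f"pc({p})"]
    parents[f"mc({p})"] = []          # root unless a mother is known
    parents[f"pc({p})"] = []          # root unless a father is known
for m, c in mother:                   # mc(P) depends on the mother's mc/pc
    parents[f"mc({c})"] = [f"mc({m})", f"pc({m})"]
for f, c in father:                   # pc(P) depends on the father's mc/pc
    parents[f"pc({c})"] = [f"mc({f})", f"pc({f})"]

print(parents["pc(henry)"])           # fred is henry's father
```

With seven persons this yields the 21-node network sketched in the figure.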

  7. Probabilistic Logic Programs (PLPs) [Haddawy, Ngo]
  • Unique probability distribution over Herbrand interpretations (given finite branching factor, finite proofs, no self-dependency)
  • Atoms = states; integrity constraints encode mutually exclusive states
  • Functors; BN used to do inference; Turing-complete programming language
  • Covers BNs, HMMs, DBNs, SCFGs, ...
  • No learning

  8. Probabilistic Relational Models (PRMs) [Getoor, Koller, Pfeffer]
  Rooted in database theory: entity-relationship models, attributes = RVs.
  [Figure: the alarm Bayesian network recast as a database alarm system; one table per entity (Earthquake, Burglary, Alarm, JohnCalls, MaryCalls) with a probabilistic attribute each, and the CPT P(A | B,E)]

  9. Probabilistic Relational Models (PRMs) [Getoor, Koller, Pfeffer]
  [Figure: Person table with attributes M-chromosome, P-chromosome and Bloodtype; binary relations link each Person to a (Father) and a (Mother) Person with the same attributes]

  10. Probabilistic Relational Models (PRMs) [Getoor, Koller, Pfeffer]
  Schema: father(Father,Person). mc(Person,MC). pc(Person,PC). bt(Person,BT).
  Dependencies:
  bt(Person,BT) :- pc(Person,PC), mc(Person,MC).
  pc(Person,PC) :- pc_father(Father,PCf), mc_father(Father,MCf).
  View:
  pc_father(Person,PCf) | father(Father,Person), pc(Father,PC).

  11. Probabilistic Relational Models (PRMs) [Getoor, Koller, Pfeffer]
  father(rex,fred). mother(ann,fred). father(brian,doro). mother(utta,doro). father(fred,henry). mother(doro,henry).
  pc_father(Person,PCf) | father(Father,Person), pc(Father,PC).
  ...
  mc(Person,MC) | pc_mother(Person,PCm), mc_mother(Person,MCm).
  pc(Person,PC) | pc_father(Person,PCf), mc_father(Person,MCf).
  bt(Person,BT) | pc(Person,PC), mc(Person,MC).
  [Figure: the resulting ground network over the mc, pc and bt RVs (each with its state) for all seven persons]

  12. Probabilistic Relational Models (PRMs) [Getoor, Koller, Pfeffer]
  • Datalog
  • Unique probability distribution over finite Herbrand interpretations (no self-dependency)
  • Discrete and continuous RVs
  • BN used to do inference
  • Highlight: graphical representation
  • Covers BNs
  • Learning

  13. Bayesian Logic Programs (BLPs) [Kersting, De Raedt]
  alarm :- earthquake, burglary.
  [Figure: the alarm Bayesian network next to its BLP encoding; a rule graph over earthquake/0, burglary/0, alarm/0, maryCalls/0 and johnCalls/0, with a local BN fragment (earthquake, burglary -> alarm) carrying the CPT P(A | B,E)]

  14. Bayesian Logic Programs (BLPs) [Kersting, De Raedt]
  bt(Person) :- pc(Person), mc(Person).
  A Bayesian clause consists of a predicate (bt), atoms over argument variables (Person), and an associated CPD, e.g. P(bt(Person) | pc(Person), mc(Person)) with rows such as (1.0,0.0,0.0,0.0) for pc=a, mc=a and (0.0,0.0,1.0,0.0) for pc=a, mc=b; similarly P(mc(Person) | mother(Mother,Person), pc(Mother), mc(Mother)) with rows (1.0,0.0,0.0) for a,a and (0.5,0.5,0.0) for a,b.
  [Figure: rule graph over pc/1, mc/1 and bt/1]

  15. Bayesian Logic Programs (BLPs) [Kersting, De Raedt]
  mc(Person) | mother(Mother,Person), pc(Mother), mc(Mother).
  pc(Person) | father(Father,Person), pc(Father), mc(Father).
  bt(Person) | pc(Person), mc(Person).
  [Figure: rule graph over pc/1, mc/1 and bt/1, with the CPD of the mc clause shown]

  16. Bayesian Logic Programs (BLPs) [Kersting, De Raedt]
  father(rex,fred). mother(ann,fred). father(brian,doro). mother(utta,doro). father(fred,henry). mother(doro,henry).
  mc(Person) | mother(Mother,Person), pc(Mother), mc(Mother).
  pc(Person) | father(Father,Person), pc(Father), mc(Father).
  bt(Person) | pc(Person), mc(Person).
  [Figure: the induced ground Bayesian network over mc, pc and bt for all seven persons]

  17. Bayesian Logic Programs (BLPs) [Kersting, De Raedt]
  • Unique probability distribution over Herbrand interpretations (finite branching factor, finite proofs, no self-dependency)
  • Highlights: separation of qualitative and quantitative parts; functors; graphical representation
  • Discrete and continuous RVs
  • Covers BNs, DBNs, HMMs, SCFGs, Prolog, ...
  • Turing-complete programming language
  • Learning

  18. Combining Partial Knowledge
  prepared(Student,Topic) | read(Student,Book), discusses(Book,Topic).
  passes(Student) | prepared(Student,bn), prepared(Student,logic).
  [Figure: dependency graphs; a Student reads a Book, the Book discusses a Topic, so prepared/2 depends on read/2 and discusses/2, and passes/1 depends on prepared(Student,bn) and prepared(Student,logic)]

  19. Combining Partial Knowledge
  prepared(Student,Topic) | read(Student,Book), discusses(Book,Topic).
  • Variable number of parents for prepared/2, due to read/2: whether a student is prepared on a topic depends on all the books she read
  • But the CPD is given for only one book-topic pair
  [Figure: ground fragment in which prepared(s1,bn) and prepared(s2,bn) each depend on discusses(b1,bn) and discusses(b2,bn)]

  20. Combining Rules
  prepared(Student,Topic) | read(Student,Book), discusses(Book,Topic).
  A combining rule (CR) merges the CPDs P(A|B) and P(A|C) contributed by different ground instances into a single CPD P(A|B,C):
  • Any algorithm which has an empty output if and only if the input is empty, and combines a set of CPDs into a single (combined) CPD
  • E.g. noisy-or, regression, ...
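Noisy-or, the most common combining rule mentioned above, can be sketched in a few lines: each ground instance contributes its own probability that the head holds, and the combined CPD assumes the causes fail independently:

```python
def noisy_or(probs):
    """Combine per-instance probabilities P(A|B_i) into one P(A|B_1..B_n).

    Empty input -> empty output, as the combining-rule definition requires.
    """
    if not probs:
        return None
    fail = 1.0
    for p in probs:
        fail *= 1.0 - p        # every cause must independently fail
    return 1.0 - fail

# Two books discussing the topic, contributing 0.8 and 0.6
# (made-up numbers): combined P = 1 - 0.2 * 0.4 = 0.92
print(noisy_or([0.8, 0.6]))
```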

  21. Aggregates
  Map multisets of values to summary values (e.g., sum, average, max, cardinality).
  [Figure: rule graph in which registration_grade/2 and registered/2 feed student_ranking/1]

  22. Aggregates
  Map multisets of values to summary values (e.g., sum, average, max, cardinality).
  [Figure: a deterministic functional dependency (average) maps a Student's registration_grade/2 values over registered/2 courses to grade_avg/1; a probabilistic dependency (CPD) then links grade_avg/1 to student_ranking/1]
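The deterministic aggregate step is just a function from a multiset of values to one summary value; a sketch of the average aggregate from the slide, with made-up grades:

```python
def aggregate_average(values):
    """Deterministic aggregate: multiset of grades -> one summary value."""
    return sum(values) / len(values) if values else None

# grade_avg of a student is the average of her registration grades;
# a CPD would then map grade_avg to student_ranking.
grades = [1.0, 2.0, 1.5]              # hypothetical registration grades
print(aggregate_average(grades))      # 1.5
```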

  23. Stochastic Relational Models (SRMs)
  • Type I probabilities, i.e., frequencies in databases
  • Probability that a select-join query succeeds: independently sample tuples ri from the relations Ri; select as values for the attributes Ai the values r.Ai
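The sampling semantics of a select-join query can be mimicked directly by Monte Carlo estimation. A minimal sketch with two made-up toy relations (tuples keyed by a person id; not the WHO data of the next slide):

```python
import random

random.seed(0)

# Hypothetical mini-relations; Type I probabilities = tuple frequencies.
pers = [("p1", "75-79y"), ("p2", "85-89y"), ("p3", "75-79y")]  # (id, dage)
death = [("p1", "k"), ("p2", "r"), ("p3", "k")]                # (id, cause)

def query_prob(n=50_000):
    """Estimate the probability that the select-join query succeeds:
    independently sample one tuple per relation; succeed if they join
    on the person id and satisfy dage='75-79y', cause=k."""
    hits = 0
    for _ in range(n):
        p, d = random.choice(pers), random.choice(death)
        if p[0] == d[0] and p[1] == "75-79y" and d[1] == "k":
            hits += 1
    return hits / n

est = query_prob()
print(est)   # close to 2/9, the exact query probability for this toy data
```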

  24. Stochastic Relational Models (SRMs)
  WHO Mortality Database: country.name, death.cause, pers.sex, pers.jCountry, pers.dyear, pers.jDeath, pers.dage
  query(pers:dage='75-79y', death:cause=k) = 0.012
  query(pers:dage='85-89y', death:cause=k) = 0.0012
  query(pers:dage='75-79y', death:cause=r) = 0.02
  query(pers:dage='85-89y', death:cause=r) = 0.114
  query(pers:dage='1-4y') = 0.00201
  query(pers:dage='25-29y') = 7.1 * 10^-5
  query(pers:dage='75-79y') = 0.12
  query(pers:dage='85-89y') = 0.176

  25. Overview • Introduction to PLL • Foundations of PLL • Logic Programming, Bayesian Networks, Hidden Markov Models, Stochastic Grammars • Frameworks of PLL • Independent Choice Logic, Stochastic Logic Programs, PRISM, Probabilistic Logic Programs, Probabilistic Relational Models, Bayesian Logic Programs • Relational Hidden Markov Models • Learning • Applications

  26. Relational (Hidden) Markov Models
  Example: a CS department with lecturers (Wolfram, Luc) and courses on data mining, statistics and robotics.
  • In a propositional (H)MM each state is trained independently
  • No sharing of experience, large state space

  27. Relational (Hidden) Markov Models
  Ground states: dept(cs), course(cs,dm), course(cs,stats), course(cs,rob), lecturer(cs,luc), lecturer(cs,wolfram).
  Abstract states and weighted transitions:
  (0.7) dept(D) -> course(D,C).
  (0.2) dept(D) -> lecturer(D,L).
  ...
  (0.3) course(D,C) -> lecturer(D,L).
  (0.3) course(D,C) -> dept(D).
  (0.3) course(D,C) -> course(D',C').
  ...
  (0.1) lecturer(D,L) -> course(D,C).
  ...
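The weighted rules define a Markov chain over abstract states. A toy sampler under stated assumptions: the rules elided by "..." are replaced with a hypothetical terminal state "end" so that every row sums to one:

```python
import random

random.seed(1)

# Abstract transitions from the slide, padded with a made-up "end" state.
rules = {
    "dept(D)":       [(0.7, "course(D,C)"), (0.2, "lecturer(D,L)"), (0.1, "end")],
    "course(D,C)":   [(0.3, "lecturer(D,L)"), (0.3, "dept(D)"),
                      (0.3, "course(D',C')"), (0.1, "end")],
    "course(D',C')": [(0.3, "lecturer(D,L)"), (0.3, "dept(D)"),
                      (0.3, "course(D',C')"), (0.1, "end")],
    "lecturer(D,L)": [(0.1, "course(D,C)"), (0.9, "end")],
}

def sample_walk(state="dept(D)", max_len=10):
    """Sample one trajectory through the abstract state space."""
    traj = [state]
    while state != "end" and len(traj) < max_len:
        r, acc = random.random(), 0.0
        for p, nxt in rules[state]:
            acc += p
            if r < acc:
                break
        state = nxt            # falls back to the last rule on rounding
        traj.append(state)
    return traj

walk = sample_walk()
print(walk)
```

Slide 28 then asks the remaining question: how to turn each abstract state in such a trajectory into a concrete ground state.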

  28. Relational (Hidden) Markov Models
  • So far, only transitions between abstract states
  • Needed: the possible transitions and their probabilities for any ground state
  • E.g. for lecturer(D,L), the possible instantiations of each argument are {cs, math, bio, ...} x {luc, wolfram, ...}, and we need the chance of each instantiation, e.g. P(lecturer(cs,luc))

  29. Relational (Hidden) Markov Models
  Two ways to distribute the probability of an abstract state such as lecturer(D,L) over its instantiations:
  • RMMs [Anderson et al. 03]: probability estimation trees
  • LOHMMs [Kersting et al. 03]: naive Bayes, P(D) * P(L)

  30. Learning Tasks
  A learning algorithm maps a database to a model.
  • Parameter estimation: numerical optimization problem
  • Model selection: combinatorial search

  31. Differences between SL and PLL?
  • Representation (cf. above)
  • Structure of the search space becomes more complex: operators for traversing the space, ...
  • Algorithms remain essentially the same

  32. What is the data about? – Model Theoretic
  [Figure: alarm network; Earthquake and Burglary -> Alarm -> JohnCalls, MaryCalls]
  Model(1): earthquake=yes, burglary=no, alarm=?, marycalls=yes, johncalls=no
  Model(2): earthquake=no, burglary=no, alarm=no, marycalls=no, johncalls=no
  Model(3): earthquake=?, burglary=?, alarm=yes, marycalls=yes, johncalls=yes

  33. What is the data about? – Model Theoretic
  • Data case: set of RVs with (partially) specified states = (partial) Herbrand interpretation
  • Akin to "learning from interpretations" (ILP)
  Bloodtype example:
  Background: m(ann,dorothy), f(brian,dorothy), m(cecily,fred), f(henry,fred), f(fred,bob), m(kim,bob), ...
  Model(1): pc(brian)=b, bt(ann)=a, bt(brian)=?, bt(dorothy)=a
  Model(2): bt(cecily)=ab, bt(henry)=a, bt(fred)=?, bt(kim)=a, bt(bob)=b
  Model(3): pc(rex)=b, bt(doro)=a, bt(brian)=?

  34. Parameter Estimation – Model Theoretic
  [Diagram: a database D plus the underlying logic program L are fed to the learning algorithm, which outputs the parameters θ]

  35. Parameter Estimation – Model Theoretic
  • Estimate the CPD entries θ that best fit the data
  • "Best fit": ML parameters θ*
    θ* = argmax_θ P(data | logic program, θ) = argmax_θ log P(data | logic program, θ)
  • Reduces to a problem within Bayesian networks: given structure, partially observed random variables
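In the fully observed case the ML estimate of each CPD entry reduces to relative frequencies over ground-clause instantiations. A toy sketch with made-up data (one clause, parent state and child state per ground instance):

```python
from collections import Counter

# Fully observed toy data: (parent_state, child_state) for each ground
# instance of a single clause, e.g. mc(M)=a determining mc(P).
cases = [("a", "a"), ("a", "a"), ("a", "b"), ("b", "b")]

joint = Counter(cases)
parent = Counter(p for p, _ in cases)

# ML CPD entry: theta[c | p] = N(p, c) / N(p)
theta = {(p, c): n / parent[p] for (p, c), n in joint.items()}
print(theta[("a", "a")])   # 2/3
```

With partially observed interpretations these counts are not available directly, which is where the EM algorithm of slide 40 comes in.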

  36. Parameter Estimation – Model Theoretic
  [Figure: a data case combined with the logic program yields a ground support network]

  37. Excursus: Decomposable CRs
  • The parameters belong to the clauses, not to the support network
  • A single ground instance contributes its CPD directly; multiple ground instances of the same clause are merged through a deterministic CPD for the combining rule

  38. Parameter Estimation – Model Theoretic
  [Figure: continuation of the support-network construction]

  39. Parameter Estimation – Model Theoretic
  Parameter tying: ground instances of the same clause share their parameters.
  [Figure: support network with tied parameters]

  40. Inference: EM – Model Theoretic
  EM algorithm: iterate until convergence.
  • Start from the logic program L and initial parameters θ0
  • Expectation: compute expected counts under the current model (M, θk)
  • Maximization: update the parameters (ML or MAP)
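The E/M loop can be illustrated on the smallest possible case: a single Bernoulli parameter with some observations missing. This is toy data, not a PLL system, but the structure of the iteration is the same:

```python
# EM for one Bernoulli parameter theta = P(x = true), with missing
# observations marked '?'.
data = [True, True, False, "?", True, "?", False, True]

theta = 0.5                       # initial parameters theta_0
for _ in range(50):               # iterate until convergence
    # E-step: expected count of 'true', filling '?' with current theta
    exp_true = sum(theta if x == "?" else float(x) for x in data)
    # M-step: ML update of the parameter
    theta = exp_true / len(data)

print(theta)   # converges to 2/3, the frequency among observed values
```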

  41. Model Selection – Model Theoretic
  [Diagram: the learning algorithm takes a database, a Bayesian language (bt/1, pc/1, mc/1) and logical background knowledge (mother/2, father/2), and outputs a model]

  42. Model Selection – Model Theoretic
  • Combination of ILP and BN learning
  • Combinatorial search for a hypothesis M* such that
    • M* logically covers the data D
    • M* is optimal w.r.t. some scoring function, i.e., M* = argmax_M score(M, D)
  • Highlights: refinement operators, background knowledge, language bias, search bias

  43. Refinement Operators
  Add a fact, delete a fact, or refine an existing clause:
  • Specialization:
    • add an atom
    • apply a substitution {X / Y} where X, Y already appear in the atom
    • apply a substitution {X / f(Y1, ..., Yn)} where the Yi are new variables
    • apply a substitution {X / c} where c is a constant
  • Generalization:
    • delete an atom
    • turn a term into a variable: p(a,f(b)) becomes p(X,f(b)) or p(a,f(X)); p(a,a) becomes p(X,X), p(a,X) or p(X,a)
    • replace two occurrences of a variable X by X1 and X2: p(X,X) becomes p(X1,X2)
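Two of the operators above can be sketched as term manipulations. The clause encoding here is hypothetical (head plus list of body atoms, atoms as tuples, variables as uppercase strings), not any ILP system's API:

```python
def add_atom(clause, atom):
    """Specialization: extend the body of a clause by one atom."""
    head, body = clause
    return (head, body + [atom])

def substitute(clause, old_var, new_term):
    """Apply a substitution {old_var / new_term} to head and body."""
    def sub(term):
        if term == old_var:
            return new_term
        if isinstance(term, tuple):            # compound term f(t1, ...)
            return tuple(sub(t) for t in term)
        return term
    head, body = clause
    return (sub(head), [sub(a) for a in body])

c = (("bt", "X"), [("mc", "X")])
c1 = add_atom(c, ("pc", "X"))                      # bt(X) :- mc(X), pc(X).
c2 = substitute((("p", "X", "Y"), []), "Y", "X")   # {Y / X}: p(X,X).
print(c1, c2)
```

The generalization operators (delete an atom, turn a term into a fresh variable) would be the inverses of these two functions.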

  44. Example
  Original program:
  mc(X) | m(M,X), mc(M), pc(M).
  pc(X) | f(F,X), mc(F), pc(F).
  bt(X) | mc(X), pc(X).
  Data cases:
  {m(ann,john)=true, pc(ann)=a, mc(ann)=?, f(eric,john)=true, pc(eric)=b, mc(eric)=a, mc(john)=ab, pc(john)=a, bt(john)=?} ...
  [Figure: ground network in which mc(john) and pc(john) depend on m(ann,john), mc(ann), pc(ann) and f(eric,john), mc(eric), pc(eric), and bt(john) depends on mc(john) and pc(john)]

  45. Example
  Initial hypothesis:
  mc(X) | m(M,X).
  pc(X) | f(F,X).
  bt(X) | mc(X).
  [Figure: the ground network induced by the initial hypothesis]

  46. Example
  (Same hypothesis and ground network as the previous slide; animation step.)

  47. Example
  Refinement (add pc(X) to the bt clause):
  mc(X) | m(M,X).
  pc(X) | f(F,X).
  bt(X) | mc(X), pc(X).

  48. Example
  Candidate refinements of the hypothesis:
  (a) mc(X) | m(M,X), mc(X).   pc(X) | f(F,X).   bt(X) | mc(X), pc(X).
  (b) mc(X) | m(M,X).          pc(X) | f(F,X).   bt(X) | mc(X), pc(X).

  49. Example
  Candidate refinements:
  (a) mc(X) | m(M,X), pc(X).   pc(X) | f(F,X).   bt(X) | mc(X), pc(X).
  (b) mc(X) | m(M,X).          pc(X) | f(F,X).   bt(X) | mc(X), pc(X).

  50. Example
  (As the previous slide; the candidate refinements are scored against the data cases on the ground network, and the search continues.)
