
Statistical Relational Learning

Joint work with Sriraam Natarajan, Kristian Kersting, and Jude Shavlik.


Presentation Transcript


  1. Statistical Relational Learning Joint Work with Sriraam Natarajan, Kristian Kersting, Jude Shavlik

  2. Bayesian Networks Burglary Earthquake Alarm MaryCalls JohnCalls

  3. Bayesian Network for a City (Diagram: the Burglary, Earthquake, Alarm network is repeated for each of the houses H1 to H5, with Calls nodes for neighboring houses, Calls(H1) through Calls(H6).)

  4. Shared Variables Earthquake(BL) Burglary(H1) Burglary(H4) Burglary(H3) Burglary(H2) Alarm(H3) Alarm(H4) Alarm(H2) Alarm(H1) Calls(H5) Calls(H4) Calls(H1) Calls(H2) Calls(H3)

  5. First Order Logic HouseInCity(house, city) Earthquake(city) Burglary(house) Neighbor(house, nhouse) Alarm(house) Calls(nhouse) • Alarm(house) :- HouseInCity(house, city), Earthquake(city), Burglary(house)
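
The rule on this slide can be read as a template: each binding of house and city yields one ground rule, and in the SRL reading the body atoms become the parents of Alarm(house). The following is a minimal sketch of that grounding step only. The function name `ground_alarm_rule` and the constants `h1`, `h2`, `bl` are hypothetical, and it assumes each house lies in exactly one city.

```python
def ground_alarm_rule(house_in_city):
    """Ground Alarm(house) :- HouseInCity(house, city), Earthquake(city), Burglary(house).

    house_in_city is a set of (house, city) pairs (hypothetical facts).
    Returns a dict mapping each ground head atom to its list of body atoms.
    """
    parents = {}
    for house, city in sorted(house_in_city):
        parents[f"Alarm({house})"] = [
            f"HouseInCity({house}, {city})",
            f"Earthquake({city})",
            f"Burglary({house})",
        ]
    return parents


# Two houses in the same city share the single Earthquake(bl) atom.
parents = ground_alarm_rule({("h1", "bl"), ("h2", "bl")})
for head, body in parents.items():
    print(head, ":-", ", ".join(body))
```

Both ground rules mention the same Earthquake(bl) atom, which is the sharing shown on the Shared Variables slide.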

  6. Logic + Probability = Statistical Relational Learning Models • Logic + add probabilities → Statistical Relational Learning (SRL) • Probabilities + add relations → Statistical Relational Learning (SRL) (Diagram nodes: Diff, CRating, PRating)

  7. Alphabetic Soup • Knowledge-based model construction [Wellman et al., 1992] • PRISM [Sato & Kameya, 1997] • Stochastic logic programs [Muggleton, 1996] • Probabilistic relational models [Friedman et al., 1999] • Bayesian logic programs [Kersting & De Raedt, 2001] • Bayesian logic [Milch et al., 2005] • Markov logic [Richardson & Domingos, 2006] • Relational dependency networks [Neville & Jensen, 2007] • ProbLog [De Raedt et al., 2007] And many others!

  8. Relational Database

  9. First Order Logic Student(S) Prof(P) IQ(S,I) Level(P,L) satis(S,B) taughtBy(P,C) takes(S,C) ratings(P,C,R) grade(S,C,G) Course(C) Diff(C)

  10. Graphical Model grades(S, C, G) Diff(S, C, D) avgDiff(S, D) avgGrade(S, G) satisfaction(S, B) P(satisfaction(S, B) | avgGrade(S, G), avgDiff(S, D))
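
The avgGrade and avgDiff atoms on this slide summarize a variable number of ground facts per student into a single parent value. A minimal sketch of that aggregation is shown here, assuming numeric grade and difficulty values. The helper `aggregate_by_student` and all facts are hypothetical and not taken from the slides.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_student(facts):
    """Average the last argument of (student, course, value) facts per student."""
    values = defaultdict(list)
    for student, _course, value in facts:
        values[student].append(value)
    return {student: mean(vals) for student, vals in values.items()}


# Hypothetical ground facts for grades(S, C, G) and Diff(S, C, D).
grades = [("ann", "c1", 4.0), ("ann", "c2", 3.0), ("bob", "c1", 2.0)]
diffs = [("ann", "c1", 2.0), ("ann", "c2", 3.0), ("bob", "c1", 2.0)]

avg_grade = aggregate_by_student(grades)  # plays the role of avgGrade(S, G)
avg_diff = aggregate_by_student(diffs)    # plays the role of avgDiff(S, D)
print(avg_grade, avg_diff)
```

With the aggregates in place, satisfaction(S, B) has a fixed number of parents per student, regardless of how many courses the student took.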

  11. Relational Decision Tree speed(X,S), S>120 yes no job(X, politician) N yes no N knows(X,Y) no yes Y job(Y, politician) no yes Y N

  12. Relational Decision Tree speed(Alice,150), 150>120 yes no job(X, politician) N yes no N knows(X,Y) no yes Y job(Y, politician) no yes Y N

  13. Relational Decision Tree speed(Alice,150), 150>120 yes no job(Alice, politician) N yes no N knows(X,Y) no yes Y job(Y, politician) no yes Y N

  14. Relational Decision Tree speed(Alice,150), 150>120 yes no job(Alice, politician) N yes no N knows(Alice,John) no yes Y job(Y, politician) no yes Y N

  15. Relational Decision Tree speed(Alice,150), 150>120 yes no job(Alice, politician) N yes no N knows(Alice,John) no yes Y job(John, politician) no yes Y N

  16. Relational Decision Tree speed(Alice,150), 150>120 yes no job(Alice, politician) N yes no N knows(Alice,John) no yes Y job(John, politician) no yes Y N

  17. Relational Decision Tree speed(Alice,150), 150>120 yes no job(Alice, politician) N yes no N knows(Alice,John) no yes Y job(John, politician) no yes Y N
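
The walk-through on the preceding slides binds X to Alice and Y to John while descending the tree. The sketch below evaluates such a tree under one reading of the flattened diagram: speed over 120, then job(X, politician), then knows(X,Y) together with job(Y, politician), with Y and N as the two class labels. The branch layout, the function `classify`, and the example facts are assumptions, not confirmed by the slides.

```python
def classify(person, speed, jobs, knows):
    """Evaluate the assumed relational decision tree for one person.

    speed: dict person -> speed; jobs: dict person -> job;
    knows: dict person -> list of acquaintances. All facts are hypothetical.
    """
    # Root test: speed(X, S), S > 120
    if speed.get(person, 0) <= 120:
        return "N"
    # Test: job(X, politician)
    if jobs.get(person) == "politician":
        return "N"
    # Test: knows(X, Y), true if some binding for Y exists
    acquaintances = knows.get(person, [])
    if not acquaintances:
        return "Y"
    # Test: job(Y, politician), for some Y that X knows
    if any(jobs.get(y) == "politician" for y in acquaintances):
        return "N"
    return "Y"


speed = {"alice": 150}
knows = {"alice": ["john"]}
print(classify("alice", speed, {"john": "politician"}, knows))
print(classify("alice", speed, {"john": "teacher"}, knows))
```

The tests that mention Y are existential: a branch succeeds if any binding of Y satisfies it, which is what distinguishes a relational tree from a propositional one.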

  18. Relational Probability Trees • Use probabilities on the leaves • Can be used to represent the conditional distributions • Can use regression values on leaves to represent regression functions (Diagram: a relational tree with tests speed(X,S), S>120; job(X, politician); knows(X,Y); job(Y, politician) and leaf values 0.1, 0.2, 0.8, 0.8, 0.4.)

  19. Structure Learning Problem • Learn the structure of the conditional distributions • Find the parents and the distribution for the target concept (Diagram nodes: avgGrade(S, G), avgDiff(S, D), IQ(S, I), level(P, L), satisfaction(S, B))

  20. Relational Tree Learning adviser(X) paper(X, Y) student(X) student(X) student(X) = T student(X) = F paper(X,Y) 0.25 -0.9 paper(X,Y) = T paper(X,Y) = F 0.7 -0.2

  21. Functional Gradient Boosting • Sequentially learn models where each subsequent model corrects the previous model (Diagram: start from an initial model; compute residues = data - predictions; induce a new model ψm on the residues; iterate; final model = initial model + the sum of the induced models.) Natarajan et al., MLJ’12

  22. Boosting Algorithm
      For each gradient step m = 1 to M
        For each query predicate, P
          Generate trainset using previous model, F_{m-1}
            For each example, x
              Compute gradient for x
              Add <x, gradient(x)> to trainset
          Learn a regression function, T_{m,p}
          Add T_{m,p} to the model, F_m
        Set F_m as current model
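
The loop on this slide can be sketched in a simplified propositional setting. This is an illustration and not the authors' implementation: it uses a single target instead of several query predicates, one numeric feature instead of relational examples, and a regression stump in place of a relational regression tree. The gradient used is I(y = 1) - P(y = 1 | x) with P given by a sigmoid of the current model, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, targets):
    """Least-squares regression stump on one numeric feature."""
    best = None
    for t in sorted(set(xs)):
        left = [g for x, g in zip(xs, targets) if x <= t]
        right = [g for x, g in zip(xs, targets) if x > t]
        if not left or not right:
            continue
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((g - lm) ** 2 for g in left) + sum((g - rm) ** 2 for g in right)
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    if best is None:  # no valid split: fall back to a constant
        m = sum(targets) / len(targets)
        return lambda x: m
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def boost(xs, ys, steps=10):
    """Return psi, the sum of the regression functions learned in each step."""
    trees = []

    def psi(x):
        return sum(tree(x) for tree in trees)

    for _ in range(steps):
        # Build the training set <x, gradient(x)> from the previous model.
        gradients = [y - sigmoid(psi(x)) for x, y in zip(xs, ys)]
        # Learn a regression function on the gradients and add it to the model.
        trees.append(fit_stump(xs, gradients))
    return psi


psi = boost([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1], steps=10)
print(sigmoid(psi(1)), sigmoid(psi(6)))
```

Each step fits the residual between the labels and the current predicted probabilities, so the summed model moves the probabilities toward the labels step by step.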

  23. UW-CSE • Predict advisedBy relation • Given student, professor, courseTA, courseProf, etc. relations • 5-fold cross validation http://pages.cs.wisc.edu/~tushar/rdnboost/index.html

  24. CARDIA • Family history, medical history, physical activity, nutrient intake, obesity questions, psychosocial, pulmonary function, etc. • Goal is to identify risk factors in early adulthood that cause serious cardiovascular issues in older adults • Extremely rich dataset with 25 years of information S. Natarajan, J. Carr

  25. Results

  26. Imitation Learning • Expert agent performs actions (trajectories) • Goal: Learn a policy from these trajectories to suggest actions based on current state Natarajan et al. IJCAI’11

  27. Gridworld domain Robocup domain

  28. Alzheimer's Research • AD – Progressive neurodegenerative condition resulting in loss of cognitive abilities and memory • MRI – neuroimaging method • Visualization of brain anatomy • Humans are not very good at identifying people with AD, especially before cognitive decline • MRI data – major source for distinguishing AD vs CN (Cognitively normal) or MCI vs CN Natarajan et al. Under review

  29. Propositional Models (with AAL)

  30. Conclusion • Statistical Relational Learning combines first-order logic with probabilistic models • Relational trees are used to represent conditional distributions • Boosting trees can be used to efficiently learn the structure of SRL models
