Statistical Relational Learning

Joint work with Sriraam Natarajan, Kristian Kersting, Jude Shavlik

Bayesian Networks
[Figure: network over the nodes Burglary, Earthquake, Alarm, MaryCalls, JohnCalls]

Bayesian Network for a City
[Figure: the Burglary, Earthquake, Alarm network repeated for houses H1 to H5, each with Calls nodes for other houses (Calls(H1) to Calls(H6))]

Shared Variables
[Figure: a single Earthquake(BL) node alongside per-house nodes Burglary(H1) to Burglary(H4), Alarm(H1) to Alarm(H4), and Calls(H1) to Calls(H5)]

First Order Logic
Predicates: HouseInCity(house, city), Earthquake(city), Burglary(house), Neighbor(house, nhouse), Alarm(house), Calls(nhouse)
• Alarm(house) :- HouseInCity(house, city), Earthquake(city), Burglary(house)
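As a minimal sketch of how the Alarm rule above grounds out over concrete facts, the following evaluates the rule body against small fact sets. The house and city names (h1, h2, h3, bl, sf) and the helper derive_alarm are hypothetical and chosen only for illustration; the rule is applied as a plain logical rule, with no probabilities attached yet.

```python
# Hypothetical ground facts; the house and city names are made up for illustration.
house_in_city = {("h1", "bl"), ("h2", "bl"), ("h3", "sf")}
earthquake = {"bl"}
burglary = {"h1", "h3"}


def derive_alarm(house_in_city, earthquake, burglary):
    """Ground the rule
    Alarm(house) :- HouseInCity(house, city), Earthquake(city), Burglary(house)
    and return every house for which the rule body is satisfied."""
    return {
        house
        for (house, city) in house_in_city
        if city in earthquake and house in burglary
    }


alarm = derive_alarm(house_in_city, earthquake, burglary)
print(sorted(alarm))
```

Because the rule is written over variables, the same function applies unchanged when houses or cities are added to the fact sets.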

Logic + Probability = Statistical Relational Learning Models
[Figure: adding probabilities to logic, or adding relations to probabilistic models, yields Statistical Relational Learning (SRL); example nodes Diff, CRating, PRating]

Alphabetic Soup
• Knowledge-based model construction [Wellman et al., 1992]
• PRISM [Sato & Kameya, 1997]
• Stochastic logic programs [Muggleton, 1996]
• Probabilistic relational models [Friedman et al., 1999]
• Bayesian logic programs [Kersting & De Raedt, 2001]
• Bayesian logic [Milch et al., 2005]
• Markov logic [Richardson & Domingos, 2006]
• Relational dependency networks [Neville & Jensen, 2007]
• ProbLog [De Raedt et al., 2007]
And many others!

Relational Database

First Order Logic
Student(S), Prof(P), IQ(S,I), Level(P,L), satis(S,B), taughtBy(P,C), takes(S,C), ratings(P,C,R), grade(S,C,G), Course(C), Diff(C)

Graphical Model
[Figure: nodes grades(S, C, G), Diff(S, C, D), avgDiff(S, D), avgGrade(S, G), satisfaction(S, B)]
P(satisfaction(S, B) | avgGrade(S, G), avgDiff(S, D))
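The aggregate nodes avgGrade(S, G) and avgDiff(S, D) summarize a variable number of per-course facts into one value per student. A minimal sketch, assuming both aggregates are plain arithmetic means; the student names, course names, numeric values, and the helper average_by_student are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical ground facts grades(S, C, G) and Diff(S, C, D); all values are made up.
grades = [("ann", "c1", 4.0), ("ann", "c2", 3.0), ("bob", "c1", 2.0)]
diff = [("ann", "c1", 1.0), ("ann", "c2", 3.0), ("bob", "c1", 1.0)]


def average_by_student(facts):
    """Group (student, course, value) facts by student and average the values."""
    values = defaultdict(list)
    for student, _course, value in facts:
        values[student].append(value)
    return {student: mean(vs) for student, vs in values.items()}


avg_grade = average_by_student(grades)  # plays the role of avgGrade(S, G)
avg_diff = average_by_student(diff)     # plays the role of avgDiff(S, D)
print(avg_grade, avg_diff)
```

Aggregation gives every student a fixed-size parent set for satisfaction(S, B), however many courses the student takes.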

Relational Decision Tree
speed(X,S), S>120
  no: N
  yes: job(X, politician)
    yes: N
    no: knows(X,Y)
      no: Y
      yes: job(Y, politician)
        no: Y
        yes: N

Evaluating the tree for Alice binds the variables one test at a time:
• speed(Alice,150), 150>120
• job(Alice, politician)
• knows(Alice,John)
• job(John, politician)
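The walk through the tree for Alice can be sketched in code. This is an illustrative reading of the flattened figure, not a definitive one: it assumes the "no" branch of the speed test and the "yes" branches of both job tests lead to N, and the remaining leaves lead to Y. It also assumes Y stays bound along the "yes" branch of knows(X,Y), so the last test asks whether some known Y is a politician. The fact that John is a politician is hypothetical, as is the function name classify.

```python
# Facts from the slides, plus one hypothetical fact (John's job).
speed = {"alice": 150}
knows = {("alice", "john")}
job = {("john", "politician")}  # hypothetical; the slides do not say


def classify(x, speed, job, knows):
    """Evaluate the relational decision tree for entity x (assumed leaf layout)."""
    s = speed.get(x)
    if s is None or s <= 120:           # speed(X,S), S>120 fails
        return "N"
    if (x, "politician") in job:        # job(X, politician) succeeds
        return "N"
    known = {y for (a, y) in knows if a == x}
    if not known:                       # knows(X,Y) fails for every Y
        return "Y"
    # Y stays bound from knows(X,Y): test job(Y, politician) for some known Y.
    if any((y, "politician") in job for y in known):
        return "N"
    return "Y"


print(classify("alice", speed, job, knows))
```

The existential handling of Y is the part that makes the tree relational: the last test refers to an object introduced by an earlier test rather than to an attribute of X itself.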

Relational Probability Trees
• Use probabilities on the leaves
• Can be used to represent the conditional distributions
• Can use regression values on leaves to represent regression functions
[Figure: tree with tests speed(X,S), S>120; job(X, politician); knows(X,Y); job(Y, politician) and leaf probabilities 0.1, 0.2, 0.8, 0.8, 0.4]

Structure Learning Problem
• Learn the structure of the conditional distributions
• Find the parents and the distribution for the target concept
[Figure: nodes avgGrade(S, G), avgDiff(S, D), IQ(S, I), level(P, L), satisfaction(S, B)]

Relational Tree Learning
[Figure: relational regression tree over adviser(X), paper(X, Y), student(X); internal tests student(X) and paper(X, Y) with T/F branches; leaf values 0.25, -0.9, 0.7, -0.2]

Functional Gradient Boosting
• Sequentially learn models where each subsequent model corrects the previous model
[Figure: start from an initial model (ψm); Residues = Data - Predictions; induce a new model from the residues; iterate; Final Model = sum of the induced models]
Natarajan et al MLJ’12

Boosting Algorithm
For each gradient step m = 1 to M
  For each query predicate, P
    Generate trainset using previous model, Fm-1
      For each example, x
        Compute gradient for x
        Add <x, gradient(x)> to trainset
    Learn a regression function, Tm,p
    Add Tm,p to the model, Fm
  Set Fm as current model
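The loop above can be sketched for a single query predicate. This is a propositional simplification under stated assumptions, not the relational learner itself: examples are tuples of boolean features, the regression function learned at each step is a one-feature stump rather than a relational regression tree, and the gradient for an example is taken to be I(y = 1) - P(y = 1 | x) with P given by a sigmoid of the summed model. The names sigmoid, fit_stump, and boost and the toy data are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(examples, gradients):
    """Learn a regression stump: pick the boolean feature whose split
    minimizes squared error on the gradients."""
    best = None
    for f in range(len(examples[0])):
        on = [g for x, g in zip(examples, gradients) if x[f]]
        off = [g for x, g in zip(examples, gradients) if not x[f]]
        on_mean = sum(on) / len(on) if on else 0.0
        off_mean = sum(off) / len(off) if off else 0.0
        error = (sum((g - on_mean) ** 2 for g in on)
                 + sum((g - off_mean) ** 2 for g in off))
        if best is None or error < best[0]:
            best = (error, f, on_mean, off_mean)
    _, f, on_mean, off_mean = best
    return lambda x: on_mean if x[f] else off_mean


def boost(examples, labels, steps=10):
    model = []  # F_m is the sum of all regression functions learned so far

    def psi(x):
        return sum(tree(x) for tree in model)

    for _ in range(steps):
        # Trainset: <x, gradient(x)> computed from the previous model F_{m-1}.
        gradients = [y - sigmoid(psi(x)) for x, y in zip(examples, labels)]
        # Learn a regression function on the gradients and add it to the model.
        model.append(fit_stump(examples, gradients))
    return lambda x: sigmoid(psi(x))


# Toy data: the label equals the first feature; the second feature is noise.
examples = [(True, False), (True, True), (False, True), (False, False)]
labels = [1, 1, 0, 0]
predict = boost(examples, labels)
print(predict((True, False)), predict((False, True)))
```

Each step fits only the part of the target that the current model still gets wrong, which is why the individual regression functions can stay small.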

UW-CSE
• Predict advisedBy relation
• Given student, professor, courseTA, courseProf, etc. relations
• 5-fold cross validation
http://pages.cs.wisc.edu/~tushar/rdnboost/index.html

CARDIA
• Family history, medical history, physical activity, nutrient intake, obesity questions, psychosocial, pulmonary function, etc.
• Goal is to identify risk factors in early adulthood that cause serious cardiovascular issues in older adults
• Extremely rich dataset with 25 years of information
S. Natarajan, J. Carr

Results

Imitation Learning
• Expert agent performs actions (trajectories)
• Goal: Learn a policy from these trajectories to suggest actions based on current state
Natarajan et al. IJCAI’11

[Figures: Gridworld domain; Robocup domain]

Alzheimer's Research
• AD: Progressive neurodegenerative condition resulting in loss of cognitive abilities and memory
• MRI: neuroimaging method
• Visualization of brain anatomy
• Humans are not very good at identifying people with AD, especially before cognitive decline
• MRI data: major source for distinguishing AD vs CN (Cognitively normal) or MCI vs CN
Natarajan et al. Under review

Propositional Models (with AAL)

Conclusion
• Statistical Relational Learning combines first-order logic with probabilistic models
• Relational trees used to represent conditional distributions
• Boosting trees can be used to efficiently learn structure of SRL models
