Statistical Relational Learning: An Introduction

Presentation Transcript

  1. Statistical Relational Learning: An Introduction Lise Getoor University of Maryland, College Park September 5, 2007 Progic 2007

  2. Statistical Relational Learning: A biased Introduction (the title slide annotated with a caret inserting “biased”) Lise Getoor University of Maryland, College Park September 5, 2007 Progic 2007

  3. acknowledgements • Statistical Relational Learning (SRL) is a synthesis of ideas of many individuals who have participated in various SRL events, workshops and classes: • Hendrik Blockeel, Mark Craven, James Cussens, Bruce D’Ambrosio, Luc De Raedt, Tom Dietterich, Pedro Domingos, Saso Dzeroski, Peter Flach, Rob Holte, Manfred Jaeger, David Jensen, Kristian Kersting, Daphne Koller, Heikki Mannila, Andrew McCallum, Tom Mitchell, Ray Mooney, Stephen Muggleton, Kevin Murphy, Jen Neville, David Page, Avi Pfeffer, Claudia Perlich, David Poole, Foster Provost, Dan Roth, Stuart Russell, Taisuke Sato, Jude Shavlik, Ben Taskar, Lyle Ungar and many others…

  4. Why SRL? • Traditional statistical machine learning approaches assume: • A random sample of homogeneous objects from a single relation • Traditional relational learning approaches assume: • No noise or uncertainty in data • Real world data sets: • Multi-relational and heterogeneous • Noisy and uncertain • Statistical Relational Learning (SRL): • newly emerging research area at the intersection of statistical models and relational learning/inductive logic programming • Sample Domains: • web data, social networks, biological data, communication data, customer networks, sensor networks, natural language, vision, …

  5.–6. SRL Theory • Methods that combine expressive knowledge representation formalisms such as relational and first-order logic with principled probabilistic and statistical approaches to inference and learning • Directed Approaches • Semantics based on Bayesian Networks • Frame-based Directed Models • Rule-based Directed Models • Undirected Approaches • Semantics based on Markov Networks • Frame-based Undirected Models • Rule-based Undirected Models • Process-based Approaches

  7. Directed Frame-based Approaches • Probabilistic Relational Models (PRMs) • Representation & Inference [Koller & Pfeffer 98, Pfeffer, Koller, Milch &Takusagawa 99, Pfeffer 00] • Learning [Friedman et al. 99, Getoor, Friedman, Koller & Taskar 01 & 02, Getoor 01] • Probabilistic Entity Relation Models (PERs) • Representation [Heckerman, Meek & Koller 04] • Logical syntax for PRMs (PRL) [Getoor & Grant 06]

  8. Probabilistic Relational Models • BN Tutorial • PRMs w/ Attribute Uncertainty • Inference in PRMs • Learning in PRMs • PRMs w/ Structural Uncertainty • PRMs w/ Class Hierarchies

  9. Bayesian Networks • nodes = domain variables • edges = direct causal influence • Variables: Smart, Good Writer, Reviewer Mood, Quality, Review Length, Accepted • Example conditional probability table (CPT) for P(Q | W, S); the four rows cover the W/S value combinations (the negation marks were lost in transcription):
      w, s:   0.6  0.4
      w, ¬s:  0.3  0.7
      ¬w, s:  0.4  0.6
      ¬w, ¬s: 0.1  0.9
  • Network structure encodes conditional independencies: I(Review-Length, Good-Writer | Reviewer-Mood)

  10. BN Semantics • Compact & natural representation: • nodes with at most k parents need O(2^k · n) parameters vs. O(2^n) for the full joint • natural parameters • conditional independencies in BN structure + local CPTs = full joint distribution over domain
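The parameter-count claim on this slide is easy to verify directly. A minimal sketch; the choice of n = 20 variables and k = 3 parents is illustrative, not from the slides:

```python
# Parameter counting for a BN over n binary variables.
# Full joint table: 2**n - 1 free parameters.
# BN with at most k parents per node: each CPT has at most 2**k rows,
# so at most n * 2**k parameters overall.

def full_joint_params(n: int) -> int:
    return 2 ** n - 1

def bn_params(n: int, k: int) -> int:
    return n * (2 ** k)

print(full_joint_params(20))  # 1048575
print(bn_params(20, 3))       # 160
```

Even for a modest 20-variable domain the factored representation is several orders of magnitude smaller.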

  11. Reasoning in BNs • Full joint distribution answers any query P(event | evidence) • Allows combination of different types of reasoning: • Causal: P(Reviewer-Mood | Good-Writer) • Evidential: P(Reviewer-Mood | not Accepted) • Intercausal: P(Reviewer-Mood | not Accepted, Quality)

  12. Variable Elimination • To compute a query we manipulate factors. A factor is a function from values of variables to positive real numbers, e.g. a factor over (Mood, Good-Writer):
      pissy, false: 0.9
      pissy, true:  0.1
      good, false:  0.7
      good, true:   0.3
  (The query expression on this slide was an equation image not captured in the transcript.)

  13.–18. Variable Elimination (continued) • To compute the query, eliminate variables one at a time: sum out L, producing a new factor; multiply the factors that mention W together, then sum out W, producing another new factor; repeat until only the query variables remain. (The intermediate equations on these slides were images not captured in the transcript.)
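The multiply-and-sum-out steps of slides 12–18 can be made concrete with factors represented as dictionaries from value tuples to numbers. A minimal sketch over binary variables; the names M and L and the toy probabilities are illustrative, not the slides’ missing derivation:

```python
from itertools import product

def multiply(f1, vars1, f2, vars2):
    """Pointwise product of two factors over binary variables."""
    allvars = sorted(set(vars1) | set(vars2))
    out = {}
    for vals in product([False, True], repeat=len(allvars)):
        a = dict(zip(allvars, vals))
        k1 = tuple(a[v] for v in vars1)
        k2 = tuple(a[v] for v in vars2)
        out[vals] = f1[k1] * f2[k2]
    return out, allvars

def sum_out(f, vars_, var):
    """Eliminate `var` by summing it out of factor f."""
    i = vars_.index(var)
    keep = [v for v in vars_ if v != var]
    out = {}
    for key, p in f.items():
        newkey = key[:i] + key[i + 1:]
        out[newkey] = out.get(newkey, 0.0) + p
    return out, keep

# P(M) and P(L | M) as factors; eliminating L should recover P(M).
pM = {(False,): 0.3, (True,): 0.7}
pL_given_M = {(False, False): 0.4, (False, True): 0.6,
              (True, False): 0.8, (True, True): 0.2}   # keyed (M, L)
joint, jvars = multiply(pM, ['M'], pL_given_M, ['M', 'L'])
fM, mvars = sum_out(joint, jvars, 'L')
print(mvars, {k: round(v, 6) for k, v in fM.items()})
# ['M'] {(False,): 0.3, (True,): 0.7}
```

Real implementations order the eliminations to keep intermediate factors small; this sketch just shows the two primitive operations.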

  19. Other Inference Algorithms • Exact • Junction Tree [Lauritzen & Spiegelhalter 88] • Cutset Conditioning [Pearl 87] • Approximate • Loopy Belief Propagation [McEliece et al 98] • Likelihood Weighting [Shwe & Cooper 91] • Markov Chain Monte Carlo [e.g. MacKay 98] • Gibbs Sampling [Geman & Geman 84] • Metropolis-Hastings [Metropolis et al 53, Hastings 70] • Variational Methods [Jordan et al 98]

  20. Learning BNs • Four settings, by what is learned and what is observed: {parameters only, structure and parameters} × {complete data, incomplete data} • See [Heckerman 98] for a general introduction

  21. BN Parameter Estimation • Assume known dependency structure G • Goal: estimate BN parameters θ • entries in the local probability models • θ is good if it’s likely to generate the observed data • MLE Principle: choose θ* so as to maximize the likelihood • Alternative: incorporate a prior

  22. Learning With Complete Data • Fully observed data: data consists of a set of instances, each with a value for all BN variables • With fully observed data, we can compute counts N(X = x, Pa(X) = u) = number of instances with X = x and parents Pa(X) = u, and similarly for other counts • We then estimate θ_{x|u} = N(X = x, Pa(X) = u) / N(Pa(X) = u)
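The counting estimate above can be sketched in a few lines; the toy data and the single Smart → Accepted parent-child pair are made up for illustration:

```python
from collections import Counter

# ML estimation theta_{x|u} = N(X=x, Pa(X)=u) / N(Pa(X)=u),
# here for one child (Accepted) with one parent (Smart).
data = [  # (smart, accepted) per instance
    (True, True), (True, True), (True, False),
    (False, True), (False, False), (False, False),
]
pair_counts = Counter(data)
parent_counts = Counter(smart for smart, _ in data)

def mle(x, u):
    """Estimated P(Accepted = x | Smart = u) from counts."""
    return pair_counts[(u, x)] / parent_counts[u]

print(round(mle(True, True), 3), round(mle(True, False), 3))  # 0.667 0.333
```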

  23. Dealing w/ missing values • Can’t compute the counts directly • But can use Expectation Maximization (EM) • Given parameter values, can compute expected counts (this requires BN inference) • Given expected counts, estimate parameters • Begin with arbitrary parameter values • Iterate these two steps • Converges to a local maximum of the likelihood
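The E-step/M-step loop can be illustrated on a deliberately tiny model: a single binary variable with some values missing at random, so the E-step needs no BN inference. This is an assumption-laden sketch of the iteration, not the general BN algorithm:

```python
# EM for a single binary variable X with P(X=1) = theta, where some
# instances are unobserved (None).
# E-step: fill in the expected count of X=1 using the current theta.
# M-step: re-estimate theta from the expected counts.
data = [1, 1, 1, 0, None, None]

theta = 0.5  # arbitrary starting point
for _ in range(100):
    expected_ones = sum(theta if x is None else x for x in data)  # E-step
    theta = expected_ones / len(data)                             # M-step

print(round(theta, 4))  # 0.75
```

Here the missing values carry no information, so EM converges to the observed-data estimate 3/4; in a BN the E-step instead computes expected counts for each CPT entry via inference.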

  24. Structure search • Begin with an empty network • Consider all neighbors reached by a search operator that keeps the graph acyclic: • add an edge • remove an edge • reverse an edge • For each neighbor: • compute ML parameter values • compute its score • Choose the neighbor with the highest score • Continue until reaching a local maximum
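Two mechanical pieces of this search, generating add/remove/reverse neighbors and checking acyclicity, can be sketched as follows. The scoring step is omitted, since the score formula on this slide did not survive transcription:

```python
from itertools import permutations

def is_acyclic(nodes, edges):
    """Kahn-style check: repeatedly peel off nodes with no incoming edge."""
    edges = set(edges)
    remaining = set(nodes)
    while remaining:
        sources = {n for n in remaining
                   if not any(v == n for _, v in edges)}
        if not sources:
            return False  # every remaining node has an incoming edge: a cycle
        remaining -= sources
        edges = {(u, v) for u, v in edges if u not in sources}
    return True

def neighbors(nodes, edges):
    """All graphs one add/remove/reverse step away, filtered for acyclicity."""
    edges = set(edges)
    result = []
    for u, v in permutations(nodes, 2):
        if (u, v) in edges:
            result.append(edges - {(u, v)})                # remove the edge
            result.append((edges - {(u, v)}) | {(v, u)})   # reverse the edge
        elif (v, u) not in edges:
            result.append(edges | {(u, v)})                # add a new edge
    return [g for g in result if is_acyclic(nodes, g)]

print(is_acyclic(['A', 'B', 'C'], {('A', 'B'), ('B', 'C'), ('C', 'A')}))  # False
```

The greedy loop then scores each neighbor (e.g. with a penalized likelihood) and moves to the best one until no neighbor improves the score.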

  25. Mini-BN Tutorial Summary • Representation – probability distribution factored according to the BN DAG • Inference – exact + approximate • Learning – parameters + structure

  26. Probabilistic Relational Models • BN Tutorial • PRMs w/ Attribute Uncertainty • Inference in PRMs • Learning in PRMs • PRMs w/ Structural Uncertainty • PRMs w/ Class Hierarchies

  27. Relational Schema • Describes the types of objects and relations in the world • Classes and attributes: Author (Good Writer, Smart), Review (Mood, Length), Paper (Quality, Accepted) • Relations: Author-of (Author–Paper), Has-Review (Paper–Review)

  28. Probabilistic Relational Model • [Figure: PRM dependency structure over the Author (Smart, Good Writer), Review (Mood, Length), and Paper (Quality, Accepted) classes]

  29. Probabilistic Relational Model • Local model for acceptance: P(Paper.Accepted | Paper.Quality, Paper.Review.Mood) • [Figure: the same PRM dependency structure as the previous slide]

  30. Probabilistic Relational Model • CPT P(A | Q, M):
      Q, M = f, f: 0.1  0.9
      Q, M = f, t: 0.2  0.8
      Q, M = t, f: 0.6  0.4
      Q, M = t, t: 0.7  0.3

  31. Relational Skeleton • Fixed relational skeleton: the set of objects in each class and the relations between them (represented via primary and foreign keys) • Example: Authors A1, A2; Papers P1 (Author: A1, Review: R1), P2 (Author: A1, Review: R2), P3 (Author: A2, Review: R2); Reviews R1, R2

  32. PRM w/ Attribute Uncertainty • The PRM defines a distribution over instantiations of attributes • [Figure: the skeleton of slide 31, plus Review R3, with each object’s attributes (Smart, Good Writer, Mood, Length, Quality, Accepted) shown as random variables]

  33. A Portion of the BN • [Figure: ground-BN fragment over r2.Mood, r3.Mood, P2.Quality, P3.Quality, P2.Accepted, P3.Accepted; both Accepted nodes carry the shared CPT P(A | Q, M) from slide 30; the values Low and Pissy appear as evidence]

  34. A Portion of the BN • [Figure: the same fragment with evidence High, Low, Pissy, Pissy on the Quality and Mood nodes; the Accepted nodes carry the shared CPT P(A | Q, M)]

  35. PRM: Aggregate Dependencies • [Figure: Paper P1 (Quality, Accepted) linked to Reviews R1–R3 (Mood, Length); P1.Accepted depends on the moods of a variable number of reviews]

  36. PRM: Aggregate Dependencies • A paper’s acceptance depends on an aggregate of the moods of its (variable number of) reviews; here mode is chosen • Possible aggregates: sum, min, max, avg, mode, count • [Figure: Paper P1 (Quality, Accepted) with Reviews R1–R3 (Mood, Length), and the CPT P(A | Q, M) from slide 30 attached to Accepted]
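Aggregation as on this slide can be sketched directly; here the mode of a variable-size multiset of review moods (the mood values are illustrative):

```python
from statistics import multimode

# The Accepted CPT cannot reference a variable-length set of parents
# directly; an aggregate collapses the multiset of review moods into a
# single parent value.
review_moods = ['pissy', 'good', 'pissy']
agg_mood = multimode(review_moods)[0]  # mode; ties broken by first appearance
print(agg_mood)  # pissy
```

Numeric aggregates such as sum, min, max, avg, and count work the same way, each yielding one fixed-arity parent for the CPT.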

  37. PRM with AU Semantics • PRM + relational skeleton = probability distribution over completions I • [Figure: the objects (Authors A1, A2; Papers P1–P3; Reviews R1–R3) and their attributes]

  38. Probabilistic Relational Models • BN Tutorial • PRMs w/ Attribute Uncertainty • Inference in PRMs • Learning in PRMs • PRMs w/ Structural Uncertainty • PRMs w/ Class Hierarchies

  39. PRM Inference • Simple idea: enumerate all attributes of all objects • Construct a Bayesian network over all the attributes
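The “enumerate all attributes of all objects” step can be sketched as a simple grounding loop; the skeleton below mirrors the running example, with class and attribute names as on the slides:

```python
# One ground BN node per (object, attribute) pair.
attributes = {
    'Author': ['Smart', 'GoodWriter'],
    'Paper': ['Quality', 'Accepted'],
    'Review': ['Mood', 'Length'],
}
skeleton = {
    'Author': ['A1'],
    'Paper': ['P1', 'P2'],
    'Review': ['R1', 'R2', 'R3', 'R4'],
}
ground_nodes = [f'{obj}.{attr}'
                for cls, objs in skeleton.items()
                for obj in objs
                for attr in attributes[cls]]
print(len(ground_nodes), ground_nodes[:2])  # 14 ['A1.Smart', 'A1.GoodWriter']
```

The edges of the ground BN then follow the PRM’s class-level dependencies, instantiated per object, which is exactly why the constructed network can grow very large.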

  40. Inference Example • Skeleton: Author A1; Papers P1, P2; Reviews R1–R4 • Query is P(A1.Good-Writer) • Evidence is P1.Accepted = T, P2.Accepted = T

  41. PRM Inference: Constructed BN • [Figure: ground BN over A1.Smart, A1.Good Writer, P1.Quality, P2.Quality, P1.Accepted, P2.Accepted, and the Mood and Length of R1–R4]

  42. PRM Inference • Problems with this approach: • constructed BN may be very large • doesn’t exploit object structure • Better approach: • reason about objects themselves • reason about whole classes of objects • In particular, exploit: • reuse of inference • encapsulation of objects

  43. PRM Inference: Interfaces • Variables pertaining to R2: inputs and internal attributes • [Figure: R2.Mood and R2.Length highlighted alongside A1.Smart, A1.Good Writer, P1.Quality, P1.Accepted, R1.Mood, R1.Length]

  44. PRM Inference: Interfaces • Interface: imported and exported attributes • [Figure: the interface between R2 and the rest of the network: A1.Smart, A1.Good Writer, R2.Mood, R2.Length, P1.Quality, P1.Accepted]

  45. PRM Inference: Encapsulation • R1 and R2 are encapsulated inside P1 • [Figure: the constructed BN of slide 41 with R1 and R2 drawn inside P1’s boundary]

  46. PRM Inference: Reuse • [Figure: the constructed BN of slide 41, highlighting that the identical P1 and P2 substructures let inference computations be reused]

  47.–48. Structured Variable Elimination • [Figure: component tree rooted at Author 1 (A1.Smart, A1.Good Writer) with children Paper-1 and Paper-2]

  49.–50. Structured Variable Elimination • [Figure: recursing into Paper 1 (P1.Quality, P1.Accepted) with children Review-1 and Review-2]