
Statistical Learning from Relational Data






Presentation Transcript


  1. Statistical Learning from Relational Data Daphne Koller Stanford University Joint work with many many people

  2. Relational Data is Everywhere • The web • Webpages (& the entities they represent), hyperlinks • Social networks • People, institutions, friendship links • Biological data • Genes, proteins, interactions, regulation • Bibliometrics • Papers, authors, journals, citations • Corporate databases • Customers, products, transactions

  3. Relational Data is Different • Data instances not independent • Topics of linked webpages are correlated • Data instances are not identically distributed • Heterogeneous instances (papers, authors) • No IID assumption → this is a good thing!

  4. New Learning Tasks • Collective classification of related instances • Labeling an entire website of related webpages • Relational clustering • Finding coherent clusters in the genome • Link prediction & classification • Predicting when two people are likely to be friends • Pattern detection in network of related objects • Finding groups (research groups, terrorist groups)

  5. Probabilistic Models • Uncertainty model: • space of “possible worlds”; • probability distribution over this space. • Worlds: often defined via a set of state variables • medical diagnosis: diseases, symptoms, findings, … each world: an assignment of values to variables • Number of worlds is exponential in # of vars • 2^n if we have n binary variables
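The slide's counting claim is easy to check directly; a tiny sketch, with hypothetical variable names for a toy medical-diagnosis domain:

```python
from itertools import product

# Four hypothetical binary state variables (illustrative names only).
variables = ["Flu", "Fever", "Cough", "Headache"]

# Each possible world assigns a value to every variable.
worlds = list(product([False, True], repeat=len(variables)))

print(len(worlds))  # 2**4 = 16 possible worlds
```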

  6. Outline • Relational Bayesian networks* • Relational Markov networks • Collective Classification • Relational clustering * with Avi Pfeffer, Nir Friedman, Lise Getoor

  7. Bayesian Networks • Nodes = variables, edges = direct influence • CPD per variable given its parents, e.g. P(G | D, I) • Graph structure encodes independence assumptions: Letter conditionally independent of Intelligence given Grade [Figure: student network with nodes Difficulty, Intelligence, Grade (values A/B/C), SAT, Letter, Job]
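The independence claim on this slide can be checked numerically. A minimal sketch of the student network: the graph structure follows the slide, but every probability number below is made up for illustration.

```python
# Prior CPDs (all numbers invented).
P_D = {"easy": 0.6, "hard": 0.4}            # Difficulty
P_I = {"low": 0.7, "high": 0.3}             # Intelligence
P_G = {                                      # P(Grade | Difficulty, Intelligence)
    ("easy", "low"):  {"A": 0.30, "B": 0.40, "C": 0.30},
    ("easy", "high"): {"A": 0.90, "B": 0.08, "C": 0.02},
    ("hard", "low"):  {"A": 0.05, "B": 0.25, "C": 0.70},
    ("hard", "high"): {"A": 0.50, "B": 0.30, "C": 0.20},
}
P_L = {"A": {"yes": 0.9, "no": 0.1},         # P(Letter | Grade)
       "B": {"yes": 0.6, "no": 0.4},
       "C": {"yes": 0.1, "no": 0.9}}

def joint(d, i, g, l):
    # Chain rule along the graph: P(D) P(I) P(G|D,I) P(L|G).
    return P_D[d] * P_I[i] * P_G[(d, i)][g] * P_L[g][l]

def p_letter_given(g, i):
    # P(Letter = yes | Grade = g, Intelligence = i), marginalizing Difficulty.
    num = sum(joint(d, i, g, "yes") for d in P_D)
    den = sum(joint(d, i, g, l) for d in P_D for l in ("yes", "no"))
    return num / den

# Letter ⊥ Intelligence | Grade: the conditional is the same for both i values.
assert abs(p_letter_given("A", "low") - p_letter_given("A", "high")) < 1e-12
```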

  8. Bayesian Networks: Problem • Bayesian nets use propositional representation • Real world has objects, related to each other • These “instances” are not independent [Figure: ground variables Intell_Jane, Intell_George, Diffic_CS101, Diffic_Geo101, Grade_Jane_CS101 (A), Grade_George_CS101 (C), Grade_George_Geo101]

  9. Relational Schema • Specifies types of objects in domain, attributes of each type of object & types of relations between objects [Figure: classes Student (Intelligence), Professor (Teaching-Ability), Course (Difficulty), Registration (Grade, Satisfaction); relations Teach, Take, In]
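In code, a relational schema like this is just typed classes with attributes and relations; a sketch, where the class and attribute names follow the slide and everything else is illustrative:

```python
from dataclasses import dataclass

@dataclass
class Professor:
    name: str
    teaching_ability: str  # e.g. "low" / "high"

@dataclass
class Student:
    name: str
    intelligence: str

@dataclass
class Course:
    name: str
    difficulty: str
    taught_by: Professor   # Teach relation

@dataclass
class Registration:        # Take / In relations link a student to a course
    student: Student
    course: Course
    grade: str
    satisfaction: str
```

A registration links one student to one course, which is why Grade and Satisfaction live on the relation rather than on Student or Course.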

  10. St. Nordaf University • An example world [Figure: Prof. Jones and Prof. Smith (each with a Teaching-ability) teach Geo101 and CS101 (each with a Difficulty); George and Jane (each with an Intelligence) are registered in courses, each registration carrying a Grade and a Satisfac(tion)]

  11. Relational Bayesian Networks • Universals: Probabilistic patterns hold for all objects in class • Locality: Represent direct probabilistic dependencies • Links define potential interactions [Figure: template model over Professor.Teaching-Ability, Student.Intelligence, Course.Difficulty, Reg.Grade (A/B/C), Reg.Satisfaction] [K. & Pfeffer; Poole; Ngo & Haddawy]

  12. RBN Semantics • Ground model: • variables: attributes of all objects • dependencies: determined by relational links & template model [Figure: ground network over Prof. Jones and Prof. Smith (Teaching-ability), George and Jane (Intelligence), Geo101 and CS101 (Difficulty), with a Grade and Satisfac(tion) variable per registration]
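The ground-model construction can be sketched mechanically: one ground variable per (object, attribute) pair, with parents read off the relational links. The dependency templates below are assumptions matching the earlier slides (Grade depends on Difficulty and Intelligence; Satisfaction on the professor's Teaching-Ability), not the exact published model:

```python
# A tiny skeleton world, following the St. Nordaf University example.
students = ["George", "Jane"]
courses = {"Geo101": "Prof. Jones", "CS101": "Prof. Smith"}
registrations = [("George", "Geo101"), ("George", "CS101"), ("Jane", "CS101")]

# Ground model: map each ground variable to its list of parent variables.
ground_vars = {}
for s in students:
    ground_vars[f"Intelligence({s})"] = []                      # no parents
for c, prof in courses.items():
    ground_vars[f"Difficulty({c})"] = []
    ground_vars[f"Teaching-Ability({prof})"] = []
for s, c in registrations:
    # Template: Reg.Grade depends on Course.Difficulty and Student.Intelligence.
    ground_vars[f"Grade({s},{c})"] = [f"Difficulty({c})", f"Intelligence({s})"]
    # Template (assumed): Reg.Satisfaction depends on the professor's ability.
    ground_vars[f"Satisfaction({s},{c})"] = [f"Teaching-Ability({courses[c]})"]
```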

  13. The Web of Influence [Figure: evidence about grades (an A in CS101, a C in Geo101) propagates through the ground network, shifting beliefs about course difficulty (easy/hard) and student intelligence (low/high)]

  14. Outline • Relational Bayesian networks* • Relational Markov networks† • Collective Classification • Relational clustering * with Avi Pfeffer, Nir Friedman, Lise Getoor † with Ben Taskar, Pieter Abbeel

  15. Why Undirected Models? • Symmetric, non-causal interactions • E.g., web: categories of linked pages are correlated • Cannot introduce direct edges because of cycles • Patterns involving multiple entities • E.g., web: “triangle” patterns • Directed edges not appropriate • “Solution”: Impose arbitrary direction • Not clear how to parameterize CPD for variables involved in multiple interactions • Very difficult within a class-based parameterization [Taskar, Abbeel, K. 2001]

  16. Markov Networks [Figure: a Markov network over people (James, Mary, Kyle, Noah, Laura), with a shared template potential on each clique of linked people]

  17. Relational Markov Networks • Universals: Probabilistic patterns hold for all groups of objects • Locality: Represent local probabilistic dependencies • Sets of links give us possible interactions [Figure: template potential over Student1.Intelligence, Reg1.Grade, Course.Difficulty, Reg2.Grade, Student2.Intelligence for students in the same Study Group]

  18. RMN Semantics [Figure: ground network for Geo101 and CS101: the grades of George, Jane, and Jill are linked through the Geo and CS study groups, together with course Difficulty and student Intelligence variables]

  19. Outline • Relational Bayesian Networks • Relational Markov Networks • Collective Classification* • Discriminative training • Web page classification • Link prediction • Relational clustering * with Ben Taskar, Carlos Guestrin, Ming Fai Wong, Pieter Abbeel

  20. Collective Classification • Training data: features D.x and labels D.y* • Learning: fit a probabilistic relational model, given the model structure • New data: features D’.x only • Inference: predict labels D’.y • Example: train on one year of student intelligence, course difficulty, and grades; given only grades in the following year, predict all students’ intelligence

  21. Learning RMN Parameters • Parameterize potentials as log-linear model [Figure: template potential over Student1.Intelligence, Reg1.Grade, Course.Difficulty, Reg2.Grade, Student2.Intelligence for a study group]
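A log-linear template potential has the form φ(y₁, y₂) = exp(Σₖ wₖ fₖ(y₁, y₂)); a sketch with one hypothetical feature and a made-up weight:

```python
import math

def template_potential(w, features, g1, g2):
    # phi(g1, g2) = exp(sum_k w_k * f_k(g1, g2)), the log-linear form.
    return math.exp(sum(wk * fk(g1, g2) for wk, fk in zip(w, features)))

def same_grade(g1, g2):
    # Hypothetical feature: study partners tend to get the same grade.
    return 1.0 if g1 == g2 else 0.0

w = [1.2]  # a single learned weight (invented value for illustration)

print(template_potential(w, [same_grade], "A", "A"))  # exp(1.2): agreement favored
print(template_potential(w, [same_grade], "A", "C"))  # exp(0) = 1.0: no preference
```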

  22. Max Likelihood Estimation • Estimation: maximize_w P(D.x, D.y* | w) • Classification: argmax_y P(D.x, D.y | w) • But we don’t care about the joint distribution P(D.x, D.y)

  23. WebKB [Craven et al.] [Figure: Tom Mitchell (Professor) is Advisor-of Sean Slattery (Student); both are Members of the WebKB Project]

  24. Web Classification Experiments • WebKB dataset • Four CS department websites • Bag of words on each page • Links between pages • Anchor text for links • Experimental setup • Trained on three universities • Tested on fourth • Repeated for all four combinations

  25. Standard Classification • Classify each page independently from its own words: Word1 … WordN → Category • Page categories: faculty, course, project, student, other [Figure: example page containing words such as “Professor”, “department”, “extract information”, “computer science”, “machine learning”, …]

  26. Standard Classification • Add anchor-text words from incoming links (LinkWord1 … LinkWordN), e.g. “working with Tom Mitchell …” • Discriminatively trained naïve Markov = logistic regression • 4-fold CV: trained on 3 universities, tested on the 4th [Chart: test-set error of the Logistic baseline]
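The “naïve Markov = logistic regression” point reduces, in the binary case, to P(category | words) = σ(w·f(words) + b). A sketch with an invented vocabulary and invented weights:

```python
import math

# Hypothetical per-word weights for a binary "faculty page?" classifier.
weights = {"professor": 1.5, "course": -0.8, "homework": -1.2, "research": 0.9}
bias = -0.2

def p_faculty(words):
    # Logistic regression over bag-of-words features:
    # sigmoid of the summed weights of the words that appear.
    score = bias + sum(weights.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-score))

print(p_faculty(["professor", "research"]))  # high: looks like a faculty page
print(p_faculty(["course", "homework"]))     # low: looks like a course page
```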

  27. Power of Context Professor? Student? Post-doc?

  28. Collective Classification [Figure: linked From-Page and To-Page, each with a Category and words Word1 … WordN; a Compatibility(From, To) potential on the link]

  29. Collective Classification • Classify all pages collectively, maximizing the joint label probability [Chart: test-set error, Logistic baseline vs. Links model] [Taskar, Abbeel, K., 2002]

  30. More Complex Structure

  31. More Complex Structure [Figure: page words W1 … Wn and category C, with section variables S linking Students, Faculty, and Courses pages]

  32. Collective Classification: Results • 35.4% error reduction over logistic [Chart: test-set error for the Logistic, Links, Section, and Link+Section models] [Taskar, Abbeel, K., 2002]

  33. Max Conditional Likelihood • Estimation: maximize_w P(D.y* | D.x, w) • Classification: argmax_y P(D.y | D.x, w) • But we don’t care about the conditional distribution P(D.y | D.x)

  34. Max Margin Estimation • What we really want: correct class labels • Estimation: maximize the margin γ over ||w|| = 1, requiring the true labeling D.y* to beat every alternative y by a margin that scales with the number of labeling mistakes in y • Classification: argmax_y over D.x • A quadratic program, but with exponentially many constraints!  [Taskar, Guestrin, K., 2003] (see also [Collins, 2002; Hoffman 2003])
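Written out (a reconstruction from the slide's pieces, using a generic feature map f; Δ counts labeling mistakes), the program is roughly:

```latex
\max_{\|w\| = 1} \; \gamma
\quad \text{s.t.} \quad
w^{\top} f(x, y^{*}) \;\ge\; w^{\top} f(x, y) + \gamma \, \Delta(y^{*}, y)
\qquad \forall y
```

with Δ(y*, y) the number of labeling mistakes in y; there is one constraint per joint labeling y, hence exponentially many.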

  35. Max Margin Markov Networks • We use structure of Markov network to provide equivalent formulation of QP • Exponential only in tree width of network • Complexity = max-likelihood classification • Can solve approximately in networks where induced width is too large • Analogous to loopy belief propagation • Can use kernel-based features! • SVMs meet graphical models [Taskar, Guestrin, K., 2003]

  36. WebKB Revisited 16.1% relative reduction in error relative to cond. likelihood RMNs

  37. Predicting Relationships • Even more interesting: relationships between objects [Figure: Tom Mitchell (Professor) Advisor-of Sean Slattery (Student); both Members of the WebKB Project]

  38. Predicting Relations • Introduce exists/type attribute for each potential link • Learn discriminative model for this attribute • Collectively predict its value in new world • 72.9% error reduction over flat [Figure: From-Page and To-Page with Category, words Word1 … WordN, link words LinkWord1 … LinkWordN, and a Relation Exists/Type variable] [Taskar, Wong, Abbeel, K., 2003]

  39. Outline • Relational Bayesian Networks • Relational Markov Networks • Collective Classification • Relational clustering • Movie data* • Biological data† * with Ben Taskar, Eran Segal † with Eran Segal, Nir Friedman, Aviv Regev, Dana Pe’er, Haidong Wang, Micha Shapira, David Botstein

  40. Relational Clustering • Input: unlabeled relational data plus model structure • Learning yields a probabilistic relational model and a clustering of instances • Example: given only students’ grades, cluster similar students

  41. Learning w. Missing Data: EM • EM Algorithm applies essentially unchanged • E-step computes expected sufficient statistics, aggregated over all objects in class • M-step uses ML (or MAP) parameter estimation • Key difference: • In general, the hidden variables are not independent • Computation of expected sufficient statistics requires inference over entire network

  42. Learning w. Missing Data: EM [Dempster et al. 77] [Figure: iterate between inferring hidden Student.Intelligence (low/high) and Course.Difficulty (easy/hard), and re-estimating P(Registration.Grade | Course.Difficulty, Student.Intelligence) over grades A/B/C]
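The E-step/M-step pattern the slides describe looks the same in a much simpler, non-relational setting. A self-contained sketch: a two-cluster naïve Bayes mixture over binary “pass/fail” grades in three courses (all data and initial values invented):

```python
import math

# Each row: one student's pass(1)/fail(0) outcome in three courses.
data = [(1, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 1),
        (0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 0, 1)]

pi = [0.5, 0.5]                              # mixing weights
p = [[0.6, 0.6, 0.6], [0.4, 0.4, 0.4]]       # p[c][j] = P(pass course j | cluster c)

def likelihood(c, x):
    return pi[c] * math.prod(p[c][j] if xj else 1 - p[c][j]
                             for j, xj in enumerate(x))

for _ in range(100):
    # E-step: expected cluster responsibilities (expected sufficient statistics).
    resp = []
    for x in data:
        lik = [likelihood(0, x), likelihood(1, x)]
        z = sum(lik)
        resp.append([l / z for l in lik])
    # M-step: ML re-estimation from the expected sufficient statistics.
    for c in range(2):
        nc = sum(r[c] for r in resp)
        pi[c] = nc / len(data)
        p[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nc
                for j in range(3)]
```

In the relational case the E-step is the expensive part: the hidden variables are coupled through the ground network, so computing the expected statistics requires inference over the entire network rather than the per-instance sums above.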

  43. Movie Data Internet Movie Database http://www.imdb.com

  44. Discovering Hidden Types • Add a hidden Type variable to Actor, Director, and Movie • Learn model using EM [Figure: Movie attributes Rating, Genres, #Votes, Year, MPAA Rating; each class gets a hidden Type] [Taskar, Segal, K., 2001]

  45. Discovering Hidden Types [Taskar, Segal, K., 2001] [Figure: discovered type clusters] • Directors: {Alfred Hitchcock, Stanley Kubrick, David Lean, Milos Forman, Terry Gilliam, Francis Coppola}; {Steven Spielberg, Tim Burton, Tony Scott, James Cameron, John McTiernan, Joel Schumacher} • Movies: {Wizard of Oz, Cinderella, Sound of Music, The Love Bug, Pollyanna, The Parent Trap, Mary Poppins, Swiss Family Robinson}; {Terminator 2, Batman, Batman Forever, GoldenEye, Starship Troopers, Mission: Impossible, Hunt for Red October} • Actors: {Sylvester Stallone, Bruce Willis, Harrison Ford, Steven Seagal, Kurt Russell, Kevin Costner, Jean-Claude Van Damme, Arnold Schwarzenegger}; {Anthony Hopkins, Robert De Niro, Tommy Lee Jones, Harvey Keitel, Morgan Freeman, Gary Oldman} …

  46. Biology 101: Gene Expression • Cells express different subsets of their genes in different tissues and under different conditions [Figure: DNA with coding and control regions for Gene 1 and Gene 2; the transcription factor Swi5 binds a control region; genes are transcribed to RNA and translated to protein]

  47. Gene Expression Microarrays • Measure mRNA level for all genes in one condition • Hundreds of experiments • Highly noisy [Figure: expression matrix, rows = genes, columns = experiments; entry (i, j) = expression of gene i in experiment j; genes shown as induced or repressed]

  48. Standard Analysis: Clustering • Cluster genes by similarity of expression profiles • Manually examine clusters to understand what’s common to genes in cluster

  49. General Approach • Expression level of Gene i in Experiment j is a function of gene properties and experiment properties • Observed properties: gene sequence, array condition, … • Hidden properties: gene cluster • Learn the model that best explains the data: the assignment to hidden variables (e.g., module assignment) and the expression level as a function of the properties

  50. Clustering as a PRM • Naïve Bayes: hidden cluster variable g.C (values 1, 2, 3, …) with children g.E1, g.E2, …, g.Ek • One CPD per experiment (CPD 1 … CPD k): P(Ei.L | g.C) [Figure: Gene class with Cluster ID, Experiment class, Expression Level]
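Cluster assignment in this naïve Bayes PRM is a posterior over g.C given the expression levels: P(g.C = c | g.E1 … g.Ek) ∝ P(c) · Πᵢ P(Ei.L | c). A sketch using Gaussian CPDs with invented parameters (the slide does not specify the CPD form, so the Gaussian choice is an assumption):

```python
import math

priors = [0.5, 0.5]                               # P(g.C = c)
means = [[-1.0, -0.8, -1.2], [1.1, 0.9, 1.0]]     # means[c][i]: mean level of Ei in cluster c
sigma = 0.5                                       # shared standard deviation

def gauss(x, mu):
    # Gaussian density N(x; mu, sigma^2).
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def cluster_posterior(levels):
    # P(c | E1..Ek) via Bayes' rule with the naive Bayes factorization.
    scores = [priors[c] * math.prod(gauss(x, means[c][i])
                                    for i, x in enumerate(levels))
              for c in range(2)]
    z = sum(scores)
    return [s / z for s in scores]

# A consistently repressed gene lands firmly in the "repressed" cluster 0.
post = cluster_posterior([-1.0, -0.9, -1.1])
```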
