
Graphical Models


Presentation Transcript


  1. Graphical Models Tamara L Berg CSE 595 Words & Pictures

  2. Announcements • HW3 online tonight • Start thinking about project ideas • Project proposals in class Oct 30 • Come to office hours Oct 23-25 to discuss ideas • Projects should be completed in groups of 3 • Topics are open to your choice, but need to involve some text and some visual analysis.

  3. Project Types • Implement an existing paper • Do something new • Do something related to your research

  4. Data Sources • Flickr • API provided for search. Example code to retrieve images and associated textual meta-data available here: http://graphics.cs.cmu.edu/projects/im2gps/ • SBU Captioned Photo Dataset • 1 million user-captioned photographs. Data links available here: http://dsl1.cewit.stonybrook.edu/~vicente/sbucaptions/ • ImageClef • 20,000 images with associated segmentations, tags, and descriptions available here: http://www.imageclef.org/ • Scrape data from your favorite web source (e.g. Wikipedia, Pinterest, Facebook) • Rip data from a movie/TV show and find the associated text scripts • Sign language • 6000-frame continuous signing sequence; for 296 frames, the positions of the left and right upper arm, lower arm, and hand were manually segmented: http://www.robots.ox.ac.uk/~vgg/data/sign_language/index.html

  5. Generative vs Discriminative Discriminative version – build a classifier to discriminate between monkeys and non-monkeys. P(monkey|image)

  6. Generative vs Discriminative Generative version - build a model of the joint distribution. P(image,monkey)

  7. Generative vs Discriminative Can use Bayes rule to compute p(monkey|image) if we know p(image,monkey)

  8. Generative vs Discriminative Can use Bayes rule to compute p(monkey|image) if we know p(image,monkey). Generative: p(image, monkey). Discriminative: p(monkey|image).
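Written out, the identity behind this slide is the following (a standard derivation, not reproduced from the slide images):

```latex
% Bayes rule recovers the discriminative quantity from the generative model:
p(\text{monkey} \mid \text{image})
  = \frac{p(\text{image}, \text{monkey})}{p(\text{image})}
  = \frac{p(\text{image}, \text{monkey})}{\sum_{c \in \{\text{monkey}, \lnot\text{monkey}\}} p(\text{image}, c)}
```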

  9. Talk Outline 1. Quick introduction to graphical models 2. Examples: - Naïve Bayes, pLSA, LDA

  10. Slide from Dan Klein

  11. Random Variables Random variables $X = (X_1, \dots, X_n)$. Let $x = (x_1, \dots, x_n)$ be a realization of $X$.

  12. Random Variables Random variables $X = (X_1, \dots, X_n)$. Let $x = (x_1, \dots, x_n)$ be a realization of $X$. A random variable is some aspect of the world about which we (may) have uncertainty. Random variables can be: binary (e.g. {true, false}, {spam, ham}), take on a discrete set of values (e.g. {Spring, Summer, Fall, Winter}), or be continuous (e.g. [0, 1]).

  13. Joint Probability Distribution Random variables $X = (X_1, \dots, X_n)$. Let $x = (x_1, \dots, x_n)$ be a realization of $X$. Joint Probability Distribution: $P(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n)$, also written $P(x_1, x_2, \dots, x_n)$. Gives a real value for all possible assignments.

  14. Queries Joint Probability Distribution: $P(X_1 = x_1, \dots, X_n = x_n)$, also written $P(x_1, \dots, x_n)$. Given a joint distribution, we can reason about unobserved variables given observations (evidence): $P(\text{stuff you care about} \mid \text{stuff you already know})$.

  15. Representation Joint Probability Distribution: $P(X_1 = x_1, \dots, X_n = x_n)$, also written $P(x_1, \dots, x_n)$. One way to represent the joint probability distribution for discrete $X_i$ is as an n-dimensional table, each cell containing the probability for a setting of $X$. This would have $k^n$ entries if each $x_i$ ranges over $k$ values.

  16. Representation Joint Probability Distribution: $P(X_1 = x_1, \dots, X_n = x_n)$, also written $P(x_1, \dots, x_n)$. One way to represent the joint probability distribution for discrete $X_i$ is as an n-dimensional table, each cell containing the probability for a setting of $X$. This would have $k^n$ entries if each $x_i$ ranges over $k$ values. Graphical Models!

  17. Representation Joint Probability Distribution: $P(X_1 = x_1, \dots, X_n = x_n)$, also written $P(x_1, \dots, x_n)$. Graphical models represent joint probability distributions more economically, using a set of “local” relationships among variables.
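To make the economy concrete, here is a small illustrative sketch (the numbers and the assumed maximum of two parents per node are not from the slides):

```python
# Full joint table vs. factored (directed graphical model) representation
# for n binary variables. All numbers are assumptions for illustration only.
n = 20                       # number of binary random variables
full_table_entries = 2 ** n  # one probability per joint assignment

max_parents = 2              # assume each node has at most 2 parents
factored_entries = n * (2 ** (max_parents + 1))  # one small CPT per node

print(full_table_entries)    # 1048576
print(factored_entries)      # 160
```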

  18. Graphical Models Graphical models offer several useful properties: 1. They provide a simple way to visualize the structure of a probabilistic model and can be used to design and motivate new models. 2. Insights into the properties of the model, including conditional independence properties, can be obtained by inspection of the graph. 3. Complex computations, required to perform inference and learning in sophisticated models, can be expressed in terms of graphical manipulations, in which underlying mathematical expressions are carried along implicitly. from Chris Bishop

  19. Main kinds of models • Undirected (also called Markov Random Fields) - links express constraints between variables. • Directed (also called Bayesian Networks) - have a notion of causality -- one can regard an arc from A to B as indicating that A "causes" B.

  20. Main kinds of models • Undirected (also called Markov Random Fields) - links express constraints between variables. • Directed (also called Bayesian Networks) - have a notion of causality -- one can regard an arc from A to B as indicating that A "causes" B.

  21. Directed Graphical Models Directed Graph, $G = (X, E)$. Nodes $X = \{x_1, \dots, x_n\}$; edges $E$ are directed links between nodes. • Each node is associated with a random variable

  22. Directed Graphical Models Directed Graph, $G = (X, E)$. Nodes $X = \{x_1, \dots, x_n\}$; edges $E$ are directed links between nodes. • Each node is associated with a random variable • Definition of joint probability in a graphical model: $p(x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid x_{\pi_i})$, where $x_{\pi_i}$ are the parents of $x_i$
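A minimal sketch of this factorization, assuming a small made-up graph and made-up conditional probability tables (none of this comes from the example slides that follow):

```python
# p(x_1, ..., x_n) = prod_i p(x_i | parents(x_i)) for a tiny hypothetical graph.
# parents[i] lists the parent indices of node i.
parents = {0: [], 1: [0], 2: [0], 3: [1, 2]}

# cpt[i] maps (value of x_i, tuple of parent values) -> p(x_i | parents).
cpt = {
    0: {(0, ()): 0.6, (1, ()): 0.4},
    1: {(0, (0,)): 0.7, (1, (0,)): 0.3, (0, (1,)): 0.2, (1, (1,)): 0.8},
    2: {(0, (0,)): 0.5, (1, (0,)): 0.5, (0, (1,)): 0.1, (1, (1,)): 0.9},
    3: {(0, (0, 0)): 0.9, (1, (0, 0)): 0.1, (0, (0, 1)): 0.4, (1, (0, 1)): 0.6,
        (0, (1, 0)): 0.3, (1, (1, 0)): 0.7, (0, (1, 1)): 0.05, (1, (1, 1)): 0.95},
}

def joint_probability(x):
    """p(x) as the product of local conditionals; x maps node index -> value."""
    p = 1.0
    for i in parents:
        parent_values = tuple(x[j] for j in parents[i])
        p *= cpt[i][(x[i], parent_values)]
    return p

print(joint_probability({0: 1, 1: 0, 2: 1, 3: 1}))  # 0.4 * 0.2 * 0.9 * 0.6 = 0.0432
```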

  23. Example

  24. Example Joint Probability:

  25. Example: conditional probability tables (binary variables) for the example graph.

  26. Conditional Independence Independence: $p(x, y) = p(x)\,p(y)$. Conditional Independence: $p(x, y \mid z) = p(x \mid z)\,p(y \mid z)$. Or, $p(x \mid y, z) = p(x \mid z)$.

  27. Conditional Independence

  28. Conditional Independence By Chain Rule (using the usual arithmetic ordering): $p(x_1, \dots, x_n) = p(x_1)\,p(x_2 \mid x_1)\,p(x_3 \mid x_1, x_2) \cdots p(x_n \mid x_1, \dots, x_{n-1})$

  29. Example Joint Probability:

  30. Conditional Independence By Chain Rule (using the usual arithmetic ordering): $p(x_1, \dots, x_n) = p(x_1)\,p(x_2 \mid x_1) \cdots p(x_n \mid x_1, \dots, x_{n-1})$. Joint distribution from the example graph:

  31. Conditional Independence By Chain Rule (using the usual arithmetic ordering): $p(x_1, \dots, x_n) = p(x_1)\,p(x_2 \mid x_1) \cdots p(x_n \mid x_1, \dots, x_{n-1})$. Joint distribution from the example graph: Missing variables in the local conditional probability functions correspond to missing edges in the underlying graph. Removing an edge into node $i$ eliminates an argument from the conditional probability factor $p(x_i \mid x_{\pi_i})$.
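As a concrete illustration of that last point, consider a hypothetical four-node graph (the actual example graph on the slides is an image and is not reproduced here):

```latex
% Chain rule over four variables (always valid, no independence assumed):
p(x_1, x_2, x_3, x_4) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1, x_2)\, p(x_4 \mid x_1, x_2, x_3)
% If the graph has only the edges x_1 \to x_2, x_1 \to x_3, and x_2 \to x_4,
% each missing edge deletes the corresponding argument:
p(x_1, x_2, x_3, x_4) = p(x_1)\, p(x_2 \mid x_1)\, p(x_3 \mid x_1)\, p(x_4 \mid x_2)
```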

  32. Observations • Graphs can have observed (shaded) and unobserved nodes. If nodes are always unobserved, they are called hidden or latent variables. • Probabilistic inference in graphical models is the problem of computing a conditional probability distribution over the values of some of the nodes (the “hidden” or “unobserved” nodes), given the values of other nodes (the “evidence” or “observed” nodes).

  33. Inference – computing conditional probabilities Conditional Probabilities: $p(y \mid e) = \dfrac{p(y, e)}{p(e)}$. Marginalization: $p(e) = \sum_{y} p(y, e)$.
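A brute-force sketch of this computation over an explicit joint table (the three-variable uniform joint below is made up purely to keep the example runnable):

```python
from itertools import product

# Inference by marginalization: p(query | evidence) = p(query, evidence) / p(evidence).
variables = ["a", "b", "c"]
joint = {x: 1.0 / 8.0 for x in product([0, 1], repeat=3)}  # uniform, for illustration

def conditional(query_var, query_val, evidence):
    """p(query_var = query_val | evidence); evidence maps variable name -> value."""
    idx = {v: i for i, v in enumerate(variables)}
    def consistent(assignment, constraints):
        return all(assignment[idx[v]] == val for v, val in constraints.items())
    p_evidence = sum(p for x, p in joint.items() if consistent(x, evidence))
    both = dict(evidence, **{query_var: query_val})
    p_both = sum(p for x, p in joint.items() if consistent(x, both))
    return p_both / p_evidence

print(conditional("a", 1, {"c": 0}))  # 0.5 for the uniform joint
```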

  34. Inference Algorithms • Exact algorithms • Elimination algorithm • Sum-product algorithm • Junction tree algorithm • Sampling algorithms • Importance sampling • Markov chain Monte Carlo • Variational algorithms • Mean field methods • Sum-product algorithm and variations • Semidefinite relaxations

  35. Talk Outline 1. Quick introduction to graphical models 2. Examples: - Naïve Bayes, pLSA, LDA

  36. A Simple Example – Naïve Bayes C – Class, F – Features. We only specify (parameters): $P(C)$ – prior over class labels; $P(F_i \mid C)$ – how each feature depends on the class

  37. A Simple Example – Naïve Bayes C – Class, F – Features. We only specify (parameters): $P(C)$ – prior over class labels; $P(F_i \mid C)$ – how each feature depends on the class

  38. A Simple Example – Naïve Bayes C – Class, F – Features. We only specify (parameters): $P(C)$ – prior over class labels; $P(F_i \mid C)$ – how each feature depends on the class
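Putting the two parameter tables together gives the Naïve Bayes joint distribution in the directed-model form introduced earlier (standard factorization, stated for completeness):

```latex
P(C, F_1, \dots, F_n) = P(C) \prod_{i=1}^{n} P(F_i \mid C)
```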

  39. Slide from Dan Klein

  40. Slide from Dan Klein

  41. Slide from Dan Klein

  42. Percentage of documents in training set labeled as spam/ham Slide from Dan Klein

  43. In the documents labeled as spam, occurrence percentage of each word (e.g. # times “the” occurred/# total words). Slide from Dan Klein

  44. In the documents labeled as ham, occurrence percentage of each word (e.g. # times “the” occurred/# total words). Slide from Dan Klein
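A sketch of these counting-based estimates; the tiny labeled "training set" below is invented for illustration and is not the data on Dan Klein's slides:

```python
from collections import Counter

# P(class): fraction of training documents with each label (slide 42).
# P(word | class): # times the word occurs in that class / total words in that class
# (slides 43-44). The documents below are made up.
training = [
    ("spam", "win money now"),
    ("spam", "win a prize now"),
    ("ham",  "meeting schedule for monday"),
    ("ham",  "schedule the project meeting"),
]

doc_counts = Counter()
word_counts = {"spam": Counter(), "ham": Counter()}
for label, text in training:
    doc_counts[label] += 1
    word_counts[label].update(text.split())

prior = {c: doc_counts[c] / sum(doc_counts.values()) for c in doc_counts}
likelihood = {
    c: {w: n / sum(counts.values()) for w, n in counts.items()}
    for c, counts in word_counts.items()
}

print(prior["spam"])              # 0.5
print(likelihood["spam"]["win"])  # 2/7, "win" occurred twice out of 7 spam words
```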

  45. Classification The class that maximizes: $c^* = \arg\max_{c} P(c) \prod_{i} P(f_i \mid c)$

  46. Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow

  47. Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow • Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities.

  48. Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow • Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities. • Since log is a monotonic function, the class with the highest score does not change.

  49. Classification • In practice • Multiplying lots of small probabilities can result in floating point underflow • Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities. • Since log is a monotonic function, the class with the highest score does not change. • So, what we usually compute in practice is: $c^* = \arg\max_{c} \left[ \log P(c) + \sum_{i} \log P(f_i \mid c) \right]$
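A minimal sketch of that log-space rule; the probability tables are invented numbers standing in for the estimates above, and smoothing of unseen words is omitted:

```python
import math

# Score each class as log P(c) + sum_i log P(f_i | c) and return the argmax.
prior = {"spam": 0.5, "ham": 0.5}
likelihood = {
    "spam": {"win": 0.30, "money": 0.20, "now": 0.25, "meeting": 0.05},
    "ham":  {"win": 0.02, "money": 0.05, "now": 0.08, "meeting": 0.30},
}

def classify(words):
    best_class, best_score = None, float("-inf")
    for c in prior:
        score = math.log(prior[c])
        for w in words:
            # A real implementation smooths P(w | c); here unseen words are skipped.
            if w in likelihood[c]:
                score += math.log(likelihood[c][w])
        if score > best_score:
            best_class, best_score = c, score
    return best_class

print(classify("win money now".split()))  # "spam" with these numbers
```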

  50. Naïve Bayes for modeling text/metadata topics
