Graphical Models

Tamara L Berg

CSE 595 Words & Pictures


Announcements

HW3 online tonight

Start thinking about project ideas

Project proposals in class Oct 30

Come to office hours Oct 23-25 to discuss ideas

Projects should be completed in groups of 3

Topics are open to your choice, but need to involve some text and some visual analysis.


Project Types

  • Implement an existing paper

  • Do something new

  • Do something related to your research


Data Sources

  • Flickr

    • API provided for search. Example code to retrieve images and associated textual meta-data available here: http://graphics.cs.cmu.edu/projects/im2gps/

  • SBU Captioned Photo Dataset

    • 1 million user-captioned photographs. Data links available here: http://dsl1.cewit.stonybrook.edu/~vicente/sbucaptions/

  • ImageClef

    • 20,000 images with associated segmentations, tags and descriptions available here: http://www.imageclef.org/

  • Scrape data from your favorite web source (e.g. Wikipedia, Pinterest, Facebook)

  • Rip data from a movie/tv show and find associated text scripts

  • Sign language

    • A 6000-frame continuous signing sequence; for 296 of the frames, the positions of the left and right upper arm, lower arm, and hand were manually segmented: http://www.robots.ox.ac.uk/~vgg/data/sign_language/index.html


Generative vs Discriminative

Discriminative version – build a classifier to discriminate between monkeys and non-monkeys.

P(monkey|image)


Generative vs Discriminative

Generative version - build a model of the joint distribution.

P(image,monkey)


Generative vs Discriminative

Can use Bayes rule to compute p(monkey|image) if we know p(image,monkey).

Generative: p(image, monkey)

Discriminative: p(monkey|image)
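As a reminder (standard Bayes' rule; the formula itself is not legible in this transcript), the generative model answers the discriminative query by normalizing the joint:

    p(\text{monkey} \mid \text{image}) = \frac{p(\text{image}, \text{monkey})}{p(\text{image})} = \frac{p(\text{image}, \text{monkey})}{p(\text{image}, \text{monkey}) + p(\text{image}, \neg\text{monkey})}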


Talk Outline

1. Quick introduction to graphical models

2. Examples:

- Naïve Bayes, pLSA, LDA



Random Variables

Let X = (X₁, …, Xₙ) be a set of random variables, and let x = (x₁, …, xₙ) be a realization of X.

A random variable is some aspect of the world about which we (may) have uncertainty.

Random variables can be:

  • Binary (e.g. {true, false}, {spam, ham}),

  • Discrete, taking values in a finite set (e.g. {Spring, Summer, Fall, Winter}),

  • Continuous (e.g. [0, 1]).


Joint Probability Distribution

For X = (X₁, …, Xₙ) and a realization x = (x₁, …, xₙ), the joint probability distribution is P(X₁ = x₁, X₂ = x₂, …, Xₙ = xₙ), also written P(x₁, x₂, …, xₙ).

It gives a real value for all possible assignments.


Queries

Given a joint distribution, we can reason about unobserved variables given observations (evidence):

P(Y | E = e), where Y is the stuff you care about and E = e is the stuff you already know.


Representation

One way to represent the joint probability distribution over discrete variables X₁, …, Xₙ is as an n-dimensional table, each cell containing the probability of one setting of X. This table would have kⁿ entries if each variable ranges over k values.

Graphical Models!

Representation

Graphical models represent joint probability distributions more economically, using a set of “local” relationships among variables.
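A worked size comparison (illustrative numbers of my own, not from the slides): for n = 10 binary variables,

    \underbrace{2^{n} - 1}_{\text{full joint table}} = 1023 \qquad \text{vs.} \qquad \underbrace{1 + 2(n-1)}_{\text{chain } X_1 \to X_2 \to \cdots \to X_n} = 19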


Graphical Models

Graphical models offer several useful properties:

1. They provide a simple way to visualize the structure of a probabilistic model and can be used to design and motivate new models.

2. Insights into the properties of the model, including conditional independence properties, can be obtained by inspection of the graph.

3. Complex computations, required to perform inference and learning in sophisticated models, can be expressed in terms of graphical manipulations, in which underlying mathematical expressions are carried along implicitly.

from Chris Bishop


Main kinds of models

  • Undirected (also called Markov Random Fields) - links express constraints between variables.

  • Directed (also called Bayesian Networks) - have a notion of causality -- one can regard an arc from A to B as indicating that A "causes" B.




Directed Graphical Models

Directed Graph, G = (X, E), with nodes X and edges E.

  • Each node is associated with a random variable

  • Definition of joint probability in a graphical model: P(x₁, …, xₙ) = ∏ᵢ P(xᵢ | x_πᵢ)

  • where πᵢ denotes the parents of node i
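A minimal sketch of evaluating this factorized joint, using a hypothetical three-node network Rain → WetGrass ← Sprinkler with invented numbers (the slides use their own, unspecified example graph):

    # Hypothetical CPTs for a tiny network: Rain -> WetGrass <- Sprinkler.
    # All variables are binary; the probabilities are made up for illustration.
    p_rain = {True: 0.2, False: 0.8}
    p_sprinkler = {True: 0.1, False: 0.9}
    p_wet = {  # P(WetGrass | Rain, Sprinkler)
        (True, True): {True: 0.99, False: 0.01},
        (True, False): {True: 0.90, False: 0.10},
        (False, True): {True: 0.80, False: 0.20},
        (False, False): {True: 0.05, False: 0.95},
    }

    def joint(rain, sprinkler, wet):
        """P(rain, sprinkler, wet) = P(rain) * P(sprinkler) * P(wet | rain, sprinkler)."""
        return p_rain[rain] * p_sprinkler[sprinkler] * p_wet[(rain, sprinkler)][wet]

    print(joint(True, False, True))  # 0.2 * 0.9 * 0.90 = 0.162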



Example

Joint Probability:


Example

(Table of the joint probability for each 0/1 assignment of the example graph's binary variables.)


Conditional Independence

Independence: P(X, Y) = P(X) P(Y)

Conditional Independence: P(X, Y | Z) = P(X | Z) P(Y | Z)

Or, equivalently, P(X | Y, Z) = P(X | Z)




Conditional Independence

By the chain rule (using the usual arithmetic ordering):

P(x₁, …, xₙ) = P(x₁) P(x₂ | x₁) P(x₃ | x₁, x₂) ⋯ P(xₙ | x₁, …, xₙ₋₁)

Joint distribution from the example graph:

Missing variables in the local conditional probability functions correspond to missing edges in the underlying graph.

Removing an edge into node i eliminates an argument from that node's conditional probability factor.
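A small illustration (my own example, not the graph from the slides): in a three-node chain x₁ → x₂ → x₃, the edge x₁ → x₃ is absent, so the chain-rule factor P(x₃ | x₁, x₂) loses the argument x₁:

    P(x_1, x_2, x_3) = P(x_1)\, P(x_2 \mid x_1)\, P(x_3 \mid x_2)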


Observations

  • Graphs can have observed (shaded) and unobserved nodes. If nodes are always unobserved, they are called hidden or latent variables.

  • Probabilistic inference in graphical models is the problem of computing a conditional probability distribution over the values of some of the nodes (the “hidden” or “unobserved” nodes), given the values of other nodes (the “evidence” or “observed” nodes).


Inference – computing conditional probabilities

Conditional Probabilities: P(Y | E = e) = P(Y, E = e) / P(E = e)

Marginalization: P(E = e) = Σ_y P(Y = y, E = e)
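A minimal sketch of this computation by brute-force marginalization, continuing the hypothetical Rain/Sprinkler/WetGrass example above (it reuses that joint() function; real inference algorithms avoid this exponential enumeration):

    def query_rain_given_wet(wet=True):
        """P(Rain | WetGrass = wet): sum the joint over the hidden Sprinkler node, then normalize."""
        unnorm = {r: sum(joint(r, s, wet) for s in (True, False)) for r in (True, False)}
        evidence = sum(unnorm.values())               # P(WetGrass = wet), by marginalization
        return {r: p / evidence for r, p in unnorm.items()}

    print(query_rain_given_wet(True))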


Inference Algorithms

  • Exact algorithms

    • Elimination algorithm

    • Sum-product algorithm

    • Junction tree algorithm

  • Sampling algorithms

    • Importance sampling

    • Markov chain Monte Carlo

  • Variational algorithms

    • Mean field methods

    • Sum-product algorithm and variations

    • Semidefinite relaxations


Talk Outline

1. Quick introduction to graphical models

2. Examples:

- Naïve Bayes, pLSA, LDA


A Simple Example – Naïve Bayes

C – Class

F – Features

We only specify (parameters):

P(C): prior over class labels

P(Fᵢ | C): how each feature depends on the class
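In symbols, this is the standard Naïve Bayes factorization (writing out what the slide's graph encodes):

    P(C, F_1, \ldots, F_n) = P(C) \prod_{i=1}^{n} P(F_i \mid C)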






In the documents labeled as spam, occurrence percentage of each word

(e.g. # times “the” occurred/# total words).

Slide from Dan Klein


In the documents labeled as ham, occurrence percentage of each word

(e.g. # times “the” occurred/# total words).

Slide from Dan Klein
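A minimal sketch of estimating these per-class word probabilities (toy data of my own; the add-one smoothing is my addition and is not mentioned on the slide):

    from collections import Counter

    def estimate_word_probs(class_docs, vocabulary):
        """P(word | class): occurrences of the word in the class's documents / total words, with add-one smoothing."""
        counts, total = Counter(), 0
        for doc in class_docs:                        # each doc is a list of tokens
            counts.update(doc)
            total += len(doc)
        V = len(vocabulary)
        return {w: (counts[w] + 1) / (total + V) for w in vocabulary}

    spam_docs = [["buy", "cheap", "pills"], ["cheap", "deal", "buy"]]
    ham_docs = [["meeting", "at", "noon"], ["see", "you", "at", "lunch"]]
    vocab = {w for d in spam_docs + ham_docs for w in d}
    p_word_given_spam = estimate_word_probs(spam_docs, vocab)
    p_word_given_ham = estimate_word_probs(ham_docs, vocab)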


Classification

The class that maximizes: c* = argmax_c P(c) ∏ᵢ P(fᵢ | c)


Classification

  • In practice

    • Multiplying lots of small probabilities can result in floating point underflow.

    • Since log(xy) = log(x) + log(y), we can sum log probabilities instead of multiplying probabilities.

    • Since log is a monotonic function, the class with the highest score does not change.

    • So, what we usually compute in practice is: c* = argmax_c [ log P(c) + Σᵢ log P(fᵢ | c) ]
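A minimal sketch of this log-space decision rule, reusing the toy word-probability tables built above:

    import math

    def classify(tokens, class_priors, word_probs):
        """Return the class maximizing log P(c) + sum_i log P(f_i | c)."""
        best_class, best_score = None, float("-inf")
        for c, prior in class_priors.items():
            score = math.log(prior)
            for w in tokens:
                if w in word_probs[c]:                # this sketch simply skips out-of-vocabulary words
                    score += math.log(word_probs[c][w])
            if score > best_score:
                best_class, best_score = c, score
        return best_class

    priors = {"spam": 0.5, "ham": 0.5}
    tables = {"spam": p_word_given_spam, "ham": p_word_given_ham}
    print(classify(["buy", "cheap", "pills"], priors, tables))   # -> "spam" on the toy data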


Naïve Bayes for modeling text/metadata topics


Harvesting Image Databases from the Web – Schroff, F., Criminisi, A. and Zisserman, A.

Download images from the web via a search query (e.g. penguin).

Re-rank images using a naïve Bayes model trained on text surrounding the images and meta-data features (image alt tag, image title tag, image filename).

Top ranked images used to train an SVM classifier to further improve ranking.


Results

(Figures of re-ranked image results from the paper.)


Naïve Bayes on images

Visual Categorization with Bags of Keypoints – Gabriella Csurka, Christopher R. Dance, Lixin Fan, Jutta Willamowski, Cédric Bray


Method

Steps:

  • Detect and describe image patches

  • Assign patch descriptors to a set of predetermined clusters (a vocabulary) with a vector quantization algorithm

  • Construct a bag of keypoints, which counts the number of patches assigned to each cluster

  • Apply a multi-class classifier (naïve Bayes), treating the bag of keypoints as the feature vector, and thus determine which category or categories to assign to the image.
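A minimal sketch of steps 2 and 3, vector-quantizing patch descriptors against a fixed vocabulary and building the bag-of-keypoints histogram (assumes descriptors and cluster centers already exist as NumPy arrays; the clustering itself, e.g. k-means, is not shown):

    import numpy as np

    def bag_of_keypoints(descriptors, vocabulary):
        """Assign each patch descriptor to its nearest cluster center and count assignments per cluster."""
        # descriptors: (num_patches, dim), vocabulary: (num_clusters, dim)
        dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)                      # closest visual word per patch
        histogram = np.bincount(nearest, minlength=len(vocabulary))
        return histogram                                    # feature vector fed to the classifier

    rng = np.random.default_rng(0)
    descs = rng.normal(size=(200, 128))                     # e.g. 200 SIFT-like descriptors
    vocab = rng.normal(size=(50, 128))                      # 50 visual words
    print(bag_of_keypoints(descs, vocab).shape)             # (50,)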


Naïve Bayes

C – Class

F - Features

We only specify (parameters):

P(C): prior over class labels

P(Fᵢ | C): how each feature depends on the class


Naïve Bayes Parameters

Problem: Categorize images as one of 7 object classes using a Naïve Bayes classifier:

  • Classes: object categories (face, car, bicycle, etc.)

  • Features – images represented as a histogram whose bins are the cluster centers (the visual word vocabulary). Features are the vocabulary counts.

    P(C) is treated as uniform.

    P(F | C) is learned from training data – images labeled with category.


Results


Talk Outline

1. Quick introduction to graphical models

2. Examples:

- Naïve Bayes, pLSA, LDA


pLSA


Joint Probability: P(d, w) = P(d) P(w | d)

Marginalizing over topics determines the conditional probability: P(w | d) = Σ_z P(w | z) P(z | d)


Fitting the model

Need to:

Determine the topic vectors P(w | z) common to all documents.

Determine the mixture coefficients P(z | d) specific to each document.

Goal: a model that gives high probability to the words that appear in the corpus.

Maximum likelihood estimation of the parameters is obtained by maximizing the objective function L = ∏_d ∏_w P(w | d)^{n(w,d)}, where n(w, d) is the number of occurrences of word w in document d.
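For reference, the standard EM updates used to maximize this objective (the slides do not show the fitting equations; these are the usual pLSA updates):

    \text{E-step:}\quad P(z \mid d, w) = \frac{P(w \mid z)\, P(z \mid d)}{\sum_{z'} P(w \mid z')\, P(z' \mid d)}

    \text{M-step:}\quad P(w \mid z) \propto \sum_{d} n(w, d)\, P(z \mid d, w), \qquad P(z \mid d) \propto \sum_{w} n(w, d)\, P(z \mid d, w)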


pLSA on images


Discovering objects and their location in images – Josef Sivic, Bryan C. Russell, Alexei A. Efros, Andrew Zisserman, William T. Freeman

Documents – Images

Words – visual words (vector quantized SIFT descriptors)

Topics – object categories

Images are modeled as a mixture of topics (objects).


Goals

They investigate three areas:

  • (i) topic discovery, where categories are discovered by pLSA clustering on all available images.

  • (ii) classification of unseen images, where topics corresponding to object categories are learnt on one set of images, and then used to determine the object categories present in another set.

  • (iii) object detection, where you want to determine the location and approximate segmentation of object(s) in each image.


(i) Topic Discovery

Most likely words for 4 learnt topics (face, motorbike, airplane, car)


(ii) Image Classification

Confusion table for unseen test images against pLSA trained on images containing four object categories, but no background images.


(ii) Image Classification

Confusion table for unseen test images against pLSA trained on images containing four object categories, and background images. Performance is not quite as good.





Talk Outline

1. Quick introduction to graphical models

2. Examples:

- Naïve Bayes, pLSA, LDA


LDA – David M Blei, Andrew Y Ng, Michael Jordan


LDA

Model components (labels from the slide figure): Dirichlet parameter, per-document topic proportions, per-word topic assignment, observed word, and the topics themselves. (One slide build highlights the pLSA portion inside the LDA model for comparison.)


Joint Distribution

Joint distribution:
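The formula is not legible in the transcript; the standard LDA joint from Blei, Ng and Jordan is:

    p(\theta, z, w \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta)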


LDA on text

Topic discovery from a text corpus. Highly ranked words for 4 topics.


LDA in Animals on the Web

Tamara L Berg, David Forsyth


Animals on the Web Outline:

  • Harvest pictures of animals from the web using Google Text Search.

  • Select visual exemplars using text-based information (LDA).

  • Use vision and text cues to extend to similar images.


Text Model

Run Latent Dirichlet Allocation (LDA) on the words in collected web pages to discover 10 latent topics for each category.

Each topic defines a distribution over words. Select the 50 most likely words for each topic.

Example Frog Topics:

1.) frog frogs water tree toad leopard green southern music king irish eggs folk princess river ball range eyes game species legs golden bullfrog session head spring book deep spotted de am free mouse information round poison yellow upon collection nature paper pond re lived center talk buy arrow common prince

2.) frog information january links common red transparent music king water hop tree pictures pond green people available book call press toad funny pottery toads section eggs bullet photo nature march movies commercial november re clear eyed survey link news boston list frogs bull sites butterfly court legs type dot blue


Select Exemplars

Rank images according to whether they have these likely words near the image in the associated page.

Select up to 30 images per topic as exemplars.

1.) frog frogs water tree toad leopard green southern music king irish eggs folk princess river ball range eyes game species legs golden bullfrog session head ...

2.) frog information january links common red transparent music king water hop tree pictures pond green people available book call press ...
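A minimal sketch of this ranking step (hypothetical helpers of my own; the paper's actual scoring and its definition of "near the image" are more involved):

    def score_image(words_near_image, topic_words):
        """Count how many of the topic's most likely words appear near the image on its web page."""
        topic_set = set(topic_words)
        return sum(1 for w in words_near_image if w in topic_set)

    def select_exemplars(candidates, topic_words, k=30):
        """candidates: list of (image_id, words_near_image) pairs; keep the k highest-scoring images."""
        ranked = sorted(candidates, key=lambda c: score_image(c[1], topic_words), reverse=True)
        return [image_id for image_id, _ in ranked[:k]]

    print(select_exemplars([("img1", ["frog", "pond", "water"]), ("img2", ["car", "sale"])],
                           ["frog", "water", "toad"]))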



A Bayesian Hierarchical Model for Learning Natural Scene Categories – Fei-Fei Li, Pietro Perona

An unsupervised approach to learn and recognize natural scene categories. A scene is represented by a collection of local regions. Each region is represented as part of a “theme” (e.g. rock, grass etc) learned from data.


Generating Scenes

1.) Choose a category label (e.g. mountain scene).

2.) Given the mountain class, draw a probability vector that will determine what intermediate theme(s) (grass, rock, etc.) to select while generating each patch of the scene.

3.) For creating each patch in the image, first determine a particular theme out of the mixture of possible themes, and then draw a codeword given this theme. For example, if a “rock” theme is selected, this will in turn privilege some codewords that occur more frequently in rocks (e.g. slanted lines).

4.) Repeat the process of drawing both the theme and codeword many times, eventually forming an entire bag of patches that would construct a scene of mountains.
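A minimal sketch of this generative process for one class (hypothetical Dirichlet/multinomial parameters of my own; the paper's full hierarchical model also places priors over these quantities):

    import numpy as np

    rng = np.random.default_rng(0)

    def generate_scene(theme_dirichlet, codeword_probs, num_patches=100):
        """Sample a bag of codewords for one scene: draw theme proportions, then a theme and codeword per patch."""
        theta = rng.dirichlet(theme_dirichlet)              # step 2: per-scene theme proportions
        patches = []
        for _ in range(num_patches):                        # steps 3-4: repeat per patch
            theme = rng.choice(len(theta), p=theta)         # pick an intermediate theme (e.g. "rock")
            codeword = rng.choice(codeword_probs.shape[1], p=codeword_probs[theme])
            patches.append(codeword)
        return patches

    # Hypothetical "mountain" class: 3 themes over a 20-word codebook.
    alpha = np.array([2.0, 1.0, 1.0])
    beta = rng.dirichlet(np.ones(20), size=3)               # each row: codeword distribution for one theme
    bag = generate_scene(alpha, beta)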


Results – modeling themes

Left - distribution of the 40 intermediate themes.

Right - distribution of codewords as well as the appearance of 10 codewords selected from the top 20 most likely codewords for this category model.




Results – Scene Classification

(Figures show examples of correct and incorrect scene classifications.)



