
ALIP: Automatic Linguistic Indexing of Pictures

Jia Li

The Pennsylvania State University


Can a computer do this?

  • “Building, sky, lake, landscape, Europe, tree”


Outline

  • Background

  • Statistical image modeling approach

    • The system architecture

    • The image model

  • Experiments

  • Conclusions and future work


Image Database

  • The image database contains categorized images.

  • Each category is annotated with a few words.

    • Landscape, glacier

    • Africa, wildlife

  • Each category of images is referred to as a concept.


A Category of Images

Annotation: “man, male, people, cloth, face”


ALIP: Automatic Linguistic Indexing of Pictures

  • Learn relations between annotation words and images using the training database.

  • Profile each category by a statistical image model: 2-D Multiresolution Hidden Markov Model (2-D MHMM).

  • Assess the similarity between an image and a category by the image's likelihood under the category's profiling model.
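The scoring principle can be sketched in a few lines of Python. This is not ALIP's 2-D MHMM: as a simplifying assumption, each category is profiled here by a single diagonal Gaussian over block feature vectors, blocks are treated as independent, and the names `fit_profile`, `log_likelihood`, and `rank_categories` are hypothetical. Only the idea carries over: fit one model per category, then rank categories by the image's log-likelihood under each model.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Profile = Tuple[Vector, Vector]  # (per-dimension means, per-dimension variances)


def fit_profile(blocks: List[Vector]) -> Profile:
    """Fit a diagonal Gaussian to a category's training feature vectors.
    Stand-in for training a 2-D MHMM on the category's images."""
    n, dim = len(blocks), len(blocks[0])
    mean = [sum(b[d] for b in blocks) / n for d in range(dim)]
    # A small floor keeps every variance strictly positive.
    var = [max(sum((b[d] - mean[d]) ** 2 for b in blocks) / n, 1e-6) for d in range(dim)]
    return mean, var


def log_likelihood(blocks: List[Vector], profile: Profile) -> float:
    """Sum of per-block Gaussian log-densities; blocks are treated as independent."""
    mean, var = profile
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for b in blocks
        for x, m, v in zip(b, mean, var)
    )


def rank_categories(image_blocks: List[Vector], profiles: Dict[str, Profile]) -> List[str]:
    """Category names sorted from most to least likely for the image."""
    return sorted(profiles, key=lambda c: log_likelihood(image_blocks, profiles[c]), reverse=True)


profiles = {
    "landscape": fit_profile([[0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]),
    "food": fit_profile([[0.9, 0.1], [0.8, 0.3], [0.85, 0.2]]),
}
print(rank_categories([[0.12, 0.88], [0.18, 0.82]], profiles))  # ['landscape', 'food']
```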



Training Process


Automatic Annotation Process


Training

Training images used to train a concept with the description “man, male, people, cloth, face”



2-D HMM

Regard an image as a grid. A feature vector is computed for each node.

  • Each node exists in a hidden state.

  • The states are governed by a Markov mesh (a causal Markov random field).

  • Given the state, the feature vector is conditionally independent of other feature vectors and follows a normal distribution.

  • The states are introduced to efficiently model the spatial dependence among feature vectors.

  • The states are not observable, which makes estimation difficult.
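To see why hidden states are the crux, the sketch below computes the joint log-probability of a grid of states and feature vectors when the states are known. It assumes the usual Markov-mesh form, in which the state at (i, j) depends only on the states above, (i-1, j), and to the left, (i, j-1). Two further simplifications are assumptions of this sketch: a reserved index S stands for a neighbor outside the grid, and each feature is a scalar Gaussian. With known states the quantity is a direct sum; with hidden states it would have to be summed over every possible state grid, which is what makes estimation hard.

```python
import math
from typing import List, Sequence


def gaussian_logpdf(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def complete_data_loglik(states: List[List[int]],
                         features: List[List[float]],
                         trans: Sequence[Sequence[Sequence[float]]],
                         means: Sequence[float],
                         variances: Sequence[float]) -> float:
    """log P(states, features) for a 2-D HMM whose states follow a Markov mesh.

    trans[m][n][l] = P(s[i][j] = l | s[i-1][j] = m, s[i][j-1] = n).
    Index S = len(means) marks a neighbor outside the grid (assumed convention).
    """
    S = len(means)
    total = 0.0
    for i in range(len(states)):
        for j in range(len(states[0])):
            up = states[i - 1][j] if i > 0 else S
            left = states[i][j - 1] if j > 0 else S
            l = states[i][j]
            total += math.log(trans[up][left][l])
            # Given its state, each feature is an independent Gaussian draw.
            total += gaussian_logpdf(features[i][j], means[l], variances[l])
    return total


trans = [[[0.5, 0.5] for _ in range(3)] for _ in range(3)]  # uniform, 2 states + boundary
states = [[0, 1], [1, 0]]
features = [[0.0, 1.0], [1.0, 0.0]]
print(complete_data_loglik(states, features, trans, means=[0.0, 1.0], variances=[1.0, 1.0]))
```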


2-D HMM

The underlying states are governed by a Markov mesh.

(i', j') < (i, j) if i' < i, or if i' = i and j' < j

Context of node (i, j): the set of states at all nodes (i', j') with (i', j') < (i, j)
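In symbols, the Markov mesh assumption says that the context enters only through the two causal neighbors above and to the left. The notation s_{i,j} for the state of node (i, j) and a_{m,n,l} for the transition probability is introduced here for illustration; it follows the usual 2-D HMM formulation and is an assumption about what the slide's figure showed:

```latex
(i',j') < (i,j) \iff i' < i \;\text{ or }\; \bigl(i' = i \text{ and } j' < j\bigr)

P\bigl(s_{i,j} \mid s_{i',j'} : (i',j') < (i,j)\bigr)
  = P\bigl(s_{i,j} \mid s_{i-1,j},\, s_{i,j-1}\bigr)
  = a_{m,n,l}, \qquad m = s_{i-1,j},\; n = s_{i,j-1},\; l = s_{i,j}
```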


2-D MHMM

Filtering, e.g., by wavelet transform

  • Incorporate features at multiple resolutions.

  • Provide more flexibility for modeling statistical dependence.

  • Reduce computation by representing context information hierarchically.
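Multiresolution input can be illustrated with the simplest wavelet, the Haar transform. The sketch keeps only the low-pass (LL) band, which for Haar is proportional to the mean of each 2×2 block, and repeats it to form a pyramid ordered from coarsest to finest. Using plain 2×2 averages and the names `haar_lowpass` and `build_pyramid` are assumptions for illustration; this is not the filter bank or feature set ALIP itself uses.

```python
from typing import List

Image = List[List[float]]


def haar_lowpass(img: Image) -> Image:
    """Average each non-overlapping 2x2 block (Haar LL band up to a constant).
    Assumes even height and width."""
    h, w = len(img), len(img[0])
    return [[(img[i][j] + img[i][j + 1] + img[i + 1][j] + img[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]


def build_pyramid(img: Image, levels: int) -> List[Image]:
    """Return [coarsest, ..., finest]; each coarser level halves both dimensions."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(haar_lowpass(pyramid[-1]))
    return pyramid[::-1]


img = [[float(4 * i + j) for j in range(4)] for i in range(4)]
for level in build_pyramid(img, 3):
    print(level)
```

Ordering the list coarsest first mirrors the model, where resolution 1 is the coarsest and each finer resolution refines its parent.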


2-D MHMM

  • An image is represented as a pyramid grid.

  • A Markovian dependence is assumed across resolutions.

  • Given the state of a parent node, the states of its child nodes follow a Markov mesh with transition probabilities depending on the parent state.
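The parent-child structure of the pyramid reduces to index arithmetic. The sketch assumes each block at resolution r-1 splits into a 2×2 group of child blocks at resolution r, as in a dyadic wavelet pyramid; the function names are hypothetical.

```python
from typing import List, Tuple


def children(k: int, l: int) -> List[Tuple[int, int]]:
    """Child nodes at resolution r of node (k, l) at resolution r-1 (2x2 split)."""
    return [(2 * k + di, 2 * l + dj) for di in (0, 1) for dj in (0, 1)]


def parent(i: int, j: int) -> Tuple[int, int]:
    """Parent at resolution r-1 of node (i, j) at resolution r."""
    return (i // 2, j // 2)


print(children(1, 2))  # [(2, 4), (2, 5), (3, 4), (3, 5)]
```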


2-D MHMM

  • First-order Markov dependence across resolutions.


2-D MHMM

  • The child nodes at resolution r of node (k,l) at resolution r-1:

  • Conditional independence given the parent state:
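The formulas for these two bullets are not preserved in this transcript. Under the assumption that each block splits into a 2×2 group of children, they can be written as below, together with the first-order Markov dependence across resolutions. Here s^(r) denotes all states at resolution r, s^(r)_{i,j} the state of node (i, j), and N^(r) the set of nodes at resolution r; this notation is introduced for illustration, not taken from the slides.

```latex
P\bigl(s^{(r)} \mid s^{(1)}, \dots, s^{(r-1)}\bigr) = P\bigl(s^{(r)} \mid s^{(r-1)}\bigr)

\mathbb{D}(k,l) = \{(2k,2l),\ (2k+1,2l),\ (2k,2l+1),\ (2k+1,2l+1)\}

P\bigl(s^{(r)}_{i,j} : (i,j) \in \mathbb{N}^{(r)} \,\big|\, s^{(r-1)}_{k,l} : (k,l) \in \mathbb{N}^{(r-1)}\bigr)
  = \prod_{(k,l) \in \mathbb{N}^{(r-1)}}
    P\bigl(s^{(r)}_{i,j} : (i,j) \in \mathbb{D}(k,l) \,\big|\, s^{(r-1)}_{k,l}\bigr)
```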


2-D MHMM

  • Statistical dependence among the states of sibling blocks is characterized by a 2-D HMM.

  • The transition probability depends on:

    • The neighboring states in both directions

    • The state of the parent block
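The dependence described in the two bullets above can be illustrated generatively: given a parent's state, its group of sibling blocks is filled in raster order, and each child's state is drawn from a Markov-mesh transition table selected by the parent state. The table layout `trans[parent][up][left]` (a distribution over the child's state), the reserved index for neighbors outside the sibling group, and the function name are assumptions made for this sketch.

```python
import random
from typing import List, Sequence


def sample_children(parent_state: int,
                    trans: Sequence[Sequence[Sequence[Sequence[float]]]],
                    num_states: int,
                    rng: random.Random,
                    size: int = 2) -> List[List[int]]:
    """Sample a size x size group of sibling states given their parent's state.

    trans[p][m][n] is a probability vector over the child state, given parent
    state p, upper neighbor m, and left neighbor n. Index num_states marks a
    neighbor outside the sibling group.
    """
    S = num_states
    grid = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):  # raster order respects the Markov-mesh ordering
            up = grid[i - 1][j] if i > 0 else S
            left = grid[i][j - 1] if j > 0 else S
            probs = trans[parent_state][up][left]
            grid[i][j] = rng.choices(range(S), weights=probs)[0]
    return grid


# Degenerate example: every child copies its parent's state.
S = 2
copy_parent = [[[[1.0 if l == p else 0.0 for l in range(S)]
                 for _ in range(S + 1)]   # left neighbor (S = outside group)
                for _ in range(S + 1)]    # upper neighbor
               for p in range(S)]         # parent state
print(sample_children(1, copy_parent, S, random.Random(0)))  # [[1, 1], [1, 1]]
```

A learned table would put mass on several states, so siblings tend to resemble their neighbors and parent without copying them exactly.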


2-D MHMM (Summary)

  • 2-D MHMM finds “modes” of the feature vectors and characterizes their inter- and intra-scale spatial dependence.


Estimation of 2-D HMM

  • Parameters to be estimated:

    • Transition probabilities

    • Mean and covariance matrix of each Gaussian distribution

  • The EM algorithm is applied for maximum likelihood (ML) estimation.
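For reference, one EM iteration in its generic form is shown below, with x the observed feature vectors, s the hidden states, and θ the parameters listed above (transition probabilities, Gaussian means and covariances); the symbols are introduced here for illustration. The E-step requires the posterior distribution of the hidden states, which is the expensive part for 2-D models.

```latex
\text{E-step:}\quad Q\bigl(\theta \mid \theta^{(p)}\bigr)
  = \mathbb{E}\bigl[\log f(\mathbf{x}, \mathbf{s} \mid \theta) \,\big|\, \mathbf{x}, \theta^{(p)}\bigr]

\text{M-step:}\quad \theta^{(p+1)} = \arg\max_{\theta}\, Q\bigl(\theta \mid \theta^{(p)}\bigr)
```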


EM Iteration



Computation Issues

An approximation to the classification EM approach
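Classification EM (CEM) replaces the E-step's soft posterior weights with a hard assignment: each observation receives its single most probable state, and parameters are re-estimated from those labels. The sketch shows CEM on a 1-D Gaussian mixture, a deliberately simpler model than the 2-D MHMM, which adds spatial dependence between states; the function and variable names are hypothetical and this is not ALIP's implementation.

```python
import math
from typing import List, Tuple


def gaussian_logpdf(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def cem_gmm_1d(data: List[float], means: List[float], iters: int = 20
               ) -> Tuple[List[float], List[float], List[float], List[int]]:
    """Classification EM for a 1-D Gaussian mixture (hard assignments)."""
    K = len(means)
    means = list(means)
    variances = [1.0] * K
    weights = [1.0 / K] * K
    labels: List[int] = []
    for _ in range(iters):
        # E- and C-steps combined: assign each point to its most probable component.
        labels = [max(range(K), key=lambda k: math.log(weights[k])
                      + gaussian_logpdf(x, means[k], variances[k]))
                  for x in data]
        # M-step: re-estimate parameters from the hard labels.
        for k in range(K):
            members = [x for x, z in zip(data, labels) if z == k]
            if not members:
                continue  # keep the old parameters for an empty component
            weights[k] = len(members) / len(data)
            means[k] = sum(members) / len(members)
            variances[k] = max(sum((x - means[k]) ** 2 for x in members) / len(members), 1e-3)
    return means, variances, weights, labels


data = [0.0, 0.2, -0.1, 0.1, 5.0, 5.2, 4.9, 5.1]
means, variances, weights, labels = cem_gmm_1d(data, means=[1.0, 4.0])
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Hard assignments avoid accumulating posterior weights over all states, which is the source of the computational savings; the price is a biased, approximate estimate.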


Annotation Process

  • Rank the categories by the likelihood of the image to be annotated under each category's profiling 2-D MHMM.

  • Select annotation words from those used to describe the top ranked categories.

  • Statistical significance is computed for each candidate word.

  • Words that are unlikely to have appeared by chance are selected.

  • Favor the selection of rare words.
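One way to make "unlikely to have appeared by chance" concrete is a hypergeometric tail probability: if a word describes m of the n categories and appears j times among the k top-ranked categories, how likely is it to appear at least j times among k categories drawn at random? The slides do not name the statistic, so this test, the threshold `alpha`, and the function names are assumptions of the sketch. A rare word (small m) seldom shows up by chance, so the same count j yields a smaller probability, which is how such a test favors rare words.

```python
from math import comb
from typing import Dict, List


def chance_probability(j: int, k: int, m: int, n: int) -> float:
    """Probability that a word used by m of n categories appears at least j times
    among k categories drawn uniformly at random without replacement
    (upper tail of a hypergeometric distribution)."""
    favorable = sum(comb(m, i) * comb(n - m, k - i) for i in range(j, min(k, m) + 1))
    return favorable / comb(n, k)


def select_words(hits: Dict[str, int], usage: Dict[str, int],
                 k: int, n: int, alpha: float = 0.05) -> List[str]:
    """Keep words whose count among the top k categories is unlikely under chance.

    hits[w]:  how many of the k top-ranked categories list word w
    usage[w]: how many of all n categories list word w
    alpha:    hypothetical significance threshold
    """
    p = {w: chance_probability(j, k, usage[w], n) for w, j in hits.items()}
    return sorted((w for w in p if p[w] < alpha), key=p.get)


# Both words appear in 2 of the top 5 categories, but "glacier" is far rarer.
print(select_words({"people": 2, "glacier": 2}, {"people": 100, "glacier": 5}, k=5, n=600))  # ['glacier']
```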



Initial Experiment

  • 600 concepts, each trained with 40 images

  • 15 minutes of Pentium CPU time per concept; each concept is trained only once

  • Highly parallelizable algorithm
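Two consequences of these numbers can be checked quickly. Sequentially, 600 concepts at 15 minutes each amount to 600 × 15 = 9,000 CPU-minutes, or 150 CPU-hours. And because each concept's model is fit only to that concept's own training images, the per-concept jobs are independent and can run in parallel. The sketch uses Python's standard `concurrent.futures`; `train_concept` is a hypothetical placeholder for fitting one 2-D MHMM.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

MINUTES_PER_CONCEPT = 15
NUM_CONCEPTS = 600


def train_concept(name: str, images: List[str]) -> Dict[str, object]:
    """Hypothetical placeholder: a real system would fit a 2-D MHMM here."""
    return {"concept": name, "num_images": len(images)}


def train_all(training_sets: Dict[str, List[str]], workers: int = 8) -> Dict[str, Dict[str, object]]:
    # Concepts share no parameters, so their training jobs are independent.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(train_concept, name, imgs)
                   for name, imgs in training_sets.items()}
        return {name: f.result() for name, f in futures.items()}


total_hours = NUM_CONCEPTS * MINUTES_PER_CONCEPT / 60
print(total_hours)  # 150.0
```

Threads keep the sketch simple; for CPU-bound training in Python, `ProcessPoolExecutor` would usually be the better fit.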


Preliminary Results

Computer predictions:

  • People, Europe, man-made, water

  • Building, sky, lake, landscape, Europe, tree

  • People, Europe, female

  • Food, indoor, cuisine, dessert

  • Snow, animal, wildlife, sky, cloth, ice, people


More Results


Results: using our own photographs

  • P: Photographer annotation

  • Underlined words: words predicted by computer

  • (Parentheses): words not in the computer's learned “dictionary”


Systematic Evaluation

10 classes: Africa, beach, buildings, buses, dinosaurs, elephants, flowers, horses, mountains, food.


600-class Classification

  • Task: classify a given image into one of the 600 semantic classes

  • Gold standard: the photographer/publisher classification

  • This procedure provides lower bounds on the accuracy measures because:

    • There can be semantic overlap among classes (e.g., “Europe” vs. “France” vs. “Paris”, or “tigers I” vs. “tigers II”)

    • Training images in the same class may not be visually similar (e.g., the class “sport events” includes different sports and different shooting angles)

  • Result: on 11,200 test images, ALIP selected the exact class as its top choice 15% of the time

    • That is, ALIP is about 90 times more accurate than randomly drawing one of the 600 classes
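The "about 90 times" figure follows from comparing the reported accuracy with a uniform random guess over 600 classes; the short calculation below also converts both rates into expected counts on the 11,200 test images.

```python
NUM_CLASSES = 600
NUM_TEST_IMAGES = 11_200
alip_accuracy = 0.15               # reported: exact class ranked first 15% of the time
random_accuracy = 1 / NUM_CLASSES  # a uniform random guess is right 1 time in 600

print(f"random guess accuracy: {random_accuracy:.3%}")
print(f"improvement factor: {alip_accuracy / random_accuracy:.0f}x")
print(f"correct top choices: ALIP ~{alip_accuracy * NUM_TEST_IMAGES:.0f}, "
      f"random ~{random_accuracy * NUM_TEST_IMAGES:.1f}")
```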


More Information

  • http://www.stat.psu.edu/~jiali/index.demo.html

  • J. Li, J. Z. Wang, “Automatic linguistic indexing of pictures by a statistical modeling approach,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1075-1088, 2003.


Conclusions

  • Automatic Linguistic Indexing of Pictures

    • Highly challenging

    • Much more to be explored

  • Statistical modeling has shown some success.

  • To be explored:

    • Training with image databases that are not categorized.

    • Better modeling techniques.

    • Real-world applications.

