
ALIP: Automatic Linguistic Indexing of Pictures

Jia Li

The Pennsylvania State University


Can a computer do this?

  • “Building, sky, lake, landscape, Europe, tree”


Outline

  • Background

  • Statistical image modeling approach

    • The system architecture

    • The image model

  • Experiments

  • Conclusions and future work


Image Database

  • The image database contains categorized images.

  • Each category is annotated with a few words.

    • Landscape, glacier

    • Africa, wildlife

  • Each category of images is referred to as a concept.


A Category of Images

Annotation: “man, male, people, cloth, face”


ALIP: Automatic Linguistic Indexing of Pictures

  • Learn relations between annotation words and images using the training database.

  • Profile each category by a statistical image model: 2-D Multiresolution Hidden Markov Model (2-D MHMM).

  • Assess the similarity between an image and a category by its likelihood under the profiling model (a minimal sketch of this matching step follows below).
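As a rough illustration of this matching step, the sketch below ranks categories by log-likelihood and pools the annotation words of the top-ranked ones. It is a minimal sketch with hypothetical names: `log_likelihood` stands in for evaluating an image's features under a trained 2-D MHMM, and `category_words`/`top_k` are illustrative, not part of ALIP.

```python
def rank_categories(image_features, category_models, log_likelihood):
    """Score an image against every concept's profiling model and rank the
    concepts by likelihood (higher = better match)."""
    scores = {name: log_likelihood(model, image_features)
              for name, model in category_models.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def candidate_words(ranked, category_words, top_k=5):
    """Pool the annotation words of the top-ranked concepts; the final
    selection step (statistical significance) is sketched later."""
    words = []
    for name, _score in ranked[:top_k]:
        for w in category_words[name]:
            if w not in words:
                words.append(w)
    return words
```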


Outline

  • Background

  • Statistical image modeling approach

    • The system architecture

    • The image model

  • Experiments

  • Conclusions and future work


Training Process


Automatic Annotation Process


Training

Training images used to train a concept with description “man, male, people, cloth, face”


Outline

  • Background

  • Statistical image modeling approach

    • The system architecture

    • The image model

  • Experiments

  • Conclusions and future work


2-D HMM

Regard an image as a grid. A feature vector is computed for each node.

  • Each node exists in a hidden state.

  • The states are governed by a Markov mesh (a causal Markov random field).

  • Given the state, the feature vector is conditionally independent of other feature vectors and follows a normal distribution.

  • The states are introduced to efficiently model the spatial dependence among feature vectors.

  • The states are not observable, which makes estimation difficult.


2-D HMM

The underlying states are governed by a Markov mesh.

Raster-scan order: (i′, j′) < (i, j) if i′ < i, or i′ = i and j′ < j.

Context of (i, j): the set of states at positions (i′, j′) with (i′, j′) < (i, j) (the sketch below illustrates these assumptions).
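A toy generative sketch of the 2-D HMM assumptions above, with made-up parameters (the number of states, transition table, and Gaussian parameters are placeholders, not a trained ALIP model): states are sampled over the grid in raster-scan order, then each node emits a Gaussian feature vector given its state.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, S, D = 8, 8, 3, 4                 # grid size, number of states, feature dimension

# Placeholder parameters (learned per concept in the real system).
trans = rng.dirichlet(np.ones(S), size=(S, S))   # P(state | upper state, left state)
init = np.full(S, 1.0 / S)                       # distribution for the corner node
means = rng.normal(size=(S, D))                  # Gaussian mean per state
covs = np.stack([np.eye(D)] * S)                 # Gaussian covariance per state

states = np.zeros((H, W), dtype=int)
features = np.zeros((H, W, D))

for i in range(H):                      # raster-scan order
    for j in range(W):
        if i == 0 and j == 0:
            p = init
        else:
            # Markov mesh: the state depends on the upper and left neighbours;
            # boundary nodes simply reuse the one neighbour that exists
            # (a simplification for this sketch).
            up = states[i - 1, j] if i > 0 else states[i, j - 1]
            left = states[i, j - 1] if j > 0 else states[i - 1, j]
            p = trans[up, left]
        states[i, j] = rng.choice(S, p=p)
        # Given the state, the feature vector is Gaussian and conditionally
        # independent of all other feature vectors.
        features[i, j] = rng.multivariate_normal(means[states[i, j]], covs[states[i, j]])
```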


2-D MHMM

Filtering, e.g., by wavelet transform

  • Incorporate features at multiple resolutions (a toy decomposition is sketched after this list).

  • Provide more flexibility for modeling statistical dependence.

  • Reduce computation by representing context information hierarchically.
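The multiresolution idea can be illustrated with a Haar-style average/detail split in plain NumPy; this is only a stand-in for the wavelet filtering mentioned above, and the image size and number of levels are arbitrary.

```python
import numpy as np

def haar_split(img):
    """One level of a Haar-like 2-D transform: coarse approximation plus
    horizontal, vertical, and diagonal detail bands at half the resolution."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    approx = (a + b + c + d) / 4.0
    details = ((a + b - c - d) / 4.0,   # horizontal
               (a - b + c - d) / 4.0,   # vertical
               (a - b - c + d) / 4.0)   # diagonal
    return approx, details

img = np.random.rand(64, 64)            # placeholder grey-level image
current, pyramid = img, []
for _ in range(3):                       # three resolutions
    current, details = haar_split(current)
    # Detail coefficients of each block can serve as that block's feature
    # vector at the corresponding resolution.
    pyramid.append(details)
```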


2-D MHMM

  • An image is represented as a pyramid of grids.

  • A Markovian dependence is assumed across resolutions.

  • Given the state of a parent node, the states of its child nodes follow a Markov mesh with transition probabilities depending on the parent state.


2-D MHMM

  • First-order Markov dependence across resolutions.


2-D MHMM

  • The child nodes at resolution r of node (k, l) at resolution r−1 are the four blocks (2k, 2l), (2k, 2l+1), (2k+1, 2l), (2k+1, 2l+1).

  • Conditional independence given the parent state: the states of child blocks descended from different parent blocks are conditionally independent given the parent states (see the index sketch below).
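A small index-only sketch of this quadtree parent/child relationship (purely illustrative):

```python
def children(k, l):
    """Child blocks at resolution r of block (k, l) at resolution r-1."""
    return [(2 * k, 2 * l), (2 * k, 2 * l + 1),
            (2 * k + 1, 2 * l), (2 * k + 1, 2 * l + 1)]

def parent(i, j):
    """Parent block at resolution r-1 of block (i, j) at resolution r."""
    return (i // 2, j // 2)

assert all(parent(i, j) == (3, 5) for i, j in children(3, 5))
```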


2-D MHMM

  • Statistical dependence among the states of sibling blocks is characterized by a 2-D HMM.

  • The transition probability (written out after this list) depends on:

    • The neighboring states in both directions

    • The state of the parent block
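Written out as a formula (the notation here is illustrative rather than copied from the paper): the state of a block depends on its two causal neighbours at the same resolution and on its parent's state,

$$P\bigl(s^{(r)}_{i,j}=k \,\bigm|\, s^{(r)}_{i-1,j}=m,\; s^{(r)}_{i,j-1}=n,\; s^{(r-1)}_{\lfloor i/2\rfloor,\lfloor j/2\rfloor}=p\bigr)=a^{(p)}_{m,n,k}.$$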


2-D MHMM (Summary)

  • 2-D MHMM finds “modes” of the feature vectors and characterizes their inter- and intra-scale spatial dependence.


Estimation of 2-D HMM

  • Parameters to be estimated:

    • Transition probabilities

    • Mean and covariance matrix of each Gaussian distribution

  • The EM algorithm is applied for maximum-likelihood (ML) estimation (a simplified iteration is sketched under “EM Iteration” below).


EM Iteration


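The update equations from this slide are not reproduced in the transcript. As a rough stand-in, the sketch below shows one EM iteration for the Gaussian emission parameters while ignoring the spatial coupling between states (so it degenerates to a Gaussian-mixture update); it is a minimal sketch under those assumptions, not ALIP's actual estimation code.

```python
import numpy as np

def em_step(X, means, covs, weights):
    """One simplified EM iteration: X is (N, D) feature vectors; means (S, D),
    covs (S, D, D), and weights (S,) are the per-state Gaussian parameters and
    state priors (a stand-in for the transition probabilities)."""
    N, D = X.shape
    S = means.shape[0]

    # E-step: posterior probability of each state for each feature vector.
    resp = np.zeros((N, S))
    for s in range(S):
        diff = X - means[s]
        quad = np.einsum('nd,de,ne->n', diff, np.linalg.inv(covs[s]), diff)
        log_pdf = -0.5 * (quad + np.log(np.linalg.det(covs[s])) + D * np.log(2 * np.pi))
        resp[:, s] = weights[s] * np.exp(log_pdf)
    resp /= resp.sum(axis=1, keepdims=True)

    # M-step: re-estimate means, covariances, and state priors.
    Nk = resp.sum(axis=0)
    new_means = (resp.T @ X) / Nk[:, None]
    new_covs = np.empty_like(covs)
    for s in range(S):
        diff = X - new_means[s]
        new_covs[s] = (resp[:, s, None] * diff).T @ diff / Nk[s] + 1e-6 * np.eye(D)
    return new_means, new_covs, Nk / N
```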


Computation Issues

An approximation to the classification EM approach is used.


Annotation Process

  • Rank the categories by the likelihood of the image to be annotated under each category's profiling 2-D MHMM.

  • Select annotation words from those used to describe the top ranked categories.

  • Statistical significance is computed for each candidate word.

  • Words that are unlikely to have appeared by chance are selected (see the sketch after this list).

  • Favor the selection of rare words.
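One reasonable instantiation of this word-selection step (the exact statistic ALIP uses is in the paper; the hypergeometric tail below and all names are assumptions for illustration): a word that annotates only a few of the database's categories but shows up repeatedly among the top-ranked ones is unlikely to have done so by chance, so rare words are naturally favoured.

```python
from math import comb

def chance_tail(M, m, k, x):
    """P(a word that annotates m of the M categories appears in at least x of
    k randomly chosen categories) -- a hypergeometric tail probability."""
    return sum(comb(m, i) * comb(M - m, k - i)
               for i in range(x, min(m, k) + 1)) / comb(M, k)

def select_words(top_categories, all_categories, category_words, alpha=0.05):
    """Keep candidate words whose appearance among the top-ranked categories
    is unlikely to be due to chance; rare words get small tail probabilities
    and are therefore favoured."""
    M, k = len(all_categories), len(top_categories)
    selected = []
    for word in {w for c in top_categories for w in category_words[c]}:
        m = sum(word in category_words[c] for c in all_categories)  # overall frequency
        x = sum(word in category_words[c] for c in top_categories)  # frequency in top k
        if chance_tail(M, m, k, x) < alpha:
            selected.append(word)
    return selected
```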


Outline

  • Background

  • Statistical image modeling approach

    • The system architecture

    • The image model

  • Experiments

  • Conclusions and future work


Initial Experiment

  • 600 concepts, each trained with 40 images

  • 15 minutes of Pentium CPU time per concept; training is done only once

  • Highly parallelizable algorithm


Preliminary Results

Computer Prediction (one line per example image):

  • people, Europe, man-made, water

  • building, sky, lake, landscape, Europe, tree

  • people, Europe, female

  • food, indoor, cuisine, dessert

  • snow, animal, wildlife, sky, cloth, ice, people


More Results


Results: using our own photographs

  • P: Photographer annotation

  • Underlined words: words predicted by the computer

  • (Parentheses): words not in the computer's learned “dictionary”


Systematic Evaluation

10 classes: Africa, beach, buildings, buses, dinosaurs, elephants, flowers, horses, mountains, food.


600-class Classification

  • Task: classify a given image to one of the 600 semantic classes

  • Gold standard: the photographer/publisher classification

  • This procedure provides a lower bound on the accuracy measures because:

    • There can be overlaps of semantics among classes (e.g., “Europe” vs. “France” vs. “Paris”, or “tigers I” vs. “tigers II”)

    • Training images in the same class may not be visually similar (e.g., the class “sport events” includes different sports and different shooting angles)

  • Result: with 11,200 test images, 15% of the time ALIP selected the exact class as the best choice

    • I.e., ALIP is about 90 times more accurate than random guessing (see the arithmetic below)
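The factor of about 90 follows directly from the two numbers above: random guessing over 600 classes succeeds with probability 1/600, so

$$\frac{0.15}{1/600} = 0.15 \times 600 = 90.$$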


More Information

  • http://www.stat.psu.edu/~jiali/index.demo.html

  • J. Li and J. Z. Wang, “Automatic linguistic indexing of pictures by a statistical modeling approach,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1075–1088, 2003.


Conclusions

  • Automatic Linguistic Indexing of Pictures

    • Highly challenging

    • Much more to be explored

  • Statistical modeling has shown some success.

  • To be explored:

    • Handling training image databases that are not categorized.

    • Better modeling techniques.

    • Real-world applications.

