ALIP: Automatic Linguistic Indexing of Pictures




Jia Li

The Pennsylvania State University

Can a computer do this?
  • “Building, sky, lake, landscape, Europe, tree”
Outline
  • Background
  • Statistical image modeling approach
    • The system architecture
    • The image model
  • Experiments
  • Conclusions and future work
Image Database
  • The image database contains categorized images.
  • Each category is annotated with a few words.
    • Landscape, glacier
    • Africa, wildlife
  • Each category of images is referred to as a concept.
A Category of Images

Annotation: “man, male, people, cloth, face”

ALIP: Automatic Linguistic Indexing of Pictures
  • Learn relations between annotation words and images using the training database.
  • Profile each category by a statistical image model: 2-D Multiresolution Hidden Markov Model (2-D MHMM).
  • Assess the similarity between an image and a category by its likelihood under the profiling model.
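
The profile-then-rank idea above can be sketched in a few lines. This is only an illustration: a diagonal Gaussian stands in for the 2-D MHMM, and the function names, feature dimension, and toy data are all assumptions rather than anything taken from ALIP.

```python
import numpy as np

def fit_profile(features):
    """Fit a diagonal Gaussian to an (n_vectors, dim) array of feature vectors."""
    mean = features.mean(axis=0)
    var = features.var(axis=0) + 1e-6  # small floor keeps the variance positive
    return mean, var

def log_likelihood(features, profile):
    """Sum of per-vector Gaussian log-densities under one category profile."""
    mean, var = profile
    z = (features - mean) ** 2 / var
    return float(-0.5 * np.sum(z + np.log(2 * np.pi * var)))

def rank_categories(features, profiles):
    """Return category names sorted from most to least likely."""
    scores = {name: log_likelihood(features, p) for name, p in profiles.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy training data (hypothetical): two categories with different feature means.
rng = np.random.default_rng(0)
training = {
    "landscape, glacier": rng.normal(0.0, 1.0, size=(400, 3)),
    "Africa, wildlife": rng.normal(5.0, 1.0, size=(400, 3)),
}
profiles = {name: fit_profile(x) for name, x in training.items()}

# A query image whose features resemble the second category.
query = rng.normal(5.0, 1.0, size=(50, 3))
ranking = rank_categories(query, profiles)
print(ranking)
```

Swapping the stand-in for a richer model only changes `fit_profile` and `log_likelihood`; the ranking step stays the same.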
Outline
  • Background
  • Statistical image modeling approach
    • The system architecture
    • The image model
  • Experiments
  • Conclusions and future work
Training

Training images used to train a concept with the description “man, male, people, cloth, face”

Outline
  • Background
  • Statistical image modeling approach
    • The system architecture
    • The image model
  • Experiments
  • Conclusions and future work
2D HMM

Regard an image as a grid. A feature vector is computed for each node.

  • Each node exists in a hidden state.
  • The states are governed by a Markov mesh (a causal Markov random field).
  • Given the state, the feature vector is conditionally independent of other feature vectors and follows a normal distribution.
  • The states are introduced to efficiently model the spatial dependence among feature vectors.
  • The states are not observable, which makes estimation difficult.
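
To make the generative picture concrete, here is a minimal sampling sketch of a toy 2-D HMM. It assumes each state depends only on the states directly above and to the left, unit-variance Gaussian features, and state 0 as a stand-in for missing neighbors on the border; the table name `trans` and all numbers are hypothetical.

```python
import numpy as np

def sample_2d_hmm(rows, cols, trans, means, rng):
    """Sample states and features from a toy 2-D HMM on a rows x cols grid.

    trans[m, n, l] = P(state l | state m above, state n to the left).
    Nodes in the first row or column lack a neighbor; this sketch
    substitutes state 0 for it, which is an assumption made for brevity.
    """
    n_states, dim = means.shape
    states = np.zeros((rows, cols), dtype=int)
    for i in range(rows):          # raster order: every neighbor used below
        for j in range(cols):      # has already been sampled
            up = states[i - 1, j] if i > 0 else 0
            left = states[i, j - 1] if j > 0 else 0
            states[i, j] = rng.choice(n_states, p=trans[up, left])
    # Given its state, each feature vector is an independent Gaussian draw
    # (identity covariance here, again an assumption).
    features = means[states] + rng.normal(size=(rows, cols, dim))
    return states, features

rng = np.random.default_rng(1)
n_states = 2
trans = np.full((n_states, n_states, n_states), 0.1)
for m in range(n_states):
    trans[m, :, m] = 0.9           # favor copying the state above

means = np.array([[0.0, 0.0], [4.0, 4.0]])
states, features = sample_2d_hmm(8, 8, trans, means, rng)
print(states)
```

Only `features` would be visible in practice; `states` is the hidden layer that estimation has to recover.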
2D HMM

The underlying states are governed by a Markov mesh.

(i’, j’) < (i, j) if i’ < i, or if i’ = i and j’ < j

Context of (i, j): the set of states at all (i’, j’) with (i’, j’) < (i, j)
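
One common way to write the Markov mesh assumption is shown below. The symbols $s_{i,j}$ for the state at node $(i,j)$ and $a_{m,n,l}$ for the transition probability are assumed notation, as is the choice of the node above and the node to the left as the only neighbors that matter.

```latex
P\bigl(s_{i,j} \,\big|\, s_{i',j'} : (i',j') < (i,j)\bigr) = a_{m,n,l},
\qquad m = s_{i-1,j},\quad n = s_{i,j-1},\quad l = s_{i,j}.
```

Under this form the whole context collapses to two neighboring states, which is what keeps the model tractable.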

2-D MHMM

Filtering, e.g., by wavelet transform

  • Incorporate features at multiple resolutions.
  • Provide more flexibility for modeling statistical dependence.
  • Reduce computation by representing context information hierarchically.
2D MHMM
  • An image is a pyramid grid.
  • A Markovian dependence is assumed across resolutions.
  • Given the state of a parent node, the states of its child nodes follow a Markov mesh with transition probabilities depending on the parent state.
2D MHMM
  • First-order Markov dependence across resolutions.
2D MHMM
  • The child nodes at resolution r of node (k,l) at resolution r-1:
  • Conditional independence given the parent state:
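
The formulas for these two bullets did not survive in the transcript. A plausible reconstruction, assuming each block splits into four children at the next resolution (a quadtree) and writing $\mathbb{D}(k,l)$ for the child set, $\mathbb{N}^{(r)}$ for the grid at resolution $r$, and $s^{(r)}_{i,j}$ for a state at that resolution (all assumed notation), is:

```latex
\mathbb{D}(k,l) = \{(2k,2l),\ (2k+1,2l),\ (2k,2l+1),\ (2k+1,2l+1)\}

P\bigl\{ s^{(r)}_{i,j} : (i,j)\in\mathbb{N}^{(r)} \,\big|\, s^{(r-1)}_{k,l} : (k,l)\in\mathbb{N}^{(r-1)} \bigr\}
  = \prod_{(k,l)\in\mathbb{N}^{(r-1)}}
    P\bigl\{ s^{(r)}_{i,j} : (i,j)\in\mathbb{D}(k,l) \,\big|\, s^{(r-1)}_{k,l} \bigr\}
```

The product says that, once the parent states are known, the child blocks of different parents are independent of one another.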
2-D MHMM
  • Statistical dependence among the states of sibling blocks is characterized by a 2-D HMM.
  • The transition probability depends on:
    • The neighboring states in both directions
    • The state of the parent block
2-D MHMM (Summary)
  • 2-D MHMM finds “modes” of the feature vectors and characterizes their inter- and intra-scale spatial dependence.
Estimation of 2-D HMM
  • Parameters to be estimated:
    • Transition probabilities
    • Mean and covariance matrix of each Gaussian distribution
  • EM algorithm is applied for ML estimation.
Computation Issues

An approximation to the classification EM approach
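
The slide does not spell out the approximation. As a rough illustration of the general classification EM idea (alternate between a hard assignment of states and a re-estimation of parameters), the re-estimation half for a single-resolution 2-D HMM might look like the sketch below. The function name, the add-one smoothing, and the handling of border nodes are assumptions, and the state-assignment half is omitted.

```python
import numpy as np

def reestimate(states, features, n_states):
    """Re-estimate 2-D HMM parameters from a hard state assignment.

    states: (rows, cols) int array; features: (rows, cols, dim) array.
    Returns trans[m, n, l] (m = state above, n = state to the left)
    plus per-state means and covariance matrices.
    """
    dim = features.shape[-1]
    counts = np.ones((n_states,) * 3)  # add-one smoothing avoids empty rows
    # Count (above, left, current) triples over all interior nodes.
    up, left, cur = states[:-1, 1:], states[1:, :-1], states[1:, 1:]
    np.add.at(counts, (up.ravel(), left.ravel(), cur.ravel()), 1)
    trans = counts / counts.sum(axis=2, keepdims=True)

    means = np.zeros((n_states, dim))
    covs = np.tile(np.eye(dim), (n_states, 1, 1))
    for s in range(n_states):
        x = features[states == s]      # all feature vectors assigned to s
        if len(x) > dim:               # keep defaults for sparse states
            means[s] = x.mean(axis=0)
            covs[s] = np.atleast_2d(np.cov(x, rowvar=False))
    return trans, means, covs

# Toy data (hypothetical): random states, features shifted by 4 in state 1.
rng = np.random.default_rng(2)
states = rng.integers(0, 2, size=(16, 16))
features = np.where(states[..., None] == 1, 4.0, 0.0) + rng.normal(size=(16, 16, 2))
trans, means, covs = reestimate(states, features, 2)
print(means)
```

A full estimator would loop: assign the most likely states under the current parameters, call `reestimate`, and repeat until the assignment stops changing.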

Annotation Process
  • Rank the categories by the likelihoods of an image to be annotated under their profiling 2-D MHMMs.
  • Select annotation words from those used to describe the top ranked categories.
  • Statistical significance is computed for each candidate word.
  • Words that are unlikely to have appeared by chance are selected.
  • Favor the selection of rare words.
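
The slides do not say which significance test is used. One plausible choice, shown here as a sketch, is a hypergeometric tail: if the top-ranked categories were drawn at random, how likely is it that a word would appear in them at least as often as it did? The function names, the threshold, and the toy categories are assumptions.

```python
from math import comb

def chance_probability(j, k, m, n):
    """P(a word shows up in at least j of k categories drawn at random).

    n: total number of categories; m: categories annotated with the word.
    This is the upper tail of a hypergeometric distribution.
    """
    upper = min(k, m)
    tail = sum(comb(m, i) * comb(n - m, k - i) for i in range(j, upper + 1))
    return tail / comb(n, k)

def select_words(top, all_categories, threshold):
    """Keep words whose appearance among the top categories is unlikely by chance."""
    k, n = len(top), len(all_categories)
    selected = []
    for word in sorted(set().union(*top)):
        j = sum(word in c for c in top)             # occurrences in the top k
        m = sum(word in c for c in all_categories)  # occurrences overall
        if chance_probability(j, k, m, n) <= threshold:
            selected.append(word)
    return selected

# Toy database (hypothetical): 20 categories, each annotated with two words.
all_categories = (
    [{"landscape", "glacier"}]
    + [{"landscape", "tree"}] * 5
    + [{"people", "food"}] * 14
)
top = [{"landscape", "glacier"}, {"landscape", "tree"}, {"people", "food"}]
selected = select_words(top, all_categories, threshold=0.2)
print(selected)
```

In this toy run "glacier" appears once among the top three categories but annotates only one category overall, so it scores as less likely by chance than "landscape", which appears twice but is common. That is one way a test of this kind ends up favoring rare words.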
Outline
  • Background
  • Statistical image modeling approach
    • The system architecture
    • The image model
  • Experiments
  • Conclusions and future work
Initial Experiment
  • 600 concepts, each trained with 40 images
  • 15 minutes of Pentium CPU time per concept; each concept is trained only once
  • Highly parallelizable algorithm
Preliminary Results

Computer Prediction: people, Europe, man-made, water

Building, sky, lake, landscape, Europe, tree

People, Europe, female

Food, indoor, cuisine, dessert

Snow, animal, wildlife, sky, cloth, ice, people

Results: using our own photographs
  • P: Photographer annotation
  • Underlined words: words predicted by computer
  • (Parentheses): words not in the learned “dictionary” of the computer
Systematic Evaluation

10 classes: Africa, beach, buildings, buses, dinosaurs, elephants, flowers, horses, mountains, food.

600-class Classification
  • Task: classify a given image to one of the 600 semantic classes
  • Gold standard: the photographer/publisher classification
  • This procedure provides lower-bounds of the accuracy measures because:
    • There can be overlaps of semantics among classes (e.g., “Europe” vs. “France” vs. “Paris”, or “tigers I” vs. “tigers II”)
    • Training images in the same class may not be visually similar (e.g., the class “sport events” includes different sports and different shooting angles)
  • Result: with 11,200 test images, ALIP selected the exact class as its best choice 15% of the time
    • I.e., ALIP picks the exact class about 90 times as often as a system that draws a class at random (1/600 ≈ 0.17%)
More Information
  • http://www.stat.psu.edu/~jiali/index.demo.html
  • J. Li, J. Z. Wang, “Automatic linguistic indexing of pictures by a statistical modeling approach,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1075-1088, 2003.
Conclusions
  • Automatic Linguistic Indexing of Pictures
    • Highly challenging
    • Much more to be explored
  • Statistical modeling has shown some success.
  • To be explored:
    • Training image database is not categorized.
    • Better modeling techniques.
    • Real-world applications.