
WHY MEANINGFUL AUTOMATIC TAGGING OF IMAGES IS VERY HARD

Theo Pavlidis

Stony Brook University

t.pavlidis@ieee.org

ICME2009 talk



We expect dealing with images to be much harder than dealing with text.

  • The human visual system has evolved from animal visual systems over a period of more than 200 million years.

  • Speech is barely over 100 thousand years old.

  • Written text is about 5 thousand years old.

In humans the visual system occupies about 1/3 of the brain, a much larger portion than the auditory system. About 85% of human sensory information is the result of visual input.


Three Specific Pieces of Evidence why Auto-Tagging is hard

  • Failure of past pixel-based techniques to scale to real world data.

  • Efforts to base tagging on non-pixel information and their limits.

  • Security systems based on the assumption that automatic tagging is impossible.


Pixel-based methods do not scale

  • Methods work well in published examples but fail at scale because of:

    • Huge cardinality of the set of all possible images: the number of different discernible images is at least 10²⁵ (more than a trillion squared).

    • Semantic gap (actually, a semantic abyss).


A pair from a set of 5³⁶ (>10²⁵) images
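The slide's count can be sanity-checked with a few lines of arithmetic. The construction assumed here, a 6×6 grid of patches with 5 distinguishable gray levels each, is a reconstruction of the example; only the exponent 36 and base 5 come from the slide.

```python
# Sketch: a 6x6 grid of patches, each taking one of 5 distinguishable
# gray levels (the grid shape is an assumed reconstruction of the
# slide's example), already yields 5**36 distinct images.
cells = 6 * 6          # 36 patches
levels = 5             # gray levels per patch
num_images = levels ** cells

print(num_images)                # 14551915228366851806640625
print(num_images > 10 ** 25)     # True: more than a trillion squared
```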


Cardinality Problems

  • Because the number of images is so large, it is very hard to find a representative sample.

  • Even when many different images have the “same” meaning for a human viewer, their pixel values may differ greatly; hence the semantic and other gaps.

  • Aside: the cardinality problem can be dealt with by limiting the class of images and the matching rules (e.g., applications in biometrics). Using synthetic data (when the generating rules are known) also helps.
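A toy sketch of why images with the same meaning can be far apart in pixel space (the 1-D "image" below is hypothetical): shifting a striped pattern by one pixel changes nothing for a human viewer, yet the pixel-wise distance exceeds even the distance to a blank image.

```python
# Pure-Python sketch of the semantic gap on a hypothetical 1-D image.
def l2(a, b):
    """Euclidean (L2) distance between two equal-length pixel rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

stripes = [0, 255] * 8               # a 16-pixel striped pattern
shifted = stripes[1:] + stripes[:1]  # same stripes, shifted one pixel
blank   = [0] * 16                   # a featureless image

# Every pixel flips 0 <-> 255 under the shift, so the "same" image
# is computationally farther from itself than from a blank image.
print(l2(stripes, shifted))  # 1020.0
print(l2(stripes, blank))    # ~721.2
```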


The Semantic Abyss

Perceptually close (agreement amongst observers)

Computationally close (similar pixel statistics)


The Conceptual Abyss

Conceptually close (but not for all observers)

Computationally close (large areas with similar local pixel statistics)


A Major Obstacle

  • Human observers tend to agree on images that are quite similar or quite dissimilar (slide on “semantic abyss”) but not on those in between (slide on “conceptual abyss”).

  • If there is no agreement on similarity amongst human observers how can we establish computational measures for similarity?
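The obstacle can be made concrete with a small sketch: hypothetical similar/dissimilar votes from five observers on three image pairs. Agreement is total on the extreme pairs and poor on the in-between pair, so there is no single ground-truth label to calibrate a computational measure against.

```python
# Hedged sketch: all votes below are invented for illustration.
def agreement(votes):
    """Fraction of observer pairs that cast the same similar/dissimilar vote."""
    n = len(votes)
    pairs = n * (n - 1) // 2
    same = sum(votes[i] == votes[j]
               for i in range(n) for j in range(i + 1, n))
    return same / pairs

near_duplicates = [1, 1, 1, 1, 1]   # everyone calls them similar
unrelated       = [0, 0, 0, 0, 0]   # everyone calls them dissimilar
in_between      = [1, 0, 1, 0, 0]   # no consensus

print(agreement(near_duplicates))   # 1.0
print(agreement(in_between))        # 0.4
```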


Tagging (Labeling) is much harder than matching because it requires interpretation

ΠΑΝΚΟΣΜΙΟΣ ΠΟΛΕΜΟΣ (“World War”)

ΠΟΛΕΜΟΣ ΠΑΤΗΡ ΠΑΝΤΩΝ (“War is the father of all”, Heraclitus)

Not surprisingly, results of online systems are poor.


Results from ALIPR

building, landmark, rock, historical, ruin, texture, man-made, landscape, natural, sky, ocean, castle, car, beach, grass

indoor, rock, flower, food, pattern, yellow, texture, agate, vegetable, natural, fruit, barbecue, cuisine, dessert, tree.
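One way to quantify how poor such output is: set-based precision and recall against human tags. The ground-truth tags below are hypothetical, chosen only to illustrate the computation for the first list.

```python
# Sketch: score ALIPR's first tag list (from the slide) against an
# assumed human ground truth; the truth set is invented for illustration.
predicted = {"building", "landmark", "rock", "historical", "ruin",
             "texture", "man-made", "landscape", "natural", "sky",
             "ocean", "castle", "car", "beach", "grass"}
truth = {"castle", "ruin", "sky", "landscape", "hill"}

hits = predicted & truth                 # tags the system got right
precision = len(hits) / len(predicted)   # how much of the output is right
recall = len(hits) / len(truth)          # how much of the truth is found

print(round(precision, 2), recall)       # 0.27 0.8
```

High recall with very low precision is the typical failure mode: the system scatters many plausible tags and a few happen to land.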


Result No. 1 from a new site

Mammals, show, Business Woman, animals, black, business, attitude, full, office workers, business, computers, office, smiles, close-up, businessman, adults, parents


Result No. 2 from a new site

Rest, chairs, architecture, animals, Europe, church, boats, livestock, ports, city, Italy, the sea, building, boat, beach, housing, harbor, holiday


Three Specific Pieces of Evidence why Auto-Tagging is hard

  • Failure of past pixel-based techniques to scale to real world data.

  • Efforts to base tagging on non-pixel information and their limits.

  • Security systems based on the assumption that automatic tagging is impossible.


Efforts to base tagging on non-pixel information and their limits

  • If text is available with an image, then several authors (starting in 1995) have described methods for assigning tags (coupled with image analysis).

  • Linguistic ambiguity presents challenges to the labeling process.


Efforts to base tagging on non-pixel information and their limits

  • For images obtained with digital cameras, the EXIF record in combination with some pixel information can be used to assign tags, e.g. “Sunset in New York City Harbor”. (See Wong and Leung [15].)

  • But the EXIF record is not always available and it may not be preserved by image processing programs.
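A sketch of the idea in spirit only: the EXIF field names and the crude "sunset window" below are assumptions, and a real system (such as Wong and Leung's) would combine an astronomical model with pixel evidence rather than a fixed time range.

```python
from datetime import datetime

def propose_tag(exif):
    """Hedged sketch: derive a tag from assumed EXIF fields."""
    # EXIF DateTimeOriginal uses colons in the date portion.
    shot = datetime.strptime(exif["DateTimeOriginal"], "%Y:%m:%d %H:%M:%S")
    place = exif.get("GPSAreaInformation", "unknown location")
    if 18 <= shot.hour <= 20:   # crude stand-in for a real sunset model
        return f"Sunset in {place}"
    return f"Photo taken in {place}"

exif = {"DateTimeOriginal": "2009:06:28 19:12:44",
        "GPSAreaInformation": "New York City Harbor"}
print(propose_tag(exif))   # Sunset in New York City Harbor
```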


Three Specific Pieces of Evidence why Auto-Tagging is hard

  • Failure of past pixel-based techniques to scale to real world data.

  • Efforts to base tagging on non-pixel information and their limits.

  • Security systems based on the assumption that automatic tagging is impossible.


Security systems basedon Human Interaction Proof (HIP)

  • HIP (and CAPTCHA) are methods that try to distinguish human users from web-bots.

  • Currently they rely on distorted text.

  • A more secure future system would ask what is in an image (assuming that web-bots cannot answer).

    • But then we need enormous human labor to label images so the answers can be checked.


Harnessing Human Labor

  • Luis von Ahn (a co-inventor of CAPTCHA) observed that people spend a lot of time playing computer games, so he created the ESP game, in which players end up labeling images.

  • Google licensed the ESP method and created the Google Image Labeler.

  • Results of human labeling are “cleaned-up” by statistical analysis.
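The clean-up step can be sketched as a simple quorum filter: keep only tags that enough independent players entered for the same image. The threshold and the play data below are assumptions for illustration.

```python
from collections import Counter

def consensus_tags(player_tag_lists, min_fraction=0.5):
    """Keep tags entered by at least min_fraction of players (a sketch)."""
    # Count each tag at most once per player, then apply the quorum.
    counts = Counter(tag for tags in player_tag_lists for tag in set(tags))
    quorum = len(player_tag_lists) * min_fraction
    return {tag for tag, n in counts.items() if n >= quorum}

plays = [["dog", "grass", "ball"],
         ["dog", "ball"],
         ["puppy", "ball", "dog"],
         ["cat", "ball"]]
print(sorted(consensus_tags(plays)))   # ['ball', 'dog']
```

Idiosyncratic entries ("grass", "puppy", "cat") fall below the quorum and are discarded; only tags with broad agreement survive.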


Conclusions

  • Automating tagging by image processing techniques seems impossible in the foreseeable future.

  • There is a need for more research on methods for direct or indirect human tagging.
