
Learning Models for Object Recognition from Natural Language Descriptions

Presentation Transcript


  1. Learning Models for Object Recognition from Natural Language Descriptions
  Presenters: Sagardeep Mahapatra – 108771077, Keerti Korrapati – 108694316

  2. Goal
  • Learning models for visual object recognition from natural language descriptions alone
  Why learn a model from natural language?
  • Manually collecting and labeling large image sets is difficult
  • A new training set needs to be created for each new category
  • Finding images for fine-grained object categories is hard, e.g. species of plants and animals
  • But detailed visual descriptions may be readily available

  3. Outline
  • Datasets for training and testing
  • Natural Language Processing methods
  • Template filling
  • Extraction of visual attributes from test images
  • Scoring an image against the learnt template models
  • Results
  • Observations

  4. Dataset
  • Text descriptions associated with ten species of butterflies from the eNature guide are used to construct the template models
  • Butterflies were chosen because they have distinctive visual features such as wing colors, spots, etc.
  • Images downloaded from Google for each of the ten butterfly categories form the testing set
  Species: Danaus plexippus, Heliconius charitonius, Heliconius erato, Junonia coenia, Lycaena phlaeas, Nymphalis antiopa, Papilio cresphontes, Pieris rapae, Vanessa atalanta, Vanessa cardui

  5. Natural Language Processing
  • Goal: Convert the factual but unstructured data in the text descriptions into structured templates via information extraction
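
As an illustration of what a filled template might look like (the slot names and values below are assumptions for illustration, not the paper's exact schema), a description such as "wings are black with blue spots" could yield:

# Hypothetical filled template for one butterfly category; slot names are assumed.
template = {
    "dominant_wing_color": ["black"],
    "spots": [{"color": "blue", "position": "wing"}],
}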

  6. Template Filling
  Pipeline: Tokenization → Part-of-Speech Tagging → Custom Transformations → Chunking → Template Filling
  • Text is tokenized into words
  • Tokens are tagged with parts of speech (using the C&C tagger)
  • Custom transformations are performed to correct known mistakes; required because the eNature guide tends to suppress some information
  • Chunks of text matching pre-defined tag sequences are extracted, e.g. noun phrases ('wings have blue spots') and adjective phrases ('wings are black')
  • Extracted phrases are filtered through a list of colors, patterns and positions to fill the template slots
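
A minimal sketch of this pipeline, using NLTK's tokenizer, POS tagger and regexp chunker as stand-ins for the C&C tagger and the paper's chunking rules; the tag grammar, the color/pattern/position word lists and the fill_template function are illustrative assumptions:

import nltk  # requires the 'punkt' and 'averaged_perceptron_tagger' data packages

# Illustrative vocabulary lists; the paper filters phrases through such lists.
COLORS = {"black", "blue", "orange", "red", "white", "yellow"}
PATTERNS = {"spot", "spots", "band", "bands", "stripe", "stripes"}
POSITIONS = {"wing", "wings", "forewing", "hindwing", "margin"}

# Chunk grammar: noun phrases ('blue spots') and adjective phrases ('black').
GRAMMAR = r"""
  NP:   {<JJ>*<NN.*>+}
  ADJP: {<JJ>+}
"""

def fill_template(description):
    template = {"dominant_color": set(), "spot_color": set(), "position": set()}
    tokens = nltk.word_tokenize(description)           # 1) tokenization
    tagged = nltk.pos_tag(tokens)                       # 2) part-of-speech tagging
    # 3) custom transformations correcting known tagger mistakes would go here
    chunks = nltk.RegexpParser(GRAMMAR).parse(tagged)   # 4) chunking
    for subtree in chunks.subtrees(lambda t: t.label() in ("NP", "ADJP")):
        words = [w.lower() for w, _ in subtree.leaves()]
        # 5) filter phrases through the color / pattern / position lists
        slot = "spot_color" if any(w in PATTERNS for w in words) else "dominant_color"
        template[slot].update(w for w in words if w in COLORS)
        template["position"].update(w for w in words if w in POSITIONS)
    return template

print(fill_template("Wings are black with blue spots along the margin."))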

  7. Visual Processing
  Performed based on two attributes of butterflies:
  • Dominant wing color
  • Colored spots
  1) Image Segmentation
  • Variation in the background can pose challenges during image classification
  • Hence, the butterfly image is segmented from the background using the 'star shape' graph-cut approach
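
A minimal segmentation sketch, assuming OpenCV. The paper uses the 'star shape' graph-cut prior, which OpenCV does not provide; GrabCut, a related graph-cut segmentation initialised from a central rectangle, is used here purely as an illustrative stand-in:

import cv2
import numpy as np

def segment_butterfly(image_bgr):
    """Separate the (roughly central) butterfly from the background."""
    h, w = image_bgr.shape[:2]
    mask = np.zeros((h, w), np.uint8)
    # Assume the butterfly sits in the middle: leave a 10% border as background.
    rect = (int(0.1 * w), int(0.1 * h), int(0.8 * w), int(0.8 * h))
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)
    # Keep pixels marked as (probably) foreground.
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
    return image_bgr * fg[:, :, None]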

  8. 2) Spot Detection (using a spot classifier)
  • Hand-marked butterfly images with no prior class information form the training set for the spot classifier
  • Candidate regions likely to be spots are extracted using a Difference-of-Gaussians interest point operator
  • Image descriptors (SIFT features) are extracted around each candidate spot to classify it as spot or non-spot
  3) Color Modelling
  • Required to connect the color names of dominant wing colors and spot colors in the learnt templates to image observations
  • For each color name ci, a probability distribution p(z|ci) is learnt from training butterfly images, where z is a pixel color observation in the L*a*b* color space
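
A minimal sketch of these two steps, assuming OpenCV and scikit-learn as stand-ins for the original implementation; the SVM spot classifier and the single-Gaussian color model are illustrative assumptions (the paper only states that a classifier and a per-color-name distribution are learnt):

import cv2
import numpy as np
from sklearn.svm import SVC

sift = cv2.SIFT_create()  # SIFT keypoints are Difference-of-Gaussians extrema

def candidate_spots(gray_img):
    """Detect DoG interest points as candidate spot regions."""
    return sift.detect(gray_img, None)

def spot_descriptors(gray_img, keypoints):
    """Extract SIFT descriptors around each candidate spot."""
    _, descriptors = sift.compute(gray_img, keypoints)
    return descriptors

def train_spot_classifier(X_train, y_train):
    """Spot / non-spot classifier from hand-marked examples.
    X_train: N x 128 SIFT descriptors, y_train: 1 = spot, 0 = non-spot."""
    clf = SVC(probability=True)  # an SVM is assumed here; the paper's choice may differ
    return clf.fit(X_train, y_train)

class ColorModel:
    """Single-Gaussian model p(z | ci) over L*a*b* pixels for one color name ci."""
    def __init__(self, lab_pixels):  # lab_pixels: N x 3 array of training pixels
        self.mean = lab_pixels.mean(axis=0)
        self.cov = np.cov(lab_pixels.T) + 1e-6 * np.eye(3)
        self.inv_cov = np.linalg.inv(self.cov)
        self.norm = 1.0 / np.sqrt(((2 * np.pi) ** 3) * np.linalg.det(self.cov))

    def likelihood(self, z):
        """p(z | ci) for a single L*a*b* pixel observation z."""
        d = z - self.mean
        return self.norm * np.exp(-0.5 * d @ self.inv_cov @ d)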

  9. Generative Model
  • Given an input image I, the probability of the image given a butterfly category Bi is written as a product over the spot and wing observations
  [Equation on slide; its factors are labelled "spot color name prior" (equal priors over all spot colors) and "dominant color name prior"]
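
The equation itself did not survive the transcript; a plausible LaTeX reconstruction, consistent with the factor labels above but not taken verbatim from the paper, is:

% Hedged reconstruction: z_w is the dominant wing-color observation, z_j the
% color observation at detected spot j; the color names are marginalised out.
P(I \mid B_i) =
  \underbrace{\Big[\sum_{c} p(z_w \mid c)\, p(c \mid B_i)\Big]}_{\text{dominant color name prior term}}
  \;\prod_{j}
  \underbrace{\Big[\sum_{s} p(z_j \mid s)\, p(s \mid B_i)\Big]}_{\text{spot color name prior term (equal priors over spot colors)}}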

  10. Experimental Results
  Two sets of experiments were performed:
  • Performance of human beings in recognizing butterflies from textual descriptions, since this may reasonably be considered an upper bound
  • Performance of the proposed method

  11. Human Performance

  12. Performance of the Proposed Method

  13. Observations
  • The accuracy of the proposed method was comparable to that of non-native English speakers
  • The accuracy of the proposed method was above 80 percent for four categories
  • Classification of 'Heliconius charitonius' was the hardest for humans as well as for both the ground-truth and the learnt templates
  • Performance with ground-truth templates was comparable to performance with the learnt templates
  • Errors introduced into the templates by the NLP methods did not have much impact

  14. Thank You
