
Learning Local Affine Representations for Texture and Object Recognition

Svetlana Lazebnik

Beckman Institute, University of Illinois at Urbana-Champaign

(joint work with Cordelia Schmid, Jean Ponce)


Overview

  • Goal:

    • Recognition of 3D textured surfaces, object classes

  • Our contribution:

    • Texture and object representations based on local affine regions

  • Advantages of proposed approach:

    • Distinctive, repeatable primitives

    • Robustness to clutter and occlusion

    • Ability to approximate 3D geometric transformations


The Scope

  • Recognition of single-texture images (CVPR 2003)

  • Recognition of individual texture regions in multi-texture images (ICCV 2003)

  • Recognition of object classes (BMVC 2004, work in progress)


1. Recognition of Single-Texture Images


Affine Region Detectors

Harris detector (H)

Laplacian detector (L)

Mikolajczyk & Schmid (2002), Gårding & Lindeberg (1996)


Affine Rectification Process

Patch 1

Patch 2

Rectified patches (rotational ambiguity)
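The slide shows rectification pictorially. As a rough sketch of the standard normalization (an assumption here, since the slide gives no formulas): if a detected region is described by a 2×2 shape matrix M (the ellipse x^T M x = 1), mapping coordinates through M^{1/2} turns the ellipse into a circle, so the rectified patch is recovered only up to an unknown rotation. Sampling details and patch sizes below are illustrative.

```python
import numpy as np

def rectify_patch(image, center, M, radius=20.0, size=32):
    """Map an affine region (ellipse x^T M x <= 1 around `center`) to a
    circular patch, removing affine distortion up to rotation.
    M is the 2x2 second-moment (shape) matrix of the detected region."""
    # A = M^{1/2} maps the ellipse defined by M onto the unit circle.
    w, V = np.linalg.eigh(M)
    A = V @ np.diag(np.sqrt(w)) @ V.T
    A_inv = np.linalg.inv(A)

    # Sample the rectified patch on a regular grid by pulling normalized
    # coordinates back into the original image frame.
    coords = np.linspace(-1.0, 1.0, size) * radius
    xs, ys = np.meshgrid(coords, coords)
    pts = np.stack([xs.ravel(), ys.ravel()])              # normalized frame
    src = A_inv @ pts + np.asarray(center).reshape(2, 1)  # image frame

    # Nearest-neighbour sampling keeps the sketch dependency-free.
    cols = np.clip(np.round(src[0]).astype(int), 0, image.shape[1] - 1)
    rows = np.clip(np.round(src[1]).astype(int), 0, image.shape[0] - 1)
    return image[rows, cols].reshape(size, size)
```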


Rotation-Invariant Descriptors 1: Spin Images

  • Based on range spin images (Johnson & Hebert 1998)

  • Two-dimensional histogram: distance from center × intensity value
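A minimal hard-binned sketch of the spin-image computation on a rectified grayscale patch (the actual descriptor uses soft binning and intensity normalization, omitted here; bin counts and the 0-255 intensity range are illustrative):

```python
import numpy as np

def spin_image(patch, n_dist_bins=10, n_int_bins=10):
    """Intensity-domain spin image: 2D histogram over
    (distance from patch center) x (intensity value)."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    dist = np.hypot(ys - cy, xs - cx)

    # Keep only pixels inside the inscribed circle of the patch.
    mask = dist <= min(h, w) / 2.0
    hist, _, _ = np.histogram2d(
        dist[mask], patch[mask].astype(float),
        bins=[n_dist_bins, n_int_bins],
        range=[[0.0, min(h, w) / 2.0], [0.0, 255.0]])
    return hist / max(hist.sum(), 1e-12)   # normalize to sum to 1
```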


Rotation-Invariant Descriptors 2: RIFT

  • Based on SIFT (Lowe 1999)

  • Two-dimensional histogram: distance from center × gradient orientation

  • Gradient orientation is measured relative to the direction pointing outward from the center of the patch
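A corresponding sketch for RIFT on a rectified patch; the gradient-magnitude weighting and the bin counts are assumptions, not taken from the slides:

```python
import numpy as np

def rift(patch, n_dist_bins=4, n_orient_bins=8):
    """RIFT: 2D histogram over (distance from center) x (gradient
    orientation relative to the outward radial direction)."""
    gy, gx = np.gradient(patch.astype(float))
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    dist = np.hypot(ys - cy, xs - cx)

    # Measuring gradient orientation relative to the direction pointing
    # away from the patch center makes the descriptor rotation-invariant.
    radial = np.arctan2(ys - cy, xs - cx)
    rel = np.mod(np.arctan2(gy, gx) - radial, 2 * np.pi)

    mask = dist <= min(h, w) / 2.0
    mag = np.hypot(gx, gy)[mask]       # weight bins by gradient magnitude
    hist, _, _ = np.histogram2d(
        dist[mask], rel[mask], weights=mag,
        bins=[n_dist_bins, n_orient_bins],
        range=[[0.0, min(h, w) / 2.0], [0.0, 2 * np.pi]])
    return hist / max(hist.sum(), 1e-12)
```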


Signatures and EMD

  • Signatures: S = {(m1, w1), … , (mk, wk)}, where mi is a cluster center and wi its relative weight

  • Earth Mover’s Distance (Rubner et al. 1998)

    • Computed from ground distances d(mi, m'j)

    • Can compare signatures of different sizes

    • Insensitive to the number of clusters
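A sketch of building a signature by clustering one image's descriptors and comparing two signatures with EMD. It assumes the POT library (`ot`) for the transport problem and scikit-learn's k-means for clustering; the number of clusters is illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
import ot  # POT: Python Optimal Transport

def signature(descriptors, k=40):
    """Cluster one image's descriptors; return (centers, relative weights)."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(descriptors)
    weights = np.bincount(km.labels_, minlength=k).astype(float)
    keep = weights > 0                        # drop any empty clusters
    return km.cluster_centers_[keep], weights[keep] / weights[keep].sum()

def emd_distance(sig1, sig2):
    """Earth Mover's Distance between signatures, with Euclidean ground
    distances d(m_i, m'_j) between cluster centers."""
    (m1, w1), (m2, w2) = sig1, sig2
    ground = ot.dist(m1, m2, metric='euclidean')
    return ot.emd2(w1, w2, ground)            # optimal transport cost
```

Because EMD works directly on signatures, two images may use different numbers of clusters and the distance remains comparable.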


Database: Textured Surfaces

25 textures, 40 sample images each (640x480)


Evaluation

  • Channels: HS, HR, LS, LR (Harris or Laplacian detector paired with spin-image or RIFT descriptor)

    • Combined through addition of EMD matrices

  • Classification results

    • 10 training images per class, rates averaged over 200 random training subsets
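The slides do not spell out the classifier, so the following is one simple reading of the setup: sum the per-channel EMD matrices and label each test image by its nearest training image under the combined distance. Index conventions are illustrative.

```python
import numpy as np

def combine_channels(emd_matrices):
    """Combine channels (e.g., HS, HR, LS, LR) by adding their EMD matrices."""
    return np.sum(emd_matrices, axis=0)

def nearest_neighbor_classify(D, train_idx, train_labels, test_idx):
    """Label each test image with the class of its nearest training image
    under the combined distance matrix D (rows/columns index all images)."""
    sub = D[np.ix_(test_idx, train_idx)]
    return np.asarray(train_labels)[np.argmin(sub, axis=1)]
```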


Comparative Evaluation


Methods compared: (H+L)(S+R), VZ-Joint, VZ-MRF

Results of Evaluation: Classification rate vs. number of training samples

  • Conclusion: an intrinsically invariant representation is necessary to deal with intra-class variations when they are not adequately represented in the training set


Summary

  • A sparse texture representation based on local affine regions

  • Two novel descriptors (spin images, RIFT)

  • Successful recognition in the presence of viewpoint changes, non-rigidity, non-homogeneity

  • A flexible approach to invariance


2. Recognition of Individual Regions in Multi-Texture Images

  • A two-layer architecture:

    • Local appearance + neighborhood relations

  • Learning:

    • Represent the local appearance of each texture class using a mixture-of-Gaussians model

    • Compute co-occurrence statistics of sub-class labels over affinely adapted neighborhoods

  • Recognition:

    • Obtain initial class membership probabilities from the generative model

    • Use relaxation to refine these probabilities
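A sketch of the learning and initialization steps, using scikit-learn's GaussianMixture in place of a hand-rolled EM; the uniform class priors and component count are assumptions:

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.mixture import GaussianMixture

def fit_appearance_models(descriptors_per_class, n_components=10):
    """Learning: one mixture-of-Gaussians appearance model per texture class
    (each Gaussian component acts as a sub-class label)."""
    return {cls: GaussianMixture(n_components, covariance_type='full',
                                 random_state=0).fit(d)
            for cls, d in descriptors_per_class.items()}

def initial_probabilities(models, descriptors):
    """Recognition, step 1: per-region class membership probabilities from
    the generative models, assuming uniform class priors."""
    log_lik = np.column_stack([m.score_samples(descriptors)
                               for m in models.values()])
    return np.exp(log_lik - logsumexp(log_lik, axis=1, keepdims=True))
```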


Two Learning Scenarios

  • Fully supervised: every region in the training image is labeled with its texture class

  • Weakly supervised: each training image is labeled with the classes occurring in it

(Example labels: a fully supervised region is labeled "brick"; a weakly supervised image is labeled "brick, marble, carpet")


Neighborhood Statistics

  • Estimate, for each pair of sub-class labels (c, c'):

    • co-occurrence probability p(c,c')

    • correlation r(c,c')

(Figure: neighborhood definition)
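A minimal sketch of estimating these statistics from hard sub-class labels of training regions. The exact definition of r(c,c') is not given on the slide, so the normalized indicator covariance used here is an assumption:

```python
import numpy as np

def cooccurrence_stats(labels, neighborhoods, n_labels):
    """Estimate p(c,c') and a correlation r(c,c') from region labels and
    their (affinely adapted) neighborhoods (lists of neighbor indices)."""
    counts = np.zeros((n_labels, n_labels))
    for i, nbrs in enumerate(neighborhoods):
        for j in nbrs:
            counts[labels[i], labels[j]] += 1
    p_joint = counts / max(counts.sum(), 1.0)
    p_marg = p_joint.sum(axis=1)

    # One natural definition of r(c,c'): normalized covariance of the
    # indicator variables of the two labels.
    cov = p_joint - np.outer(p_marg, p_marg)
    std = np.sqrt(np.clip(p_marg * (1 - p_marg), 1e-12, None))
    return p_joint, cov / np.outer(std, std)
```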


Relaxation (Rosenfeld et al. 1976)

  • Iterative process:

    • Initialized with posterior probabilities p(c|xi) obtained from the generative model

    • For each region i and each sub-class label c, update the probability pi(c) based on neighbor probabilities pj(c') and correlations r(c,c')

  • Shortcomings:

    • No formal guarantee of convergence

    • After the initialization, the updates to the probability values do not depend on the image data
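A sketch of the classical relaxation update in the style of Rosenfeld et al. (1976): each region's label probabilities are rescaled by a support term built from neighbor probabilities and the correlations r(c,c'), then renormalized. The damping factor and the clipping are stabilizing assumptions.

```python
import numpy as np

def relaxation(p, neighborhoods, r, n_iter=20, alpha=1.0):
    """Relaxation labeling.
    p: (n_regions, n_labels) initial probabilities from the generative model
    neighborhoods: list of neighbor index lists, one per region
    r: (n_labels, n_labels) correlation/compatibility matrix
    """
    p = p.copy()
    for _ in range(n_iter):
        q = np.zeros_like(p)
        for i, nbrs in enumerate(neighborhoods):
            if len(nbrs) == 0:
                continue
            # Support for label c at region i: average over neighbors j of
            # sum_{c'} r(c, c') * p_j(c').
            q[i] = (p[list(nbrs)] @ r.T).mean(axis=0)
        p = np.clip(p * (1.0 + alpha * q), 0.0, None)
        p /= p.sum(axis=1, keepdims=True)
    return p
```

Note that, as the slide points out, after initialization these updates depend only on the label probabilities and compatibilities, not on the image data itself.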


Experiment 1: 3D Textured Surfaces

Single-texture images

T1 (brick)

T2 (carpet)

T3 (chair)

T4 (floor 1)

T5 (floor 2)

T6 (marble)

T7 (wood)

Multi-texture images

10 single-texture training images per class, 13 two-texture training images, 45 multi-texture test images


Effect of Relaxation on Labeling

Original image

Top: before relaxation, bottom: after relaxation


Retrieval

(Per-class retrieval curves for T1 brick through T7 wood, using single-texture training images)


Successful Segmentation Examples


Unsuccessful Segmentation Examples


Experiment 2: Animals

  • No manual segmentation

  • Training data: 10 sample images per class

  • Test data: 20 samples per class + 20 negative images

cheetah, background

zebra, background

giraffe, background


Cheetah Results


Zebra Results


Giraffe Results


Summary

  • A two-level representation (local appearance + neighborhood relations)

  • Weakly supervised learning of texture models

Future Work

  • Design an improved representation using a random field framework, e.g., conditional random fields (Lafferty et al. 2001, Kumar & Hebert 2003)

  • Develop a procedure for weakly supervised learning of random field parameters

  • Apply the method to recognition of natural texture categories


3. Recognition of Object Classes

The approach:

  • Represent objects using multiple composite semi-local affine parts

    • More expressive than individual regions

    • Not globally rigid

  • Correspondence search is key to learning and detection


Correspondence Search

  • Basic operation: a two-image matching procedure for finding collections of affine regions that can be mapped onto each other using a single affine transformation

  • Implementation: greedy search based on geometric and photometric consistency constraints

    • Returns multiple correspondence hypotheses

    • Automatically determines number of regions in correspondence

    • Works on unsegmented, cluttered images (weakly supervised learning)
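The slides name only the ingredients of the procedure, so the following is a rough sketch of how such a greedy loop might look: grow a correspondence from a seed set of region matches, refit a single affine transform, and keep adding geometrically consistent matches. Photometric consistency checks are omitted and all thresholds are illustrative.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2D affine transform mapping src centers to dst centers."""
    A = np.hstack([src, np.ones((len(src), 1))])   # (n, 3)
    T, *_ = np.linalg.lstsq(A, dst, rcond=None)    # (3, 2)
    return T

def grow_correspondence(c1, c2, seed, tol=10.0):
    """Greedily grow region correspondences consistent with one affine map.
    c1, c2: (n, 2) and (m, 2) region centers in the two images;
    seed: initial list of (i, j) index pairs (at least 3)."""
    matches = list(seed)
    while True:
        T = fit_affine(c1[[i for i, _ in matches]],
                       c2[[j for _, j in matches]])
        proj = np.hstack([c1, np.ones((len(c1), 1))]) @ T  # project image-1 centers
        added = False
        for i in range(len(c1)):
            if any(i == a for a, _ in matches):
                continue
            j = int(np.argmin(np.linalg.norm(c2 - proj[i], axis=1)))
            if (np.linalg.norm(c2[j] - proj[i]) < tol
                    and all(j != b for _, b in matches)):
                matches.append((i, j))
                added = True
        if not added:
            return matches, T      # size of the correspondence is determined automatically
```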



Matching: 3D Objects


Matching: 3D Objects

(closeup views)


Matching: Faces

(spurious match?)


Finding Symmetries


Finding Repeated Patterns and Symmetries


Learning Object Models for Recognition

  • Match multiple pairs of training images to produce a set of candidate parts

  • Use additional validation images to evaluate repeatability of parts and individual regions

  • Retain a fixed number of parts having the best repeatability score
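A sketch of the validation step described above; `detect_fn` and `part.n_regions` are hypothetical interfaces standing in for the actual matching machinery, and n_keep=10 follows the "top 10 parts per class" setting used later.

```python
import numpy as np

def select_parts(candidate_parts, detect_fn, validation_images, n_keep=10):
    """Score each candidate part by how repeatably its regions are found in
    the validation images of its class, and keep the best-scoring parts."""
    scores = []
    for part in candidate_parts:
        found = [detect_fn(part, img) / part.n_regions   # fraction of part regions detected
                 for img in validation_images]
        scores.append(np.mean(found))                    # repeatability score
    order = np.argsort(scores)[::-1]
    return [candidate_parts[k] for k in order[:n_keep]]
```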


Recognition Experiment: Butterflies

Classes: Admiral, Swallowtail, Machaon, Monarch 1, Monarch 2, Peacock, Zebra

  • 16 training images (8 pairs) per class

  • 10 validation images per class

  • 437 test images

  • 619 images total


Butterfly Parts


Recognition

  • Top 10 parts per class used for recognition

  • Relative repeatability score: total number of regions detected / total part size

  • Classification results (table also lists total part size, smallest/largest)


Classification Rate vs. Number of Parts


Detection Results (ROC Curves)

Circles: reference relative repeatability rates. Red square: ROC equal error rate (in parentheses)


Successful Detection Examples

Training images

Test images (blue: occluded regions)

All ellipses found in the test images


Unsuccessful Detection Examples

Training images

Test images (blue: occluded regions)

All ellipses found in the test image


Summary

  • Semi-local affine parts for describing structure of 3D objects

  • Finding a part vocabulary:

    • Correspondence search between pairs of images

    • Validation

  • Additional application:

    • Finding symmetry and repetition

Future Work

  • Find a better affine region detector

  • Represent, learn inter-part relations

  • Evaluation: CalTech database, harder classes, etc.


Birds

Egret

Puffin

Snowy Owl

Mandarin Duck

Wood Duck


Birds: Candidate Parts

Mandarin Duck

Puffin


Objects without Characteristic Texture

(LeCun’04)


Summary of Talk

  • Recognition of single-texture images

    • Distribution of local appearance descriptors

  • Recognition of individual regions in multi-texture images

    • Local appearance + loose statistical neighborhood relations

  • Recognition of object categories

    • Local appearance + strong geometric relations

For more information: http://www-cvr.ai.uiuc.edu/ponce_grp


Issues, Extensions

  • Weakly supervised learning

    • Evaluation methods?

    • Learning from contaminated data?

  • Probabilistic vs. geometric approaches to invariance

  • EM vs. direct correspondence search

  • Training set size

  • Background modeling

  • Strengthening the representation

    • Heterogeneous local features

    • Automatic feature selection

    • Inter-part relations

