Learning Local Affine Representations for Texture and Object Recognition

Svetlana Lazebnik, Beckman Institute, University of Illinois at Urbana-Champaign (joint work with Cordelia Schmid and Jean Ponce)

Overview
  • Goal:
    • Recognition of 3D textured surfaces, object classes
  • Our contribution:
    • Texture and object representations based on local affine regions
  • Advantages of proposed approach:
    • Distinctive, repeatable primitives
    • Robustness to clutter and occlusion
    • Ability to approximate 3D geometric transformations
The Scope
  • Recognition of single-texture images (CVPR 2003)
  • Recognition of individual texture regions in multi-texture images (ICCV 2003)
  • Recognition of object classes (BMVC 2004, work in progress)
Affine Region Detectors

Harris detector (H)

Laplacian detector (L)

Mikolajczyk & Schmid (2002), Gårding & Lindeberg (1996)

Affine Rectification Process

Patch 1

Patch 2

Rectified patches (rotational ambiguity)
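
A minimal sketch of the rectification step, assuming each detected region is described by an ellipse x^T M x = 1 with M a symmetric positive-definite 2x2 matrix. The slides do not give the parametrization, and the function names are illustrative. Applying A = M^(1/2) maps the ellipse onto the unit circle, which leaves only a rotation undetermined:

```python
import math

def sqrt_spd_2x2(m):
    """Closed-form square root of a symmetric positive-definite 2x2 matrix."""
    (a, b), (_, d) = m
    s = math.sqrt(a * d - b * b)       # square root of the determinant
    t = math.sqrt(a + d + 2.0 * s)     # sqrt(trace + 2 * sqrt(det))
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]

def rectify_point(m, p):
    """Map a point p = (x, y), given relative to the ellipse center, into the rectified frame."""
    a = sqrt_spd_2x2(m)
    return (a[0][0] * p[0] + a[0][1] * p[1],
            a[1][0] * p[0] + a[1][1] * p[1])
```

Any point on the ellipse boundary lands at distance 1 from the origin after this mapping. Two rectified patches of the same surface element can still differ by a rotation, which is why the descriptors are designed to be rotation-invariant.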

Rotation-Invariant Descriptors 1: Spin Images
  • Based on range spin images (Johnson & Hebert 1998)
  • Two-dimensional histogram: distance from center × intensity value
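
The histogram described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it uses hard binning, assumes a square patch of at least 3x3 pixels with intensities in [0, 1], and the bin counts are arbitrary choices.

```python
import math

def spin_image(patch, n_dist=5, n_int=5):
    """Hard-binned histogram over (distance from center, intensity).

    patch: list of rows with intensities in [0, 1].
    Returns an n_dist x n_int histogram normalized to sum to 1.
    """
    size = len(patch)
    c = (size - 1) / 2.0                     # patch center
    hist = [[0.0] * n_int for _ in range(n_dist)]
    count = 0
    for y, row in enumerate(patch):
        for x, v in enumerate(row):
            d = math.hypot(x - c, y - c) / c
            if d > 1.0:
                continue                     # use only the inscribed disc
            i = min(int(d * n_dist), n_dist - 1)
            j = min(int(v * n_int), n_int - 1)
            hist[i][j] += 1.0
            count += 1
    return [[h / count for h in row] for row in hist]
```

Because each pixel contributes only through its distance from the center and its intensity, rotating the patch about its center leaves the histogram unchanged.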
Rotation-Invariant Descriptors 2: RIFT
  • Based on SIFT (Lowe 1999)
  • Two-dimensional histogram: distance from center × gradient orientation
  • Gradient orientation is measured relative to the direction pointing outward from the center of the patch
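
A rough sketch of a RIFT-style histogram under stated assumptions: central-difference gradients, hard binning, and votes weighted by gradient magnitude. The ring and orientation bin counts are illustrative parameters, and none of these details are specified on the slides.

```python
import math

def rift(patch, n_rings=4, n_orient=8):
    """Histogram over (ring index, gradient orientation relative to the outward direction)."""
    size = len(patch)
    c = (size - 1) / 2.0
    hist = [[0.0] * n_orient for _ in range(n_rings)]
    total = 0.0
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            dx, dy = x - c, y - c
            r = math.hypot(dx, dy)
            if r == 0.0 or r > c:
                continue                     # outward direction is undefined at the center
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Angle of the gradient measured from the outward radial direction.
            theta = (math.atan2(gy, gx) - math.atan2(dy, dx)) % (2.0 * math.pi)
            i = min(int(r / c * n_rings), n_rings - 1)
            j = min(int(theta / (2.0 * math.pi) * n_orient), n_orient - 1)
            hist[i][j] += mag                # magnitude weighting is an assumption
            total += mag
    if total == 0.0:
        return hist
    return [[h / total for h in row] for row in hist]
```

Measuring the orientation against the radial direction, rather than against the image axes, is what makes the histogram insensitive to rotations of the patch.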
Signatures and EMD
  • Signatures: S = {(m_1, w_1), …, (m_k, w_k)}, where m_i is a cluster center and w_i is its relative weight
  • Earth Mover’s Distance (Rubner et al. 1998)
    • Computed from ground distances d(m_i, m'_j)
    • Can compare signatures of different sizes
    • Insensitive to the number of clusters
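
For reference, the standard transportation-problem definition of the Earth Mover's Distance between S = {(m_i, w_i)} and S' = {(m'_j, w'_j)} can be written as below. The flow variables f_ij are notation introduced here and do not appear on the slides.

```latex
\[
\mathrm{EMD}(S, S') =
  \frac{\sum_{i}\sum_{j} f_{ij}\, d(m_i, m'_j)}{\sum_{i}\sum_{j} f_{ij}},
\]
where the flows $f_{ij}$ minimize $\sum_{i}\sum_{j} f_{ij}\, d(m_i, m'_j)$ subject to
\[
f_{ij} \ge 0, \qquad
\sum_{j} f_{ij} \le w_i, \qquad
\sum_{i} f_{ij} \le w'_j, \qquad
\sum_{i}\sum_{j} f_{ij} = \min\Bigl(\sum_{i} w_i,\; \sum_{j} w'_j\Bigr).
\]
```

The constraints involve only the cluster weights, so the two signatures are free to contain different numbers of clusters.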
Database: Textured Surfaces

25 textures, 40 sample images each (640x480)

Evaluation
  • Channels: HS, HR, LS, LR (Harris or Laplacian detector paired with the spin-image or RIFT descriptor)
    • Combined through addition of EMD matrices
  • Classification results
    • 10 training images per class, rates averaged over 200 random training subsets
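
A small sketch of how per-channel distance matrices might be combined by addition and then used for classification. The nearest-neighbor rule and the function names are assumptions made for illustration; the slides state only that the EMD matrices are added.

```python
def combine_channels(emd_matrices):
    """Element-wise sum of equally sized distance matrices, one per channel."""
    rows, cols = len(emd_matrices[0]), len(emd_matrices[0][0])
    return [[sum(m[i][j] for m in emd_matrices) for j in range(cols)]
            for i in range(rows)]

def nearest_neighbor_labels(dist, train_labels):
    """dist[i][j]: combined distance from test image i to training image j."""
    return [train_labels[min(range(len(row)), key=row.__getitem__)]
            for row in dist]
```

For example, with two channels and two training images labeled "brick" and "wood":

```python
hs = [[0.1, 0.9], [0.8, 0.3]]
hr = [[0.2, 0.5], [0.9, 0.1]]
labels = nearest_neighbor_labels(combine_channels([hs, hr]), ["brick", "wood"])
```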
Results of Evaluation: Classification Rate vs. Number of Training Samples

[Plot: classification rate vs. number of training samples for (H+L)(S+R), VZ-Joint, and VZ-MRF]
  • Conclusion: an intrinsically invariant representation is necessary to deal with intra-class variations when they are not adequately represented in the training set
Summary
  • A sparse texture representation based on local affine regions
  • Two novel descriptors (spin images, RIFT)
  • Successful recognition in the presence of viewpoint changes, non-rigidity, non-homogeneity
  • A flexible approach to invariance
2. Recognition of Individual Regions in Multi-Texture Images
  • A two-layer architecture:
    • Local appearance + neighborhood relations
  • Learning:
    • Represent the local appearance of each texture class using a mixture-of-Gaussians model
    • Compute co-occurrence statistics of sub-class labels over affinely adapted neighborhoods
  • Recognition:
    • Obtain initial class membership probabilities from the generative model
    • Use relaxation to refine these probabilities
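
As an illustration of the initialization step, the sketch below computes posterior probabilities over mixture components (sub-classes) for one descriptor. It assumes diagonal covariances and already-fitted parameters; both are simplifications introduced here, not details taken from the slides.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def subclass_posteriors(x, weights, means, variances):
    """Posterior p(c | x) for each mixture component c."""
    logs = [math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)                          # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logs]
    z = sum(exps)
    return [e / z for e in exps]
```

These posteriors are the values that a relaxation procedure would subsequently refine using neighborhood information.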
Two Learning Scenarios
  • Fully supervised: every region in the training image is labeled with its texture class
  • Weakly supervised: each training image is labeled with the classes occurring in it

Example labels: brick (fully supervised); brick, marble, carpet (weakly supervised)

Neighborhood Statistics
  • Estimate:
    • probability p(c, c')
    • correlation r(c, c')

Neighborhood definition
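
One way to estimate these statistics from labeled regions is sketched below. It assumes hard sub-class labels and a precomputed list of neighbor pairs, and it takes r(c, c') to be the correlation coefficient of the two label indicators. That choice of formula and all names are assumptions for illustration.

```python
import math
from collections import Counter

def neighborhood_statistics(labels, neighbor_pairs):
    """labels[i]: sub-class label of region i.
    neighbor_pairs: pairs (i, j) where region j lies in the neighborhood of region i.
    Returns dicts p and r keyed by (c, c')."""
    counts = Counter((labels[i], labels[j]) for i, j in neighbor_pairs)
    total = sum(counts.values())
    classes = sorted(set(labels))
    p = {(c, d): counts[(c, d)] / total for c in classes for d in classes}
    first = {c: sum(p[(c, d)] for d in classes) for c in classes}    # marginal of c
    second = {d: sum(p[(c, d)] for c in classes) for d in classes}   # marginal of c'
    r = {}
    for c in classes:
        for d in classes:
            denom = math.sqrt((first[c] - first[c] ** 2) *
                              (second[d] - second[d] ** 2))
            r[(c, d)] = (p[(c, d)] - first[c] * second[d]) / denom if denom > 0 else 0.0
    return p, r
```

A positive r(c, c') indicates that the two labels co-occur in neighborhoods more often than chance would predict, and a negative value indicates that they tend to exclude each other.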

Relaxation (Rosenfeld et al. 1976)
  • Iterative process:
    • Initialized with posterior probabilities p(c | x_i) obtained from the generative model
    • For each region i and each sub-class label c, update the probability p_i(c) based on neighbor probabilities p_j(c') and correlations r(c, c')
  • Shortcomings:
    • No formal guarantee of convergence
    • After the initialization, the updates to the probability values do not depend on the image data
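
The iterative process can be sketched with the classical update p_i(c) ← p_i(c)(1 + q_i(c)) / Σ_c p_i(c)(1 + q_i(c)), where q_i(c) aggregates neighbor support. Uniform neighbor weights and the data layout are assumptions made here, and the slides do not spell out the exact update used.

```python
def relaxation_step(probs, neighbors, r):
    """One synchronous update of all regions.

    probs[i]: dict mapping label c to p_i(c).
    neighbors[i]: list of neighbor indices j of region i.
    r[(c, c2)]: correlation in [-1, 1].
    """
    new_probs = []
    for i, p_i in enumerate(probs):
        nbrs = neighbors[i]
        w = 1.0 / len(nbrs) if nbrs else 0.0   # uniform neighbor weights (assumption)
        updated = {}
        for c, pc in p_i.items():
            # Support for label c from all labels of all neighbors.
            q = sum(w * r[(c, c2)] * p_j
                    for j in nbrs for c2, p_j in probs[j].items())
            updated[c] = pc * (1.0 + q)
        z = sum(updated.values())
        new_probs.append({c: v / z for c, v in updated.items()} if z > 0 else dict(p_i))
    return new_probs
```

The update reads only the current probabilities and the correlations, which makes the second shortcoming concrete: once the posteriors are initialized, the image data no longer enter the computation.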
Experiment 1: 3D Textured Surfaces

Single-texture images: T1 (brick), T2 (carpet), T3 (chair), T4 (floor 1), T5 (floor 2), T6 (marble), T7 (wood)

Multi-texture images

10 single-texture training images per class, 13 two-texture training images, 45 multi-texture test images

Effect of Relaxation on Labeling

Original image

Top: before relaxation, bottom: after relaxation

Retrieval

Single-texture training images: T1 (brick), T2 (carpet), T3 (chair), T4 (floor 1), T5 (floor 2), T6 (marble), T7 (wood)

Experiment 2: Animals
  • No manual segmentation
  • Training data: 10 sample images per class
  • Test data: 20 samples per class + 20 negative images

Image labels: (cheetah, background), (zebra, background), (giraffe, background)

Summary
  • A two-level representation (local appearance + neighborhood relations)
  • Weakly supervised learning of texture models

Future Work
  • Design an improved representation using a random field framework, e.g., conditional random fields (Lafferty et al. 2001, Kumar & Hebert 2003)
  • Develop a procedure for weakly supervised learning of random field parameters
  • Apply the method to recognition of natural texture categories
3. Recognition of Object Classes

The approach:

  • Represent objects using multiple composite semi-local affine parts
    • More expressive than individual regions
    • Not globally rigid
  • Correspondence search is key to learning and detection
Correspondence Search
  • Basic operation: a two-image matching procedure for finding collections of affine regions that can be mapped onto each other using a single affine transformation
  • Implementation: greedy search based on geometric and photometric consistency constraints
    • Returns multiple correspondence hypotheses
    • Automatically determines number of regions in correspondence
    • Works on unsegmented, cluttered images (weakly supervised learning)
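
The geometric consistency test at the heart of such a search can be illustrated by fitting a single affine transformation to matched region centers and checking the transfer error. This is a sketch of that one ingredient only, with hypothetical function names. It is not the greedy search itself, and it ignores region shape and photometric consistency.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[piv] = m[piv], m[col]
        for k in range(col + 1, 3):
            f = m[k][col] / m[col][col]
            for j in range(col, 4):
                m[k][j] -= f * m[col][j]
    x = [0.0] * 3
    for i in (2, 1, 0):
        x[i] = (m[i][3] - sum(m[i][j] * x[j] for j in range(i + 1, 3))) / m[i][i]
    return x

def fit_affine(src, dst):
    """Least-squares affine map from src to dst (needs >= 3 non-collinear pairs).

    Returns rows (a, b, tx) and (c, d, ty) with
    x' = a*x + b*y + tx and y' = c*x + d*y + ty.
    """
    rows = [(x, y, 1.0) for x, y in src]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    out = []
    for k in (0, 1):
        atb = [sum(r[i] * q[k] for r, q in zip(rows, dst)) for i in range(3)]
        out.append(solve3(ata, atb))
    return out

def transfer_errors(aff, src, dst):
    """Distance between each mapped src point and its dst partner."""
    errs = []
    for (x, y), (u, v) in zip(src, dst):
        pu = aff[0][0] * x + aff[0][1] * y + aff[0][2]
        pv = aff[1][0] * x + aff[1][1] * y + aff[1][2]
        errs.append(math.hypot(pu - u, pv - v))
    return errs
```

A candidate match could be accepted into a growing group when its transfer error under the current fit stays below a tolerance; the choice of tolerance is left open here.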

Matching: 3D Objects

closeup

closeup

Matching: Faces

Spurious match?

Learning Object Models for Recognition
  • Match multiple pairs of training images to produce a set of candidate parts
  • Use additional validation images to evaluate repeatability of parts and individual regions
  • Retain a fixed number of parts having the best repeatability score
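
The final selection step amounts to ranking candidate parts by their validation score and keeping the best ones. A minimal sketch, assuming the scores have already been computed and using hypothetical part identifiers:

```python
def select_parts(candidates, keep=10):
    """candidates: list of (part_id, repeatability_score) pairs.
    Returns the ids of the `keep` highest-scoring parts, best first."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [part_id for part_id, _ in ranked[:keep]]
```

The default of 10 is an illustrative choice; any fixed number of parts could be retained.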
Recognition Experiment: Butterflies

Classes: Admiral, Swallowtail, Machaon, Monarch 1, Monarch 2, Peacock, Zebra

  • 16 training images (8 pairs) per class
  • 10 validation images per class
  • 437 test images
  • 619 images total
Recognition
  • Top 10 parts per class used for recognition
  • Relative repeatability score: (total number of regions detected) / (total part size)
  • Classification results:

[Table not recovered; surviving column header: total part size (smallest/largest)]

Detection Results (ROC Curves)

Circles: reference relative repeatability rates. Red square: ROC equal error rate (in parentheses)
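
For readers unfamiliar with the equal error rate, the sketch below computes ROC operating points from detection scores and picks the point where the false positive rate is closest to the miss rate. The score lists are hypothetical inputs. Note that some papers report the true positive rate at this point, i.e. one minus the value returned here.

```python
def roc_points(pos_scores, neg_scores):
    """(false positive rate, true positive rate) at every distinct threshold."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    pts = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        pts.append((fpr, tpr))
    return pts

def equal_error_rate(pos_scores, neg_scores):
    """Approximate EER: the operating point where FPR is closest to 1 - TPR."""
    fpr, tpr = min(roc_points(pos_scores, neg_scores),
                   key=lambda p: abs(p[0] - (1.0 - p[1])))
    return (fpr + (1.0 - tpr)) / 2.0
```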

Successful Detection Examples

Training images

Test images (blue: occluded regions)

All ellipses found in the test images

Unsuccessful Detection Examples

Training images

Test images (blue: occluded regions)

All ellipses found in the test image

Summary
  • Semi-local affine parts for describing structure of 3D objects
  • Finding a part vocabulary:
    • Correspondence search between pairs of images
    • Validation
  • Additional application:
    • Finding symmetry and repetition

Future Work

  • Find a better affine region detector
  • Represent, learn inter-part relations
  • Evaluation: CalTech database, harder classes, etc.
Birds

Classes: Egret, Puffin, Snowy Owl, Mandarin Duck, Wood Duck

Birds: Candidate Parts

Mandarin Duck

Puffin

Summary of Talk
  • Recognition of single-texture images
    • Distribution of local appearance descriptors
  • Recognition of individual regions in multi-texture images
    • Local appearance + loose statistical neighborhood relations
  • Recognition of object categories
    • Local appearance + strong geometric relations

For more information: http://www-cvr.ai.uiuc.edu/ponce_grp

Issues, Extensions
  • Weakly supervised learning
    • Evaluation methods?
    • Learning from contaminated data?
  • Probabilistic vs. geometric approaches to invariance
  • EM vs. direct correspondence search
  • Training set size
  • Background modeling
  • Strengthening the representation
    • Heterogeneous local features
    • Automatic feature selection
    • Inter-part relations