Learning Local Affine Representations for Texture and Object Recognition

Svetlana Lazebnik

Beckman Institute, University of Illinois at Urbana-Champaign

(joint work with Cordelia Schmid and Jean Ponce)

Overview
  • Goal:
    • Recognition of 3D textured surfaces, object classes
  • Our contribution:
    • Texture and object representations based on local affine regions
  • Advantages of proposed approach:
    • Distinctive, repeatable primitives
    • Robustness to clutter and occlusion
    • Ability to approximate 3D geometric transformations
The Scope
  • Recognition of single-texture images (CVPR 2003)
  • Recognition of individual texture regions in multi-texture images (ICCV 2003)
  • Recognition of object classes (BMVC 2004, work in progress)
Affine Region Detectors

Harris detector (H)

Laplacian detector (L)

Mikolajczyk & Schmid (2002), Gårding & Lindeberg (1996)

Affine Rectification Process

Patch 1

Patch 2

Rectified patches (rotational ambiguity)
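
The rectification step can be sketched as follows. This is a minimal illustration, assuming each detected region is described by an ellipse x^T M x = 1 with M a symmetric positive-definite 2×2 matrix (a common convention that the slide does not spell out). The function names are hypothetical.

```python
import math

def sqrt_spd_2x2(m):
    """Closed-form square root of a symmetric positive-definite 2x2 matrix."""
    (a, b), (_, d) = m
    s = math.sqrt(a * d - b * b)      # square root of the determinant
    t = math.sqrt(a + d + 2.0 * s)    # square root of (trace + 2s)
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]

def apply_matrix(mat, p):
    """Multiply a 2x2 matrix by a 2D point."""
    return (mat[0][0] * p[0] + mat[0][1] * p[1],
            mat[1][0] * p[0] + mat[1][1] * p[1])

def rotate(p, theta):
    """Rotate a 2D point about the origin by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def rectify(m, p):
    """Map a point expressed relative to the ellipse center into the
    normalized frame, where the ellipse becomes the unit circle."""
    return apply_matrix(sqrt_spd_2x2(m), p)
```

Any rotation applied after `rectify` still sends the ellipse to the unit circle, which is the rotational ambiguity noted on the slide and the reason the descriptors that follow are built to be rotation-invariant.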

Rotation-Invariant Descriptors 1: Spin Images
  • Based on range spin images (Johnson & Hebert 1998)
  • Two-dimensional histogram: distance from center × intensity value
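
A minimal sketch of an intensity-domain spin image, assuming a square rectified patch stored as a list of rows with intensities in [0, 1]. It uses hard binning for brevity; the bin counts and the function name are illustrative choices, not values taken from the talk.

```python
import math

def spin_image(patch, n_dist=5, n_int=5):
    """2D histogram over (distance from patch center, intensity)."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)               # radius of the inscribed circle
    hist = [[0.0] * n_int for _ in range(n_dist)]
    count = 0
    for y in range(h):
        for x in range(w):
            dy, dx = y - cy, x - cx
            d = math.sqrt(dy * dy + dx * dx) / r_max
            if d > 1.0:
                continue              # ignore pixels outside the circle
            i = min(int(d * n_dist), n_dist - 1)
            j = min(int(patch[y][x] * n_int), n_int - 1)
            hist[i][j] += 1.0
            count += 1
    return [[v / count for v in row] for row in hist]
```

Because each pixel contributes only through its distance from the center and its intensity, rotating the patch about its center leaves the histogram unchanged, up to resampling effects.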
Rotation-Invariant Descriptors 2: RIFT
  • Based on SIFT (Lowe 1999)
  • Two-dimensional histogram: distance from center × gradient orientation
  • Gradient orientation is measured relative to the direction pointing outward from the center of the patch
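
A minimal sketch of a RIFT-style descriptor under stated assumptions: central-difference gradients, hard binning, votes weighted by gradient magnitude, and 4 rings × 8 orientations. These choices and the function name are illustrative; the slide specifies only the two histogram axes.

```python
import math

def rift(patch, n_dist=4, n_ori=8):
    """2D histogram over (distance from center, gradient orientation),
    with orientation measured relative to the outward radial direction."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    bin_width = 2.0 * math.pi / n_ori
    hist = [[0.0] * n_ori for _ in range(n_dist)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dy, dx = y - cy, x - cx
            dist = math.sqrt(dy * dy + dx * dx)
            if dist == 0.0 or dist > r_max:
                continue              # radial direction undefined, or outside circle
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Angle between the gradient and the outward direction.
            rel = math.atan2(gy, gx) - math.atan2(dy, dx)
            # Bins are centered on multiples of bin_width.
            k = int(((rel + bin_width / 2.0) % (2.0 * math.pi)) / bin_width) % n_ori
            i = min(int(dist / r_max * n_dist), n_dist - 1)
            hist[i][k] += mag
    total = sum(map(sum, hist)) or 1.0
    return [[v / total for v in row] for row in hist]
```

Rotating the patch rotates the gradient and the radial direction by the same angle, so their difference, and therefore the histogram, is preserved.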
Signatures and EMD
  • Signatures: S = {(m1, w1), …, (mk, wk)}
    • mi: cluster center
    • wi: relative weight
  • Earth Mover’s Distance (Rubner et al. 1998)
    • Computed from ground distances d(mi, m'j)
    • Can compare signatures of different sizes
    • Insensitive to the number of clusters
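
For reference, the Earth Mover's Distance between two signatures S = {(mi, wi)} and S' = {(m'j, w'j)} can be written as below, following the standard formulation of Rubner et al. The flow variables f_ij are notation introduced here and do not appear on the slide.

```latex
\mathrm{EMD}(S, S') =
  \frac{\sum_{i}\sum_{j} f_{ij}\, d(m_i, m'_j)}{\sum_{i}\sum_{j} f_{ij}},
\quad\text{where } \{f_{ij}\} \text{ minimizes } \sum_{i}\sum_{j} f_{ij}\, d(m_i, m'_j)
\text{ subject to}

f_{ij} \ge 0, \qquad
\sum_{j} f_{ij} \le w_i, \qquad
\sum_{i} f_{ij} \le w'_j, \qquad
\sum_{i}\sum_{j} f_{ij} = \min\Bigl(\sum_{i} w_i,\ \sum_{j} w'_j\Bigr).
```

The constraints involve only the weights, so the two signatures may have different numbers of clusters.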
Database: Textured Surfaces

25 textures, 40 sample images each (640x480)

  • Channels: HS, HR, LS, LR
    • Combined through addition of EMD matrices
  • Classification results
    • 10 training images per class, rates averaged over 200 random training subsets
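
A minimal sketch of the channel combination, assuming each channel yields a matrix of EMD values between test images (rows) and training images (columns). The nearest-neighbor rule used to assign labels is an assumption here, since the slide does not name the classifier, and the function names are hypothetical.

```python
def combine_channels(matrices):
    """Element-wise sum of equally sized distance matrices, one per channel."""
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) for j in range(n_cols)]
            for i in range(n_rows)]

def nearest_neighbor_labels(dist, train_labels):
    """dist[i][j] is the combined distance from test image i to training
    image j; each test image takes the label of its closest training image."""
    return [train_labels[min(range(len(row)), key=row.__getitem__)]
            for row in dist]
```

For example, `combine_channels([emd_hs, emd_lr])` would add the matrices of two channels before classification, where `emd_hs` and `emd_lr` are placeholder names for precomputed matrices.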
Results of Evaluation: Classification Rate vs. Number of Training Samples
  • Conclusion: an intrinsically invariant representation is necessary to deal with intra-class variations when they are not adequately represented in the training set
  • A sparse texture representation based on local affine regions
  • Two novel descriptors (spin images, RIFT)
  • Successful recognition in the presence of viewpoint changes, non-rigidity, non-homogeneity
  • A flexible approach to invariance
2. Recognition of Individual Regions in Multi-Texture Images
  • A two-layer architecture:
    • Local appearance + neighborhood relations
  • Learning:
    • Represent the local appearance of each texture class using a mixture-of-Gaussians model
    • Compute co-occurrence statistics of sub-class labels over affinely adapted neighborhoods
  • Recognition:
    • Obtain initial class membership probabilities from the generative model
    • Use relaxation to refine these probabilities
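
The initial class membership probabilities can be written with Bayes' rule. This sketch assumes each sub-class c corresponds to one Gaussian component with mean μ_c, covariance Σ_c, and prior p(c); that notation is introduced here for illustration.

```latex
p(c \mid x_i) = \frac{p(x_i \mid c)\, p(c)}{\sum_{c'} p(x_i \mid c')\, p(c')},
\qquad
p(x \mid c) = \mathcal{N}(x;\ \mu_c, \Sigma_c)
```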
Two Learning Scenarios
  • Fully supervised: every region in the training image is labeled with its texture class
  • Weakly supervised: each training image is labeled with the classes occurring in it


brick, marble, carpet

Neighborhood Statistics
  • Estimate:
    • probability p(c,c')
    • correlation r(c,c')
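
A minimal sketch of how these statistics could be estimated from training data, assuming the input is a list of (c, c') sub-class label pairs, one per region and neighbor. The correlation is taken to be the usual correlation coefficient between the two label indicators, which is an assumption about the exact definition used in the talk.

```python
import math
from collections import Counter

def neighborhood_statistics(pairs):
    """Return (p, r): dicts mapping (c, c2) to the co-occurrence
    probability p(c, c2) and the correlation r(c, c2)."""
    counts = Counter(pairs)
    total = float(len(pairs))
    labels = sorted({c for pair in pairs for c in pair})
    p = {(c, c2): counts[(c, c2)] / total for c in labels for c2 in labels}
    # Marginals of the first and second label in a pair.
    p_first = {c: sum(p[(c, c2)] for c2 in labels) for c in labels}
    p_second = {c2: sum(p[(c, c2)] for c in labels) for c2 in labels}
    r = {}
    for c in labels:
        for c2 in labels:
            var = ((p_first[c] - p_first[c] ** 2)
                   * (p_second[c2] - p_second[c2] ** 2))
            cov = p[(c, c2)] - p_first[c] * p_second[c2]
            r[(c, c2)] = cov / math.sqrt(var) if var > 0 else 0.0
    return p, r
```

Labels that tend to appear in the same neighborhood get a positive correlation, and labels that rarely co-occur get a negative one.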

Neighborhood definition

Relaxation (Rosenfeld et al. 1976)
  • Iterative process:
    • Initialized with posterior probabilities p(c|xi) obtained from the generative model
    • For each region i and each sub-class label c, update the probability pi(c) based on neighbor probabilities pj(c') and correlations r(c,c')
  • Shortcomings:
    • No formal guarantee of convergence
    • After the initialization, the updates to the probability values do not depend on the image data
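
The iteration can be sketched with the classic update from the relaxation-labeling literature. The neighbor weights w_ij (assumed to sum to 1 for each region) and the function name are illustrative assumptions, and the exact update used in the talk may differ.

```python
def relax(probs, neighbors, corr, n_iter=10):
    """probs[i][c]: probability of label c at region i (each row sums to 1).
    neighbors[i]: list of (j, w_ij) pairs, weights summing to 1 per region.
    corr[c][c2]: correlation r(c, c2), assumed to lie in [-1, 1]."""
    n_labels = len(corr)
    for _ in range(n_iter):
        new = []
        for i, p_i in enumerate(probs):
            # Support for each label from the current neighbor probabilities.
            q = [sum(w * sum(corr[c][c2] * probs[j][c2] for c2 in range(n_labels))
                     for j, w in neighbors[i])
                 for c in range(n_labels)]
            unnorm = [p_i[c] * (1.0 + q[c]) for c in range(n_labels)]
            z = sum(unnorm)
            new.append([u / z for u in unnorm] if z > 0 else list(p_i))
        probs = new   # all regions are updated simultaneously
    return probs
```

After initialization the loop reads only `probs`, `neighbors`, and `corr`, never the image descriptors, which is the second shortcoming listed above.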
Experiment 1: 3D Textured Surfaces

Single-texture images

T1 (brick)

T2 (carpet)

T3 (chair)

T4 (floor 1)

T5 (floor 2)

T6 (marble)

T7 (wood)

Multi-texture images

10 single-texture training images per class, 13 two-texture training images, 45 multi-texture test images

Effect of Relaxation on Labeling

Original image

Top: before relaxation, bottom: after relaxation


(single-texture training images)

T1 (brick)

T2 (carpet)

T3 (chair)

T4 (floor 1)

T5 (floor 2)

T6 (marble)

T7 (wood)

Experiment 2: Animals
  • No manual segmentation
  • Training data: 10 sample images per class
  • Test data: 20 samples per class + 20 negative images

cheetah, background

zebra, background

giraffe, background

Future Work
  • A two-level representation (local appearance + neighborhood relations)
  • Weakly supervised learning of texture models
  • Design an improved representation using a random field framework, e.g., conditional random fields (Lafferty et al. 2001, Kumar & Hebert 2003)
  • Develop a procedure for weakly supervised learning of random field parameters
  • Apply method to recognition of natural texture categories
3. Recognition of Object Classes

The approach:

  • Represent objects using multiple composite semi-local affine parts
    • More expressive than individual regions
    • Not globally rigid
  • Correspondence search is key to learning and detection
Correspondence Search
  • Basic operation: a two-image matching procedure for finding collections of affine regions that can be mapped onto each other using a single affine transformation
  • Implementation: greedy search based on geometric and photometric consistency constraints
    • Returns multiple correspondence hypotheses
    • Automatically determines number of regions in correspondence
    • Works on unsegmented, cluttered images (weakly supervised learning)
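
The geometric part of the basic operation can be sketched in simplified form: fit an affine transformation to three seed correspondences, then keep the further candidate matches that the same transformation explains. This sketch uses region centers only and omits the photometric constraints and the multiple hypotheses of the actual search. All names and the tolerance are hypothetical.

```python
import math

def fit_affine(src, dst):
    """Affine map sending three non-collinear src points to three dst points.
    Returns ((a, b, tx), (c, d, ty)), solved with Cramer's rule."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-12:
        raise ValueError("source points are collinear")

    def solve(v1, v2, v3):
        a = (v1 * (y2 - y3) - y1 * (v2 - v3) + (v2 * y3 - v3 * y2)) / det
        b = (x1 * (v2 - v3) - v1 * (x2 - x3) + (x2 * v3 - x3 * v2)) / det
        t = (x1 * (y2 * v3 - y3 * v2) - y1 * (x2 * v3 - x3 * v2)
             + v1 * (x2 * y3 - x3 * y2)) / det
        return a, b, t

    return (solve(dst[0][0], dst[1][0], dst[2][0]),
            solve(dst[0][1], dst[1][1], dst[2][1]))

def transform(affine, p):
    """Apply the affine map to a 2D point."""
    (a, b, tx), (c, d, ty) = affine
    return (a * p[0] + b * p[1] + tx, c * p[0] + d * p[1] + ty)

def grow_match(seed, candidates, tol=2.0):
    """seed: three (src_point, dst_point) pairs; candidates: further pairs.
    Keeps the candidates that the seed's affine map predicts within tol."""
    affine = fit_affine([p for p, _ in seed], [q for _, q in seed])
    group = list(seed)
    for p, q in candidates:
        px, py = transform(affine, p)
        if math.hypot(px - q[0], py - q[1]) <= tol:
            group.append((p, q))
    return affine, group
```

The size of the returned group is not fixed in advance, which mirrors the point above that the number of regions in correspondence is determined automatically.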


Matching: 3D Objects



Matching: Faces

spurious match ???

Learning Object Models for Recognition
  • Match multiple pairs of training images to produce a set of candidate parts
  • Use additional validation images to evaluate repeatability of parts and individual regions
  • Retain a fixed number of parts having the best repeatability score
Recognition Experiment: Butterflies

Admiral, Swallowtail, Machaon, Monarch 1, Monarch 2, Peacock, Zebra

  • 16 training images (8 pairs) per class
  • 10 validation images per class
  • 437 test images
  • 619 images total
  • Top 10 parts per class used for recognition
  • Relative repeatability score: (total number of regions detected) / (total part size)
  • Classification results:

Total part size (smallest/largest)

Detection Results (ROC Curves)

Circles: reference relative repeatability rates. Red square: ROC equal error rate (in parentheses)
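
A minimal sketch of how an ROC curve and its equal error rate can be computed from detection scores. It assumes distinct scores, binary labels (1 for images containing the object, 0 for negatives), and at least one image of each kind; the function names are hypothetical.

```python
def roc_curve(scores, labels):
    """Return (false-positive rate, true-positive rate) points obtained by
    lowering the detection threshold one sample at a time."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    for _, is_pos in sorted(zip(scores, labels), key=lambda t: -t[0]):
        if is_pos:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def equal_error_rate(points):
    """Error rate at the ROC point where the false-positive rate is closest
    to the miss rate (1 - true-positive rate)."""
    fpr, tpr = min(points, key=lambda pt: abs(pt[0] - (1.0 - pt[1])))
    return (fpr + (1.0 - tpr)) / 2.0
```

With few test samples the curve is a coarse staircase, so the value returned is the nearest available operating point rather than an exact crossing.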

Successful Detection Examples

Training images

Test images (blue: occluded regions)

All ellipses found in the test images

Unsuccessful Detection Examples

Training images

Test images (blue: occluded regions)

All ellipses found in the test image



  • Semi-local affine parts for describing structure of 3D objects
  • Finding a part vocabulary:
    • Correspondence search between pairs of images
    • Validation
  • Additional application:
    • Finding symmetry and repetition

Future Work

  • Find a better affine region detector
  • Represent and learn inter-part relations
  • Evaluation: CalTech database, harder classes, etc.



Snowy Owl

Mandarin Duck

Wood Duck

Birds: Candidate Parts

Mandarin Duck


Summary of Talk
  • Recognition of single-texture images
    • Distribution of local appearance descriptors
  • Recognition of individual regions in multi-texture images
    • Local appearance + loose statistical neighborhood relations
  • Recognition of object categories
    • Local appearance + strong geometric relations

For more information: http://www-cvr.ai.uiuc.edu/ponce_grp

Issues, Extensions
  • Weakly supervised learning
    • Evaluation methods?
    • Learning from contaminated data?
  • Probabilistic vs. geometric approaches to invariance
  • EM vs. direct correspondence search
  • Training set size
  • Background modeling
  • Strengthening the representation
    • Heterogeneous local features
    • Automatic feature selection
    • Inter-part relations