Learning Local Affine Representations for Texture and Object Recognition

Svetlana Lazebnik

Beckman Institute, University of Illinois at Urbana-Champaign

(joint work with Cordelia Schmid, Jean Ponce)

Overview



  • Goal:

    • Recognition of 3D textured surfaces, object classes

  • Our contribution:

    • Texture and object representations based on local affine regions

  • Advantages of proposed approach:

    • Distinctive, repeatable primitives

    • Robustness to clutter and occlusion

    • Ability to approximate 3D geometric transformations

The Scope

  • Recognition of single-texture images (CVPR 2003)

  • Recognition of individual texture regions in multi-texture images (ICCV 2003)

  • Recognition of object classes (BMVC 2004, work in progress)

1. Recognition of Single-Texture Images

Affine Region Detectors

Harris detector (H)

Laplacian detector (L)

Mikolajczyk & Schmid (2002), Gårding & Lindeberg (1996)

Affine Rectification Process

Patch 1

Patch 2

Rectified patches (rotational ambiguity)
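As an illustration of the rectification step, suppose each detected region is an ellipse {x : xᵀMx = 1} given by a symmetric positive-definite 2×2 matrix M. This parametrization is an assumption; the slides do not spell one out. Mapping x to M^(1/2)·x sends the ellipse to the unit circle, and so does any further rotation, which is the rotational ambiguity noted above. The function names are hypothetical.

```python
import math

def sqrt_spd_2x2(m):
    """Closed-form square root of a symmetric positive-definite 2x2 matrix."""
    (a, b), (c, d) = m
    s = math.sqrt(a * d - b * c)        # sqrt of the determinant
    t = math.sqrt(a + d + 2.0 * s)      # sqrt of (trace + 2 * sqrt(det))
    return [[(a + s) / t, b / t], [c / t, (d + s) / t]]

def rectify_point(m, p):
    """Map a point on the ellipse x^T M x = 1 onto the unit circle."""
    r = sqrt_spd_2x2(m)
    return (r[0][0] * p[0] + r[0][1] * p[1],
            r[1][0] * p[0] + r[1][1] * p[1])

# Assumed example: an axis-aligned ellipse with semi-axes 2 and 1.
ellipse = [[0.25, 0.0], [0.0, 1.0]]
rectified = rectify_point(ellipse, (2.0, 0.0))
```

Multiplying the result by any rotation matrix gives an equally valid rectification, which is why the descriptors that follow are built to be rotation-invariant.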

Rotation-Invariant Descriptors 1: Spin Images

  • Based on range spin images (Johnson & Hebert 1998)

  • Two-dimensional histogram: distance from center × intensity value
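A minimal sketch of an intensity-domain spin image, assuming a rectified square patch with intensities scaled to [0, 1]. The bin counts and the hard bin assignment are illustrative choices, not taken from the slides. Each pixel contributes only through its distance from the center and its intensity, so the histogram does not change when the patch is rotated.

```python
import math

def spin_image(patch, n_dist=5, n_int=4):
    """2D histogram over (distance from center, intensity) for a square patch."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radius = min(cx, cy)
    if radius <= 0:
        raise ValueError("patch must be at least 2x2 pixels")
    hist = [[0.0] * n_int for _ in range(n_dist)]
    count = 0
    for y in range(h):
        for x in range(w):
            dx, dy = x - cx, y - cy
            d = math.sqrt(dx * dx + dy * dy)
            if d > radius:
                continue                      # keep only the inscribed disc
            di = min(int(d / radius * n_dist), n_dist - 1)
            ii = min(int(patch[y][x] * n_int), n_int - 1)
            hist[di][ii] += 1.0
            count += 1
    return [[v / count for v in row] for row in hist]

# Assumed toy patch; rotating it by 90 degrees leaves the descriptor unchanged.
patch = [[((3 * x + 5 * y) % 7) / 7.0 for x in range(5)] for y in range(5)]
rotated = [list(row) for row in zip(*patch[::-1])]
```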

Rotation-Invariant Descriptors 2: RIFT

  • Based on SIFT (Lowe 1999)

  • Two-dimensional histogram: distance from center × gradient orientation

  • Gradient orientation is measured relative to the direction pointing outward from the center of the patch
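The following sketch shows one way such a descriptor could be computed. It assumes central-difference gradients, hard binning, magnitude weighting, and orientation bins centered on the outward direction; these are illustrative choices, and the ring and orientation counts are default parameters rather than values stated on the slide.

```python
import math

def rift(patch, n_rings=4, n_orient=8):
    """2D histogram over (distance ring, gradient orientation relative to outward)."""
    h, w = len(patch), len(patch[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    radius = min(cx, cy) - 1.0          # 1-pixel border for central differences
    bin_width = 2.0 * math.pi / n_orient
    hist = [[0.0] * n_orient for _ in range(n_rings)]
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            dx, dy = x - cx, y - cy
            d = math.sqrt(dx * dx + dy * dy)
            if d == 0.0 or d > radius:
                continue                # outward direction undefined at the center
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Angle between the gradient and the outward-pointing direction.
            rel = math.atan2(gy, gx) - math.atan2(dy, dx)
            shifted = (rel + bin_width / 2.0) % (2.0 * math.pi)
            o = int(shifted / bin_width) % n_orient
            r = min(int(d / radius * n_rings), n_rings - 1)
            hist[r][o] += mag
            total += mag
    if total == 0.0:
        return hist
    return [[v / total for v in row] for row in hist]

# Assumed toy patch: a quadratic bowl whose gradient points outward everywhere.
bowl = [[(x - 3) ** 2 + (y - 3) ** 2 for x in range(7)] for y in range(7)]
```

For the bowl, all of the mass falls in orientation bin 0 (gradient aligned with the outward direction); negating the patch moves it to the opposite bin.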

Signatures and EMD

  • Signatures: S = {(m1, w1), …, (mk, wk)}, where mi is a cluster center and wi is its relative weight

  • Earth Mover’s Distance (Rubner et al. 1998)

    • Computed from ground distances d(mi, m'j)

    • Can compare signatures of different sizes

    • Insensitive to the number of clusters
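For reference, the standard formulation of EMD from Rubner et al. can be written in the signature notation of this slide. Here S' = {(m'1, w'1), …, (m'k', w'k')} is the second signature, and the flow f_ij is a symbol introduced for this sketch.

```latex
\[
\min_{f_{ij} \ge 0} \; \sum_{i=1}^{k} \sum_{j=1}^{k'} f_{ij}\, d(m_i, m'_j)
\quad \text{subject to} \quad
\sum_{j} f_{ij} \le w_i, \qquad
\sum_{i} f_{ij} \le w'_j, \qquad
\sum_{i,j} f_{ij} = \min\Big( \sum_{i} w_i,\; \sum_{j} w'_j \Big)
\]
\[
\mathrm{EMD}(S, S') = \frac{\sum_{i,j} f_{ij}\, d(m_i, m'_j)}{\sum_{i,j} f_{ij}}
\]
```

The constraints involve only the weights and the ground distances, so k and k' need not be equal, which is why signatures of different sizes can be compared.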

Database: Textured Surfaces

25 textures, 40 sample images each (640x480)



  • Channels: HS, HR, LS, LR

    • Combined through addition of EMD matrices

  • Classification results

    • 10 training images per class, rates averaged over 200 random training subsets
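A small sketch of the channel combination, assuming each channel yields a matrix of EMD values between test images (rows) and training images (columns). The nearest-neighbor decision rule, the function names, and the toy numbers are assumptions made for illustration; the slide states only that the matrices are added.

```python
def combine_channels(matrices):
    """Add per-channel distance matrices element by element."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) for j in range(cols)]
            for i in range(rows)]

def nearest_neighbor_labels(dist, train_labels):
    """Assign each test row the label of its closest training column."""
    labels = []
    for row in dist:
        best = min(range(len(row)), key=row.__getitem__)
        labels.append(train_labels[best])
    return labels

# Hypothetical distances for two channels, two test and two training images.
channel_a = [[0.1, 0.9], [0.8, 0.3]]
channel_b = [[0.5, 0.2], [0.6, 0.1]]
combined = combine_channels([channel_a, channel_b])
predicted = nearest_neighbor_labels(combined, ["brick", "marble"])
```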

Comparative Evaluation

Results of Evaluation: Classification rate vs. number of training samples

  • Conclusion: an intrinsically invariant representation is necessary to deal with intra-class variations when they are not adequately represented in the training set



  • A sparse texture representation based on local affine regions

  • Two novel descriptors (spin images, RIFT)

  • Successful recognition in the presence of viewpoint changes, non-rigidity, non-homogeneity

  • A flexible approach to invariance

2. Recognition of Individual Regions in Multi-Texture Images

  • A two-layer architecture:

    • Local appearance + neighborhood relations

  • Learning:

    • Represent the local appearance of each texture class using a mixture-of-Gaussians model

    • Compute co-occurrence statistics of sub-class labels over affinely adapted neighborhoods

  • Recognition:

    • Obtain initial class membership probabilities from the generative model

    • Use relaxation to refine these probabilities

Two Learning Scenarios

  • Fully supervised: every region in the training image is labeled with its texture class

  • Weakly supervised: each training image is labeled with the classes occurring in it


brick, marble, carpet

Neighborhood Statistics

  • Estimate:

  • probability p(c,c')

  • correlation r(c,c')
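One plausible way to estimate these quantities is sketched below. The input is a list of (c, c') pairs, one for each region and each region in its neighborhood, and r(c,c') is taken to be the Pearson correlation between the two label indicators. That reading of r is an assumption, since the slide does not give a formula.

```python
import math
from collections import Counter

def neighborhood_statistics(pairs):
    """Estimate p(c, c') and r(c, c') from (region label, neighbor label) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    first = Counter(c for c, _ in pairs)      # marginal counts of c
    second = Counter(c2 for _, c2 in pairs)   # marginal counts of c'
    p, r = {}, {}
    for c in first:
        for c2 in second:
            pj = joint[(c, c2)] / n
            pc, pc2 = first[c] / n, second[c2] / n
            denom = math.sqrt((pc - pc * pc) * (pc2 - pc2 * pc2))
            p[(c, c2)] = pj
            r[(c, c2)] = (pj - pc * pc2) / denom if denom > 0 else 0.0
    return p, r

# Hypothetical sub-class labels "a" and "b" that only ever neighbor themselves.
p, r = neighborhood_statistics([("a", "a"), ("a", "a"), ("b", "b"), ("b", "b")])
```

In this toy case r("a","a") is 1 and r("a","b") is -1, the two extremes of the correlation.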

Neighborhood definition

Relaxation (Rosenfeld et al. 1976)

  • Iterative process:

    • Initialized with posterior probabilities p(c|xi) obtained from the generative model

    • For each region i and each sub-class label c, update the probability pi(c) based on neighbor probabilities pj(c') and correlations r(c,c')

  • Shortcomings:

    • No formal guarantee of convergence

    • After the initialization, the updates to the probability values do not depend on the image data
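A minimal sketch of one relaxation iteration, using the classical update of Rosenfeld et al.: each probability is multiplied by one plus a support term and then renormalized. Uniform weights over neighbors are an assumption, the data layout is hypothetical, and the exact variant used in the talk may differ.

```python
def relaxation_step(probs, neighbors, corr):
    """One update of label probabilities from neighbor probabilities.

    probs:     list of dicts, label -> probability, one dict per region
    neighbors: list of lists of neighbor indices, one list per region
    corr:      dict mapping (c, c') to a correlation in [-1, 1]
    """
    updated_all = []
    for i, p_i in enumerate(probs):
        nbrs = neighbors[i]
        updated = {}
        for c, p in p_i.items():
            support = 0.0
            for j in nbrs:
                # Uniform neighbor weights (an assumption of this sketch).
                support += sum(corr.get((c, c2), 0.0) * pj
                               for c2, pj in probs[j].items()) / len(nbrs)
            updated[c] = p * (1.0 + support)
        z = sum(updated.values())
        updated_all.append({c: v / z for c, v in updated.items()} if z > 0
                           else dict(p_i))
    return updated_all

# Hypothetical two-region example: a confident neighbor reinforces "brick".
corr = {("brick", "brick"): 0.5, ("marble", "marble"): 0.5,
        ("brick", "marble"): -0.5, ("marble", "brick"): -0.5}
probs = [{"brick": 0.6, "marble": 0.4}, {"brick": 0.9, "marble": 0.1}]
refined = relaxation_step(probs, [[1], [0]], corr)
```

The update reads only the current probabilities and the correlations, never the image data, which is the second shortcoming listed above.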

Experiment 1: 3D Textured Surfaces

Single-texture images

T1 (brick)

T2 (carpet)

T3 (chair)

T4 (floor 1)

T5 (floor 2)

T6 (marble)

T7 (wood)

Multi-texture images

10 single-texture training images per class, 13 two-texture training images, 45 multi-texture test images

Effect of Relaxation on Labeling

Original image

Top: before relaxation, bottom: after relaxation



(single-texture training images)

T1 (brick)

T2 (carpet)

T3 (chair)

T4 (floor 1)

T5 (floor 2)

T6 (marble)

T7 (wood)

Successful Segmentation Examples

Unsuccessful Segmentation Examples

Experiment 2: Animals

  • No manual segmentation

  • Training data: 10 sample images per class

  • Test data: 20 samples per class + 20 negative images

cheetah, background

zebra, background

giraffe, background

Cheetah Results

Zebra Results

Giraffe Results

Future Work

  • A two-level representation (local appearance + neighborhood relations)

  • Weakly supervised learning of texture models

  • Design an improved representation using a random field framework, e.g., conditional random fields (Lafferty et al. 2001, Kumar & Hebert 2003)

  • Develop a procedure for weakly supervised learning of random field parameters

  • Apply method to recognition of natural texture categories

3. Recognition of Object Classes

The approach:

  • Represent objects using multiple composite semi-local affine parts

    • More expressive than individual regions

    • Not globally rigid

  • Correspondence search is key to learning and detection

Correspondence Search

  • Basic operation: a two-image matching procedure for finding collections of affine regions that can be mapped onto each other using a single affine transformation

  • Implementation: greedy search based on geometric and photometric consistency constraints

    • Returns multiple correspondence hypotheses

    • Automatically determines number of regions in correspondence

    • Works on unsegmented, cluttered images (weakly supervised learning)
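The geometric side of this idea can be illustrated with a much simpler procedure than the one in the talk: fit an affine map to three seed matches, then keep every candidate match that the map explains to within a tolerance. This is only a sketch of geometric consistency under a single affine transformation, not the authors' greedy search. It ignores photometric consistency, and the function names and tolerance are hypothetical.

```python
import math

def affine_from_three(src, dst):
    """Exact affine map (two rows a, b, c) sending three src points to dst."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-12:
        raise ValueError("seed points are collinear")

    def solve(v1, v2, v3):
        # Cramer's rule for a*x + b*y + c = v at the three seed points.
        a = (v1 * (y2 - y3) - y1 * (v2 - v3) + (v2 * y3 - v3 * y2)) / det
        b = (x1 * (v2 - v3) - v1 * (x2 - x3) + (x2 * v3 - x3 * v2)) / det
        c = (x1 * (y2 * v3 - y3 * v2) - y1 * (x2 * v3 - x3 * v2)
             + v1 * (x2 * y3 - x3 * y2)) / det
        return a, b, c

    return (solve(dst[0][0], dst[1][0], dst[2][0]),
            solve(dst[0][1], dst[1][1], dst[2][1]))

def consistent_matches(matches, seed, tol=1.0):
    """Indices of matches explained by the affine map fitted to the seed triple."""
    src = [matches[i][0] for i in seed]
    dst = [matches[i][1] for i in seed]
    (a, b, c), (d, e, f) = affine_from_three(src, dst)
    keep = []
    for idx, ((x, y), (u, v)) in enumerate(matches):
        err = math.hypot(a * x + b * y + c - u, d * x + e * y + f - v)
        if err <= tol:
            keep.append(idx)
    return keep

# Hypothetical region centers: four follow one affine map, the last does not.
matches = [((0, 0), (1, -2)), ((1, 0), (3, -3)), ((0, 1), (2, 1)),
           ((2, 3), (8, 5)), ((5, 5), (0, 0))]
inliers = consistent_matches(matches, (0, 1, 2))
```

The size of the returned set is not fixed in advance, which mirrors the point above that the number of regions in correspondence is determined automatically.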


Matching: 3D Objects

Matching: 3D Objects



Matching: Faces

spurious match ???

Finding Symmetries

Finding Repeated Patterns and Symmetries

Learning Object Models for Recognition

  • Match multiple pairs of training images to produce a set of candidate parts

  • Use additional validation images to evaluate repeatability of parts and individual regions

  • Retain a fixed number of parts having the best repeatability score
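The selection step might look like the sketch below. It assumes each candidate part is stored as its size (number of regions) together with the number of its regions detected in each validation image, and it scores a part by regions detected divided by part size times the number of validation images. That normalization, the data layout, and the toy numbers are assumptions made for illustration.

```python
def repeatability(part_size, detected_counts):
    """Fraction of a part's regions found, pooled over the validation images."""
    total_size = part_size * len(detected_counts)
    return sum(detected_counts) / total_size if total_size else 0.0

def select_parts(candidates, keep):
    """Return the names of the `keep` candidates with the highest score."""
    ranked = sorted(candidates.items(),
                    key=lambda item: repeatability(*item[1]),
                    reverse=True)
    return [name for name, _ in ranked[:keep]]

# Hypothetical candidates: name -> (part size, detections per validation image).
candidates = {"wing": (10, [8, 9, 10]), "body": (5, [1, 2, 0])}
best = select_parts(candidates, keep=1)
```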

Recognition Experiment: Butterflies

Admiral, Swallowtail, Machaon, Monarch 1, Monarch 2, Peacock, Zebra

  • 16 training images (8 pairs) per class

  • 10 validation images per class

  • 437 test images

  • 619 images total

Butterfly Parts



  • Top 10 parts per class used for recognition

  • Relative repeatability score: (total number of regions detected) / (total part size)

  • Classification results:

Total part size (smallest/largest)

Classification Rate vs. Number of Parts

Detection Results (ROC Curves)

Circles: reference relative repeatability rates. Red square: ROC equal error rate (in parentheses)

Successful Detection Examples

Training images

Test images (blue: occluded regions)

All ellipses found in the test images

Unsuccessful Detection Examples

Training images

Test images (blue: occluded regions)

All ellipses found in the test image




  • Semi-local affine parts for describing structure of 3D objects

  • Finding a part vocabulary:

    • Correspondence search between pairs of images

    • Validation

  • Additional application:

    • Finding symmetry and repetition

Future Work

  • Find a better affine region detector

  • Represent, learn inter-part relations

  • Evaluation: CalTech database, harder classes, etc.





Snowy Owl

Mandarin Duck

Wood Duck

Birds: Candidate Parts

Mandarin Duck


Objects without Characteristic Texture


Summary of Talk

  • Recognition of single-texture images

    • Distribution of local appearance descriptors

  • Recognition of individual regions in multi-texture images

    • Local appearance + loose statistical neighborhood relations

  • Recognition of object categories

    • Local appearance + strong geometric relations

      For more information: http://www-cvr.ai.uiuc.edu/ponce_grp

Issues, Extensions

  • Weakly supervised learning

    • Evaluation methods?

    • Learning from contaminated data?

  • Probabilistic vs. geometric approaches to invariance

  • EM vs. direct correspondence search

  • Training set size

  • Background modeling

  • Strengthening the representation

    • Heterogeneous local features

    • Automatic feature selection

    • Inter-part relations
