Learning Local Affine Representations for Texture and Object Recognition

Svetlana Lazebnik

Beckman Institute, University of Illinois at Urbana-Champaign

(joint work with Cordelia Schmid, Jean Ponce)

Overview

  • Goal:

    • Recognition of 3D textured surfaces, object classes

  • Our contribution:

    • Texture and object representations based on local affine regions

  • Advantages of proposed approach:

    • Distinctive, repeatable primitives

    • Robustness to clutter and occlusion

    • Ability to approximate 3D geometric transformations

The Scope

  • Recognition of single-texture images (CVPR 2003)

  • Recognition of individual texture regions in multi-texture images (ICCV 2003)

  • Recognition of object classes (BMVC 2004, work in progress)

Affine Region Detectors

Harris detector (H)

Laplacian detector (L)

Mikolajczyk & Schmid (2002), Gårding & Lindeberg (1996)

Affine Rectification Process

Patch 1

Patch 2

Rectified patches (rotational ambiguity)
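
The rotational ambiguity can be illustrated with a minimal sketch. It assumes each region is an ellipse x^T M x = 1 and that rectification applies the square root of M; the matrix M and all function names are hypothetical, not taken from the slides.

```python
import math

def sqrt_spd_2x2(m):
    """Closed-form square root of a symmetric positive-definite 2x2 matrix."""
    (a, b), (_, d) = m
    s = math.sqrt(a * d - b * b)      # square root of the determinant
    t = math.sqrt(a + d + 2.0 * s)    # equals sqrt(l1) + sqrt(l2) for eigenvalues l1, l2
    return [[(a + s) / t, b / t], [b / t, (d + s) / t]]

def apply(mat, p):
    """Multiply a 2x2 matrix by a 2D point."""
    return (mat[0][0] * p[0] + mat[0][1] * p[1],
            mat[1][0] * p[0] + mat[1][1] * p[1])

def rotation(theta):
    """2x2 rotation matrix for angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def point_on_ellipse(m, phi):
    """Point x in direction phi that satisfies x^T m x = 1."""
    u = (math.cos(phi), math.sin(phi))
    mu = apply(m, u)
    scale = math.sqrt(u[0] * mu[0] + u[1] * mu[1])
    return (u[0] / scale, u[1] / scale)

M = [[0.5, 0.2], [0.2, 0.3]]       # hypothetical region shape matrix
A = sqrt_spd_2x2(M)                # rectifying transform (assumption: A = M^(1/2))
x = point_on_ellipse(M, 0.7)
y = apply(A, x)                    # lands on the unit circle
y_rot = apply(rotation(1.1), y)    # any rotated copy is also on the unit circle
```

Because both `y` and `y_rot` lie on the unit circle, the ellipse alone fixes the rectified patch only up to a rotation, which is why the descriptors that follow are rotation-invariant.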

Rotation-Invariant Descriptors 1: Spin Images

  • Based on range spin images (Johnson & Hebert 1998)

  • Two-dimensional histogram: distance from center × intensity value
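
A minimal sketch of such a histogram follows. It assumes a square patch with intensities in [0, 1] and uses hard binning for brevity; the bin counts and the function name `spin_image` are illustrative choices, not the settings used in the talk.

```python
import math

def spin_image(patch, n_dist=4, n_int=4):
    """Normalized 2D histogram over (distance from center, intensity).

    patch: square list of rows with intensities in [0, 1].
    Pixels outside the inscribed circle are ignored.
    """
    size = len(patch)
    c = (size - 1) / 2.0
    radius = size / 2.0
    hist = [[0.0] * n_int for _ in range(n_dist)]
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            d = math.hypot(x - c, y - c) / radius
            if d >= 1.0:
                continue
            di = int(d * n_dist)
            ii = min(int(value * n_int), n_int - 1)  # intensity 1.0 goes to the last bin
            hist[di][ii] += 1.0
    total = sum(map(sum, hist))
    return [[v / total for v in row] for row in hist]

# Hypothetical patch and a copy rotated by 90 degrees.
patch = [[((3 * x + 5 * y) % 7) / 6.0 for x in range(8)] for y in range(8)]
rotated = [list(row) for row in zip(*patch[::-1])]
h1, h2 = spin_image(patch), spin_image(rotated)
```

Neither histogram axis depends on the angular position of a pixel, so `h1` and `h2` agree for this rotated copy.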

Rotation-Invariant Descriptors 2: RIFT

  • Based on SIFT (Lowe 1999)

  • Two-dimensional histogram: distance from center × gradient orientation

  • Gradient orientation is measured relative to the direction pointing outward from the center of the patch
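
The idea can be sketched as below. Central-difference gradients, magnitude weighting, hard binning, and the 4 x 8 bin layout are assumptions made for this illustration, and `rift` is a hypothetical name.

```python
import math

def rift(patch, n_rings=4, n_orient=8):
    """Normalized 2D histogram over (distance from center, relative gradient orientation)."""
    size = len(patch)
    c = (size - 1) / 2.0
    radius = size / 2.0
    hist = [[0.0] * n_orient for _ in range(n_rings)]
    for y in range(1, size - 1):
        for x in range(1, size - 1):
            dx, dy = x - c, y - c
            d = math.hypot(dx, dy) / radius
            if d >= 1.0 or d == 0.0:
                continue
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            # Gradient angle minus the angle of the outward direction.
            rel = (math.atan2(gy, gx) - math.atan2(dy, dx)) % (2.0 * math.pi)
            oi = min(int(rel / (2.0 * math.pi) * n_orient), n_orient - 1)
            hist[int(d * n_rings)][oi] += mag
    total = sum(map(sum, hist))
    return [[v / total for v in row] for row in hist] if total > 0 else hist

# Hypothetical patch and a copy rotated by 90 degrees.
patch = [[math.sin(0.9 * x + 0.37 * y * y) for x in range(8)] for y in range(8)]
rotated = [list(row) for row in zip(*patch[::-1])]
h1, h2 = rift(patch), rift(rotated)
```

Rotating the patch rotates the gradient and the outward direction together, so their difference, and hence the histogram, is unchanged up to rounding.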

Signatures and EMD

  • Signatures: S = {(m1, w1), …, (mk, wk)}, where mi is a cluster center and wi is its relative weight

  • Earth Mover’s Distance (Rubner et al. 1998)

    • Computed from ground distances d(mi, m'j)

    • Can compare signatures of different sizes

    • Insensitive to the number of clusters
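
For reference, the usual transportation-problem form of the Earth Mover's Distance between S = {(mi, wi)} and S' = {(m'j, w'j)} can be written as follows. The flow variables f_ij and the signature size k' are notation introduced here, not taken from the slides.

```latex
\begin{aligned}
\min_{f_{ij} \ge 0} \quad & \sum_{i=1}^{k} \sum_{j=1}^{k'} f_{ij}\, d(m_i, m'_j) \\
\text{subject to} \quad & \sum_{j} f_{ij} \le w_i, \qquad
  \sum_{i} f_{ij} \le w'_j, \qquad
  \sum_{i,j} f_{ij} = \min\Big(\sum_i w_i,\ \sum_j w'_j\Big), \\
\mathrm{EMD}(S, S') \;=\; & \frac{\sum_{i,j} f^{*}_{ij}\, d(m_i, m'_j)}{\sum_{i,j} f^{*}_{ij}}
\end{aligned}
```

Here f* is the minimizing flow. When the relative weights of each signature sum to 1, the denominator equals 1 and the EMD is simply the minimum total transport cost.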

Database: Textured Surfaces

25 textures, 40 sample images each (640x480)

Evaluation

  • Channels: HS, HR, LS, LR

    • Combined through addition of EMD matrices

  • Classification results

    • 10 training images per class, rates averaged over 200 random training subsets
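
Channel combination by matrix addition can be sketched as below. The slide does not name the classifier, so nearest-neighbor assignment is an assumption here, and the distance values and labels are made up for illustration.

```python
def combine_channels(matrices):
    """Element-wise sum of equally sized distance matrices (one per channel)."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[i][j] for m in matrices) for j in range(cols)]
            for i in range(rows)]

def nearest_neighbor_labels(dist, train_labels):
    """dist[i][j] is the combined distance from test image i to training image j."""
    return [train_labels[min(range(len(row)), key=row.__getitem__)]
            for row in dist]

# Hypothetical EMD matrices for two channels: 2 test images x 3 training images.
emd_hs = [[0.2, 0.9, 0.8], [0.7, 0.3, 0.6]]
emd_lr = [[0.4, 0.5, 0.1], [0.9, 0.2, 0.8]]
combined = combine_channels([emd_hs, emd_lr])
labels = nearest_neighbor_labels(combined, ["brick", "carpet", "wood"])
```

In this toy case the LR channel alone would assign the first test image to the third training image, while the combined matrix assigns it to the first.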

(H+L)(S+R)

Results of Evaluation: Classification rate vs. number of training samples

  • Conclusion: an intrinsically invariant representation is necessary to deal with intra-class variations when they are not adequately represented in the training set

Summary

  • A sparse texture representation based on local affine regions

  • Two novel descriptors (spin images, RIFT)

  • Successful recognition in the presence of viewpoint changes, non-rigidity, non-homogeneity

  • A flexible approach to invariance

2. Recognition of Individual Regions in Multi-Texture Images

  • A two-layer architecture:

    • Local appearance + neighborhood relations

  • Learning:

    • Represent the local appearance of each texture class using a mixture-of-Gaussians model

    • Compute co-occurrence statistics of sub-class labels over affinely adapted neighborhoods

  • Recognition:

    • Obtain initial class membership probabilities from the generative model

    • Use relaxation to refine these probabilities
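
The initial probabilities can be sketched as posteriors over mixture components. Isotropic Gaussians, the component parameters, and the sub-class names below are assumptions made for illustration; the slides do not give the covariance structure.

```python
import math

def gaussian_pdf(x, mean, var):
    """Isotropic Gaussian density with scalar variance var."""
    d = len(x)
    sq = sum((a - b) ** 2 for a, b in zip(x, mean))
    return math.exp(-0.5 * sq / var) / ((2.0 * math.pi * var) ** (d / 2.0))

def subclass_posteriors(x, components):
    """components: list of (sub_class_label, prior, mean, variance) tuples."""
    joint = {label: prior * gaussian_pdf(x, mean, var)
             for label, prior, mean, var in components}
    total = sum(joint.values())
    return {label: value / total for label, value in joint.items()}

# Hypothetical sub-classes (mixture components) of two texture classes.
components = [
    ("brick_0", 1.0 / 3.0, (0.0, 0.0), 1.0),
    ("brick_1", 1.0 / 3.0, (1.0, 0.0), 1.0),
    ("marble_0", 1.0 / 3.0, (4.0, 4.0), 1.0),
]
post = subclass_posteriors((0.1, 0.0), components)
```

These per-region posteriors are the values that a relaxation step would then refine using neighborhood information.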

Two Learning Scenarios

  • Fully supervised: every region in the training image is labeled with its texture class

  • Weakly supervised: each training image is labeled with the classes occurring in it


brick, marble, carpet

Neighborhood Statistics

  • Estimate:

  • probability p(c,c')

  • correlation r(c,c')

Neighborhood definition
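
The formulas for these estimates are not preserved in the transcript. The sketch below assumes relative-frequency estimates of p(c,c') over (region, neighbor) pairs and takes r(c,c') to be the correlation coefficient between the two label indicators; the function name and the label pairs are hypothetical.

```python
import math
from collections import Counter

def neighborhood_statistics(pairs):
    """pairs: list of (c, c_prime) sub-class labels, one per (region, neighbor) pair."""
    n = float(len(pairs))
    joint = Counter(pairs)
    left = Counter(c for c, _ in pairs)       # label of the region
    right = Counter(cp for _, cp in pairs)    # label of its neighbor
    p, r = {}, {}
    for c in left:
        for cp in right:
            pc, pcp = left[c] / n, right[cp] / n
            pj = joint[(c, cp)] / n
            p[(c, cp)] = pj
            denom = math.sqrt((pc - pc * pc) * (pcp - pcp * pcp))
            r[(c, cp)] = (pj - pc * pcp) / denom if denom > 0 else 0.0
    return p, r

# Hypothetical labels of regions and their neighbors.
pairs = [("a", "a"), ("a", "a"), ("a", "b"),
         ("b", "b"), ("b", "b"), ("b", "a")]
p, r = neighborhood_statistics(pairs)
```

With these pairs, labels that co-occur more often than chance get a positive correlation and mixed pairs get a negative one.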

Relaxation (Rosenfeld et al. 1976)

  • Iterative process:

    • Initialized with posterior probabilities p(c|xi) obtained from the generative model

    • For each region i and each sub-class label c, update the probability pi(c) based on neighbor probabilities pj(c') and correlations r(c,c')

  • Shortcomings:

    • No formal guarantee of convergence

    • After the initialization, the updates to the probability values do not depend on the image data
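
One iteration can be sketched with the classic nonlinear relaxation-labeling update, p_i(c) ∝ p_i(c)(1 + q_i(c)), where q_i(c) is a weighted sum of r(c,c') p_j(c') over neighbors j. The neighbor weights and all numbers below are assumptions for illustration; the slides do not give the exact update used.

```python
def relaxation_step(probs, neighbors, r):
    """One synchronous update of every region.

    probs: list of dicts {label: probability}, one per region.
    neighbors: list of lists of (j, weight); weights sum to 1 for each region.
    r: dict {(c, c_prime): correlation in [-1, 1]}.
    """
    updated = []
    for i, p_i in enumerate(probs):
        support = {}
        for c in p_i:
            q = sum(w * sum(r.get((c, cp), 0.0) * p for cp, p in probs[j].items())
                    for j, w in neighbors[i])
            support[c] = p_i[c] * (1.0 + q)   # q lies in [-1, 1], so this is >= 0
        z = sum(support.values())
        updated.append({c: v / z for c, v in support.items()} if z > 0 else dict(p_i))
    return updated

# Hypothetical chain of three regions; the middle one starts out ambiguous.
r = {("brick", "brick"): 0.8, ("marble", "marble"): 0.8,
     ("brick", "marble"): -0.8, ("marble", "brick"): -0.8}
probs = [{"brick": 0.9, "marble": 0.1},
         {"brick": 0.4, "marble": 0.6},
         {"brick": 0.9, "marble": 0.1}]
neighbors = [[(1, 1.0)], [(0, 0.5), (2, 0.5)], [(1, 1.0)]]
updated = relaxation_step(probs, neighbors, r)
```

The update reads only the current probabilities and the correlations, never the image itself, which matches the second shortcoming listed above.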

Experiment 1: 3D Textured Surfaces

Single-texture images

T1 (brick)

T2 (carpet)

T3 (chair)

T4 (floor 1)

T5 (floor 2)

T6 (marble)

T7 (wood)

Multi-texture images

10 single-texture training images per class, 13 two-texture training images, 45 multi-texture test images

Effect of Relaxation on Labeling

Original image

Top: before relaxation, bottom: after relaxation

Retrieval

(single-texture training images)

T1 (brick)

T2 (carpet)

T3 (chair)

T4 (floor 1)

T5 (floor 2)

T6 (marble)

T7 (wood)

Experiment 2: Animals

  • No manual segmentation

  • Training data: 10 sample images per class

  • Test data: 20 samples per class + 20 negative images

cheetah, background

zebra, background

giraffe, background

Cheetah Results

Zebra Results

Giraffe Results

Summary

  • A two-level representation (local appearance + neighborhood relations)

  • Weakly supervised learning of texture models

Future Work

  • Design an improved representation using a random field framework, e.g., conditional random fields (Lafferty 2001, Kumar & Hebert 2003)

  • Develop a procedure for weakly supervised learning of random field parameters

  • Apply method to recognition of natural texture categories

3. Recognition of Object Classes

The approach:

  • Represent objects using multiple composite semi-local affine parts

    • More expressive than individual regions

    • Not globally rigid

  • Correspondence search is key to learning and detection

Correspondence Search

  • Basic operation: a two-image matching procedure for finding collections of affine regions that can be mapped onto each other using a single affine transformation

  • Implementation: greedy search based on geometric and photometric consistency constraints

    • Returns multiple correspondence hypotheses

    • Automatically determines number of regions in correspondence

    • Works on unsegmented, cluttered images (weakly supervised learning)
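
The geometric part of such a greedy search can be sketched as below. This is a simplified stand-in, not the procedure from the talk: it matches points rather than affine regions, omits the photometric constraints, returns a single hypothesis, and uses a made-up tolerance and made-up data.

```python
import math

def solve3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting.
    Fails with a division by zero if the system is singular (e.g. collinear seeds)."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(m[k][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for k in range(3):
            if k != col:
                f = m[k][col] / m[col][col]
                m[k] = [vk - f * vc for vk, vc in zip(m[k], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

def fit_affine(pairs):
    """Least-squares affine map from >= 3 pairs ((x, y), (u, v)).
    Returns ((a, b, tx), (c, d, ty)) with u = a*x + b*y + tx, v = c*x + d*y + ty."""
    rows = [(x, y, 1.0) for (x, y), _ in pairs]
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    model = []
    for k in range(2):  # k = 0 fits u, k = 1 fits v
        atb = [sum(row[i] * target[k] for row, (_, target) in zip(rows, pairs))
               for i in range(3)]
        model.append(tuple(solve3(ata, atb)))
    return tuple(model)

def residual(model, pair):
    """Distance between the mapped source point and its matched target point."""
    (x, y), (u, v) = pair
    (a, b, tx), (c, d, ty) = model
    return math.hypot(a * x + b * y + tx - u, c * x + d * y + ty - v)

def greedy_grow(seed, candidates, tol=2.0):
    """Grow a seed group by repeatedly adding the most consistent candidate match."""
    group = list(seed)
    remaining = [c for c in candidates if c not in group]
    while True:
        model = fit_affine(group)
        best = min(remaining, key=lambda m: residual(model, m), default=None)
        if best is None or residual(model, best) > tol:
            return group, model
        group.append(best)
        remaining.remove(best)

# Hypothetical matches: six follow one affine map, two are outliers.
def true_map(p):
    return (1.2 * p[0] - 0.3 * p[1] + 5.0, 0.4 * p[0] + 0.9 * p[1] - 2.0)

inliers = [(p, true_map(p)) for p in
           [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0), (5.0, 3.0), (7.0, 8.0)]]
outliers = [((3.0, 3.0), (50.0, 50.0)), ((8.0, 1.0), (-20.0, 4.0))]
group, model = greedy_grow(inliers[:3], inliers + outliers)
```

The loop stops on its own once no remaining match fits the current transformation, so the size of the returned group is determined by the data rather than fixed in advance.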


Matching: 3D Objects



Matching: Faces

spurious match ???

Finding Symmetries

Learning Object Models for Recognition

  • Match multiple pairs of training images to produce a set of candidate parts

  • Use additional validation images to evaluate repeatability of parts and individual regions

  • Retain a fixed number of parts having the best repeatability score
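
The selection step can be sketched as follows. The score used here, the fraction of a part's regions re-detected across validation images, is an assumption, and the part names and counts are made up.

```python
def select_parts(part_sizes, validation_hits, keep=10):
    """Keep the `keep` candidate parts with the highest repeatability.

    part_sizes: dict {part_id: number of regions in the part}.
    validation_hits: dict {part_id: list of detected-region counts,
                           one entry per validation image}.
    """
    def score(part_id):
        hits = validation_hits[part_id]
        return sum(hits) / (len(hits) * part_sizes[part_id])
    return sorted(part_sizes, key=score, reverse=True)[:keep]

# Hypothetical candidate parts evaluated on three validation images.
part_sizes = {"wing_a": 20, "wing_b": 12, "body": 8}
validation_hits = {"wing_a": [10, 12, 8], "wing_b": [9, 10, 11], "body": [1, 0, 2]}
kept = select_parts(part_sizes, validation_hits, keep=2)
```

Normalizing by part size keeps large parts from winning purely because they contain more regions.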

Recognition Experiment: Butterflies

Admiral, Swallowtail, Machaon, Monarch 1, Monarch 2, Peacock, Zebra

  • 16 training images (8 pairs) per class

  • 10 validation images per class

  • 437 test images

  • 619 images total

Butterfly Parts

Recognition

  • Top 10 parts per class used for recognition

  • Relative repeatability score: (total number of regions detected) / (total part size)

  • Classification results: [table not recovered; one entry reports total part size (smallest/largest)]

Classification Rate vs. Number of Parts

Detection Results (ROC Curves)

Circles: reference relative repeatability rates. Red square: ROC equal error rate (in parentheses)
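
How an equal error rate is read off an ROC curve can be sketched as below. The convention assumed here is the true positive rate at the point where the false positive rate equals one minus the true positive rate; the detection scores are made up.

```python
def roc_points(pos_scores, neg_scores):
    """(false positive rate, true positive rate) pairs for decreasing thresholds."""
    thresholds = sorted(set(pos_scores) | set(neg_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points

def equal_error_rate(points):
    """True positive rate at the ROC point closest to the line fpr = 1 - tpr."""
    fpr, tpr = min(points, key=lambda p: abs(p[0] - (1.0 - p[1])))
    return tpr

# Hypothetical detection scores for positive and negative test images.
pos = [0.9, 0.8, 0.7, 0.4]
neg = [0.6, 0.5, 0.3, 0.2]
points = roc_points(pos, neg)
eer = equal_error_rate(points)
```

With so few scores the curve is a coarse staircase, so the closest staircase point is used instead of interpolating.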

Successful Detection Examples

Training images

Test images (blue: occluded regions)

All ellipses found in the test images

Unsuccessful Detection Examples

Training images

Test images (blue: occluded regions)

All ellipses found in the test image


Summary


  • Semi-local affine parts for describing structure of 3D objects

  • Finding a part vocabulary:

    • Correspondence search between pairs of images

    • Validation

  • Additional application:

    • Finding symmetry and repetition

Future Work

  • Find a better affine region detector

  • Represent, learn inter-part relations

  • Evaluation: CalTech database, harder classes, etc.

Birds



Snowy Owl

Mandarin Duck

Wood Duck

Birds: Candidate Parts

Mandarin Duck


Objects without Characteristic Texture


Summary of Talk

  • Recognition of single-texture images

    • Distribution of local appearance descriptors

  • Recognition of individual regions in multi-texture images

    • Local appearance + loose statistical neighborhood relations

  • Recognition of object categories

    • Local appearance + strong geometric relations

      For more information: http://www-cvr.ai.uiuc.edu/ponce_grp

Issues, Extensions

  • Weakly supervised learning

    • Evaluation methods?

    • Learning from contaminated data?

  • Probabilistic vs. geometric approaches to invariance

  • EM vs. direct correspondence search

  • Training set size

  • Background modeling

  • Strengthening the representation

    • Heterogeneous local features

    • Automatic feature selection

    • Inter-part relations