Visual Grouping and Recognition. Jitendra Malik, U.C. Berkeley. Collaborators: Grouping: Jianbo Shi (CMU), Serge Belongie (UCSD), Thomas Leung (Fuji); Database of human segmented images and ecological statistics: David Martin, Charless Fowlkes, Xiaofeng Ren

Presentation Transcript
Visual Grouping and Recognition

Jitendra Malik

U.C. Berkeley

Collaborators
  • Grouping: Jianbo Shi (CMU), Serge Belongie (UCSD), Thomas Leung (Fuji)
  • Database of human segmented images and ecological statistics: David Martin, Charless Fowlkes, Xiaofeng Ren
  • Recognition: Serge Belongie, Jan Puzicha
The visual system performs
  • Inference of lightness, shape and spatial relations
  • Perceptual Organization
  • Active interaction with environment
A brief history of vision science
  • 1850-1900
    • Trichromacy, stereopsis, eye movements, contrast, visual acuity, ...
  • 1900-1950
    • Apparent movement, grouping, figure-ground, ...
  • 1950-2000
    • Ecological optics, geometrical analysis of shape cues, physiology of V1 and extra-striate areas, ...
The debate (and sometimes both were right!)
  • Helmholtz argued that perception is unconscious inference. Associations are learned through experience.
  • Hering proposed physiological mechanisms: opponent color channels, contrast mechanisms, conjunctive and disjunctive eye movements.
The Twentieth Century
  • The Gestalt movement emphasized perceptual organization.
    • Grouping
    • Figure/ground
    • Configuration effects on perception of brightness and lightness
Gibson’s ecological optics (1950)
  • Emphasized the richness of information about shape and surface layout available to a moving observer
    • Optical flow
    • Texture gradients
    • (and the classical cues, such as stereopsis)
The visual system performs
  • Inference of lightness, shape and spatial relations
  • Perceptual Organization
  • Active interaction with environment
What enables us to parse a scene?
  • Low-level cues
    • Color/texture
    • Contours
    • Motion
  • Mid-level cues
    • T-junctions
    • Convexity
  • High-level cues
    • Familiar objects
    • Familiar motion
Focus of this talk
  • Provide a mathematical foundation for the grouping problem in terms of the ecological statistics of natural images.
    • This research agenda was first proposed more than 50 years ago by Egon Brunswik, who sought to justify the Gestalt grouping factors in probabilistic terms.
Outline of talk
  • Creating a dataset of human segmented images
  • Measuring ecological statistics of various Gestalt grouping factors
  • Using these measurements to calibrate and validate approaches to grouping
Outline of talk
  • Creating a dataset of human segmented images
  • Measuring ecological statistics of various Gestalt grouping factors
  • Using these measurements to calibrate and validate approaches to grouping
What kind of segmentations?
  • What is a valid segmentation?
  • Is there a correct segmentation?
  • What granularity?
The Image Dataset
  • 1000 Corel images
    • Photographs of natural scenes
    • Texture is common
    • Large variety of subject matter
    • 481 × 321 pixels, 24-bit color
Establishing Ground Truth
  • Definition: a segmentation is a partition of the image pixels into mutually exclusive sets

  • Custom tool to facilitate manual segmentation
    • Java application, on website
  • Multiple segmentations/image
  • Currently: 1000 images, 5000 segmentations, 20 subjects
    • Data collection ongoing
  • Naïve subjects (UCB undergrads) given simple, non-technical instructions
Directions to Image Segmentors
  • You will be presented a photographic image
  • Divide the image into some number of segments, where the segments represent “things” or “parts of things” in the scene
  • The number of segments is up to you, as it depends on the image. Something between 2 and 30 is likely to be appropriate.
  • It is important that all of the segments have approximately equal importance.
Perceptual organization produces a hierarchy

[Figure: a hierarchy of regions for an image of two birds, from the whole image down through nodes such as background, grass, bush, far, left bird, right bird, head, body, beak and eye]

Each subject picks a cross-section from this hierarchy.

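Quantifying inconsistency

[Figure: segmentation S1 as a refinement of segmentation S2]

How much is segmentation S1 a refinement of segmentation S2 at pixel p_i?

$$E(S_1, S_2, p_i) = \frac{|R(S_1, p_i) \setminus R(S_2, p_i)|}{|R(S_1, p_i)|}$$

where R(S, p_i) is the set of pixels in the segment of S that contains p_i.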

Segmentation Error Measure
  • One-way Local Refinement Error:

$$LRE(S_1, S_2, p_i) = \frac{|R(S_1, p_i) \setminus R(S_2, p_i)|}{|R(S_1, p_i)|}$$

  • Segmentation Error, defined to allow refinement in either direction at each pixel:

$$SE(S_1, S_2) = \frac{1}{n} \sum_{i=1}^{n} \min\{LRE(S_1, S_2, p_i),\ LRE(S_2, S_1, p_i)\}$$

where n is the number of pixels.

Gray, Color, InvNeg Datasets
  • Explore how various high/low-level cues affect the task of image segmentation by subjects
    • Color = full color image
    • Gray = luminance image
    • InvNeg = inverted negative luminance image
[Figures: two example images, each shown in the Color, Gray and InvNeg conditions]

Gray vs. Color vs. InvNeg Segmentations

  • SE(gray, gray) = 0.047
  • SE(gray, color) = 0.047
  • SE(gray, invneg) = 0.059
  • Color may affect attention, but doesn’t seem to affect perceptual organization
  • InvNeg seems to interfere with high-level cues

Data: 2500 gray segmentations, 2500 color segmentations, 200 invneg segmentations

Outline of talk
  • Creating a dataset of human segmented images
  • Measuring ecological statistics of various Gestalt grouping factors
  • Using these measurements to calibrate and validate approaches to grouping
Natural images aren’t generic signals
  • Filter statistics are far from Gaussian:
    • Ruderman 1994, 1997
    • Field, Olshausen 1996
    • Huang, Mumford 1999, 2000
    • Buccigrossi, Simoncelli 1999
  • These properties (e.g. scale-invariance, sparsity, heavy tails) can be exploited for image compression.
Quantifying the power of cues
  • Bayes Risk
  • Mutual information
Mutual information

$$I(x; y) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$$

where x is a cue and y is the indicator of whether two pixels lie in the same segment
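One way to estimate this quantity from data is a plug-in histogram estimate, sketched below. It assumes the cue x has been measured for a set of pixel pairs and that y is 1 when the pair lies in the same human-marked segment; the bin count and the use of base-2 logarithms (bits) are arbitrary choices, not taken from the talk.

```python
import numpy as np


def mutual_information(x, y, bins=16):
    """Plug-in estimate of I(x; y) in bits.

    x: cue values for a set of pixel pairs (continuous; binned here).
    y: 0/1 indicator, 1 if the pair lies in the same segment.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=int)
    edges = np.histogram_bin_edges(x, bins=bins)
    xb = np.digitize(x, edges[1:-1])        # bin index in 0..bins-1
    joint = np.zeros((bins, 2))
    np.add.at(joint, (xb, y), 1.0)
    p_xy = joint / joint.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)   # shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # shape (1, 2)
    nz = p_xy > 0                           # convention: 0 log 0 = 0
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x * p_y)[nz])))
```

A cue that determines y perfectly on balanced data yields 1 bit, and an independent cue yields 0. Plug-in estimates are biased upward for small samples, so the number of bins should stay small relative to the number of pairs.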

Distribution of Region Area

$$y = K x^{-\alpha}, \qquad \alpha = 0.913$$

where x is region area and y is its frequency

Distribution of length
  • Decompose contours at high curvature extrema
Distribution of Length

Slope = −2.05 in a log-log plot, i.e., frequency ∝ 1/(length)²

(for region area it is roughly 1/area)
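A minimal way to read such an exponent off a histogram is a least-squares line fit on log-log axes, sketched below. This is an assumption about method: the talk does not say how the slopes were estimated, and maximum-likelihood estimators are often preferred for power laws because binned log-log fits can be biased.

```python
import numpy as np


def fit_power_law(values, counts):
    """Fit counts ≈ K * values**(-alpha) by least squares on log-log axes.

    Returns (alpha, K). Non-positive entries are dropped, since log(0) is undefined.
    """
    v = np.asarray(values, dtype=float)
    c = np.asarray(counts, dtype=float)
    keep = (v > 0) & (c > 0)
    slope, intercept = np.polyfit(np.log(v[keep]), np.log(c[keep]), 1)
    return -slope, float(np.exp(intercept))
```

On synthetic counts that follow 1000 · length⁻² exactly, the fit recovers alpha = 2 and K = 1000 up to rounding error.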

Outline of talk
  • Creating a dataset of human segmented images
  • Measuring ecological statistics of various Gestalt grouping factors
  • Using these measurements to calibrate and validate approaches to grouping
Computational Mechanisms for Visual Grouping

Jitendra Malik, Serge Belongie, Jianbo Shi, Thomas Leung

U.C. Berkeley

Edge-based image segmentation
  • Edge detection by gradient operators
  • Linking by dynamic programming, voting, relaxation, …

Montanari 71, Parent&Zucker 89, Guy&Medioni 96, Shaashua&Ullman 88

Williams&Jacobs 95, Geiger&Kumaran 96, Heitger&von der Heydt 93

- Natural for encoding curvilinear grouping

- Hard decisions often made prematurely

- Produce meaningless clutter in textured regions

Region-based image segmentation
  • The 1970s produced region growing, split-and-merge, etc.
  • The 1980s led to approaches based on a global criterion for image segmentation
    • Markov Random Fields e.g. Geman&Geman 84
    • Variational approaches e.g. Mumford&Shah 89
    • Expectation-Maximization e.g. Ayer&Sawhney 95, Weiss 97
  • Global method, but computational complexity precludes exact MAP estimation
    • Curvilinear grouping not easily enforced
    • Unable to handle line-drawings
    • Problems due to local minima
Our Approach
  • Global decision good, local bad
    • Formulate as hierarchical graph partitioning
  • Efficient computation
    • Draw on ideas from spectral graph theory to define an eigenvalue problem that can be solved to find the segmentation.
  • Develop suitable encoding of visual cues in terms of graph weights.
Image Segmentation as Graph Partitioning

Build a weighted graph G=(V,E) from image

V: image pixels

E: connections between pairs of nearby pixels

Partition the graph so that similarity within each group is large and similarity between groups is small: Normalized Cuts [Shi&Malik 97]

Normalized Cuts as a Spring-Mass system
  • Each pixel is a point mass; each connection is a spring.
  • Fundamental modes are generalized eigenvectors of $(D - W)\,x = \lambda D x$, where $W$ holds the edge weights and $D$ is diagonal with $D_{ii} = \sum_j W_{ij}$
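Some Terminology for Graph Partitioning
  • How do we bipartition a graph? Split the vertex set V into disjoint sets A and B, and measure
    • $\mathrm{cut}(A,B) = \sum_{i \in A,\, j \in B} W_{ij}$
    • $\mathrm{assoc}(A,V) = \sum_{i \in A,\, j \in V} W_{ij}$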
Normalized Cut, a Measure of Dissimilarity
  • Minimum cut is not appropriate, since it favors cutting off small, isolated sets of nodes.
  • Normalized Cut, Ncut: $\mathrm{Ncut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)}$
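Normalized Cut and Normalized Association
  • Minimizing similarity between the groups and maximizing similarity within the groups can be achieved simultaneously, because $\mathrm{Ncut}(A,B) = 2 - \mathrm{Nassoc}(A,B)$, where $\mathrm{Nassoc}(A,B) = \frac{\mathrm{assoc}(A,A)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{assoc}(B,B)}{\mathrm{assoc}(B,V)}$.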
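The identity Ncut = 2 − Nassoc holds because cut(A,B) = assoc(A,V) − assoc(A,A), and it can be checked numerically on a small graph. The sketch assumes a dense symmetric weight matrix W and a boolean mask marking the vertices of A; the 4-node graph used in the example is made up: two tightly linked pairs joined by weak edges.

```python
import numpy as np


def assoc(W, rows, cols):
    """Sum of weights W[i, j] with i in `rows` and j in `cols` (boolean masks)."""
    return W[np.ix_(rows, cols)].sum()


def ncut_and_nassoc(W, in_a):
    """Return (Ncut(A, B), Nassoc(A, B)) for the bipartition given by mask in_a."""
    W = np.asarray(W, dtype=float)
    a = np.asarray(in_a, dtype=bool)
    b = ~a
    v = np.ones_like(a)                    # mask selecting every vertex
    cut = assoc(W, a, b)
    ncut = cut / assoc(W, a, v) + cut / assoc(W, b, v)
    nassoc = assoc(W, a, a) / assoc(W, a, v) + assoc(W, b, b) / assoc(W, b, v)
    return ncut, nassoc
```

For W = [[0, 1, 0.1, 0], [1, 0, 0, 0.1], [0.1, 0, 0, 1], [0, 0.1, 1, 0]] and A = {0, 1}, Ncut = 0.4/2.2 ≈ 0.182 and Nassoc ≈ 1.818, which sum to 2; splitting across the strong edges instead (A = {0, 2}) gives Ncut ≈ 1.818.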
Solving the Normalized Cut problem
  • Finding the exact discrete solution to Ncut is NP-complete, even on a regular grid [Papadimitriou’97].
  • Drawing on spectral graph theory, a good approximation can be obtained by solving a generalized eigenvalue problem.
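Normalized Cut As Generalized Eigenvalue problem
  • Rewriting Normalized Cut in matrix form: let $D$ be the diagonal matrix with $D_{ii} = \sum_j W_{ij}$, and let $y$ be an indicator vector with $y_i = 1$ for $i \in A$ and $y_i = -b$ for $i \in B$, where $b = \mathrm{assoc}(A,V)/\mathrm{assoc}(B,V)$. Then

$$\mathrm{Ncut}(A,B) = \frac{y^T (D - W)\, y}{y^T D\, y}, \qquad y^T D \mathbf{1} = 0.$$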
Normalized Cut As Generalized Eigenvalue problem
  • The eigenvector with the second smallest eigenvalue of the generalized eigensystem $(D - W)\,y = \lambda D y$ is the solution to the constrained Rayleigh quotient obtained by letting $y$ take real values:

$$\min_{y} \frac{y^T (D - W)\, y}{y^T D\, y} \quad \text{subject to} \quad y^T D \mathbf{1} = 0$$
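For a small dense graph, the generalized problem can be reduced to an ordinary symmetric one by substituting y = D^{-1/2} z, as in the NumPy-only sketch below (the function name is illustrative). It solves the relaxed, real-valued problem; turning y into a partition still requires choosing a splitting point.

```python
import numpy as np


def second_generalized_eigenpair(W):
    """Second-smallest solution of (D - W) y = lambda D y.

    Assumes W is symmetric with strictly positive row sums. With y = D^{-1/2} z
    the problem becomes the symmetric eigenproblem
    D^{-1/2} (D - W) D^{-1/2} z = lambda z.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)
    s = 1.0 / np.sqrt(d)                                  # diagonal of D^{-1/2}
    L_sym = np.eye(len(d)) - s[:, None] * W * s[None, :]  # I - D^{-1/2} W D^{-1/2}
    vals, Z = np.linalg.eigh(L_sym)                       # eigenvalues in ascending order
    return vals[1], s * Z[:, 1]                           # map back: y = D^{-1/2} z
```

On the made-up 4-node graph W = [[0, 1, 0.1, 0], [1, 0, 0, 0.1], [0.1, 0, 0, 1], [0, 0.1, 1, 0]], the second eigenvalue is 0.2/1.1 ≈ 0.182, and the signs of y separate vertices {0, 1} from {2, 3}.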
Interpretation as a Dynamical System
  • The equivalent spring-mass system has mass matrix $D$ and stiffness matrix $D - W$.
  • The generalized eigenvectors are the fundamental modes of oscillation.
Computational Aspects
  • Solving for the generalized eigensystem $(D - W)\,y = \lambda D y$:
  • $(D - W)$ is of size $N \times N$, but it is sparse, with $O(N)$ nonzero entries, where $N$ is the number of pixels.
  • The few smallest eigenvectors are computed using the Lanczos algorithm.
Overall Procedure
  • Construct a weighted graph G=(V,E) from an image.
  • Connect each pair of pixels, and assign a graph edge weight $W_{ij}$ measuring the similarity of pixels i and j.
  • Solve $(D - W)\,y = \lambda D y$ for the eigenvectors with the smallest few eigenvalues.
  • Recursively subdivide if the Ncut value is below a pre-specified value.
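One bipartition step of this procedure can be sketched on a toy image with SciPy, under several assumptions not taken from the talk: a 4-connected grid graph, intensity-only weights W_ij = exp(−(I_i − I_j)²/σ²), and a split at y = 0 rather than a search over splitting points. `eigsh` is SciPy's wrapper around ARPACK's implicitly restarted Lanczos method; it is applied to D^{-1/2} W D^{-1/2}, whose eigenvalues μ relate to those of (D − W)y = λDy by λ = 1 − μ, so the largest μ correspond to the smallest λ.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh


def grid_affinity(img, sigma=0.5):
    """Sparse symmetric W for a 4-connected pixel grid (hypothetical weighting)."""
    h, w = img.shape
    idx = np.arange(h * w).reshape(h, w)
    vals = np.asarray(img, dtype=float).ravel()
    r = np.concatenate([idx[:, :-1].ravel(), idx[:-1, :].ravel()])  # left / upper pixel
    c = np.concatenate([idx[:, 1:].ravel(), idx[1:, :].ravel()])    # right / lower neighbor
    wts = np.exp(-((vals[r] - vals[c]) ** 2) / sigma**2)
    W = sp.coo_matrix((wts, (r, c)), shape=(h * w, h * w))
    return (W + W.T).tocsr()


def ncut_bipartition(W):
    """Split by the sign of the second generalized eigenvector; return (mask, Ncut)."""
    n = W.shape[0]
    d = np.asarray(W.sum(axis=1)).ravel()
    s = sp.diags(1.0 / np.sqrt(d))
    M = s @ W @ s                                   # D^{-1/2} W D^{-1/2}
    v0 = np.random.default_rng(0).standard_normal(n)
    vals, vecs = eigsh(M, k=2, which="LA", v0=v0)   # two largest eigenvalues of M
    z = vecs[:, np.argsort(vals)[0]]                # the second largest one
    y = z / np.sqrt(d)                              # back to the generalized problem
    in_a = y > 0
    ia, ib = np.flatnonzero(in_a), np.flatnonzero(~in_a)
    cut = W[ia][:, ib].sum()
    ncut = cut / d[ia].sum() + cut / d[ib].sum()
    return in_a, float(ncut)
```

On a 6 × 6 image whose left half is 0 and right half is 1, the split follows the intensity boundary and the returned Ncut is well below 0.01. Recursion is left out; a fuller version would call `ncut_bipartition` again on each part while the resulting Ncut stays below the threshold, and would need to guard against an empty side.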
Normalized Cuts Approach
  • Global decision good, local bad
    • Formulate as hierarchical graph partitioning
  • Efficient computation
    • Draw on ideas from spectral graph theory to define an eigenvalue problem that can be solved to find the segmentation.
  • Develop suitable encoding of visual cues in terms of graph weights.
Cue Integration
  • $W^{TX}_{ij}$ based on texton histograms
  • $W^{IC}_{ij}$ based on intervening contour
Filters for Texture Description
  • Elongated directional Gaussian derivatives
  • 2nd derivative and Hilbert transform
  • L1 normalized for scale invariance
  • 6 orientations, 3 scales
  • Zero mean
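A filter bank in the spirit of this slide can be sketched with NumPy and SciPy. The scales (σ = 1, 2, 4), the 3:1 elongation, rotation by image interpolation and the function names are all assumptions; the odd filter comes from `scipy.signal.hilbert`, whose imaginary part is the Hilbert transform of its input along the chosen axis.

```python
import numpy as np
from scipy.ndimage import rotate
from scipy.signal import hilbert


def even_odd_pair(sigma, elongation=3.0):
    """Gaussian 2nd derivative across y (elongated along x) and its Hilbert transform."""
    half = int(np.ceil(3 * elongation * sigma))
    ax = np.arange(-half, half + 1, dtype=float)
    y, x = np.meshgrid(ax, ax, indexing="ij")
    gauss = np.exp(-y**2 / (2 * sigma**2) - x**2 / (2 * (elongation * sigma) ** 2))
    even = (y**2 / sigma**4 - 1 / sigma**2) * gauss
    odd = np.imag(hilbert(even, axis=0))    # Hilbert transform along y, column by column
    return even, odd


def zero_mean_l1(f):
    """Remove the mean, then scale so the absolute values sum to 1."""
    f = f - f.mean()
    return f / np.abs(f).sum()


def filter_bank(scales=(1.0, 2.0, 4.0), n_orient=6, elongation=3.0):
    """Even and odd filters at every scale and at n_orient orientations in [0, 180)."""
    bank = []
    for sigma in scales:
        even, odd = even_odd_pair(sigma, elongation)
        for k in range(n_orient):
            angle = 180.0 * k / n_orient
            for f in (even, odd):
                bank.append(zero_mean_l1(rotate(f, angle, reshape=False, order=1)))
    return bank
```

With the default arguments this yields 3 scales × 6 orientations × 2 phases = 36 filters, each with zero mean and unit L1 norm.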
Textons
  • K-means on vectors of filter responses
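Given per-pixel filter responses, textons are the cluster centers found by k-means. The sketch below is plain Lloyd's k-means in NumPy with random initialization; k, the iteration count and the seed are hypothetical parameters, and an optimized library implementation would normally be used on full images.

```python
import numpy as np


def kmeans_textons(responses, k, iters=20, seed=0):
    """Lloyd's k-means; returns (textons, labels).

    responses: (N, F) array, one filter-response vector per pixel.
    """
    X = np.asarray(responses, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # fancy indexing copies
    for _ in range(iters):
        # Squared distance from every pixel to every current center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):          # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)
```

The returned `labels` assign each pixel to its nearest texton, which is the input needed for texton histograms.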
Benefits of the Texton Representation
  • Discrete point sets well suited to tools of computational geometry, point process statistics
  • Defining Local Scale Selection
  • Measuring Texture Similarity
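Texton Histograms

Chi-square test: $\chi^2(h_i, h_j) = \frac{1}{2} \sum_{m=1}^{K} \frac{[h_i(m) - h_j(m)]^2}{h_i(m) + h_j(m)}$, where $h_i$ is the histogram of the K textons in a window around pixel i

[Figure: texton histograms at pixels i, j and k, annotated with χ² values 0.1 and 0.8]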
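The chi-square distance between texton histograms can be computed as below. The conversion to an affinity, exp(−χ²/σ) with σ = 0.1, is a hypothetical choice for illustration only.

```python
import numpy as np


def texton_histogram(labels, k):
    """Normalized histogram of texton labels taken from a window around one pixel."""
    h = np.bincount(np.asarray(labels).ravel(), minlength=k).astype(float)
    return h / h.sum()


def chi_square(h1, h2):
    """0.5 * sum_m (h1[m] - h2[m])^2 / (h1[m] + h2[m]), skipping bins empty in both."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    denom = h1 + h2
    nz = denom > 0
    return 0.5 * float(np.sum((h1[nz] - h2[nz]) ** 2 / denom[nz]))


def texture_weight(h1, h2, sigma=0.1):
    """Hypothetical affinity: identical histograms give 1, dissimilar ones approach 0."""
    return float(np.exp(-chi_square(h1, h2) / sigma))
```

For normalized histograms the distance lies between 0 (identical) and 1 (no texton in common).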

Intervening Contours

Pixels with no strong contour between them are more likely to belong to the same region than pixels separated by an intervening contour.

Estimating $W_{ij}$ for the contour cue

[Figure: an image and its orientation energy]

  • Estimate $W_{ij}$ as a decreasing function of $\max_{x \in \overline{ij}} OE(x)$, the maximum orientation energy along segment ij
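A sketch of the contour cue, assuming an orientation-energy map is already available as a 2-D array: sample the straight segment between the two pixels at roughly one point per pixel step, take the maximum orientation energy, and map it through a decreasing function. The choice exp(−max/σ) and the value of σ are assumptions, not the talk's formula.

```python
import numpy as np


def intervening_contour_weight(oe, p, q, sigma=0.1):
    """W_ij from the maximum orientation energy on the straight segment p -> q.

    oe: 2-D orientation-energy map; p, q: (row, col) pixel coordinates.
    The mapping exp(-max / sigma) is a hypothetical decreasing function.
    """
    n = int(max(abs(q[0] - p[0]), abs(q[1] - p[1]))) + 1   # about one sample per pixel step
    t = np.linspace(0.0, 1.0, n)
    rows = np.rint(p[0] + t * (q[0] - p[0])).astype(int)
    cols = np.rint(p[1] + t * (q[1] - p[1])).astype(int)
    return float(np.exp(-oe[rows, cols].max() / sigma))
```

With a vertical contour of energy 1 in column 5 of a 10 × 10 map, the pixels (2, 2) and (8, 2) on the same side keep weight 1.0, while (2, 2) and (2, 8) on opposite sides get exp(−10).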
Orientation Energy
  • Gaussian 2nd derivative and its Hilbert pair
  • Can detect combination of bar and edge features; also insensitive to linear shading [Perona&Malik 90]
  • Multiple scales
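Orientation energy for a single orientation can be sketched as the summed squared responses of a quadrature pair, OE = (f_even ∗ I)² + (f_odd ∗ I)², with f_even a Gaussian second derivative and f_odd its Hilbert transform. The filter size, σ and elongation below are assumptions; other orientations and scales would use rotated and rescaled copies of the pair.

```python
import numpy as np
from scipy.ndimage import convolve
from scipy.signal import hilbert


def orientation_energy(img, sigma=1.5, elongation=3.0):
    """OE = (f_even * I)^2 + (f_odd * I)^2 for one (horizontal) orientation.

    f_even: Gaussian 2nd derivative across rows, elongated along columns.
    f_odd:  its Hilbert transform along the same axis (the quadrature pair).
    """
    half = int(np.ceil(3 * elongation * sigma))
    ax = np.arange(-half, half + 1, dtype=float)
    y, x = np.meshgrid(ax, ax, indexing="ij")
    g = np.exp(-y**2 / (2 * sigma**2) - x**2 / (2 * (elongation * sigma) ** 2))
    even = (y**2 / sigma**4 - 1 / sigma**2) * g
    even -= even.mean()                      # zero mean: no response to flat regions
    odd = np.imag(hilbert(even, axis=0))
    img = np.asarray(img, dtype=float)
    return convolve(img, even, mode="nearest") ** 2 + convolve(img, odd, mode="nearest") ** 2
```

Near a horizontal step edge the energy is large, and because the pair responds to both bar-like and edge-like profiles it peaks at the edge itself; where the filter support sees only a constant zero image, the energy is zero.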
Challenges of Cue Integration
  • Contour cue tends to fragment textured regions
  • Texture cue tends to create 1D regions from contours
Contour as a problem for texture processing

Segmentation based on Gaussian Mixture Model EM

Cue Integration
  • Gate contour vs. texture cue based on region-boundary vs. region-interior label
  • Compute boundary vs. interior label using a statistical test on region uniformity
  • Multiply the contour weight $W^{IC}_{ij}$ and the texture weight $W^{TX}_{ij}$ to get the combined weight: $W_{ij} = W^{IC}_{ij} \cdot W^{TX}_{ij}$
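The gating idea can be sketched as follows, assuming p_texture ∈ [0, 1] is the output of the uniformity test (close to 1 inside textured regions). The particular soft-gating formula is a hypothetical illustration, not the authors' exact rule: it switches a cue off by pushing its weight to 1 where that cue is unreliable, then multiplies the two weights.

```python
import numpy as np


def combined_weight(w_ic, w_tx, p_texture):
    """Gate the two cue weights by p_texture in [0, 1], then multiply.

    Hypothetical soft gating: in a textured region (p_texture -> 1) the contour
    weight is pushed to 1, so texture edges no longer separate pixels; in an
    untextured region (p_texture -> 0) the texture weight is pushed to 1 instead.
    """
    w_ic, w_tx, p = (np.asarray(a, dtype=float) for a in (w_ic, w_tx, p_texture))
    gated_ic = 1.0 - (1.0 - w_ic) * (1.0 - p)
    gated_tx = 1.0 - (1.0 - w_tx) * p
    return gated_ic * gated_tx
```

At the extremes the result reduces to a single cue: p_texture = 0 returns the contour weight, and p_texture = 1 returns the texture weight; intermediate values blend the two. The function also works element-wise on arrays of weights.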
Motion Segmentation with Normalized Cuts
  • Networks of spatio-temporal connections:
Motion Segmentation with Normalized Cuts
  • Motion “proto-volume” in space-time
  • Group correspondence
Results
  • video
Framework for Recognition

(1) Segmentation: Pixels → Segments

(2) Association: Segments → Regions

(3) Matching: Regions → Prototypes

~10 views/object. Matching tolerant to pose/illumination changes, intra-category variation, error in previous steps

Over-segmentation necessary; under-segmentation fatal

Enumerate: # of size k regions in image with n segments is $\sim 4^k\, n / k$
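The enumeration estimate is simple arithmetic; the helper below only evaluates the slide's expression (the function name is illustrative). For fixed k the count grows linearly in n, far below the 2^n arbitrary subsets of segments, which is presumably why enumerating small regions is practical.

```python
def approx_region_count(n_segments, k):
    """Slide's estimate of the number of size-k regions: ~ 4**k * n / k."""
    return (4 ** k) * n_segments / k


# Example with n = 100 segments.
for k in range(1, 6):
    print(k, round(approx_region_count(100, k)))
# prints one pair per line: 1 400, 2 800, 3 2133, 4 6400, 5 20480
```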