## Visual Grouping and Recognition


### Computational Mechanisms for Visual Grouping

Collaborators

- Grouping: Jianbo Shi (CMU), Serge Belongie (UCSD), Thomas Leung (Fuji)
- Database of human segmented images and ecological statistics: David Martin, Charless Fowlkes, Xiaofeng Ren
- Recognition: Serge Belongie, Jan Puzicha

The visual system performs

- Inference of lightness, shape and spatial relations
- Perceptual Organization
- Active interaction with environment

A brief history of vision science

- 1850-1900
- Trichromacy, stereopsis, eye movements, contrast, visual acuity..
- 1900-1950
- Apparent movement, grouping, figure-ground..
- 1950-2000
- Ecological optics, geometrical analysis of shape cues, physiology of V1 and extra-striate areas..

The debate... (and sometimes both were right!)

- Helmholtz argued that perception is unconscious inference. Associations are learned through experience.
- Hering proposed physiological mechanisms: opponent color channels, contrast mechanisms, conjunctive and disjunctive eye movements...

The Twentieth Century..

- The Gestalt movement emphasized perceptual organization.
- Grouping
- Figure/ground
- Configuration effects on perception of brightness and lightness

Gibson’s ecological optics (1950)

- Emphasized richness of information about shape and surface layout available to a moving observer
- Optical flow
- Texture Gradients
- (and the classical cues such as stereopsis, etc.)

The visual system performs

- Inference of lightness, shape and spatial relations
- Perceptual Organization
- Active interaction with environment

What enables us to parse a scene?

- Low level cues
- Color/texture
- Contours
- Motion
- Mid level cues
- T-junctions
- Convexity
- High level Cues
- Familiar Object
- Familiar Motion

Focus of this talk

- Provide a mathematical foundation for the grouping problem in terms of the ecological statistics of natural images.
- This research agenda was first proposed by Egon Brunswik, more than 50 years ago, who sought to justify Gestalt grouping factors in probabilistic terms.

Outline of talk

- Creating a dataset of human segmented images
- Measuring ecological statistics of various Gestalt grouping factors
- Using these measurements to calibrate and validate approaches to grouping

What kind of segmentations?

- What is a valid segmentation?
- Is there a correct segmentation?
- What granularity?

The Image Dataset

- 1000 Corel images
- Photographs of natural scenes
- Texture is common
- Large variety of subject matter
- 481 x 321 x 24b

Establishing Ground truth

- Def: Segmentation = partition of image pixels into exclusive sets

- Custom tool to facilitate manual segmentation
- Java application, on website
- Multiple segmentations/image
- Currently: 1000 images, 5000 segmentations, 20 subjects
- Data collection ongoing
- Naïve subjects (UCB undergrads) given simple, non-technical instructions

Directions to Image Segmentors

- You will be presented a photographic image
- Divide the image into some number of segments, where the segments represent “things” or “parts of things” in the scene
- The number of segments is up to you, as it depends on the image. Something between 2 and 30 is likely to be appropriate.
- It is important that all of the segments have approximately equal importance.

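Perceptual organization produces a hierarchy

[Figure: segmentation hierarchy for an example image. Node labels: image; background, left bird, right bird; grass, bush, far; beak, eye, head, body (once for each bird). Two cross sections are marked S1 and S2.]

Each subject picks a cross section from this hierarchy.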

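Quantifying inconsistency

How much is segmentation S1 a refinement of segmentation S2 at pixel $p_i$?

$$E(S_1, S_2, p_i) = \frac{|R(S_1, p_i) \setminus R(S_2, p_i)|}{|R(S_1, p_i)|}$$

where $R(S, p_i)$ is the set of pixels in the region of segmentation $S$ that contains pixel $p_i$.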

Segmentation Error Measure

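- One-way Local Refinement Error:

$$\mathrm{LRE}(S_1, S_2, p_i) = \frac{|R(S_1, p_i) \setminus R(S_2, p_i)|}{|R(S_1, p_i)|}$$

- Segmentation Error defined to allow refinement in either direction at each pixel:

$$\mathrm{SE}(S_1, S_2) = \frac{1}{n} \sum_i \min\{\mathrm{LRE}(S_1, S_2, p_i),\ \mathrm{LRE}(S_2, S_1, p_i)\}$$

where $n$ is the number of pixels.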
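The error measure can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' tool: it assumes each segmentation is given as an integer label map of the same shape, and the function names are hypothetical.

```python
import numpy as np

def local_refinement_error(s1, s2):
    """Per-pixel LRE(S1, S2, p) = |R(S1,p) minus R(S2,p)| / |R(S1,p)|."""
    s1 = np.asarray(s1).ravel()
    s2 = np.asarray(s2).ravel()
    # Relabel regions to 0..k-1 so labels can index a contingency table.
    _, a = np.unique(s1, return_inverse=True)
    _, b = np.unique(s2, return_inverse=True)
    # overlap[i, j] = number of pixels in region i of S1 and region j of S2.
    overlap = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(overlap, (a, b), 1)
    size1 = overlap.sum(axis=1)  # region sizes in S1
    # Set difference size = region size minus size of the intersection.
    return (size1[a] - overlap[a, b]) / size1[a]

def segmentation_error(s1, s2):
    """SE(S1, S2): mean over pixels of the smaller one-way error."""
    lre12 = local_refinement_error(s1, s2)
    lre21 = local_refinement_error(s2, s1)
    return float(np.mean(np.minimum(lre12, lre21)))
```

If one segmentation is a pure refinement of the other, one of the two one-way errors is zero at every pixel, so SE is zero.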

Gray, Color, InvNeg Datasets

- Explore how various high/low-level cues affect the task of image segmentation by subjects
- Color = full color image
- Gray = luminance image
- InvNeg = inverted negative luminance image

Gray vs. Color vs. InvNeg Segmentations

SE (gray, gray) = 0.047

SE (gray, color) = 0.047

SE (gray, invneg) = 0.059

- Color may affect attention, but doesn’t seem to affect perceptual organization
- InvNeg seems to interfere with high-level cues

- 2500 gray segmentations
- 2500 color segmentations
- 200 invneg segmentations

Outline of talk

- Creating a dataset of human segmented images
- Measuring ecological statistics of various Gestalt grouping factors
- Using these measurements to calibrate and validate approaches to grouping

Natural images aren’t generic signals

- Filter statistics are far from Gaussian..
- Ruderman 1994, 1997
- Field, Olshausen 1996
- Huang, Mumford 1999, 2000
- Buccigrossi, Simoncelli 1999
- These properties (e.g. scale-invariance, sparsity, heavy tails) can be exploited for image compression.

Quantifying the power of cues

- Bayes Risk
- Mutual information

Mutual information

$$I(x; y) = \sum_{x, y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$$

where x is a cue and y is the indicator of being in the same segment
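As a rough sketch, the mutual information between a quantized cue and the same-segment indicator can be estimated from a table of co-occurrence counts. The table layout (rows for cue bins, columns for the indicator) and the function name are assumptions made for illustration.

```python
import numpy as np

def mutual_information(joint_counts):
    """I(x; y) in bits from co-occurrence counts (rows: x bins, columns: y values)."""
    p_xy = np.asarray(joint_counts, dtype=float)
    p_xy = p_xy / p_xy.sum()                 # joint distribution
    p_x = p_xy.sum(axis=1, keepdims=True)    # marginal over y
    p_y = p_xy.sum(axis=0, keepdims=True)    # marginal over x
    nz = p_xy > 0                            # treat 0 * log 0 as 0
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])))
```

A cue that is independent of the indicator gives 0 bits; a binary cue that determines the indicator exactly gives 1 bit.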

Distribution of length

- Decompose contours at high curvature extrema

Distribution of Length

Slope = 2.05 in log-log plot

I.e., frequency ∝ 1 / (length)^2

(for region area it's roughly 1/area)
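A power law appears as a straight line in a log-log plot, so its exponent can be read off with a least-squares line fit. The sketch below uses synthetic data, not the dataset's measurements, and the function name is hypothetical. A frequency proportional to 1/length^2 gives a fitted slope of about -2, so the reported 2.05 is presumably the magnitude of the slope.

```python
import numpy as np

def loglog_slope(lengths, counts):
    """Least-squares slope of log(count) against log(length)."""
    slope, _intercept = np.polyfit(np.log(lengths), np.log(counts), 1)
    return float(slope)

# Synthetic example: counts that follow frequency ∝ 1 / length^2 exactly.
lengths = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
counts = 1000.0 / lengths**2
slope = loglog_slope(lengths, counts)
```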

Scale invariance of contour statistics

- Chi-square distance
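A minimal sketch of a chi-square distance between two histograms, for example contour statistics measured at two scales. The symmetric form with a factor of 1/2 is one common convention and is an assumption here; the slides do not give the exact formula.

```python
import numpy as np

def chi_square_distance(h, g):
    """0.5 * sum((h - g)^2 / (h + g)) over bins, after normalizing both histograms."""
    h = np.asarray(h, dtype=float)
    g = np.asarray(g, dtype=float)
    h = h / h.sum()
    g = g / g.sum()
    denom = h + g
    nz = denom > 0  # skip bins that are empty in both histograms
    return float(0.5 * np.sum((h[nz] - g[nz]) ** 2 / denom[nz]))
```

With this convention the distance is 0 for identical histograms and 1 for histograms with no overlapping bins.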

Outline of talk

- Creating a dataset of human segmented images
- Measuring ecological statistics of various Gestalt grouping factors
- Using these measurements to calibrate and validate approaches to grouping

Jitendra Malik, Serge Belongie, Jianbo Shi, Thomas Leung

U.C. Berkeley

Edge-based image segmentation

- Edge detection by gradient operators
- Linking by dynamic programming, voting, relaxation, …

Montanari 71, Parent & Zucker 89, Guy & Medioni 96, Sha'ashua & Ullman 88, Williams & Jacobs 95, Geiger & Kumaran 96, Heitger & von der Heydt 93

- Natural for encoding curvilinear grouping

- Hard decisions often made prematurely

- Produce meaningless clutter in textured regions

Region-based image segmentation

- 1970s produced region growing, split-and-merge, etc...
- 1980s led to approaches based on a global criterion for image segmentation
- Markov Random Fields e.g. Geman&Geman 84
- Variational approaches e.g. Mumford&Shah 89
- Expectation-Maximization e.g. Ayer&Sawhney 95, Weiss 97
- Global method, but computational complexity precludes exact MAP estimation
- Curvilinear grouping not easily enforced
- Unable to handle line-drawings
- Problems due to local minima

Our Approach

- Global decision good, local bad
- Formulate as hierarchical graph partitioning
- Efficient computation
- Draw on ideas from spectral graph theory to define an eigenvalue problem which can be solved for finding segmentation.
- Develop suitable encoding of visual cues in terms of graph weights.

Image Segmentation as Graph Partitioning

Build a weighted graph G=(V,E) from image

V: image pixels

E: connections between pairs of nearby pixels

Partition graph so that similarity within group is large and similarity between groups is small -- Normalized Cuts [Shi&Malik 97]

Normalized Cuts as a Spring-Mass system

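- Each pixel is a point mass; each connection is a spring.
- Fundamental modes are generalized eigenvectors of $(D - W)x = \lambda D x$, where $W$ is the matrix of connection weights and $D$ is the diagonal matrix of its row sums.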

Some Terminology for Graph Partitioning

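- How do we bipartition a graph? Split $V$ into disjoint sets $A$ and $B$; the cost is the total weight of the edges removed: $\mathrm{cut}(A, B) = \sum_{u \in A,\, v \in B} w(u, v)$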

Normalized Cut, A measure of dissimilarity

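- Minimum cut is not appropriate since it favors cutting small pieces.
- Normalized Cut, Ncut: $\mathrm{Ncut}(A, B) = \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(A, V)} + \frac{\mathrm{cut}(A, B)}{\mathrm{assoc}(B, V)}$, where $\mathrm{assoc}(A, V) = \sum_{u \in A,\, t \in V} w(u, t)$ is the total connection weight from nodes in $A$ to all nodes in the graph.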

Normalized Cut and Normalized Association

- Minimizing similarity between the groups, and maximizing similarity within the groups can be achieved simultaneously.
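The reason both goals coincide can be written out in one step. Here cut(A,B) is the total weight between the groups, assoc(A,V) is the total weight from A to all nodes, and assoc(A,A) is the weight inside A, so that assoc(A,V) = assoc(A,A) + cut(A,B). The name Nassoc for the normalized association follows Shi and Malik's paper.

```latex
\[
\begin{aligned}
\mathrm{Nassoc}(A,B) &= \frac{\mathrm{assoc}(A,A)}{\mathrm{assoc}(A,V)} + \frac{\mathrm{assoc}(B,B)}{\mathrm{assoc}(B,V)} \\
\mathrm{Ncut}(A,B) &= \frac{\mathrm{assoc}(A,V) - \mathrm{assoc}(A,A)}{\mathrm{assoc}(A,V)}
                    + \frac{\mathrm{assoc}(B,V) - \mathrm{assoc}(B,B)}{\mathrm{assoc}(B,V)}
                    = 2 - \mathrm{Nassoc}(A,B)
\end{aligned}
\]
```

Minimizing Ncut is therefore the same as maximizing Nassoc.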

Solving the Normalized Cut problem

- Exact discrete solution to Ncut is NP-complete even on a regular grid [Papadimitriou '97].
- Drawing on spectral graph theory, a good approximation can be obtained by solving a generalized eigenvalue problem.

Normalized Cut As Generalized Eigenvalue problem

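- Rewriting Normalized Cut in matrix form, with $x_i = 1$ if node $i$ is in $A$ and $x_i = -1$ otherwise, $d_i = \sum_j W_{ij}$, $D = \mathrm{diag}(d_1, \dots, d_N)$ and $k = \sum_{x_i > 0} d_i / \sum_i d_i$:

$$\mathrm{Ncut}(x) = \frac{(\mathbf{1} + x)^T (D - W)(\mathbf{1} + x)}{4k\, \mathbf{1}^T D \mathbf{1}} + \frac{(\mathbf{1} - x)^T (D - W)(\mathbf{1} - x)}{4(1 - k)\, \mathbf{1}^T D \mathbf{1}}$$

Normalized Cut As Generalized Eigenvalue problem

- after simplification, with $y = (\mathbf{1} + x) - b(\mathbf{1} - x)$ and $b = k/(1 - k)$, we get

$$\min_x \mathrm{Ncut}(x) = \min_y \frac{y^T (D - W) y}{y^T D y}, \quad y_i \in \{1, -b\}, \quad y^T D \mathbf{1} = 0$$

Normalized Cut As Generalized Eigenvalue problem

- The eigenvector with the second smallest eigenvalue of the generalized eigensystem $(D - W) y = \lambda D y$
- is the solution to the constrained Rayleigh quotient $\min_{y^T D \mathbf{1} = 0} \frac{y^T (D - W) y}{y^T D y}$, with $y$ relaxed to take real values.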

Interpretation as a Dynamical System

- The equivalent spring-mass system has stiffness matrix $D - W$ and mass matrix $D$.
- The generalized eigenvectors are the fundamental modes of oscillation.

Computational Aspects

- Solving for the generalized eigensystem $(D - W) y = \lambda D y$:
- $(D - W)$ is of size $N \times N$, but it is sparse with $O(N)$ nonzero entries, where $N$ is the number of pixels.
- Using Lanczos algorithm.
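A hedged sketch of the sparse solve, using SciPy's `eigsh`, which wraps ARPACK's implicitly restarted Lanczos method. It is not the authors' implementation. The function name is hypothetical, and the code assumes a symmetric weight matrix in which every node has positive degree.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags
from scipy.sparse.linalg import eigsh

def second_smallest_generalized_eigenvector(W):
    """Return (eigenvalue, y) for the second smallest solution of (D - W) y = lambda D y."""
    W = csr_matrix(W, dtype=float)
    d = np.asarray(W.sum(axis=1)).ravel()  # node degrees, assumed > 0
    # Substituting z = D^{1/2} y gives the standard symmetric problem
    # D^{-1/2} (D - W) D^{-1/2} z = lambda z, which Lanczos handles directly.
    d_inv_sqrt = diags(1.0 / np.sqrt(d))
    L_sym = d_inv_sqrt @ (diags(d) - W) @ d_inv_sqrt
    vals, vecs = eigsh(L_sym, k=2, which="SA")  # two smallest eigenvalues
    order = np.argsort(vals)
    z = vecs[:, order[1]]
    y = z / np.sqrt(d)  # map back to the generalized eigenvector
    return float(vals[order[1]]), y
```

On a graph made of two tightly connected groups joined by a weak edge, the signs of `y` separate the two groups.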

Overall Procedure

- Construct a weighted graph G=(V,E) from an image
- Connect each pair of pixels, and assign graph edge weight $W_{ij}$.
- Solve $(D - W) y = \lambda D y$ for the eigenvectors with the smallest few eigenvalues.
- Recursively subdivide if Ncut value is below a pre-specified value.
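The procedure can be sketched end to end on a tiny grayscale image. Several choices below are assumptions for illustration and are not taken from the slides: the edge weight (a product of Gaussians in intensity difference and pixel distance, cut off beyond a small radius), the dense solver, splitting the eigenvector at zero, and all parameter values and function names.

```python
import numpy as np
from scipy.linalg import eigh

def affinity_matrix(image, radius=1.5, sigma_i=0.5, sigma_x=2.0):
    """Dense weight matrix over pixels; weights are zero beyond `radius`."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    val = image.ravel().astype(float)
    dist2 = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=2)
    diff2 = (val[:, None] - val[None, :]) ** 2
    W = np.exp(-diff2 / sigma_i**2) * np.exp(-dist2 / sigma_x**2)
    W[dist2 > radius**2] = 0.0
    np.fill_diagonal(W, 0.0)
    return W

def ncut_value(W, mask):
    """Ncut(A, B) for the bipartition A = mask, B = ~mask."""
    cut = W[mask][:, ~mask].sum()
    return cut / W[mask].sum() + cut / W[~mask].sum()

def recursive_ncut(W, indices, max_ncut=0.1):
    """Split `indices` while the Ncut value of the best split stays below max_ncut."""
    sub = W[np.ix_(indices, indices)]
    degrees = sub.sum(axis=1)
    if len(indices) < 2 or (degrees <= 0).any():
        return [indices]
    # Generalized symmetric eigenproblem (D - W) y = lambda D y, eigenvalues ascending.
    _, vecs = eigh(np.diag(degrees) - sub, np.diag(degrees))
    mask = vecs[:, 1] > 0  # split the second eigenvector at zero
    if mask.all() or not mask.any() or ncut_value(sub, mask) > max_ncut:
        return [indices]
    return (recursive_ncut(W, indices[mask], max_ncut)
            + recursive_ncut(W, indices[~mask], max_ncut))

# Usage: a 4 x 6 image whose left half is dark and right half is bright.
image = np.zeros((4, 6))
image[:, 3:] = 1.0
segments = recursive_ncut(affinity_matrix(image), np.arange(image.size))
```

The dense matrices keep the sketch short; an image of realistic size would need the sparse formulation described under Computational Aspects.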

Normalized Cuts Approach

- Global decision good, local bad
- Formulate as hierarchical graph partitioning
- Efficient computation
- Draw on ideas from spectral graph theory to define an eigenvalue problem which can be solved for finding segmentation.
- Develop suitable encoding of visual cues in terms of graph weights.

Cue Integration

- Texture weight, based on texton histograms
- Contour weight, based on intervening contour

Filters for Texture Description

- Elongated directional Gaussian derivatives
- 2nd derivative and Hilbert transform
- L1 normalized for scale invariance
- 6 orientations, 3 scales
- Zero mean

Textons

- K-means on vectors of filter responses
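A minimal sketch of the clustering step, assuming the filter responses are stacked into an array with one row per pixel and one column per filter. The plain Lloyd iteration, the random initialization, and the function name are illustrative choices, not the authors' code. The cluster centers play the role of textons, and each pixel's label is its texton assignment.

```python
import numpy as np

def kmeans(vectors, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns (centers, labels)."""
    vectors = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = vectors[rng.choice(len(vectors), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each vector to its nearest center.
        d2 = ((vectors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its members (empty clusters stay put).
        for j in range(k):
            members = vectors[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, labels
```

A histogram of the labels inside a window, for example `np.bincount(labels_in_window, minlength=k)`, then gives a texton histogram for that window.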

Benefits of the Texton Representation

- Discrete point sets well suited to tools of computational geometry, point process statistics
- Defining Local Scale Selection
- Measuring Texture Similarity

Intervening Contours

Two pixels with no contour between them are more likely to belong to the same region than two pixels separated by a contour.

Estimating $W_{ij}$ for contour cue

[Figure panels: Image; Orientation Energy]

- Estimate $W_{ij}$ from $OE^*$, where $OE^*$ is the maximum orientation energy along segment $ij$
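A rough sketch of the intervening-contour idea: sample the orientation energy map along the straight segment between two pixels, take the maximum, and turn it into a weight. The exponential form and the value of `sigma` are assumptions; the slides do not specify how the maximum is mapped to a weight. The function name is hypothetical.

```python
import numpy as np

def intervening_contour_weight(oe, p, q, sigma=0.1):
    """Weight between pixels p and q (row, col) from the max energy on the segment pq."""
    (y0, x0), (y1, x1) = p, q
    # One sample per pixel step along the longer axis, endpoints included.
    n_samples = int(max(abs(y1 - y0), abs(x1 - x0))) + 1
    ys = np.rint(np.linspace(y0, y1, n_samples)).astype(int)
    xs = np.rint(np.linspace(x0, x1, n_samples)).astype(int)
    oe_max = oe[ys, xs].max()  # strongest contour evidence between p and q
    return float(np.exp(-oe_max / sigma))  # assumed mapping: strong contour, low weight
```

With a vertical contour in the energy map, two pixels on the same side keep a weight near 1, while two pixels on opposite sides get a weight near 0.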

Orientation Energy

- Gaussian 2nd derivative and its Hilbert pair
- Can detect combination of bar and edge features; also insensitive to linear shading [Perona&Malik 90]
- Multiple scales
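In symbols, a sketch of the orientation energy at orientation θ and scale σ, assuming $f_1$ denotes the Gaussian second-derivative filter at that orientation and scale and $f_2$ its Hilbert transform (the symbol names are illustrative):

```latex
\[
OE_{\theta,\sigma} = (I * f_1)^2 + (I * f_2)^2
\]
```

Because $f_1$ is even-symmetric and $f_2$ is odd-symmetric, the sum of squares responds to both bar-like and edge-like features.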

Challenges of Cue Integration

- Contour cue tends to fragment textured regions
- Texture cue tends to create 1D regions from contours

Contour as a problem for texture processing

Segmentation based on Gaussian Mixture Model EM

Cue Integration

- Gate contour vs. texture cue based on region-boundary vs. region-interior label
- Compute boundary vs. interior label using statistical test on region uniformity
- Multiply to get combined weight: $W_{ij} = W^{IC}_{ij} \times W^{TX}_{ij}$ (contour weight times texture weight)

Motion Segmentation with Normalized Cuts

- Networks of spatial-temporal connections:

Motion Segmentation with Normalized Cuts

- Motion “proto-volume” in space-time
- Group correspondence

Results

- video

Framework for Recognition

(1) Segmentation: Pixels → Segments

(2) Association: Segments → Regions

(3) Matching: Regions → Prototypes

~10 views/object. Matching tolerant to pose/illumination changes, intra-category variation, error in previous steps

Over-segmentation necessary; Under-segmentation fatal

Enumerate: # of size k regions in image with n segments is ~(4**k)*n/k
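The estimate is easy to evaluate for concrete numbers. The function name below is hypothetical; the formula is the one on the slide.

```python
def approx_region_count(n_segments, k):
    """Approximate number of size-k regions in an image with n segments: (4**k) * n / k."""
    return (4 ** k) * n_segments / k

# Example: candidate regions made of up to 4 segments in an image with 100 segments.
counts = {k: approx_region_count(100, k) for k in range(1, 5)}
```

The count grows roughly by a factor of 4 with each extra segment per region, which is why the enumeration is only practical for small k.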
