“Pixels that Sound”

Find pixels that correspond (correlate!?) to sound

Kidron, Schechner, Elad, CVPR 2005

Audio-Visual Analysis: Applications

  • Lip reading – detection of lips (or person)
    • Slaney, Covell (2000)
    • Bregler, Konig (1994)
  • Analysis and synthesis of music from motion
    • Murphy, Andersen, Jensen (2003)
  • Source separation based on vision
    • Li, Dimitrova, Li, Sethi (2003)
    • Smaragdis, Casey (2003)
    • Nock, Iyengar, Neti (2002)
    • Fisher, Darrell, Freeman, Viola (2001)
    • Hershey, Movellan (1999)
  • Tracking
    • Vermaak, Gangnet, Blake, Pérez (2001)
  • Biological systems
    • Gutfreund, Zheng, Knudsen (2002)

Problem: Different Modalities

[Figure: microphone and camera feeding an audio-visual analysis]

Audio data
  • 44.1 kHz, few bands
  • Not stereophonic

Visual data
  • 25 frames/sec
  • Each frame: 576 x 720 pixels

Kidron, Schechner, Elad, Pixels that Sound
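To make the mismatch between the two modalities concrete, the short sketch below does the arithmetic with the rates quoted on this slide. It assumes one scalar (grayscale intensity) per pixel; the function name is illustrative only.

```python
# Rates quoted on the slide: 44.1 kHz audio, 25 frames/sec video, 576 x 720 pixels.
AUDIO_RATE_HZ = 44_100
VIDEO_FPS = 25
FRAME_ROWS, FRAME_COLS = 576, 720


def modality_mismatch(audio_rate=AUDIO_RATE_HZ, fps=VIDEO_FPS,
                      rows=FRAME_ROWS, cols=FRAME_COLS):
    """Return (audio samples per video frame, pixels per frame, pixel values per second).

    Assumes one scalar value per pixel (grayscale), which is our simplification.
    """
    samples_per_frame = audio_rate // fps      # audio samples falling inside one video frame
    pixels_per_frame = rows * cols             # number of visual variables per frame
    pixel_values_per_sec = pixels_per_frame * fps
    return samples_per_frame, pixels_per_frame, pixel_values_per_sec


spf, ppf, pvs = modality_mismatch()
print(spf, ppf, pvs)  # 1764 414720 10368000
```

A few audio bands per frame face hundreds of thousands of pixel variables, which is the root of the ill-posedness discussed later.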

Previous Work

  • Pointwise correlation (not typical)
    • Nock, Iyengar, Neti (2002)
    • Hershey, Movellan (1999)
  • Typical: cluster of pixels – linear superposition
    • Canonical Correlation Analysis (CCA): ill-posed (lack of data)
      • Smaragdis, Casey (2003)
      • Li, Dimitrova, Li, Sethi (2003)
      • Slaney, Covell (2000)
    • Mutual Information (MI): highly complex
      • Fisher et al. (2001)
      • Cutler, Davis (2000)
      • Bregler, Konig (1994)
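Pointwise correlation treats every pixel independently: each pixel's intensity over time is correlated with a 1-D audio feature. The sketch below is a minimal version using NumPy, assuming frames stored as a `(T, H, W)` array and an audio feature already sampled once per video frame; it is not the exact procedure of the cited works.

```python
import numpy as np


def pointwise_correlation(frames, audio):
    """Pearson correlation of every pixel's time series with a 1-D audio feature.

    frames: array of shape (T, H, W), one grayscale frame per time step (assumed layout)
    audio:  array of shape (T,), e.g. audio energy per video frame
    Returns an (H, W) map of correlation coefficients (0 where a pixel is constant).
    """
    T = frames.shape[0]
    V = frames.reshape(T, -1).astype(float)    # rows: time, columns: pixels
    V -= V.mean(axis=0)                        # zero-mean each pixel's time series
    a = audio.astype(float) - audio.mean()     # zero-mean audio feature
    num = V.T @ a                              # per-pixel (unnormalized) covariance
    den = np.linalg.norm(V, axis=0) * np.linalg.norm(a)
    rho = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return rho.reshape(frames.shape[1:])


# Toy usage: random frames, with one pixel that follows the sound.
rng = np.random.default_rng(0)
audio = rng.standard_normal(30)
frames = rng.standard_normal((30, 4, 5))
frames[:, 1, 2] = 3.0 * audio + 1.0
rho = pointwise_correlation(frames, audio)
```

Because each pixel is scored on its own, this approach cannot exploit pixels that only correlate with the sound jointly, which is what the cluster-of-pixels methods target.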

[Figure: CCA. Video (Pixel #1, #2, #3) and Audio (Band #1, #2) each go through an optimal projection, yielding the optimal visual components.]

Visual Projection

Video features → projection → v (1D variable)
  • Video features:
    • Pixel intensities
    • Transform coefficients (wavelet)
    • Image differences
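The visual side can be sketched as building a feature matrix (rows: time, columns: features) and projecting it onto a single 1-D variable. This is an illustrative NumPy sketch covering two of the listed features (pixel intensities and image differences); wavelet coefficients are omitted because they would need an extra library. The function names are hypothetical.

```python
import numpy as np


def visual_features(frames, kind="intensity"):
    """Stack per-frame visual features into a matrix V (rows: time, columns: features).

    frames: array of shape (T, H, W) (assumed layout)
    kind="intensity":  raw pixel intensities
    kind="difference": temporal image differences I_t - I_{t-1} (one fewer row)
    """
    T = frames.shape[0]
    flat = frames.reshape(T, -1).astype(float)
    if kind == "intensity":
        return flat
    if kind == "difference":
        return np.diff(flat, axis=0)
    raise ValueError(f"unknown feature kind: {kind}")


def project(V, w):
    """Linear projection of the feature matrix onto a 1-D variable v(t) = V w."""
    return V @ w
```

For example, `project(visual_features(frames), w_v)` gives one value per frame; choosing `w_v` is the job of CCA and the sparse methods on the following slides.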

Audio Projection

Audio features → projection → a (1D variable)
  • Audio features:
    • Average energy per frame
    • Transform coefficients per frame
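The "average energy per frame" feature aligns the fast audio stream with the slow video stream by averaging over the audio samples that fall inside each video frame. A minimal NumPy sketch, assuming a mono signal and mean squared amplitude as the energy measure (our choice, not necessarily the authors'):

```python
import numpy as np


def energy_per_frame(samples, samples_per_frame):
    """Average energy (mean squared amplitude) of the audio in each video frame.

    Trailing samples that do not fill a whole frame are dropped.
    """
    n_frames = len(samples) // samples_per_frame
    x = np.asarray(samples, dtype=float)[: n_frames * samples_per_frame]
    return (x.reshape(n_frames, samples_per_frame) ** 2).mean(axis=1)
```

With 44.1 kHz audio and 25 frames/sec video, `samples_per_frame = 44_100 // 25 = 1764`, so one second of audio yields 25 energy values, one per frame.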

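Canonical Correlation

  • Representation (per time window): audio features $\mathbf{A}$ and video features $\mathbf{V}$, rows indexed by time
  • Projections: $\mathbf{a} = \mathbf{A}\mathbf{w}_a$ and $\mathbf{v} = \mathbf{V}\mathbf{w}_v$, random variables (time dependent)
  • Correlation coefficient:

$$\rho = \frac{\mathbf{w}_a^T\mathbf{A}^T\mathbf{V}\mathbf{w}_v}{\sqrt{\left(\mathbf{w}_a^T\mathbf{A}^T\mathbf{A}\mathbf{w}_a\right)\left(\mathbf{w}_v^T\mathbf{V}^T\mathbf{V}\mathbf{w}_v\right)}}$$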
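The correlation coefficient above translates directly into code. This sketch assumes the columns of `A` and `V` have already been made zero-mean over the time window; `canonical_rho` is a hypothetical helper name.

```python
import numpy as np


def canonical_rho(A, V, w_a, w_v):
    """Correlation coefficient between the projections a = A w_a and v = V w_v.

    Equals w_a^T A^T V w_v / sqrt((w_a^T A^T A w_a)(w_v^T V^T V w_v)).
    Columns of A and V are assumed zero-mean over the time window.
    """
    a = A @ w_a
    v = V @ w_v
    return float(a @ v / np.sqrt((a @ a) * (v @ v)))
```

Note that ρ is unchanged by rescaling `w_a` or `w_v`; only the directions of the projections matter.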

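CCA Formulation

  • Maximizing the canonical correlation ρ over the projections $\mathbf{w}_a$, $\mathbf{w}_v$ yields an eigenvalue problem (Knutsson, Borga, Landelius, 1995):

$$\begin{pmatrix}\mathbf{0} & \mathbf{A}^T\mathbf{V}\\ \mathbf{V}^T\mathbf{A} & \mathbf{0}\end{pmatrix}\begin{pmatrix}\mathbf{w}_a\\ \mathbf{w}_v\end{pmatrix} = \rho\begin{pmatrix}\mathbf{A}^T\mathbf{A} & \mathbf{0}\\ \mathbf{0} & \mathbf{V}^T\mathbf{V}\end{pmatrix}\begin{pmatrix}\mathbf{w}_a\\ \mathbf{w}_v\end{pmatrix}$$

  • Largest eigenvalue: equivalent to the canonical correlation ρ
  • Corresponding eigenvectors: the optimal projections $\mathbf{w}_a$, $\mathbf{w}_v$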
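A small sketch of this eigenproblem, solved with SciPy's generalized symmetric solver. It assumes many more frames than features, so that $\mathbf{A}^T\mathbf{A}$ and $\mathbf{V}^T\mathbf{V}$ are positive definite; that assumption is exactly what breaks down for raw pixels, as the next slides show. The function name is illustrative.

```python
import numpy as np
from scipy.linalg import block_diag, eigh


def cca_first_pair(A, V):
    """First canonical pair from the generalized eigenproblem
    [[0, A^T V], [V^T A, 0]] w = rho * blockdiag(A^T A, V^T V) w.

    A: (T, Na) audio features, V: (T, Nv) visual features, rows = time.
    Requires A^T A and V^T V to be invertible (T comfortably larger than Na, Nv).
    """
    A = A - A.mean(axis=0)                  # zero-mean each feature over the window
    V = V - V.mean(axis=0)
    na, nv = A.shape[1], V.shape[1]
    Cav = A.T @ V
    lhs = np.block([[np.zeros((na, na)), Cav],
                    [Cav.T, np.zeros((nv, nv))]])
    rhs = block_diag(A.T @ A, V.T @ V)      # must be positive definite for eigh
    evals, evecs = eigh(lhs, rhs)           # eigenvalues in ascending order
    rho = evals[-1]                         # largest eigenvalue = canonical correlation
    w = evecs[:, -1]                        # corresponding eigenvector = projections
    return rho, w[:na], w[na:]
```

The eigenvalues come in ± pairs, so taking the largest one gives the positive, maximal correlation.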

Visual Data

[Figure: the visual data matrix $\mathbf{V}$; one axis is time t (frames), the other is spatial location (pixel intensities)]

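Rank Deficiency

[Figure: the frames × pixels matrix $\mathbf{V}$ and the product $\mathbf{V}^T\mathbf{V}$]

$$\operatorname{rank}\left(\mathbf{V}^T\mathbf{V}\right) \le N_F \ll N_v$$

where $N_F$ is the number of frames in the time window and $N_v$ the number of pixels, so the $N_v \times N_v$ matrix $\mathbf{V}^T\mathbf{V}$ is singular.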
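The rank bound is easy to check numerically. The sketch below uses random data and toy sizes (30 frames, 500 "pixels") as stand-ins for real video, which has far more pixels per frame.

```python
import numpy as np


def gram_rank(n_frames, n_pixels, seed=0):
    """Shape and numerical rank of V^T V for a random frames x pixels matrix V."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal((n_frames, n_pixels))   # rows: frames (time), columns: pixels
    G = V.T @ V                                     # n_pixels x n_pixels, the matrix CCA must invert
    return G.shape, int(np.linalg.matrix_rank(G))


shape, rank = gram_rank(30, 500)
print(shape, rank)  # rank is at most the number of frames
```

With 30 frames the 500 × 500 matrix has rank 30, far from full rank; only when frames outnumber pixels does the rank reach the matrix size.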

Ill-Posedness

$\mathbf{V}^T\mathbf{V}$ is impossible to invert!!!

  • Prior solutions:
    • Use many more frames → poor temporal resolution
    • Aggressive spatial pruning → poor spatial resolution
    • Trivial regularization
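The slide does not say which regularization prior work used. As an assumption, the sketch below reads "trivial regularization" as Tikhonov (ridge) regularization: adding λI makes $\mathbf{V}^T\mathbf{V}$ invertible, but the answer now depends on a hand-tuned parameter λ. The helper name is hypothetical.

```python
import numpy as np


def regularized_inverse_apply(V, b, lam):
    """Solve (V^T V + lam * I) x = b: a Tikhonov-style fix for the singular V^T V.

    lam > 0 makes the matrix positive definite (hence invertible), at the price
    of a free parameter that must be tuned by hand.
    """
    n = V.shape[1]
    return np.linalg.solve(V.T @ V + lam * np.eye(n), b)
```

As λ grows the solution shrinks toward zero, so different choices of λ give visibly different pixel weights.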

A General Problem

Large number of weights, small amount of data: the problem is ILL-POSED.

Overfitting is likely.
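Overfitting can be demonstrated with pure noise. In this sketch (random data, sizes chosen by us), a 30-frame "audio" signal with no relation to the "video" is still fitted perfectly once there are more weights than frames, while a fit with only a few weights stays poorly correlated.

```python
import numpy as np


def best_fit_correlation(V, a):
    """Correlation between a and its least-squares fit V w (w chosen freely)."""
    w, *_ = np.linalg.lstsq(V, a, rcond=None)
    v = V @ w
    return float(np.corrcoef(v, a)[0, 1])


rng = np.random.default_rng(0)
a = rng.standard_normal(30)                                        # "audio": pure noise
many = best_fit_correlation(rng.standard_normal((30, 1000)), a)   # more weights than frames
few = best_fit_correlation(rng.standard_normal((30, 3)), a)       # far fewer weights than frames
print(many, few)
```

A correlation of 1 in the first case says nothing about a real audio-visual link; it only reflects the excess of free weights.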

An Equivalent Problem

Minimizing

Maximizing

Single Audio Band

$\mathbf{A}$ has a single column, and the audio vector $\mathbf{a}$ is known data.

Minimizing

(The denominator is non-zero)

Full correlation (ρ = 1) if

$$\underbrace{\begin{pmatrix} a(1)\\ a(2)\\ \vdots\\ a(t_i)\\ \vdots\\ a(30)\end{pmatrix}}_{\mathbf{a}} = \mathbf{V}\,\mathbf{w}_v$$

with rows indexed by time. Underdetermined system!
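An underdetermined system has infinitely many exact solutions: any vector in the null space of $\mathbf{V}$ can be added without changing $\mathbf{V}\mathbf{w}_v$. The sketch below shows this with random data (30 frames, 200 pixels, our choice), using the full SVD to obtain a null-space basis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_pixels = 30, 200
V = rng.standard_normal((n_frames, n_pixels))
a = rng.standard_normal(n_frames)

w0 = np.linalg.lstsq(V, a, rcond=None)[0]   # one exact solution of V w = a
_, s, Vt = np.linalg.svd(V)                  # full SVD: Vt is n_pixels x n_pixels
null_basis = Vt[n_frames:]                   # rows span the null space (V has full row rank)
w1 = w0 + 5.0 * null_basis[0]                # another exact solution
print(np.allclose(V @ w0, a), np.allclose(V @ w1, a))  # True True
```

Both weight vectors give full correlation, so an extra criterion is needed to pick one; the next slides compare ℓ0, ℓ1 and ℓ2 choices.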

“Out of clutter, find simplicity.
From discord, find harmony.”

Albert Einstein

[Figure: detected correlated pixels]

Sparse Solution

Minimum ℓ0-norm:

$$\min_{\mathbf{w}_v} \|\mathbf{w}_v\|_0 \quad \text{s.t.} \quad \mathbf{V}\mathbf{w}_v = \mathbf{a}$$

  • Non-convex
  • Exponential complexity
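Minimizing the ℓ0-norm directly means searching over supports. This exhaustive sketch (our illustration, not a method from the slides) tries all supports of size 1, 2, ..., which is why the cost grows combinatorially with the number of pixels; it is only usable on toy sizes.

```python
import itertools

import numpy as np


def min_l0_solution(V, a, tol=1e-9):
    """Sparsest w with V w = a, found by exhaustive search over supports.

    Tries supports of size 1, 2, ... and returns the first exact fit; the number
    of candidate supports grows combinatorially with the number of columns.
    """
    n = V.shape[1]
    for k in range(1, n + 1):
        for support in itertools.combinations(range(n), k):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(V[:, cols], a, rcond=None)
            if np.linalg.norm(V[:, cols] @ coef - a) <= tol * max(1.0, np.linalg.norm(a)):
                w = np.zeros(n)
                w[cols] = coef
                return w
    return None
```

For 12 columns and a 2-sparse answer this checks at most 12 + 66 supports; for hundreds of thousands of pixels it is hopeless.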

The ℓ1-norm Criterion

Minimum ℓ1-norm:

$$\min_{\mathbf{w}_v} \|\mathbf{w}_v\|_1 \quad \text{s.t.} \quad \mathbf{V}\mathbf{w}_v = \mathbf{a}$$

  • Sparse, in common situations
  • Convex
  • Polynomial complexity

Donoho, Elad (2005)
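The ℓ1 problem becomes a linear program by splitting $\mathbf{w}_v = \mathbf{u} - \mathbf{z}$ with $\mathbf{u}, \mathbf{z} \ge 0$. A minimal sketch using SciPy's `linprog` with the HiGHS solver; the solver choice and helper name are ours, not the authors'.

```python
import numpy as np
from scipy.optimize import linprog


def min_l1_solution(V, a):
    """Minimum l1-norm solution of V w = a, cast as a linear program.

    Split w = u - z with u, z >= 0; at the optimum ||w||_1 = sum(u + z), and
    the constraint becomes [V, -V] [u; z] = a.
    """
    m, n = V.shape
    c = np.ones(2 * n)
    res = linprog(c, A_eq=np.hstack([V, -V]), b_eq=a, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    u, z = res.x[:n], res.x[n:]
    return u - z
```

Unlike the exhaustive ℓ0 search, this runs in polynomial time, and there are no parameters to tune.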

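The Minimum Norm Solution

Minimum ℓ2-norm, solved using the pseudo-inverse, SVD, or QR:

$$\mathbf{w}_v = \mathbf{V}^{+}\mathbf{a} = \mathbf{V}^T\left(\mathbf{V}\mathbf{V}^T\right)^{-1}\mathbf{a} \quad (\mathbf{V}\ \text{of full row rank})$$

  • Energy spread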
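The energy spread is visible numerically: even when the data are generated by only two "sounding pixels", the minimum ℓ2-norm solution puts nonzero weight almost everywhere. A NumPy sketch on random toy data:

```python
import numpy as np


def min_l2_solution(V, a):
    """Minimum l2-norm solution of V w = a via the Moore-Penrose pseudo-inverse (SVD-based)."""
    return np.linalg.pinv(V) @ a


rng = np.random.default_rng(0)
V = rng.standard_normal((20, 40))
w_true = np.zeros(40)
w_true[[5, 17]] = [1.0, -2.0]      # a sparse "true" set of sounding pixels
a = V @ w_true
w2 = min_l2_solution(V, a)
print(np.count_nonzero(np.abs(w2) > 1e-6), "nonzero weights out of", w2.size)
```

`np.linalg.lstsq` returns the same minimum-norm solution for an underdetermined system.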

Maximum correlation: eigenproblem
Minimum objective function G
Linear programming

  • Fully correlated
  • Sparse
  • Polynomial
  • No parameters to tweak

Audio-visual events

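Multiple Audio Bands - Solution

The optimization problem:

$$\min_{\mathbf{w}_v,\,\mathbf{w}_a} \|\mathbf{w}_v\|_1 \quad \text{s.t.} \quad \mathbf{V}\mathbf{w}_v = \mathbf{A}\mathbf{w}_a, \quad \|\mathbf{w}_a\|_1 = 1$$

  • Objective: convex; constraint $\mathbf{V}\mathbf{w}_v = \mathbf{A}\mathbf{w}_a$: linear
  • $\|\mathbf{w}_a\|_1 = 1$ (the surface of the ℓ1-ball): non-convex constraint, needed to exclude the trivial solution $\mathbf{w}_a = \mathbf{0}$, $\mathbf{w}_v = \mathbf{0}$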

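Multiple Audio Bands

[Figure: faces S1, S2, S3, S4 of the ℓ1-ball]

Optimization over each face (a fixed sign pattern $\mathbf{s}$ of $\mathbf{w}_a$) is:

$$\min_{\mathbf{w}_v,\,\mathbf{w}_a} \|\mathbf{w}_v\|_1 \quad \text{s.t.} \quad \mathbf{V}\mathbf{w}_v = \mathbf{A}\mathbf{w}_a, \quad \mathbf{s}^T\mathbf{w}_a = 1, \quad s_i\,w_{a,i} \ge 0$$

  • Each face: linear programming
  • No parameters to tweak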
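The face-by-face strategy can be sketched as one linear program per sign pattern of $\mathbf{w}_a$, keeping the best result. This is our illustration with SciPy's `linprog` (HiGHS solver); it enumerates all 2^Na faces, which is cheap because the number of audio bands is small. The function name is hypothetical.

```python
import itertools

import numpy as np
from scipy.optimize import linprog


def multiband_l1(V, A):
    """min ||w_v||_1 s.t. V w_v = A w_a, ||w_a||_1 = 1, solved face by face.

    On the face with sign pattern s (s_i * w_a[i] >= 0), ||w_a||_1 = s . w_a is
    linear, so each face is a linear program. Variables: w_v = u - z (u, z >= 0), w_a.
    Returns (objective, w_v, w_a) for the best face.
    """
    nf, nv = V.shape
    na = A.shape[1]
    c = np.concatenate([np.ones(2 * nv), np.zeros(na)])
    best = None
    for signs in itertools.product((1.0, -1.0), repeat=na):
        s = np.array(signs)
        A_eq = np.vstack([
            np.hstack([V, -V, -A]),                          # V (u - z) - A w_a = 0
            np.concatenate([np.zeros(2 * nv), s])[None, :],  # s . w_a = 1
        ])
        b_eq = np.concatenate([np.zeros(nf), [1.0]])
        bounds = [(0, None)] * (2 * nv) + [(0, None) if si > 0 else (None, 0) for si in s]
        res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
        if res.success and (best is None or res.fun < best[0]):
            best = (res.fun, res.x[:nv] - res.x[nv:2 * nv], res.x[2 * nv:])
    return best
```

Faces with opposite sign patterns give mirror-image solutions with the same cost, so half of the LPs could be skipped; the sketch keeps all of them for simplicity.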

Sharp & Dynamic, Despite Distraction

[Figure: results on frames 9, 42, 68, 115, 146, 169]

Performing in Audio Noise

[Figure: results on frames 51, 106, 83, 177]

  • Sparse
  • Localization on the proper elements
  • False alarm – temporally inconsistent
  • Handling dynamics

ℓ2-norm: Energy Spread

[Figure: Movie #1 and Movie #2, frames 146 and 83]

ℓ1-norm: Localization

[Figure: Movie #1 and Movie #2, frames 146 and 83]

The “Chorus Ambiguity”

Synchronized talk: who’s talking?

  • Possible solutions:
    • Left
    • Right
    • Both

Not unique (ambiguous)

The “Chorus Ambiguity”

[Figure: ℓ1-norm and ℓ2-norm solutions in the (feature 1, feature 2) plane; one solution is labeled “Both”]
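A toy version of the chorus ambiguity, under our own simplifying assumption that the two synchronized speakers produce identical visual features. The "left only", "right only" and "both" weightings then explain the audio equally well and have the same ℓ1-norm, so the ℓ1 criterion cannot choose between them, while the minimum ℓ2-norm solution is unique and splits the weight between both features.

```python
import numpy as np

# Two "speakers" with identical (synchronized) feature time series.
rng = np.random.default_rng(0)
f = rng.standard_normal(30)
V = np.column_stack([f, f])      # feature 1 and feature 2 carry the same signal
a = 2.0 * f                      # audio fully explained by either feature

candidates = {
    "left": np.array([2.0, 0.0]),
    "right": np.array([0.0, 2.0]),
    "both": np.array([1.0, 1.0]),
}
for name, w in candidates.items():
    assert np.allclose(V @ w, a)                         # every candidate is an exact solution
    print(name, np.abs(w).sum(), np.linalg.norm(w))     # equal l1-norms, different l2-norms

w_l2 = np.linalg.pinv(V) @ a     # unique minimum l2-norm solution
```

Here `w_l2` equals the "both" weighting, matching the figure's label; any point on the segment between "left" and "right" is an equally good ℓ1 solution.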
