
“Pixels that Sound”: Find pixels that correspond (correlate !?) to sound





Presentation Transcript



“ Pixels that Sound ”

Find pixels that correspond (correlate !?) to sound

Kidron, Schechner, Elad, CVPR 2005



Audio-Visual Analysis: Applications

  • Lip reading – detection of lips (or person)

    • Slaney, Covell (2000)

    • Bregler, Konig (1994)

  • Analysis and synthesis of music from motion

    • Murphy, Andersen, Jensen (2003)

  • Source separation based on vision

    • Li, Dimitrova, Li, Sethi (2003)

    • Smaragdis, Casey (2003)

    • Nock, Iyengar, Neti (2002)

    • Fisher, Darrell, Freeman, Viola (2001)

    • Hershey, Movellan (1999)

  • Tracking

    • Vermaak, Gangnet, Blake, Pérez (2001)

  • Biological systems

    • Gutfreund, Zheng, Knudsen (2002)


Problem: Different Modalities

Audio-visual analysis: microphone and camera

Audio data

  • 44.1 kHz, few bands

  • Not stereophonic

Visual data

  • 25 frames/sec

  • Each frame: 576 x 720 pixels

Kidron, Schechner, Elad, Pixels that Sound


Previous Work

  • Pointwise correlation: not typical

    • Nock, Iyengar, Neti (2002)

    • Hershey, Movellan (1999)

  • Cluster of pixels - linear superposition

  • Canonical Correlation Analysis (CCA): ill-posed (lack of data)

    • Smaragdis, Casey (2003)

    • Li, Dimitrova, Li, Sethi (2003)

    • Slaney, Covell (2000)

  • Mutual Information (MI): highly complex

    • Fisher et al. (2001)

    • Cutler, Davis (2000)

    • Bregler, Konig (1994)


CCA

[Figure: video features (Pixel #1, Pixel #2, Pixel #3) and audio features (Band #1, Band #2), each with an optimal projection; the result is the optimal visual components]


Visual Projection

  • Video features

    • Pixel intensities

    • Transform coeffs (wavelet)

    • Image differences

[Figure: feature vector v (sample values 3, 40, 120, 52, 68, 74, 36, 859), projection to a 1D variable]


Audio Projection

  • Audio features

    • Average energy per frame

    • Transform coeffs per frame

[Figure: feature vector a, projection to a 1D variable]


Canonical Correlation

  • Representation: audio and video

  • Random variables (time dependent)

  • Projections (per time window)

  • Correlation coefficient
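
For reference, the correlation coefficient between the two 1D projections is usually written as below. The weight vectors w_v and w_a, the zero-mean feature vectors x_v and x_a, and the covariance matrices C_vv, C_aa, C_va are notation assumed here, not taken from the slides.

```latex
v = \mathbf{w}_v^{T}\mathbf{x}_v, \qquad a = \mathbf{w}_a^{T}\mathbf{x}_a
\\[4pt]
\rho \;=\; \frac{E[v\,a]}{\sqrt{E[v^{2}]\,E[a^{2}]}}
     \;=\; \frac{\mathbf{w}_v^{T} C_{va}\, \mathbf{w}_a}
                {\sqrt{\left(\mathbf{w}_v^{T} C_{vv}\, \mathbf{w}_v\right)
                       \left(\mathbf{w}_a^{T} C_{aa}\, \mathbf{w}_a\right)}}
```

CCA looks for the pair (w_v, w_a) that maximizes this coefficient.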


CCA Formulation

  • The conditions for maximal correlation yield an eigenvalue problem:

    • Knutsson, Borga, Landelius (1995)

  • Canonical correlation: equivalent to the largest eigenvalue

  • Projections: the corresponding eigenvectors
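
The eigenvalue formulation can be sketched numerically. This is a minimal illustration, assuming NumPy, synthetic stand-in data, and a small ridge term `eps` to keep the covariances invertible; the function name `cca_first_pair` and all variable names are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def cca_first_pair(V, A, eps=1e-8):
    """Return (rho, w_v, w_a) for the largest canonical correlation.

    V: (n_frames, n_video_features), A: (n_frames, n_audio_features).
    eps is a small ridge term (an assumption of this sketch) that keeps
    the covariance estimates invertible.
    """
    V = V - V.mean(axis=0)
    A = A - A.mean(axis=0)
    n = V.shape[0]
    C_vv = V.T @ V / n + eps * np.eye(V.shape[1])
    C_aa = A.T @ A / n + eps * np.eye(A.shape[1])
    C_va = V.T @ A / n
    # Eigenproblem: C_vv^{-1} C_va C_aa^{-1} C_av w_v = rho^2 w_v
    M = np.linalg.solve(C_vv, C_va) @ np.linalg.solve(C_aa, C_va.T)
    eigvals, eigvecs = np.linalg.eig(M)
    k = np.argmax(eigvals.real)              # largest eigenvalue
    w_v = eigvecs[:, k].real                 # corresponding eigenvector
    w_a = np.linalg.solve(C_aa, C_va.T @ w_v)
    rho = np.corrcoef(V @ w_v, A @ w_a)[0, 1]
    return rho, w_v, w_a

# Synthetic data: one hidden signal drives "pixel" 1 and "band" 0.
rng = np.random.default_rng(0)
n_frames = 500
source = rng.standard_normal(n_frames)
V = rng.standard_normal((n_frames, 3))
V[:, 1] += 2.0 * source
A = rng.standard_normal((n_frames, 2))
A[:, 0] += 2.0 * source
rho, w_v, w_a = cca_first_pair(V, A)
```

With many more frames than features, as in this toy setup, the covariance estimates are well conditioned; the slides that follow concern the opposite regime.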


Visual Data

[Figure: visual data matrix; axes: t (frames) and spatial location (pixel intensities)]


Rank Deficiency

[Figure: visual data matrix; axes: t (frames) and spatial location (pixel intensities)]


Estimation of Covariance

Rank deficient
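
The rank deficiency is easy to check numerically. A minimal sketch, assuming NumPy and random stand-in data (30 frames and 200 features are illustrative sizes, far smaller than a real 576 x 720 frame):

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_pixels = 30, 200          # far fewer frames than pixels
V = rng.standard_normal((n_frames, n_pixels))

# Sample covariance of the pixel features, estimated from the frames.
V_centered = V - V.mean(axis=0)
C_vv = V_centered.T @ V_centered / n_frames   # shape (n_pixels, n_pixels)

# The rank cannot exceed n_frames - 1 (one degree of freedom is lost
# to the mean), so the n_pixels x n_pixels matrix is singular.
rank = np.linalg.matrix_rank(C_vv)
print(rank, "of", n_pixels)
```

A matrix this rank deficient has no inverse, which is why the CCA eigenproblem cannot be solved directly in this regime.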


Ill-Posedness

Impossible to invert !!!

  • Prior solutions:

    • Use many more frames → poor temporal resolution.

    • Aggressive spatial pruning → poor spatial resolution.

    • Trivial regularization


A General Problem

Large number of weights

Small amount of data

The problem is ILL-POSED

Overfitting is likely


An Equivalent Problem

Maximizing

Minimizing


Single Audio Band

A has a single column, and

Known data

Minimizing

(The denominator is non-zero)


Underdetermined system !

Full correlation if the audio vector a = [a(1), a(2), …, a(ti), …, a(30)], indexed by time, equals V times the vector of weights.


“Out of clutter, find simplicity.

From discord, find harmony.”

Albert Einstein

Detected correlated pixels


Sparse Solution

minimum ℓ0-norm

  • Non-convex

  • Exponential complexity
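
The exponential complexity comes from having to test subsets of pixels. A brute-force sketch on a toy problem, assuming NumPy; the function name `sparsest_solution` and the data are hypothetical and only illustrate the idea:

```python
import itertools
import numpy as np

def sparsest_solution(V, a, tol=1e-9):
    """Brute-force the minimum l0-norm w with V @ w == a (toy sizes only).

    Supports are tried in order of increasing size, so the number of
    candidates grows combinatorially with the number of pixels.
    """
    n_pixels = V.shape[1]
    for k in range(1, n_pixels + 1):
        for support in itertools.combinations(range(n_pixels), k):
            cols = list(support)
            w_s, *_ = np.linalg.lstsq(V[:, cols], a, rcond=None)
            if np.linalg.norm(V[:, cols] @ w_s - a) < tol:
                w = np.zeros(n_pixels)
                w[cols] = w_s
                return w
    return None

rng = np.random.default_rng(0)
V = rng.standard_normal((4, 8))        # 4 frames, 8 "pixels"
w_true = np.zeros(8)
w_true[[2, 6]] = [1.5, -0.5]           # only two pixels carry weight
a = V @ w_true
w = sparsest_solution(V, a)
```

Even at this size the search may visit dozens of supports; at image scale it is infeasible.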


The ℓ1-norm criterion

minimum ℓ1-norm

  • Sparse in common situations

    • Donoho, Elad (2005)

  • Convex

  • Polynomial complexity
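
Minimizing the ℓ1-norm subject to the linear system can be posed as a linear program. A minimal sketch, assuming SciPy's `linprog` and synthetic data; the function name `min_l1_solution` and the variable split w = p - q are choices of this sketch, not taken from the slides:

```python
import numpy as np
from scipy.optimize import linprog

def min_l1_solution(V, a):
    """Solve min ||w||_1 subject to V @ w == a as a linear program.

    Uses the standard split w = p - q with p, q >= 0, so that
    ||w||_1 = sum(p) + sum(q) at the optimum.
    """
    n_frames, n_pixels = V.shape
    c = np.ones(2 * n_pixels)              # objective: sum(p) + sum(q)
    A_eq = np.hstack([V, -V])              # V @ (p - q) == a
    result = linprog(c, A_eq=A_eq, b_eq=a, bounds=(0, None))
    if not result.success:
        raise RuntimeError(result.message)
    p, q = result.x[:n_pixels], result.x[n_pixels:]
    return p - q

rng = np.random.default_rng(0)
n_frames, n_pixels = 30, 100               # underdetermined system
V = rng.standard_normal((n_frames, n_pixels))
w_true = np.zeros(n_pixels)
w_true[[5, 40, 77]] = [1.0, -2.0, 1.5]     # three active "pixels"
a = V @ w_true
w_l1 = min_l1_solution(V, a)
```

In this toy setting the ℓ1 solution typically coincides with the sparse weights that generated the data, which is the kind of situation the sparsity claim refers to.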


The Minimum Norm Solution

minimum ℓ2-norm

  • Solving using ℓ2-norm (pseudo-inverse, SVD, QR)

  • Energy spread
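
The energy spread of the minimum ℓ2-norm solution can be seen on synthetic data. A minimal sketch, assuming NumPy's pseudo-inverse and made-up sizes (30 frames, 100 features):

```python
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_pixels = 30, 100
V = rng.standard_normal((n_frames, n_pixels))
w_true = np.zeros(n_pixels)
w_true[[5, 40, 77]] = [1.0, -2.0, 1.5]     # three active "pixels"
a = V @ w_true

# Minimum l2-norm solution of the underdetermined system V @ w == a.
w_l2 = np.linalg.pinv(V) @ a

# It reproduces the audio exactly, but its energy is spread over
# almost all pixels instead of the three that generated the data.
n_active = int(np.sum(np.abs(w_l2) > 1e-6))
print(n_active, "non-negligible weights out of", n_pixels)
```

The solution satisfies the equations with the smallest possible Euclidean norm, yet it gives no localization.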



Audio-visual events

No parameters to tweak

Maximum correlation: Eigenproblem

Minimum objective function G

Linear programming

Fully correlated

Sparse

Polynomial


Multiple Audio Bands - Solution

The optimization problem:

  • Convex

  • Linear

Non-convex constraint

ℓ1-ball


Multiple Audio Bands

Optimization over each face (S1, S2, S3, S4) is:

  • Each face: linear programming

No parameters to tweak


Sharp & Dynamic, Despite Distraction

[Figure: results at Frames 9, 42, 68, 115, 146, 169]


Performing in Audio Noise

[Figure: results at Frames 51, 83, 106, 177]

  • Sparse

  • Localization on the proper elements

  • False alarm – temporally inconsistent

  • Handling dynamics


ℓ2-norm: Energy Spread

[Figure: Movie #1, Frame 146; Movie #2, Frame 83]


ℓ1-norm: Localization

[Figure: Movie #1, Frame 146; Movie #2, Frame 83]


The “Chorus Ambiguity”

Synchronized talk: who’s talking?

  • Possible solutions:

    • Left

    • Right

    • Both

Not unique (ambiguous)


The “Chorus Ambiguity”

[Figure: two plots with axes feature 1 and feature 2, one per norm criterion; the solution “Both” is marked]

