
“Pixels that Sound”: Find pixels that correspond (correlate !?) to sound



Presentation Transcript

“Pixels that Sound”

Find pixels that correspond (correlate !?) to sound

Kidron, Schechner, Elad, CVPR 2005



Audio-Visual Analysis: Applications

  • Lip reading – detection of lips (or person)

    • Slaney, Covell (2000)

    • Bregler, Konig (1994)

  • Analysis and synthesis of music from motion

    • Murphy, Andersen, Jensen (2003)

  • Source separation based on vision

    • Li, Dimitrova, Li, Sethi (2003)

    • Smaragdis, Casey (2003)

    • Nock, Iyengar, Neti (2002)

    • Fisher, Darrell, Freeman, Viola (2001)

    • Hershey, Movellan (1999)

  • Tracking

    • Vermaak, Gangnet, Blake, Pérez (2001)

  • Biological systems

    • Gutfreund, Zheng, Knudsen (2002)


Problem: Different Modalities

Audio-visual analysis combines two sensors: a microphone and a camera.

  • Audio data (microphone)

    • 44.1 kHz, few bands

    • Not stereophonic

  • Visual data (camera)

    • 25 frames/sec

    • Each frame: 576 x 720 pixels

Kidron, Schechner, Elad, Pixels that Sound
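
To make the mismatch between the two modalities concrete, the short sketch below works out the sizes implied by the figures on this slide. The 30-frame window is an assumption chosen to match the a(1) … a(30) indexing that appears later in the deck; the variable names are illustrative only.

```python
# Figures quoted on the slide.
AUDIO_RATE_HZ = 44_100
FRAME_RATE_HZ = 25
FRAME_HEIGHT, FRAME_WIDTH = 576, 720

# Number of visual unknowns if every pixel gets its own weight.
pixels_per_frame = FRAME_HEIGHT * FRAME_WIDTH

# Raw audio samples that fall inside one video frame.
audio_samples_per_frame = AUDIO_RATE_HZ // FRAME_RATE_HZ

# Assumed analysis window: 30 frames.
window_frames = 30
window_seconds = window_frames / FRAME_RATE_HZ

print(pixels_per_frame, audio_samples_per_frame, window_seconds)
```

With one weight per pixel, a 30-frame window supplies 30 observations for several hundred thousand unknowns, which is the imbalance the later slides call ill-posed.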


Previous Work

  • Pointwise correlation: not typical

    • Nock, Iyengar, Neti (2002)

    • Hershey, Movellan (1999)

  • Cluster of pixels - linear superposition

    • Canonical Correlation Analysis (CCA): ill-posed (lack of data)

      • Smaragdis, Casey (2003)

      • Li, Dimitrova, Li, Sethi (2003)

      • Slaney, Covell (2000)

    • Mutual Information (MI): highly complex

      • Fisher et al. (2001)

      • Cutler, Davis (2000)

      • Bregler, Konig (1994)


CCA

[Figure: the video (Pixel #1, Pixel #2, Pixel #3) and the audio (Band #1, Band #2) are each passed through a projection; CCA yields the optimal projections, including the optimal visual components.]


Visual Projection

  • Video features

    • Pixel intensities

    • Transform coefficients (wavelet)

    • Image differences

  • Projection: the video features (v) are projected onto a 1D variable

[Figure: a column of example feature values is projected onto a single 1D variable.]


Audio Projection

  • Audio features

    • Average energy per frame

    • Transform coefficients per frame

  • Projection: the audio features (a) are projected onto a 1D variable


Canonical Correlation

  • Representation: audio and video

  • Projections (per time window)

  • Random variables (time dependent)

  • Correlation coefficient
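
As a minimal sketch of the chain on this slide, the code below projects per-frame video and audio features onto 1D variables and computes their correlation coefficient over one time window. The toy data, the weight vectors, and the helper names `project` and `correlation` are all made up for illustration; they are not the authors' data or code.

```python
import math

def project(features, weights):
    """Project each per-frame feature vector onto a single number."""
    return [sum(f * w for f, w in zip(frame, weights)) for frame in features]

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    var_x = sum((p - mx) ** 2 for p in x)
    var_y = sum((q - my) ** 2 for q in y)
    return cov / math.sqrt(var_x * var_y)

# Toy window (made up): 5 frames, 3 pixels, 2 audio bands.
video = [[1, 0, 2], [2, 1, 2], [3, 0, 2], [4, 1, 2], [5, 0, 2]]
audio = [[2, 1], [4, 1], [6, 1], [8, 1], [10, 1]]

w_a = [1.0, 0.0]             # hypothetical audio weights: band #1 only
w_v_good = [1.0, 0.0, 0.0]   # hypothetical video weights: pixel #1 only
w_v_bad = [0.0, 1.0, 0.0]    # hypothetical video weights: pixel #2 only

rho_good = correlation(project(video, w_v_good), project(audio, w_a))
rho_bad = correlation(project(video, w_v_bad), project(audio, w_a))
print(rho_good, rho_bad)
```

In this toy window pixel #1 rises in step with band #1, so its projection is fully correlated with the audio, while the flickering pixel #2 is uncorrelated with it. CCA searches for the weights that maximize this coefficient.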


CCA Formulation

  • The canonical correlation and its projections yield an eigenvalue problem:

    • Knutsson, Borga, Landelius (1995)

  • Canonical correlation: equivalent to the largest eigenvalue

  • Projections: equivalent to the corresponding eigenvectors
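
For reference, a standard textbook statement of this formulation is sketched below. The notation is assumed here rather than taken from the slide: v and a are the video and audio feature vectors, w_v and w_a the projection weights, and C_vv, C_aa, C_va = C_av^T the covariance and cross-covariance matrices.

```latex
\[
\rho \;=\;
\frac{\mathbf{w}_v^{T}\,\mathbf{C}_{va}\,\mathbf{w}_a}
     {\sqrt{\left(\mathbf{w}_v^{T}\,\mathbf{C}_{vv}\,\mathbf{w}_v\right)
            \left(\mathbf{w}_a^{T}\,\mathbf{C}_{aa}\,\mathbf{w}_a\right)}}
\]
% Setting the derivatives of \rho with respect to both weight vectors to zero
% gives a pair of coupled eigenvalue problems:
\[
\mathbf{C}_{vv}^{-1}\,\mathbf{C}_{va}\,\mathbf{C}_{aa}^{-1}\,\mathbf{C}_{av}\,\mathbf{w}_v
  = \rho^{2}\,\mathbf{w}_v ,
\qquad
\mathbf{C}_{aa}^{-1}\,\mathbf{C}_{av}\,\mathbf{C}_{vv}^{-1}\,\mathbf{C}_{va}\,\mathbf{w}_a
  = \rho^{2}\,\mathbf{w}_a .
\]
```

In this form the largest eigenvalue is the squared canonical correlation, and the formulation presupposes that C_vv and C_aa can be inverted.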


Visual Data

[Figure: the visual data arranged as a matrix, with time t (frames) along one axis and spatial location (pixel intensities) along the other.]


Rank Deficiency

[Figure: the same data matrix, t (frames) by spatial location (pixel intensities), with far fewer frames than pixels.]


Estimation of Covariance

  • The estimated covariance matrix is rank deficient
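
The rank deficiency can be checked on a toy case. The sketch below estimates a covariance matrix from 3 frames of a 5-pixel "image" and computes its rank exactly with rational arithmetic. The pixel values and the helper names `covariance` and `rank` are made up for illustration.

```python
from fractions import Fraction

def covariance(frames):
    """Empirical covariance of the rows of `frames` (one row per frame)."""
    n, d = len(frames), len(frames[0])
    mean = [Fraction(sum(row[j] for row in frames), n) for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in frames]
    return [[sum(r[i] * r[j] for r in centered) / n for j in range(d)]
            for i in range(d)]

def rank(matrix):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rk, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rk], m[pivot] = m[pivot], m[rk]
        for r in range(rk + 1, len(m)):
            factor = m[r][col] / m[rk][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[rk])]
        rk += 1
    return rk

# Toy data (made up): 3 frames, 5 pixels per frame.
frames = [[3, 40, 120, 52, 68],
          [7, 41, 118, 50, 70],
          [2, 45, 121, 55, 61]]

C_vv = covariance(frames)
print(len(C_vv), rank(C_vv))
```

The 5 x 5 estimate has rank 2: after the mean is removed, N frames span at most N - 1 directions, so any covariance estimated from fewer frames than pixels cannot be inverted.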


Ill-Posedness

Impossible to invert !!!

  • Prior solutions:

    • Use many more frames → poor temporal resolution.

    • Aggressive spatial pruning → poor spatial resolution.

    • Trivial regularization
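
The slide does not spell out what "trivial regularization" means. One common reading, given here as an assumption, is to add a small multiple of the identity to the estimated covariance so that it becomes invertible; the symbols C_vv, λ, and I are assumed notation.

```latex
\[
\tilde{\mathbf{C}}_{vv} \;=\; \mathbf{C}_{vv} + \lambda\,\mathbf{I},
\qquad \lambda > 0 .
\]
% Every eigenvalue of the positive semi-definite estimate is shifted up by
% \lambda, so the regularized matrix is invertible, at the price of a
% parameter \lambda that must be chosen by hand.
```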


A General Problem

  • Large number of weights

  • Small amount of data

  • The problem is ILL-POSED

  • Overfitting is likely


An Equivalent Problem

  • Maximizing the correlation is equivalent to minimizing an objective function
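
One standard way to see such an equivalence is sketched below; it is not necessarily the exact objective used on the slide. Assumed notation: V and A are the zero-mean video and audio data matrices with one row per frame, and w_v, w_a are the projection weights.

```latex
\[
\mathbf{p} = \frac{\mathbf{V}\mathbf{w}_v}{\lVert \mathbf{V}\mathbf{w}_v \rVert_2},
\qquad
\mathbf{q} = \frac{\mathbf{A}\mathbf{w}_a}{\lVert \mathbf{A}\mathbf{w}_a \rVert_2},
\qquad
\rho = \mathbf{p}^{T}\mathbf{q}
\]
\[
\lVert \mathbf{p} - \mathbf{q} \rVert_2^{2}
  = \lVert \mathbf{p} \rVert_2^{2} + \lVert \mathbf{q} \rVert_2^{2} - 2\,\mathbf{p}^{T}\mathbf{q}
  = 2 - 2\rho .
\]
% Maximizing \rho is therefore the same as minimizing the squared distance
% between the normalized projections; the distance is zero exactly when the
% projections are fully correlated (\rho = 1).
```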


Single Audio Band

  • A has a single column, and

  • Known data

  • Minimizing

  • (The denominator is non-zero)


Full correlation if

[Figure: the audio vector a, with entries a(1), a(2), …, a(ti), …, a(30) ordered in time, equals the matrix V applied to the weights.]

Underdetermined system!
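
A toy instance shows why an underdetermined system is a problem: with fewer frames than pixels, many different weight vectors reproduce the audio exactly. The matrix, the audio vector, and the helper name `apply` below are made up for illustration.

```python
def apply(V, w):
    """Matrix-vector product V w, with V given as a list of rows."""
    return [sum(v * x for v, x in zip(row, w)) for row in V]

# Toy system (made up): 2 frames, 3 pixels.
V = [[1, 0, 1],
     [0, 1, 1]]
a = [1, 1]

# Three different weight vectors, all fully consistent with the audio.
w_two_pixels = [1, 1, 0]
w_one_pixel = [0, 0, 1]
w_all_pixels = [0.5, 0.5, 0.5]

for w in (w_two_pixels, w_one_pixel, w_all_pixels):
    print(w, apply(V, w) == a)
```

All three candidates achieve full correlation, so the data alone cannot pick between them; an additional criterion on the weights is needed.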


“Out of clutter, find simplicity. From discord, find harmony.”

Albert Einstein

Detected correlated pixels


Sparse Solution

  • Minimum ℓ0-norm

  • Non-convex

  • Exponential complexity


The ℓ1-norm criterion

  • Minimum ℓ1-norm

  • Sparse in common situations

  • Convex

  • Polynomial complexity

  • Donoho, Elad (2005)
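
The convexity and polynomial complexity follow from the fact that a minimum ℓ1-norm problem with linear equality constraints can be rewritten as a linear program. The standard rewriting is sketched below in assumed notation: V is the data matrix, a the audio vector, w the weights, and u, z auxiliary non-negative vectors.

```latex
\[
\min_{\mathbf{w}} \; \lVert \mathbf{w} \rVert_1
\quad \text{subject to} \quad \mathbf{V}\mathbf{w} = \mathbf{a}
\]
% Split the weights into positive and negative parts, w = u - z:
\[
\min_{\mathbf{u},\,\mathbf{z}} \; \mathbf{1}^{T}(\mathbf{u} + \mathbf{z})
\quad \text{subject to} \quad
\mathbf{V}(\mathbf{u} - \mathbf{z}) = \mathbf{a},
\qquad \mathbf{u} \ge \mathbf{0}, \;\; \mathbf{z} \ge \mathbf{0} .
\]
% At the optimum u_i z_i = 0 for every i, so 1^T (u + z) equals the l1-norm
% of w = u - z, and the program has a linear objective with linear constraints.
```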


The Minimum Norm Solution

  • Minimum ℓ2-norm

  • Solving using the ℓ2-norm (pseudo-inverse, SVD, QR)

  • Energy spread
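
The "energy spread" of the minimum ℓ2-norm solution can be seen on a toy system. The sketch below computes that solution in closed form for a 2-frame, 3-pixel example and compares it with the minimum ℓ1-norm solution, found here by a brute-force scan over the one-parameter family of exact solutions. The data and helper names are made up, and the closed form `min_l2_solution` only handles a matrix with two linearly independent rows.

```python
from fractions import Fraction

def min_l2_solution(V, a):
    """Minimum L2-norm solution w = V^T (V V^T)^{-1} a for a 2-row V."""
    r0, r1 = V
    g00 = sum(x * x for x in r0)
    g01 = sum(x * y for x, y in zip(r0, r1))
    g11 = sum(y * y for y in r1)
    det = Fraction(g00 * g11 - g01 * g01)
    # Solve the 2x2 system (V V^T) c = a by Cramer's rule.
    c0 = (a[0] * g11 - a[1] * g01) / det
    c1 = (a[1] * g00 - a[0] * g01) / det
    return [c0 * x + c1 * y for x, y in zip(r0, r1)]

def l1(w):
    return sum(abs(x) for x in w)

def l2_squared(w):
    return sum(x * x for x in w)

def nonzeros(w):
    return sum(1 for x in w if x != 0)

# Toy system (made up): 2 frames, 3 pixels.
V = [[1, 0, 1],
     [0, 1, 1]]
a = [1, 1]

w_l2 = min_l2_solution(V, a)

# For this V and a, every exact solution has the form [1 - t, 1 - t, t];
# scan t on a grid and keep the candidate with the smallest L1 norm.
candidates = [[1 - t, 1 - t, t] for t in (Fraction(k, 10) for k in range(-20, 31))]
w_l1 = min(candidates, key=l1)

for name, w in (("min L2", w_l2), ("min L1", w_l1)):
    print(name, [str(x) for x in w], "nonzero weights:", nonzeros(w))
```

On this example the minimum ℓ2-norm solution puts weight on all three pixels, while the minimum ℓ1-norm solution concentrates everything on the single pixel that explains both frames.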



Audio-visual events

No parameters to tweak

Maximum correlation: Eigenproblem

Minimum objective function G

Linear programming

Fully correlated

Sparse

Polynomial


Multiple Audio Bands - Solution

The optimization problem:

  • Convex

  • Linear

  • Non-convex constraint (ℓ1-ball)


Multiple Audio Bands

  • Optimization over each face (S1, S2, S3, S4) is linear programming

  • No parameters to tweak


Sharp & Dynamic, Despite Distraction

[Figure: results at Frames 9, 42, 68, 115, 146, and 169.]


Performing in Audio Noise

[Figure: results at Frames 51, 106, 83, and 177.]

  • Sparse

  • Localization on the proper elements

  • False alarm – temporally inconsistent

  • Handling dynamics


ℓ2-norm: Energy Spread

[Figure: Movie #1, Frame 146; Movie #2, Frame 83.]


ℓ1-norm: Localization

[Figure: Movie #1, Frame 146; Movie #2, Frame 83.]


The “Chorus Ambiguity”

Synchronized talk

Who’s talking?

  • Possible solutions:

    • Left

    • Right

    • Both

Not unique (ambiguous)


The “Chorus Ambiguity”

[Figure: two panels, one per norm criterion, each with axes feature 1 and feature 2; the solution "Both" is marked.]

