
Cross-Modal (Visual-Auditory) Denoising

Dana Segev

Yoav Y. Schechner

Michael Elad

Technion – Israel Institute of Technology


Motivation

Audio examples: a digits sequence, the noisy digits sequence, and the result denoised by the state-of-the-art algorithm of Cohen & Berdugo.

Segev, Schechner, Elad, Cross-Modal Denoising


Motivation

  • Use one modality to denoise another?

  • Use video to denoise a soundtrack?


Noise

  • Very intense

  • Non-stationary

  • Unknown

  • Unseen source

Single microphone


Input: very noisy audio and video, over time (sec)

Algorithm: cross-modal, example-based

Output: denoised audio, for human and machine hearing


Intuition

Training example set

Input test set


Speech Examples Extraction

Each example is about one syllable long (0.25 sec).


Music Segments Extraction

Xylophone sound


Principle

Figure: examples.

Audio Only

Figure: examples.


Cross-Modal Denoising

  • Cross-modal representation.

  • Generating multimodal features.

  • Learning feature statistics.

  • Cross-modal pattern recognition.

  • Rendering a denoised signal.


Feature-space Creation

  • Input video: video feature-space

  • Input audio: audio feature-space

  • Input audio-video: audio-video feature-space

  • Training audio-video: audio-video examples feature-space

All signals are plotted against time (sec).


Distance-measure

Figure: the feature-space, with the nearest neighbor of an input found among the examples.
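The nearest-neighbor step can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the feature vectors are made up, and the l2 norm is used because the Experiments slide names it as the distance measure. The function names `l2_distance` and `nearest_neighbor` are hypothetical.

```python
import math

def l2_distance(a, b):
    """Euclidean (l2) distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(query, examples):
    """Return (index, distance) of the example closest to the query."""
    best_index, best_dist = -1, math.inf
    for i, example in enumerate(examples):
        d = l2_distance(query, example)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist

# Hypothetical 3-D features: three training examples and one input segment.
examples = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [4.0, 4.0, 4.0]]
query = [0.9, 1.2, 1.0]
index, dist = nearest_neighbor(query, examples)
```

Here `index` points at the second example, the one whose features are closest to the query.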


Rendering a denoised signal

Figure: the noisy audio, the clean segments chosen for it, and the denoised result.
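A minimal sketch of the rendering step, assuming the input is cut into non-overlapping segments and each one is simply replaced by the clean audio of its matched example. The slides do not say how segment boundaries are handled, so plain concatenation is an assumption; overlap or cross-fading may be used in practice. The name `render_denoised` is hypothetical.

```python
def render_denoised(match_indices, clean_segments):
    """Concatenate the clean example segment chosen for each input segment."""
    output = []
    for k in match_indices:
        output.extend(clean_segments[k])
    return output

# Hypothetical clean example segments (two samples each) and the index
# of the example matched to each of four input segments.
clean_segments = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
match_indices = [2, 0, 0, 1]
denoised = render_denoised(match_indices, clean_segments)
```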




Cross-Modal Association

Figure: the examples and the input segments.


Bartender experiment




Cross-Modal Denoising

  • Cross-modal representation.

  • Generating multimodal features.

  • Learning feature statistics.

  • Cross-modal pattern recognition (nearest neighbor, NN).

  • Rendering a denoised signal.


Feature Statistics as a Prior

Figure: the feature-space, with clusters of example segments labeled by syllable: "bi", "fif", "ty", "two", "ar". The k-th example segment is assigned to one of these clusters.


Feature Statistics as a Prior

Syllable consecutive probability

Table: rows are the current cluster and columns the next cluster, both over "bi", "ty", "fif", "two", "ar". Each cell counts how often the next cluster follows the current one in the training set.

Counts on the slide, in extraction order (cell positions are not recoverable): 53, 23, 26, 5, 1, 12, 60, 43, 17, 6, 22, 4, 1, 5, 3, 6, 2, 13, 12, 21, 9, 7, 2, 7, 11.

The probability for transition between clusters is the count divided by the number of examples in the training set.
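Counting transitions between consecutive clusters can be sketched as below. The label sequence is invented for illustration, and the normalization (dividing each count by the number of times the current cluster occurs) is an assumption, since the exact denominator is not recoverable from the slide. The name `transition_probabilities` is hypothetical.

```python
from collections import Counter

def transition_probabilities(labels, clusters):
    """Estimate P(next cluster | current cluster) from consecutive labels."""
    pair_counts = Counter(zip(labels[:-1], labels[1:]))
    current_counts = Counter(labels[:-1])
    probs = {}
    for cur in clusters:
        total = current_counts[cur]
        # Rows for clusters that never appear as "current" stay all zero.
        probs[cur] = {
            nxt: (pair_counts[(cur, nxt)] / total if total else 0.0)
            for nxt in clusters
        }
    return probs

clusters = ["bi", "fif", "ty", "two", "ar"]
# Hypothetical cluster label of each consecutive training segment.
labels = ["fif", "ty", "two", "bi", "ar", "fif", "ty", "fif", "ty", "two"]
P = transition_probabilities(labels, clusters)
```

In this toy sequence "fif" is always followed by "ty", so `P["fif"]["ty"]` is 1.0, while "ty" is followed by "two" twice and by "fif" once.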


Feature Statistics as a Prior

Figure: a Hidden Markov Model over the syllable clusters ("fif", "ty", "two", "bi"), with a time delay, the transition probability P, and added audio noise.


Cross-Modal Association

Figure: the examples and the input segments.

  • Input video

  • Vector of indices

  • A cost function, with a data term and a regularization term

  • Optimal vector of indices


Cross-Modal Association

Dynamic Programming

Figure: a graph linking the examples to the input segments.

Complexity:

  • nodes

  • edges
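A minimal dynamic-programming sketch for picking the vector of indices. It assumes the cost is a sum of a per-segment data term and a weighted pairwise regularization term between consecutive indices; the exact terms are not given in the transcript, so `data_cost`, `transition_cost`, and `weight` are hypothetical inputs (for example, a feature distance and a negative log transition probability).

```python
def best_index_vector(data_cost, transition_cost, weight=1.0):
    """Minimize sum_t data_cost[t][k_t]
    + weight * sum_t transition_cost[k_(t-1)][k_t] over index vectors k."""
    n_steps, n_examples = len(data_cost), len(data_cost[0])
    cost = list(data_cost[0])   # best cost of a path ending at each example
    back = []                   # back-pointers, one list per later step
    for t in range(1, n_steps):
        new_cost, pointers = [], []
        for k in range(n_examples):
            prev = min(range(n_examples),
                       key=lambda j: cost[j] + weight * transition_cost[j][k])
            pointers.append(prev)
            new_cost.append(cost[prev] + weight * transition_cost[prev][k]
                            + data_cost[t][k])
        cost = new_cost
        back.append(pointers)
    # Trace the cheapest path backwards from the last step.
    k = min(range(n_examples), key=lambda j: cost[j])
    total = cost[k]
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path, total

# Hypothetical costs: 3 input segments, 2 examples.
data_cost = [[0.0, 1.0], [0.6, 0.5], [0.0, 1.0]]
transition_cost = [[0.0, 1.0], [1.0, 0.0]]
path, total = best_index_vector(data_cost, transition_cost, weight=1.0)
path_no_reg, total_no_reg = best_index_vector(data_cost, transition_cost, weight=0.0)
```

With the regularization switched off, the middle segment picks the example with the slightly lower data cost; with it on, the path stays on one example because switching is penalized. For T input segments and K examples, this sketch visits T·K nodes and (T−1)·K² edges.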




Cross-Modal Denoising

  • Cross-modal representation.

  • Generating multimodal features.

  • Learning feature statistics.

  • Cross-modal pattern recognition.

  • Rendering a denoised signal.


Audio Features and Visual Features

Requirements

  • Audio: sensitivity to sound perception; dimension reduction

  • Visual: focusing on the motion of interest; dimension reduction

Speech Features

  • Audio: MFCCs

  • Visual: DCT coefficients

Music Features

  • Audio: spectrogram of each segment

  • Visual: the spatial trajectory of a hitting rod


Audio Features

MFCCs – Mel-frequency Cepstral Coefficients

Audio signal → signal spectrum → Mel-frequency filter bank → log(.) → DCT → MFCCs
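The MFCC chain (spectrum, Mel-frequency filter bank, log, DCT) can be sketched with the standard library alone. This is an illustrative sketch, not the authors' code: the frame length, number of filters, number of coefficients, and the common 2595·log10(1 + f/700) Mel formula are all assumptions, and windowing and pre-emphasis are left out for brevity. The 8000 Hz rate matches the speech audio rate on the Experiments slide.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum for bins 0..N/2 (fine for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(frame))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    mel_max = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(mel_max * i / (n_filters + 1))
             for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def dct2(values):
    """Unnormalized DCT-II."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n)]

def mfcc(frame, sample_rate, n_filters=12, n_coeffs=8):
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    energies = [sum(w * p for w, p in zip(row, spectrum)) for row in bank]
    log_energies = [math.log(e + 1e-10) for e in energies]  # avoid log(0)
    return dct2(log_energies)[:n_coeffs]

sample_rate = 8000
# Hypothetical test frame: 256 samples of a 440 Hz tone.
frame = [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(256)]
coeffs = mfcc(frame, sample_rate)
```

Keeping only the first few DCT coefficients is what gives the dimension reduction listed among the feature requirements.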


Audio Features

Spectrogram of each segment

Xylophone signal → spectrogram → spectrogram accumulation


Visual Features (speech)

  • The given movie

  • Locking on the object of interest

  • Extracting global motion by tracking

  • Extracting features: DCT coefficients that best represent motion between frames


Visual Features (xylophone)

  • The given movie

  • Locking on the object of interest

  • Extracting global motion by tracking, along the X, Y, and Z axes

  • Extracting features: spatial coordinates of the hitting rod


Experiments

Speech

  • A corpus of a limited number of words and syllables: digits and bar beverages.

  • Video rate 25 fps, audio rate 8000 Hz.

  • K-means clustering, 350 clusters.

  • Distance measure: l2 norm.

Xylophone

  • A corpus of a limited set of sounds.

  • Video rate 25 fps, audio rate 16000 Hz.

  • Distance measure: l2 norm.
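K-means clustering with the l2 norm can be sketched as below. This is a toy illustration with two clusters of 2-D points rather than 350 clusters of real features; starting the centroids at the first k points and running a fixed number of iterations are assumptions made to keep the sketch deterministic. The names `squared_l2` and `kmeans` are hypothetical.

```python
def squared_l2(a, b):
    """Squared l2 distance (same ordering as the l2 norm itself)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iters=20):
    """Plain Lloyd iterations; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: each point goes to its closest centroid.
        labels = [min(range(k), key=lambda c: squared_l2(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, labels

# Hypothetical 2-D feature vectors forming two well-separated groups.
points = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0],
          [10.0, 11.0], [1.0, 0.0], [11.0, 10.0]]
centroids, labels = kmeans(points, k=2)
```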


Experiments

Xylophone

  • Training duration: 103 sec

  • Testing duration: 100 sec

  • Xylophone melody: SNR = 1

  • Music from a song by GNR: SNR = 0.9


Experiments

Speech: Digits

  • Training duration: 60 sec

  • Testing duration: 240 sec

  • Audio examples: noisy (SNR = 0.07) and denoised
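To give a feel for an SNR of 0.07, the sketch below mixes a signal with noise at a chosen SNR and converts the ratio to decibels. It assumes the SNR on the slides is a linear ratio of signal power to noise power, which the transcript does not state; the signals are synthetic and the function names are hypothetical.

```python
import math

def power(signal):
    """Mean squared amplitude."""
    return sum(x * x for x in signal) / len(signal)

def mix_at_snr(clean, noise, snr):
    """Scale the noise so power(clean) / power(scaled noise) == snr, then add."""
    gain = math.sqrt(power(clean) / (snr * power(noise)))
    return [c + gain * n for c, n in zip(clean, noise)]

def snr_to_db(snr):
    return 10.0 * math.log10(snr)

# Synthetic stand-ins for a clean signal and an interfering noise.
clean = [math.sin(0.1 * i) for i in range(1000)]
noise = [math.sin(1.7 * i + 0.3) for i in range(1000)]
noisy = mix_at_snr(clean, noise, 0.07)
```

Under that assumption, 0.07 corresponds to roughly −11.5 dB, meaning the noise carries about 14 times the power of the signal.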


Experiments

Speech: Bartender

  • Training duration: 48 sec

  • Testing duration: 350 sec

  • Noise types: music from a song by Phil Collins, male speech, white Gaussian

  • SNR values shown: 0.59, 0.3, 0.38


Summary

Input: very noisy audio and video, over time (sec)

Algorithm:

  • Example-based

  • Hidden Markov Model

Output: denoised audio, for human and machine hearing