A Speech / Music Discriminator using RMS and Zero-crossings. Costas Panagiotakis and George Tziritas. Department of Computer Science, University of Crete, Heraklion, Greece. EUSIPCO 2002, Toulouse, France.


Computer Science Department

A Speech / Music Discriminator using RMS and Zero-crossings

Costas Panagiotakis and George Tziritas

Department of Computer Science

University of Crete

Heraklion, Greece



EUSIPCO 2002, Toulouse France


Presentation Organization

  • I. Introduction

  • II. Segmentation

  • III. Classification

  • IV. Results

  • V. Conclusion



Introduction (1/3)

Input

Figure 1: Original sound signal (sampling rate 44100 or 22050 Hz)

Output

Figure 2: Real-time segmentation and classification (speech, music, silence)



Introduction (2/3)

Approaches

  • Feature extraction (energy, frequency)

  • Feature-based segmentation and classification

Basic purpose

  • Real-time segmentation and classification

    • Algorithmic and computational constraints

    • Small number of features

  • Low error in locating changes (20 msec)

  • Short minimum interval between two changes (1 sec)

  • High accuracy (95%)



Introduction (3/3)

Basic Features

  • Computed every 20 msec

  • Independent characteristics

Root Mean Square (RMS)

  • Signal energy

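$A = \sqrt{\frac{1}{N}\sum_{n=1}^{N} x^2(n)}$, where x(1), ..., x(N) are the samples of one 20 msec interval

  • Figure 3: RMS in music

  • Figure 4: RMS in speech

Zero Crossings (ZC)

  • Mean frequency

  • Figure 5: ZC in music

  • Figure 6: ZC in speech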
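The two basic features can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes non-overlapping 20 msec intervals, samples given as a plain list of floats, and a zero-crossing count converted to a mean-frequency estimate as crossings / (2 × interval duration). The function name `frame_features` is hypothetical.

```python
import math


def frame_features(samples, sample_rate, frame_ms=20):
    """Hypothetical helper: per-interval RMS and zero-crossing features.

    Returns two lists with one entry per non-overlapping interval:
      rms:   A = sqrt(mean(x^2)) over the interval
      zc_hz: mean-frequency estimate, crossings / (2 * interval duration)
    """
    n = int(sample_rate * frame_ms / 1000)  # 882 samples at 44100 Hz
    duration = n / sample_rate              # interval length in seconds
    rms, zc_hz = [], []
    for start in range(0, len(samples) - n + 1, n):
        frame = samples[start:start + n]
        rms.append(math.sqrt(sum(x * x for x in frame) / n))
        # a zero crossing is a sign change between consecutive samples
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        zc_hz.append(crossings / (2 * duration))
    return rms, zc_hz


# Usage: 0.1 sec of a 1 kHz tone sampled at 44100 Hz (five 20 msec intervals)
sr = 44100
tone = [math.sin(2 * math.pi * 1000 * k / sr) for k in range(4410)]
rms, zc = frame_features(tone, sr)
print(len(rms), rms[0], zc[0])  # RMS near 0.707, ZC near 1000 Hz
```

Both features need only one pass over the samples, which fits the real-time constraint stated in the introduction.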


  • Figure 7: Histogram of RMS in speech, approximated by a χ² distribution

  • Figure 8: Histogram of RMS in speech, approximated by a χ² distribution


Segmentation (1/3)

Basic characteristics

RMS based

The χ² distribution fits the RMS histograms well:

$p(x) = \frac{x^{a}\, e^{-x/b}}{b^{a+1}\,\Gamma(a+1)}, \qquad a = \frac{m^2}{s^2} - 1, \qquad b = \frac{s^2}{m}$

m: mean, s²: variance

Two-stage algorithm

  • Stage 1

    • 1 sec accuracy (low computation cost)

  • Stage 2

    • 20 msec accuracy (high computation cost)
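Because both parameters are fixed by the mean m and variance s², fitting the χ² model to one frame of RMS values is a method-of-moments computation. The sketch below assumes the density p(x) = x^a e^(-x/b) / (b^(a+1) Γ(a+1)) with a = m²/s² - 1 and b = s²/m, which is consistent with the Γ(a + 1) term and the mean/variance parameters on this slide. `fit_chi2` and `chi2_pdf` are hypothetical names. The code also assumes a frame whose RMS values are not all zero.

```python
import math


def fit_chi2(rms_values):
    """Hypothetical helper: method-of-moments parameters (a, b).

    Assumes p(x) = x^a exp(-x/b) / (b^(a+1) Gamma(a+1)) with
    a = m^2/s^2 - 1 and b = s^2/m, where m and s^2 are the sample
    mean and variance. Assumes the values are not all zero.
    """
    n = len(rms_values)
    m = sum(rms_values) / n
    s2 = sum((x - m) ** 2 for x in rms_values) / n
    s2 = max(s2, 1e-12)  # avoid division by zero for a constant frame
    return m * m / s2 - 1.0, s2 / m


def chi2_pdf(x, a, b):
    """Evaluate p(x) in log space so b^(a+1) and Gamma(a+1) cannot overflow."""
    if x <= 0:
        return 0.0  # the model is only evaluated on positive RMS values
    log_p = a * math.log(x) - x / b - (a + 1) * math.log(b) - math.lgamma(a + 1)
    return math.exp(log_p)


# Usage: fit one 1 sec frame (50 RMS values) and evaluate the density
frame = [0.05 + 0.01 * (k % 10) for k in range(50)]
a, b = fit_chi2(frame)
print(a, b, chi2_pdf(0.1, a, b))
```

This density is a gamma distribution with shape a + 1 and scale b. Its mean (a + 1)b equals m and its variance (a + 1)b² equals s², so the fit reproduces the frame's first two moments exactly.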


Segmentation (2/3)

[Figure: consecutive 1 sec frames i-1, i, i+1, i+2, with LOW and HIGH RMS levels]

  • Stage 1

    • Partition into 1 sec frames (50 RMS values each)

    • A change in frame i ⇒ frames i-1 and i+1 must differ

    • Compute the frame distance D (Matusita distance) from the frame similarity ρ

    • Frame i is a candidate for Stage 2 (a change is present)

      • if D(i) > threshold and D(i) is a local maximum

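$\rho(p_1, p_2) = \int_0^{\infty} \sqrt{p_1(x)\, p_2(x)}\, dx$

[Figure: RMS versus time, divided into 1 sec frames, with the distance D; a change occurs in frame i]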
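A possible sketch of Stage 1 follows. It assumes each 1 sec frame is modeled by the moment-fitted χ² (gamma) density, computes the similarity ρ = ∫ √(p₁ p₂) dx in closed form for two gamma densities, and uses the standard Matusita distance D = √(2 - 2ρ). The threshold value and all function names are hypothetical, and the paper's exact distance normalization may differ.

```python
import math

FRAME = 50  # one 1 sec frame = 50 RMS values computed every 20 msec


def fit(values):
    """Moment-matched gamma shape k (= a + 1) and scale b of the chi^2 model."""
    n = len(values)
    m = sum(values) / n
    s2 = max(sum((x - m) ** 2 for x in values) / n, 1e-12)
    return m * m / s2, s2 / m


def matusita(f1, f2):
    """Matusita distance sqrt(2 - 2*rho), where rho = integral of sqrt(p1*p2).

    For two gamma densities rho has the closed form
    Gamma(k) B^k / sqrt(Gamma(k1) b1^k1 Gamma(k2) b2^k2),
    with k = (k1 + k2) / 2 and B = 2 b1 b2 / (b1 + b2).
    """
    (k1, b1), (k2, b2) = f1, f2
    k = (k1 + k2) / 2
    big_b = 2 * b1 * b2 / (b1 + b2)
    log_rho = (math.lgamma(k) + k * math.log(big_b)
               - 0.5 * (math.lgamma(k1) + k1 * math.log(b1)
                        + math.lgamma(k2) + k2 * math.log(b2)))
    rho = min(1.0, math.exp(log_rho))  # clip rounding error above 1
    return math.sqrt(2.0 - 2.0 * rho)


def stage1_candidates(rms, threshold):
    """Hypothetical Stage 1: indices of 1 sec frames that may contain a change."""
    frames = [rms[j:j + FRAME] for j in range(0, len(rms) - FRAME + 1, FRAME)]
    fits = [fit(f) for f in frames]
    d = [0.0] * len(frames)
    for i in range(1, len(frames) - 1):
        d[i] = matusita(fits[i - 1], fits[i + 1])  # compare frames i-1 and i+1
    # keep frames whose distance exceeds the threshold and is a local maximum
    return [i for i in range(1, len(frames) - 1)
            if d[i] > threshold and d[i] >= d[i - 1] and d[i] >= d[i + 1]]


# Usage: 10 sec of synthetic RMS values with a level change at index 275 (5.5 sec)
rms = [(0.1 + 0.02 * ((j % 5) - 2)) if j < 275 else (0.5 + 0.1 * ((j % 5) - 2))
       for j in range(500)]
print(stage1_candidates(rms, threshold=0.5))
```

Because D(i) compares frames i-1 and i+1, a change inside frame i gives the largest distance at i, while its neighbours see only a partial change. This is why the local-maximum test isolates a single candidate frame.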



Segmentation (3/3)

  • Stage 2

    • 20 msec accuracy

    • For each candidate frame i from Stage 1:

      • 1. Move two successive 1 sec windows, located before and after a time instant inside frame i

      • 2. Find the time instant where the two windows have the maximum Matusita distance between their RMS distributions

    • Possible oversegmentation

  • Figure 10: The RMS data and the distance D

  • Figure 11: The segmentation result and the RMS data
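Stage 2 can be sketched by moving a boundary t through the candidate frame in 20 msec steps (one RMS value at a time) and comparing the 1 sec window before t with the 1 sec window after t. The helpers are repeated from a Stage 1 sketch so that this block runs on its own. The function names, the closed-form gamma similarity, and the D = √(2 - 2ρ) normalization are assumptions, not the authors' code.

```python
import math

FRAME = 50  # 1 sec window = 50 RMS values (20 msec each)


def fit(values):
    """Moment-matched gamma shape k and scale b of the chi^2 model."""
    n = len(values)
    m = sum(values) / n
    s2 = max(sum((x - m) ** 2 for x in values) / n, 1e-12)
    return m * m / s2, s2 / m


def matusita(f1, f2):
    """sqrt(2 - 2*rho) with the closed-form similarity rho of two gamma densities."""
    (k1, b1), (k2, b2) = f1, f2
    k = (k1 + k2) / 2
    big_b = 2 * b1 * b2 / (b1 + b2)
    log_rho = (math.lgamma(k) + k * math.log(big_b)
               - 0.5 * (math.lgamma(k1) + k1 * math.log(b1)
                        + math.lgamma(k2) + k2 * math.log(b2)))
    return math.sqrt(2.0 - 2.0 * min(1.0, math.exp(log_rho)))


def stage2_refine(rms, i, win=FRAME):
    """Hypothetical Stage 2 for candidate frame i.

    Moves a boundary t through frame i one RMS value (20 msec) at a time,
    fits the 1 sec window before t and the 1 sec window after t, and
    returns the t with the largest Matusita distance (None if no t fits).
    """
    best_t, best_d = None, -1.0
    for t in range(i * win, (i + 1) * win):
        if t - win < 0 or t + win > len(rms):
            continue  # skip boundaries whose windows leave the signal
        d = matusita(fit(rms[t - win:t]), fit(rms[t:t + win]))
        if d > best_d:
            best_t, best_d = t, d
    return best_t


# Usage: synthetic RMS track with a level change at index 275, inside frame 5
rms = [(0.1 + 0.02 * ((j % 5) - 2)) if j < 275 else (0.5 + 0.1 * ((j % 5) - 2))
       for j in range(500)]
t = stage2_refine(rms, 5)
print(t, t * 0.02, "sec")
```

Only the boundary where both windows are homogeneous gives the largest separation, so the search localizes the change to one 20 msec step. Each candidate costs 50 distance evaluations, which matches the slide's note that Stage 2 is the more expensive stage.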



Classification (1/4)

  • Basic purpose

    • Classify each segment into one of the following classes:

      • Music

      • Speech

      • Silence

  • Main algorithm

    • Hypothesis

      • Segmentation yields homogeneous segments

    • Input

      • Basic features: RMS, ZC

    • Compute the actual features of each segment

    • Classify based on the actual feature values




Classification (2/4)

Actual feature specification

  • Normalized RMS variance, σ²_A

$\sigma^2_A = \frac{\frac{1}{N}\sum_{i=1}^{N}\left(A_i - \bar{A}\right)^2}{\bar{A}^2}$, where A_i are the N RMS values of the segment and Ā is their mean

    • In most cases (86%), σ²_A(music) < σ²_A(speech)

  • Probability of zero ZC, ZC0

    • ZC0(music) = 0 always

    • ZC0(speech) > 0 in 40% of cases

  • Maximal mean frequency, max(ZC)

    • In speech, max(ZC) < 2.4 kHz almost always

    • In music, max(ZC) > 2.4 kHz in 2% of cases
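The three features on this slide can be computed from a segment's per-interval RMS and ZC sequences as shown below. This sketch makes three assumptions: σ²_A is the RMS variance divided by the squared RMS mean, ZC0 is the fraction of 20 msec intervals with no zero crossings, and the ZC values are already expressed in Hz. `actual_features` is a hypothetical name.

```python
def actual_features(rms, zc_hz):
    """Hypothetical helper: (sigma2_a, zc0, max_zc) for one segment.

    rms:   RMS values A_i of the segment's 20 msec intervals
    zc_hz: mean-frequency (ZC) values of the same intervals, in Hz
    """
    n = len(rms)
    mean = sum(rms) / n
    var = sum((a - mean) ** 2 for a in rms) / n
    sigma2_a = var / (mean * mean)  # normalized RMS variance
    # share of intervals with no zero crossing at all
    zc0 = sum(1 for z in zc_hz if z == 0) / len(zc_hz)
    return sigma2_a, zc0, max(zc_hz)


# Usage on a short made-up segment
s2a, zc0, max_zc = actual_features([0.2, 0.6, 0.4, 0.2], [0.0, 900.0, 1800.0, 0.0])
print(s2a, zc0, max_zc)
```

Dividing by the squared mean is what makes the measure "normalized": scaling the signal by any constant leaves σ²_A unchanged, so the feature does not depend on recording level.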




Classification (3/4)

Actual feature specification

  • Joint RMS/ZC measure, Cz

    • Speech: RMS and ZC are highly correlated; many void intervals, where both RMS and ZC are low

    • Music: RMS and ZC are essentially independent

  • Void-interval frequency, Fu

    • Void-interval detection (per 20 msec interval):

      • (RMS < T1) && (RMS < 0.1·max(RMS(i))) && (RMS < T2) || (ZC = 0)

    • Group neighboring silent intervals

    • Fu: frequency of the grouped silent intervals

    • In speech, Fu > 0.6 always

    • In at least 65% of music, Fu < 0.6
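A minimal sketch of void-interval detection and Fu follows. It reads the detection condition with C operator precedence (&& binds tighter than ||). It treats "neighboring" void intervals as runs of consecutive 20 msec intervals and assumes Fu is the number of such runs per second of segment. T1 and T2 are left as parameters because the slide does not give their values. The function names are hypothetical.

```python
def void_flags(rms, zc, t1, t2):
    """Hypothetical: mark each 20 msec interval as void (True) or not.

    Literal reading of the slide's condition with C precedence:
    ((RMS < T1) && (RMS < 0.1*max RMS) && (RMS < T2)) || (ZC == 0)
    """
    peak = max(rms)
    return [((a < t1) and (a < 0.1 * peak) and (a < t2)) or z == 0
            for a, z in zip(rms, zc)]


def void_frequency(rms, zc, t1, t2, interval_sec=0.02):
    """Fu, assumed to be grouped void intervals per second of segment."""
    flags = void_flags(rms, zc, t1, t2)
    # a group starts wherever a void interval follows a non-void one
    groups = sum(1 for k, f in enumerate(flags) if f and (k == 0 or not flags[k - 1]))
    return groups / (len(flags) * interval_sec)


# Usage: two groups of void intervals in a 0.2 sec segment
rms = [0.5, 0.01, 0.01, 0.5, 0.5, 0.01, 0.5, 0.5, 0.5, 0.5]
zc = [1000.0] * 10
print(void_frequency(rms, zc, t1=0.05, t2=0.05))
```

Grouping first means that a long pause counts once, so Fu measures how often pauses occur rather than how long they last. Frequent short pauses are the pattern the slide associates with speech.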

    iA

    EUSIPCO 2002, Toulouse France

    10


Classification (4/4)

  • Silence segment recognition

    • Segment is silence ⇔ E < threshold

  • Decision-making algorithm

[Figure: decision flowchart. A silence segment check labels a segment as silence; otherwise an actual-features check labels it as speech or music]


Results

  • Data

    • 11.328 sec speech

    • 3.131 sec music

  • Data source

    • 70% audio CDs

    • 15% WWW

    • 15% recordings

  • Segmentation performance

    • 97% detection probability

    • Change accuracy ~0.2 sec

  • Actual features performance

[Figure: classification accuracy for each feature set: ZC0; Cz; σ²_A; σ²_A, ZC0; σ²_A, Cz; Cz, σ²_A; ZC0, σ²_A; Fu, σ²_A; all features]



Conclusion

  • Complexity

    • Minimum complexity: O(N)

    • Low computation cost

  • Summary

    • Real-time segmentation and classification into three classes

    • Energy distribution (RMS) suffices for segmentation

    • RMS and ZC suffice for classification

    • Purpose: minimum cost and high performance

  • Future extensions

    • Content-based indexing and retrieval of audio signals

    • Pre-processing stage for speech recognition



Segmentation - Classification Demo

Sound Player Demo

