
Computer Science Department

A Speech / Music Discriminator using RMS and Zero-crossings

Costas Panagiotakis and George Tziritas

Department of Computer Science

University of Crete

Heraklion, Greece



EUSIPCO 2002, Toulouse France


Presentation Organization

  • I. Introduction

  • II. Segmentation

  • III. Classification

  • IV. Results

  • V. Conclusion



Introduction (1/3)

Input

Figure 1: Original sound signal (sample rate 44,100 or 22,050 Hz)

Output

Figure 2: Real-time segmentation and classification (speech, music, silence)



Introduction (2/3)

Approaches

  • Feature extraction (energy, frequency)

  • Feature-based segmentation and classification

Main goals

  • Real-time segmentation and classification

    • Algorithmic and computational constraints

    • Small number of features

  • Low change-localization error (20 ms)

  • Short minimum interval between two changes (1 s)

  • High accuracy (95%)



Introduction (3/3)

Basic features

  • Computed every 20 ms

  • Independent characteristics

Root Mean Square (RMS)

  • Signal energy

$$A = \sqrt{\frac{1}{N}\sum_{n=1}^{N} x^2(n)}$$

where x(n) are the N samples of a 20 ms frame.

  • Figure 3: RMS in music

  • Figure 4: RMS in speech

Zero Crossings (ZC)

  • Mean frequency

  • Figure 5: ZC in music

  • Figure 6: ZC in speech
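The two basic features can be sketched in Python as follows. This is a minimal illustration, assuming a mono signal given as a list of floats, non-overlapping 20 ms frames, and zero crossings counted as sign changes between consecutive samples; the function names are hypothetical. The conversion to a mean frequency relies on the fact that a sinusoid of frequency f crosses zero 2f times per second.

```python
import math


def frame_features(samples, sample_rate, frame_ms=20):
    """Return per-frame RMS values (A) and zero-crossing counts (ZC).

    Frames are non-overlapping and frame_ms long; a trailing partial
    frame is dropped.
    """
    n = int(sample_rate * frame_ms / 1000)  # samples per frame, e.g. 882 at 44.1 kHz
    rms, zc = [], []
    for start in range(0, len(samples) - n + 1, n):
        frame = samples[start:start + n]
        # A = sqrt((1/N) * sum of x^2) over the frame
        rms.append(math.sqrt(sum(x * x for x in frame) / n))
        # count sign changes between consecutive samples (0 counts as positive)
        zc.append(sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)))
    return rms, zc


def zc_to_mean_frequency(zc_count, frame_ms=20):
    """Convert a per-frame zero-crossing count to a mean frequency in Hz."""
    # a sinusoid of frequency f crosses zero 2*f times per second
    return zc_count / (2 * frame_ms / 1000)
```

For example, one second of a 1 kHz sine sampled at 44,100 Hz yields 50 frames, each with an RMS close to 1/√2 and a mean frequency close to 1 kHz.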



Segmentation (1/3)

  • Figure 7: RMS histogram in speech, approximated by a χ² distribution

  • Figure 8: RMS histogram in speech, approximated by a χ² distribution

Basic characteristics

RMS-based

The χ² distribution fits the RMS histograms well

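$$p(x) = \frac{x^{a}\, e^{-x/b}}{b^{a+1}\,\Gamma(a+1)}, \qquad a = \frac{m^2}{s^2} - 1, \qquad b = \frac{s^2}{m}$$

where m is the mean and s² the variance of the RMS values.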
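The moment relations can be turned into a small fitting routine. This is a sketch, assuming the density p(x) = x^a e^(-x/b) / (b^(a+1) Γ(a+1)) (a gamma density with shape a + 1 and scale b, of which the χ² family is a special case), a method-of-moments fit, and the population variance; `fit_chi2` and `chi2_pdf` are hypothetical names.

```python
import math


def fit_chi2(values):
    """Method-of-moments fit of p(x) = x^a e^(-x/b) / (b^(a+1) Gamma(a+1)).

    This is a gamma density with shape a + 1 and scale b, so matching the
    sample mean m and variance s2 gives a = m^2 / s2 - 1 and b = s2 / m.
    Assumes positive values that are not all equal (s2 > 0).
    """
    m = sum(values) / len(values)
    s2 = sum((v - m) ** 2 for v in values) / len(values)
    return m * m / s2 - 1, s2 / m


def chi2_pdf(x, a, b):
    """Evaluate the fitted density at x; the support is x > 0."""
    if x <= 0:
        return 0.0
    # work in log-space so that b^(a+1) and Gamma(a+1) cannot overflow
    log_p = a * math.log(x) - x / b - (a + 1) * math.log(b) - math.lgamma(a + 1)
    return math.exp(log_p)
```

For the RMS values [1, 2, 3, 4, 5], m = 3 and s² = 2, so the fit gives a = 3.5 and b = 2/3; the fitted density then has mean (a + 1)·b = 3 and variance (a + 1)·b² = 2, as intended.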

Two-stage algorithm

  • Stage 1

    • 1 s accuracy (low computational cost)

  • Stage 2

    • 20 ms accuracy (high computational cost)


Segmentation (2/3)

[Diagram: consecutive 1 s frames i-1, i, i+1 and i+2, labeled LOW and HIGH]

  • Stage 1

    • Partitioning into 1 s frames (50 RMS values per frame)

    • A change in frame i requires that frames i-1 and i+1 differ

    • Computation of the frame distance D (Matusita distance) from the frame similarity p

    • Frame i is a candidate for Stage 2 (i.e. it contains a change)

      • if D(i) > threshold and D(i) is a local maximum

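$$p(p_1, p_2) = \int_0^{\infty} \sqrt{p_1(x)\, p_2(x)}\, dx, \qquad D(i) = \sqrt{2 - 2\, p(p_1, p_2)}$$

where p1 and p2 are the RMS distributions of frames i-1 and i+1.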

[Figure: RMS versus time, divided into 1 s frames, with the frame distance D; the change lies in frame i]
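Stage 1 can be sketched as follows, under several assumptions: the RMS values of each 1 s frame are summarized by a moment-fitted gamma (χ²-type) density with shape k = a + 1 and scale b, the similarity p is the integral of √(p1·p2) evaluated in closed form for two gamma densities, and D = √(2 − 2p) is the Matusita distance. The threshold value and all function names are hypothetical.

```python
import math


def fit_gamma(values):
    """Moment fit: returns (shape k, scale b) with k*b = mean, k*b^2 = variance."""
    m = sum(values) / len(values)
    s2 = sum((v - m) ** 2 for v in values) / len(values)
    return m * m / s2, s2 / m


def log_similarity(g1, g2):
    """log of p = integral of sqrt(p1 * p2) for two gamma densities (closed form)."""
    (k1, b1), (k2, b2) = g1, g2
    k = (k1 + k2) / 2
    return (math.lgamma(k) + k * math.log(2 / (1 / b1 + 1 / b2))
            - 0.5 * (math.lgamma(k1) + math.lgamma(k2)
                     + k1 * math.log(b1) + k2 * math.log(b2)))


def matusita(g1, g2):
    """Matusita distance D = sqrt(2 - 2p), where p is the similarity."""
    p = math.exp(log_similarity(g1, g2))
    return math.sqrt(max(0.0, 2 - 2 * p))  # clamp tiny negative rounding errors


def stage1_candidates(rms, threshold, frame_len=50):
    """Return (candidate frame indices, D values) for 1 s frames of frame_len RMS values."""
    frames = [rms[j:j + frame_len] for j in range(0, len(rms) - frame_len + 1, frame_len)]
    fits = [fit_gamma(f) for f in frames]
    d = [0.0] * len(frames)
    for i in range(1, len(frames) - 1):
        # a change in frame i shows up as a difference between frames i-1 and i+1
        d[i] = matusita(fits[i - 1], fits[i + 1])
    candidates = [i for i in range(1, len(frames) - 1)
                  if d[i] > threshold and d[i] >= d[i - 1] and d[i] >= d[i + 1]]
    return candidates, d
```

With 250 RMS values around 0.1 followed by 250 values around 0.5, D(i) approaches its maximum √2 only for frames 4 and 5, which straddle the change, so only those frames can become candidates.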



Segmentation (3/3)

  • Stage 2

    • 20 ms accuracy

    • For each candidate frame i from Stage 1:

      • 1. Move two successive 1 s frames over the region before and after frame i

      • 2. Find the time instant at which the two successive frames have the maximum Matusita distance between their RMS distributions

    • Possible oversegmentation

    • Figure 10: The RMS data and the distance D

    • Figure 11: The segmentation result and the RMS data
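One possible reading of Stage 2 is sketched below. It assumes that moving the two successive frames means sliding a pair of adjacent 1 s windows (50 RMS values each) in 20 ms steps, so that their common boundary visits every position inside candidate frame i, and keeping the boundary with the largest Matusita distance. Because D = √(2 − 2p) decreases monotonically with the similarity p, the sketch minimizes log p instead, which ranks positions identically but does not saturate at √2 in floating point. The gamma moment fit and all names are assumptions.

```python
import math


def fit_gamma(values):
    """Moment fit: returns (shape k, scale b) with k*b = mean, k*b^2 = variance."""
    m = sum(values) / len(values)
    s2 = sum((v - m) ** 2 for v in values) / len(values)
    return m * m / s2, s2 / m


def log_similarity(g1, g2):
    """log of the similarity p = integral of sqrt(p1 * p2) for two gamma densities."""
    (k1, b1), (k2, b2) = g1, g2
    k = (k1 + k2) / 2
    return (math.lgamma(k) + k * math.log(2 / (1 / b1 + 1 / b2))
            - 0.5 * (math.lgamma(k1) + math.lgamma(k2)
                     + k1 * math.log(b1) + k2 * math.log(b2)))


def refine_change(rms, i, frame_len=50):
    """Return the RMS index (in 20 ms steps) of the change inside candidate frame i.

    The boundary t between windows rms[t - frame_len:t] and rms[t:t + frame_len]
    is moved over frame i; the t with the smallest similarity, i.e. the largest
    Matusita distance, is returned.
    """
    lo = max(frame_len, i * frame_len)
    hi = min(len(rms) - frame_len, (i + 1) * frame_len)
    best_t, best_log_p = lo, math.inf
    for t in range(lo, hi + 1):
        log_p = log_similarity(fit_gamma(rms[t - frame_len:t]),
                               fit_gamma(rms[t:t + frame_len]))
        if log_p < best_log_p:
            best_t, best_log_p = t, log_p
    return best_t
```

Multiplying the returned index by 0.02 gives the change time in seconds; a change placed at RMS index 237 inside frame 4, for instance, is located at 4.74 s.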


Classification (1/4)

  • Main goal

    • Classify each segment into one of the following classes:

      • Music

      • Speech

      • Silence

  • Main algorithm

    • Hypothesis: the segmentation yields homogeneous segments

    • Input: the basic characteristics RMS and ZC

    • Computation of the actual features of each segment

    • Classification based on the actual feature values



Classification (2/4)

Actual feature specification

  • Normalized RMS variance, $\sigma_A^2$

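  • $\sigma_A^2 = \dfrac{\frac{1}{N}\sum_{i=1}^{N}(A_i - \bar{A})^2}{\bar{A}^2}$, where $A_i$ is the RMS value of the i-th 20 ms frame of the segment and $\bar{A}$ is their mean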

    • Usually (86% of cases): $\sigma_A^2$(music) < $\sigma_A^2$(speech)

  • The probability of a null zero-crossing count, ZC0

    • Music: always ZC0 = 0

    • Speech: in 40% of cases ZC0 > 0

  • Maximal mean frequency, max(ZC)

    • Speech: almost always max(ZC) < 2.4 kHz

    • Music: in 2% of cases max(ZC) > 2.4 kHz
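The three features on this slide can be computed per segment as sketched below. This is an illustration under stated assumptions: the normalized RMS variance divides the variance of the frame RMS values by their squared mean, ZC0 is estimated as the fraction of 20 ms frames with no zero crossing, and max(ZC) is converted to Hz using the 2f crossings per second of a sinusoid. The function name is hypothetical.

```python
def segment_features(rms, zc, frame_ms=20):
    """Return (sigma2_a, zc0, max_zc_hz) for one segment.

    rms: per-frame RMS values A_i; zc: per-frame zero-crossing counts.
    """
    n = len(rms)
    mean = sum(rms) / n
    var = sum((a - mean) ** 2 for a in rms) / n
    sigma2_a = var / (mean * mean)                  # normalized RMS variance
    zc0 = sum(1 for z in zc if z == 0) / len(zc)    # share of frames with no crossing
    max_zc_hz = max(zc) / (2 * frame_ms / 1000)     # maximal mean frequency in Hz
    return sigma2_a, zc0, max_zc_hz
```

For rms = [1, 3, 1, 3] and zc = [0, 40, 10, 0], the sketch gives $\sigma_A^2$ = 0.25, ZC0 = 0.5 and max(ZC) = 1000 Hz (up to floating-point rounding).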




Classification (3/4)

Actual feature specification

  • Joint RMS/ZC measure, Cz

    • Speech: high correlation between RMS and ZC; many void intervals (low RMS and low ZC)

    • Music: RMS and ZC are essentially independent

  • Void interval frequency, Fu

    • Void interval detection (per 20 ms frame):

      • (RMS < T1) && (RMS < 0.1·max(RMS(i))) && (RMS < T2) || (ZC = 0)

    • Grouping of neighboring void intervals

    • Fu: frequency of the grouped void intervals

    • Speech: always Fu > 0.6

    • Music: Fu < 0.6 in at least 65% of cases
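Void-interval detection and the Fu feature might be sketched as follows. Assumptions: the detection rule is read with the usual C precedence (&& binds tighter than ||), max(RMS(i)) is taken over the frames of the current segment, T1 and T2 are caller-supplied thresholds (no values are given on the slide), and Fu is the number of grouped void intervals per second of segment. Function names are hypothetical.

```python
def void_flags(rms, zc, t1, t2):
    """Mark each 20 ms frame as void (True) or not.

    Rule read with && before ||:
    (RMS < T1 && RMS < 0.1 * max(RMS) && RMS < T2) || ZC == 0
    """
    peak = max(rms)
    return [(a < t1 and a < 0.1 * peak and a < t2) or z == 0
            for a, z in zip(rms, zc)]


def void_interval_frequency(flags, frame_ms=20):
    """Fu: number of runs of consecutive void frames per second of segment."""
    # each run of neighboring void frames is grouped into one void interval
    runs = sum(1 for k, f in enumerate(flags) if f and (k == 0 or not flags[k - 1]))
    duration_s = len(flags) * frame_ms / 1000
    return runs / duration_s
```

A 50-frame (1 s) segment containing two separate runs of quiet frames yields Fu = 2.0, above the 0.6 level reported for speech.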

    iA

    EUSIPCO 2002, Toulouse France

    10


Classification (4/4)

Silence segment recognition

A segment is silence if E < threshold

  • Decision-making algorithm

[Flowchart: silence segment check → silence; otherwise actual features check → speech or music]
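The slides give the order of the decision (silence check first, then the actual features) but not its exact rules. The sketch below is a hypothetical decision function that only uses the tendencies reported on the previous slides: ZC0 > 0 never occurs in music, max(ZC) above 2.4 kHz almost never occurs in speech, Fu ≤ 0.6 never occurs in speech, and $\sigma_A^2$ is usually lower for music. The order of the checks, both threshold parameters and the function name are assumptions, and Cz is left out because its definition is not given here.

```python
def classify_segment(energy, sigma2_a, zc0, max_zc_hz, fu,
                     silence_threshold, sigma2_threshold):
    """Hypothetical rule-based labeling of one segment as silence, speech or music."""
    # 1. Silence check: the segment is silence if its energy E is below a threshold
    if energy < silence_threshold:
        return "silence"
    # 2. Actual-feature checks, strongest cues first (assumed order)
    if zc0 > 0:              # music always has ZC0 = 0
        return "speech"
    if max_zc_hz > 2400:     # speech almost never exceeds 2.4 kHz
        return "music"
    if fu <= 0.6:            # speech always has Fu > 0.6
        return "music"
    # 3. Fall back on the normalized RMS variance, usually lower for music
    return "music" if sigma2_a < sigma2_threshold else "speech"
```

The hard rules come first because each of them is reported as (almost) never violated by one class, while the $\sigma_A^2$ comparison holds only in about 86% of cases.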


Results

  • Data

    • 11.328 s speech

    • 3.131 s music

  • Data source

    • 70% audio CDs

    • 15% WWW

    • 15% recordings

  • Segmentation performance

    • 97% detection probability

    • Change accuracy ~0.2 s

  • Actual features performance

[Chart: accuracy per feature set: ZC0; Cz; $\sigma_A^2$; $\sigma_A^2$, ZC0; $\sigma_A^2$, Cz; Cz, $\sigma_A^2$; ZC0, $\sigma_A^2$; Fu, $\sigma_A^2$; all features]


Conclusion

  • Complexity

    • Minimal complexity: O(N)

    • Low computational cost

  • Summary

    • Real-time segmentation and classification into three classes

    • The energy distribution (RMS) suffices for segmentation

    • RMS and ZC suffice for classification

    • Goal: minimum cost and high performance

  • Future extensions

    • Content-based indexing and retrieval of audio signals

    • Pre-processing stage for speech recognition


Segmentation and Classification Demo


Sound Player Demo

