Age and Gender Classification using Modulation Cepstrum

Speaker Odyssey 2008 - PowerPoint PPT Presentation


Age and Gender Classification using Modulation Cepstrum. Jitendra Ajmera (presented by Christian Müller). Speaker Odyssey 2008. Previous work: characteristic acoustic features. Jitter and shimmer (C. Müller et al.). Phonetic cues (S. Schoetz). Cepstral coefficients.

Presentation Transcript

Age and Gender Classification using Modulation Cepstrum
Jitendra Ajmera (presented by Christian Müller)

Speaker Odyssey 2008

Previous work: Characteristic acoustic features

  • Jitter and Shimmer (C. Müller et al.)

  • Phonetic cues (S. Schoetz)

  • Cepstral coefficients

Motivation and intuition behind this work

Features such as cepstral coefficients characterize the exact content of the signal. Much of this information is not useful for age/gender classification; e.g., we can identify age and gender from speech in a foreign language that we do not understand.

Therefore, features which characterize the slowly varying temporal envelope should be more advantageous.

Mel Cepstrum Modulation Spectrum features (V. Tyagi et al.)

[Figure: block diagram of the cepstrum computation over a context window of P frames]

n: time instant

k: cepstral coefficient index

q: Modulation frequency index

P: Context Window Width (11 frames)
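The slides define MCMS features over cepstral trajectories, but the equation itself did not survive extraction. As a minimal sketch, assuming each MCMS value is a cosine (DCT-II-style) projection of the trajectory of cepstral coefficient k over a P-frame window centred on frame n, the computation could look like the following. The function name `mcms`, the cosine basis, the edge padding and the default indices q = 1, 2, 3 are assumptions for illustration and may differ from the exact definition of Tyagi et al.

```python
import numpy as np


def mcms(cepstra: np.ndarray, P: int = 11, q_indices=(1, 2, 3)) -> np.ndarray:
    """Hypothetical MCMS sketch (assumed cosine basis, not the published formula).

    cepstra: shape (N, K), one row of K cepstral coefficients per frame n.
    Returns shape (N, K, Q): one value per time instant n, cepstral index k
    and modulation frequency index q.
    """
    if P % 2 == 0:
        raise ValueError("P must be odd so the window can be centred on frame n")
    half = P // 2
    # Repeat the first/last frame so every frame has a full P-frame context.
    padded = np.pad(cepstra, ((half, half), (0, 0)), mode="edge")
    # windows[n, k, p] = cepstrum k at frame n - half + p  -> shape (N, K, P)
    windows = np.lib.stride_tricks.sliding_window_view(padded, P, axis=0)
    p = np.arange(P)
    # One cosine basis vector per modulation frequency index q -> shape (Q, P)
    basis = np.array([np.cos(np.pi * q * (2 * p + 1) / (2 * P)) for q in q_indices])
    # Project every cepstral trajectory onto every basis vector.
    return np.einsum("nkp,qp->nkq", windows, basis)


if __name__ == "__main__":
    frames = np.random.randn(300, 7)  # hypothetical: 300 frames, 7 cepstra
    feats = mcms(frames).reshape(len(frames), -1)
    print(feats.shape)  # (300, 21)
```

Under these assumptions, 7 cepstral coefficients times 3 modulation indices gives 21 values per frame, which would match the 21-dimensional feature vectors mentioned later; the slides do not say how those 21 dimensions were actually composed.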


Experimental Setup: Task

  • 7 Target classes:

    • Children (<= 13 years)

    • Young Male (>13, <=20 years)

    • Young Female (>13, <=20 years)

    • Adult Male (>20, <=65 years)

    • Adult Female (>20, <=65 years)

    • Senior Male (> 65 years)

    • Senior Female (> 65 years)
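The seven target classes combine an age band with gender, except for children, who are not split by gender. A minimal sketch of this labelling rule, assuming integer ages in years and gender given as the hypothetical strings "male"/"female" (the function name `target_class` is also illustrative):

```python
def target_class(age: int, gender: str) -> str:
    """Map a speaker's age (in years) and gender to one of the 7 target classes."""
    if age <= 13:
        return "Children"  # children are not split by gender
    if gender not in ("male", "female"):
        raise ValueError(f"unknown gender label: {gender!r}")
    if age <= 20:
        band = "Young"
    elif age <= 65:
        band = "Adult"
    else:
        band = "Senior"
    return f"{band} {gender.capitalize()}"


if __name__ == "__main__":
    for age in (13, 14, 20, 21, 65, 66):
        print(age, target_class(age, "female"))
```

The boundary ages follow the class definitions above: 13 is still a child, 20 is still young, and 65 is still an adult.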

Experimental Setup: Dataset

German SpeechDat Corpus

  • 4000 native German speakers

  • 80 speakers of each class were used for training, ~44 utterances each.

  • 20 speakers of each class were used for testing.

  • Data from a different domain (VoiceClass, 660 utterances) was also used for testing.

  • Total ~6000 utterances used for testing.

A human-labelling experiment on a subset of the test data yielded ~55% overall classification accuracy.


Both systems are based on a GMM (Gaussian Mixture Model) acoustic model and a maximum-likelihood classifier.

Both systems use feature vectors of equal dimension (21) and hence have the same number of parameters.
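A maximum-likelihood classifier of this kind scores an utterance against one GMM per class and picks the class with the highest total log-likelihood. The sketch below covers only scoring and parameter counting. It assumes diagonal covariances, frames treated as independent, and already-trained models (training would normally use EM, which is omitted). The function names and model layout are hypothetical; the slides do not state the number of mixture components or the covariance type.

```python
import numpy as np


def gmm_log_likelihood(X, weights, means, variances):
    """Total log-likelihood of frames X (T, D) under a diagonal-covariance GMM.

    weights: (M,), means: (M, D), variances: (M, D).
    Frames are assumed independent, so per-frame log-likelihoods are summed.
    """
    diff = X[:, None, :] - means[None, :, :]                          # (T, M, D)
    log_norm = -0.5 * np.log(2 * np.pi * variances).sum(axis=1)       # (M,)
    mahal = -0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)    # (T, M)
    log_comp = np.log(weights)[None, :] + log_norm[None, :] + mahal   # (T, M)
    # Numerically stable log-sum-exp over the M components.
    peak = log_comp.max(axis=1, keepdims=True)
    per_frame = peak[:, 0] + np.log(np.exp(log_comp - peak).sum(axis=1))
    return float(per_frame.sum())


def classify(X, models):
    """Return the label whose GMM assigns X the highest log-likelihood."""
    scores = {label: gmm_log_likelihood(X, *params) for label, params in models.items()}
    return max(scores, key=scores.get)


def num_free_parameters(M, D):
    """(M - 1) free mixture weights + M*D means + M*D variances."""
    return (M - 1) + 2 * M * D
```

With diagonal covariances, a model with M components over D-dimensional features has (M − 1) + 2MD free parameters, so two systems with D = 21 and the same M are the same size regardless of which features they use.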


Performance of MCMS features as a function of duration and of in-domain/out-of-domain data.

Classification accuracy saturates at 3 modulation frequencies (3-14 Hz) and starts dropping after 4 modulation frequencies. This also explains why MFCC features, which are not restricted to slowly varying modulations, perform worse than MCMS features.

Modulation frequency response of the first 3 MCMS filters.

These 3 filters provide complementary information. For speech recognition, 7 filters (3-22 Hz) provide the best performance.
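If each MCMS index q is modelled as a P-tap FIR filter with cosine taps (a hypothetical assumption, not confirmed by the slides), its modulation frequency response can be computed directly from the taps. The sketch also assumes a 10 ms frame shift, i.e. a 100 Hz frame rate, which the slides do not state; the function name is illustrative, and the resulting curves are not the plotted responses from the presentation.

```python
import numpy as np


def mcms_filter_response(q, P=11, frame_rate_hz=100.0, n_points=512):
    """Magnitude response |H(f)| of a hypothetical cosine MCMS filter for index q.

    Returns (freqs_hz, magnitude) sampled from 0 Hz up to the Nyquist
    frequency of the frame sequence (frame_rate_hz / 2).
    """
    p = np.arange(P)
    taps = np.cos(np.pi * q * (2 * p + 1) / (2 * P))
    freqs = np.linspace(0.0, frame_rate_hz / 2, n_points)
    # H(f) = sum_p taps[p] * exp(-j * 2*pi * f * p / frame_rate)
    kernel = np.exp(-2j * np.pi * np.outer(freqs, p) / frame_rate_hz)
    return freqs, np.abs(kernel @ taps)


if __name__ == "__main__":
    for q in (1, 2, 3):
        freqs, mag = mcms_filter_response(q)
        print(f"q={q}: peak near {freqs[mag.argmax()]:.1f} Hz")
```

Under these assumptions every filter with q ≥ 1 rejects 0 Hz (the mean of the cepstral trajectory), and its peak moves to higher modulation frequencies as q grows, so the filters behave as a band-pass bank over the cepstral trajectories.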


Thank You.