Merging segmental and rhythmic features for automatic language identification
This presentation is the property of its rightful owner.
Sponsored Links
1 / 1

MERGING SEGMENTAL AND RHYTHMIC FEATURES FOR AUTOMATIC LANGUAGE IDENTIFICATION PowerPoint PPT Presentation


  • 77 Views
  • Uploaded on
  • Presentation posted in: General

8. Frequency (kHz). 4. False rejection (%). False Alarm (%). 0. el. a. m. E. . E. t. . e. b. . n. Amplitude. 0. 0.2. 0.4. 0.6. 0.8. 1.0. Time (s). Vowel. Pause. Non Vowel. Rhythm Modeling. Vowel System Modeling. Vowel System Modeling. Vowel System Models.

Download Presentation

MERGING SEGMENTAL AND RHYTHMIC FEATURES FOR AUTOMATIC LANGUAGE IDENTIFICATION

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Merging segmental and rhythmic features for automatic language identification

8

Frequency (kHz)

4

False rejection (%)

False Alarm (%)

0

el

a

m

E

E

t

e

b



n

Amplitude

0

0.2

0.4

0.6

0.8

1.0

Time (s)

Vowel

Pause

Non Vowel

Rhythm Modeling

Vowel System Modeling

Vowel System Modeling

Vowel System Models

Mean Identification Rate: 79%

Discussion

Jérôme FARINAS1, François PELLEGRINO2, Jean-Luc ROUAS1 and Régine ANDRÉ-OBRECHT1

{farinas, rouas, [email protected]; [email protected]

MERGING SEGMENTAL AND RHYTHMIC FEATURES FOR AUTOMATIC LANGUAGE IDENTIFICATION

1Institut de Recherche en Informatique de Toulouse

UMR 5505 CNRS - Université Paul Sabatier - INP

31062 Toulouse Cedex 4 - France

2Laboratoire Dynamique du Langage

UMR 5596 CNRS - Université Lumière Lyon 2

69363 Lyon Cedex 7 - France

Vowel / Non Vowel Segmentation

  • Speech segmentation: statistical segmentation (André-Obrecht, 1988)

    • Shorts segments (bursts and transient parts of sounds)

    • Longer segments (steady parts of sounds)

  • Speech Activity Detection and Vowel detection

    • Spectral analysis of the signal

  • Vowel detection (Pellegrino & Obrecht, 2000)

    • Language and speaker independent algorithm

Vowel / Non Vowel Segmentation

signal

The speech signal is parsed in patterns matching the structure: Cn V (n integer, can be 0).

(For the above example: CCVV.CCV.CV.CCCV.CV)

Pseudo-syllable Segmentation

Acoustic Modeling

Each vowel segment is represented with a set of 8 Mel-Frequency Cepstral Coefficients and 8 delta-MFCC, augmented with the Energy and delta Energy of the segment. This parameter vector is extended with the duration of the underlying segment.

Example for a

.CCV. syllable:

  • 3 parameters are computed:

  • Global consonant cluster duration

  • Global vowel duration

  • Complexity of the consonantal cluster

  • With the same .CCV. example:

Pseudo-syllable Modeling

Rhythm Models

Vowel System Likelihoods

Rhythm Likelihoods

For each language, a Gaussian Mixture Model (GMM) is trained using the EM algorithm. The number of components of the model is computed using the LBG-Rissanen algorithm. During the test, the decision lays on a Maximum Likelihood procedure.

Merging

A simple statistical merging is performed by adding the log-likelihoods of both the Rhythm model and the VSM for each language.

Decision Rule

L*

Rhythm Modeling

Merging

Mean Identification Rate: 83%

Mean Identification Rate: 70%

We propose two algorithms dedicated to Automatic Language Identification. Experiments, performed with cross-validation, show that it is possible to achieve an efficient rhythmic modeling (78% of correct identification) in a way that requires no a priori knowledge of the rhythmic structure of the processed languages. Besides, the Vowel System Model reaches 70% of correct identification. With these read data, merging the two approaches improves the identification rate up to 83%.


  • Login