MERGING SEGMENTAL AND RHYTHMIC FEATURES FOR AUTOMATIC LANGUAGE IDENTIFICATION

[Figures: signal plots with axes Frequency (kHz), Amplitude, and Time (s), with segments labeled Vowel / Pause / Non Vowel; error plots with axes False rejection (%) and False Alarm (%); panels titled Rhythm Modeling, Vowel System Modeling, and Vowel System Models.]

Rhythm Modeling: Mean Identification Rate: 79%

Jérôme FARINAS (1), François PELLEGRINO (2), Jean-Luc ROUAS (1) and Régine ANDRÉ-OBRECHT (1)

{farinas, rouas, obrecht}@irit.fr; pellegrino@univ-lyon2.fr

(1) Institut de Recherche en Informatique de Toulouse
UMR 5505 CNRS - Université Paul Sabatier - INP
31062 Toulouse Cedex 4 - France

(2) Laboratoire Dynamique du Langage
UMR 5596 CNRS - Université Lumière Lyon 2
69363 Lyon Cedex 7 - France

Vowel / Non Vowel Segmentation

  • Speech segmentation: statistical segmentation (André-Obrecht, 1988)

    • Short segments (bursts and transient parts of sounds)

    • Longer segments (steady parts of sounds)

  • Speech Activity Detection and Vowel detection

    • Spectral analysis of the signal

  • Vowel detection (Pellegrino & Obrecht, 2000)

    • Language and speaker independent algorithm

Pseudo-syllable Segmentation

The speech signal is parsed into patterns matching the structure C^n V, where n is a non-negative integer (n may be 0).

(For the above example: CCVV.CCV.CV.CCCV.CV)
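The parsing into C^n V patterns can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the segmentation has already been reduced to a string of `C` and `V` labels. It also assumes that consecutive vowel segments stay in the same pseudo-syllable (inferred from the `CCVV` group in the example) and that trailing consonants form a final incomplete group. The function name `parse_pseudo_syllables` is hypothetical.

```python
from typing import List


def parse_pseudo_syllables(labels: str) -> List[str]:
    """Group a string of 'C'/'V' segment labels into pseudo-syllables.

    A group is closed after the last vowel of a vowel run, so every complete
    group is zero or more consonants followed by at least one vowel.
    """
    groups: List[str] = []
    current = ""
    for i, label in enumerate(labels):
        if label not in ("C", "V"):
            raise ValueError(f"unexpected label: {label!r}")
        current += label
        next_label = labels[i + 1] if i + 1 < len(labels) else None
        if label == "V" and next_label != "V":
            groups.append(current)
            current = ""
    if current:
        # Assumption: leftover consonants are kept as an incomplete group.
        groups.append(current)
    return groups


print(".".join(parse_pseudo_syllables("CCVVCCVCVCCCVCV")))
# prints "CCVV.CCV.CV.CCCV.CV"
```

The printed result matches the pattern sequence quoted for the example signal.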

Acoustic Modeling

Each vowel segment is represented with a set of 8 Mel-Frequency Cepstral Coefficients and 8 delta-MFCC, augmented with the Energy and delta Energy of the segment. This parameter vector is extended with the duration of the underlying segment.
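As a quick check on the dimensionality, the vector described above has 8 + 8 + 2 + 1 = 19 components. The sketch below only assembles such a vector from values computed elsewhere. The function name `vowel_segment_vector`, the component ordering, and the numbers in the example are assumptions for illustration.

```python
from typing import List, Sequence


def vowel_segment_vector(
    mfcc: Sequence[float],
    delta_mfcc: Sequence[float],
    energy: float,
    delta_energy: float,
    duration_s: float,
) -> List[float]:
    """Concatenate the per-segment parameters into one 19-dimensional vector."""
    if len(mfcc) != 8 or len(delta_mfcc) != 8:
        raise ValueError("expected 8 MFCC and 8 delta-MFCC values")
    return [*mfcc, *delta_mfcc, energy, delta_energy, duration_s]


# Hypothetical values, only to show the layout of the vector.
vector = vowel_segment_vector([0.0] * 8, [0.0] * 8, 1.5, -0.1, 0.09)
print(len(vector))  # prints 19
```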

Pseudo-syllable Modeling

For each pseudo-syllable, 3 parameters are computed:

  • Global consonant cluster duration

  • Global vowel duration

  • Complexity of the consonantal cluster

[Figure: example for a .CCV. syllable]
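The three pseudo-syllable parameters can be sketched as below for a .CCV. pattern. The transcript does not define "complexity"; this sketch assumes it is the number of consonantal segments in the cluster. The function name `pseudo_syllable_parameters`, the segment representation, and the durations are all hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[str, float]  # (label "C" or "V", duration in seconds)


def pseudo_syllable_parameters(segments: List[Segment]) -> Tuple[float, float, int]:
    """Return (consonant cluster duration, vowel duration, cluster complexity)."""
    consonant_duration = sum(d for label, d in segments if label == "C")
    vowel_duration = sum(d for label, d in segments if label == "V")
    # Assumption: complexity = number of consonantal segments in the cluster.
    complexity = sum(1 for label, _ in segments if label == "C")
    return consonant_duration, vowel_duration, complexity


# Hypothetical durations for one .CCV. pseudo-syllable.
ccv = [("C", 0.04), ("C", 0.07), ("V", 0.11)]
dc, dv, nc = pseudo_syllable_parameters(ccv)
print(f"Dc={dc:.2f}s Dv={dv:.2f}s Nc={nc}")
```

Each pseudo-syllable is thus reduced to a 3-dimensional vector, whatever the number of segments it contains.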

[Diagram labels: Rhythm Models, Vowel System Likelihoods, Rhythm Likelihoods]

For each language, a Gaussian Mixture Model (GMM) is trained using the EM algorithm. The number of components of each model is determined using the LBG-Rissanen algorithm. At test time, the decision relies on a Maximum Likelihood procedure.
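The Maximum Likelihood decision can be illustrated with a short sketch that scores a test utterance against already-trained GMMs. It does not cover EM training or the LBG-Rissanen model-size selection. Diagonal covariances, independence between feature vectors, the function names, and the toy models `lang_a` and `lang_b` are assumptions made for the example.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One mixture component: (weight, means, variances). Diagonal covariance is assumed.
Component = Tuple[float, Sequence[float], Sequence[float]]


def log_gaussian_diag(x: Sequence[float], means: Sequence[float],
                      variances: Sequence[float]) -> float:
    """Log-density of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * var) + (xi - mu) ** 2 / var)
        for xi, mu, var in zip(x, means, variances)
    )


def gmm_log_likelihood(vectors: Sequence[Sequence[float]],
                       gmm: List[Component]) -> float:
    """Sum of per-vector GMM log-likelihoods (vectors treated as independent)."""
    total = 0.0
    for x in vectors:
        terms = [math.log(w) + log_gaussian_diag(x, mu, var) for w, mu, var in gmm]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def identify(vectors: Sequence[Sequence[float]],
             models: Dict[str, List[Component]]) -> str:
    """Return the language whose GMM gives the highest log-likelihood."""
    return max(models, key=lambda lang: gmm_log_likelihood(vectors, models[lang]))


# Toy one-dimensional models with hypothetical parameters.
models = {
    "lang_a": [(1.0, [0.0], [1.0])],
    "lang_b": [(0.5, [3.0], [1.0]), (0.5, [5.0], [1.0])],
}
print(identify([[0.1], [-0.2]], models))  # prints "lang_a"
```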

Merging

A simple statistical merging is performed by adding the log-likelihoods of both the Rhythm model and the VSM for each language.

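Decision Rule

The identified language L* is the one that maximizes the merged log-likelihood of the test utterance O:

$$L^{*} = \arg\max_{L} \left[ \log \Pr(O \mid \mathrm{Rhythm}_{L}) + \log \Pr(O \mid \mathrm{VSM}_{L}) \right]$$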
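The merging step described above amounts to a per-language sum followed by an argmax. A minimal sketch is given below. The function name `merge_and_decide`, the language labels, and the scores are hypothetical, and both models are assumed to have scored the same set of languages.

```python
from typing import Dict


def merge_and_decide(rhythm_ll: Dict[str, float], vsm_ll: Dict[str, float]) -> str:
    """Add the two log-likelihoods per language and return the best language."""
    if rhythm_ll.keys() != vsm_ll.keys():
        raise ValueError("both models must score the same set of languages")
    merged = {lang: rhythm_ll[lang] + vsm_ll[lang] for lang in rhythm_ll}
    return max(merged, key=merged.get)


# Hypothetical scores: the rhythm model alone prefers lang_b,
# but the merged score prefers lang_a.
rhythm = {"lang_a": -120.0, "lang_b": -118.0}
vsm = {"lang_a": -300.0, "lang_b": -310.0}
print(merge_and_decide(rhythm, vsm))  # prints "lang_a"
```

Adding log-likelihoods weights the two models equally and treats them as independent sources of evidence, which is the simplest form of statistical merging.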

Merging: Mean Identification Rate: 83%

Vowel System Modeling: Mean Identification Rate: 70%

Discussion

We propose two algorithms dedicated to Automatic Language Identification. Cross-validated experiments show that efficient rhythm modeling is achievable (78% correct identification) without any a priori knowledge of the rhythmic structure of the processed languages. The Vowel System Model reaches 70% correct identification. On this read-speech data, merging the two approaches raises the identification rate to 83%.

