Automatic Detection-based Phone Recognition on TIMIT

Based on Chen and Wang in ISCSLP’08 and Interspeech’09

Hung-Shin Lee (李鴻欣)

12 July, 2011 @ IIS, Academia Sinica


Detection-Based ASR

[Figure: human speech recognition (Human SR) compared with detection-based ASR (DB ASR)]

  • Human SR: Knowledge, Detection, Integration, Knowledge (Higher Level)

  • DB ASR: Detectors, Integrator, Results

    • Detectors: Phonological attr., Prosodic attr., Acoustic attr.

    • Integrator: HMM, CRF

    • Results: Phone, Syllable, Word, Sentence, Semantic info


Phonological Systems


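Phonological Feature Detection (1)

[Figure: MLP detectors for the SPE_14 and GP_11 feature systems; the figure also carries the labels "time-delay" and "recurrent"]

  • Input layer: 13 MFCCs per frame over a 9-frame window (frames i-4 to i+4)

  • MLP (Detectors): input layer and hidden layer

  • Output: posterior probability for each phonological feature, quantized to a binary value (0 or 1)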
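The input windowing and output quantization on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the edge handling (repeating the first or last frame), and the 0.5 threshold are assumptions, not details given in the presentation.

```python
def stack_context(frames, i, left=4, right=4):
    """Concatenate frames i-left .. i+right into one MLP input vector.

    Assumption: at utterance edges the first/last frame is repeated.
    """
    last = len(frames) - 1
    stacked = []
    for t in range(i - left, i + right + 1):
        t = min(max(t, 0), last)  # clamp the index to the utterance
        stacked.extend(frames[t])
    return stacked


def quantize(posteriors, threshold=0.5):
    """Turn detector posteriors into binary feature values.

    Assumption: a fixed 0.5 threshold; the slide only says "quantization".
    """
    return [1 if p >= threshold else 0 for p in posteriors]


# Toy data: 20 frames of 13 "MFCCs", each frame filled with its own index.
frames = [[float(t)] * 13 for t in range(20)]
x = stack_context(frames, 10)     # 9 frames x 13 MFCCs = 117 inputs
edge = stack_context(frames, 0)   # window clipped at the utterance start
bits = quantize([0.9, 0.2, 0.5, 0.49])
```

With 13 MFCCs and a 9-frame window, each detector input has 117 values.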


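Phonological Feature Detection (2)

[Figure: MLP detectors for the MV_29 feature system (time-delay)]

  • Input: 13 MFCCs per frame over a 9-frame window (frames i-4 to i+4)

  • 6 MV Features, with one MLP per feature, e.g. MLP (Centrality), MLP (Front-Back), MLP (Roundness)

  • Output: for each MV feature, a vector with one value set to 1 and the others set to 0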
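For multi-valued features, one plausible way to obtain the 0/1 vectors in the figure is to pick the most probable value of each feature and one-hot encode it. The sketch below assumes exactly that. The value inventories in `MV_GROUPS` are hypothetical placeholders, since the slide names the features but not their values.

```python
# Hypothetical value inventories; the slide does not list the actual values.
MV_GROUPS = {
    "centrality": ["central", "full", "nil"],
    "front-back": ["front", "back", "nil"],
    "roundness": ["round", "non-round", "nil"],
}


def decide(posteriors, values):
    """Pick the most probable value and return it with its one-hot vector."""
    best = max(range(len(values)), key=lambda k: posteriors[k])
    one_hot = [1 if k == best else 0 for k in range(len(values))]
    return values[best], one_hot


def mv_vector(outputs):
    """Concatenate the one-hot decisions of all MV detectors for one frame."""
    vec = []
    for name, values in MV_GROUPS.items():
        _, one_hot = decide(outputs[name], values)
        vec.extend(one_hot)
    return vec


# Made-up posteriors from three detectors for a single frame.
outputs = {
    "centrality": [0.1, 0.7, 0.2],
    "front-back": [0.6, 0.3, 0.1],
    "roundness": [0.2, 0.2, 0.6],
}
frame_vec = mv_vector(outputs)
```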


Conditional Random Field (CRF) Integrator

  • General Chain CRF

    • λj, μk: feature function weight parameters

    • State feature function and transition feature function

[Figure: chain CRF. Output (phone) Y: ..., yi-1, yi, ...; Input (phonological features) X: ..., xi-1, xi, xi+1, ...]
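For reference, a general chain CRF is commonly written as below. This is the standard textbook form, not an equation recovered from the slide. Assigning λ_j to the transition feature functions t_j and μ_k to the state feature functions s_k follows the usual convention and is an assumption here.

```latex
\[
p(Y \mid X) = \frac{1}{Z(X)} \exp\Bigl(
  \sum_{i} \sum_{j} \lambda_j \, t_j(y_{i-1}, y_i, X, i)
  + \sum_{i} \sum_{k} \mu_k \, s_k(y_i, X, i)
\Bigr)
\]
\[
Z(X) = \sum_{Y'} \exp\Bigl(
  \sum_{i} \sum_{j} \lambda_j \, t_j(y'_{i-1}, y'_i, X, i)
  + \sum_{i} \sum_{k} \mu_k \, s_k(y'_i, X, i)
\Bigr)
\]
```

Here Y is the phone sequence, X is the sequence of phonological feature vectors, and Z(X) normalizes over all candidate phone sequences Y'.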


CRF Integrator – Training Issues

  • Required Label for CRF Training

    • Phone: y

    • Phonological features: x

  • Oracle-data trained CRF (OT CRF)

    • Training Data: phone labels

    • Mapping (phones → phonological features) gives the phonological features

    • The OT CRF is trained on the phone labels and the mapped phonological features

  • Detected-data trained CRF (DT CRF)

    • Speech is passed through the MLP detectors, giving phonological features (with errors)

    • The DT CRF is trained on the phone labels and the detected phonological features
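The two kinds of training data can be illustrated with a short Python sketch that writes frames in the column layout CRF++ reads: one token per line, whitespace-separated columns, the label in the last column, and a blank line between sequences. The mapping table, feature names, and detector outputs below are made up for illustration and are not the SPE, GP, or MV tables. The CRF++ feature template file is not shown.

```python
# Hypothetical mapping table: phone -> binary phonological feature vector.
FEATURE_NAMES = ["vocalic", "nasal", "voiced"]  # illustrative names only
PHONE_TO_FEATURES = {
    "aa": (1, 0, 1),
    "n": (0, 1, 1),
    "s": (0, 0, 0),
}


def oracle_features(phone_labels):
    """Oracle data: derive error-free features from the phone labels."""
    return [PHONE_TO_FEATURES[p] for p in phone_labels]


def to_crfpp_lines(feature_rows, phone_labels):
    """Format one utterance: feature columns first, phone label last."""
    lines = []
    for feats, phone in zip(feature_rows, phone_labels):
        lines.append(" ".join([str(v) for v in feats] + [phone]))
    lines.append("")  # blank line terminates the sequence
    return lines


frame_phones = ["n", "aa", "aa", "s"]

# Oracle-data training (OT): features come from the mapping table.
ot_lines = to_crfpp_lines(oracle_features(frame_phones), frame_phones)

# Detected-data training (DT): features come from the detectors.
# These outputs are invented; the third frame contains a detection error.
detected_rows = [(0, 1, 1), (1, 0, 1), (1, 1, 1), (0, 0, 0)]
dt_lines = to_crfpp_lines(detected_rows, frame_phones)
```

The only difference between the two files is the source of the feature columns; the phone label column is identical.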


Experiments

  • Corpus: TIMIT

    • No SA1, SA2

    • Training set (3296 utts), Dev set (400 utts)

    • Test set (1344 utts)

  • Phone set: TIMIT61

    • Evaluation: CMU/MIT 39

  • Baseline

    • CI-HMM

  • Toolkits

    • Nico Toolkit (for MLP), CRF++ (for CRF)
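Evaluating on 39 classes means folding the 61 TIMIT labels before scoring. The sketch below uses the folding commonly attributed to Lee and Hon (1989). The slide does not list its table, so treating it as identical to this one is an assumption. The function name `fold_to_39` is hypothetical.

```python
# Commonly used TIMIT 61 -> 39 folding (assumed to match "CMU/MIT 39").
FOLD = {
    "ao": "aa", "ax": "ah", "ax-h": "ah", "axr": "er", "hv": "hh",
    "ix": "ih", "el": "l", "em": "m", "en": "n", "nx": "n",
    "eng": "ng", "zh": "sh", "ux": "uw",
    # closures and non-speech labels all become silence
    "pcl": "sil", "tcl": "sil", "kcl": "sil",
    "bcl": "sil", "dcl": "sil", "gcl": "sil",
    "h#": "sil", "pau": "sil", "epi": "sil",
}
DROP = {"q"}  # the glottal stop is removed before scoring


def fold_to_39(phones):
    """Map a TIMIT61 label sequence onto the 39-class evaluation set."""
    return [FOLD.get(p, p) for p in phones if p not in DROP]


example = fold_to_39(["h#", "n", "eh", "kcl", "k", "q", "ix", "n", "h#"])
```

This version keeps repeated `sil` labels as they are; some scoring setups merge them, which is a separate choice.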


Results (1)

  • Model: OT CRF; Test: OD Features

  • Model: OT/DT CRF; Test: DD Features


Results (2)

System Fusion


System Fusion with CRF

[Figure: chain CRF used for system fusion]

  • Output Y (..., yi-1, yi, ...): Combined Results (Phone)

  • Input X (..., xi-1, xi, xi+1, ...): Phone Sequence from the SPE Sys., MV Sys., GP Sys., and HMM Sys.


Two Types of AFDT Imperfection

[Figure: a phone sequence shown against two AF tracks, AF(A) and AF(A’)]

  • Phone: h# n eh ow kcl k w eh ae eh s tcl t ix n

  • AF asynchrony

  • AFDT errors


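CRF Training (1)

[Figure: phone and AF tracks over time t for two training schemes]

  • Oracle Data Training: Phone y, with AFs x derived from the phone labels through the Mapping Table

  • Detected Data Training: Phone y, with AFs x produced by AFDT (Detected Errors)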


CRF Training (2)

[Figure: Aligned Data Training; labels in the figure: AF Sequence, Phone y, time axis t, AFDT, AFs x]


Results (3)

  • Introducing AF asynchrony causes a 27.97% drop in accuracy

  • Detection errors cause a further 7.99% drop in accuracy
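Phone accuracy figures such as these are normally computed from an alignment between the reference and recognized phone sequences, as Acc = (N − S − D − I) / N. The sketch below is a minimal version that uses plain edit distance with unit costs. Scoring tools may align with different penalties, so their counts can differ slightly. The example sequences are invented.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions, and insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference phone
                cur[j - 1] + 1,          # insertion of a spurious phone
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = cur
    return prev[-1]


def phone_accuracy(ref, hyp):
    """Accuracy in percent: 100 * (N - S - D - I) / N."""
    return 100.0 * (len(ref) - edit_distance(ref, hyp)) / len(ref)


# Invented example: one substitution (eh -> ae) and one deletion (d).
ref = ["sil", "n", "eh", "k", "s", "t", "ih", "n", "d", "sil"]
hyp = ["sil", "n", "ae", "k", "s", "t", "ih", "n", "sil"]
errors = edit_distance(ref, hyp)
acc = phone_accuracy(ref, hyp)
```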


AF Asynchrony Compensation

  • AF asynchrony is caused by context variation

  • AF asynchrony can be reduced by letting the systems learn context variation directly from long-term information

[Figure: a 310 ms window of 23-dim Mel features is split into Left Context and Right Context; each context passes through Windows + DCTs and an MLP (72 dim); the two outputs are combined (144 dim) and fed to a further MLP (72 dim)]
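The "Windows + DCTs" stage for the left and right contexts can be sketched as follows. Several details are assumptions rather than facts from the slide: a 10 ms frame shift (so 310 ms is 31 frames), halves that share the centre frame, a Hamming window, and the number of DCT coefficients kept (`num_coeffs`). The MLP stages are not shown.

```python
import math


def dct2(x, num_coeffs):
    """First num_coeffs coefficients of an unnormalized DCT-II."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi / n * (t + 0.5) * k) for t in range(n))
        for k in range(num_coeffs)
    ]


def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]


def split_context(trajectory, num_coeffs=6):
    """Window one Mel band's trajectory and take DCTs of each half."""
    centre = len(trajectory) // 2
    win = hamming(len(trajectory))
    weighted = [v * w for v, w in zip(trajectory, win)]
    left = weighted[:centre + 1]   # past frames up to the centre
    right = weighted[centre:]      # centre and future frames
    return dct2(left, num_coeffs), dct2(right, num_coeffs)


def long_term_features(mel_frames, i, half=15, num_coeffs=6):
    """Left- and right-context vectors for frame i (31 frames in total)."""
    left_vec, right_vec = [], []
    for band in range(len(mel_frames[0])):
        traj = [mel_frames[t][band] for t in range(i - half, i + half + 1)]
        l, r = split_context(traj, num_coeffs)
        left_vec.extend(l)
        right_vec.extend(r)
    return left_vec, right_vec


# Toy input: 40 frames of 23 Mel-band energies.
mel_frames = [[float(t + b) for b in range(23)] for t in range(40)]
left_vec, right_vec = long_term_features(mel_frames, 20)
```

Each context vector would then go to its own MLP, and the two MLP outputs would be concatenated for the final MLP.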


Results (4)


Conclusions

  • A well-designed phonological feature system is important

    • AF asynchrony minimization training and AF-phone synchronization could also be investigated

  • The oracle-trained (OT) CRF can retrieve more phonological information from speech

    • High phone correction rate (but sensitive to detection error)

    • Helpful for combination

  • Detection-Based ASR is promising

    • A front-end detector is a major issue


AF and Phone Alignment Using AFDT

[Figure: phone sequence and AF sequence aligned along the time axis t]

