Automatic Detection-based Phone Recognition on TIMIT


Based on Chen and Wang in ISCSLP’08 and Interspeech’09

Hung-Shin Lee (李鴻欣)

12 July, 2011 @ IIS, Academia Sinica



Detection-Based ASR

  • Human SR: Knowledge, Detection, Integration, Knowledge (Higher Level)
  • DB ASR: Detectors, Integrator, Results
    • Detectors: Phonological attr., Prosodic attr., Acoustic attr.
    • Integrator: HMM, CRF
    • Results: Phone, Syllable, Word, Sentence, Semantic info
Phonological Feature Detection (1)

[Figure: MLP (Detectors). Input layer: 9 frames (i-4 … i … i+4) of 13 MFCCs; hidden layer; output: posterior probability per feature, quantized to binary (0/1) values. Feature sets: SPE_14, GP_11. MLP types labeled: time-delay, recurrent.]
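The input stacking and output quantization described on this slide can be sketched as follows. This is a minimal illustration, not the presenter's code: the edge handling (clamping to the first or last frame) and the 0.5 quantization threshold are assumptions, and the function names are hypothetical.

```python
from typing import List


def stack_context(frames: List[List[float]], i: int, half_width: int = 4) -> List[float]:
    """Concatenate frames i-half_width .. i+half_width into one input vector.

    Indices outside the utterance are clamped to the first/last frame
    (an assumed edge-handling choice; the slide does not specify one).
    """
    last = len(frames) - 1
    stacked: List[float] = []
    for j in range(i - half_width, i + half_width + 1):
        stacked.extend(frames[min(max(j, 0), last)])
    return stacked


def quantize(posteriors: List[float], threshold: float = 0.5) -> List[int]:
    """Turn detector posteriors into binary feature values (threshold assumed)."""
    return [1 if p >= threshold else 0 for p in posteriors]


# Toy utterance: 20 frames of 13 placeholder "MFCC" values (not real audio).
utterance = [[float(t)] * 13 for t in range(20)]
mlp_input = stack_context(utterance, i=10)
print(len(mlp_input))                      # 9 frames x 13 MFCCs = 117
print(quantize([0.91, 0.07, 0.50, 0.33]))  # [1, 0, 1, 0]
```

With 9 frames of 13 MFCCs, each detector would see a 117-dimensional input under these assumptions.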

Phonological Feature Detection (2)

[Figure: MV_29 detection with time-delay MLPs. Input: 9 frames (i-4 … i … i+4) of 13 MFCCs. 6 MV features, one MLP per feature (shown: Centrality, Front-Back, Roundness), each producing binary (0/1) outputs.]

Conditional Random Field (CRF) Integrator
  • General Chain CRF
    • λj, μk: feature function weight parameters
    • State feature function and transition feature function
    • Output (phone): Y = (…, yi-1, yi, …)
    • Input (phonological features): X = (…, xi-1, xi, xi+1, …)
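The equation on this slide did not survive extraction. For reference, the standard linear-chain CRF can be written in notation matching the slide's labels. The assignment of λj to transition feature functions t_j and μk to state feature functions s_k is an assumption; the slide only says both are feature function weights.

```latex
p(Y \mid X) = \frac{1}{Z(X)} \exp\Bigl(
    \sum_{i} \sum_{j} \lambda_j \, t_j(y_{i-1}, y_i, X, i)
  + \sum_{i} \sum_{k} \mu_k \, s_k(y_i, X, i)
\Bigr)
```

Here Z(X) is the normalization term, summing the same exponential over all possible phone sequences Y for the given input X.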

CRF Integrator – Training Issues
  • Required Label for CRF Training
    • Phone: y
    • Phonological features: x
  • Oracle-data trained CRF (OT CRF)
    • Training Data: phone labels
    • Mapping (phones → phonological features) yields the phonological features
    • The CRF is trained on the phone labels and the mapped features
  • Detected-data trained CRF (DT CRF)
    • Speech is passed through the MLP detectors
    • Detectors output phonological features (with errors)
    • The CRF is trained on the phone labels and the detected features
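The oracle-data path above can be sketched as a table lookup followed by row formatting. The mapping table below is hypothetical (three made-up features, not the SPE_14, GP_11, or MV_29 sets), and the helper names are illustrative. The row layout, one token per line with the label in the last column, follows the training file format CRF++ reads.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical mapping table: phone -> binary phonological feature vector.
# Feature names and values are illustrative only.
FEATURES = ("vocalic", "nasal", "voiced")
PHONE_TO_FEATURES: Dict[str, Tuple[int, ...]] = {
    "n": (0, 1, 1),
    "eh": (1, 0, 1),
    "s": (0, 0, 0),
}


def oracle_features(phone_labels: List[str]) -> List[Tuple[int, ...]]:
    """Map frame-level phone labels to error-free ("oracle") feature vectors."""
    return [PHONE_TO_FEATURES[p] for p in phone_labels]


def training_rows(features: Sequence[Sequence[int]], phone_labels: List[str]) -> List[str]:
    """Format (x, y) pairs as whitespace-separated rows, label in the last column."""
    return [" ".join(map(str, x)) + " " + y for x, y in zip(features, phone_labels)]


frame_labels = ["n", "n", "eh", "eh", "s"]
rows = training_rows(oracle_features(frame_labels), frame_labels)
for row in rows:
    print(row)
```

For the detected-data variant, the same `training_rows` helper would be fed detector outputs in place of `oracle_features(...)`, so the features carry the detectors' errors while the phone labels stay unchanged.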

Experiments
  • Corpus: TIMIT
    • No SA1, SA2
    • Training set (3296 utts), Dev set (400 utts)
    • Test set (1344 utts)
  • Phone set: TIMIT61
    • Evaluation: CMU/MIT 39
  • Baseline
    • CI-HMM
  • Toolkits
    • Nico Toolkit (for MLP), CRF++ (for CRF)
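Scoring on the 39-phone set means folding the TIMIT61 labels before comparing reference and hypothesis. The sketch below assumes the commonly used folding table (the slide does not list it) and computes phone accuracy as (N - S - D - I) / N via edit distance. The function names and the toy sequences are illustrative.

```python
from typing import List

# Assumed 61-to-39 folding (the commonly used table; not given on the slide).
# Phones not listed map to themselves; the glottal stop "q" is dropped.
FOLD = {
    "ao": "aa", "ax": "ah", "ax-h": "ah", "axr": "er", "hv": "hh",
    "ix": "ih", "el": "l", "em": "m", "en": "n", "nx": "n",
    "eng": "ng", "zh": "sh", "ux": "uw",
    "pcl": "sil", "tcl": "sil", "kcl": "sil", "bcl": "sil", "dcl": "sil",
    "gcl": "sil", "h#": "sil", "pau": "sil", "epi": "sil",
}


def fold(phones: List[str]) -> List[str]:
    """Map TIMIT61 labels to the reduced evaluation set."""
    return [FOLD.get(p, p) for p in phones if p != "q"]


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for r_idx, r in enumerate(ref, start=1):
        curr = [r_idx]
        for h_idx, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[h_idx] + 1,           # deletion
                            curr[h_idx - 1] + 1,       # insertion
                            prev[h_idx - 1] + cost))   # match or substitution
        prev = curr
    return prev[-1]


def phone_accuracy(ref61: List[str], hyp61: List[str]) -> float:
    """Phone accuracy in percent after folding both sequences."""
    ref, hyp = fold(ref61), fold(hyp61)
    return 100.0 * (len(ref) - edit_distance(ref, hyp)) / len(ref)


reference = ["h#", "n", "eh", "kcl", "k", "s", "h#"]
hypothesis = ["h#", "n", "ix", "k", "s", "h#"]
print(f"{phone_accuracy(reference, hypothesis):.2f}")
```

In the toy example the hypothesis has one substitution and one deletion against a 7-phone folded reference, so the accuracy is 5/7 of 100.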
Results (1)

  • Model: OT CRF; Test: OD Features
  • Model: OT/DT CRF; Test: DD Features

Results (2)

System Fusion

System Fusion with CRF

  • Output Y = (…, yi-1, yi, …): Combined Results (Phone)
  • Input X = (…, xi-1, xi, xi+1, …): Phone Sequence from each system
    • SPE Sys.
    • MV Sys.
    • GP Sys.
    • HMM Sys.

Two Types of AFDT Imperfection

  • AF asynchrony
  • AFDT errors

[Figure: phone sequence "h# n eh ow kcl k w eh ae eh s tcl t ix n" shown against the AF(A) and AF(A’) tracks.]

CRF Training (1)

  • Oracle Data Training: Phone y over time t, converted to AFs x through the Mapping Table
  • Detected Data Training: Phone y over time t, paired with AFs x produced by the AFDT (with Detected Errors)

CRF Training (2)

  • Aligned Data Training: the AF Sequence (AFs x) from the AFDT, paired with Phone y over time t

Results (3)

  • Accuracy drops by 27.97% when AF asynchrony is introduced
  • Detection errors cause a further 7.99% drop in accuracy

AF Asynchrony Compensation

[Figure: 23 dim Mel features over a 310ms span feed a Left Context branch and a Right Context branch, each with Windows + DCTs and an MLP. Dimension labels: 72Dim, 72Dim, 144Dim, 72Dim. A further MLP follows the two branches.]
  • AF asynchrony is caused by context variation
  • We can reduce AF asynchrony by letting our systems learn context variation directly from long-term information
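The "Windows + DCTs" step on a left and a right context can be sketched for a single Mel band as below. This is an assumption-laden illustration rather than the presenter's setup: the 10 ms frame shift (so 31 frames for about 310 ms), the full Hamming window, the shared centre frame, and the number of retained DCT coefficients are all guesses, and the function names are hypothetical.

```python
import math
from typing import List, Tuple


def hamming(n: int) -> List[float]:
    """Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]


def dct2(x: List[float], num_coeffs: int) -> List[float]:
    """First num_coeffs coefficients of an unnormalized DCT-II."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * k * (t + 0.5) / n) for t in range(n))
            for k in range(num_coeffs)]


def split_context_features(trajectory: List[float],
                           num_coeffs: int = 3) -> Tuple[List[float], List[float]]:
    """Window + DCT on the left and right halves of one band's trajectory.

    Both halves include the centre frame (an assumed design choice).
    """
    centre = len(trajectory) // 2
    left = trajectory[: centre + 1]
    right = trajectory[centre:]

    def encode(part: List[float]) -> List[float]:
        window = hamming(len(part))
        return dct2([v * w for v, w in zip(part, window)], num_coeffs)

    return encode(left), encode(right)


# 31 frames is about 310 ms at an assumed 10 ms frame shift; values are synthetic.
trajectory = [math.sin(0.2 * t) for t in range(31)]
left, right = split_context_features(trajectory)
print(len(left), len(right))
```

Repeating this over all bands and concatenating the coefficients would give one fixed-size vector per context, which is the kind of input the left and right MLPs in the figure would consume.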
Conclusions
  • A well-designed phonological feature system is important
    • AF asynchrony minimization training and AF-phone synchronization could also be investigated
  • Oracle-trained CRF is able to retrieve more phonological information from speech
    • High phone correction rate (but sensitive to detection error)
    • Helpful for combination
  • Detection-Based ASR is promising
    • A front-end detector is a major issue
AF and Phone Alignment Using AFDT

[Figure: phone sequence and AF sequence aligned along the time axis t.]
