1. Using speech technology for pronunciation assessment and training
Helmer Strik
Centre for Language and Speech Technology (CLST)
Radboud University Nijmegen, the Netherlands
2. Zurich, 29-01-2007 2 Context ‘Deviant’ pronunciation (e.g., pathology, non-natives)
& speech technology – Applications:
AAC (Augmentative & Alternative Communication)
Training (therapy, learning)
CAPT: Computer Assisted Pronunciation Training
3. Overview
4. CAPT: Background and problem
Computer Assisted Pronunciation Training (CAPT)
can provide automatic, instantaneous, individual feedback on pronunciation in a private environment
But ASR-based CAPT suffers from limitations.
Is it effective in improving L2 pronunciation?
Very few studies with different results.
5. CAPT: Goal of this study
To study the effectiveness and possible advantage of automatic feedback provided by an ASR-based CAPT system.
6. ASR-based CAPT system: Dutch CAPT
Target users:
adult learners of Dutch with different L1's
improving segmental quality in pronunciation
7. Dutch CAPT: feedback
Content: focus on problematic phonemes
Common across speakers of various L1’s
Robust for automatic detection
11 ‘targeted phonemes’: 9 vowels and 2 consonants
8. Video (from Nieuwe Buren)
10. Video: dialogue
15. Dutch CAPT
Gender-specific, Dutch & English version.
4 units, each containing:
1 video (from Nieuwe Buren) with real-life + amusing situations
+ ca. 30 exercises based on video: dialogues, question-answer, minimal pairs, word repetition
Sequential, constrained navigation: min. one attempt needed to proceed to next exercise, maximum 3
16. Method: participants & training
Regular teacher-fronted lessons: 4-6 hrs per week
Experimental group (EXP): n=15 (10 F, 5 M) Dutch CAPT
Control group 1 (NiBu): n=10 (4 F, 6 M) reduced version of Nieuwe Buren
Control group 2 (noXT): n=5 (3 F, 2 M) no extra training
Extra training: 4 weeks x 1 session of 30 to 60 minutes
1 class – 1 type of training
17. Method: testing
3 analyses:
Participants’ evaluations: questionnaires on system’s usability, accessibility, usefulness etc.
Global segmental quality: 6 experts rated stimuli on 10-point scale (pretest/posttest, phonetically balanced sentences)
In-depth analysis of segmental errors: expert annotations
18. Results: participants’ evaluations
Positive reactions
Enjoyed working with the system
Believed in the usefulness of the system
19. Results: reliability global ratings
Cronbach’s α:
Intrarater: 0.94 – 1.00
Interrater: 0.83 – 0.96
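The reliability figures above are Cronbach's α values. As a rough illustration of how such a coefficient can be computed, here is a minimal sketch that treats each rater as an "item" and each rated stimulus as an observation. The function name and the example ratings are made up for illustration; they are not the study's data or the authors' procedure.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for a table of ratings.

    ratings: list of rows, one row per stimulus, one score per rater.
    Uses alpha = k/(k-1) * (1 - sum(rater variances) / variance(row totals)).
    """
    k = len(ratings[0])  # number of raters
    if k < 2 or len(ratings) < 2:
        raise ValueError("need at least 2 raters and 2 stimuli")
    rater_columns = list(zip(*ratings))
    sum_rater_var = sum(variance(col) for col in rater_columns)
    total_var = variance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("row totals have zero variance")
    return k / (k - 1) * (1 - sum_rater_var / total_var)


# Hypothetical scores on a 10-point scale: 4 stimuli, 3 raters.
example = [[6, 7, 6], [4, 5, 4], [8, 8, 9], [5, 5, 6]]
print(round(cronbach_alpha(example), 2))
```

Values close to 1 mean the raters rank the stimuli in nearly the same way, which is how the ranges reported on the slide would be read.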
20. Results: Global ratings
22. Results: Global ratings
23. In-depth analysis of segmental quality
24. Conclusions
Global ratings are an appropriate measure because CAPT should ultimately improve overall pronunciation quality.
Fine-grained analyses also useful.
Participants enjoyed Dutch CAPT.
ASR-CAPT seems efficacious in improving pronunciation of targeted phonemes.
25. Video: pronouncing words
26. Possible improvements
Increase sample size (more participants)
Increase training intensity (more training)
Match training groups: L1’s, proficiency, etc.
Give feedback on more phonemes
More targeted systems for fixed L1-L2 pairs.
Give feedback on suprasegmentals
Improve error detection?
27. Error detection
Detection of pronunciation errors
Goodness Of Pronunciation (GOP)
Silke Witt & Steve Young
Truong et al.
Goal: improve error detection
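The slide names Goodness Of Pronunciation (GOP) without spelling it out. As commonly described for Witt & Young's measure, the score for a target phone is the absolute log-likelihood ratio between the target phone and the best-scoring phone over the same segment, normalised by the number of frames. The sketch below assumes the per-phone log-likelihoods have already been produced by a recogniser; the function names, the example values, and the threshold are hypothetical.

```python
def gop_score(target_phone, loglik_by_phone, n_frames):
    """GOP for one segment.

    loglik_by_phone: log-likelihood of the segment under each candidate
    phone model (assumed to come from an external recogniser).
    Returns |log p(O|target) - max_q log p(O|q)| / n_frames.
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    best = max(loglik_by_phone.values())
    return abs(loglik_by_phone[target_phone] - best) / n_frames


def is_mispronounced(score, threshold):
    """Flag a phone as an error when its GOP exceeds a threshold.

    The threshold is an assumption; in practice it would be tuned
    against expert annotations.
    """
    return score > threshold


# Made-up log-likelihoods for a 10-frame segment whose target is /x/.
segment = {"x": -50.0, "k": -40.0, "g": -47.0}
score = gop_score("x", segment, n_frames=10)
print(score, is_mispronounced(score, threshold=0.5))
```

A score of 0 means the target phone was also the best-matching phone; larger scores mean some other phone fits the audio better.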
29. GOP: Accuracy
15 participants
2174 target phones
30. Acoustic-phonetic approach
Selection of segmental pronunciation errors:
/A/ mispronounced as /a:/ (man - maan)
/Y/ mispronounced as /u/ or /y/ (tut – toet or tuut)
/x/ mispronounced as /k/ or /g/ (gat – kat or /g/at)
Before we started, we selected the pronunciation errors to address in this study. We surveyed an annotated non-native speech database and selected errors by frequency, keeping only gross errors.
31. Here are some examples of amplitude and ROR (rate of rise) contours of the two sounds. At the top: amplitude; at the bottom: ROR contour. At the left: fricative /x/; at the right: plosive /k/. The amplitude rises gradually for the fricative and abruptly for the plosive. The abrupt rise is clearly visible as a high peak in the ROR contour of the plosive; for the fricative this high peak is missing. We mainly use these amplitude differences to discriminate /x/ from /k/.
32. These are the measurements we took to train the classifiers. First, the height of the highest ROR peak. Then 4 amplitude measurements: 1 before the peak and 3 after the peak (chosen rather arbitrarily). Duration was added as an optional feature.
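The measurements described in the notes can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the amplitude contour is already available as one value per frame (for example in dB), approximates ROR as the frame-to-frame amplitude difference, and uses hypothetical frame offsets for the amplitude measurements, since the notes only say these were chosen rather arbitrarily.

```python
def ror_contour(amp):
    """Rate of rise: amplitude difference between consecutive frames."""
    return [amp[i + 1] - amp[i] for i in range(len(amp) - 1)]


def xk_features(amp, before=1, after=(1, 2, 3)):
    """Features for /x/ vs /k/: ROR peak height plus 4 amplitude values.

    amp: per-frame amplitude contour of the segment (assumed input).
    before / after: frame offsets around the ROR peak (hypothetical).
    """
    if len(amp) < 2:
        raise ValueError("need at least 2 frames")
    ror = ror_contour(amp)
    peak = max(range(len(ror)), key=ror.__getitem__)  # first highest peak

    def amp_at(i):
        # Clamp the index so offsets near the edges stay inside the segment.
        return amp[min(max(i, 0), len(amp) - 1)]

    return {
        "ror_peak": ror[peak],
        "amp_before": amp_at(peak - before),
        "amps_after": [amp_at(peak + o) for o in after],
        "duration_frames": len(amp),  # optional duration feature
    }


# Made-up contours: an abrupt rise (plosive-like) and a gradual one (fricative-like).
plosive_like = [40, 40, 40, 60, 62, 62, 62]
fricative_like = [40, 44, 48, 52, 56, 60, 62]
print(xk_features(plosive_like)["ror_peak"])    # 20
print(xk_features(fricative_like)["ror_peak"])  # 4
```

The abrupt onset yields a much higher ROR peak than the gradual one, which is the contrast the notes describe.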
33. Results method II (LDA) /x/ vs /k/
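The slide title names linear discriminant analysis (LDA) as the classifier for /x/ versus /k/. For readers unfamiliar with it, here is a minimal two-class Fisher LDA on two features in plain Python. Everything specific here is an assumption for illustration: the choice of two features (ROR peak height and one amplitude value), the toy numbers, equal class priors, and the function names. It does not reproduce the study's classifier or results.

```python
def _mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def _scatter(rows, mu):
    """2x2 within-class scatter matrix for one class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = (r[0] - mu[0], r[1] - mu[1])
        for a in range(2):
            for b in range(2):
                s[a][b] += d[a] * d[b]
    return s


def fit_lda(class_a, class_b):
    """Return (w, bias) so that w.x + bias > 0 favours class_a."""
    mu_a, mu_b = _mean(class_a), _mean(class_b)
    sa, sb = _scatter(class_a, mu_a), _scatter(class_b, mu_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = (mu_a[0] - mu_b[0], mu_a[1] - mu_b[1])
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # With equal priors the boundary passes through the midpoint of the means.
    mid = ((mu_a[0] + mu_b[0]) / 2, (mu_a[1] + mu_b[1]) / 2)
    bias = -(w[0] * mid[0] + w[1] * mid[1])
    return w, bias


def classify(x, w, bias, labels=("k", "x")):
    score = w[0] * x[0] + w[1] * x[1] + bias
    return labels[0] if score > 0 else labels[1]


# Toy (ROR peak, amplitude) pairs, invented for this sketch.
plosives = [(20, 60), (18, 58), (22, 63), (19, 61)]
fricatives = [(4, 52), (5, 50), (3, 49), (6, 54)]
w, bias = fit_lda(plosives, fricatives)
print(classify((21, 60), w, bias), classify((5, 51), w, bias))  # k x
```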
34. Error detection
GOP:
One general method for all sounds
Error specific knowledge is not used
Acoustic-phonetic approach:
Error specific knowledge is used
How to generalize? (articulatory + other features)
Other approaches, e.g. posterior probabilities (ANN)?
36. Error detection
Pronunciation errors
11 ‘problematic sounds’: 9 vowels + 2 consonants
Goal: give feedback on more sounds
maak / maakt / maken
Goal: also give feedback on morpho-syntactic aspects
37. GOP
GOP has been applied in the experimental system.
The experimental system was effective.
Correct vs. errors
Pros & cons