Using speech technology for pronunciation assessment and training Helmer Strik Centre for Language and Speech Technol

1. Using speech technology for pronunciation assessment and training. Helmer Strik, Centre for Language and Speech Technology (CLST), Radboud University Nijmegen, the Netherlands.

2. Zurich, 29-01-2007. Context: "deviant" pronunciation (e.g., pathology, non-natives) and speech technology. Applications: AAC (Augmentative & Alternative Communication): improve communication; interactive tools: reading, listening; assessment: diagnosis, monitoring; training (therapy, learning): CAPT (Computer Assisted Pronunciation Training).

3. Overview. Contents: CAPT; error detection.

4. CAPT: Background and problem. Computer Assisted Pronunciation Training (CAPT). ASR-based CAPT can provide automatic, instantaneous, individual feedback on pronunciation in a private environment. But ASR-based CAPT suffers from limitations. Is it effective in improving L2 pronunciation? Very few studies, with different results.

5. CAPT: Goal of this study. To study the effectiveness and possible advantage of automatic feedback provided by an ASR-based CAPT system.

6. ASR-based CAPT system: Dutch CAPT. Target users: adult learners of Dutch with different L1s (e.g. immigrants). Pedagogical goal: improving segmental quality in pronunciation.

7. Dutch CAPT: feedback. Content: focus on problematic phonemes. Criteria: common across speakers of various L1s; perceptually salient; frequent; persistent; robust for automatic detection. Result: 11 "targeted phonemes": 9 vowels and 2 consonants.

8. Video (from Nieuwe Buren)

10. Video: dialogue

15. Dutch CAPT. Gender-specific; Dutch and English versions. 4 units, each containing 1 video (from Nieuwe Buren) with real-life and amusing situations, plus ca. 30 exercises based on the video: dialogues, question-answer, minimal pairs, word repetition. Sequential, constrained navigation: at least one attempt is needed to proceed to the next exercise, with a maximum of 3.

16. Method: participants & training. Regular teacher-fronted lessons: 4-6 hrs per week. Experimental group (EXP): n=15 (10 F, 5 M), Dutch CAPT. Control group 1 (NiBu): n=10 (4 F, 6 M), reduced version of Nieuwe Buren. Control group 2 (noXT): n=5 (3 F, 2 M), no extra training. Extra training: 4 weeks x 1 session of 30'-60'. Each class received one type of training.

17. Method: testing. 3 analyses. Participants' evaluations: questionnaires on the system's usability, accessibility, usefulness, etc. Global segmental quality: 6 experts rated stimuli on a 10-point scale (pretest/posttest, phonetically balanced sentences). In-depth analysis of segmental errors: expert annotations.

18. Results: participants' evaluations. Positive reactions: enjoyed working with the system; believed in the usefulness of the system.

19. Results: reliability of global ratings. Cronbach's α: intrarater 0.94-1.00; interrater 0.83-0.96.
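
The slide reports Cronbach's α without showing how it is computed. As a rough illustration only, the standard formula α = k/(k-1) · (1 - Σ var(rater) / var(summed scores)) can be sketched as below. The function name `cronbach_alpha` and the data layout (one list of scores per rater) are assumptions for this sketch, not taken from the study.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha for ratings[r][s] = score of rater r on stimulus s.

    Raters play the role of "items"; all raters must score the same stimuli.
    """
    k = len(ratings)
    if k < 2:
        raise ValueError("need at least two raters")
    n = len(ratings[0])
    if n < 2 or any(len(r) != n for r in ratings):
        raise ValueError("each rater must score the same >= 2 stimuli")

    # Sum of the per-rater sample variances.
    rater_variances = sum(variance(r) for r in ratings)
    # Sample variance of the per-stimulus totals (summed over raters).
    totals = [sum(scores) for scores in zip(*ratings)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("summed scores have zero variance")

    return k / (k - 1) * (1 - rater_variances / total_variance)
```

For intrarater reliability the same function could be applied to repeated rating rounds by one rater instead of to different raters.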

20. Results: Global ratings

22. Results: Global ratings

23. In-depth analysis of segmental quality

24. Conclusions. Global ratings are an appropriate measure because CAPT should ultimately improve overall pronunciation quality. Fine-grained analyses are also useful. Participants enjoyed Dutch CAPT. ASR-based CAPT seems efficacious in improving pronunciation of targeted phonemes.

25. Video: pronouncing words

26. Possible improvements. Increase sample size (more participants). Increase training intensity (more training). Match training groups: L1s, proficiency, etc. Give feedback on more phonemes. More targeted systems for fixed L1-L2 pairs. Give feedback on suprasegmentals. Improve error detection?

27. Error detection. Detection of pronunciation errors: Goodness Of Pronunciation (GOP), Silke Witt & Steve Young; acoustic-phonetic approaches, Truong et al. Goal: improve error detection.
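
The slides name GOP but do not spell out the score. A minimal sketch of the commonly cited Witt & Young formulation follows: the absolute log-likelihood ratio between the intended phone and the best-scoring phone, normalized by the segment's frame count. The function names, the dictionary input, and the threshold are assumptions for illustration; the Dutch CAPT implementation may differ.

```python
def gop_score(loglik_target, loglik_by_phone, n_frames):
    """GOP for one segment.

    loglik_target   -- log-likelihood of the segment given the intended phone
    loglik_by_phone -- log-likelihoods of the same segment for every phone
                       in the inventory (hypothetical acoustic-model output)
    n_frames        -- number of acoustic frames in the segment
    """
    if n_frames <= 0:
        raise ValueError("n_frames must be positive")
    best = max(loglik_by_phone.values())
    # |log(p(O|target) / max_q p(O|q))| normalized by segment duration.
    return abs(loglik_target - best) / n_frames


def is_mispronounced(gop, threshold):
    """Flag a phone when its GOP exceeds a threshold (assumed to be tuned)."""
    return gop > threshold
```

A score of 0 means the intended phone was also the best-matching phone; larger scores mean some other phone fits the audio better.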

29. GOP: Accuracy. 15 participants, 2174 target phones. Results.

30. Acoustic-phonetic approach. Selection of segmental pronunciation errors: /A/ mispronounced as /a:/ (man - maan); /Y/ mispronounced as /u/ or /y/ (tut - toet or tuut); /x/ mispronounced as /k/ or /g/ (gat - kat or /g/at). Before we started, we selected the pronunciation errors to address in this study. A survey was carried out on an annotated non-native speech database; we selected pronunciation errors by their frequency, and we selected only gross errors.

31. Here are some examples of amplitude and ROR (rate of rise) contours of the two sounds. At the top: amplitude; at the bottom: ROR contour. At the left: fricative /x/; at the right: plosive /k/. We indeed see a gradual rise of amplitude for the fricative and an abrupt rise of amplitude for the plosive. The abrupt rise is clearly visible as a high peak in the ROR contour of the plosive; for the fricative this high peak is missing. We mainly use these amplitude differences to discriminate /x/ from /k/.
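
To make the amplitude and ROR contours concrete, here is a minimal sketch that computes a log-RMS amplitude contour and its frame-to-frame rate of rise. The frame length, hop size, and function names are assumptions chosen for illustration; the slides do not state the actual analysis settings.

```python
import math


def log_rms_contour(samples, frame_len, hop):
    """Log-RMS amplitude (dB) for successive frames of a waveform."""
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    contour = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        # Floor the RMS so that silent frames do not produce log(0).
        contour.append(20.0 * math.log10(max(rms, 1e-10)))
    return contour


def ror_contour(amplitude_db, hop_seconds):
    """Rate of rise: change in amplitude (dB) per second between frames."""
    if hop_seconds <= 0:
        raise ValueError("hop_seconds must be positive")
    return [(b - a) / hop_seconds
            for a, b in zip(amplitude_db, amplitude_db[1:])]
```

On a signal with an abrupt onset the ROR contour shows one tall peak, whereas a gradual onset spreads the rise over many frames and the peak stays low, which is the contrast the slide describes for /k/ versus /x/.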

32. Here we can see what kind of measurements we have taken to train the classifiers. First, the height of the highest ROR peak. Then 4 amplitude measurements: 1 before the peak and 3 after the peak (positions chosen rather arbitrarily). Duration was added optionally.
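
The measurements listed on this slide can be gathered into one feature vector per segment, roughly as sketched below. The frame offsets in `offsets` are placeholders (the slide only says the positions were chosen rather arbitrarily), and `extract_features` is a hypothetical name.

```python
def extract_features(amplitude_db, ror, offsets=(-5, 5, 10, 15), duration=None):
    """Build [peak ROR, 4 amplitudes around the peak, optional duration].

    amplitude_db -- amplitude contour of the segment, one value per frame
    ror          -- rate-of-rise contour of the same segment
    offsets      -- frame offsets relative to the ROR peak: one before,
                    three after (placeholder values, not from the study)
    duration     -- segment duration, appended only when given
    """
    if not amplitude_db or not ror:
        raise ValueError("contours must not be empty")
    # Index of the highest ROR peak (first one if there are ties).
    peak_idx = max(range(len(ror)), key=ror.__getitem__)
    features = [ror[peak_idx]]
    last = len(amplitude_db) - 1
    for offset in offsets:
        # Clamp so that offsets near the segment edges stay in range.
        idx = min(max(peak_idx + offset, 0), last)
        features.append(amplitude_db[idx])
    if duration is not None:
        features.append(duration)
    return features
```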

33. Results method II (LDA): /x/ vs /k/
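
The slide names LDA as the classifier without further detail. As an illustration, a two-class Fisher linear discriminant can be written in plain Python as below. This is a generic textbook version with an equal-prior midpoint threshold, not the study's actual setup, and all function names are assumptions.

```python
def _mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def _scatter(rows, mean):
    d = len(mean)
    s = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mean)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s


def _solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("within-class scatter matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def train_lda(plosive_rows, fricative_rows):
    """Return (weights, bias) of a two-class Fisher discriminant."""
    mean_k, mean_x = _mean(plosive_rows), _mean(fricative_rows)
    s_k, s_x = _scatter(plosive_rows, mean_k), _scatter(fricative_rows, mean_x)
    # Pooled within-class scatter matrix.
    s_w = [[p + q for p, q in zip(rk, rx)] for rk, rx in zip(s_k, s_x)]
    weights = _solve(s_w, [a - b for a, b in zip(mean_k, mean_x)])
    # Threshold halfway between the projected class means (equal priors).
    midpoint = [(a + b) / 2 for a, b in zip(mean_k, mean_x)]
    bias = -sum(w * m for w, m in zip(weights, midpoint))
    return weights, bias


def classify(weights, bias, features):
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return "/k/" if score > 0 else "/x/"
```

Each row would be one feature vector per realization (for example, peak ROR plus amplitude values); an utterance of "gat" classified as /k/ would then be flagged as a likely pronunciation error.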

34. Error detection. GOP: one general method for all sounds; error-specific knowledge is not used. Acoustic-phonetic approach: error-specific knowledge is used; works well; how to generalize (articulatory + other features)? Combination? Other approaches, e.g. posterior probabilities (ANN)?

36. Error detection. Pronunciation errors: 11 "problematic sounds": 9 V + 2 C. Goal: give feedback on more sounds. Morpho-syntactic errors: maak / maakt / maken (ik maak, hij/zij maakt, wij maken). Goal: also give feedback on morpho-syntactic aspects.

37. GOP. GOP has been applied in the experimental system. The experimental system was effective. Evaluate GOP: correct vs. errors; patterns; pros & cons; improve.
