Articulatory Talking Head driven by Automatic Speech Recognition INRIA, Parole Team

Presentation Transcript


  1. Articulatory Talking Head driven by Automatic Speech Recognition INRIA, Parole Team KTH, Centre for Speech Technology

  2. Aim of the INRIA-KTH collaboration: recreate in real time the articulatory movements of a speaker with a talking head, using only the speech signal. Applications: communication help for hard-of-hearing (HOH) people, second language learning, and speech therapy.

  3. Articulation: Displaying articulation is important for understanding English pronunciation, and it also supports perception. In this demonstration, voice activity detection (VAD) separates speech from non-speech, English phoneme recognition is performed, English articulation is analysed, and the articulation is displayed.

  4. The new VAD combines a GMM (Gaussian mixture model) with an automaton.
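
The slide does not describe the GMM features or the automaton. As a minimal sketch, assuming one scalar feature per frame (for example log energy), two hypothetical 1-D GMMs (speech and non-speech) and a two-state automaton with hypothetical `enter`/`leave` thresholds, frame-level GMM decisions could be smoothed like this:

```python
import math

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of a scalar feature x under a 1-D Gaussian mixture."""
    terms = [
        math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))  # log-sum-exp

def gmm_vad(features, speech_gmm, noise_gmm, enter=3, leave=5):
    """Label each frame True (speech) or False (non-speech).

    Raw decision: compare the frame's likelihood under the two GMMs.
    Automaton: the state only switches after `enter` consecutive raw speech
    frames (when in non-speech) or `leave` consecutive raw non-speech frames
    (when in speech). Frames in a not-yet-confirmed run keep the old label,
    so transitions are delayed by enter-1 or leave-1 frames.
    """
    state, run, labels = False, 0, []
    for x in features:
        raw = gmm_loglik(x, *speech_gmm) > gmm_loglik(x, *noise_gmm)
        if raw != state:
            run += 1
            if run >= (leave if state else enter):
                state, run = raw, 0
        else:
            run = 0
        labels.append(state)
    return labels
```

The automaton suppresses isolated misclassified frames, such as a single loud click inside a pause, which a frame-by-frame GMM decision alone would label as speech.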

  5. 3D Reconstruction: The reconstruction was made using a semi-polar grid of 20 gridlines, with one contour per image. A polygon mesh of 420 vertices and about 800 polygons was constructed.
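
The slide gives only the totals, not how the contours were joined. As a hypothetical illustration, suppose each image contour is sampled at its intersections with the 20 gridlines, so every contour has the same number of points, and consecutive contours are joined by triangle strips. The function name `stack_contours` and the slicing are assumptions, not the authors' procedure:

```python
def stack_contours(contours):
    """Build a triangle mesh from stacked contours of equal length.

    contours: list of contours, each a list of (x, y, z) points, one point
    per gridline, all in the same gridline order.
    Returns (vertices, triangles), triangles being index triples.
    """
    n = len(contours[0])
    vertices = [p for contour in contours for p in contour]
    triangles = []
    for c in range(len(contours) - 1):
        for g in range(n - 1):
            a = c * n + g   # point g on contour c
            b = a + 1       # point g+1 on contour c
            d = a + n       # point g on contour c+1
            e = d + 1       # point g+1 on contour c+1
            # Split the quad between two contours and two gridlines into two triangles.
            triangles.append((a, b, d))
            triangles.append((b, e, d))
    return vertices, triangles
```

With, say, 21 hypothetical contours of 20 points, this gives 21 × 20 = 420 vertices and 2 × 20 × 19 = 760 triangles.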

  6. Qualisys optical motion tracking: 4 IR cameras, 28 reflectors and 3 reference reflectors on a head mount. Audio and video recorders. Movetrack electromagnetic articulograph: 6 coils (upper lip, upper and lower incisors, and three tongue coils at 8, 20 and 52 mm from the tip). The models have been adapted to English.
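
The slide does not say how the three reference reflectors are used. A common purpose for head-mounted reference markers, assumed here, is to cancel head movement: the three markers define a head-fixed coordinate frame, and every other marker is expressed in that frame. A minimal sketch with hypothetical helper names:

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _unit(a):
    n = math.sqrt(_dot(a, a))
    return tuple(x / n for x in a)

def head_frame(r0, r1, r2):
    """Orthonormal head-fixed frame from three non-collinear reference markers."""
    ex = _unit(_sub(r1, r0))
    ez = _unit(_cross(ex, _sub(r2, r0)))
    ey = _cross(ez, ex)  # unit length because ez and ex are orthonormal
    return r0, (ex, ey, ez)

def to_head_coords(point, frame):
    """Express a marker position in the head-fixed frame (head motion removed)."""
    origin, axes = frame
    d = _sub(point, origin)
    return tuple(_dot(d, axis) for axis in axes)
```

If the whole head translates or rotates, the reference markers move with it, so a lip or jaw marker keeps the same head-frame coordinates unless the articulator itself moves.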

  7. Prosody: Prosody is important for understanding a message. It is present both in the speech sound and in facial expressions and gestures. Some prosodic information is extracted from the signal, namely fundamental frequency (F0), energy and speech rate, and displayed with the talking head.
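
Of the three features, energy is the simplest to extract. The exact energy measure is not specified on the slides; as an assumption, a common choice is short-time energy in decibels over overlapping frames. The frame length, hop and floor below are hypothetical values:

```python
import math

def frame_energy_db(samples, frame_len=256, hop=128, floor=1e-10):
    """Short-time energy of each frame, in dB: 10 * log10(mean squared amplitude).

    `floor` avoids log10(0) on digital silence.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        power = sum(s * s for s in frame) / frame_len
        energies.append(10 * math.log10(max(power, floor)))
    return energies
```

Halving the amplitude lowers the energy by about 6 dB, which makes loudness differences between syllables easy to display.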

  8. Pitch (F0): estimation with comb filters

  9. F0 comparison between a French speaker and a native speaker. Please note that the F0 and narrow-band spectrogram scales are different.
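
Because two speakers rarely share the same pitch range, raw F0 curves in hertz are hard to compare directly. One common option, an assumption here and not what the slide shows, is to express each contour in semitones relative to the speaker's own median voiced F0, st = 12 log2(f / ref):

```python
import math
import statistics

def normalize_f0(f0_hz):
    """Convert an F0 contour in Hz (None or 0 for unvoiced frames) to
    semitones relative to the speaker's median voiced F0."""
    voiced = [f for f in f0_hz if f]
    ref = statistics.median(voiced)
    return [12 * math.log2(f / ref) if f else None for f in f0_hz]
```

For a contour of 100, unvoiced, 200, unvoiced and 50 Hz, the median voiced F0 is 100 Hz, so the result is 0, None, +12, None and −12 semitones: an octave up and an octave down.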

  10. Speech rate: Speech rate can be computed as the average number of phonemes produced per second. Here we define it as the ratio between the average duration of the produced phonemes and the average duration of the same phonemes in the phoneme recognizer's training database.
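
Both definitions can be written directly from the recognizer's output. This sketch assumes the recognizer returns (label, duration in seconds) pairs and that the mean training-set duration of each phoneme is available in a dictionary; both data layouts are hypothetical:

```python
from statistics import mean

def speech_rate_ratio(phones, ref_durations):
    """Mean duration of the produced phonemes divided by the mean training-set
    duration of the same phonemes.

    phones: list of (label, duration_s) pairs from the recognizer.
    ref_durations: dict mapping label -> mean duration (s) in the training data.
    A ratio above 1 means the speaker is slower than the reference.
    """
    produced = mean(d for _, d in phones)
    reference = mean(ref_durations[label] for label, _ in phones)
    return produced / reference

def phonemes_per_second(phones):
    """Phonemes per second of phonated time (pauses are not counted here)."""
    return len(phones) / sum(d for _, d in phones)
```

The ratio compares each phoneme with its own typical duration, so an utterance full of intrinsically long vowels is not mistaken for slow speech, which a raw phonemes-per-second count cannot distinguish.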

  11. Speech rate

  12. Usage scenario: The teacher, the learner or the speech therapist speaks. The talking head reproduces what has been uttered, showing the articulators. The talking head then shows what should have been articulated. This is a first step towards an interactive learning loop.

  13. A French student pronounces an English sentence…

  14. The student and the teacher can have a closer look at the articulation and prosody…

  15. The teacher can pronounce the sentence as it should be…

  16. The student and the teacher can watch the correct articulation and prosody together…

  17. And of course the teacher can give more detailed explanations and advice…

  18. Thank you.