
Un androïde doué de parole (A speech-gifted android)

Institut de la Communication Parlée, Laplace. The goal of the project: robotic tools (theories, algorithms, paradigms) applied to a human cognitive system (speech) instead of a human "artefact" (a "robot").





Presentation Transcript


  1. Un androïde doué de parole (A speech-gifted android). Institut de la Communication Parlée, Laplace.

  2. The goal of the project: robotic tools (theories, algorithms, paradigms) applied to a human cognitive system (speech) instead of a human "artefact" (a "robot"). Or: study speech as a robotic system (a speaking android).

  3. Speech is not an information-processing system but a sensori-motor system plugged into language. This system deals with control, learning, inversion, adaptation, multisensoriality, communication... hence robotics!

  4. "In studying human intelligence, three common conceptual errors often occur: reliance on monolithic internal models, on monolithic control, and on general-purpose processing. A modern understanding of cognitive science and neuroscience refutes these assumptions." "Cog" at MIT (R. Brooks), http://www.ai.mit.edu/projects/cog/methodology.html

  5. "Our alternative methodology is based on evidence from cognitive science and neuroscience which focus on four alternative attributes which we believe are critical attributes of human intelligence: embodiment and physical coupling, multimodal integration, developmental organization, and social interaction."

  6. Talking Cog, a speaking android. ICP: speech modelling, speech robotics. Laplace: Bayesian robotics. Austin: speech ontogenesis.

  7. "Talking Cog" articulatory model: tongue tip, tongue dorsum, tongue body, jaw height, lip protrusion, lip separation, larynx height.

  8. [Figure: the vowels [i], [u], [a]]

  9. "Talking Cog" sensors: audition (formants F1 to F4), vision, touch.

  10. « Talking Cog » growth

  11. Learning: Bayesian inference. A sensori-motor agent learns sensori-motor relationships p(M, P) through active exploration, with (M) motor and (P) perceptual variables.
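A minimal sketch of this learning step, assuming a made-up one-dimensional forward map in place of the real vocal tract: the agent draws motor commands at random, observes the resulting percepts, and fits a joint Gaussian model of p(M, P).

```python
import numpy as np

# Toy sketch of slide 11: learn p(M, P) by "motor babbling".
# The forward map below is an illustrative stand-in, not the ICP model.
rng = np.random.default_rng(0)

def forward(m):
    """Hypothetical sensori-motor mapping: motor command -> noisy percept."""
    return np.sin(m) + 0.05 * rng.standard_normal(m.shape)

# Active exploration: sample motor commands, observe percepts.
M = rng.uniform(-1.0, 1.0, size=1000)
P = forward(M)

# Model p(M, P) as a joint Gaussian, estimated from the exploration data.
data = np.stack([M, P])
mu = data.mean(axis=1)   # joint mean
cov = np.cov(data)       # 2x2 covariance linking motor and percept
print(mu.shape, cov.shape)  # (2,) (2, 2)
```

Everything the agent later needs (inversion, prediction, fusion) can be read off this joint model by conditioning.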

  12. Acquire controls from percepts: p(M | P)? Perceptual input (a target?).
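This inversion can be sketched with Bayes' rule over a discretised motor space, p(M | P) being proportional to p(P | M) p(M). The forward map and noise level here are illustrative assumptions, not the project's model.

```python
import numpy as np

# Toy sketch of slide 12: invert a perceptual target into a motor command.
m_grid = np.linspace(-1.0, 1.0, 201)   # discretised motor space
f = lambda m: np.sin(2.0 * m)          # hypothetical motor -> percept map
sigma = 0.05                           # assumed perceptual noise

target = 0.5                           # perceptual input (the target)
prior = np.full_like(m_grid, 1.0 / m_grid.size)
lik = np.exp(-0.5 * ((f(m_grid) - target) / sigma) ** 2)
post = prior * lik
post /= post.sum()                     # p(M | P = target)

m_star = m_grid[np.argmax(post)]       # a maximum-a-posteriori control
print(round(float(np.sin(2.0 * m_star)), 2))  # -> 0.5 (target is reached)
```

The same posterior also answers "how confident is the agent", which a deterministic inverse map cannot.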

  13. Regularise percepts from actions: p(P | M)? Incomplete perceptual input.

  14. Predict one modality from another one: p(P2 | P1), with P1 orosensorial and P2 audio.

  15. Coherently fuse two or more modalities: p(M | P1, P2).
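In a toy one-dimensional Gaussian setting, this fusion has a closed form: if each perceptual channel yields a Gaussian estimate of the motor variable, p(M | P1, P2) is their precision-weighted combination. The means and variances below are illustrative, not values from the model.

```python
# Toy sketch of slide 15: Gaussian fusion of two modalities.

def fuse(mu1, var1, mu2, var2):
    """Fuse two Gaussian estimates p(M|P1), p(M|P2) into p(M|P1, P2)."""
    prec = 1.0 / var1 + 1.0 / var2          # precisions add
    mu = (mu1 / var1 + mu2 / var2) / prec   # precision-weighted mean
    return mu, 1.0 / prec

# Say the orosensorial channel gives M ~ N(0.4, 0.2)
# and the audio channel gives M ~ N(0.6, 0.1):
mu, var = fuse(0.4, 0.2, 0.6, 0.1)
print(mu, var)  # the fused estimate lies nearer the more reliable (audio) cue
```

Note the fused variance is smaller than either input variance: combining modalities always sharpens the estimate in this model.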

  16. The route towards adult speech: learning control. 0 months: imitation of the three major speech gestures; 4 months: vocalisation, imitation; 7 months: jaw cycles (babbling); later: control of carried articulators (lips, tongue) for vowels and consonants. Exploration & imitation.

  17. First experiment: simulating exploration from 4 to 7 months. Phonetic data (sounds and formants) on 4- and 7-month-old babies' vocalisations.

  18. Acoustical framing: true data vs. the maximal acoustical space in the (F1, F2) plane.

  19. Results: pre-babbling (4 months) and babbling onset (7 months) in the (F1, F2) plane (F1: high to low; F2: front to back). Black: android capacities; colour: infant productions.

  20. Articulatory framing: various sub-models; which one is the best?

  21. Method: compare the real and theoretical distributions in (F1, F2), compute P(M | f1, f2), and select the best M.

  22. Too restricted, too wide... the best!

  23. Results: pre-babbling (4 months) is accounted for by lips and tongue; babbling onset (7 months) by lips and tongue + jaw (J).

  24. Conclusion I. 1. Acoustical framing: cross-validation of the data and model. 2. Articulatory framing: articulatory abilities / exploration; 4 months: tongue dorsum/body + lips; 7 months: idem + jaw. 3. More on early sensori-motor maps.

  25. Second experiment: simulating imitation at 4 months. From visuo-motor imitation at 0 months to audio-visuo-motor imitation at 4 months.

  26. Hearing/seeing adult speech [a]; 3- to 5-month-old babies [a i u]; early vocal imitation [Kuhl & Meltzoff, 1996]; about 60% "good responses".

  27. Questions: 1. Is the imitation process visual, auditory, audio-visual? 2. How much exploration is necessary for imitation? 3. Is it possible to reproduce the experimental pattern of performances?

  28. Testing visual imitation: inversion of the lip area Al (targets Al_i, Al_a, Al_u) through the 4-month model (lips, tongue), then categorisation of (f1, f2) as i, a or u.

  29. Visual imitation: simulation results. The experimental data do not match the visual-imitation response profiles.

  30. Testing audio imitation: articulatory inputs (Lh, Tb, Td) feed intermediary control variables (Xh, Yh, Al), which drive the vocal tract and its auditory outputs (F1, F2). The three intermediary control variables correspond to crucial parameters for control, are connected to orosensorial channels, and simplify the control of the 7-parameter articulatory model.

  31. Parametrisation and decomposition. Joint probability: P(Lh ∧ Tb ∧ Td ∧ Xh ∧ Yh ∧ Al ∧ F1 ∧ F2). Articulatory variables Lh, Tb, Td: Gaussian; control variables Xh, Yh, Al: Laplace; auditory variables F1, F2: Gaussian.
  P(Xh ∧ Yh ∧ Al) = P(Xh) P(Yh) P(Al)
  P(Lh ∧ Tb ∧ Td ∧ F1 ∧ F2 | Xh ∧ Yh ∧ Al) = P(Lh ∧ Tb ∧ Td | Xh ∧ Yh ∧ Al) P(F1 ∧ F2 | Xh ∧ Yh ∧ Al)
  P(Lh | Xh ∧ Yh ∧ Al) = P(Lh | Al)
  P(Tb ∧ Td | Xh ∧ Yh ∧ Al) = P(Tb ∧ Td | Xh ∧ Yh)

  32. Dependence structure: P(Lh ∧ Tb ∧ Td ∧ Xh ∧ Yh ∧ Al ∧ F1 ∧ F2) = P(Xh) P(Yh) P(Al) P(Lh | Al) P(Tb | Xh ∧ Yh) P(Td | Xh ∧ Yh ∧ Tb) P(F1 | Xh ∧ Yh ∧ Al) P(F2 | Xh ∧ Yh ∧ Al). Learning: description of the sensori-motor behaviour.
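The point of such a decomposition is that the eight-variable joint can be evaluated (or sampled) factor by factor. A rough sketch with the dependence structure of slide 32, where every parameter of the Gaussian and Laplace conditionals is a placeholder, not a fitted ICP value:

```python
import math

# Toy sketch of slide 32: the joint as a product of small conditionals.

def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def laplace(x, mu, b):
    return math.exp(-abs(x - mu) / b) / (2 * b)

def joint(Lh, Tb, Td, Xh, Yh, Al, F1, F2):
    # P(Xh) P(Yh) P(Al): Laplace priors on the control variables.
    p = laplace(Xh, 0.0, 1.0) * laplace(Yh, 0.0, 1.0) * laplace(Al, 1.0, 0.5)
    p *= gauss(Lh, 0.8 * Al, 0.1)                       # P(Lh | Al)
    p *= gauss(Tb, 0.5 * (Xh + Yh), 0.2)                # P(Tb | Xh, Yh)
    p *= gauss(Td, 0.7 * Tb + 0.1 * Xh + 0.1 * Yh, 0.2) # P(Td | Xh, Yh, Tb)
    p *= gauss(F1, Yh - 0.3 * Al, 0.3)                  # P(F1 | Xh, Yh, Al)
    p *= gauss(F2, Xh + 0.2 * Al, 0.3)                  # P(F2 | Xh, Yh, Al)
    return p

# A configuration near the conditional means scores far higher than a remote one.
print(joint(0.8, 0.0, 0.1, 0.0, 0.0, 1.0, -0.3, 0.2) >
      joint(5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0))  # prints True
```

Each factor only touches a few variables, which is what keeps learning tractable from small amounts of exploration data.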

  33. The challenge: given the exploration defined by Exp. 1, what amount of data (self-vocalisations) is necessary to learn enough to produce 60% correct responses in Exp. 2? The idea: if the amount of learning data is small, the discretisation of the control space should be rough.
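The intuition of slide 33 can be illustrated with a toy coverage count (the cell counts and sample size below are illustrative, not the experiment's values): with few learning samples, a fine discretisation leaves most cells of the control space empty, i.e. unlearned.

```python
import numpy as np

# Toy sketch of slide 33: coarse grids suit small learning sets.
rng = np.random.default_rng(0)

def empty_fraction(n_samples, n_cells):
    """Fraction of control-space cells left empty after n_samples draws."""
    hits = np.zeros(n_cells, dtype=bool)
    hits[rng.integers(0, n_cells, size=n_samples)] = True
    return 1.0 - hits.mean()

for n_cells in (4, 32, 256, 2048):
    print(n_cells, round(empty_fraction(30, n_cells), 2))
# With ~30 vocalisations, a 4-cell space is essentially fully covered,
# while a 2048-cell space stays almost entirely empty.
```

This is the qualitative trade-off behind the optimal learning-space vs. control-space sizes explored on slides 34 and 35.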

  34. Inversion results: RMS audio error (F1, F2) of the inversion process (in Bark) as a function of the size of the learning space, for control-space sizes 4, 32, 256 and 2048.

  35. Optimal learning-space size vs. control-space size (random, 4, 32, 256, 2048).

  36. Simulating audio-motor imitation: inversion of the audio targets [i a u] (formant targets F12_i, F12_a, F12_u) through the 4-month model (lips, tongue), then categorisation of (f1, f2) as i, a or u.

  37. Simulation results for control-space sizes 2048, 256, 32 and 4, compared with infants.

  38. Results. Reality: audio-visual targets. Simulations: known auditory targets. Productions.

  39. Conclusion II. 1. Ten to thirty vocalisations are enough for an infant to learn to produce 60% good vocalisations in the audio-imitation paradigm! 2. Three major factors intervene in the baby android's performance: learning size, control size, and the variance distribution in the learning set (not shown here).

  40. Final conclusions and perspectives. 1. Some of the exploration and imitation behaviour of human babies is reproduced by their android cousins (feasibility / understanding). 2. The developmental path must be further explored, and the baby android must be questioned about what it really learned and what it can do at the end of the learning process.
