
Continuous acoustic detail affects spoken word recognition

Continuous acoustic detail affects spoken word recognition: Implications for cognition, development and language disorders. Bob McMurray, University of Iowa, Dept. of Psychology. Collaborators: Richard Aslin, Michael Tanenhaus, David Gow, J. Bruce Tomblin, Joe Toscano.


Presentation Transcript


  1. Continuous acoustic detail affects spoken word recognition: Implications for cognition, development and language disorders. Bob McMurray, University of Iowa, Dept. of Psychology

  2. Collaborators: Richard Aslin, Michael Tanenhaus, David Gow, J. Bruce Tomblin, Joe Toscano, Cheyenne Munson, Dana Subik, Julie Markant

  3. Why Speech and Word Recognition? • Interface between perception and cognition. • Basic categories: meaning. • Continuous input -> discrete representations. • Meaningful stimuli are almost always temporal: music, visual scenes (across saccades), language. • We understand the: cognitive processes (word recognition); perceptual processes (speech perception); ecology of the input (phonetics). • Speech is important: disordered language.

  4. Divisions, Divisions… Perception (& Action) / Cognition, by field: • Psychology: Speech Perception / Word Recognition, Sentence Processing • Linguistics: Phonetics / Phonology, The Lexicon • Speech / Language Pathology: Speech, Hearing / Language

  5. Divisions, Divisions… Divisions are useful for framing research and focusing questions. But divisions between domains of study can become implicit models of cognitive processing.

  6. Divisions in Spoken Language Understanding • Speech Perception: categorization of acoustic input into sublexical units. • Word Recognition: identification of the target word from active sublexical units. [Figure: acoustic input -> sublexical units (/b/, /l/, /a/, /p/, /la/, /ip/) -> lexicon]

  7. Divisions yield processes • Speech Perception: pattern recognition, normalization processes, stream segregation. • Word Recognition: competition, activation, constraint satisfaction. [Figure: acoustic input -> sublexical units -> lexicon]

  8. Processes yield models • Speech Perception: extract invariant phonemes and features; discard continuous variation. • Word Recognition: identify a single referent; ignore competitors. [Figure: acoustic input -> sublexical units -> lexicon, with stages labeled "Reduce continuous variance" and "Reduce variance"]

  9. The Variance Reduction Model [Figure: phonemes (etc.) -> words, each stage labeled "Remove variance"] Variance Reduction Model (VRM): understanding speech is a process of progressively extracting invariant, discrete representations from variable, continuous input. Continuous speech cues play a minimal role in word recognition (and probably wouldn't be helpful anyway).

  10. Temporal Integration: Variance Reduction Mechanisms. The VRM might apply if speech were static. Example: "goon". Goal: identify /u/. Signal: low F1 and F2, high F3. Noise: initially, F2 decreasing; later, F2 increasing; presence of an anti-formant.

  11. Temporal Integration But the dynamic properties make it more difficult: earlier parts of the signal are gone (maybe held in STM?), and later parts haven't happened yet. Example: "goon". Goal: identify /u/. Signal: low F1 and F2, high F3. Noise: initially, F2 decreasing; later, F2 increasing; presence of an anti-formant.

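  12. Temporal Integration: Variance Utilization Mechanisms. [Figure annotations: prior /g/; upcoming /n/] But the dynamic properties make it more difficult: earlier parts of the signal are gone (maybe held in STM?), and later parts haven't happened yet. Example: "goon". Goal: identify /u/. Signal: low F1 and F2, high F3. Signal' (formerly "noise"): initially, F2 decreasing; later, F2 increasing; presence of an anti-formant. Under variance utilization, these cues carry information about the prior /g/ and the upcoming /n/.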

  13. Goals [Figure: phonemes (etc.) -> words, each stage labeled "Remove variance"] 1) Replace the Variance Reduction Model with the Variance Utilization Model. 2) Show that normal lexical activation processes can serve as variance utilization mechanisms. 3) Speculatively (and not so speculatively) examine the consequences for: • temporal integration / short-term memory; • development; • non-normal development.

  14. Outline 1) Review: • origins of the VRM; • spoken word recognition. 2) Empirical test. 3) The VUM: • lexical locus; • temporal integration; • SLI proposal. 4) Developmental consequences: • empirical tests; • computational model; • CI proposal.

  15. Word Recognition [Figure: as "ba…kery" unfolds, the candidates basic, barrier, bait, barricade and baby are crossed out (X), leaving bakery.] • Online spoken word recognition: information arrives sequentially. • Fundamental problem: at early points in time, the signal is temporarily ambiguous. • Later-arriving information disambiguates the word.

  16. Word Recognition • Current models of spoken word recognition: • Immediacy: hypotheses are formed from the earliest moments of input. • Activation-based: lexical candidates (words) receive activation to the degree they match the input. • Parallel processing: multiple items are active in parallel. • Competition: items compete with each other for recognition.
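The four properties above can be illustrated with a toy activation-and-competition loop. This is a hypothetical sketch, not TRACE, Shortlist or any model from this talk: letters stand in for phonemes, the lexicon is borrowed from the next slide, and `gain` and `inhibition` are arbitrary assumed values.

```python
# Hypothetical toy model (not TRACE or any published model): letters stand in
# for phonemes, and gain/inhibition values are arbitrary assumptions.
LEXICON = ["beach", "butter", "bump", "putter", "dog"]


def recognize(lexicon, word, gain=0.5, inhibition=0.2):
    """Feed `word` in one segment at a time; return (prefix, activations) per step."""
    act = {w: 0.0 for w in lexicon}
    history = []
    for t in range(1, len(word) + 1):
        prefix = word[:t]               # immediacy: update on every new segment
        total = sum(act.values())
        new_act = {}
        for w in lexicon:               # parallel: every word is updated each step
            match = 1.0 if w.startswith(prefix) else 0.0
            rivals = total - act[w]     # competition: inhibition from other words
            a = act[w] + gain * match - inhibition * rivals
            new_act[w] = min(1.0, max(0.0, a))  # keep activation in [0, 1]
        act = new_act
        history.append((prefix, act))
    return history


for prefix, act in recognize(LEXICON, "butter"):
    print(prefix, {w: round(a, 2) for w, a in act.items()})
```

After "b", beach, butter and bump are equally active; bump stays partly active through "but" before competition from butter drives it out.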

  17. Word Recognition [Figure: activation over time of beach, butter, bump, putter and dog as the input "b… u… tt… e… r" unfolds]

  18. Word Recognition These processes have been well defined for a phonemic representation of the input. [Figure: phonemic transcription of an example word] • There is considerably less ambiguity if we consider subphonemic information. • Bonus: processing dynamics may solve problems in speech perception. Example: subphonemic effects of motor processes.

  19. Coarticulation n n ee t c k Any action reflects future actions as it unfolds. Example:Coarticulation Articulation (lips, tongue…) reflectscurrent, futureandpastevents. Subtle subphonemic variation in speech reflects temporal organization. Sensitivity to theseperceptualdetails might yield earlier disambiguation. Lexical activation could retain these perceptual details.

  20. Review: These processes have largely been ignored because of a history of evidence that perceptual variability gets discarded. Example: categorical perception.

  21. Categorical Perception [Figure: identification (% /pa/) and discrimination (%) plotted against VOT, from B to P] • Sharp identification of tokens on a continuum. • Discrimination is poor within a phonetic category. Subphonemic variation in VOT is discarded in favor of a discrete symbol (phoneme).
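One standard way to make the strong form of CP concrete is the labeling-based prediction for ABX discrimination: if listeners can compare only covert category labels, two tokens are discriminated at chance unless they receive different labels. The sketch below assumes a logistic identification function; the 17.25 ms boundary is borrowed from the identification results reported later in this talk, and the slope is an arbitrary assumption.

```python
import math


def identify_p(vot_ms, boundary=17.25, slope=0.6):
    """Assumed logistic identification function: P(/p/ response | VOT)."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary)))


def predicted_abx(vot_a, vot_b, boundary=17.25, slope=0.6):
    """Labeling-based ABX prediction: P(correct) = 0.5 * (1 + (pA - pB)**2).

    Listeners are assumed to discriminate only via covert labels and to
    guess (50%) whenever the labels do not differ.
    """
    pa = identify_p(vot_a, boundary, slope)
    pb = identify_p(vot_b, boundary, slope)
    return 0.5 * (1.0 + (pa - pb) ** 2)


print(round(predicted_abx(0, 10), 3))   # both tokens within /b/: near chance
print(round(predicted_abx(10, 20), 3))  # pair straddles the boundary: well above chance
```

Under this model, discrimination peaks at the boundary and falls to chance within categories; the psychophysical work listed on the next slide is evidence that real listeners can do better than this within categories.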

  22. Categorical Perception Evidence against the strong form of categorical perception from psychophysical-type tasks: • Discrimination tasks: Pisoni & Tash (1974); Pisoni & Lazarus (1974); Carney, Widin & Viemeister (1977) • Training: Samuel (1977); Pisoni, Aslin, Perey & Hennessy (1982) • Goodness ratings: Miller (1997); Massaro & Cohen (1983)

  23. Variance Reduction Model [Figure: phonemes (etc.) -> words, each stage labeled "Remove variance"] CP enabled a fundamental independence of speech perception and spoken word recognition. Evidence against CP was seen as supporting the VRM (auditory vs. phonological processing mode). Critical prediction: continuous variation in the signal should not affect word recognition.

  24. Experiment 1: Does within-category acoustic detail systematically affect higher-level language? Is there a gradient effect of subphonemic detail on lexical activation?

  25. McMurray, Aslin & Tanenhaus (2002) A gradient relationship would yield systematic effects of subphonemic information on lexical activation. If this gradiency is useful for temporal integration, it must be preserved over time. We need a design sensitive to both acoustic detail and the detailed temporal dynamics of lexical activation.

  26. Acoustic Detail Use a speech continuum: more steps yield a better picture of the acoustic mapping. KlattWorks: generate synthetic continua from natural speech. • 9-step VOT continua (0-40 ms). • 6 pairs of words: beach/peach, bale/pale, bear/pear, bump/pump, bomb/palm, butter/putter. • 6 fillers: lamp, leg, lock, ladder, lip, leaf; shark, shell, shoe, ship, sheep, shirt.
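The continuum and word pairs described above can be enumerated as follows. This is only a sketch of the design (9 evenly spaced VOT steps crossed with the 6 word pairs, giving 5 ms steps); the helper names are hypothetical, and it does not model KlattWorks synthesis or the full trial list.

```python
from itertools import product

# Word pairs from the slide; the 5 ms spacing follows from 9 steps spanning 0-40 ms.
PAIRS = [("beach", "peach"), ("bale", "pale"), ("bear", "pear"),
         ("bump", "pump"), ("bomb", "palm"), ("butter", "putter")]


def vot_continuum(start_ms=0.0, end_ms=40.0, steps=9):
    """Evenly spaced VOT values (ms) from start_ms to end_ms inclusive."""
    step = (end_ms - start_ms) / (steps - 1)
    return [start_ms + i * step for i in range(steps)]


def stimulus_set(pairs=PAIRS, vots=None):
    """Hypothetical helper: every (b-word, p-word, VOT) continuum token."""
    vots = vot_continuum() if vots is None else vots
    return [(b, p, v) for (b, p), v in product(pairs, vots)]


print(vot_continuum())      # 0, 5, ..., 40 ms
print(len(stimulus_set()))  # 6 pairs x 9 steps
```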

  27. Acoustic Detail

  28. Temporal Dynamics How do we tap online recognition? With an online task: eye movements. Subjects hear spoken language and manipulate objects in a visual world. The visual world includes a set of objects with interesting linguistic properties: a beach, a peach and some unrelated items. Eye movements to each object are monitored throughout the task. (Tanenhaus, Spivey-Knowlton, Eberhard & Sedivy, 1995)

  29. Temporal Dynamics Why use eye movements and the visual world paradigm? • Relatively natural task. • Eye movements are generated very fast (within 200 ms of the first bit of information). • Eye movements are time-locked to speech. • Subjects aren't aware of their eye movements. • Fixation probability maps onto lexical activation.
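The last point assumes a linking hypothesis between lexical activation and fixations. A rule often used with the visual world paradigm is a Luce choice rule over exponentiated activations; the sketch below assumes that rule, and both the scaling parameter `k` and the activation values are arbitrary, hypothetical choices.

```python
import math


def luce_fixation_probs(activations, k=7.0):
    """Assumed linking rule: p_i = exp(k * a_i) / sum_j exp(k * a_j).

    `k` is a free scaling parameter (the value here is arbitrary); larger k
    makes predicted fixations more winner-take-all.
    """
    strengths = {w: math.exp(k * a) for w, a in activations.items()}
    total = sum(strengths.values())
    return {w: s / total for w, s in strengths.items()}


# Hypothetical activations for a bear/pear/lamp/ship display.
probs = luce_fixation_probs({"bear": 0.8, "pear": 0.3, "lamp": 0.0, "ship": 0.0})
print({w: round(p, 3) for w, p in probs.items()})
```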

  30. Task: a moment to view the items.

  31. Task

  32. Task: "Bear" (repeat 1080 times)

  33. Identification Results [Figure: proportion /p/ responses vs. VOT (0-40 ms), from B to P] High agreement across subjects and items for the category boundary: by subject, 17.25 +/- 1.33 ms; by item, 17.24 +/- 1.24 ms.
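The slides do not say how the boundary was estimated (fitting a logistic is common). As a simpler, swapped-in alternative, the 50% crossover can be found by linear interpolation between continuum steps. The helper below is hypothetical and is checked on synthetic logistic responses, not on the experiment's data.

```python
import math


def boundary_50(vots, prop_p):
    """VOT (ms) where proportion /p/ first rises through 0.5, by linear interpolation."""
    points = list(zip(vots, prop_p))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < 0.5 <= y1:
            return x0 + (0.5 - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("identification function never crosses 0.5")


# Synthetic check only: responses generated from a logistic with a 17.5 ms boundary.
vots = [0, 5, 10, 15, 20, 25, 30, 35, 40]
synthetic = [1 / (1 + math.exp(-0.6 * (v - 17.5))) for v in vots]
print(round(boundary_50(vots, synthetic), 2))
```

Interpolation only looks at the two steps that straddle 0.5, so it is cruder than a fitted logistic, which uses every step.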

  34. Eye-Movement Analysis [Figure: fixations on individual trials (trials 1-5) aggregated into % fixations over time; a 200 ms mark is shown] Target = bear; competitor = pear; unrelated = lamp, ship.
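The aggregation from individual trials to a fixation-proportion curve can be sketched as below. Assumptions: each trial is a list of (object, start_ms, end_ms) fixations timed from word onset, time is sampled at the start of each 20 ms bin, and `fixation_proportions` is a hypothetical name; the example trials are made up.

```python
def fixation_proportions(trials, objects, t_max=2000, bin_ms=20):
    """Proportion of trials fixating each object in each time bin.

    trials: one list per trial of (object, start_ms, end_ms) fixations, timed
    from word onset (an assumed data format). Each bin is sampled at its start.
    """
    n_bins = t_max // bin_ms
    counts = {obj: [0] * n_bins for obj in objects}
    for fixations in trials:
        for b in range(n_bins):
            t = b * bin_ms
            for obj, start, end in fixations:
                if obj in counts and start <= t < end:
                    counts[obj][b] += 1
                    break  # the eyes are on at most one object at a time
    return {obj: [c / len(trials) for c in row] for obj, row in counts.items()}


# Two made-up trials for a bear/pear/lamp/ship display.
trials = [
    [("bear", 0, 1000)],
    [("pear", 0, 400), ("bear", 400, 2000)],
]
curves = fixation_proportions(trials, ["bear", "pear", "lamp", "ship"])
print(curves["bear"][0], curves["pear"][0], curves["bear"][30])
```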

  35. Eye-Movement Results [Figure: fixation proportion vs. time (0-2000 ms) for VOT = 0 ms and VOT = 40 ms, by response] More looks to the competitor than to unrelated items.

  36. Eye-Movement Results Given that the subject heard "bear" and clicked on "bear"… how often was the subject looking at the "pear"? [Figure: schematic fixation proportion vs. time for target and competitor, contrasting categorical results with a gradient effect]

  37. Eye-Movement Results [Figure: competitor fixations vs. time since word onset (0-2000 ms) for each VOT step (0-40 ms), by response] Long-lasting gradient effect: seen throughout the time course of processing.

  38. Eye-Movement Results [Figure: looks to the competitor vs. VOT (0-40 ms), by response, with the category boundary marked] Area under the curve: clear effects of VOT (B: p = .017*; P: p < .001***). Linear trend (B: p = .023*; P: p = .002***).
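The two summaries on this slide can be sketched as a trapezoidal area under each competitor curve and an ordinary least-squares slope of that area across VOT steps. This shows only the per-condition summaries; the reported p-values come from statistical tests across subjects and items that are not reproduced here, the helper names are hypothetical, and the curves below are made up.

```python
def area_under_curve(curve, bin_ms=20):
    """Trapezoidal area under a fixation-proportion curve sampled every bin_ms ms."""
    return sum((a + b) / 2 * bin_ms for a, b in zip(curve, curve[1:]))


def linear_slope(xs, ys):
    """Ordinary least-squares slope of ys on xs (a simple linear-trend summary)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Made-up competitor curves for three VOT steps within the /b/ category.
curves_by_vot = {0: [0.0, 0.02, 0.02], 5: [0.0, 0.03, 0.03], 10: [0.0, 0.05, 0.04]}
vots = sorted(curves_by_vot)
areas = [area_under_curve(curves_by_vot[v]) for v in vots]
print(areas, linear_slope(vots, areas))
```

A positive slope within the /b/ category corresponds to more competitor looks as VOT approaches the boundary, the pattern the slide reports.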

  39. Eye-Movement Results: Unambiguous Stimuli Only [Figure: looks to the competitor vs. VOT (0-40 ms), by response, with the category boundary marked] Clear effects of VOT (B: p = .014*; P: p = .001***). Linear trend (B: p = .009**; P: p = .007**).

  40. Summary Subphonemic acoustic differences in VOT have a gradient effect on lexical activation. • Gradient effect of VOT on looks to the competitor. • The effect holds even for unambiguous stimuli. • The effect seems to be long-lasting. Consistent with a growing body of work using priming (Andruski, Blumstein & Burton, 1994; Utman, Blumstein & Burton, 2000; Gow, 2001, 2002).

  41. Extensions [Figure: example display (Bear, P, B, L, Sh)] The basic effect has been extended to other phonetic cues, suggesting a general property of word recognition: • Voicing (b/p)¹ • Laterality (l/r), manner (b/w), place (d/g)¹ • Vowels (i/I, /)² • Natural speech (VOT)³ • X Metalinguistic tasks³ ¹McMurray, Clayards, Tanenhaus & Aslin (2004) ²McMurray & Toscano (in prep) ³McMurray, Aslin, Tanenhaus, Spivey & Subik (submitted)

  42. Lexical Sensitivity [Figure: competitor fixations (looks to B) vs. VOT (0-40 ms) for Response = P and Response = B, with the category boundary marked]


  44. The Variance Utilization Model 1) Word recognition is systematically sensitive to subphonemic acoustic detail. 2) Acoustic detail is represented as gradations in activation across the lexicon. 3) Normal word recognition processes do the work of: • maintaining detail; • sharpening categories; • anticipating upcoming material; • resolving prior ambiguity.

  45. The Variance Utilization Model [Figure: activation of bump, pump, dump, bun, bumper and bomb over time as the input "b… u… m… p…" unfolds; b/p cue highlighted] Gradations in phonetic cues are preserved as relative lexical activation.
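A toy illustration of "gradations preserved as relative activation": a hypothetical graded mapping from VOT to support for bump vs. pump, followed by later segments that match both words equally. The logistic mapping, the multiplicative update, and all parameter values are assumptions made for this sketch (activations are left unbounded for simplicity), not the VUM's actual mechanics.

```python
import math


def onset_support(vot_ms, boundary=17.25, slope=0.25):
    """Hypothetical graded mapping from VOT to support for b- vs. p-initial words."""
    p = 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary)))
    return {"bump": 1.0 - p, "pump": p}


def add_shared_input(act, gain=0.2):
    """Later segments (u, m, p) match bump and pump equally: both get the same
    proportional boost, so the VOT-driven ratio between them is unchanged."""
    return {w: a * (1.0 + gain) for w, a in act.items()}


for vot in (0, 5, 10, 15):
    act = onset_support(vot)
    ratio_at_onset = act["pump"] / act["bump"]
    for _ in range(3):  # hear "u", "m", "p"
        act = add_shared_input(act)
    print(vot, round(ratio_at_onset, 3), round(act["pump"] / act["bump"], 3))
```

Every token here is heard as "bump", yet the competitor's relative activation still grows with VOT and survives the later segments, which is the sense in which lexical activation can carry the detail forward.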

  46. The Variance Utilization Model [Figure: activation of bump, pump, dump, bun, bumper and bomb over time as the input "b… u… m… p…" unfolds; b/d cue highlighted] Gradations in phonetic cues are preserved as relative lexical activation.

  47. The Variance Utilization Model [Figure: activation of bump, pump, dump, bun, bumper and bomb over time as the input "b… u… m… p…" unfolds; vowel length highlighted] Non-phonemic distinctions are preserved (e.g., vowel length: Gow & Gordon, 1995; Salverda, Dahan & McQueen, 2003).

  48. The Variance Utilization Model [Figure: activation of bump, pump, dump, bun, bumper and bomb over time as the input "b… u… m… p…" unfolds; n/m cue highlighted, then marked "n/m info lost"] Material is only retained until it is no longer needed. Words are a conveniently sized unit.

  49. The Variance Utilization Model [Figure: activation of bump, pump, dump, bun, bumper and bomb over time as the input "b… u… m… p…" unfolds] No need for explicit short-term memory: lexical activation persists over time.

  50. The Variance Utilization Model [Figure: activation of bump, pump, dump, bun, bumper and bomb over time as the input "b… u… m… p…" unfolds] Lexical competition: perceptual warping (à la CP) results from natural competition processes.
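Perceptual warping from competition can be illustrated with a simple sharpening step standing in for lateral inhibition: each iteration squares the two words' activations and renormalizes, so the stronger word gains at the weaker one's expense. This is an assumed stand-in, not the competition dynamics of any specific model, and the graded onset mapping and parameters are hypothetical.

```python
import math


def onset_support_p(vot_ms, boundary=17.25, slope=0.25):
    """Hypothetical graded support for the p-word (the b-word gets 1 minus this)."""
    return 1.0 / (1.0 + math.exp(-slope * (vot_ms - boundary)))


def compete(p, n_iter=2):
    """Stand-in for lateral inhibition between two words: square both
    activations and renormalize, so the stronger word gains at the other's expense."""
    for _ in range(n_iter):
        b = 1.0 - p
        p = p * p / (p * p + b * b)
    return p


before = {v: onset_support_p(v) for v in (0, 5, 10, 20)}
after = {v: compete(p) for v, p in before.items()}
print("within /b/ (0 vs 5 ms):", before[5] - before[0], "->", after[5] - after[0])
print("across boundary (10 vs 20 ms):", before[20] - before[10], "->", after[20] - after[10])
```

With only a couple of iterations the ordering across VOT steps survives, so a gradient trace remains, while differences within a category shrink and differences across the boundary grow: the CP-like compression and expansion the slide attributes to competition.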
