A brain’s-eye-view of speech perception

A brain’s-eye-view of speech perception

David Poeppel

Cognitive Neuroscience of Language Lab

Department of Linguistics and Department of Biology

Neuroscience and Cognitive Science Program

University of Maryland College Park

Colleagues:

  • Allen Braun, NIH

  • Greg Hickok, UC Irvine

  • Jonathan Simon, Univ. Maryland

Students:

  • Anthony Boemio

  • Maria Chait

  • Huan Luo

  • Virginie van Wassenhove


“chair”

“uncomfortable”

“lunch”

“soon”

encoding?

Is this a hard problem?

Yes!

If it could be solved straightforwardly (e.g., by machine), Mark Liberman would be in Tahiti having cold beers.

representation?


Outline

(1) Fractionating the problem in space:

Towards a functional anatomy of speech perception

(2) Fractionating the problem in time:

Towards a functional physiology of speech perception

- A hypothesis about the quantization of time

- Psychophysical evidence for temporal integration

- Imaging evidence


Unifying concept: distinctive feature

analysis of auditory signal → spectro-temporal representation → FEATURES

auditory-lexical interface:
interface with lexical items, word recognition

hypothesis about storage: distinctive features

[-voice] [+voice] [+voice]
[+labial] [+high] [+labial]
[-round] [+round] [-round]
[....] [....] [....]

auditory-motor interface:
coordinate transform from acoustic to articulatory space

production, articulation of speech

hypothesis about production: distinctive features

[-voice] [+voice]
[+labial] [+high]
[....] [....]


pIFG/dPM (left): articulatory-based speech codes

Area Spt (left): auditory-motor interface

STG (bilateral): acoustic-phonetic speech codes

pMTG (left): sound-meaning interface

Hickok & Poeppel (2000), Trends in Cognitive Sciences

Hickok & Poeppel (in press), Cognition


Indefrey & Levelt (in press), Cognition

Meta-analysis of neuroimaging data, perception/production overlap

Shared neural correlates of word production and perception processes:

  • Bilateral mid/posterior STG

  • L anterior STG

  • L mid/posterior MTG

  • L posterior IFG

  • MTG and IFG overlap when controlling for the overt/covert distinction across tasks

  • Hypothesized functions:
    - lexical selection (MTG)
    - lexical phonological code retrieval (MTG)
    - post-lexical syllabification (IFG)



Possible Subregions of Inferior Frontal Gyrus (Burton, 2001)

Auditory Studies

Burton et al. (2000), Demonet et al. (1992, 1994), Fiez et al. (1995), Zatorre et al. (1992, 1996)

Visual Studies

Sergent et al. (1992, 1993), Poldrack et al. (1999), Paulesu et al. (1993, 1996), Shaywitz et al. (1995)


Auditory lexical decision versus FM/sweeps (a), CP/syllables (b), and rest (c)

(a)

(b)

(c)

D. Poeppel et al. (in press)

z=+6

z=+9

z=+12


fMRI (yellow blobs) and MEG (red dots) recordings of speech perception

show pronounced bilateral activation of left and right temporal cortices

T. Roberts & D. Poeppel

(in preparation)


Binder et al. (2000)


pIFG/dPM (left): articulatory-based speech codes

Area Spt (left): auditory-motor interface

STG (bilateral): acoustic-phonetic speech codes

pMTG (left): sound-meaning interface

Hickok & Poeppel (2000), Trends in Cognitive Sciences

Hickok & Poeppel (in press), Cognition


Outline

(1) Fractionating the problem in space:

Towards a functional anatomy of speech perception

(2) Fractionating the problem in time:

Towards a functional physiology of speech perception

- A hypothesis about the quantization of time

- Psychophysical evidence for temporal integration

- Imaging evidence




Acoustic and articulatory phonetic phenomena occur on different time scales:

  • Phenomena at the scale of formant transitions, subsegmental cues: "short stuff", order of magnitude 20-50 ms (fine structure)

  • Phenomena at the scale of syllables (tonality and prosody): "long stuff", order of magnitude 150-250 ms (envelope)
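The fine-structure/envelope split above can be sketched with a Hilbert transform. The toy signal below is an illustrative assumption, not one of the talk's stimuli: a 1 kHz carrier (fine structure, tens-of-ms scale) amplitude-modulated at 4 Hz (envelope, syllable scale).

```python
import numpy as np
from scipy.signal import hilbert

fs = 16000  # sample rate (Hz)
t = np.arange(0, 0.5, 1 / fs)
# toy "speech-like" signal: 1 kHz carrier modulated at a 4 Hz syllabic rate
x = (1 + np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)

analytic = hilbert(x)
envelope = np.abs(analytic)                   # slow modulation (~150-250 ms scale)
fine_structure = np.cos(np.angle(analytic))   # fast carrier (~20-50 ms scale)
```

The envelope recovers the slow (1 + sin) modulation, peaking near 2, while the fine structure is the unit-amplitude carrier.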


Does different granularity in time matter?

Segmental and subsegmental information:

serial order in speech: fool/flu, carp/crap, bat/tab

Supra-segmental information:

prosody: "Sleep during lecture!" vs. "Sleep during lecture?"


The local/global distinction can be conceptualized as a multi-resolution analysis in time

A binding process feeds further processing:

  • Supra-segmental information (time scale ~200 ms): syllabicity, metrics, tone

  • Segmental information (time scale ~20-50 ms): features, segments


Outline

(1) Fractionating the problem in space:

Towards a functional anatomy of speech perception

(2) Fractionating the problem in time:

Towards a functional physiology of speech perception

- A hypothesis about the quantization of time

- Psychophysical evidence for temporal integration

- Imaging evidence


Temporal integration windows

Psychophysical and electrophysiological evidence suggests that perceptual information is integrated and analysed in temporal integration windows (von Bekesy 1933; Stevens and Hall 1966; Näätänen 1992; Theunissen and Miller 1995; etc.). The importance of the concept of a temporal integration window is that it suggests the discontinuous processing of information in the time domain. The CNS, on this view, treats time not as a continuous variable but as a series of temporal windows, and extracts data from a given window.

arrow of time, physics

arrow of time, Central Nervous System
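A minimal sketch of this discontinuous treatment of time: reduce a continuous signal to one value per non-overlapping integration window. RMS is an arbitrary choice of per-window summary statistic; the 25 ms and 200 ms sizes are the two window scales discussed below.

```python
import numpy as np

def integrate_in_windows(signal, fs, window_ms):
    """Discretize a signal into non-overlapping temporal integration
    windows, extracting one summary value (here, RMS) per window."""
    win = int(fs * window_ms / 1000)
    n = len(signal) // win
    chunks = signal[: n * win].reshape(n, win)
    return np.sqrt((chunks ** 2).mean(axis=1))

fs = 1000
x = np.random.default_rng(0).standard_normal(fs)  # 1 s of noise
short = integrate_in_windows(x, fs, 25)    # 40 values: fine-grained sampling
long_ = integrate_in_windows(x, fs, 200)   # 5 values: coarse-grained sampling
```

The same one-second signal yields 40 data points under 25 ms windows but only 5 under 200 ms windows.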


Asymmetric sampling/quantization of the speech waveform:

short temporal integration windows (25 ms) vs. long temporal integration windows (200 ms)

This p a p er i s h ar d t o p u b l i sh


Two spectrograms of the same word illustrate how different analysis windows highlight different aspects of the sounds.

(a) High time, low frequency resolution: each glottal pulse visible as a vertical striation

(b) Low time, high frequency resolution: each harmonic visible as a horizontal stripe
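This time/frequency trade-off can be reproduced with an STFT: short windows give fine time resolution but wide frequency bins, long windows the reverse. The test tone and window lengths below are illustrative choices.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 8000
t = np.arange(0, 0.5, 1 / fs)
x = np.sin(2 * np.pi * 200 * t)  # steady 200 Hz tone

# (a) short analysis window: fine time resolution, coarse frequency resolution
f_a, t_a, S_a = spectrogram(x, fs, nperseg=64)
# (b) long analysis window: coarse time resolution, fine frequency resolution
f_b, t_b, S_b = spectrogram(x, fs, nperseg=1024)

df_a = f_a[1] - f_a[0]  # frequency bin width for (a): 125 Hz
df_b = f_b[1] - f_b[0]  # frequency bin width for (b): ~7.8 Hz
```

Window (a) produces many more time frames but cannot resolve individual harmonics; window (b) resolves them at the cost of temporal detail.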


Hypothesis: Asymmetric Sampling in Time (AST)

Left temporal cortical areas preferentially extract information over 25 ms temporal integration windows. Right hemisphere areas preferentially integrate over long, 150-250 ms integration windows.

By assumption, the auditory input signal has a neural representation that is bilaterally symmetric (e.g. at the level of core); beyond the initial representation, the signal is elaborated asymmetrically in the time domain.

Another way to conceptualize the AST proposal is to say that the sampling rate of non-primary auditory areas is different, with LH sampling at high frequencies (~40 Hz) and RH sampling at low frequencies (4-10 Hz).


a. Physiological lateralization: symmetric representation of spectro-temporal receptive fields in primary auditory cortex; temporally asymmetric elaboration of perceptual representations in non-primary cortex. For LH and RH, the proportion of neuronal ensembles is plotted against the size of the temporal integration window (25-250 ms) and its associated oscillatory frequency (40-4 Hz).

b. Functional lateralization: analyses requiring high temporal resolution (e.g. formant transitions) lateralize to LH; analyses requiring high spectral resolution (e.g. intonation contours) lateralize to RH.


Asymmetric sampling in time (AST) characteristics

• AST is an example of functional segregation, a standard concept.

• AST is an example of multi-resolution analysis, a signal-processing strategy common in other cortical domains (cf. visual areas MT and V4 which, among other differences, have phasic versus tonic firing properties, respectively).

• AST speaks to the "granularity" of perceptual representations: the model suggests that there exist basic perceptual representations that correspond to the different temporal windows (e.g. featural information is as basic as the syllable envelope, on this view).

• The AST model connects in plausible ways to the local versus global distinction: there are multiple representations of a given signal on different scales (cf. wavelets).

Global ==> 'large-chunk' analysis, e.g., syllabic level

Local ==> 'small-chunk' analysis, e.g., subsegmental level


(The AST figure is shown again: physiological lateralization, with symmetric spectro-temporal receptive fields in primary auditory cortex and temporally asymmetric elaboration in non-primary cortex, and functional lateralization, with high temporal resolution analyses such as formant transitions in LH and high spectral resolution analyses such as intonation contours in RH.)


Outline

(1) Fractionating the problem in space:

Towards a functional anatomy of speech perception

(2) Fractionating the problem in time:

Towards a functional physiology of speech perception

- A hypothesis about the quantization of time
  • AST model

- Psychophysical evidence for temporal integration

- Imaging evidence


Perception of FM sweeps
Huan Luo, Mike Gordon, Anthony Boemio, David Poeppel


FM Sweep Example

waveform

80msec, from 3-2 kHz, linear FM sweep

spectrogram


The rationale

  • Important cues for speech perception:

    Formant transition in speech sounds

    (For example, F2 direction can distinguish /ba/ from /da/)

  • Importance in tone languages

  • Vertebrate auditory system is well equipped to analyze FM signals.


Tone languages

  • For example, Chinese, Thai…

  • The direction of FM (of the fundamental frequency) is important in the language to make lexical distinctions.

  • (Four tones in Chinese)

    /Ma 1/, /Ma 2/ , /Ma 3/, /Ma 4/


Questions

  • How good are we at discriminating these signals?

    determine the threshold of the duration of stimuli (corresponding to rate) for the detection of FM direction

    Any performance difference between UP and DOWN detection?

  • Will language experience affect the performance of such a basic perceptual ability?


Stimuli

  • Linearly frequency modulated

  • Frequency range studied: 2-3 kHz (0.5 oct)

  • Two directions (up / down)

  • Changing FM rate (frequency range/time) by changing duration; for each frequency range, the frequency span is kept constant (slow / fast)

  • Stimulus duration: from 5 ms (100 oct/sec) to 640 ms (0.8 oct/sec)

Tasks

• Detection and discrimination of UP versus DOWN

• 2AFC, 2IFC, 3IFC

English speakers

  • 3 frequency ranges relevant to speech (approximately F1, F2, F3 ranges)

  • single-interval 2-AFC

Two main findings:

  • threshold for UP at 20 ms

  • UP better than DOWN

Panels: 2-3 kHz, 1-1.5 kHz, 600-900 Hz

Gordon & Poeppel (2001), JASA-ARLO


2IFC

  • To eliminate a possible bias strategy that subjects could use

  • To see whether the asymmetric performance of the English subjects is due to an "UP preference bias"

The two sounds have the same duration, so the only difference is direction (Interval 1: UP; Interval 2: DOWN).

Which interval (1 or 2) contains the sound with the specified direction?


Results for Chinese Subjects

No significant difference; the threshold for both UP and DOWN is about 20 ms.


Results for English Subjects

No difference now between UP and DOWN; the threshold for both is at 20 ms. No difference between Chinese and English subjects now.


3IFC

Standard: UP; Interval 1: UP; Interval 2: DOWN

Choose which interval contains the DIFFERENT sound among the three (different quality rather than only direction).


3IFC versus 2IFC

No difference between Chinese and English subjects; the threshold is confirmed at 20 ms.


Conclusion

  • Importance of 20 ms as the threshold for discrimination of FM sweeps

    - corresponds to the temporal order threshold determined by Hirsh (1959)

    - consistent with Schouten (1985, 1989) testing FM sweeps

    - this basic threshold arguably reflects the shortest integration window that generates robust auditory percepts


Click trains

Anthony Boemio & David Poeppel




Auditory-visual integration: the McGurk effect
Virginie van Wassenhove, Ken Grant, David Poeppel


McGurk Effect

  • Audiovisual (AV) token

  • Visual (V) token

  • Auditory (A) token


Identification Task (3AFC): ApVk

TWI

True bimodal responses

Response rate as a function of SOA (ms) in the ApVk McGurk pair.

Mean responses (N=21) and standard errors. Fusion rate (open red squares) and corrected fusion rate (filled red squares, dotted line) are /ta/ responses, visually driven responses (open green triangles) are /ka/, and auditorily driven responses (filled blue circles) are /pa/. A negative value in corrected fusion rate is interpreted as a visually dominated error response /ta/.


Simultaneity Judgment Task (2AFC): ApVk vs. AtVt and AbVg vs. AdVd

Simultaneity judgment as a function of SOA (ms) in both incongruent and congruent conditions (ApVk and AtVt N=21; AbVg and AdVd N=18). The congruent conditions (open symbols) are associated with a broader and higher simultaneity judgment profile than the incongruent conditions (filled symbols).


Temporal Window of Integration (TWI) across Tasks and Bimodal Speech Stimuli

Stimulus | Task | A Lead, Left Boundary (ms) | A Lag, Right Boundary (ms) | Plateau Center (ms) | Window Size (ms)
ApVk     | ID   | -25 | +136 | +56 | 161
ApVk     | S    | -44 | +117 | +37 | 161
AtVt     | S    | -80 | +125 | +23 | 205
AbVg     | ID   | -34 | +174 | +70 | 208
AbVg     | S    | -37 | +122 | +43 | 159
AdVd     | S    | -74 | +131 | +29 | 205
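The Window Size and Plateau Center columns follow from the two boundaries (size = right minus left boundary; center = their midpoint, to 0.5 ms rounding). A quick consistency check over the tabulated values:

```python
# Rows from the TWI table:
# (stimulus, task, A-lead boundary, A-lag boundary, plateau center, window size), all in ms
rows = [
    ("ApVk", "ID", -25, 136, 56, 161),
    ("ApVk", "S",  -44, 117, 37, 161),
    ("AtVt", "S",  -80, 125, 23, 205),
    ("AbVg", "ID", -34, 174, 70, 208),
    ("AbVg", "S",  -37, 122, 43, 159),
    ("AdVd", "S",  -74, 131, 29, 205),
]

# window size should equal lag - lead; plateau center should sit at the
# midpoint of the two boundaries (within 0.5 ms rounding)
checks = [(lag - lead == size, abs((lead + lag) / 2 - center) <= 0.5)
          for _, _, lead, lag, center, size in rows]
```

All six rows are internally consistent under these two relations.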


Outline

(1) Fractionating the problem in space:

Towards a functional anatomy of speech perception

(2) Fractionating the problem in time:

Towards a functional physiology of speech perception

- A hypothesis about the quantization of time
  • AST model

- Psychophysical evidence for temporal integration
  • FM sweeps and click trains: 20-30 ms integration
  • AV processing in McGurk: 200 ms integration

- Imaging evidence


Binding of Temporal Quanta in Speech Processing

Maria Chait, Steven Greenberg, Takayuki Arai, David Poeppel


Multi-Resolution Analysis Hypothesis

"SYLLABLE"

Binding process:

  • Supra-segmental information (time scale ~300 ms): stress, tone, syllabicity

  • (Sub)-segmental information (time scale ~30 ms): features


Signal Processing

Per channel (channels 1, 2, ..., 14; e.g. 0-265 Hz, 265-315 Hz, ..., 5045-6000 Hz):

  • Filtering into the channel band

  • Computing the envelope (E) and fine structure (FS)

  • Low-pass filtering E (0-3 Hz) for the low-passed stimulus, or high-pass filtering E (above 22 Hz) for the high-passed stimulus

  • Multiplying the filtered E by FS

Summing across channels yields S_low (low-passed envelope) or S_high (high-passed envelope), to be compared against the original.


  • 0-6 kHz

  • 14 channels, spaced in 1/3-octave steps along the cochlear frequency map

  • Every two neighboring channels are separated by 50 Hz
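The per-channel pipeline (envelope E, fine structure FS, envelope filtering, recombination) might be sketched for a single channel as follows. The white-noise stand-in for a band-passed channel and the 4th-order Butterworth filter are assumptions for illustration.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

fs = 16000
t = np.arange(0, 1.0, 1 / fs)
# stand-in for one band-passed channel of the filterbank
x = np.random.default_rng(0).standard_normal(len(t))

# envelope E and fine structure FS of the channel
analytic = hilbert(x)
E = np.abs(analytic)
FS = np.cos(np.angle(analytic))

# low-pass the envelope at 3 Hz, keeping only slow (syllabic-rate) modulation
sos = butter(4, 3, btype='low', fs=fs, output='sos')
E_low = sosfiltfilt(sos, E)

# recombine: filtered envelope re-imposed on the original fine structure
s_low = E_low * FS
```

The high-passed condition replaces the 3 Hz low-pass with a 22 Hz high-pass on E; summing `s_low` across all 14 channels would give S_low.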


Envelope Extraction

(figure: amplitude vs. time)


Original Envelope

Low Passed Envelope

High Passed Envelope


Original

High Passed

Low Passed


Evidence:

  • Comodulation masking release

  • Ahissar et al. (2001): phase locking in the auditory cortex to the envelope of sentence stimuli

  • Shannon (1995)

  • Drullman (1994):

Effect of low-pass filtering the envelope on speech reception:

* severe reduction at 0-2 Hz cutoff frequencies

* marginal contribution of frequencies above 16 Hz

Effect of high-pass filtering the envelope:

* reduction in speech intelligibility for cutoff frequencies above 64 Hz

* no reduction in sentence intelligibility when only frequencies below 4 Hz are reduced


Experiment 1 (presented dichotically)

Stimuli:

- 53 sentences from the IEEE corpus

- Nonsense syllables (CUNY): 8 blocks = 2 (voiced/voiceless) × 2 vowels (/a/, /i/) × 2 (CV/VC)

- 3 manipulations:

0-3 Hz low-pass

22-40 Hz band-pass

0-3 and 22-40 Hz

Each subject hears all 53 sentences but only one manipulation per sentence. A practice block of 26 sentences precedes the experiment.

Task:

  • Sentences: subjects asked to write down what they heard as precisely as they can

  • Syllables: 7-alternative forced choice

Results

(Figures: performance for the low-pass, high-pass, and high-pass plus low-pass conditions.)

The result reflects the interaction between information carried on the short and long time scales.


Outline

(1) Fractionating the problem in space:

Towards a functional anatomy of speech perception

(2) Fractionating the problem in time:

Towards a functional physiology of speech perception

- A hypothesis about the quantization of time
  • AST model

- Psychophysical evidence for temporal integration
  • FM sweeps and click trains: 20-30 ms integration
  • AV processing in McGurk: 200 ms integration
  • Interaction of temporal windows

- Imaging evidence


fMRI study of temporal structure in concatenated FMs

Anthony Boemio, Allen Braun, Steven Fromm, David Poeppel



Stimulus Properties

(Figure: spectrograms, PSDs, and amplitude vs. time for the FM, TONE, and CNST stimuli.)

All 13 stimuli have nearly identical long-term spectra and RMS power over the entire 9-second stimulus duration. Stimuli differ only in segment duration, which was determined by drawing from a Gaussian distribution, with means of 12, 25, 45, 85, 160, and 300 ms.
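A stimulus of this kind might be generated as follows. The talk specifies only the Gaussian-distributed segment durations and the six means; the duration SD, the 0.5-4 kHz sweep range, and the sample rate below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 22050
target = int(9.0 * fs)        # 9-second stimulus, in samples
mean_ms, sd_ms = 85, 20       # one of the six mean segment durations; SD assumed

segments = []
n_total = 0
while n_total < target:
    # draw a segment duration from a Gaussian (clip pathological draws)
    dur = max(rng.normal(mean_ms, sd_ms), 5) / 1000.0
    n = int(dur * fs)
    t = np.arange(n) / fs
    f0, f1 = rng.uniform(500, 4000, size=2)  # random sweep endpoints
    # linear FM: instantaneous frequency goes f0 -> f1 over the segment
    phase = 2 * np.pi * (f0 * t + (f1 - f0) / (2 * dur) * t ** 2)
    segments.append(np.sin(phase))
    n_total += n

stimulus = np.concatenate(segments)[:target]  # trim to exactly 9 s
```

Varying `mean_ms` over the six condition means changes only the segment duration, leaving the long-term spectrum and RMS power essentially unchanged.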


fMRI

• Single-trial sparse acquisition paradigm (clustered volume acquisition)

• 1.5T GE Signa, echo-planar sequence

• 11.4 s TR (9 s signal, 2.4 s volume), TE 40 ms

• 24 reps/condition

• SPM99 random-effects model, p<0.05 corrected


SPM99 Cohort Analysis: FMs-CNST Categorical Contrasts (p < 0.05 corrected)


Hemodynamic response/stimulus model: not all segment transitions are equal.

(Figure: segments and segment transitions for the FM/TONE and CNST stimuli; only 1 second of stimulus is shown for clarity, with the acquisition marked.)

Including the segment transitions and the segments themselves, but assuming that transitions between long segments contribute more to the response than shorter ones, produces the observed activation vs. segment-duration relation (left). The threshold is set by the categorical contrast to the CNST stimulus; anything below this level will be zero in the SPM.



The asymmetric sampling in time (AST) hypothesis predicts electrophysiological asymmetries in specific frequency bands, gamma (25-55 Hz) and theta (3-8 Hz), because the hypothesized temporal quantization is reflected as oscillatory activity.

(Figure: sensitivity of neuronal ensembles in LH and RH plotted against the size of the temporal integration window, 25-250 ms, and its associated oscillatory frequency, 40-4 Hz.)


Flow chart

Gamma and theta band-pass filters are applied to each hemisphere's signal, followed by RMS:

LH → gamma for LH, theta for LH → RMS

RH → gamma for RH, theta for RH → RMS

Multi-taper spectral analysis

Result

Power ratio in specific frequency bands: P(L) / (P(L) + P(R))

  • The difference is much greater in the theta (low-frequency) band, and RH activation in the theta band is greater than in LH
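The flow chart's band-pass-then-RMS computation and the ratio P(L)/(P(L)+P(R)) can be sketched on toy signals; the asymmetry here is built in by construction (extra 40 Hz power in the "LH" trace, extra 5 Hz power in the "RH" trace), purely to illustrate the computation.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def band_rms(x, fs, lo, hi):
    """Band-pass filter, then take RMS as a proxy for band power."""
    sos = butter(4, [lo, hi], btype='band', fs=fs, output='sos')
    return np.sqrt(np.mean(sosfiltfilt(sos, x) ** 2))

fs = 500
t = np.arange(0, 4, 1 / fs)
rng = np.random.default_rng(0)
# toy traces: "LH" carries a gamma-band rhythm, "RH" a theta-band rhythm
lh = np.sin(2 * np.pi * 40 * t) + 0.1 * rng.standard_normal(len(t))
rh = np.sin(2 * np.pi * 5 * t) + 0.1 * rng.standard_normal(len(t))

def lh_power_ratio(band):
    pl, pr = band_rms(lh, fs, *band), band_rms(rh, fs, *band)
    return pl / (pl + pr)   # P(L) / (P(L) + P(R))

gamma_ratio = lh_power_ratio((25, 55))  # > 0.5: LH dominates the gamma band
theta_ratio = lh_power_ratio((3, 8))    # < 0.5: RH dominates the theta band
```

A ratio above 0.5 indicates left-hemisphere dominance in that band and below 0.5 right-hemisphere dominance, mirroring how the MEG result is summarized.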



Outline

(1) Fractionating the problem in space:

Towards a functional anatomy of speech perception

(2) Fractionating the problem in time:

Towards a functional physiology of speech perception

- A hypothesis about the quantization of time
  • AST model

- Psychophysical evidence for temporal integration
  • FM sweeps and click trains: 20-30 ms integration
  • AV processing in McGurk: 200 ms integration
  • Interaction of temporal windows

- Imaging evidence
  • fMRI: temporal sensitivity and lateralization
  • MEG spectral lateralization


pIFG/dPM (left): articulatory-based speech codes

Area Spt (left): auditory-motor interface

STG (bilateral): acoustic-phonetic speech codes

pMTG (left): sound-meaning interface

Hickok & Poeppel (2000), Trends in Cognitive Sciences

Hickok & Poeppel (in press), Cognition


Asymmetric sampling in time (AST) builds on anatomical symmetry but permits functional asymmetry.

(The AST figure is shown again: physiological lateralization, with symmetric spectro-temporal receptive fields in primary auditory cortex and temporally asymmetric elaboration in non-primary cortex over the 25-250 ms window range, and functional lateralization, with high temporal resolution analyses such as formant transitions in LH and high spectral resolution analyses such as intonation contours in RH.)


Conclusion

The input signal (e.g. speech) must interface with higher-order symbolic representations of different types (e.g. segmental representations relevant to lexical access and supra-segmental representations relevant to interpretation).

These higher-order representation categories appear to be lateralized (e.g. segmental phonology/LH, phrasal prosody/RH).

The timing-based asymmetry provides a possible cortical 'logistical' or 'administrative' device that helps create representations of the appropriate granularity.

If this is on the right track, the syllable is, at least for perception, as elementary a unit as the feature/segment. Both are basic.


Analysis-by-synthesis I

Hypothesize-and-test models. Where do the candidates for synthesis come from?

Peripheral auditory processing yields a spectral representation; segmentation and labeling produce a code for lexical access. A MATCHING PROCESS compares this analysis against candidates generated by recoding and synthesis from long-term memory (abstract lexical representations, the acoustic-phonetic manifestations of words) together with contextual information, and outputs the BEST LEXICAL CANDIDATE.


Analysis-by-synthesis II

Analysis-by-synthesis model of lexical hypothesis generation and verification (adapted and extended from Klatt, 1979):

speech waveform → peripheral and central 'neurogram' → spectral analysis → segmental analysis (partial feature matrix) → lexical search (lexical hypotheses) → syntactic/semantic analysis → acceptable word string

A verification loop ("internal forward model") performs analysis-by-synthesis on the best-scoring lexical candidates and predicts subsequent items.


Analysis-by-synthesis III

The same model, with candidate anatomy: spectral analysis in auditory cortex; segmental analysis in pSTG?; lexical search and syntactic/semantic analysis in temporo-parietal areas (MTG? ITG?); the analysis-by-synthesis verification loop ("internal forward model") in frontal areas with articulatory codes (left IFG, premotor), predicting subsequent items from the best-scoring lexical candidates.

