A brain’s-eye-view of speech perception

Presentation Transcript



A brain’s-eye-view of speech perception

David Poeppel

Cognitive Neuroscience of Language Lab

Department of Linguistics and Department of Biology

Neuroscience and Cognitive Science Program

University of Maryland College Park

Colleagues:

  • Allen Braun, NIH

  • Greg Hickok, UC Irvine

  • Jonathan Simon, Univ. Maryland

Students:

  • Anthony Boemio

  • Maria Chait

  • Huan Luo

  • Virginie van Wassenhove



“chair”

“uncomfortable”

“lunch”

“soon”

encoding ?

Is this a hard problem?

Yes!

If it could be solved straightforwardly

(e.g. by machine), Mark Liberman

would be in Tahiti having cold beers.

representation ?



Outline

(1) Fractionating the problem in space:

Towards a functional anatomy of speech perception

(2) Fractionating the problem in time:

Towards a functional physiology of speech perception

- A hypothesis about the quantization of time

- Psychophysical evidence for temporal integration

- Imaging evidence



interface with

lexical items,

word recognition



interface with

lexical items,

word recognition

hypothesis about storage:

distinctive features

[-voice] [+voice] [+voice]

[+labial] [+high] [+labial]

[-round] [+round] [-round]

[….] [….] [….]



production,

articulation of

speech

interface with

lexical items,

word recognition

hypothesis about storage:

distinctive features

[-voice] [+voice] [+voice]

[+labial] [+high] [+labial]

[-round] [+round] [-round]

[….] [….] [….]



production,

articulation of

speech

hypothesis about production:

distinctive features

[-voice] [+voice]

[+labial] [+high]

[….] [….]

interface with

lexical items,

word recognition

hypothesis about storage:

distinctive features

[-voice] [+voice] [+voice]

[+labial] [+high] [+labial]

[-round] [+round] [-round]

[….] [….] [….]



production,

articulation of

speech

FEATURES

analysis of auditory

signal → spectro-temporal rep. →

FEATURES

interface with

lexical items,

word recognition

FEATURES



Unifying concept:

distinctive feature

auditory-motor interface

coordinate transform

from acoustic to

articulatory space

production,

articulation of

speech

analysis of auditory

signal → spectro-temporal rep. →

FEATURES

auditory-lexical interface

interface with

lexical items,

word recognition



coordinate transform

from acoustic to

articulatory space

production,

articulation of

speech

analysis of auditory

signal → spectro-temporal rep. →

FEATURES

interface with

lexical items,

word recognition



pIFG/dPM (left)

articulatory-based

speech codes

Area Spt (left)

auditory-motor interface

STG (bilateral)

acoustic-phonetic

speech codes

pMTG (left)

sound-meaning interface

Hickok & Poeppel (2000), Trends in Cognitive Sciences

Hickok & Poeppel (in press), Cognition



Indefrey & Levelt, in press, Cognition

Meta-analysis of neuroimaging data, perception/production overlap

Shared neural correlates

of word production and

perception processes

Bilat mid/post STG

L anterior STG

L mid/post MTG

L post IFG

  • MTG and IFG overlap when controlling for the overt/covert distinction across tasks

  • Hypothesized functions:

    • lexical selection (MTG)

    • lexical phonological code retrieval (MTG)

    • post-lexical syllabification (IFG)



Scott & Johnsrude 2003


Possible Subregions of Inferior Frontal Gyrus -- Burton (2001)

Auditory Studies

Burton et al. (2000), Demonet et al. (1992, 1994), Fiez et al. (1995), Zatorre et al. (1992, 1996)

Visual Studies

Sergent et al. (1992, 1993), Poldrack et al. (1999), Paulesu et al. (1993, 1996), Shaywitz et al. (1995)



Auditory lexical decision versus FM/sweeps (a), CP/syllables (b), and rest (c)

(a)

(b)

(c)

D. Poeppel et al. (in press)

z=+6

z=+9

z=+12



fMRI (yellow blobs) and MEG (red dots) recordings of speech perception

show pronounced bilateral activation of left and right temporal cortices

T. Roberts & D. Poeppel

(in preparation)



Binder et al. 2000



pIFG/dPM (left)

articulatory-based

speech codes

Area Spt (left)

auditory-motor interface

STG (bilateral)

acoustic-phonetic

speech codes

pMTG (left)

sound-meaning interface

Hickok & Poeppel (2000), Trends in Cognitive Sciences

Hickok & Poeppel (in press), Cognition



Outline

(1) Fractionating the problem in space:

Towards a functional anatomy of speech perception

(2) Fractionating the problem in time:

Towards a functional physiology of speech perception

- A hypothesis about the quantization of time

- Psychophysical evidence for temporal integration

- Imaging evidence



The local/global distinction in vision is intuitively clear

Chuck Close


What information does the brain extract from speech signals

What information does the brain extract from speech signals?



Phenomena at the scale of formant transitions, subsegmental cues

“short stuff” -- order of magnitude 20-50ms

Phenomena at the scale of syllables (tonality and prosody)

“long stuff” -- order of magnitude 150-250ms

Acoustic and articulatory phonetic phenomena

occur on different time scales

fine

structure

envelope



Does different granularity in time matter?

Segmental and subsegmental information

serial order in speech: fool/flu

carp/crap

bat/tab

Supra-segmental information

prosody: Sleep during lecture!

Sleep during lecture?



The local/global distinction can be conceptualized as a multi-resolution analysis in time

Further processing

Binding process

Supra-segmental information

(time ~200ms)

Segmental information

(time ~20-50ms)

syllabicity

metrics

tone

features, segments



Outline

(1) Fractionating the problem in space:

Towards a functional anatomy of speech perception

(2) Fractionating the problem in time:

Towards a functional physiology of speech perception

- A hypothesis about the quantization of time

- Psychophysical evidence for temporal integration

- Imaging evidence



Temporal integration windows

Psychophysical and electrophysiologic evidence suggests

that perceptual information is integrated and analysed in

temporal integration windows (v. Bekesy 1933; Stevens and

Hall 1966; Näätänen 1992; Theunissen and Miller 1995; etc).

The importance of the concept of a temporal integration

window is that it suggests the discontinuous processing

of information in the time domain. The CNS, on this view,

treats time not as a continuous variable but as a series of

temporal windows, and extracts data from a given window.

arrow of time, physics

arrow of time, Central Nervous System



25ms

short temporal

integration

windows

long temporal

integration

windows

200ms

Asymmetric sampling/quantization of the speech waveform

This p a p e r i s h a r d t o p u b l i s h



Two spectrograms of the same word illustrate how different

analysis windows highlight different aspects of the sounds.

(a) high time resolution - each glottal pulse visible as vertical striation

(b) high frequency resolution - each harmonic visible as horizontal stripe

(a)

High time,

low frequ.-

resolution

(b)

Low time,

high frequ.-

resolution
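The trade-off the two spectrograms illustrate can be reproduced with a short sketch (the toy signal, sampling rate, and window lengths are my own choices, not from the talk; scipy assumed available):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16000
n = 8000                                # 0.5 s of signal
t = np.arange(n) / fs
# Toy "voiced" signal: a 1 kHz tone amplitude-modulated at a 100 Hz pulse rate
x = np.sin(2 * np.pi * 1000 * t) * (1 + np.sin(2 * np.pi * 100 * t))

# Short window (~5 ms): high time resolution, coarse frequency resolution
f_short, t_short, S_short = spectrogram(x, fs, nperseg=80)
# Long window (~40 ms): high frequency resolution, coarse time resolution
f_long, t_long, S_long = spectrogram(x, fs, nperseg=640)

df_short = f_short[1] - f_short[0]      # frequency bin spacing = fs / nperseg
df_long = f_long[1] - f_long[0]
```

With the short window the frequency bins are 200 Hz wide but there are many more time frames; the long window inverts the trade-off (25 Hz bins, few frames), which is exactly the glottal-pulse vs. harmonic contrast on the slide.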



Hypothesis: Asymmetric Sampling in Time (AST)

Left temporal cortical areas preferentially extract

information over 25ms temporal integration windows.

Right hemisphere areas preferentially integrate over

long, 150-250ms integration windows.

By assumption, the auditory input signal

has a neural representation that is bilaterally symmetric

(e.g. at the level of core); beyond the initial representation,

the signal is elaborated asymmetrically in the time

domain.

Another way to conceptualize the AST proposal is to say that

the sampling rate of non-primary auditory areas is

different, with LH sampling at high frequencies (~40Hz)

and RH sampling at low frequencies (4-10Hz).
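The 40 Hz and 4 Hz sampling rates follow directly from the window sizes, since the oscillatory frequency associated with an integration window is the reciprocal of its duration (a minimal check):

```python
# Integration-window duration T maps to an associated oscillatory frequency f = 1/T
short_window_s = 0.025            # ~25 ms window, hypothesized LH preference
long_window_s = 0.250             # ~250 ms window, hypothesized RH preference

lh_freq_hz = 1 / short_window_s   # lands in the gamma range
rh_freq_hz = 1 / long_window_s    # lands in the theta/low range
```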



e.g.

e.g.

intonation contours

formant transitions

b. Functional lateralization

Analyses

requiring high

temporal

resolution

Analyses

requiring high

spectral

resolution

LH

RH

a. Physiological lateralization

Symmetric representation of spectro-temporal receptive fields in primary auditory cortex

Temporally asymmetric elaboration of perceptual representations in non-primary cortex

[Figure: proportion of neuronal ensembles (LH vs. RH) as a function of the size of temporal integration windows (25-250 ms) and associated oscillatory frequency (40-4 Hz)]



Asymmetric sampling in time (AST) characteristics

• AST is an example of functional segregation, a standard concept.

• AST is an example of multi-resolution analysis, a signal processing

strategy common in other cortical domains (cf. visual areas MT and V4

which, among other differences, have phasic versus tonic firing properties,

respectively).

• AST speaks to the “granularity” of perceptual representations:

the model suggests that there exist basic perceptual representations

that correspond to the different temporal windows (e.g. featural info is

equally basic to the envelope of syllables, on this view).

• The AST model connects in plausible ways to the local versus global

distinction: there are multiple representations of a given signal

on different scales (cf. wavelets)

Global ==> ‘large-chunk’ analysis, e.g., syllabic level

Local ==> ‘small-chunk’ analysis, e.g., subsegmental level



e.g.

e.g.

intonation contours

formant transitions

a. Physiological lateralization

Symmetric representation of spectro-temporal receptive fields in primary auditory cortex

Temporally asymmetric elaboration of perceptual representations in non-primary cortex

[Figure: proportion of neuronal ensembles (LH vs. RH) as a function of the size of temporal integration windows (25-250 ms) and associated oscillatory frequency (40-4 Hz)]

b. Functional lateralization

Analyses

requiring high

temporal

resolution

Analyses

requiring high

spectral

resolution

LH

RH



Outline

(1) Fractionating the problem in space:

Towards a functional anatomy of speech perception

(2) Fractionating the problem in time:

Towards a functional physiology of speech perception

- A hypothesis about the quantization of time

AST model

- Psychophysical evidence for temporal integration

- Imaging evidence


Perception of FM sweeps

Huan Luo, Mike Gordon, Anthony Boemio, David Poeppel


FM Sweep Example

waveform

80 msec, from 3 to 2 kHz, linear FM sweep

spectrogram
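A sweep with these parameters can be synthesized with `scipy.signal.chirp` (the sampling rate is my assumption; the slide specifies only the 80 ms duration and the 3→2 kHz range):

```python
import numpy as np
from scipy.signal import chirp

fs = 44100                  # assumed sampling rate (not given on the slide)
dur = 0.080                 # 80 ms
n = int(fs * dur)
t = np.arange(n) / fs

# Linear downward FM sweep from 3 kHz to 2 kHz, as in the example
sweep = chirp(t, f0=3000, f1=2000, t1=dur, method='linear')

# FM rate expressed in octaves per second: frequency span / duration
rate_oct_per_s = np.log2(3000 / 2000) / dur
```

Note that the exact 2-3 kHz span is log2(3/2) ≈ 0.58 octaves, which the talk rounds to 0.5 oct when quoting rates such as 100 oct/sec at 5 msec.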


The rationale

  • Important cues for speech perception:

    Formant transition in speech sounds

    (For example, F2 direction can distinguish /ba/ from /da/)

  • Importance in tone languages

  • Vertebrate auditory system is well equipped to analyze FM signals.


Tone languages

  • For example, Chinese, Thai…

  • The direction of FM (of the fundamental frequency) is important in the language to make lexical distinctions.

  • (Four tones in Chinese)

    /Ma 1/, /Ma 2/ , /Ma 3/, /Ma 4/


Questions

  • How good are we at discriminating these signals?

    determine the threshold of the duration of stimuli (corresponding to rate) for the detection of FM direction

    Any performance difference between UP and DOWN detection?

  • Will language experience affect the performance of such a basic perceptual ability?


Stimuli

  • Linearly frequency modulated

  • Frequency range studied: 2-3 kHz (0.5 oct)

  • Two directions (Up / Down )

  • Changing FM rate (frequency range/time) by changing duration; for each frequency range, the frequency span is kept constant (Slow / Fast)

  • Stimulus duration: from 5 msec (100 oct/sec) to 640 msec (0.8 oct/sec)

Tasks

• Detection and discrimination of UP versus DOWN

• 2 AFC, 2IFC, 3IFC



  • English speakers

  • 3 frequency ranges relevant to speech (approximately F1, F2, F3 ranges)

  • single-interval 2-AFC

Two main findings:

  • threshold for UP at 20ms

  • UP better than DOWN

2-3 kHz

1-1.5 kHz

600-900Hz

Gordon & Poeppel (2001), JASA-ARLO



2IFC

  • To eliminate the possibility of a bias strategy that subjects can use

  • To see whether the asymmetric performance of English subjects is due to an “UP preference bias”

Same duration of the two sounds, so the only difference is direction

Interval 1

Interval 2

UP

Down

Which interval (1 or 2) contains the sound with the target direction?


Results for Chinese Subjects

No significant difference between UP and DOWN; the threshold for both is about 20 msec.



Results for English Subjects

No difference now between UP and DOWN

Threshold for both at 20msec

No difference between Chinese and English subjects now.



3IFC

Standard Interval 1 Interval 2

UP

UP

Down

Choose which interval contains the DIFFERENT one among the three sounds (different in quality rather than only in direction)



3 IFC versus 2 IFC

No difference between Chinese and English subjects

Threshold confirmed at 20ms


Conclusion

  • Importance of 20 msec as the threshold for discrimination of FM sweeps

    - corresponds to the temporal-order threshold determined by Hirsh (1959)

    - consistent with Schouten (1985, 1989), who tested FM sweeps

    - this basic threshold arguably reflects the shortest integration window that generates robust auditory percepts.



Click trains

Anthony Boemio & David Poeppel


Click Stimuli


Psychophysics


Auditory-visual integration: the McGurk effect

Virginie van Wassenhove, Ken Grant, David Poeppel


McGurk Effect

  • Audiovisual (AV) token

  • Visual (V) token

  • Auditory (A) token


Identification Task (3AFC) ApVk

TWI

True bimodal responses

Response rate as a function of SOA (ms) in the ApVk McGurk pair.

Mean responses (N=21) and standard errors. Fusion rate (open red squares) and corrected fusion rate (filled red squares, dotted line) are /ta/ responses, visually driven responses (open green triangles) are /ka/, and auditorily driven responses (filled blue circles) are /pa/. A negative value in corrected fusion rate is interpreted as a visually dominated error response /ta/.


Simultaneity Judgment Task (2AFC) ApVk vs. AtVt and AbVg vs. AdVd

Simultaneity judgment task.

Simultaneity judgment as a function of SOA (ms) in both incongruent and congruent conditions (ApVk and AtVt N=21; AbVg and AdVd N=18). The congruent conditions (open symbols) are associated with a broader and higher simultaneity-judgment profile than the incongruent conditions (filled symbols).


Temporal Window of Integration (TWI) across Tasks and Bimodal Speech Stimuli

Stimulus | Task | A Lead, Left Boundary (ms) | A Lag, Right Boundary (ms) | Plateau Center (ms) | Window Size (ms)
ApVk     | ID   | -25                        | +136                       | +56                 | 161
ApVk     | S    | -44                        | +117                       | +37                 | 161
AtVt     | S    | -80                        | +125                       | +23                 | 205
AbVg     | ID   | -34                        | +174                       | +70                 | 208
AbVg     | S    | -37                        | +122                       | +43                 | 159
AdVd     | S    | -74                        | +131                       | +29                 | 205
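As a sanity check on the table, the columns are mutually consistent: the window size spans the two boundaries, and the plateau center sits at the midpoint (values rounded to the nearest ms):

```python
# Rows from the TWI table: (stimulus, task, left_ms, right_ms, center_ms, width_ms)
twi = [
    ("ApVk", "ID", -25, 136, 56, 161),
    ("ApVk", "S",  -44, 117, 37, 161),
    ("AtVt", "S",  -80, 125, 23, 205),
    ("AbVg", "ID", -34, 174, 70, 208),
    ("AbVg", "S",  -37, 122, 43, 159),
    ("AdVd", "S",  -74, 131, 29, 205),
]

# Window size = right boundary - left boundary
width_ok = all(w == r - l for _, _, l, r, _, w in twi)
# Plateau center = midpoint of the window (table rounds to the nearest ms)
center_ok = all(abs(c - (l + r) / 2) <= 0.5 for _, _, l, r, c, _ in twi)
```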



Outline

(1) Fractionating the problem in space:

Towards a functional anatomy of speech perception

(2) Fractionating the problem in time:

Towards a functional physiology of speech perception

- A hypothesis about the quantization of time

•AST model

- Psychophysical evidence for temporal integration

• FM sweeps and click trains: 20-30ms integration

• AV processing in McGurk: 200ms integration

- Imaging evidence


Binding of Temporal Quanta in Speech Processing

Maria Chait, Steven Greenberg, Takayuki Arai, David Poeppel



Multi Resolution Analysis Hypothesis

“SYLLABLE”

Binding process

Supra- segmental information

(t.s ~300 ms)

(Sub)-segmental information

(t.s ~30 ms)

stress

tone

syllabicity

feature



Signal Processing:

Filtering: split the signal into 14 channels (0-265 Hz, 265-315 Hz, ..., 5045-6000 Hz); for each channel, compute the envelope (E) and fine structure (FS).

S_low: low-pass filter each channel envelope (0-3 Hz), multiply E × FS per channel, and recombine the channels.

S_high: high-pass filter each channel envelope (22 Hz and above), multiply E × FS per channel, and recombine the channels (compared against the Original).


  • 0-6 kHz

  • 14 channels

  • spaced in 1/3-octave steps along the cochlear frequency map

  • Every two neighboring channels are separated by 50 Hz


Envelope Extraction

[Figure: amplitude envelope of a band-limited signal over time]



Original Envelope

Low Passed Envelope

High Passed Envelope



Original

High Passed

Low Passed


Evidence:

  • Comodulation masking release

  • Ahissar et al. (2001) - Phase locking in the auditory cortex to the envelope of sentence stimuli.

  • Shannon (1995)

  • Drullman (1994):

Effect of low pass filtering the envelope on speech reception:

*severe reduction at 0-2Hz cutoff frequencies

*marginal contribution of frequencies above 16Hz

Effect of High Pass filtering the envelope:

*reduction in speech intelligibility for cutoff frequencies above 64Hz

*no reduction in sentence intelligibility when only frequencies below 4Hz are reduced


Experiment 1 (Presented Dichotically)

Stimuli:

- 53 Sentences from the IEEE corpus.

- Nonsense Syllables (CUNY)

8 blocks: 2 (voiced/voiceless) × 2 vowels (/a/, /i/) × 2 (CV/VC)

- 3 manipulations

0-3 Hz Low Pass

22-40 Hz Band Pass

0-3 and 22-40 Hz

Each subject hears all 53 sentences but only one manipulation

per sentence. A practice block of 26 sentences precedes

the experiment.

Task:

  • Sentences: subjects asked to write down what they heard as precisely as they can

  • Syllables: 7-alternative forced choice


Results

high-pass


Results

low-pass

high-pass


Results

high-pass

plus

low-pass?

low-pass

high-pass


Results

low-pass

high-pass

high-pass

plus

low-pass?

The result reflects the interaction between information carried on the short and long time scales.



Outline

(1) Fractionating the problem in space:

Towards a functional anatomy of speech perception

(2) Fractionating the problem in time:

Towards a functional physiology of speech perception

- A hypothesis about the quantization of time

•AST model

- Psychophysical evidence for temporal integration

• FM sweeps and click trains: 20-30ms integration

• AV processing in McGurk: 200ms integration

• Interaction of temporal windows

- Imaging evidence



fMRI study of temporal structure in concatenated FMs

Anthony Boemio, Allen Braun, Steven Fromm, David Poeppel



Stimulus Properties



Stimulus Properties

Spectrograms

PSDs

Ampl. vs. Time

FM Stimulus

TONE Stimulus

CNST Stimulus


All 13 stimuli have nearly identical long-term spectra and RMS power over the entire 9-second stimulus duration. Stimuli differ only in segment duration, which was determined by drawing from a Gaussian distribution (previous panel), with means of 12, 25, 45, 85, 160, and 300 ms.
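The segment-duration construction can be sketched as follows (the standard deviation of the Gaussian is an assumption; the slide specifies only the means and the 9 s total duration):

```python
import numpy as np

rng = np.random.default_rng(0)

def segment_durations(mean_ms, total_ms=9000, sd_ms=None):
    """Draw segment durations from a Gaussian until they fill the 9 s stimulus.
    sd_ms is a placeholder -- the slide gives only the means."""
    sd_ms = sd_ms if sd_ms is not None else 0.25 * mean_ms
    durs = []
    while sum(durs) < total_ms:
        d = rng.normal(mean_ms, sd_ms)
        if d > 0:                 # discard non-positive draws
            durs.append(d)
    return durs

# Shorter mean durations pack many more segments into the 9 s stimulus
counts = {m: len(segment_durations(m)) for m in (12, 25, 45, 85, 160, 300)}
```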



fMRI

• Single-trial sparse acquisition paradigm

(clustered volume acqu.)

• 1.5T GE Signa, echo-planar sequence

• 11.4s TR (9s signal,

2.4s volume), TE 40ms

• 24 reps/condition

• SPM 99 random-effects model, p < 0.05 corrected



SPM 99 Cohort Analysis: FMs-CNST Categorical Contrasts (p < 0.05 corr.)



Hemodynamic response/stimulus model: Not all segment transitions are equal.

Segment

Segment Transition

FM/TONE

CNST

Only 1 second of stimuli is shown for clarity

acquisition

A model including the segment transitions and the segments themselves, in which transitions between long segments contribute more to the response than those between shorter ones, reproduces the observed activation vs. segment-duration relation (left).

Threshold set by the categorical contrast to the CNST stimulus -- anything below this level will be zero in the SPM.



MEG study of spectral responses to complex sounds

David Poeppel, Huan Luo, Dana Ritter, Anthony Boemio, Didier Depireux, Jonathan Simon



Asymmetric sampling in time (AST) hypothesis

predicts electrophysiological asymmetries

in specific frequency bands,

gamma (25-55Hz) and theta (3-8Hz) ….

… because the hypothesized temporal quantization

is reflected as oscillatory activity.

[Figure: sensitivity of neuronal ensembles (LH vs. RH) as a function of the size of temporal integration windows (25-250 ms) and associated oscillatory frequency (40-4 Hz)]


Flow Chart

Band-pass filter each hemisphere's signal (LH, RH) into the gamma and theta bands, then compute the RMS per band and hemisphere: Gamma(LH), Gamma(RH), Theta(LH), Theta(RH).



Multi-taper

spectral analysis


Result


Power ratio in specific frequency bands

(P(L)/(P(L)+P(R)))

  • The difference is much greater in the theta (low-frequency) band, and RH activation in the theta band is greater than LH activation
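The lateralization index P(L)/(P(L)+P(R)) can be illustrated on toy traces (pure 40 Hz and 5 Hz sinusoids standing in for the hemispheric MEG signals; a plain periodogram replaces the multi-taper estimate):

```python
import numpy as np

def band_power(x, fs, lo, hi):
    """Total periodogram power in [lo, hi) Hz (simple sketch, no tapering)."""
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return power[(freqs >= lo) & (freqs < hi)].sum()

fs = 500
t = np.arange(2 * fs) / fs                 # 2 s of signal
lh = np.sin(2 * np.pi * 40 * t)            # toy gamma-dominated "LH" trace
rh = np.sin(2 * np.pi * 5 * t)             # toy theta-dominated "RH" trace

def ratio(lo, hi):
    pl, pr = band_power(lh, fs, lo, hi), band_power(rh, fs, lo, hi)
    return pl / (pl + pr)                  # P(L) / (P(L) + P(R))

theta_ratio = ratio(3, 8)                  # near 0: theta power sits in "RH"
gamma_ratio = ratio(25, 55)                # near 1: gamma power sits in "LH"
```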


Distribution of spectral responses



Outline

(1) Fractionating the problem in space:

Towards a functional anatomy of speech perception

(2) Fractionating the problem in time:

Towards a functional physiology of speech perception

- A hypothesis about the quantization of time

•AST model

- Psychophysical evidence for temporal integration

• FM sweeps and click trains: 20-30ms integration

• AV processing in McGurk: 200ms integration

• Interaction of temporal windows

- Imaging evidence

•fMRI: temporal sensitivity and lateralization

• MEG spectral lateralization



pIFG/dPM (left)

articulatory-based

speech codes

Area Spt (left)

auditory-motor interface

STG (bilateral)

acoustic-phonetic

speech codes

pMTG (left)

sound-meaning interface

Hickok & Poeppel (2000), Trends in Cognitive Sciences

Hickok & Poeppel (in press), Cognition



e.g.

e.g.

intonation contours

formant transitions

Asymmetric sampling in time (AST) builds on

anatomical symmetry but permits functional asymmetry

a. Physiological lateralization

Symmetric representation of spectro-temporal receptive fields in primary auditory cortex

Temporally asymmetric elaboration of perceptual representations in non-primary cortex

[Figure: proportion of neuronal ensembles (LH vs. RH) as a function of the size of temporal integration windows (25-250 ms) and associated oscillatory frequency (40-4 Hz)]

b. Functional lateralization

Analyses

requiring high

temporal

resolution

Analyses

requiring high

spectral

resolution

LH

RH



Conclusion

The input signal (e.g. speech) must interface with

higher-order symbolic representations of different types

(e.g. segmental representations relevant to lexical access

and supra-segmental representations relevant to

interpretation).

These higher-order representation categories appear to be

lateralized (e.g. segmental phonology/LH, phrasal prosody/RH).

The timing-based asymmetry provides a possible cortical

‘logistical’ or ‘administrative’ device that helps create

representations of the appropriate granularity.

If this is on the right track, the syllable is - at least for perception -

as elementary a unit as the feature/segment. Both are basic.



Long-term memory:

Abstract lexical repr.

contextual

information

Recoding

Synthesis

Where do the candidates

for synthesis come from?

acoustic-phonetic

manifestations of

words

Peripheral auditory

processing

Segmentation and

labeling

MATCHING

PROCESS

Lexical access

code

spectral

representation

BEST LEXICAL

CANDIDATE

Analysis

Analysis-by-synthesis I

Hypothesize- and

test models



Analysis-by-synthesis II

Analysis-by-synthesis model of lexical hypothesis generation and

verification (adapted and extended from Klatt, 1979)

analysis-by-synthesis

verification;

“internal forward model”

best- scoring

lexical candidates

peripheral

and central

‘neurogram’

partial

feature

matrix

lexical

hypotheses

spectral

analysis

segmental

analysis

lexical

search

synt./seman.

analysis

speech

waveform

predicted

subsequent items

acceptable

word string



frontal areas (articulatory codes) - l IFG, premotor

temporo-parietal areas?

auditory

cortex

pSTG?

MTG?

ITG?

Analysis-by-synthesis III

analysis-by-synthesis

verification;

“internal forward model”

best- scoring

lexical candidates

peripheral

and central

‘neurogram’

partial

feature

matrix

lexical

hypotheses

spectral

analysis

segmental

analysis

lexical

search

synt./seman.

analysis

speech

waveform

predicted

subsequent items

acceptable

word string

