Multivariate models for fMRI data

Klaas Enno Stephan

(with 90% of slides kindly contributed by Kay H. Brodersen)

Translational Neuromodeling Unit (TNU), Institute for Biomedical Engineering, University of Zurich & ETH Zurich


Why multivariate?

Univariate approaches are excellent for localizing activations in individual voxels.

[Figure: activity of voxels v1 and v2 in reward vs. no-reward trials; the univariate tests per voxel are marked n.s. and *.]


Why multivariate?

Multivariate approaches can be used to examine responses that are jointly encoded in multiple voxels.

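[Figure: activity of voxels v1 and v2 for orange juice vs. apple juice; neither voxel differs significantly on its own (n.s.), and the conditions are also shown in the joint (v1, v2) space.]


Why multivariate?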

Multivariate approaches can utilize ‘hidden’ quantities such as coupling strengths.

[Figure: a driving input and a modulatory input over time t act on hidden underlying neural activity and coupling strengths in three regions, which give rise to the observed BOLD signal in each region.]

Friston, Harrison & Penny (2003) NeuroImage; Stephan & Friston (2007) Handbook of Brain Connectivity; Stephan et al. (2008) NeuroImage


Overview

1. Modelling principles

2. Classification

3. Multivariate Bayes

4. Generative embedding


Encoding vs. decoding

  • An encoding model maps the context (a cause or consequence, e.g. condition, stimulus, response, prediction error) onto the BOLD signal.

  • A decoding model maps the BOLD signal onto the context.


Regression vs. classification

  • Regression model: independent variables (regressors) predict a continuous dependent variable.

  • Classification model: independent variables (features) predict a categorical dependent variable (label).


Univariate vs. multivariate models

A univariate model considers a single voxel at a time (context → BOLD signal in one voxel). Spatial dependencies between voxels are only introduced afterwards, through random field theory.

A multivariate model considers many voxels at once (context ↔ BOLD signal in many voxels). Multivariate models enable inferences on distributed responses without requiring focal activations.


Prediction vs. inference

  • The goal of prediction is to find a generalisable encoding or decoding function.

    • predicting a cognitive state using a brain-machine interface

    • predicting a subject-specific diagnostic status

    • key quantity: predictive density

  • The goal of inference is to decide between competing hypotheses.

    • comparing a model that links distributed neuronal activity to a cognitive state with a model that does not

    • weighing the evidence for sparse vs. distributed coding

    • key quantity: marginal likelihood (model evidence)
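The two key quantities can be written out for a decoding model. This is a sketch in assumed notation, not a reproduction of the slide: $X$ denotes the context, $Y$ the BOLD data, $\theta$ the model parameters and $m$ the model; the subscript "new" marks an unseen example.

```latex
% Predictive density for a new example (prediction):
p(X_{\text{new}} \mid Y_{\text{new}}, X, Y)
  = \int p(X_{\text{new}} \mid Y_{\text{new}}, \theta)\, p(\theta \mid X, Y)\, d\theta

% Marginal likelihood, or model evidence, of model m (inference):
p(X \mid Y, m) = \int p(X \mid Y, \theta, m)\, p(\theta \mid m)\, d\theta
```

Prediction averages over the posterior of the parameters, whereas inference averages the likelihood over their prior.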


Goodness of fit vs. complexity

  • Goodness of fit is the degree to which a model explains observed data.

  • Complexity is the flexibility of a model (including, but not limited to, its number of parameters).

[Figure: model fits with 1, 4 and 9 parameters, labelled underfitting, optimal and overfitting; each panel shows truth, data and model.]

  • We wish to find the model that optimally trades off goodness of fit and complexity.

Bishop (2007) PRML
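The trade-off can be illustrated with a small simulation. The following is a minimal sketch, assuming a sine-shaped truth, Gaussian noise and polynomial models with 1, 4 and 9 coefficients; all function names and settings are illustrative and not taken from the slides.

```python
import math
import random


def design_row(x, n_params):
    """Polynomial basis [1, x, x^2, ...] with n_params coefficients."""
    return [x ** j for j in range(n_params)]


def solve(a, b):
    """Solve the square system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = (m[r][n] - s) / m[r][r]
    return w


def fit(xs, ys, n_params):
    """Ordinary least squares via the normal equations (X'X) w = X'y."""
    rows = [design_row(x, n_params) for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n_params)]
           for i in range(n_params)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(n_params)]
    return solve(xtx, xty)


def mse(w, xs, ys):
    """Mean squared error of the polynomial with weights w."""
    preds = [sum(wj * x ** j for j, wj in enumerate(w)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


def truth(x):
    return math.sin(math.pi * x)  # assumed ground-truth function


rng = random.Random(0)
train_x = [-1 + 2 * i / 11 for i in range(12)]
test_x = [-0.95 + 2 * i / 11 for i in range(11)]
train_y = [truth(x) + rng.gauss(0, 0.2) for x in train_x]
test_y = [truth(x) + rng.gauss(0, 0.2) for x in test_x]

results = {}
for n_params in (1, 4, 9):
    w = fit(train_x, train_y, n_params)
    results[n_params] = (mse(w, train_x, train_y), mse(w, test_x, test_y))
    print(n_params, "parameters: train/test MSE =", results[n_params])
```

The training error can only go down as parameters are added, which is why goodness of fit alone cannot be used to choose a model; the error on held-out data is what reveals overfitting.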


Constructing a classifier

A principled way of designing a classifier would be to adopt a probabilistic approach: assign a test example $Y_t$ to that class $k$ which maximizes $p(X_t = k \mid Y_t, X, Y)$.

In practice, classifiers differ in terms of how strictly they implement this principle.

  • Generative classifiers

    • use Bayes’ rule to estimate $p(X_t \mid Y_t) \propto p(Y_t \mid X_t)\,p(X_t)$

    • Gaussian naïve Bayes

    • Linear discriminant analysis

  • Discriminative classifiers

    • estimate $p(X_t \mid Y_t)$ directly without Bayes’ theorem

    • Logistic regression

    • Relevance vector machine

    • Gaussian process classifier

  • Discriminant classifiers

    • estimate the decision function $f(Y_t)$ directly

    • Fisher’s linear discriminant

    • Support vector machine
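As an illustration of the generative route, the sketch below implements a bare-bones Gaussian naïve Bayes classifier. It assumes independent Gaussian features per class; the two-voxel data, the class labels "A" and "B" and all function names are hypothetical.

```python
import math
from statistics import mean, pvariance


def fit_gnb(examples, labels):
    """Estimate per-class prior, feature means and feature variances."""
    model = {}
    for k in set(labels):
        rows = [x for x, y in zip(examples, labels) if y == k]
        cols = list(zip(*rows))
        model[k] = {
            "prior": len(rows) / len(examples),
            "mean": [mean(c) for c in cols],
            # A small floor on the variance avoids division by zero.
            "var": [pvariance(c) + 1e-6 for c in cols],
        }
    return model


def log_joint(model_k, x):
    """log p(x | class) + log p(class) under independent Gaussian features."""
    total = math.log(model_k["prior"])
    for xi, mu, var in zip(x, model_k["mean"], model_k["var"]):
        total += -0.5 * math.log(2 * math.pi * var) - (xi - mu) ** 2 / (2 * var)
    return total


def predict(model, x):
    """Return the class with the highest posterior probability (Bayes' rule)."""
    return max(model, key=lambda k: log_joint(model[k], x))


# Hypothetical trial-wise activity in two voxels (v1, v2).
train_x = [(1.0, 0.2), (1.2, 0.1), (0.9, 0.3), (0.1, 1.1), (0.2, 0.9), (0.0, 1.0)]
train_y = ["A", "A", "A", "B", "B", "B"]
model = fit_gnb(train_x, train_y)
print(predict(model, (1.1, 0.2)), predict(model, (0.1, 1.2)))  # A B
```

Because the evidence term $p(Y_t)$ is the same for every class, comparing the log joint probabilities is enough to find the class that maximizes the posterior.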


Common types of fMRI classification studies

Searchlight approach

A sphere is passed across the brain. At each location, the classifier is evaluated using only the voxels in the current sphere, yielding a map of t-scores.

Nandy & Cordes (2003) MRM; Kriegeskorte et al. (2006) PNAS

Whole-brain approach

A constrained classifier is trained on whole-brain data. Its voxel weights are related to their empirical null distributions using a permutation test, yielding a map of t-scores.

Mourao-Miranda et al. (2005) NeuroImage
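The mechanics of the searchlight approach can be sketched as follows. This is a toy illustration on a small voxel grid, not the procedure of any particular toolbox; the function names are hypothetical, and the `score` argument stands in for whatever statistic (for example a cross-validated accuracy) is computed per sphere.

```python
def sphere_offsets(radius):
    """Integer (dx, dy, dz) offsets within `radius` voxels of the centre."""
    r = int(radius)
    return [
        (dx, dy, dz)
        for dx in range(-r, r + 1)
        for dy in range(-r, r + 1)
        for dz in range(-r, r + 1)
        if dx * dx + dy * dy + dz * dz <= radius * radius
    ]


def searchlight_voxels(shape, centre, radius):
    """Voxel coordinates inside the sphere that also lie inside the volume."""
    voxels = []
    for dx, dy, dz in sphere_offsets(radius):
        x, y, z = centre[0] + dx, centre[1] + dy, centre[2] + dz
        if 0 <= x < shape[0] and 0 <= y < shape[1] and 0 <= z < shape[2]:
            voxels.append((x, y, z))
    return voxels


def searchlight_map(shape, radius, score):
    """Apply `score` to the voxel set of every sphere in the volume."""
    return {
        (x, y, z): score(searchlight_voxels(shape, (x, y, z), radius))
        for x in range(shape[0])
        for y in range(shape[1])
        for z in range(shape[2])
    }


# Toy usage: the "score" is simply the number of voxels in each sphere.
sizes = searchlight_map(shape=(4, 4, 4), radius=1, score=len)
print(sizes[(1, 1, 1)], sizes[(0, 0, 0)])  # 7 4
```

Spheres at the edge of the volume contain fewer voxels, which is one reason why scores near the brain boundary should be interpreted with care.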


Support vector machine (SVM)

[Figure: decision boundaries of a linear SVM and a nonlinear SVM in the space of voxels v1 and v2.]

Vapnik (1999) Springer; Schölkopf et al. (2002) MIT Press


Stages in a classification analysis

  • Feature extraction

  • Classification using cross-validation

  • Performance evaluation

  • Bayesian mixed-effects inference


Feature extraction for trial-by-trial classification

We can obtain trial-wise estimates of neural activity by filtering the data with a GLM.

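[Figure: GLM relating the data to a design matrix and coefficients. The design matrix contains a boxcar regressor for each trial, e.g. trial 2; the estimate of the corresponding coefficient reflects activity on trial 2.]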
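In equation form, the trial-wise estimates are ordinary least-squares coefficients. A sketch, assuming a standard GLM in which $Y$ is the data, $X$ the trial-by-trial design matrix (one regressor per trial plus any covariates) and $\varepsilon$ the noise; note that $X$ here denotes the design matrix, and noise whitening and HRF convolution are omitted for brevity:

```latex
Y = X\beta + \varepsilon,
\qquad
\hat{\beta} = (X^{\top} X)^{-1} X^{\top} Y
```

The element of $\hat{\beta}$ that belongs to the regressor for trial $i$ then serves as the feature value for trial $i$ in each voxel.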


Cross-validation

The generalization ability of a classifier can be estimated using a resampling procedure known as cross-validation. One example is 2-fold cross-validation:

[Figure: examples 1 to 100 are divided into training examples and test examples in each of 2 folds, followed by performance evaluation on the test examples.]


Cross-validation

A more commonly used variant is leave-one-out cross-validation.

[Figure: examples 1 to 100 across folds 1 to 100; in each fold a single example is the test example and all others are training examples, followed by performance evaluation.]
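Both variants follow the same loop and differ only in the number of folds. The sketch below is a minimal illustration; the nearest-centroid classifier is a toy stand-in for any classifier, and the data and function names are hypothetical.

```python
def k_fold_indices(n_examples, n_folds):
    """Yield (train, test) index lists; n_folds == n_examples gives leave-one-out."""
    for fold in range(n_folds):
        test = [i for i in range(n_examples) if i % n_folds == fold]
        train = [i for i in range(n_examples) if i % n_folds != fold]
        yield train, test


def nearest_centroid_predict(train_x, train_y, x):
    """Toy classifier: pick the class whose training mean is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def cross_validated_accuracy(xs, ys, n_folds):
    """Fraction of examples classified correctly when held out as test data."""
    correct = 0
    for train, test in k_fold_indices(len(xs), n_folds):
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        correct += sum(nearest_centroid_predict(tx, ty, xs[i]) == ys[i] for i in test)
    return correct / len(xs)


# Hypothetical two-voxel examples with class labels 0 and 1.
xs = [(0.9, 0.1), (0.2, 1.0), (0.1, 0.8), (1.1, 0.3), (1.0, 0.2), (0.3, 1.1)]
ys = [0, 1, 1, 0, 0, 1]
print(cross_validated_accuracy(xs, ys, n_folds=2))        # 2-fold
print(cross_validated_accuracy(xs, ys, n_folds=len(xs)))  # leave-one-out
```

Every example is used for testing exactly once, and the classifier never sees a test example during training.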


Performance evaluation

Single-subject study with $n$ trials

The most common approach is to assess how likely it is that the obtained number of correctly classified trials could have occurred by chance.

Binomial test

In MATLAB:

p = 1 - binocdf(k-1,n,pi_0)

  • k: number of correctly classified trials

  • n: total number of trials

  • pi_0: chance level (typically 0.5)

  • binocdf: binomial cumulative distribution function

Using k-1 makes p the probability of at least k correct trials under chance, P(X ≥ k).

[Figure: a single subject with trials 1 to n, each classified correctly (+) or incorrectly (-).]
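The same test can be written without a statistics toolbox. A minimal sketch, assuming the p-value is defined as the probability of at least k correct trials under chance; the function name and the example numbers are hypothetical.

```python
from math import comb


def binomial_p_value(k, n, pi_0=0.5):
    """P(X >= k) for X ~ Binomial(n, pi_0): chance of at least k correct trials."""
    return sum(comb(n, i) * pi_0 ** i * (1 - pi_0) ** (n - i) for i in range(k, n + 1))


# Hypothetical result: 60 of 100 trials classified correctly.
print(binomial_p_value(60, 100))
```

The sum starts at k, so the observed count itself is included in the tail probability.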


Performance evaluation

[Figure: hierarchical structure of a group study: a population, subjects 1 to 4 drawn from it, and within each subject a series of trials classified correctly (+) or incorrectly (-).]


Performance evaluation

Group study with $m$ subjects, $n$ trials each

In a group setting, we must account for both within-subjects (fixed-effects) and between-subjects (random-effects) variance components.

  • Binomial test on concatenated data (fixed effects)

  • Binomial test on averaged data (fixed effects)

  • t-test on summary statistics (random effects), based on the sample mean of the subject-wise sample accuracies, the chance level (typically 0.5), the sample standard deviation, and the cumulative Student’s t-distribution

  • Bayesian mixed-effects inference (mixed effects), available for MATLAB and R

Brodersen, Mathys, Chumbley, Daunizeau, Ong, Buhmann, Stephan (2012) JMLR

Brodersen, Daunizeau, Mathys, Chumbley, Buhmann, Stephan (2013) NeuroImage
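The quantities listed for the t-test on summary statistics correspond to a one-sample, one-sided t-test on the subject-wise accuracies. A sketch in assumed notation: $\bar{\pi}$ is the sample mean of the $m$ sample accuracies, $\pi_0$ the chance level, $\hat{\sigma}_{m-1}$ the sample standard deviation and $t_{m-1}(\cdot)$ the cumulative Student's t-distribution with $m-1$ degrees of freedom.

```latex
t = \sqrt{m}\,\frac{\bar{\pi} - \pi_0}{\hat{\sigma}_{m-1}},
\qquad
p = 1 - t_{m-1}(t)
```

This treats each subject's accuracy as a single observation, so it reflects between-subjects variability but ignores how many trials each accuracy is based on.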


Research questions for classification

  • Overall classification accuracy: how far above the 50% chance level is the accuracy for a given classification task (e.g. left or right button? truth or lie? healthy or ill?)

  • Spatial deployment of discriminative regions

  • Temporal evolution of discriminability: at which point in within-trial time does accuracy rise above chance, relative to when the participant indicates a decision?

  • Model-based classification: { group 1, group 2 }

Pereira et al. (2009) NeuroImage, Brodersen et al. (2009) The New Collection


Potential problems

  • Multivariate classification studies conduct group tests on single-subject summary statistics that

    • discard the sign or direction of underlying effects

    • do not necessarily take into account confounding effects (e.g. correlation of task conditions with difficulty etc.)

  • Therefore, in some analyses confounds rather than distributed representations may have produced positive results.

Todd et al. 2013, NeuroImage


Potential problems

  • Simulation: Experiment condition (rule A vs. rule B) does not affect voxel activity, but difficulty does. Moreover, experiment condition and difficulty are confounded at the individual-subject level in random directions across 500 subjects.

Todd et al. 2013, NeuroImage


Potential problems

  • Empirical example (searchlight analysis): fMRI study on rule representations (flexible stimulus–response mappings)

  • Standard MVPA: rule representations in prefrontal regions.

  • GLM: no significant results.

  • Controlling for a variable that is confounded with rule at the individual-subject level but not the group level (reaction time differences across rules) eliminates the MVPA results.

Todd et al. 2013, NeuroImage


Multivariate Bayes

SPM brings multivariate analyses into the conventional inference framework of hierarchical Bayesian models and their inversion.

Mike West


Multivariate Bayes

Multivariate analyses in SPM rest on the central notion that inferences about how the brain represents things can be reduced to model comparison.

[Figure: two decoding models of some cause or consequence are compared: sparse coding in orbitofrontal cortex vs. distributed coding in prefrontal cortex.]


From encoding to decoding

Encoding model: GLM

Decoding model: MVB


Multivariate Bayes

Friston et al. 2008 NeuroImage


MVB: details

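  • Encoding model:

    • neuronal activity is a linear mixture of causes:

      • $A = X\beta$

    • the BOLD data matrix is a temporal convolution of the underlying neuronal activity:

      • $Y = TA + G\gamma + \varepsilon$

  • Decoding model:

    • the mental state is a linear mixture of voxel-wise activity: $X = A\beta$

      • thus $A = X\beta^{-1}$

    • substitution into $Y$ gives:

      • $Y = TX\beta^{-1} + G\gamma + \varepsilon \;\Rightarrow\; Y\beta = TX + G\gamma\beta + \varepsilon\beta \;\Rightarrow\; TX = Y\beta - G\gamma\beta - \varepsilon\beta$

  • Importantly, we can remove the influence of confounds by premultiplying with the appropriate residual-forming matrix $R$:

    • $RTX = RY\beta + \varsigma$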


MVB: details

  • To make inversion tractable:

    • impose priors on $\beta$ (i.e., assumptions about how mental states are represented by voxel-wise activity)

  • Scientific inquiry:

    • model selection: comparing different priors to identify which neuronal representation of mental states is most plausible


Specifying the prior for MVB

To make the ill-posed regression problem tractable, MVB uses a prior on voxel weights. Different priors reflect different anatomical and/or coding hypotheses.

For example:

  • Voxel 2 is allowed to play a role.

  • Voxel 3 is allowed to play a role, but only if its neighbours play similar roles.

[Figure: matrix relating patterns to voxels.]

Friston et al. 2008 NeuroImage


Example: decoding motion from visual cortex

MVB can be illustrated using SPM’s attention-to-motion example dataset.

This dataset is based on a simple block design. There are three experimental factors:

  • photic: display shows random dots

  • motion: dots are moving

  • attention: subjects asked to pay attention

[Figure: design matrix over scans with columns photic, motion, attention and const. During the highlighted scans, for example, subjects were passively viewing moving dots.]

Büchel & Friston 1999 Cerebral Cortex; Friston et al. 2008 NeuroImage


Multivariate Bayes in SPM

Step 1

After having specified and estimated a model, use the Results button.

Step 2

Select the contrast to be decoded.

Step 3

Pick a region of interest.

Step 4

Multivariate Bayes can be invoked from within the Multivariate section.

Step 5

Here, the region of interest is specified as a sphere around the cursor (the anatomical hypothesis). The spatial prior implements a sparse coding hypothesis (the coding hypothesis).

Step 6

Results can be displayed using the BMS button.


Model evidence and voxel weights

log BF = 3
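To get a feeling for what a log Bayes factor of 3 means, it can be converted into a Bayes factor and a posterior model probability. A minimal sketch, assuming a natural logarithm and equal prior probabilities for the two models being compared; the function names are hypothetical.

```python
import math


def bayes_factor(log_bf):
    """Convert a natural-log Bayes factor into a Bayes factor."""
    return math.exp(log_bf)


def posterior_model_probability(log_bf, prior_odds=1.0):
    """Posterior probability of model 1, given the log Bayes factor of model 1 vs. model 2."""
    posterior_odds = math.exp(log_bf) * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


print(round(bayes_factor(3), 1))                 # 20.1
print(round(posterior_model_probability(3), 3))  # 0.953
```

Under these assumptions, a log Bayes factor of 3 corresponds to odds of about 20 to 1 in favour of the better model.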


Summary: research questions for MVB

How does the brain represent things?

Evaluating competing coding hypotheses

Where does the brain represent things?

Evaluating competing anatomical hypotheses


Model-based classification

  • Step 1, modelling: measurements from an individual subject are used to fit a subject-specific generative model (here with regions A, B and C).

  • Step 2, embedding: the subject is represented in a model-based feature space (e.g. connection strengths A → B, A → C, B → B, B → C).

  • Step 3, classification: a classification model is trained on these features.

  • Step 4, evaluation: how discriminable are the groups (accuracy between 0 and 1)?

  • Step 5, interpretation: which connection strengths are jointly discriminative?

Brodersen, Haiss, Ong, Jung, Tittgemeyer, Buhmann, Weber, Stephan (2011) NeuroImage; Brodersen, Schofield, Leff, Ong, Lomakina, Buhmann, Stephan (2011) PLoS Comput Biol


Model-based classification: model specification

[Figure: anatomical regions of interest (slice at y = –26 mm) in the left (L) and right (R) hemisphere: medial geniculate body, Heschl’s gyrus (A1) and planum temporale, together with the stimulus input.]


Model-based classification

[Figure: balanced accuracy (axis from 50% to 100%) for classifying patients vs. controls using activation-based, correlation-based and model-based features; results are marked as not significant (n.s.) or significant (*).]


Generative embedding

[Figure: patients and controls shown in a voxel-based activity space (voxels 1 to 3; classification accuracy 75%) and, after generative embedding, in a model-based parameter space (parameters 1 to 3; classification accuracy 98%).]


Model-based classification: interpretation

[Figure: network of left (L) and right (R) MGB, HG (A1) and PT with stimulus input; connections are marked as highly discriminative, somewhat discriminative or not discriminative.]


Model-based clustering

  • 42 patients diagnosed with schizophrenia

  • 41 healthy controls

  • fMRI data acquired during a working-memory task and modelled using a three-region DCM

[Figure: three-region DCM with VC, PC and dLPFC; further labels: stimulus, WM.]

Deserno, Sterzer, Wüstenberg, Heinz, & Schlagenhauf (2012) J Neurosci


Model-based clustering

[Figure panels: model selection, interpretation, validation.]

Brodersen et al. 2014, NeuroImage: Clinical


Summary

Classification

  • to assess whether a cognitive state is linked to patterns of activity

  • to visualize the spatial deployment of discriminative activity

Multivariate Bayes

  • to evaluate competing anatomical hypotheses

  • to evaluate competing coding hypotheses

Generative embedding

  • to assess whether groups differ in terms of model parameter estimates (connectivity)

  • to generate mechanistic subgroup hypotheses


Thank you


Further reading

Classification

  • Pereira, F., Mitchell, T., & Botvinick, M. (2009). Machine learning classifiers and fMRI: A tutorial overview. NeuroImage, 45(1, Supplement 1), S199-S209.  

  • O'Toole, A. J., Jiang, F., Abdi, H., Penard, N., Dunlop, J. P., & Parent, M. A. (2007). Theoretical, Statistical, and Practical Perspectives on Pattern-based Classification Approaches to the Analysis of Functional Neuroimaging Data. Journal of Cognitive Neuroscience, 19(11), 1735-1752.

  • Haynes, J., & Rees, G. (2006). Decoding mental states from brain activity in humans. Nature Reviews Neuroscience, 7(7), 523-534.

  • Norman, K. A., Polyn, S. M., Detre, G. J., & Haxby, J. V. (2006). Beyond mind-reading: multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences, 10(9), 424-30.

  • Brodersen, K. H., Haiss, F., Ong, C., Jung, F., Tittgemeyer, M., Buhmann, J., Weber, B., et al. (2010). Model-based feature construction for multivariate decoding. NeuroImage (2010).

  • Brodersen, K. H., Ong, C., Stephan, K. E., Buhmann, J. (2010). The balanced accuracy and its posterior distribution. ICPR, 3121-3124.

Multivariate Bayes

  • Friston, K., Chu, C., Mourao-Miranda, J., Hulme, O., Rees, G., Penny, W., et al. (2008). Bayesian decoding of brain images. NeuroImage, 39(1), 181-205.  


Why multivariate?

Multivariate approaches can exploit a sampling bias in voxelized images to reveal interesting activity on a subvoxel scale.

Boynton (2005) Nature Neuroscience


Why multivariate?

The last 10 years have seen a notable increase in decoding analyses in neuroimaging.

[Figure labels: PET, prediction.]

Lautrup et al. (1994) Supercomputing in Brain Research

Haxby et al. (2001) Science


Spatial deployment of informative regions

Which brain regions are jointly informative of a cognitive state of interest?

Searchlight approach

A sphere is passed across the brain. At each location, the classifier is evaluated using only the voxels in the current sphere, yielding a map of t-scores.

Nandy & Cordes (2003) MRM; Kriegeskorte et al. (2006) PNAS

Whole-brain approach

A constrained classifier is trained on whole-brain data. Its voxel weights are related to their empirical null distributions using a permutation test, yielding a map of t-scores.

Mourao-Miranda et al. (2005) NeuroImage


Issues to be aware of (as researcher or reviewer)

  • Classification induces constraints on the experimental design.

    • When estimating trial-wise Beta values, we need longer ITIs (typically 8 – 15 s).

    • At the same time, we need many trials (typically 100+).

    • Classes should be balanced. If they are imbalanced, we can resample the training set, constrain the classifier, or report the balanced accuracy.

  • Construction of examples

    • Estimation of Beta images is the preferred approach.

    • Covariates should be included in the trial-by-trial design matrix.

  • Temporal autocorrelation

    • In trial-by-trial classification, exclude trials around the test trial from the training set.

  • Avoiding double-dipping

    • Any feature selection and tuning of classifier settings should be carried out on the training set only.

  • Performance evaluation

    • Do random-effects or mixed-effects inference.

    • Correct for multiple tests.
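The point about temporal autocorrelation can be made concrete with a leave-one-out scheme that leaves a gap around the test trial. A minimal sketch; the function name and the gap of two trials are illustrative assumptions, and the appropriate gap depends on the design.

```python
def loo_with_gap(n_trials, gap):
    """Leave-one-out folds that drop `gap` trials on each side of the test trial."""
    for test in range(n_trials):
        train = [i for i in range(n_trials) if abs(i - test) > gap]
        yield train, test


folds = list(loo_with_gap(n_trials=10, gap=2))
train, test = folds[5]
print(test, train)  # 5 [0, 1, 2, 8, 9]
```

Any feature selection or tuning would then be run on `train` only, inside the loop, so that the test trial never influences the classifier.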


Performance evaluation

  • Evaluating the performance of a classification algorithm critically requires a measure of the degree to which unseen examples have been identified with their correct class labels.

  • The procedure of averaging across accuracies obtained on individual cross-validation folds is flawed in two ways. First, it does not allow for the derivation of a meaningful confidence interval. Second, it leads to an optimistic estimate when a biased classifier is tested on an imbalanced dataset.

  • Both problems can be overcome by replacing the conventional point estimate of accuracy by an estimate of the posterior distribution of the balanced accuracy.

Brodersen, Ong, Buhmann, Stephan (2010) ICPR
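The idea can be sketched with a small Monte Carlo approximation. This is not the implementation from the cited paper: it assumes independent flat Beta(1, 1) priors on the accuracies for the two classes and draws samples instead of computing the density analytically. The confusion-matrix counts and function names are hypothetical.

```python
import random


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of the accuracies obtained on the positive and the negative class."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def posterior_balanced_accuracy(tp, fn, tn, fp, n_samples=20000, seed=0):
    """Sorted Monte Carlo samples from the posterior of the balanced accuracy.

    Assumes independent flat Beta(1, 1) priors on the two class-wise accuracies.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        acc_pos = rng.betavariate(tp + 1, fn + 1)
        acc_neg = rng.betavariate(tn + 1, fp + 1)
        samples.append(0.5 * (acc_pos + acc_neg))
    return sorted(samples)


# Hypothetical imbalanced test set (90 positives, 10 negatives) and a biased classifier.
tp, fn, tn, fp = 88, 2, 2, 8
accuracy = (tp + tn) / (tp + fn + tn + fp)
samples = posterior_balanced_accuracy(tp, fn, tn, fp)
lower = samples[int(0.025 * len(samples))]
upper = samples[int(0.975 * len(samples))]
print("accuracy:", accuracy)
print("balanced accuracy:", balanced_accuracy(tp, fn, tn, fp))
print("95% posterior interval:", (lower, upper))
```

In this example the plain accuracy looks high only because the classifier favours the majority class; the balanced accuracy and its posterior interval show how little is known about performance on the minority class.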


Pattern characterization

Example – decoding the identity of the person speaking to the subject in the scanner

[Figure: fingerprint plot over voxels (voxel 1, voxel 2, ...), one plot per class.]

Formisano et al. (2008) Science


Lessons from the Neyman-Pearson lemma

Is there a link between $X$ and $Y$?

To test for a statistical dependency between a contextual variable $X$ and the BOLD signal $Y$, we compare

  • $H_0$: there is no dependency

  • $H_1$: there is some dependency

Which statistical test?

  • define a test size $\alpha$ (the probability of falsely rejecting $H_0$, i.e., one minus specificity),

  • choose the test with the highest power (the probability of correctly rejecting $H_0$, i.e., sensitivity).

The Neyman-Pearson lemma

The most powerful test of size $\alpha$ is to reject $H_0$ when the likelihood ratio $\Lambda$ exceeds a critical value $u$,

$$\Lambda(Y) = \frac{p(Y \mid H_1)}{p(Y \mid H_0)} \ge u$$

with $u$ chosen such that $P(\Lambda(Y) \ge u \mid H_0) = \alpha$.

The null distribution of the likelihood ratio can be determined non-parametrically or under parametric assumptions.

This lemma underlies both classical statistics and Bayesian statistics (where $\Lambda$ is known as a Bayes factor).

Neyman & Pearson (1933) Phil Trans Roy Soc London


Lessons from the Neyman-Pearson lemma

In summary

  • Inference about how the brain represents things reduces to model comparison.

  • To establish that a link exists between some context $X$ and activity $Y$, the direction of the mapping is not important.

  • Testing the accuracy of a classifier is not based on the likelihood ratio $\Lambda$ and is therefore suboptimal.

Neyman & Pearson (1933) Phil Trans Roy Soc London

Kass & Raftery (1995) J Am Stat Assoc

Friston et al. (2009) NeuroImage


Temporal evolution of discriminability

Example – decoding which button the subject pressed

[Figure: classification accuracy over time in motor cortex and frontopolar cortex, relative to the times of decision and response.]

Soon et al. (2008) Nature Neuroscience


Identification / inferring a representational space

Approach

  • estimation of an encoding model

  • nearest-neighbour classification or voting

Mitchell et al. (2008) Science


Reconstruction / optimal decoding

Approach

  • estimation of an encoding model

  • model inversion

Paninski et al. (2007) Progr Brain Res

Pillow et al. (2008) Nature

Miyawaki et al. (2009) Neuron


Recent MVB studies


Specifying the prior for MVB

1st level – spatial coding hypothesis

  • Voxel 2 is allowed to play a role.

  • Voxel 3 is allowed to play a role, but only if its neighbours play weaker roles.

[Figure: matrix relating patterns to voxels.]

2nd level – pattern covariance structure

Thus: and


Inverting the model

Model inversion involves finding the posterior distribution over voxel weights $\beta$.

In MVB, this includes a greedy search for the optimal covariance structure that governs the prior over $\beta$.

[Figure: greedy search over partitions of the patterns into subsets: partition #1 (one subset), partition #2 (two subsets) and partition #3 (three subsets, optimal). Annotations: pattern 3 makes an important positive contribution / is activated; pattern 8 is allowed to make some contribution, independently of the contributions of other patterns.]


Observations vs. predictions

(motion)


Using MVB for point classification

MVB may outperform conventional point classifiers when using a more appropriate coding hypothesis.

Support Vector Machine


Model-based analyses by data representation

  • Model-based analyses: How do patterns of hidden quantities (e.g., connectivity among brain regions) differ between groups?

  • Activation-based analyses: Which functional differences allow us to separate groups?

  • Structure-based analyses: Which anatomical structures allow us to separate patients and healthy controls?


Model-based clustering

[Figure: supervised learning (SVM classification) based on regional activity, functional connectivity and effective connectivity, and unsupervised learning (GMM clustering using effective connectivity), with panels for model selection and model validation; the reported accuracies range from 55% to 78%, and the best model is marked.]

Brodersen, Deserno, Schlagenhauf, Penny, Lin, Gupta, Buhmann, Stephan (in preparation)


Generative embedding and DCM

Question 1 – What do the data tell us about hidden processes in the brain?

  • compute the posterior

Question 2 – Which model is best w.r.t. the observed fMRI data?

  • compute the model evidence

Question 3 – Which model is best w.r.t. an external criterion?

  • compute the classification accuracy for { patient, control }


Model-based classification using DCM

[Figure: analyses built on DCM: model selection, inference on model parameters, and classification into { group 1, group 2 }, comparing model-based classification vs. activation-based and structure-based classification.]