Klaas Enno Stephan (with 90% of slides kindly contributed by Kay H. Brodersen ) Translational Neuromodeling Unit (TNU) Institute for Biomedical Engineering University of Zurich & ETH Zurich. Multivariate models for fMRI data. Why multivariate?.
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
Klaas Enno Stephan
(with 90% of slides kindly contributed by Kay H. Brodersen)
Translational Neuromodeling Unit (TNU)Institute for Biomedical EngineeringUniversity of Zurich & ETH Zurich
Multivariate models for fMRI data
Univariate approaches are excellent for localizing activations in individual voxels.
n.s.
*
v1
v2
v1
v2
reward
no reward
Multivariate approaches can be used to examine responses that are jointly encoded in multiple voxels.
n.s.
n.s.
v1
v2
v1
v2
v2
v1
orange juice
apple juice
modulatoryinput
t
drivinginput
t
Multivariate approaches can utilize ‘hidden’ quantities such as coupling strengths.
signal
signal
signal
observed BOLD signal
activity
activity
activity
hidden underlying neural activity and coupling strengths
Friston, Harrison & Penny (2003) NeuroImage; Stephan & Friston (2007) Handbook of Brain Connectivity; Stephan et al. (2008) NeuroImage
1Modelling principles
2Classification
3Multivariate Bayes
4Generative embedding
1Modelling principles
2Classification
3Multivariate Bayes
4Generative embedding
encoding model
decoding model
condition
stimulus
response
prediction error
context (cause or consequence)
BOLD signal
independentvariables
(regressors)
continuousdependent variable
independentvariables
(features)
categoricaldependent variable
(label)
vs.
A univariate model considers a single voxel at a time.
A multivariate model considers many voxels at once.
context
BOLD signal
context
BOLD signal
Spatial dependencies between voxels are only introduced afterwards, through random field theory.
Multivariate models enable inferences on distributed responses without requiring focal activations.
The goal of prediction is to find a generalisable encoding or decoding function.
predicting a cognitive state using abrainmachine interface
predicting asubjectspecific diagnostic status
predictive density
marginal likelihood (model evidence)
1 parameter
4 parameters
9 parameters
truth
data
model
underfitting
optimal
overfitting
Bishop (2007) PRML
1Modelling principles
2Classification
3Multivariate Bayes
4Generative embedding
A principled way of designing a classifier would be to adopt a probabilistic approach:
that which maximizes
In practice, classifiers differ in terms of how strictly they implement this principle.
Searchlight approach
A sphere is passed across the brain. At each location, the classifier is evaluated using only the voxels in the current sphere map of tscores.
Wholebrain approach
A constrained classifier is trained on wholebrain data. Its voxel weights are related to their empirical null distributions using a permutation test map of tscores.
Nandy & Cordes (2003) MRMKriegeskorte et al. (2006) PNAS
MouraoMiranda et al. (2005) NeuroImage
Linear SVM
Nonlinear SVM
v2
v1
Vapnik (1999) Springer; Schölkopf et al. (2002) MIT Press
Feature extraction
Classification using crossvalidation
Performance evaluation
Bayesian mixedeffects inference
mixed effects
We can obtain trialwise estimates of neural activity by filtering the data with a GLM.
data
design matrix
coefficients
Boxcar regressor for trial 2
Estimate of this coefficient reflects activity on trial 2
The generalization ability of a classifier can be estimated using a resampling procedure known as crossvalidation. One example is 2fold crossvalidation:
examples
1
?
training example
2
?
test examples
?
3
?
...
...
99
?
100
?
1
2
folds
performance evaluation
A more commonly used variant is leaveoneout crossvalidation.
examples
1
?
training example
2
?
test example
?
3
?
...
...
...
...
...
...
99
?
100
?
1
98
99
2
100
folds
performance evaluation
Singlesubject study with trials
The most common approach is to assess how likely the obtained number of correctly classified trials could have occurred by chance.
subject
Binomial test
In MATLAB:
p = 1  binocdf(k,n,pi_0)
number of correctly classified trials
total number of trials
chance level (typically 0.5)
binomial cumulative density function
trial 1
…
trial

+


+
population
subject 1
subject 2
subject 3
subject 4
subject
…
trial 1
…
trial
…

+


+
Group study with subjects, trials each
In a group setting, we must account for both withinsubjects (fixedeffects) and betweensubjects (randomeffects) variance components.
Binomial test on concatenated data
Binomial test on averaged data
ttest onsummary statistics
Bayesian mixedeffects inference
available for MATLAB and R
fixed effects
fixed effects
random effects
mixed effects
sample mean of sample accuracieschance level (typically 0.5)
sample standard deviationcumulative Student’s distribution
Brodersen, Mathys, Chumbley, Daunizeau, Ong, Buhmann, Stephan (2012) JMLR
Brodersen, Daunizeau, Mathys, Chumbley, Buhmann, Stephan (2013) NeuroImage
Overall classification accuracy
Spatial deployment of discriminative regions
accuracy
80%
100 %
50 %
Healthy or ill?
Left or right button?
Truthorlie?
55%
classification task
Temporal evolution of discriminability
Modelbased classification
accuracy
Participant indicates decision
100 %
{ group 1, group 2 }
50 %
Accuracy rises above chance
withintrial time
Pereira et al. (2009) NeuroImage, Brodersen et al. (2009) The New Collection
Todd et al. 2013, NeuroImage
Todd et al. 2013, NeuroImage
Todd et al. 2013, NeuroImage
1Modelling principles
2Classification
3Multivariate Bayes
4Generative embedding
SPM brings multivariate analyses into the conventional inference framework of hierarchical Bayesian models and their inversion.
Mike West
Multivariate analyses in SPM rest on the central notion that inferences about how the brain represents things can be reduced to model comparison.
vs.
decoding model
some cause or consequence
sparse coding in orbitofrontal cortex
distributed coding in prefrontal cortex
Encoding model: GLM
Decoding model: MVB
Friston et al. 2008 NeuroImage
To make the illposed regression problem tractable, MVB uses a prior on voxel weights. Different priors reflect different anatomical and/or coding hypotheses.
For example:
patterns
Voxel 2 is allowed to play a role.
Voxel 3 is allowed to play a role, but only if its neighbours play similar roles.
voxels
Friston et al. 2008 NeuroImage
photic
motion
attention
const
MVB can be illustrated using SPM’s attentiontomotion example dataset.
This dataset is based on a simple block design. There are three experimental factors:
During these scans, for example, subjects were passively viewing moving dots.
scans
Büchel& Friston 1999 Cerebral CortexFriston et al. 2008 NeuroImage
Step 1
After having specified and estimated a model, use the Results button.
Step 2
Select the contrast to be decoded.
Step 3
Pick a region of interest.
Step 5
Here, the region of interest is specified as a sphere around the cursor. The spatial prior implements a sparse coding hypothesis.
anatomical hypothesis
coding hypothesis
Step 4
Multivariate Bayes can be invoked from within the Multivariate section.
Step 6
Results can be displayed using the BMS button.
log BF = 3
How does the brain represent things?
Evaluating competing coding hypotheses
Where does the brain represent things?
Evaluating competing anatomical hypotheses
1Modelling principles
2Classification
3Multivariate Bayes
4Generative embedding
A
step 1 —
modelling
step 2 —
embedding
A → B
A → C
B → B
B → C
B
C
measurements from an individual subject
subjectspecificgenerative model
subject representation in modelbased feature space
1
A
step 3 —
classification
step 4 —
evaluation
step 5 —
interpretation
accuracy
B
C
0
discriminability of groups?
jointly discriminative
connection strengths?
classification model
Brodersen, Haiss, Ong, Jung, Tittgemeyer, Buhmann, Weber, Stephan (2011) NeuroImageBrodersen, Schofield, Leff, Ong, Lomakina, Buhmann, Stephan (2011) PLoS Comput Biol
planumtemporale
planumtemporale
Heschl’sgyrus(A1)
Heschl’sgyrus(A1)
L
R
medialgeniculatebody
medialgeniculatebody
y = –26 mm
stimulus input
anatomicalregions of interest
*
100%
90%
80%
balanced accuracy
patients
controls
70%
60%
n.s.
n.s.
50%
a
c
s
p
m
e
z
o
f
l
r
activation
based
correlation
based
model
based
patients
controls
Voxelbased activity space
Modelbased parameter space
generative embedding
Voxel 3
Parameter 3
Voxel 2
Parameter 2
Voxel 1
Parameter 1
classification accuracy75%
classification accuracy98%
PT
PT
HG(A1)
HG(A1)
L
R
MGB
MGB
stimulus input
PT
PT
HG(A1)
HG(A1)
L
R
MGB
MGB
stimulus input
highly discriminative
somewhat discriminative
not discriminative
42 patients diagnosedwith schizophrenia
fMRI data acquired during workingmemory task & modelled using a threeregion DCM
WM
PC
dLPFC
41 healthy controls
VC
stimulus
Deserno, Sterzer, Wüstenberg, Heinz, & Schlagenhauf (2012) J Neurosci
model selection
interpretation
validation
Brodersen et al. 2014, NeuroImage: Clinical
Classification
Multivariate Bayes
Generative embedding
Classification
Multivariate Bayes
Multivariate approaches can exploit a sampling bias in voxelized imagesto reveal interesting activity on a subvoxel scale.
Boynton (2005) Nature Neuroscience
The last 10 years have seen a notable increase in decoding analyses in neuroimaging.
PET
prediction
Lautrup et al. (1994) Supercomputing in Brain Research
Haxby et al. (2001) Science
Which brain regions are jointly informative of a cognitive state of interest?
Searchlight approach
A sphere is passed across the brain. At each location, the classifier is evaluated using only the voxels in the current sphere map of tscores.
Wholebrain approach
A constrained classifier is trained on wholebrain data. Its voxel weights are related to their empirical null distributions using a permutation test map of tscores.
Nandy & Cordes (2003) MRMKriegeskorte et al. (2006) PNAS
MouraoMiranda et al. (2005) NeuroImage
Brodersen, Ong, Buhmann, Stephan (2010) ICPR
Example – decoding the identity of the person speaking to the subject in the scanner
voxel 1
voxel 2
...
fingerprint plot
(one plot per class)
Formisano et al. (2008) Science
Is there a link between and ?
To test for a statistical dependency between a contextual variable and the BOLD signal , we compare
Which statistical test?
The NeymanPearson lemma
The most powerful test of size is:to reject when the likelihood ratio exceeds a criticial value ,
with chosen such that
The null distribution of the likelihood ratio can be determined nonparametrically or under parametric assumptions.
This lemma underlies both classical statistics and Bayesian statistics (where is known as a Bayes factor).
Neyman & Person (1933) Phil Trans Roy Soc London
In summary
Neyman & Person (1933) Phil Trans Roy Soc London
Kass & Raftery (1995) J Am Stat Assoc
Friston et al. (2009) NeuroImage
Example – decoding which button the subject pressed
classificationaccuracy
motor cortex
decision
response
frontopolar cortex
Soon et al. (2008) Nature Neuroscience
Approach
Mitchell et al. (2008) Science
Approach
Paninski et al. (2007) Progr Brain Res
Miyawaki et al. (2009) Neuron
1st level – spatial coding hypothesis
patterns
Voxel 2 is allowed to play a role.
Voxel 3 is allowed to play a role, but only if its neighbours play weaker roles.
voxels
2nd level – pattern covariance structure
Thus: and
Pattern 3 makes an important positive contribution / is activated.
Model inversion involves finding the posterior distribution over voxel weights .
In MVB, this includes a greedy search for the optimal covariance structure that governs the prior over .
Partition #1
subset
Pattern 8 is allowed to make some contribution, independently of the contributions of other patterns.
Partition #2
subset
subset
Partition #3
(optimal)
subset
subset
subset
(motion)
MVB may outperform conventional point classifiers when using a more appropriate coding hypothesis.
Support Vector Machine
Modelbasedanalyses
How do patterns ofhidden quantities (e.g., connectivity among brain regions) differ between groups?
Activationbasedanalyses
Which functionaldifferences allow us toseparate groups?
Structurebasedanalyses
Which anatomical structures allow us to separate patients and healthy controls?
supervised learning:SVM classification
unsupervised learning:GMM clustering (using effective connectivity)
best model
78%
78%
71%
71%
62%
55%
regionalactivity
functional connectivity
effective connectivity
(model selection)
(model validation)
Brodersen, Deserno, Schlagenhauf, Penny, Lin, Gupta, Buhmann, Stephan (in preparation)
Question 1 – What do the data tell us about hidden processes in the brain?
compute the posterior
?
Question 2 – Which model is best w.r.t. the observed fMRI data?
compute the model evidence
?
Question 3 – Which model is best w.r.t. an external criterion?
compute the classification accuracy
{ patient, control }
modelbasedclassification
{ group 1, group 2 }
activationbasedclassification
model selection
vs.
structurebasedclassification
inference onmodelparameters
?