A (very) brief introduction to multivoxel analysis “stuff”

A (very) brief introduction to multivoxel analysis “stuff” Jo Etzel, Social Brain Lab j.a.etzel@med.umcg.nl

Mass univariate (spm) Test every voxel separately Fit a linear model to each voxel Look for brain structures where the blobs occur Obtain a p-value for each voxel Use parametric (or permutation) statistics to evaluate significance Multivariate Test groups of voxels (ROIs) at once Use machine learning algorithms Structures first (no blobs) Obtain a classification accuracy for each ROI Evaluate significance of accuracy by permutation testing and perhaps subsetting (parametric statistics also possible)

Classification • What sort of flower is this? • Is this example type 1 or type 2? from www.cac.science.ru.nl/people/ustun/index.html

from http://spect.yale.edu/displaying_results.html listening to hand action sounds listening to mouth action sounds Is the brain activity in this ROI different while listening to hand and mouth action sounds? Can the activity in the right premotor cortex classify the volumes into hand and mouth action sounds? - or -

www.cs.sunysb.edu/~mueller/research/brainMiner/ to classifiers temporal compression; extract GLM parameter estimates for each ROI in each volume (or summary volume)

testing set • classify each subject separately, average accuracy across subjects • subjects can have their own patterns • classify subjects together, test left-out subject • subjects need the same patterns all data 2nd test: mouth vs. hand? 1st train: mouth vs. hand? classifier training set accuracy (from test set)

* P < 0. 0042, permutation test (Bonferroni correction) of 0.05 for 12 ROIs. ROIs So, have which ROIs allowed significant classification accuracy for separating mouth and hand action sounds: preML, preMR, M1L, S2R, audL, audR.

What? null hypothesis is arbitrary labeling If labeling is arbitrary (random), then there is no relationship between the labels and the activity, so classification should not be possible. real data classified about the same as arbitrary data: no evidence for a relationship between the labels and the activity: not significant. real data is classified better than the arbitrary data: it is “unusual:” significant. How? classify the real data make many randomized-label data sets (arbitrary or “fake” data). classify each fake data set in the same way as the real data proportion of fake data sets classified better than the real data is the p-value Permutation Test

1. Analyze the real data (get accuracy). M1 L = 0.6000 2. Make lots of permuted-label data sets (all if possible, at least 1,000). 3. Analyze the fake data sets (get accuracies). 0.4944 0.510 0.499 0.5211 0.480 0.5002 0.498 0.519 0.5720 0.4789 … 4. Count how many of the fake data sets were classified more accurately than the real data. Of 1000 fake data sets, none had accuracy higher than 0.60. 5. Divide and get the p-value. (50/1001 = 0.0499) 1/1001 = 0.000999, so M1 L p = 0.001 Done!

Permutation test results, showing true results with uncorrected p-value cutoff lines. True mean accuracy lines that fall above the range of the permuted lines are highly significant (p= 0.002).

Random Forests (RFs) first fMRI use, I think algorithm makes 10,000 CARTs, all try to classify, majority vote final class algorithm makes random variable selections for the “forest” of “trees” support vector machines (svms) previously used with fMRI data algorithm converts data to a higher dimension to try to find a linear separating line (hyperplane) between the classes. Black box? classifier

k-nearest neighbor Linear discriminant analysis Gaussian Naïve Bayes Hidden Markov Models Partial Least Squares For more info/review: Mitchell, Machine Learning, 2004, 145-175; O’toole, Journal of Cognitive Neuroscience, 2007, 1735-1753. classifier Other classification-type algorithms Neural Networks • A generalization of linear regression functions; many variations. • Each node calculates a weighted sum, and does a binary input to the next layer, so simple networks can be expressed as (long) equations. • Training the network for each person involves setting the weights on each voxel.

Idea: brain activity pattern during remembering an item should be similar to the pattern when learning the item. • Subjects learned 3 lists of 10 items each; items were labeled pictures of famous people, famous locations, and everyday objects. • After learning, subjects tried to remember all of the items; saying them aloud as they remembered them.

2: testing volumes when recalling 1: training volumes when learning items in each category classifier (neural network) 3: accuracy (sort of) “For each brain scan, the classifier produced an estimate of the match between the current testing pattern and each of the three study contexts.”

“However, follow-up analyses indicated that voxels outside of peak category-selective areas are also important for establishing this result.”

Take-Home Message • Classification (multivariate methods) can answer questions and find patterns not available with spm-type methods. • spm is still useful, though! • Method is (for me) ROI-focused: start with hypotheses about which ROIs can classify (or not) which stimuli. • Differences in activation to stimuli can be restated as classification predictions.

That’s It!

Permutation vs. t-test p-values The permutation and t-test p-values are very highly correlated, though not always identical. Calculated correlation line in solid, perfect in dotted.

A (very) brief introduction to multivoxel analysis “stuff”