
Learning to distinguish cognitive subprocesses based on fMRI

Learning to distinguish cognitive subprocesses based on fMRI. Tom M. Mitchell Center for Automated Learning and Discovery Carnegie Mellon University Collaborators: Luis Barrios, Rebecca Hutchinson, Marcel Just, Francisco Pereira, Jay Pujara, John Ramish, Indra Rustandi.


Presentation Transcript


  1. Learning to distinguish cognitive subprocesses based on fMRI Tom M. Mitchell Center for Automated Learning and Discovery Carnegie Mellon University Collaborators: Luis Barrios, Rebecca Hutchinson, Marcel Just, Francisco Pereira, Jay Pujara, John Ramish, Indra Rustandi

  2. Can we distinguish brief cognitive processes using fMRI? For example, can we tell whether a subject finds a sentence ambiguous or not?

  3. Can we classify/track multiple overlapping processes? (Trial: read sentence, view picture, decide whether consistent; observed fMRI and observed button press.)

  4. Mental Algebra Task [Anderson, Qin, & Sohn, 2002]

  5. Activity Predicted by ACT-R Model [Anderson, Qin, & Sohn, 2002]. Typical ACT-R rule: IF “_ op a = b” THEN “_ = <b <inv op> a>”

  6. [Anderson, Qin, & Sohn, 2002]

  7. Outline • Training classifiers for short cognitive processes • Examples • Classifier learning algorithms • Feature selection • Training across multiple subjects • Simultaneously classifying multiple overlapping processes • Linear Model and classification • Hidden Processes and EM

  8. Training “Virtual Sensors” of Cognitive Processes. Train classifiers of the form fMRI(t, t+d) → CognitiveProcess, e.g., fMRI(t, t+8) → {ReadSentence, ViewPicture} • Fixed set of cognitive processes • Fixed time interval [t, t+d]

  9. Study 1: Pictures and Sentences. Data from [Keller et al., 2001]. Trial timeline: view picture (or read sentence) at t=0; read sentence (or view picture) at 4 sec.; press button; fixation/rest at 8 sec. • Subject answers whether the sentence describes the picture by pressing a button • 13 subjects, TR=500 msec

  10. Example sentence stimulus: “It is not true that the star is above the plus.”

  11. Example picture stimulus: + --- * (a plus and a star)

  12. Fixation point: .

  13. Trial timeline as in Study 1 (stimuli at t=0 and 4 sec.; press button; fixation/rest at 8 sec.): was each interval a picture or a sentence? • Learn fMRI(t, t+8) → {Picture, Sentence}, for t = 0, 8. Difficulties: only 8 seconds of very noisy data; overlapping hemodynamic responses; additional cognitive processes occurring simultaneously

  14. Learning task formulation: • Learn fMRI(t, …, t+8) → {Picture, Sentence} • 40 trials (40 pictures and 40 sentences) • fMRI(t, …, t+8) = voxels x time (~32,000 features) • Train a separate classifier for each of the 13 subjects • Evaluate cross-validated prediction accuracy • Learning algorithms: Gaussian Naïve Bayes, linear Support Vector Machine (SVM), k-Nearest Neighbor, Artificial Neural Networks • Feature selection/abstraction: select a subset of voxels (by signal, by anatomy); select a subinterval of time; summarize by averaging voxel activities over space, time; …
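The feature selection/abstraction step above (select voxels by signal, then average over time) can be sketched in a few lines. This is an illustrative numpy sketch on synthetic data with made-up sizes, not the study's code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic single trial: 2000 voxels x 16 time samples (illustrative sizes).
trial = rng.normal(size=(2000, 16))

def select_voxels_by_signal(trial, n):
    # Score each voxel by its mean absolute activity ("select by signal"),
    # then keep the indices of the n highest-scoring voxels.
    scores = np.abs(trial).mean(axis=1)
    return np.argsort(scores)[-n:]

# Abstract the selected voxels by averaging each one's activity over time.
voxels = select_voxels_by_signal(trial, 240)   # 240 voxels, as on slide 23
features = trial[voxels].mean(axis=1)

print(features.shape)   # (240,)
```

The same pattern applies to the other abstractions on the slide, e.g., averaging over anatomically defined regions instead of individual voxels.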

  15. Learning a Gaussian Naïve Bayes (GNB) classifier for <f1, …, fn> → C: for each class value ci, estimate the prior P(C = ci); for each feature fj, estimate P(fj | C = ci), modeling the distribution for each ci, fj as a Gaussian N(μji, σji). Applying the GNB classifier to a new instance: choose argmax over ci of P(ci) ∏j P(fj | ci)
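A minimal numpy sketch of the GNB training and prediction rule above, on synthetic two-class data (all names and parameters are illustrative, not from the study):

```python
import numpy as np

rng = np.random.default_rng(1)

# Two classes with different feature means, mimicking Picture vs. Sentence trials.
X = np.vstack([rng.normal(0.0, 1.0, (40, 5)), rng.normal(2.0, 1.0, (40, 5))])
y = np.array([0] * 40 + [1] * 40)

# Training: estimate P(C=ci) and a Gaussian N(mu_ji, sigma_ji) per class and feature.
classes = np.unique(y)
priors = np.array([np.mean(y == c) for c in classes])
mu = np.array([X[y == c].mean(axis=0) for c in classes])
sigma = np.array([X[y == c].std(axis=0) for c in classes])

def gnb_predict(x):
    # Classify by argmax_ci of log P(ci) + sum_j log P(f_j | ci).
    log_lik = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma)
    scores = np.log(priors) + log_lik.sum(axis=1)
    return classes[np.argmax(scores)]

preds = np.array([gnb_predict(x) for x in X])
print((preds == y).mean())   # training accuracy; high because classes are well separated
```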

  16. Support Vector Machines [Vapnik et al. 1992] • Method for learning classifiers corresponding to linear decision surface in high dimensional spaces • Chooses maximum margin decision surface • Useful in many high-dimensional domains • Text classification • Character recognition • Microarray analysis

  17. Support Vector Machines (SVM)

  18. Linear SVM

  19. Non-linear Support Vector Machines • Based on applying kernel functions to data points • Equivalent to projecting data into a higher-dimensional space, then finding a linear decision surface • Select kernel complexity H to minimize the ‘structural risk’: true error rate ≤ error on training data + a variance term related to kernel complexity H and the number of training examples m

  20. Generative vs. Discriminative Classifiers. Goal: learn f: X → C, or equivalently P(C|X) • Discriminative classifier: learn P(C|X) directly • Generative classifier: learn P(C) and P(X|C); classify using Bayes rule, P(C|X) ∝ P(X|C) P(C)

  21. Generative vs. Discriminative Classifiers

  22. GNB vs. Logistic Regression [Ng, Jordan NIPS03] • Gaussian naïve Bayes: models P(X|C) as a class-conditional Gaussian; decision surface: hyperplane; learning converges in O(log(n)) examples, where n is the number of data attributes • Logistic regression: models P(C|X) as a logistic function; decision surface: hyperplane; learning converges in O(n) examples; asymptotic error less than or the same as GNB

  23. Accuracy of Trained Picture/Sentence Classifier • Results (leave-one-out cross-validation) • Guessing: 50% accuracy • SVM: 91% mean accuracy • Single-subject accuracies ranged from 75% to 98% • GNB: 84% mean accuracy • Feature selection step important for both • ~10,000 voxels x 16 time samples = 160,000 features • Selected only 240 voxels x 16 time samples

  24. Can We Train Subject-Independent Classifiers?

  25. Training Cross-Subject Classifiers for Picture/Sentence [Wang, Hutchinson, Mitchell. NIPS03] • Approach 1: define “supervoxels” based on anatomically defined brain regions • Abstract to seven brain-region supervoxels • Each supervoxel contains hundreds to thousands of voxels • Train on n-1 subjects, test on the nth subject • Result: 75% prediction accuracy on subjects outside the training set • Compared to 91% average single-subject accuracy • Significantly better than 50% guessing accuracy

  26. Study 2: Semantic Word Categories [Francisco Pereira]. Word categories: Fish, Trees, Vegetables, Tools, Dwellings, Building parts. Experimental setup: • Block design • Two blocks per category • Each block begins by presenting the category name, then 20 words • Subject indicates whether each word fits the category

  27. Learning task formulation • Learn fMRI(t, …, t+32) → WordCategory • fMRI(t, …, t+32) represented by the mean fMRI image • Train on presentation 1, test on presentation 2 (and vice versa) • Learning algorithm: 1-Nearest Neighbor, based on spatial correlation [after Haxby] • Feature selection/abstraction: select the most ‘object selective’ voxels, based on multiple regression on boxcars convolved with a gamma function; 300 voxels in ventral temporal cortex produced the greatest accuracy
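The 1-nearest-neighbor rule above (classify a held-out presentation's mean image by its highest spatial correlation with each category's training image) can be sketched as follows; the data and category names are synthetic and illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n_voxels = 300   # e.g., 300 ventral temporal voxels, as on the slide

# Synthetic per-category spatial patterns, plus noisy "presentation" images.
categories = ["fish", "tools", "dwellings"]
patterns = {c: rng.normal(size=n_voxels) for c in categories}
pres1 = {c: patterns[c] + 0.5 * rng.normal(size=n_voxels) for c in categories}
pres2 = {c: patterns[c] + 0.5 * rng.normal(size=n_voxels) for c in categories}

def classify_1nn(image, exemplars):
    # 1-nearest-neighbor by spatial (Pearson) correlation with each exemplar.
    corr = {c: np.corrcoef(image, ex)[0, 1] for c, ex in exemplars.items()}
    return max(corr, key=corr.get)

# Train on presentation 1, test on presentation 2.
preds = {c: classify_1nn(img, pres1) for c, img in pres2.items()}
accuracy = np.mean([preds[c] == c for c in categories])
print(accuracy)
```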

  28. Results predicting word semantic category. Mean pairwise prediction accuracy averaged over 8 subjects: • Ventral temporal: 77% (low: 57%, high: 88%) • Parietal: 70% • Frontal: 67% • Random guess: 50%

  29. Mean Activation per Voxel for Word Categories: P(fMRI | WordCategory) for Vegetables, Tools, and Dwellings; one horizontal slice, ventral temporal cortex [Pereira, et al 2004]

  30. Plot of single-voxel classification accuracies for a Gaussian naïve Bayes classifier (yellow and red are most predictive). Images from three different subjects (Subjects 1–3) show similar regions with highly informative voxels.

  31. Single-voxel GNB classification error vs. p-value from the T-statistic. E.g., N = 10^3: P < 0.0001, error = 0.01; N = 10^6: P < 0.0001, error = 0.51. Cross-validated prediction error is an unbiased estimate of the Bayes optimal error – the area under the intersection of the class-conditional distributions.

  32. Question: Do different people’s brains ‘encode’ semantic categories using the same spatial patterns? Answer: No. But there are cross-subject regularities in the “distances” between categories, as measured by classifier error rates.

  33. Six-Category Study: Pairwise Classification Errors (ventral temporal cortex; figure marks the best and worst category pairs)

  34. LDA classification of semantic categories of photographs. [Carlson, et al., J. Cog. Neurosci, 2003]

  35. Cox & Savoy [Neuroimage 2003] trained SVM and LDA classifiers for semantic photo categories. Classifiers applied to the same subject a week later were equally accurate.

  36. Lessons Learned Yes, one can train machine learning classifiers to distinguish a variety of cognitive processes • Comprehend Picture vs. Sentence • Read ambiguous sentence vs. unambiguous • Read Noun vs. Verb • Read Nouns about “tools” vs. “building parts” Failures too: • True vs. false sentences • Negative vs. affirmative sentences

  37. Which Machine Learning Method Works Best? • GNB and SVM tend to outperform kNN • Feature selection important (figure: average per-subject classification error)

  38. Which Feature Selection Works Best? Wish to learn F: <x1, x2, …, xn> → {A, B} • Conventional wisdom: pick features xi that best distinguish between classes A and B, e.g., sort xi by mutual information and choose the top n • Surprise: an alternative strategy worked much better

  39. The learning setting (figure: per-voxel activity distributions for Class A, Class B, and Rest/Fixation, illustrating voxel discriminability)

  40. GNB Classifier Errors vs. Feature Selection Method

  Feature selection method      Picture/Sentence  Syntactic Ambiguity  Nouns vs. Verbs  Word Categories
  All features                        .29               .43                .36               .10
  Discriminate target classes         .26               .34                .36               .10
  Active                              .16               .25                .34               .08
  ROI Active                          .18               .27                .31               .09
  ROI Active Average                  .21               .27                .23               NA

  41. “Zero Signal” learning setting • Goal: learn f: X → Y, or P(Y|X) • Given: training examples <Xi, Yi>, where Xi = Si + Ni, with signal Si ~ P(S|Y = Yi) and noise Ni ~ Pnoise; also observed noise with zero signal, Z = N0 ~ Pnoise (e.g., fixation). Class 1 observations X1 = S1 + N1; class 2 observations X2 = S2 + N2. Question: select features based on discrim(X1, X2) or discrim(Z, Xi)?

  42. “Zero Signal” learning setting • Conjecture: feature selection using discrim(Z,Xi) will improve relative to discrim(X1,X2) as: • # of features increases • # of training examples decreases • signal/noise ratio decreases • fraction of relevant features decreases
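The conjecture can be probed with a small simulation in which the relevant features are active in both classes with slightly different magnitudes, so discrim(Z, Xi) sees a larger effect than discrim(X1, X2). A hedged numpy sketch (all parameters illustrative, and the score a simple mean difference):

```python
import numpy as np

rng = np.random.default_rng(4)

n_feat, n_rel, n_train = 1000, 20, 10   # many features, few relevant, few examples

# Xi = Si + Ni: the first n_rel features carry signal in both classes, with
# slightly different magnitudes; Z = N0 is pure noise (the zero-signal data).
s1 = np.zeros(n_feat); s1[:n_rel] = 1.0
s2 = np.zeros(n_feat); s2[:n_rel] = 2.0
x1 = s1 + rng.normal(size=(n_train, n_feat))
x2 = s2 + rng.normal(size=(n_train, n_feat))
z = rng.normal(size=(n_train, n_feat))

def discrim(u, v):
    # Simple per-feature separation score: absolute difference of means.
    return np.abs(u.mean(axis=0) - v.mean(axis=0))

# Strategy 1: score features by discrim(X1, X2); strategy 2: by discrim(Z, Xi).
sel_classes = np.argsort(discrim(x1, x2))[-n_rel:]
sel_zero = np.argsort(discrim(z, np.vstack([x1, x2])))[-n_rel:]

# How many truly relevant features did each strategy recover?
hits_classes = int(np.sum(sel_classes < n_rel))
hits_zero = int(np.sum(sel_zero < n_rel))
print(hits_classes, hits_zero)
```

Varying n_feat, n_train, and the signal magnitudes lets one probe each clause of the conjecture on the slide.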

  43. Part 2: Can we classify/track multiple overlapping processes? (Input stimuli: read sentence, view picture, decide whether consistent; observed fMRI and observed button press.)

  44. Related Bayes Net State-Space Models: HMMs, DBNs, etc. [e.g., Ghahramani, 2001]; for fMRI, see [Hojen-Sorensen et al, NIPS99]. Cognitive subprocesses as hidden state variables.

  45. Hidden Process Model [with Rebecca Hutchinson]: a generative model for classifying overlapping hidden processes. Each process is defined by: • ProcessID (e.g., <comprehend sentence>) • Maximum HDR duration R • Emission distribution [W(v, t)]. An interpretation Z of the data is a set of process instances {<ProcessIDi, StartTimei>}; we seek the maximum-likelihood interpretation, where the data are modeled as the sum of the responses W of the active process instances plus Gaussian noise.

  46. Classifying Processes with HPMs • Start time known: choose the process whose response best explains the observed data • Start time unknown: also consider candidate start times S
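Classification with an HPM can be sketched as scoring each (process, candidate start time) pair by how well its learned response W explains the observed window under Gaussian noise, i.e., by squared error of the template fit. A minimal numpy sketch with synthetic templates (names and sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

T, V, R = 24, 50, 8          # window length, voxels, max HDR duration
# Learned per-process responses W[v, t] (here just synthetic templates).
W = {"ReadSentence": rng.normal(size=(V, R)),
     "ViewPicture": rng.normal(size=(V, R))}

def predict(y, W, candidate_starts):
    # Score each (process, start time) by squared error of the template fit,
    # i.e., the Gaussian log-likelihood up to a constant.
    best, best_err = None, np.inf
    for name, w in W.items():
        for s in candidate_starts:
            model = np.zeros((V, T))
            model[:, s:s + R] = w                 # process instance starting at s
            err = np.sum((y - model) ** 2)
            if err < best_err:
                best, best_err = (name, s), err
    return best

# Observed window: ViewPicture starting at t=4, plus noise.
y = np.zeros((V, T))
y[:, 4:4 + R] = W["ViewPicture"]
y += 0.3 * rng.normal(size=(V, T))

print(predict(y, W, candidate_starts=range(T - R)))   # ('ViewPicture', 4)
```

When the start time is known, the candidate set collapses to a single s, recovering the simpler case on the slide.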

  47. HPM vs. GNB on the Study 1 timeline (stimuli at t=0 and 4 sec., fixation/rest at 8 sec., trial extends to 16 sec.): each asks “picture or sentence?” for each interval. The GNB classifier is a special case of the HPM classifier.

  48. Learning HPMs • Known start times: least-squares regression, e.g., see Dale [HBM, 1999] • Unknown start times: EM algorithm. Repeat: E step: estimate P(S|Y, W); M step: W’ ← argmax of the expected log-likelihood under P(S|Y, W). The M step is currently implemented with gradient ascent.
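The EM loop for unknown start times might look like the following sketch, for a single process per trial; under Gaussian noise the M step has a closed form (a posterior-weighted average of the aligned data windows), used here in place of the gradient ascent the slide mentions. All sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)

T, V, R, n_trials, sigma = 20, 30, 6, 15, 0.4
true_w = rng.normal(size=(V, R))
true_starts = rng.integers(0, T - R, size=n_trials)   # hidden start times

# Each trial: the true response placed at its hidden start time, plus noise.
Y = sigma * rng.normal(size=(n_trials, V, T))
for i, s in enumerate(true_starts):
    Y[i, :, s:s + R] += true_w

def em_hpm(Y, T, V, R, sigma, n_iter=10):
    w = rng.normal(0, 0.1, size=(V, R))     # initial response estimate
    cand = np.arange(T - R + 1)
    for _ in range(n_iter):
        # E step: P(S=s | Y_i, W) from the Gaussian log-likelihood of each
        # candidate start time (terms constant in s cancel in the softmax).
        logp = np.array([[np.sum(Y[i, :, s:s + R] * w) / sigma ** 2
                          for s in cand] for i in range(len(Y))])
        p = np.exp(logp - logp.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        # M step: W' <- argmax of the expected log-likelihood, which here is
        # the posterior-weighted average of the aligned data windows.
        w = sum(p[i, j] * Y[i, :, s:s + R]
                for i in range(len(Y)) for j, s in enumerate(cand)) / len(Y)
    return w, cand[p.argmax(axis=1)]

w_hat, s_hat = em_hpm(Y, T, V, R, sigma)
print(np.mean(s_hat == true_starts))   # fraction of start times recovered
```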
