1 / 76

Justin Dauwels LIDS, MIT Amari Research Unit, Brain Science Institute, RIKEN June 11, 2008

Machine learning techniques for quantifying neural synchrony: application to the diagnosis of Alzheimer's disease from EEG. Justin Dauwels LIDS, MIT Amari Research Unit, Brain Science Institute, RIKEN June 11, 2008. Acknowledgments. Collaborators

scott-chen
Download Presentation

Justin Dauwels LIDS, MIT Amari Research Unit, Brain Science Institute, RIKEN June 11, 2008

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Machine learning techniques for quantifying neural synchrony:application to the diagnosis of Alzheimer's diseasefrom EEG Justin Dauwels LIDS, MIT Amari Research Unit, Brain Science Institute, RIKEN June 11, 2008

  2. Acknowledgments Collaborators François Vialatte*, Theo Weber+, Shun-ichi Amari*, Andrzej Cichocki* (*RIKEN, +MIT) Financial Support • RIKEN Wako Campus (near Tokyo) • about 400 researchers and staff (20% foreign) • 300 research fellows and visiting scientists • about 60 laboratories • research covers most aspects of brain science

  3. Overview • Alzheimer’s Disease (AD) • EEG of AD patients: decrease in synchrony • Synchrony measure in time-frequency domain • Pairs of EEG signals • Collections of EEG signals • Numerical Results • Outlook

  4. Alzheimer's disease Outside glimpse: clinical perspective Evolution of the disease (stages) One disease, many symptoms • 2 to 5 years before • mild cognitive impairment (often unnoticed) • 6 to 25 % progress to Alzheimer's per year memory, language, executive functions, apraxia, apathy, agnosia, etc… • Mild (early stage) • becomes less energetic or spontaneous • noticeable cognitive deficits • still independent (able to compensate) Memory (forgetting relatives) • Moderate (middle stage) • Mental abilities decline • personality changes • become dependent on caregivers Apathy • Severe (late stage) • complete deterioration of the personality • loss of control over bodily functions • total dependence on caregivers Loss of Self-control • 2% to 5% of people over 65 years old • up to 20% of people over 80 • Jeong 2004 (Nature) Video sources: Alzheimer society

  5. Alzheimer's disease Inside glimpse: brain atrophy amyloid plaques and neurofibrillary tangles Video source: Alzheimer society Images: Jannis Productions. (R. Fredenburg; S. Jannis) Video source: P. Thompson, J.Neuroscience, 2003

  6. Overview • Alzheimer’s Disease (AD) • EEG of AD patients: decrease in synchrony • Synchrony measure in time-frequency domain • Pairs of EEG signals • Collections of EEG signals • Numerical Results • Outlook

  7. Alzheimer's disease Inside glimpse: abnormal EEG EEG system: inexpensive, mobile, useful for screening Brain “slow-down” slow rhythms (0.5-8 Hz) fast rhythms (8-30 Hz) (Babiloni et al., 2004; Besthorn et al., 1997; Jelic et al. 1996, Jeong 2004; Dierks et al., 1993). focus of this project Decrease of synchrony • AD vs. MCI (Hogan et al. 203; Jiang et al., 2005) • AD vs. Control (Hermann, Demilrap, 2005, Yagyu et al. 1997; Stam et al., 2002; Babiloni et al. 2006) • MCI vs. mildAD (Babiloni et al., 2006). Images: www.cerebromente.org.br

  8. Spontaneous EEG Time-frequency |X(t,f)|2 (wavelet transform) f (Hz) Time-frequency patterns (“bumps”) Fourier |X(f)|2 Fourier power t (sec) amplitude EEG x(t)

  9. Overview • Alzheimer’s Disease (AD) • EEG of AD patients: decrease in synchrony • Synchrony measure in time-frequency domain • Pairs of EEG signals • Collections of EEG signals • Numerical Results • Outlook

  10. Comparing EEG signals 2 signals PROBLEM I: Signals of 3 seconds sampled at 100 Hz ( 300 samples) Time-frequency representation of one signal = about 25 000 coefficients

  11. One pixel Numerous neighboring pixels Comparing EEG signals (2) PROBLEM II: Shifts in time-frequency!

  12. Sparse representation: bump model f(Hz) f(Hz) Bumps Sparse representation t (sec) f(Hz) t (sec) 104- 105 coefficients t (sec) about 102 parameters • Assumptions: • time-frequency map is suitable representation • oscillatory bursts (“bumps”) convey key information F. Vialatte et al. “A machine learning approach to the analysis of time-frequency maps and its application to neural dynamics”, Neural Networks (2007).

  13. Similarity of bump models... How “similar”or “synchronous”are two bump models?

  14. ... by matching bumps y2 y1 Some bumps match Offset between matched bumps SIMILAR bump models if: Many matches Strongly overlapping matches

  15. ... by matching bumps (2) • Bumps in one model, but NOT in other • → fraction of “spurious” bumps ρspur • Bumps in both models, but with offset • → Average time offset δt(delay) • → Timing jitter with variance st • → Average frequency offset δf • → Frequencyjitter with variance sf • Synchrony: only st and ρspur relevant Stochastic Event Synchrony (SES) =(ρspur,δt,st, δf, sf) PROBLEM: Given two bump models, compute (ρspur,δt,st, δf, sf)

  16. Generative model yhidden • Generate bump model (hidden) • geometric prior for number n of bumps • p(n) = (1- λ S) (λ S)-n • bumps are uniformly distributed in rectangle • amplitude, width (in t and f) all i.i.d. • Generate two “noisy” observations • offset between hidden and observed bump • = Gaussian random vector with • mean ( ±δt /2, ±δf /2) • covariance diag(st/2, sf /2) • amplitude, width (in t and f) all i.i.d. • “deletion” with probability pd y y’ ( -δt /2, -δf/2) ( δt /2, δf/2) Easily extendable to more than 2 observations…

  17. Generative model (2) y y’ i ( -δt /2, -δf/2) j’ i’ ( δt /2, δf/2) • Binary variables ckk’ • ckk’ = 1 if k and k’ are observations of same hidden bump, else ckk’ = 0 (e.g., cii’ = 1 cij’ = 0) • Constraints: bk = Σk’ ckk’ and bk’ = Σk ckk’ are binary (“matching constraints”) • Generative Model p(y, y’, yhidden ,c, δt, δf , st,sf ) (symmetric in y and y’) • Eliminate yhidden→ offset is Gaussian RV with mean = ( δt , δf ) and covariance diag (st , sf) • Probabilistic Inference: θ p(y, y’, c, θ) = ∫p(y, y’, yhidden ,c, θ) dyhidden (c*,θ*) = argmaxc,θ log p(y, y’, c, θ)

  18. Summary • Bumps in one model, but NOT in other • → fraction of “spurious” bumps ρspur • Bumps in both models, but with offset • → Average time offset δt(delay) • → Timing jitter with variance st • → Average frequency offset δf • → Frequencyjitter with variance sf PROBLEM: Given two bump models, compute (ρspur,δt,st, δf, sf) θ APPROACH: (c*,θ*) = argmaxc,θ log p(y, y’, c, θ)

  19. Objective function y y’ i ( -δt /2, -δf/2) j’ i’ ( δt /2, δf/2) • Logarithm of model: log p(y, y’, c, θ) = Σkk’wkk’ckk’ + log I(c) + log pθ(θ) + γ wkk’ = -(1/2st (t k’ – tk –δt)2 + 1/2sf (f k’ – fk–δf)2) - 2log β- 1/2 log st - 1/2 log sf β = pd (λ/V)1/2 Euclidean distance between bump centers • Largewkk’ if : a) bumps are close b) small pd c) few bumps per volume element • No need to specify pd , λ, and V, they only appear through β= knob to control # matches

  20. Generative model p(y, y’, c, θ) /I(c) pθ(θ) Πkk’(N(t k’ – tk ;δt ,st,kk’)N(f k’ – fk ;δf ,sf, kk’)β-2)ckk’

  21. Prior for parameters • Expect bumps to appear at about same frequency, but delayed Frequency shift requires non-linear transformation, less likely than delay • Conjugate priors for st and sf(scaled inverse chi-squared): • Improper prior for δtandδt : p(δt) = 1 = p(δf)

  22. Probabilistic inference PROBLEM: Given two bump models, compute (ρspur,δt,st, δf, sf) θ APPROACH: (c*,θ*) = argmaxc,θ log p(y, y’, c, θ) SOLUTION: Coordinate descent c(i+1)= argmaxc log p(y, y’, c, θ(i)) θ(i+1) = argmaxx log p(y, y’, c(i+1),θ) MATCHING POINT ESTIMATION Minx2X,y2Yd(x,y) X Y

  23. Probabilistic inference POINT ESTIMATION: θ(i+1) = argmaxx log p(y, y’, c(i+1),θ) Uniform prior p(θ): δt, δf= average offset, st, sf= variance of offset Conjugate prior p(θ): still closed-form expression Other kind of prior p(θ): numerical optimization (gradient method)

  24. Probabilistic inference MATCHING: c(i+1)= argmaxc log p(y, y’, c, θ(i)) EQUIVALENT to (imperfect) bipartite max-weight matching problem (belongs to P) c(i+1) = argmaxc log p(y, y’, c, θ(i)) = argmaxc Σkk’wkk’(i)ckk’ s.t. Σk’ ckk’ ≤ 1and Σk ckk’ ≤ 1 with ckk’ 2{0,1} find heaviest set of disjoint edges not necessarily perfect • ALGORITHMS • Polynomial-time algorithms gives optimal solution(s) (Edmond-Karp and Auction algorithm) • Linear programming relaxation gives optimal solution if unique [Sanghavi (2007)] • Max-product algorithmgives optimal solution if unique [Bayati et al. (2005), Sanghavi (2007)]

  25. Max-product algorithm MATCHING: c(i+1)= argmaxc log p(y, y’, c, θ(i)) Generative model p(y, y’, c, θ) /I(c) pθ(θ) Πkk’(N(t k’ – tk ;δt ,st,kk’)N(f k’ – fk ;δf ,sf, kk’)β-2)ckk’

  26. Max-product algorithm MATCHING: c(i+1)= argmaxc log p(y, y’, c, θ(i)) Conditioning on θ μ↓ μ↓ μ↑ μ↑

  27. Max-product algorithm (2) • Iteratively compute messages • At convergence, compute marginals p(ckk’) = μ↓(ckk’)μ↓(ckk’)μ↑(ckk’) • Decisions: c*kk’ = argmaxckk’ p(ckk’)

  28. Summary PROBLEM: Given two bump models, compute (ρspur,δt,st, δf, sf) θ APPROACH: (c*,θ*) = argmaxc,θ log p(y, y’, c, θ) SOLUTION: Coordinate descent c(i+1)= argmaxc log p(y, y’, c, θ(i)) θ(i+1) = argmaxx log p(y, y’, c(i+1),θ) MATCHING → max-product ESTIMATION → closed-form

  29. Average synchrony 3. SES for each pair of models 4. Average the SES parameters • Group electrodes in regions • Bump model for each region

  30. Overview • Alzheimer’s Disease (AD) • EEG of AD patients: decrease in synchrony • Synchrony measure in time-frequency domain • Pairs of EEG signals • Collections of EEG signals • Numerical Results • Outlook

  31. Beyond pairwise interactions... Multi-variate similarity Pairwise similarity

  32. ...by clustering y2 y1 y3 y4 y5 • Models similar if • few deletions/large clusters • little jitter y2 y1 y3 y4 y5 Constraint: in each cluster at most one bump from each signal

  33. Generative model yhidden • Generate bump model (hidden) • geometric prior for number n of bumps • p(n) = (1- λ S) (λ S)-n • bumps are uniformly distributed in rectangle • amplitude, width (in t and f) all i.i.d. y2 y1 y3 y4 y5 • Generate M“noisy” observations • offset between hidden and observed bump • = Gaussian random vector with • mean ( δt,m /2, δf,m /2) • covariance diag(st,m/2, sf,m /2) • amplitude, width (in t and f) all i.i.d. • “deletion” with probability pd pc(i) = p(cluster size = i |y) (i = 1,2,…,M) Parameters: θ = δt,m , δf,m , st,m , sf,m,pc

  34. Inference • SOLUTION 1 • ITERATE: • Infer hidden bump model (= cluster centers) • Compute parameters δt,m , δf,m , st,m , sf,m,pc • = adaption of K-means clustering • = ESTIMATION problem • PROBLEM: LOCAL extrema! • SOLUTION 2 • Assumption: one bump in each cluster is hidden bump (“exemplar”) • ITERATE: • Find exemplars (= cluster centers) and non-exemplars • Compute parameters δt,m , δf,m , st,m , sf,m,pc • = DETECTION problem (combinatorial optimization/integer program) • = adaption of “affinity propagation” [Frey et al., Science, 2007] • or convex clustering algorithm [Lashkari et al., NIPS 2007] • ADVANTAGE: we can find GLOBAL OPTIMUM!

  35. Exemplar-based formulation yhidden y2 y1 y3 y4 y5 pc(i) = p(cluster size = i |y) (i = 1,2,…,M) Parameters: θ = δt,m , δf,m , st,m , sf,m,pc Set of EXEMPLARS = “average” point process → compression

  36. Exemplar-based formulation: IP Binary Variables Integer Program: LINEAR objective function/constraints Equivalent to k-dim matching: for k = 2: in P but for k > 2: NP-hard!

  37. Probabilistic inference PROBLEM: Given M bump models, compute θ = δt,m , δf,m , st,m , sf,m,pc APPROACH: (b*,θ*) = argmaxc,θ log p(y, y’, b, θ) SOLUTION: Coordinate descent b(i+1)= argmaxc log p(y, y’, b, θ(i)) θ(i+1) = argmaxx log p(y, y’, b(i+1),θ) CLUSTERING (IP or MP) POINT ESTIMATION • Integer program • Max-product algorithm (MP) on sparse graph: FAILED! • Integer programming methods (e.g., LP relaxation): GREAT! • IP with 10.000 variables solved in about 1s, total run time about 5s • CPLEX: commercial toolbox for solving IPs (combines several algorithms)

  38. Overview • Alzheimer’s Disease (AD) • EEG of AD patients: decrease in synchrony • Synchrony measure in time-frequency domain • Pairs of EEG signals • Collections of EEG signals • Numerical Results • Outlook

  39. EEG Data • EEG of 22 Mild Cognitive Impairment (MCI) patients and 38 age-matched • control subjects (CTR) recorded while in rest with closed eyes • →spontaneous EEG • All 22 MCI patients suffered from Alzheimer’s disease (AD) later on • Electrodes located on 21 sites according to 10-20 international system • Electrodes grouped into 5 zones (reduces number of pairs) • 1 bump model per zone • Used continuous “artifact-free” intervals of 20s • Band pass filtered between 4 and 30 Hz EEG data provided by Prof. T. Musha

  40. Similarity measures • Correlation and coherence • Granger causality (linear system): DTF, ffDTF, dDTF, PDC, PC, ... • Phase Synchrony: compareinstantaneous phases (wavelet/Hilbert transform) • State space based measures • sync likelihood, S-estimator, S-H-N-indices, ... • Information-theoretic measures • KL divergence, Jensen-Shannon divergence, ... FREQUENCY TIME No Phase Locking Phase Locking

  41. Sensitivity (average synchrony) Corr/Coh Granger Info. Theor. State Space Phase SES Mann-Whitney test: small p value suggests large difference in statistics of both groups Significant differences for ffDTF and ρ!

  42. Classification ffDTF • Clearseparation, but not yet useful as diagnostic tool • Additionalindicators needed (fMRI, MEG, DTI, ...) • Can be used for screening population (inexpensive, simple, fast)

  43. Correlations Strong (anti-) correlations „families“ of sync measures

  44. Recent results for multivariate SES Cluster size = 1 Cluster size = 2 Cluster size = 3 Cluster size = 4 SMALLER clusters in MCI patients! Cluster size = 5

  45. Recent results for multivariate SES (2) ± 85% correctly classified Average cluster size ± 90% correctly classified Average cluster size

  46. Overview • Alzheimer’s Disease (AD) • EEG of AD patients: decrease in synchrony • Synchrony measure in time-frequency domain • Pairs of EEG signals • Collections of EEG signals • Numerical Results • Ongoing Work and Outlook

  47. Ongoing work no stimulus no stimulus stimulus high st low st high st high st low st high st Time-varying similarity parameters st

  48. Probabilistic inference PROBLEM: Given M bump models, compute θ = δt,m , δf,m , st,m , sf,m,pc APPROACH: (b*,θ*) = argmaxc,θ log p(y, y’, b, θ) SOLUTION: Coordinate descent b(i+1)= argmaxc log p(y, y’, b, θ(i)) θ(i+1) = argmaxx log p(y, y’, b(i+1),θ) CLUSTERING (IP or MP) CUBIC SPLINE SMOOTHING PRIOR: POINT ESTIMATION:

  49. Future work MODEL yhidden • Less trivial prior (not i.i.d.) • Multiple hidden processes • independent • dependent y2 y1 y3 y4 y5 More complex transformations = “observation model” INFERENCE: infer distributions instead of point estimates

More Related