Combined Distance and Feature-Based Clustering of Time-Series: An Application on Neurophysiology

SETN 2002, April 10-12, 2002, Thessaloniki, Greece. Combined Distance and Feature-Based Clustering of Time-Series: An Application on Neurophysiology. George Potamias, Institute of Computer Science, FORTH, Heraklion, Crete.

Presentation Transcript


  1. SETN 2002, April 10-12, 2002, Thessaloniki, Greece. Combined Distance and Feature-Based Clustering of Time-Series: An Application on Neurophysiology. George Potamias, Institute of Computer Science, FORTH, Heraklion, Crete

  2. The Application Domain
     • Adult brain:
       • Complex network of fibers
       • Brain nuclei: functional structures
     • Brain development: a series of events, namely cell proliferation and migration, growth of axons and dendrites, formation of functional connections and synapses, cell death, myelination of axons and refinement of neuronal specificity
     • Knowledge of the underlying mechanisms that govern these complex processes, and the study of histogenesis and neural plasticity during brain development, are critical for understanding the function of the normal or injured brain.

  3. Study & Goal: Avian Brain
     • Biosynthetic activity, such as protein synthesis, underlies brain-development events.
     • The history of in vivo protein synthesis activity of specific brain areas could:
       • yield insight on their pattern of maturation
       • reveal relationships between distantly located structures
       • suggest different roles of the topographically organized brain structures in the maturation processes
     • The late embryonic development of the avian brain was selected for this study.
     • Study: the time course of protein-synthesis activity of individual brain areas, as a model to correlate critical periods during development
     • Goal: extract critical relationships that govern the normal ontogenic processes

  4. Biomedical Background
     • To determine biosynthetic activity, the in vivo autoradiographic method of carboxyl-labeled L-leucine (an essential amino acid present in most proteins) was used.
     • The experimental data concern 30 chick embryos.
     • The late embryonic development between day 11 (E11) and day 19 (E19), as well as post-hatching day 1 (P1), was studied.
     • During that time, proliferation of neurons has ceased, and cell growth, differentiation, migration and death, axon elongation, refinement of connections, and establishment of functional neuronal networks occur.

  5. Time-Series Representation: Protein Synthesis Patterns
     • 49 brain areas (nuclei) were identified.
     • Autoradiographic film → image analysis → intensities
     • For each area, the means over all chicks were recorded.
     • The final outcome is a set of 49 time-series over a time-span of 6 time-points (five embryonic days and one post-hatching day).
     [Figure: protein synthesis patterns; intensities plotted against days]

  6. The Problem
     • How to get meaning from the mesh?
     • How to get indicative developmental patterns?

  7. Method: Discovery of Coherences between Time Series
     • Time-series collection
     • Compute distances (similarities)
     • Distance & feature-based hierarchical clustering
     • Visualize and interpret the clustering result(s)
     • Induce underlying/hidden models: brain development hierarchy
     • Time-series discretization … need for hierarchical modeling

  8. Time-Series Matching: Problems & Tasks
     • Use of a normal distance metric? … outliers; different scaling factors and baselines
     • Ignore small or non-significant parts
     • Translate the offset → align vertically
     • Amplitude scaling → fixed width
     • Need for an adjustable and adaptive time-series matching operation … apply matching metric
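
The offset-translation and amplitude-scaling tasks listed on this slide can be illustrated with a plain z-normalization. This is only a minimal sketch of the general idea, not the matching operation used in the presentation; the function name `normalize` and the sample values are assumptions made for illustration.

```python
from statistics import mean, pstdev

def normalize(series):
    """Shift a series to zero mean and scale it to unit standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A flat series carries no shape information; map it to all zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]

# Two made-up series with the same shape but different baseline and scale.
a = [1.0, 2.0, 3.0, 2.0]
b = [110.0, 120.0, 130.0, 120.0]

norm_a = normalize(a)
norm_b = normalize(b)
```

After normalization the two series coincide up to floating-point error, so a distance computed on them reflects shape rather than baseline or amplitude.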

  9. Time-Series Discretization
     • QDT: Qualitative Discrete Transformation (Lopez et al., 2000)
     • 4 intervals = 4 nominal values: v1: drastic-increase, v2: increase, v3: decrease, v4: drastic-decrease
     • A new continuous value is assigned the same discrete value as its preceding values if it belongs to the same population (based on statistical-significance testing).
     • … the number of discrete intervals is to be specified by the user
     • Achieves, in a convenient way, amplitude scaling, vertical alignment and identification of (non-)significant parts.
     [Figure: a time-series with its discretized form … … v2 v2 v2 v2 v4 v1 v3]
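
To make the four nominal values concrete, here is a deliberately simplified sketch that labels each step between consecutive points. It is not QDT: QDT decides labels through statistical-significance testing, whereas this stand-in uses a fixed threshold. The function name `discretize_changes`, the `drastic` threshold, the treatment of a zero change as "increase", and the sample data are all assumptions.

```python
def discretize_changes(series, drastic=0.25):
    """Map each step between consecutive points to one of four nominal values.

    `drastic` is a hypothetical threshold: a step larger than this fraction
    of the series' value range is labelled as drastic.
    """
    span = max(series) - min(series)
    labels = []
    for prev, curr in zip(series, series[1:]):
        delta = curr - prev
        big = span > 0 and abs(delta) > drastic * span
        if delta >= 0:
            labels.append("v1" if big else "v2")  # drastic-increase / increase
        else:
            labels.append("v4" if big else "v3")  # drastic-decrease / decrease
    return labels

raw = [10, 11, 12, 20, 19, 5]                    # made-up intensities
rescaled = [10 * x + 100 for x in raw]           # same shape, new scale and offset

labels_raw = discretize_changes(raw)
labels_rescaled = discretize_changes(rescaled)
```

Because the threshold is relative to the range of each series, both versions receive the same labels, which mirrors the amplitude-scaling and vertical-alignment effect the slide attributes to discretization.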

  10. Discretization specifics
      • For a time-series T: {X1, X2, …, Xn}
      • s: number of discrete values
      • width = …
      • vi = discr(Xi) = …
      • Discrete Transform of T: T’ = {v1, v2, …, vm}

  11. Distance Metric
      • dist(Ta, Tb) = dist(T’a, T’b) = …
      • distance(va;i, vb;i) = …
      • DTW, Segmentation, …
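
The formulas on this slide did not survive in the transcript, so the following is only one plausible way to compare two discretized series position by position, not the metric the author defines. It assumes the four nominal values can be ordered from drastic-decrease to drastic-increase; the `RANK` table and the name `discrete_distance` are hypothetical.

```python
# Assumed ordering of the nominal values, from strongest fall to strongest rise.
RANK = {"v4": 0, "v3": 1, "v2": 2, "v1": 3}

def discrete_distance(ta, tb):
    """Mean absolute rank difference between two equal-length label sequences,
    scaled to the interval [0, 1]."""
    if len(ta) != len(tb):
        raise ValueError("sequences must have the same length")
    if not ta:
        return 0.0
    max_gap = max(RANK.values()) - min(RANK.values())
    total = sum(abs(RANK[a] - RANK[b]) for a, b in zip(ta, tb))
    return total / (max_gap * len(ta))

same = discrete_distance(["v2", "v2", "v4"], ["v2", "v2", "v4"])
opposite = discrete_distance(["v1", "v1"], ["v4", "v4"])
partial = discrete_distance(["v2", "v3"], ["v2", "v4"])
```

Identical sequences score 0 and fully opposite ones score 1; any other per-position distance could be substituted without changing the rest of the pipeline.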

  12. Graph Theoretic Hierarchical Clustering: The Basics
      • Time-series → nodes; time-series distance dist(Ta, Tb) → weighted edge
      • Fully connected weighted graph
      • Minimum Spanning Tree:
        • offers the ability to ‘isolate’ and group nodes
        • preserves the minimum distance between time-series
      • Iterative partitioning → hierarchical clustering: … which sub-group to form? … when to stop?
      • Category Utility: a probabilistic metric
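
The graph construction described on this slide can be sketched with Prim's algorithm over a full distance matrix, followed by removing one tree edge to split the nodes into two groups. This is a sketch under assumptions: the slides choose the split with category utility, while the example below simply cuts the heaviest edge, and the function names and the toy matrix are made up.

```python
def minimum_spanning_tree(dist):
    """Prim's algorithm on a full symmetric distance matrix.

    Returns the tree as a list of (weight, i, j) edges.
    """
    n = len(dist)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        w, i, j = min(
            (dist[i][j], i, j)
            for i in in_tree
            for j in range(n)
            if j not in in_tree
        )
        edges.append((w, i, j))
        in_tree.add(j)
    return edges

def split_on_edge(n, edges, cut):
    """Remove one tree edge and return the two resulting groups of nodes."""
    adj = {k: set() for k in range(n)}
    for edge in edges:
        if edge != cut:
            _, i, j = edge
            adj[i].add(j)
            adj[j].add(i)
    seen, stack = {cut[1]}, [cut[1]]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return sorted(seen), sorted(set(range(n)) - seen)

# Toy distances between four time-series: 0 and 1 are close, 2 and 3 are close.
D = [
    [0, 1, 9, 8],
    [1, 0, 7, 9],
    [9, 7, 0, 2],
    [8, 9, 2, 0],
]
tree = minimum_spanning_tree(D)
groups = split_on_edge(len(D), tree, max(tree))  # cut the heaviest edge
```

Each edge of the tree defines one candidate two-way split, so a tree over n time-series offers n - 1 candidate partitions to evaluate at each step.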

  13. Category Utility
      • Distribution of feature-values … if clustered
      • Distribution of feature-values … if not clustered
      • Over all formed clusters
      • # formed clusters
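
The annotations on this slide match the commonly used definition of category utility (as in COBWEB): for each cluster, the sum of squared feature-value probabilities inside the cluster minus the same sum over the unclustered data, weighted by cluster size and divided by the number of clusters. The sketch below implements that textbook definition; whether the presentation uses exactly this variant is an assumption, and the function names and toy data are hypothetical.

```python
from collections import Counter

def sum_sq_probs(rows):
    """Sum over features and values of P(feature = value)^2 within `rows`."""
    n = len(rows)
    total = 0.0
    for column in zip(*rows):
        total += sum((count / n) ** 2 for count in Counter(column).values())
    return total

def category_utility(clusters):
    """Category utility of a partition.

    `clusters` is a list of non-empty clusters; each cluster is a list of
    equal-length tuples of nominal feature values.
    """
    all_rows = [row for cluster in clusters for row in cluster]
    baseline = sum_sq_probs(all_rows)          # distribution if NOT clustered
    n = len(all_rows)
    gain = sum(
        len(cluster) / n * (sum_sq_probs(cluster) - baseline)
        for cluster in clusters
    )
    return gain / len(clusters)                # divide by # formed clusters

# Toy discretized series with two features each.
pure = [[("v2", "v2"), ("v2", "v2")], [("v3", "v3"), ("v3", "v3")]]
mixed = [[("v2", "v2"), ("v3", "v3")], [("v2", "v2"), ("v3", "v3")]]

cu_pure = category_utility(pure)
cu_mixed = category_utility(mixed)
```

A partition that separates the two patterns scores higher than one that mixes them, which is the property the clustering procedure relies on when choosing between candidate splits.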

  14. Stopping Criterion
      • Best partitioning: CU(G11, G12) > CU(G21, G22)
      • Current best CU(G111, G112, G12) < previous best CU(G11, G12) → STOP
      • Current best CU(G121, G122, G11) > previous best CU(G11, G12) → continue
      [Figure: candidate partitions G11/G12 and G21/G22, with sub-partitions G111, G112, G121, G122]
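
The stopping rule on this slide amounts to a greedy loop: propose refinements of the current partition, keep the best one, and stop as soon as it no longer beats the previous best score. The sketch below shows only that control flow. The callbacks `propose_splits` and `score` are hypothetical placeholders (in the presentation they would correspond to tree-based splits and category utility), and the toy scoring function is invented for the example.

```python
def partition_until_no_gain(start, propose_splits, score):
    """Keep refining a partition while the best candidate scores strictly higher."""
    best, best_score = start, score(start)
    while True:
        candidates = propose_splits(best)
        if not candidates:
            return best
        challenger = max(candidates, key=score)
        challenger_score = score(challenger)
        if challenger_score <= best_score:     # current best not better: STOP
            return best
        best, best_score = challenger, challenger_score

def split_candidates(partition):
    """Toy proposal step: cut any one group into a left and a right part."""
    out = []
    for k, group in enumerate(partition):
        for cut in range(1, len(group)):
            out.append(partition[:k] + [group[:cut], group[cut:]] + partition[k + 1:])
    return out

def toy_score(partition):
    """Toy stand-in for category utility: tight groups are good, many groups cost."""
    return -sum(max(g) - min(g) for g in partition) - 2 * len(partition)

result = partition_until_no_gain([[1, 2, 10, 11]], split_candidates, toy_score)
```

In this toy run the first split into [1, 2] and [10, 11] improves the score, while every further split lowers it, so the loop stops after one refinement.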

  15. GTC (Graph Theoretic Clustering): The Procedure
      • Hierarchical clustering tree
      • ~O(n^2 × F × V) (preliminary)

  16. Patterning Brain Developmental Events: The Clusters

      Cluster  # Objects  Brain Nuclei (areas)
      c1       13         CA, CP, E, FPLa, LC, LPO, Mld, PL, PT, SP, Spi, TPc, VeM
      c2       20         Ac, CDL, DL, FPLp, GLv, IO, MM, N, NI, OcM, Ov, Rt, SM, Slu, Tov, nBOR, Loc, PA, PM, RPO
      c3       16         AM, Ad, Bas, Cpi, DM, GCt, HV, Hip, Co, POM, SL, Tn, Lli, PP, Imc, SCA

      • The biosynthetic activities of each cluster’s brain areas, over the stamped developmental ages, exhibit no statistically significant deviation from the respective mean of the cluster.
      • So, the mean of each cluster offers an indicative and representative model for the brain-developmental events …
      • … induction of critical relationships between the brain areas

  17. Patterning Brain Developmental Events: The Patterns
      • C1: Decrease, then Increase
      • C2: Decrease
      • C3: Increase

  18. Patterning Brain Developmental Events: Hierarchical-Tree → Critical Relationships
      [Figure: hierarchical clustering tree over clusters c1, c2 and c3, with branches annotated "early maturation", "early maturation or control", and "late maturation"]

  19. Patterning Brain Developmental Events: Biomedical Interpretation
      • Clusters {c1}, {c2}
        • Second-order sensory and limbic areas
        • Decline in protein synthesis → cell death or cell displacement due to migration, which represent a common phenomenon in many brain regions under development
        • Differ significantly at the post-hatching day
          • {c1}: receive sensory input; increase
          • {c2}: leucine incorporation is decreased
      • Cluster {c3}
        • Somatosensory, motor, and white-matter areas
        • Increase in protein synthesis → myelination and motor activity

  20. Conclusion & Future Work
      • The introduced time-series mining methodology (QDT/GTC), and the respective analysis of the history of in vivo protein synthesis activity of specific brain areas, yields insight into their maturation patterns and reveals relationships between distantly located structures.
      • The presented study contributes to the identification of a common origin of brain structures and provides possible homologies in the mammalian brain.
      • Inclusion of additional formulas and procedures for computing the distance between time-series
      • Experimentation on other application domains in order to validate the approach and examine its scalability to huge collections of time-series (initial experiments on economic time-series are already in progress, with encouraging preliminary results)

  21. GTC on ASL (Australian Sign Language dataset)
      • A subset of the dataset for the words: “spend”, “lose”, “forget”, “innocent”, “norway”, “happy”, “later”, “eat”, “cold”, “crazy”
      • “One vs. another” comparisons (word-1, word-2); results out of 45:

      Method      Result (out of 45)
      Euclidean   2
      DTW         22
      SDTW        21
      QDT/GTC     25

      • Keogh and Pazzani, 1999, 3rd Conf. on Principles & Practice of Knowledge Discovery in Databases
