Structuring Interactive Cluster Analysis


Presentation Transcript


  1. Structuring Interactive Cluster Analysis Wayne Oldford University of Waterloo Structuring Interactive Cluster Analysis R.W. Oldford

  2. Structuring Interactive Cluster Analysis: This talk is about interactive cluster analysis, that is, about interactive tools for finding and identifying groups in data. More than that, it is about stepping back and understanding the structure of this process so that software tools can be organized to simplify and aid the analysis.

  3. Overview: The problem of 'cluster analysis', or of 'finding groups in data', is ill-defined. There can therefore be no universal solution, and any claimed solution must necessarily solve some other, suitably constrained problem rather than the general one. What we need instead are highly interactive tools that allow us to adapt to the peculiarities of the data and the problem at hand. These tools are usefully organized and integrated if we step back and treat the problem as one of exploratory data analysis, except that now the exploration takes place on the space of partitions of the data as well as on the data itself. Existing algorithms need to be recast, and new ones developed, in terms of exploring the space of partitions. The algorithms can then be easily integrated with other interactive tools so that jointly they provide a broadly useful and easily adapted tool-set for finding and identifying groups in data. Argument: • ill-defined problem • high-interaction desirable • explore partitions • recast algorithms

  4. Overview. Argument: • ill-defined problem • high-interaction desirable • explore partitions • recast algorithms. Develop by example: • problems • resources • interactive clustering • partition moves • implications • prototype interface

  5. Problem … geometric/visual structure: Visual system easily identifies groups … algorithms are often motivated and/or understood via visual intuition and geometric structure

  7. Problem … Consider visually grouping here: Context matters … each point is a document located by each word’s frequency within the document

  8. Problem … two similar documents of different lengths should be “closer” … one of these has more text than the other.

  9. Problem … green “closer” to orange than to red? … “distance” measured by angle?
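
The question on slides 8 and 9 can be made concrete with a small sketch. The word counts below are hypothetical (they are not the data shown on the slides), and the function names are illustrative. It compares Euclidean distance with the angle between word-frequency vectors for a short document, a longer one with the same word proportions, and an unrelated one.

```python
import math

def angular_distance(u, v):
    """Angle in radians between two word-frequency vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)

def euclidean_distance(u, v):
    """Ordinary straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical word counts over a three-word vocabulary.
short_doc = [2, 1, 0]
long_doc = [20, 10, 0]   # same proportions as short_doc, ten times the text
other_doc = [1, 0, 3]    # similar length to short_doc, different words

# Euclidean distance puts short_doc nearer to other_doc ...
euclid_prefers_other = (euclidean_distance(short_doc, other_doc)
                        < euclidean_distance(short_doc, long_doc))
# ... while the angle puts it nearer to long_doc.
angle_prefers_long = (angular_distance(short_doc, long_doc)
                      < angular_distance(short_doc, other_doc))
```

Under these assumed counts the two measures disagree about which document is "closer", which is the point of the slide: the appropriate notion of distance depends on context.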

  10. Problem … structure in context … segmentation in MRI … groups are spatially contiguous in the plane of the image and nearby in intensity … shape is not defined a priori … image source

  11. Problem … context specific structure … aneurysm presents as intensity in blood vessels … groups are spatially contiguous tubes of similar intensity … shape is restricted a priori to be 3-d tubes … image source

  12. Problem … some specific, some not … image source … same slice, five different measurements at each location … spatial grouping as before, additional grouping possible across measurements

  13. Problem … some specific, some not … image source … 4-dimensional data from connected images: … 2-d spatial with clear biological grouping, connected to … 2-d intensity measures with abstract structure/grouping

  14. Problem • Find groups in data • Similar objects are together • Groups are separated • What do you mean by similar? • Problem is ill-defined: • E.g. what is contiguous structure? • When are groups separate? • Can we believe it?

  15. Computational resources 1. Processing 2. Memory 3. Display

  16. Computational resources (and response) 1. Processing • Gflops, Tflops, multiple processors • “computationally intensive” methods • problem constrained and optimized 2. Memory 3. Display

  17. Computational resources (and response) 1. Processing 2. Memory • GBs, TBs, disk and RAM • try to analyze huge data-sets • data-sets larger than necessary? 3. Display

  18. Computational resources (and response) 1. Processing 2. Memory 3. Display • high resolution, large • graphics processors, digital video • more data, more visual detail

  19. Computational resources 1. Processing 2. Memory 3. Display. Exploit no one resource exclusively. Balance and integrate.

  20. High interaction (much overlooked by researchers) • assume multiple displays • integrate computational resources • challenge is to design software to be simple, understandable, integrated and extensible

  21. Example: image analysis … find groups via intensity (contours and two small unusual structures revealed)

  22. Example: image analysis … other measurements may contain interesting structure

  23. Example: image analysis … identify new structure location in the original image

  24. Example: image analysis … mark new groups by colour (hue, preserving lightness in original image)

  25. Example: image analysis … explore relation between old and new groups via contours in the image itself

  26. Example: 8 dimensions from teeth measurements on species (+ sex). [Figure labels: humans; gorillas, orangutans; chimps; hominids; Proconsul Africanus]

  27. Example: apes, hominids, modern humans • multiple and very different views • 3-d point clouds (of first 3 discriminant co-ordinates) • cases identified in a list • each point represented as a smooth curve by projecting it on a direction vector smoothly moving around the surface of an 8-d sphere • all linked via colour by cases being displayed • context helps • knowing the species encourages grouping • grouping based on context + the visual information • grouping is confirmed across different kinds of display

  28. Example: mutual support and shapes. A 3-d projection. Shape from all dimensions. How many groups?

  29. Example: mutual support and shapes. Groups found here. Same in all dimensions? How many groups?

  30. Example: mutual support and shapes. Observe effect here. Split black group by shape. How many groups?

  31. Example: mutual support and shapes. Get new 3-d projection. Coloured by shape. Five groups corroborated.

  32. Example: exploratory data analysis. How many groups?

  33. Example: exploratory data analysis. Choose data to cut away. Explore the rest. Distinguish groups.

  34. Example: exploratory data analysis. Bring data back. Explore all together. Some black with red? Focus on centre.

  35. Example: exploratory data analysis. Explore separately. Mark group. Discard new view. Explore all together. Two groups.

  36. Interactive clustering • visual grouping • location, motion, shape, texture, ... • linking across displays • manual • selection • cases, variates, groups, ... • colouring • focus • immediate and incremental • context can be used to form groups • multiple partitions

  37. Automated clustering: typical software • resources dedicated to numerical computation • teletype interaction • runs to completion • graphical “output” • automated methods don’t always work well (no universal solution) • confirm via exploratory data analysis. Must be integrated with interactive methods.

  38. Example: K-means clustering. K = 2 groups. Starting groups as shown have the centre ball in one group. K-means moves one point at a time to “improve” the 2 groups.

  39. Example: K-means clustering. K = 2 groups. Final groups shown maximize an F-like statistic (between/within). Central ball is lost. K-means is poor for this data configuration.
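
A minimal sketch of why a between/within criterion loses the central ball. The data are a hypothetical 2-d stand-in for the slide's configuration (a small ball inside a ring), and the statistic is assumed here to be the plain ratio of between-group to within-group sums of squares, without degrees-of-freedom scaling. The sketch does not run K-means; it only compares the criterion on two candidate partitions.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def between_within_ratio(groups):
    """Between-group sum of squares divided by within-group sum of squares."""
    all_points = [p for g in groups for p in g]
    grand = centroid(all_points)
    between = sum(len(g) * sq_dist(centroid(g), grand) for g in groups)
    within = sum(sq_dist(p, centroid(g)) for g in groups for p in g)
    return between / within

# Hypothetical data: a ring of 24 points around a tight ball of 6 points.
ring = [(5 * math.cos(2 * math.pi * k / 24), 5 * math.sin(2 * math.pi * k / 24))
        for k in range(24)]
ball = [(0.5 * math.cos(2 * math.pi * k / 6), 0.5 * math.sin(2 * math.pi * k / 6))
        for k in range(6)]
points = ring + ball

# Partition A: the visually obvious grouping (ball versus ring).
visual = [ball, ring]
# Partition B: cut everything into left and right halves.
halves = [[p for p in points if p[0] < 0], [p for p in points if p[0] >= 0]]

ratio_visual = between_within_ratio(visual)
ratio_halves = between_within_ratio(halves)
```

Because the ball and the ring share essentially the same centre, the visual grouping has almost no between-group variation, so a method that maximizes this ratio prefers cutting the data in half.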

  40. Example: VERI (Visual Empirical Regions of Influence). Join points if no third point falls in this region.

  42. Visual Empirical Regions of Influence • derived from psychophysical experiments on how human visual perception joins data points • very special circumstances (two lines of three equi-spaced points each) • works well on demonstration 2-d cases • extends to higher dimensions • two points are joined or not depending on their joint configuration with a third point • each third point examined forms a plane with the candidate pair, so the VERI shape applies • works in high-d on published demonstration cases

  43. Example: VERI. Each colour is a different group found by VERI. Central ball is lost. VERI fails for this data configuration (also for small perturbations of demonstration cases). There is no universal method, nor can there be.

  44. Example: VERI (with parameters). VERI algorithm, but parameterized now to shrink region size. Becomes minimal spanning tree in the limit (MST gets 2 groups here). Again, no universal method is possible, but methods can be parameterized.
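
A minimal sketch of the limiting case named on slide 44, the minimal spanning tree. It does not implement VERI itself, since the region shape is not given in the transcript. How two groups are read off the tree is an assumption here: the single longest MST edge is removed. The data are again a hypothetical 2-d ball-inside-ring configuration, and the function names are illustrative.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (length, i, j) edges."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each point to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        i = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[i] = True
        if parent[i] >= 0:
            edges.append((best[i], parent[i], i))
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[i], points[j])
                if d < best[j]:
                    best[j], parent[j] = d, i
    return edges

def mst_two_groups(points):
    """Assumed rule: drop the longest MST edge, return the two components as index sets."""
    kept = sorted(mst_edges(points))[:-1]
    label = list(range(len(points)))   # simple union by relabelling
    for _, i, j in kept:
        old, new = label[j], label[i]
        label = [new if l == old else l for l in label]
    groups = {}
    for idx, l in enumerate(label):
        groups.setdefault(l, set()).add(idx)
    return list(groups.values())

# Hypothetical data: ring points have indices 0..23, ball points 24..29.
ring = [(5 * math.cos(2 * math.pi * k / 24), 5 * math.sin(2 * math.pi * k / 24))
        for k in range(24)]
ball = [(0.5 * math.cos(2 * math.pi * k / 6), 0.5 * math.sin(2 * math.pi * k / 6))
        for k in range(6)]
groups = mst_two_groups(ring + ball)
```

On this configuration the only long tree edge is the one joining the ball to the ring, so cutting it separates the two, which agrees with the slide's remark that the MST finds 2 groups.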

  45. Integrating automatic methods: Move about the space of partitions: Pa --> Pb --> Pc --> … Which operators f, with f(Pa) --> Pb, are of interest?

  46. Refine and Reduce: Need not be nested. Nesting produces a hierarchy.

  47. Reassign
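
The three moves named on slides 46 and 47 can be sketched as operators on partitions. The representation (a frozenset of frozensets) and the function names are assumptions made for illustration, not the talk's software. Refine splits a group, reduce merges two groups, and reassign moves items between groups without changing the number of groups.

```python
def refine_move(partition, group, part):
    """Split `group` into `part` and the remainder (one more group)."""
    part = frozenset(part)
    assert group in partition and part and part < group
    return (partition - {group}) | {part, group - part}

def reduce_move(partition, group_a, group_b):
    """Merge two distinct groups into one (one fewer group)."""
    assert group_a in partition and group_b in partition and group_a != group_b
    return (partition - {group_a, group_b}) | {group_a | group_b}

def reassign_move(partition, items, source, target):
    """Move `items` from `source` to `target`; the number of groups is unchanged."""
    items = frozenset(items)
    assert source in partition and target in partition and source != target
    assert items and items < source   # never empty the source group
    return (partition - {source, target}) | {source - items, target | items}

# A walk Pa --> Pb --> Pc --> Pd over six hypothetical cases labelled 0..5.
everything = frozenset(range(6))
pa = frozenset({everything})
pb = refine_move(pa, everything, {0, 1, 2})
pc = reassign_move(pb, {3}, frozenset({3, 4, 5}), frozenset({0, 1, 2}))
pd = reduce_move(pc, frozenset({4, 5}), frozenset({0, 1, 2, 3}))
```

Each function takes a partition and returns a new one, so an automatic method and a manual selection can both be expressed as a move f(Pa) --> Pb and chained freely.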

  48. Refinement sequence (1): Begin with a partition containing all points in one group.

  49. Refinement sequence (1 -> 2): Refine the partition to move to a new partition containing two groups. This refinement was obtained by projecting all points onto the eigenvector of the largest eigenvalue of the sample variance-covariance matrix and splitting at the largest gap between projected points. Blue points are on the outer sphere.

  50. Refinement sequence (1 -> 2 -> 3): Refine partition (2) to move to a new partition containing three groups. Refinement move: • select the group whose sample var-cov matrix has the largest eigenvalue • for that group, project and split as before. Green points are also on the outer sphere.
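
A minimal sketch of the refinement move described on slides 49 and 50, under stated assumptions: the leading eigenvector is found by plain power iteration (which can fail if the start vector happens to be orthogonal to it), the data are three hypothetical 2-d clumps rather than the slide's spheres, and all function names are illustrative.

```python
import math

def covariance(points):
    """Sample variance-covariance matrix (divisor n - 1)."""
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    centred = [[p[i] - mean[i] for i in range(d)] for p in points]
    return [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_eigenpair(matrix, iterations=500):
    """Power iteration: largest eigenvalue and its eigenvector."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:          # all points identical: no direction to find
            return 0.0, v
        v = [x / norm for x in w]
    w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(a * b for a, b in zip(v, w)), v

def split_at_largest_gap(points):
    """Project onto the leading eigenvector and split at the widest gap."""
    _, direction = leading_eigenpair(covariance(points))
    score = lambda p: sum(a * b for a, b in zip(p, direction))
    order = sorted(points, key=score)
    proj = [score(p) for p in order]
    cut = max(range(1, len(order)), key=lambda k: proj[k] - proj[k - 1])
    return order[:cut], order[cut:]

def refine_once(groups):
    """Split the group whose covariance matrix has the largest leading eigenvalue."""
    candidates = [g for g in groups if len(g) > 1]
    if not candidates:
        return groups
    target = max(candidates, key=lambda g: leading_eigenpair(covariance(g))[0])
    low, high = split_at_largest_gap(target)
    return [g for g in groups if g is not target] + [low, high]

# Hypothetical data: three clumps of points.
A = [(0.0, 0.0), (0.5, 0.2), (0.2, 0.6), (0.7, 0.5)]
B = [(10.0, 1.0), (10.4, 1.3), (10.2, 0.8)]
C = [(13.0, 1.1), (13.3, 0.9), (13.5, 1.4)]
step1 = refine_once([A + B + C])   # 1 -> 2 groups
step2 = refine_once(step1)         # 2 -> 3 groups
```

Applying the move twice walks from the one-group partition to two groups and then to three, mirroring the 1 -> 2 -> 3 sequence; the resulting partitions are nested, so the sequence forms a hierarchy.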
