Topological characterization of an image dataset with Betti numbers and a generative model.


Presentation Transcript


  1. Topological characterization of an image dataset with Betti numbers and a generative model. Maxime MAILLOT (Exalead), Michaël AUPETIT (CEA LIST), Gérard GOVAERT (UTC-CNRS). DataSense | 08-07-2014

  2. Context • Multivariate data exploration • Signals, images, … • Classical ML techniques • Clustering: K-Means; Gaussian Mixture Models -> convex clusters • Dimension reduction: Self-Organizing Maps, MDS, PCA -> dimension-reduction artefacts imposed by the representation space • Topological information (from the underlying structure): • Number of connected components • Intrinsic dimension • Topological invariants (Betti numbers)

  3. Why topological information? • Cognition and topology • Neuronal encoding of topological information survived Darwinian natural selection, showing the importance of this information in our cognitive processes [Figure: retinotopic map of a mouse, Hübener 2003]

  4. Why topological information? • Topology and visual perception • Gestalt psychological theory [1920] • The whole is more than the sum of its parts • Laws of continuity, proximity, similarity • Topological view: underlying structure • Statistical view: underlying density • Geometrical view: point locations or underlying shapes • Descriptive model: the sample is enough, no hypothesis about the population underlying the data • Predictive model: our visual system instantly provides a topological model of the population

  5. Why topological information? • Mental map and topology • Topological invariants as an objective representation • However radically different the perception process and experience of each person are, a topological invariant still exists that is common to both persons' mental models and the real building's map: they share the same connectedness [Figure: objective map M of a building B; subjective map M1 of B; subjective map M2 of B]

  6. Why topological information? • Pattern reliability and topology • A large family of transformations: isometries ⊂ similarities ⊂ homeomorphisms ⊂ homotopies • Reliability • The processing pipeline from data to decision is more likely to be a homotopy • So topological information is more likely to survive the distortions of the pipeline • Hence topological information is a more reliable basis for decisions under uncertainty [Figure labels: initial space; geometry; probability density functions; intrinsic dimension; Betti numbers]

  7. Some hints about topology • Topology in a nutshell • What is the difference between a mug and a doughnut?

  8. Some hints about topology • Topology in a nutshell • What is the difference between a mug and a doughnut? The taste is significantly different!

  9. Some hints about topology • Topological invariants • Two spaces have the same topology iff they are homeomorphic to each other, i.e. they are linked through a continuous function H whose inverse H^-1 is also continuous. • Topology classifies spaces based on their topological invariants, such as the Betti numbers: b0 = number of connected components, b1 = number of independent 1-cycles (tunnels), b2 = number of independent 2-cycles (cavities) [Figure: (b0, b1, b2) = (1, 2, 1). One 1-cycle can contract to a point; two 1-cycles cannot. The blue and brown 1-cycles cannot collapse to each other; they generate a homology group whose rank is 2 (b1 = 2).] [Figure: topological inference from measurements. A sample of a robot's trajectory in sensor space (sensors 1, 2, 3) gives the image of walls 1 and 2 in the robot-to-sensors distance space.]

  10. From sets of points to Betti numbers • Simplex family: 0-simplex, 1-simplex, 2-simplex, 3-simplex • Simplex assembly: simplicial complex

  11. From sets of points to Betti numbers • For any smooth manifold V there exists a simplicial complex C which is homeomorphic to V (C(V) is a triangulation of V) • Two triangulations may have the same Betti numbers while their manifolds are not homeomorphic. • Simplicial complex -> computational topology -> Betti numbers

  12. From sets of points to Betti numbers • Vietoris-Rips complex and Betti numbers [Figure: Rips complexes built on four points a, b, c, d at scales R = 9 and R = 11; Betti numbers (b0, b1, b2) shown at different scales: (1,2,0), (1,0,0), (37,6,0), (N,0,0)] • Topological persistence and multiscale analytics = persistence of topological structure through scale [Chazal]
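
A minimal sketch of the Vietoris-Rips idea on slide 12, not the authors' implementation. It assumes a simplex is kept whenever every pairwise distance among its vertices is at most the scale R, and it computes Betti numbers over GF(2) from the ranks of the boundary matrices. The function names and the four-point square are illustrative assumptions.

```python
from itertools import combinations
from math import dist

def rips_complex(points, scale, max_dim=3):
    """Simplices of the Vietoris-Rips complex, grouped by dimension (index tuples)."""
    n = len(points)
    simplices = {0: [(i,) for i in range(n)]}
    for d in range(1, max_dim + 1):
        simplices[d] = [s for s in combinations(range(n), d + 1)
                        if all(dist(points[i], points[j]) <= scale
                               for i, j in combinations(s, 2))]
    return simplices

def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot  # lowest set bit, used as pivot column
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank

def boundary_rank(simplices, d):
    """Rank of the boundary map from d-simplices to (d-1)-simplices."""
    if d == 0 or not simplices.get(d):
        return 0
    index = {s: i for i, s in enumerate(simplices[d - 1])}
    rows = []
    for s in simplices[d]:
        mask = 0
        for face in combinations(s, d):
            mask |= 1 << index[face]
        rows.append(mask)
    return rank_gf2(rows)

def betti_numbers(simplices):
    """(b_0, ..., b_{top-1}) where top is the highest dimension stored."""
    top = max(simplices)
    ranks = [boundary_rank(simplices, d) for d in range(top + 1)]
    return tuple(len(simplices[k]) - ranks[k] - ranks[k + 1] for k in range(top))

# Hypothetical example: corners of a square with side 8 (diagonal about 11.3).
square = [(0, 0), (8, 0), (8, 8), (0, 8)]
for scale in (7, 9, 12):
    print(scale, betti_numbers(rips_complex(square, scale)))
```

At scale 7 the four points are isolated, at scale 9 the four sides form a single 1-cycle, and at scale 12 the diagonals fill the cycle in, which illustrates how the Betti numbers change through scale.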

  13. Restricted Delaunay complex • From manifold to triangulation [Edelsbrunner, Shah 1997] [Figure: manifolds M1 and M2]

  14. Restricted Delaunay complex • From manifold to triangulation [Edelsbrunner, Shah 1997] [Figure: manifolds M1 and M2]

  15. Restricted Delaunay complex • Alpha-shapes • Topology of molecules [Edelsbrunner 1994] • Manifold = union of spheres centered on the atoms' cores (alpha sets the sphere radius)

  16. Topology Representing Networks • Topology Representing Network [Martinetz, Schulten 1994] • Connect the 1st and 2nd nearest-neighbor prototypes of each datum: Competitive Hebbian Learning (CHL)

  17. Topology Representing Networks • Topology Representing Network [Martinetz, Schulten 1994] • Connect the 1st and 2nd nearest-neighbor prototypes of each datum: Competitive Hebbian Learning (CHL) [Figure labels: 1st, 2nd]

  18. Topology Representing Networks • Topology Representing Network [Martinetz, Schulten 1994] • Connect the 1st and 2nd nearest-neighbor prototypes of each datum: Competitive Hebbian Learning (CHL) [Figure labels: 1st, 2nd]

  19. Topology Representing Networks • Topology Representing Network [Martinetz, Schulten 1994] • Connect the 1st and 2nd nearest-neighbor prototypes of each datum: Competitive Hebbian Learning (CHL) • ROI = order-2 Voronoi cells [Figure labels: 1st, 2nd]

  20. Topology Representing Networks • Topology Representing Network [Martinetz, Schulten 1994] • Connect the 1st and 2nd nearest-neighbor prototypes of each datum: Competitive Hebbian Learning (CHL) • ROI = order-2 Voronoi cells [Figure labels: 1st, 2nd]
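
The Competitive Hebbian Learning rule named on these slides can be sketched in a few lines. This is an illustrative reading of "connect the 1st and 2nd nearest prototypes of each datum", assuming Euclidean distance and fixed prototypes; the function name and the toy data are hypothetical.

```python
from math import dist

def competitive_hebbian_learning(data, prototypes):
    """Return the set of edges (i, j), i < j, linking the two prototypes
    closest to each datum."""
    edges = set()
    for x in data:
        # Prototype indices sorted by increasing distance to the datum.
        order = sorted(range(len(prototypes)), key=lambda k: dist(x, prototypes[k]))
        first, second = order[0], order[1]
        edges.add((min(first, second), max(first, second)))
    return edges

# Hypothetical example: three prototypes on a line, two data points between them.
prototypes = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
data = [(0.4, 0.0), (1.6, 0.1)]
edges = competitive_hebbian_learning(data, prototypes)
print(sorted(edges))
```

An edge is created only where some datum falls in the order-2 Voronoi cell of a prototype pair, so prototypes 0 and 2 stay unconnected in this example.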

  21. Topology Representing Networks [Figure panels: no noise; order-2 Voronoi cells; sample with Gaussian noise]

  22. A generative model approach • When a statistician meets a topologist… • What is the probability of HEADS if you flip a coin cut from a Möbius strip? [Figure: Möbius strip]

  23. A generative model approach • When a statistician meets a topologist… • What is the probability of HEADS if you flip a coin cut from a Möbius strip? [Figure: Möbius strip] HEAD or TAIL? P(HEAD) = ?

  24. A generative model approach • When a statistician meets a topologist… • What is the probability of HEADS if you flip a coin cut from a Möbius strip? [Figure: Möbius strip] HEAD or TAIL? P(HEADACHE) = 1

  25. Generative Graph [Gaillard 2010] • Statistical generative model: where do the data come from? Topological inference from the sample to the population • Unknown generative manifolds with possibly different topologies, different labels, and possibly overlapping… • …from which samples are drawn with unknown probability density… • …corrupted with unknown noise… • …leading to the actual data observations.

  26. Generative Graph [Gaillard 2010] • Statistical generative model: general hypotheses • Unknown generative manifolds with possibly different topologies, different labels, and possibly overlapping… • …from which samples are drawn with unknown probability density… • …corrupted with unknown noise… • …leading to the actual data observations.

  27. Generative Graph [Gaillard 2010] • Statistical generative model: simplified hypotheses • Unknown generative manifolds… • …from which samples are drawn with unknown probability density… • …corrupted with unknown noise…

  28. Generative Graph [Gaillard 2010] • Generative Gaussian Graph (GGG): simplified hypotheses • Unknown generative manifolds… -> Delaunay graph of some prototypes, with class label probability (p, 1-p) • …from which samples are drawn with unknown probability density… -> uniform density over each topological component (vertices and edges) • …corrupted with unknown noise… -> Gaussian noise with identity covariance

  29. Generative Graph [Gaillard 2010] • GGG: from data to topological synthesis • Pipeline stages: multivariate data, GMM, Delaunay, likelihood maximization (EM), topological summary • Model selection (# vertices): Bayesian Information Criterion

  30. Generative simplicial complex [Maillot 2012] • Family of generative simplices A g0 … (pseudo-Monte Carlo estimation)

  31. Data sampled from a generative Gaussian simplex [Figure panels: d = 0, d = 1, d = 2; σ = 0.1, σ = 0.2, σ = 0.5]
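
One way to picture the sampling shown on slide 31 is the sketch below. It assumes, following the simplified hypotheses stated earlier in the deck, a uniform density over the simplex and isotropic Gaussian noise of standard deviation σ; the function name, signature, and example triangle are hypothetical and not taken from the authors' code.

```python
import random

def sample_gaussian_simplex(vertices, sigma, rng=random):
    """Draw one point uniformly on the simplex spanned by `vertices`,
    then add isotropic Gaussian noise with standard deviation `sigma`."""
    # Normalised i.i.d. exponential draws are Dirichlet(1, ..., 1) weights,
    # i.e. uniformly distributed barycentric coordinates.
    weights = [rng.expovariate(1.0) for _ in vertices]
    total = sum(weights)
    dim = len(vertices[0])
    point = [sum(w * v[c] for w, v in zip(weights, vertices)) / total
             for c in range(dim)]
    return [p + rng.gauss(0.0, sigma) for p in point]

# Hypothetical example: d = 2 (a triangle) with sigma = 0.1.
rng = random.Random(0)
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
cloud = [sample_gaussian_simplex(triangle, 0.1, rng) for _ in range(500)]
```

Passing one vertex gives the d = 0 case (a Gaussian blob around a point) and two vertices give the d = 1 case (a noisy segment).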

  32. Generative simplicial complex • Expectation-Maximization • π1 < π2 < π3 < … < πi < … < πn • BIC max

  33. From data to generative simplicial complex

  34. From data to generative simplicial complex • Prototype locations initialized with a GMM

  35. From data to generative simplicial complex • Delaunay complex built on top of the prototypes • First the edges…

  36. From data to generative simplicial complex • Delaunay complex built on top of the prototypes • First the edges… • Then the surfaces…

  37. From data to generative simplicial complex • Likelihood maximization for dimension-1 components • The proportion p of each edge is estimated with EM • Edges whose proportion is too low do not contribute significantly to the model (w.r.t. the Bayesian Information Criterion); they are pruned from the model

  38. From data to generative simplicial complex • Likelihood maximization for dimension-2 components • Proportions of both surfaces and remaining edges are estimated with EM, then pruned w.r.t. BIC
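
The proportion estimates mentioned on slides 37 and 38 can be illustrated with the standard EM update for mixture weights when the component densities are held fixed. This is a generic sketch, not the authors' procedure: it assumes the per-component densities are already evaluated for every datum, it assumes every datum has non-zero density under at least one component, and it uses the "larger is better" BIC convention (log-likelihood minus half the parameter count times log N), which is an assumption consistent with the "BIC max" label on slide 32.

```python
from math import log

def em_proportions(likelihoods, n_iter=200):
    """Estimate mixing proportions; likelihoods[n][k] is the density of
    datum n under component k (held fixed during the iterations)."""
    n_comp = len(likelihoods[0])
    pi = [1.0 / n_comp] * n_comp
    for _ in range(n_iter):
        counts = [0.0] * n_comp
        for row in likelihoods:
            joint = [p * l for p, l in zip(pi, row)]
            total = sum(joint)
            for k in range(n_comp):
                counts[k] += joint[k] / total  # E-step: responsibility
        pi = [c / len(likelihoods) for c in counts]  # M-step: average them
    return pi

def bic(likelihoods, pi, n_free_params):
    """Log-likelihood penalised by model size (larger is better here)."""
    loglik = sum(log(sum(p * l for p, l in zip(pi, row))) for row in likelihoods)
    return loglik - 0.5 * n_free_params * log(len(likelihoods))

# Hypothetical example: two components, three data points.
likelihoods = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
pi = em_proportions(likelihoods)
score = bic(likelihoods, pi, n_free_params=1)
```

A pruning step would compare the score of the model with and without a low-proportion component and keep whichever scores higher; that loop is omitted here.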

  39. From data to generative simplicial complex • Topological cleaning • If a simplex survived, all its facets are pruned.

  40. Results (1/3) • SPHERE (1,0,1,0…) • TORUS (1,2,1,0…) • KLEIN BOTTLE (1,1,0…)
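
As a quick sanity check on the triples of slide 40 (not part of the original slides), the alternating sum of the Betti numbers gives the Euler characteristic, and the three values agree with the known characteristics of these surfaces:

```latex
\chi = \sum_{k \ge 0} (-1)^k \, b_k, \qquad
\chi_{\text{sphere}} = 1 - 0 + 1 = 2, \quad
\chi_{\text{torus}} = 1 - 2 + 1 = 0, \quad
\chi_{\text{Klein bottle}} = 1 - 1 + 0 = 0 .
```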

  41. Results (2/3) • Image data, COIL-100: • 100 objects in rotation, each represented by 72 images (one every 5°) of 64x64 pixels (projected by PCA onto the first 71 principal components) • O 2D simplices • Delaunay complex only computed for 1D then 2D elements in the 71D space • We recover a cycle structure

  42. Results (3/3) • Image data, COIL-100: • Expected Betti numbers: (1,1,0 …) • (1,2,0 …) corresponds to an 8 shape • (1,n,0 …) shows that many faces of the object look similar • (1,0,0,…) shows a rotationally invariant object [Figure: example for (1,2,0,…), like an 8]

  43. Conclusions • GSC: first generative model to extract Betti numbers from a data set • No meta-parameter to tune (EM + BIC)

  44. Perspectives • Topological analysis for each connected component separately • Algorithmic improvements (pseudo-Monte Carlo, pruning…) • Link between the BIC optimum and Betti numbers • Deep networks: how could topological invariants be explicitly encoded within each layer?

  45. Thank you for your attention • M. Aupetit. Learning Topology with the Generative Gaussian Graph and the EM algorithm. NIPS 2005 conference proceedings, pp. 83-90, 2006. • P. Gaillard, M. Aupetit, G. Govaert. Learning topology of a labeled data set with the supervised generative Gaussian graph. Neurocomputing, 71(7-9): 1283-1299, Elsevier, March 2008. • M. Maillot, M. Aupetit, G. Govaert. Extraction of Betti numbers based on a generative model. ESANN 2012. • M. Maillot, M. Aupetit, G. Govaert. The Generative Simplicial Complex to extract Betti numbers from unlabeled data. Workshop at NIPS 2012. • Questions?

  46. QUESTIONS • Why an isotropic noise model? -> so that the complexity of the model is captured by the simplicial complex and the Betti numbers • Why Betti numbers? Does connectedness not seem sufficient for the applications? Shape taken by the states of a dynamical system (epilepsy / normal-alert-catastrophe cases…); no real case yet, but development of a model / measurement system.

  47. Suggestions: • Comparison of N-D topology vs 2-D topology to evaluate projection distortions • Dynamical system that changes shape and whose shape indicates its state (good, alert, bad) • Topological analysis/characterization of data • Monitoring of passage into an alert zone (dynamical system whose noisy state is observed): we want to verify that one cannot go directly from a good state to a bad state without passing through the alert state; extension of the SGGG to the case of simplicial complexes: holes in the structure = possible leak. TO BE CLARIFIED • Case of speaker analysis on letters (NSI2000 triangle): use one speaker as a vertex of the GSC and position the others relative to it, detect the shape of the pronounced letters. NO, DOES NOT WORK: the shapes are similar up to a homothety
