


Information Visualization, Nonlinear Dimensionality Reduction and Sampling

for Large and Complex Data Sets

Misha Pesenson, Isaac Pesenson*, Bruce McCollum

California Institute of Technology, *Temple University

215th AAS Meeting, Washington DC


Acknowledgment

  • We would like to thank Dr. Mike Egan for his support

  • This work was carried out at the SSC, Caltech, and supported by the National Geospatial-Intelligence Agency, Grant # HM1582-08-1-0019

Motivation

  • The Data Big Bang

  • The Expanding Digital Universe

  • Inflationary Epoch

Motivation (cont.)

  • Data are now produced faster than they can be meaningfully analyzed

  • Modern data are complex - dozens or hundreds of useful parameters associated with each astronomical object

    • LSST: The ten-year survey will result in tens of petabytes of image and catalog data and will require ~250 TFlops of processing to reduce.

    • A discussion related to LSST can be found in: The Spectrum of LSST Data Analysis Challenges: Kiloscale to Petascale, 2010, by T. Loredo, G. Babu, K. Borne, E. Feigelson, A. Gray, 215th AAS

Motivation (cont.)

  • To capitalize on the opportunities provided by these data sets one needs to be able to organize, analyze and visualize them

  • Traditional methods are often inadequate not merely because of the size in bytes of the data sets, but also because of the complexity of modern data sets

  • To be successful, these approaches must extend beyond traditional scientific analysis and information visualization

Motivation (cont.)

  • Moreover, to detect the expected and discover the unexpected in massive data sets requires a synergistic approach that utilizes recent advances in:

    • Statistics

    • Applied mathematics

    • Computer science

    • Artificial intelligence

    • Machine learning

    • Knowledge representation

    • Cognitive and perceptual sciences

    • Decision sciences, and more

Motivation (cont.)

  • Valuable results pertaining to these problems are mostly to be found only in the publications outside of astronomy

  • There is a big gap between applied mathematics, artificial intelligence and computer science on the one side and astronomy on the other

Goals of This Presentation

  • To draw the attention of the astronomical community to the aforementioned gap

  • To help bridge this gap by briefly reviewing some of the advanced methods

  • “To increase the general awareness and avoidance of unprincipled data analysis methods” (Xiao Li Meng, 2009, Desired and Feared—What Do We Do Now and Over the Next 50 Years?, American Statistician, v. 63, 3, 202-210).

Complex Data: Spectral Imaging

224 spectral channels


Astronomical Data Types and Approaches to their Representation and Processing

Scientific Visualization vs. Illustrative Visualization

  • Scientific Visualization (SV) does not simply reproduce what is already visible; it makes things visible

  • SV enables extraction of meaningful patterns from multiparametric data sets

The Curse of Dimensionality and Dimension Reduction (DR)

  • Extraction and visualization of meaningful structures from multiparametric, high-dimensional data sets require an accurate low-dimensional representation of the data

  • DR is motivated by the fact that the more we are able to reduce the dimensionality of a data set, the more regularities (correlations) we have found in it and therefore, the more we have learned from the data

    • Pesenson M., Pesenson I., McCollum B., 2010, “The Data Big Bang and the Expanding Digital Universe: High-Dimensional, Complex and Massive Data Sets in an Inflationary Epoch”, Advances in Astronomy, special issue on Robotic Astronomy (accepted)

Dimension Reduction (cont.)

  • Greatly increases computational efficiency of machine learning algorithms

  • Improves statistical inference

  • Enables effective scientific visualization and classification

Dimension Reduction: “Linear” Data, PCA

If the data are mainly confined to an almost linear low-dimensional subspace, then simple linear methods such as principal component analysis (PCA) can be used to discover the subspace and estimate its dimensionality
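
The idea can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the synthetic data set, the fixed `mixing` matrix, the helper names `pca` and `estimate_dim`, and the 99% variance threshold are all hypothetical choices, not taken from the presentation.

```python
import numpy as np

def pca(X):
    """Return principal axes (rows of Vt) and explained-variance ratios.

    X has shape (n_samples, n_features).
    """
    Xc = X - X.mean(axis=0)                      # center each feature
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / (len(X) - 1)                    # variance along each axis
    return Vt, var / var.sum()

def estimate_dim(ratios, threshold=0.99):
    """Smallest number of components whose cumulative variance reaches threshold."""
    return int(np.searchsorted(np.cumsum(ratios), threshold) + 1)

# Hypothetical data: 2 hidden parameters embedded linearly in 10 dimensions,
# plus a small amount of measurement noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = np.zeros((2, 10))
mixing[0, :5] = 1.0                              # first parameter drives features 0-4
mixing[1, 5:] = 1.0                              # second parameter drives features 5-9
X = latent @ mixing + 0.01 * rng.normal(size=(500, 10))

axes, ratios = pca(X)
dim = estimate_dim(ratios)                       # number of dominant components
```

Here `dim` counts how many principal components are needed to reach the chosen variance threshold, which serves as a simple estimate of the dimensionality of the subspace.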

Limitations of Linear Methods

  • Linear methods such as PCA have a serious drawback: they do not explicitly consider the structure of the manifold on which the data may reside

  • PCA is intrinsically linear: it provides only a rotation and shift of the axes. If the data points form a nonlinear manifold, no such transform can “unfold” it, as with the manifold on the next slide:

Data Lying on Manifolds

Formally applying geometrically linear methods would produce a complete misrepresentation of the data
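
A small numerical experiment makes this concrete. The sketch below assumes a hypothetical test curve (a helix, which is intrinsically one-dimensional) and a 95% variance threshold; neither comes from the presentation.

```python
import numpy as np

# A helix is a 1-D manifold: one parameter t fixes every point on it.
t = np.linspace(0.0, 4.0 * np.pi, 2000)
helix = np.column_stack([np.cos(t), np.sin(t), t / (2.0 * np.pi)])

# PCA via SVD of the centered data.
centered = helix - helix.mean(axis=0)
s = np.linalg.svd(centered, compute_uv=False)
ratios = s**2 / np.sum(s**2)                     # explained-variance ratios

# Number of linear components needed to keep 95% of the variance.
n_needed = int(np.searchsorted(np.cumsum(ratios), 0.95) + 1)
```

Although a single parameter describes the curve, the variance is spread over all three principal components, so PCA reports no reduction at all for this data set.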

Data Lying on Manifolds + Noise (Balasubramanian, Schwartz 2002)

  • The practical use of dimension reduction demands:

    • Representation of measurement errors in high-dimensional instrument calibration

      • Connors A., van Dyk D., Freeman P., Kashyap V., Siemiginowska A., et al. 2008

    • Careful improvement of signal-to-noise ratio without smearing essential features

      • Pesenson M., Roby W., McCollum B., 2008

Handling Geometrically Nonlinear Data

  • The modern approach to multidimensional images or data sets is to approximate them by graphs or Riemannian manifolds

  • Next, after constructing a weighted graph, one can introduce the corresponding combinatorial Laplace operator

  • Belkin M., Niyogi P., 2005; Coifman R., Lafon S., 2006

  • Application to astronomy: Richards J., Freeman P., Lee A., & Schafer C., 2009
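
The two steps above (build a weighted graph, then form its combinatorial Laplacian L = D - W) can be sketched as follows. The Gaussian kernel, the k-nearest-neighbour sparsification, the median-distance bandwidth, and the function names are assumptions made for illustration; the cited papers discuss several alternative constructions.

```python
import numpy as np

def gaussian_knn_graph(X, k=8, sigma=None):
    """Symmetric weight matrix: Gaussian weights kept only between k-nearest neighbours."""
    n = len(X)
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # squared distances
    if sigma is None:
        sigma = np.sqrt(np.median(d2[d2 > 0]))   # heuristic bandwidth (assumption)
    W = np.exp(-d2 / (2.0 * sigma**2))
    np.fill_diagonal(W, 0.0)                     # no self-loops
    nearest = np.argsort(-W, axis=1)[:, :k]      # k strongest edges per node
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], nearest] = True
    mask |= mask.T                               # symmetrize the neighbour relation
    return np.where(mask, W, 0.0)

def combinatorial_laplacian(W):
    """L = D - W, where D is the diagonal matrix of node degrees."""
    return np.diag(W.sum(axis=1)) - W

# Hypothetical data set: 60 points with 3 parameters each.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
W = gaussian_knn_graph(X, k=6)
L = combinatorial_laplacian(W)
```

By construction L is symmetric and positive semidefinite, and each of its rows sums to zero, so the constant vector is always an eigenvector with eigenvalue 0.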

Nonlinear Dimension Reduction as an Approach to Nonlinear Data

  • The eigenfunctions of the Laplacian form a basis (with the eigenvalues playing the role of frequencies), thus allowing one to develop a harmonic, or Fourier, analysis on graphs

  • This set of basis functions captures patterns intrinsic to a particular state space

  • The resulting nonlinear dimension reduction finds a lower-dimensional representation of high-dimensional data without losing a significant amount of information
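
As a toy illustration of Fourier analysis on a graph, the sketch below uses a ring graph and a hand-made signal, both hypothetical. It expands the signal in Laplacian eigenvectors and keeps only the lowest-frequency coefficients; it is not the authors' algorithm, only the basic mechanism the bullets describe.

```python
import numpy as np

# Ring graph with n nodes: each node is linked to its two neighbours.
n = 64
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx + 1) % n] = 1.0
W[(idx + 1) % n, idx] = 1.0
L = np.diag(W.sum(axis=1)) - W                   # combinatorial Laplacian

# Eigenvectors (columns of U) are the graph Fourier basis;
# eigenvalues, returned in ascending order, act as frequencies.
eigvals, U = np.linalg.eigh(L)

# A mostly smooth signal on the nodes with a weak high-frequency ripple.
theta = 2.0 * np.pi * idx / n
f = 1.0 + np.cos(theta) + 0.05 * np.cos(10.0 * theta)

coeffs = U.T @ f                                 # graph Fourier transform
m = 3                                            # keep the 3 lowest frequencies
f_low = U[:, :m] @ coeffs[:m]                    # reconstruction from 3 numbers
kept = np.sum(coeffs[:m] ** 2) / np.sum(coeffs**2)
```

In this example three coefficients replace 64 node values while `kept`, the retained fraction of the signal energy, stays close to 1; only the weak ripple is lost.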

Nonlinear Dimension Reduction and Harmonic Analysis on Manifolds and Graphs

  • We have devised innovative algorithms for nonlinear dimension reduction and data compression that:

    • enable one to overcome PCA’s limitations in handling nonlinear data manifolds

    • allow one to deal effectively with:

      1) missing observations

      2) partial sky coverage

      3) non-regular sampling

  • For details:

    • Pesenson I., 2009, J. of Geometric Analysis, 19 (2), 390;

    • Pesenson I., Pesenson M., 2010, J. of Math. Analysis and Applications, accepted;

    • Pesenson I., Pesenson M., 2010, J. of Fourier Analysis and Applications, accepted;

    • Pesenson M., Pesenson I., McCollum B., 2010, Advances in Astronomy, accepted

Visualization - Multispectral

From a set of images obtained at multiple wavebands, effective dimension reduction provides a single comprehensible, information-rich image with minimal loss of information and statistical detail, unlike simple coadding with arbitrary, empirical weights
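
The contrast with coadding can be illustrated with the simplest data-driven choice of band weights. The sketch below uses plain PCA as a linear stand-in (the presentation's own methods are nonlinear), and the synthetic image cube, the function names, and the equal-weight baseline are all assumptions.

```python
import numpy as np

def fuse_first_pc(cube):
    """Fuse a (bands, h, w) cube into one image using the first principal component.

    Returns the fused image, the band weights, and the fraction of variance kept.
    """
    bands, h, w = cube.shape
    X = cube.reshape(bands, -1).T                # one row per pixel, one column per band
    Xc = X - X.mean(axis=0)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    weights = Vt[0]
    if weights.sum() < 0:                        # fix the arbitrary sign of the axis
        weights = -weights
    fused = (Xc @ weights).reshape(h, w)
    return fused, weights, s[0] ** 2 / np.sum(s**2)

def coadd_fraction(cube):
    """Fraction of variance kept by an equal-weight coadd (unit-norm weights)."""
    bands = cube.shape[0]
    X = cube.reshape(bands, -1).T
    Xc = X - X.mean(axis=0)
    w = np.ones(bands) / np.sqrt(bands)
    return np.sum((Xc @ w) ** 2) / np.sum(Xc**2)

# Hypothetical cube: one Gaussian source whose brightness differs per band, plus noise.
rng = np.random.default_rng(1)
yy, xx = np.mgrid[0:32, 0:32]
source = np.exp(-((xx - 16) ** 2 + (yy - 16) ** 2) / (2.0 * 4.0**2))
amplitudes = np.array([0.2, 0.5, 1.0, 2.0])
cube = amplitudes[:, None, None] * source + 0.05 * rng.normal(size=(4, 32, 32))

fused, weights, pc_frac = fuse_first_pc(cube)
eq_frac = coadd_fraction(cube)
```

Because the first principal component maximizes the variance captured by any unit-norm weight vector, `pc_frac` can never be smaller than `eq_frac`; the weights are derived from the data rather than chosen empirically.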

Manifold-Valued Data and Data Lying on Manifolds

  • Application:

    • Cosmic Microwave Background (CMB)

      • Gorski K., et al. 2005

    • Solar Astrophysics

  • A powerful approach to the problem is based on Needlets - second generation spherical wavelets

    • Geller D., & Marinucci D., 2008

Manifold-Valued Data and Data Lying on Manifolds (cont.)

  • Important properties of needlets that are not shared by other spherical wavelet constructions:

    • do not rely on any kind of tangent plane approximation;

    • have good localization properties in both pixel and harmonic space;

    • their coefficients are asymptotically uncorrelated at any fixed angular distance, which makes their use in statistical procedures very promising

  • Pesenson, I., 2006, Integral Geometry and Tomography, Contemporary Mathematics, 405, 135-148, American Mathematical Society;

  • Geller D., Pesenson I., 2010, Tight Frames and Besov Spaces on Compact Homogeneous Manifolds, J. of Geometric Analysis (accepted)

Unsupervised Manifold Learning and Information Visualization

  • Manifold Learning and Visualization based on Nonlinear Dynamics

  • One needs to distinguish between geometrically nonlinear data and nonlinear methods of analysis

Unsupervised Manifold Learning: A Nonlinear Approach

  • Approximating a multidimensional image or a data set by a graph and associating a nonlinear dynamical system with each node enables us to unify three seemingly unrelated tasks:

    • image segmentation

    • unsupervised learning

    • data visualization

Testing the Algorithm: a Simulated 3D Set of 10^3 Uniformly Distributed Random Points with a Double-Diamond Pattern

  • Left and middle: two screenshots from a running animation; each point in the set oscillates (in this case in 3 dimensions) with its own random frequency

  • Right: synchronization makes the points that are connected by high-weight edges oscillate in phase, which reveals the pattern either visually or by automatically selecting the in-phase points and highlighting them in red

  • Pesenson M., Pesenson I., McCollum B., 2010, Advances in Astronomy, (accepted).

  • Pesenson M., Pesenson I. 2010, Image Segmentation, Unsupervised Manifold Learning and Information Visualization: A Unified Approach Based on Nonlinear Dynamics (submitted).
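
The synchronization effect described above can be imitated with a toy model. The sketch below assumes Kuramoto phase oscillators, which is one standard choice of nonlinear dynamical system and is not confirmed to be the one used by the authors. The graph weights, frequency range, coupling strength, and all function names are likewise hypothetical.

```python
import numpy as np

def simulate_kuramoto(W, omega, theta0, coupling=1.0, dt=0.01, steps=3000):
    """Euler-integrate d(theta_i)/dt = omega_i + K * sum_j W_ij * sin(theta_j - theta_i)."""
    theta = theta0.copy()
    for _ in range(steps):
        diff = theta[None, :] - theta[:, None]   # diff[i, j] = theta_j - theta_i
        theta = theta + dt * (omega + coupling * np.sum(W * np.sin(diff), axis=1))
    return np.mod(theta, 2.0 * np.pi)

def order_parameter(theta):
    """Phase coherence: 1 means fully in phase, near 0 means incoherent."""
    return float(np.abs(np.mean(np.exp(1j * theta))))

def select_in_phase(theta, tol=0.1):
    """Indices of the largest group of nodes whose phases agree within tol radians."""
    d = np.angle(np.exp(1j * (theta[None, :] - theta[:, None])))  # wrapped differences
    close = np.abs(d) < tol
    ref = int(np.argmax(close.sum(axis=1)))      # node with the most in-phase partners
    return np.flatnonzero(close[ref])

# Hypothetical graph: the first 20 nodes form the "pattern" (high-weight edges);
# the remaining 30 background nodes have only very weak edges.
rng = np.random.default_rng(42)
n_pattern, n_background = 20, 30
n = n_pattern + n_background
W = np.full((n, n), 1e-3)
W[:n_pattern, :n_pattern] = 1.0
np.fill_diagonal(W, 0.0)

omega = rng.uniform(0.8, 1.2, size=n)            # each node's own random frequency
theta0 = rng.uniform(0.0, 2.0 * np.pi, size=n)   # random initial phases
theta = simulate_kuramoto(W, omega, theta0)

r_pattern = order_parameter(theta[:n_pattern])
r_background = order_parameter(theta[n_pattern:])
selected = select_in_phase(theta)                # automatic selection of the pattern
```

In this setup the strongly connected nodes lock together while the background stays incoherent, so the pattern can be picked out from the phases alone. A background node may land in the selection by chance at a single instant; averaging the phase agreement over time would be a more robust criterion.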

Conclusions

  • Many important challenges have been identified by various authors and presentations

  • Different groups are already working on some of these problems:

    • The Center for Astrostatistics at PSU (E. Feigelson, G. Babu)

    • BIPS at Cornell (T. Loredo)

    • InCA at CMU (C. Schafer et al.)

    • SAMSI-SaFeDe Collaboration (V. Kashyap et al.)

    • Caltech (M. Pesenson et al.)

    • Caltech (G. Djorgovski et al.)

    • AstroNeural collaboration (G. Longo et al.)

    • Georgia Tech (A. Gray et al.)

    • GMU (K. Borne et al.)

    • IIC at Harvard (A. Goodman et al.)

Conclusions (cont.)

  • The concepts and approaches described in this presentation also contribute to the actual steps of creating the needed novel approaches and algorithms

  • Combined, all the described efforts will enable effective automated analysis and processing of giant, complex data sets such as those from LSST
