
Presentation Transcript


  1. Gaussianization based on Principal Components Analysis (GPCA): an easy tool for optimal signal processing. Valero Laparra, Jesús Malo, Gustavo Camps

  2. INDEX What? Why? How? Conclusions Toolbox

  3. What? • Estimate multidimensional probability densities • How the N-D data are distributed in the N-D space • What to pay attention to! What is important in our data

  4. What?

  5. Why? • GENERIC OPTIMAL SOLUTIONS

  6. Why? • GENERIC OPTIMAL SOLUTIONS

  7. Why? • GENERIC OPTIMAL SOLUTIONS

  8. Why? • GENERIC OPTIMAL SOLUTIONS

  9. How? • PDF estimation from samples always assumes a model. • HISTOGRAM: estimation without assuming a functional model

  10. How? • X = [ -1.66 1.25 0.73 1.72 0.88 0.19 -0.81 0.42 -0.14 …]

  11. How? • Problem: estimating the number of bins • Rule of thumb: Nbins = √Nsamples
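A minimal MATLAB sketch of this rule of thumb (base-MATLAB calls only; the data are illustrative):

    % Histogram pdf estimate with the Nbins = sqrt(Nsamples) rule.
    X = randn(1, 1000);                       % 1-D samples (illustrative)
    Nbins = round(sqrt(numel(X)));            % rule from the slide
    [counts, centers] = hist(X, Nbins);       % bin counts and bin centers
    binWidth = centers(2) - centers(1);
    pdfEst = counts / (numel(X) * binWidth);  % normalize so the estimate integrates to 1
    bar(centers, pdfEst)                      % compare against the true density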

  12. How? • Problem: “the curse of dimensionality” • Total number of bins: Nb_total = Nb_dim^N_dim • If we assume Ns = Nb² samples are needed in 1-D, then in N_dim dimensions Ns = Nb^(2·N_dim) = Nb_total²

  13. How? • Problem: “the curse of dimensionality”: Nb_total = Nb_dim^N_dim e.g. • Assuming a minimum of Nb = 11 bins per dimension • We need Ns = 11^(2·Nd) samples:

    Nd = 1: Ns = 121                →           968 bytes
    Nd = 2: Ns = 14,641             →       117,128 bytes
    Nd = 3: Ns = 1,771,561          →    14,172,488 bytes
    Nd = 4: Ns = 214,358,881        → 1,714,871,048 bytes
    Nd = 5: Ns ≈ 25,937,000,000     → out of memory
    Nd = 6: Ns ≈ 3,138,400,000,000  → out of memory
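The slide's arithmetic can be reproduced in a few lines (8 bytes per double-precision count, which matches the byte figures above):

    % Samples and memory for a dense Nd-dimensional histogram, Nb = 11 bins/dim.
    Nb = 11;
    for Nd = 1:6
        Ns = Nb^(2*Nd);                                  % Ns = Nb^(2*Nd)
        fprintf('Nd = %d: Ns = %.6g samples, %.6g bytes\n', Nd, Ns, 8*Ns);
    end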

  14. How? • How do we go from P(x) to a Gaussian P(y)? Find a transform T.

  15. How?

  16. How? • Answer: GPCA. MATLAB, MATLAB, WHAT A WONDERFUL WORLD
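The slides do not spell the algorithm out in code, so the following is only a sketch of one Gaussianization step under the usual recipe (a PCA rotation followed by marginal Gaussianization through the empirical CDF); gpca_step is a hypothetical helper, not a toolbox function:

    function [X, V] = gpca_step(X)               % X: [Ndim x Nsamples]
        [V, ~] = eig(cov(X'));                   % PCA rotation from the sample covariance
        X = V' * X;                              % decorrelate the components
        [d, n] = size(X);
        for k = 1:d
            [~, idx] = sort(X(k, :));
            r = zeros(1, n);  r(idx) = 1:n;      % ranks give the empirical CDF
            u = (r - 0.5) / n;                   % keep u away from 0 and 1
            X(k, :) = sqrt(2) * erfinv(2*u - 1); % inverse Gaussian CDF (base MATLAB)
        end
    end

Iterating this map, and keeping V and the marginal maps for later inversion, is presumably what the toolbox's Trans structure stores; convergence is the subject of the next slide.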

  17. How? • Theoretical convergence proof • Negentropy: J(X) = H(X_G) − H(X), where X_G is a Gaussian with the same mean and covariance as X; J(X) ≥ 0, with equality iff X is Gaussian

  18. How? OPEN ISSUE

  19. How? • The Gaussian is the unique distribution whose marginal distributions are Gaussian and independent • Stop criterion: I(X_n) ≈ 0 • NOTE: the stop criterion amounts to measuring mutual information

  20. How? • GPCA inverse: each iteration is a rotation plus monotone marginal maps, so it is easy to invert • NOTE: the inverse enables synthesis
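A matching sketch for the inverse (again a hypothetical helper, mirroring gpca_step above): the marginal maps are monotone, so they invert by interpolation against the stored rotated training marginals, and the orthogonal rotation inverts by its transpose:

    function X = gpca_step_inv(Y, V, Xref)  % Y: Gaussianized data; Xref: rotated training data
        [d, n] = size(Xref);
        u = ((1:n) - 0.5) / n;
        g = sqrt(2) * erfinv(2*u - 1);      % Gaussian values of the sorted marginals
        X = zeros(d, size(Y, 2));
        for k = 1:d
            xs = sort(Xref(k, :));          % stored training marginal
            X(k, :) = interp1(g, xs, Y(k, :), 'linear', 'extrap');
        end
        X = V * X;                          % undo the orthogonal rotation
    end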

  21. How? GPCA Jacobian
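For reference, the textbook change-of-variables identity behind this slide (standard probability, not toolbox-specific): if y = T(x) is the Gaussianizing transform, then

    p_x(x) = N(T(x); 0, I) · |det ∇T(x)|

Each GPCA iteration contributes a rotation (|det| = 1) and elementwise marginal maps, so |det ∇T(x)| reduces to a product of marginal derivatives across iterations; this is presumably what the detJ output of GPCA_probability evaluates.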

  22. CONCLUSIONS • The optimal solution of many problems involves knowledge of the data pdf. • GPCA obtains a transform that converts any pdf into a Gaussian pdf. • It has an easy inverse. • It has an easy Jacobian. • This transform can be used to compute the pdf of any data.

  23. GPCA toolbox (MATLAB): 3 examples, wiki page, beta version • PDF estimation • Mutual information measures • Synthesis

  24. Basic toolbox • [datT Trans] = GPCA(dat, Nit, Perc) - dat = data matrix, [N dimensions x N samples]; e.g. 100 samples from a 2-D Gaussian: dat = [2 x 100] • Nit = number of iterations • Perc = percentage by which the pdf range is increased
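A usage sketch of this call (the argument values are illustrative assumptions, not toolbox recommendations):

    dat  = randn(2, 100);                  % 100 samples of a 2-D Gaussian: [Ndim x Nsamples]
    Nit  = 50;                             % number of iterations (illustrative)
    Perc = 10;                             % percentage by which the pdf range is extended
    [datT, Trans] = GPCA(dat, Nit, Perc);  % datT: Gaussianized data; Trans: the transform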

  25. Basic toolbox • Perc = percentage by which the pdf range is increased.

  26. Basic toolbox • [datT Trans] = auto_GPCA(dat) • [datT] = apply_GPCA(dat,Trans) • [dat] = inv_GPCA(datT,Trans) • [Px pT detJ JJ] = GPCA_probability(x0,Trans)

  27. Estimating PDF/manifold • [datT Trans] = auto_GPCA(dat) • [Px pT detJ JJ] = GPCA_probability (XX,Trans);

  28. Estimating PDF/manifold • [datT Trans] = auto_GPCA(dat) • [Px pT detJ JJ] = GPCA_probability (XX,Trans);

  29. Estimating PDF/manifold • [datT Trans] = auto_GPCA(dat) • [Px pT detJ JJ] = GPCA_probability (XX,Trans);
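Putting the two calls together, a sketch of the pdf-estimation workflow (the [N dimensions x N points] layout of XX is an assumption, mirroring the dat convention of slide 24):

    [datT, Trans] = auto_GPCA(dat);               % learn the Gaussianizing transform
    [x1, x2] = meshgrid(linspace(-3, 3, 50));     % grid of query points (2-D example)
    XX = [x1(:)'; x2(:)'];                        % assumed [Ndim x Npoints] layout
    [Px, pT, detJ, JJ] = GPCA_probability(XX, Trans);
    surf(x1, x2, reshape(Px, size(x1)))           % visualize the estimated pdf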

  30. Estimating PDF/manifold • PROBLEMS • It does not always converge to a Gaussian • PDFs with clusters are more complicated • The Jacobian estimate is highly point-dependent • The derivative (in the Jacobian estimation) is much more irregular than the integral • The pdf has to be estimated separately at each point

  31. Measuring Mutual Information • [datT Trans] = auto_GPCA(dat) • MI = abs(min(cumsum(cat(1,Trans.I))));
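A sketch that sanity-checks the estimator on two correlated Gaussians, whose mutual information has the closed form −½·log(1 − ρ²) (units assumed to be nats):

    rho = 0.8;  n = 5000;
    A = [1 0; rho sqrt(1 - rho^2)];            % mixing that gives correlation rho
    dat = A * randn(2, n);
    [datT, Trans] = auto_GPCA(dat);
    MI = abs(min(cumsum(cat(1, Trans.I))));    % estimator from the slide
    MI_true = -0.5 * log(1 - rho^2);           % analytic value for comparison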

  32. Measuring Mutual Information

  33. Measuring Mutual Information • PROBLEMS • Entropy estimators are not perfectly defined • More iterations mean more accumulated error • The more complicated the pdf, the larger the error

  34. Synthesizing data • [datT Trans] = auto_GPCA(dat) • [dat2] = inv_GPCA(randn(Dim,Nsamples), Trans); [Diagram: forward transforms T1, T2 and their inverses inv T1, inv T2]

  35. Synthesizing data • [datT Trans] = auto_GPCA(dat) • [dat2] = inv_GPCA(randn(Dim,Nsamples),Trans);
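A sketch of the complete synthesis loop (Dim and Nsamples are illustrative; dat is any training set in the [Ndim x Nsamples] convention):

    [datT, Trans] = auto_GPCA(dat);                % learn the transform from training data
    [Dim, ~] = size(dat);
    Nsamples = 1000;
    dat2 = inv_GPCA(randn(Dim, Nsamples), Trans);  % push white noise through the inverse
    plot(dat(1,:), dat(2,:), 'b.', dat2(1,:), dat2(2,:), 'r.')  % original vs synthesized (2-D)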

  36. Synthesizing data • PROBLEMS • It does not always converge to a Gaussian • Small variations in the variance of the random seed data produce very different results • The transformed domain gives no information about the features of the data

  37. Thanks for your time
