Principal Components & Neural Networks: How I finished second in the Mapping Dark Matter Challenge
Sergey Yurgenson, Harvard University. Pasadena, 2011.
Goal: measure the ellipticity of 60,000 simulated galaxies (Kitching, 2011), viewed both as a scientific problem and as a data mining problem.
Training set
40,000 training examples
Find a regression function g: P -> e that maps image parameters P to ellipticity e.
•Regression function g does not need to be justified in any scientific way!
•Supervised learning is used to find g
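The steps above can be sketched in a few lines. This is a hypothetical toy example (synthetic parameters and labels, ordinary least squares standing in for a generic supervised learner), not the competition code:

```python
import numpy as np

# Toy sketch: learn a regression g that maps image-derived parameters P
# to ellipticity e from labeled training examples. All data is synthetic.
rng = np.random.default_rng(0)
n_train, n_params = 1000, 10

P = rng.normal(size=(n_train, n_params))          # parameters per galaxy
true_w = rng.normal(size=n_params)                # hidden "true" relation
e = P @ true_w + 0.01 * rng.normal(size=n_train)  # noisy ellipticity labels

# Fit g by ordinary least squares; no physical justification needed.
w, *_ = np.linalg.lstsq(P, e, rcond=None)
rmse = np.sqrt(np.mean((P @ w - e) ** 2))
```

The point of the slide is exactly this: g is judged only by its prediction error, not by physical interpretability.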
Too many input parameters.
Many parameters are nothing more than noise.
The result is not very good.
Reduce number of parameters
Make parameters “more meaningful”
Neural network with PCs as inputs: RMSE ~0.0155
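The dimensionality-reduction step can be sketched as follows. Details here (stamp size, random data) are assumptions for illustration; only the number of retained galaxy PCs (38) comes from the slides:

```python
import numpy as np

# Sketch: reduce each flattened galaxy image to a small number of
# principal-component amplitudes, which then serve as regression inputs.
rng = np.random.default_rng(1)
images = rng.normal(size=(500, 48 * 48))   # synthetic flattened stamps

mean = images.mean(axis=0)
X = images - mean                          # center the data
# SVD of the centered data matrix gives the principal components.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

n_pc = 38                                  # number of galaxy PCs used
pcs = X @ Vt[:n_pc].T                      # per-image PC amplitudes
```

Each image is now described by 38 numbers instead of thousands of pixels, which discards most of the pixel noise.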
Center images using spline interpolation.
Recalculate principal components.
Fine-tune the center position using the amplitudes of the antisymmetric components.
Implicit use of additional information about data set:
2D matrices are images of objects.
Objects have a meaningful center.
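A minimal sketch of the centering step, assuming a centroid estimate and `scipy.ndimage.shift` (which interpolates with a cubic spline by default) as the spline-interpolation machinery; the actual implementation details are not given in the slides:

```python
import numpy as np
from scipy import ndimage

def center_image(img):
    # Estimate the object's centroid and shift it to the stamp center.
    cy, cx = ndimage.center_of_mass(img)
    ty = (img.shape[0] - 1) / 2 - cy
    tx = (img.shape[1] - 1) / 2 - cx
    # ndimage.shift uses cubic-spline interpolation by default (order=3).
    return ndimage.shift(img, (ty, tx), order=3)

# Toy off-center Gaussian blob
y, x = np.mgrid[0:21, 0:21]
img = np.exp(-((y - 7) ** 2 + (x - 12) ** 2) / 8.0)
centered = center_image(img)
cy, cx = ndimage.center_of_mass(centered)
```

After centering, the PCs no longer have to encode position, so they capture shape information more cleanly.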
Figures: principal-component amplitudes, colored by 2θ and by (a-b)/(a+b).
Linear regression using only components 2 and 3: RMSE ~0.02
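That baseline can be sketched as a 2x2 least-squares fit from the two PC amplitudes to (e1, e2). The data and the "true" linear relation below are made up for illustration:

```python
import numpy as np

# Hypothetical sketch: linear map from PC-2 and PC-3 amplitudes to the
# two ellipticity components (e1, e2), fit by least squares.
rng = np.random.default_rng(2)
p = rng.normal(size=(1000, 2))              # PC-2 and PC-3 amplitudes
A = np.array([[0.8, 0.1], [-0.1, 0.9]])     # assumed linear relation
e = p @ A + 0.02 * rng.normal(size=(1000, 2))

A_hat, *_ = np.linalg.lstsq(p, e, rcond=None)
rmse = np.sqrt(np.mean((p @ A_hat - e) ** 2))
```

Two components already carry most of the ellipticity signal; the neural network's job is to squeeze out the remaining nonlinear part.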
38 (galaxy PCs) + 8 (star PCs) = 46 inputs
2 hidden layers: 12 neurons (linear transfer function) and 8 neurons (sigmoid transfer function)
2 outputs: e1 and e2 as targets
80% random training subset, 20% validation subset
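A forward-pass sketch of the stated architecture. The weights are random placeholders rather than trained values, and the exact transfer functions beyond "linear" and "sigmoid" are taken at face value:

```python
import numpy as np

# 46 inputs -> 12 linear -> 8 sigmoid -> 2 outputs (e1, e2)
rng = np.random.default_rng(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1 = rng.normal(scale=0.1, size=(46, 12)); b1 = np.zeros(12)
W2 = rng.normal(scale=0.1, size=(12, 8));  b2 = np.zeros(8)
W3 = rng.normal(scale=0.1, size=(8, 2));   b3 = np.zeros(2)

def net(x):
    h1 = x @ W1 + b1              # hidden layer 1: linear transfer
    h2 = sigmoid(h1 @ W2 + b2)    # hidden layer 2: sigmoid transfer
    return h2 @ W3 + b3           # outputs: e1, e2

batch = rng.normal(size=(5, 46))  # 38 galaxy PCs + 8 star PCs per row
out = net(batch)
```

The star PCs give the network information about the point-spread function alongside each galaxy's shape.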
•Multiple trainings with numerous networks, achieving training RMSE < 0.015
•Typical test RMSE = 0.01517-0.0152
•Small score improvement by combining the predictions of many networks (simple mean):
Combination of multiple networks: training RMSE ~0.0149
public RMSE ~0.01505-0.01509
private RMSE ~0.01512-0.01516
Benefit of network combination: ~0.00007-0.0001
•Best submission: mean of 35 NN predictions
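The ensemble step is just a per-galaxy mean over the networks' predictions. The sketch below uses simulated predictions with independent errors to show why averaging helps:

```python
import numpy as np

# Simulate 35 networks whose predictions share the truth but have
# independent noise; averaging reduces the noise part of the error.
rng = np.random.default_rng(4)
truth = rng.normal(scale=0.3, size=(1000, 2))          # (e1, e2) per galaxy
preds = truth[None] + 0.015 * rng.normal(size=(35, 1000, 2))

def rmse(p):
    return np.sqrt(np.mean((p - truth) ** 2))

single = rmse(preds[0])            # one network alone
combined = rmse(preds.mean(axis=0))  # simple mean over 35 networks
```

With fully independent errors the noise would shrink by a factor of sqrt(35); in practice network errors are correlated, which is consistent with the small ~0.0001 gain reported above.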
The method is strongly data dependent. How will it perform on a more diverse data set and on real data?
Is there a place for this kind of method in cosmology?