Principal Components & Neural Networks: How I Finished Second in the Mapping Dark Matter Challenge
Sergey Yurgenson, Harvard University
Pasadena, 2011
Task: to measure the ellipticity of 60,000 simulated galaxies.
Scientific view (Kitching, 2011)
Data mining view
40,000 training examples
g: P -> e
•Regression function g does not need to be justified in any scientific way!
•Supervised learning is used to find g
Too many input parameters.
Many parameters are nothing more than noise.
The result is not very good.
Reduce number of parameters
Make parameters “more meaningful”
Principal components to reduce the number of input parameters
Neural network with PCs as inputs: RMSE ~0.0155
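The reduction step can be sketched with a plain SVD-based PCA. This is a minimal illustration, not the code used in the competition: the 16x16 stamp size, the random stand-in images, the choice of 20 components, and the helper names `fit_pca` and `project` are all assumptions made for the example.

```python
import numpy as np

def fit_pca(images, n_components):
    """Return the mean image (flattened) and the leading principal components."""
    X = images.reshape(len(images), -1).astype(float)  # one row per image
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(images, mean, components):
    """Express each image as amplitudes of the principal components."""
    X = images.reshape(len(images), -1).astype(float)
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
stamps = rng.normal(size=(200, 16, 16))        # synthetic stand-in images
mean, comps = fit_pca(stamps, n_components=20)  # 20 is an arbitrary choice
features = project(stamps, mean, comps)         # shape (200, 20)
```

Each image of 256 pixels is replaced here by 20 amplitudes, which would then serve as the network inputs.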
Calculate center of mass with threshold.
Center pictures using spline interpolation.
Recalculate principal components
Fine-tune center position using the amplitude of antisymmetric components
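The first two steps above can be sketched as follows. This is an assumed implementation using `scipy.ndimage.shift` with cubic splines; the threshold of 5% of the peak value, the 32x32 test image, and the helper names are hypothetical, and the final fine-tuning with antisymmetric components is not shown.

```python
import numpy as np
from scipy import ndimage

def center_of_mass_thresholded(image, threshold):
    """Centroid (row, col) computed only from pixels above the threshold."""
    weights = np.where(image > threshold, image, 0.0)
    total = weights.sum()
    rows, cols = np.indices(image.shape)
    return (rows * weights).sum() / total, (cols * weights).sum() / total

def recenter(image, threshold):
    """Shift the image so its thresholded centroid sits at the array center."""
    r, c = center_of_mass_thresholded(image, threshold)
    target_r = (image.shape[0] - 1) / 2.0
    target_c = (image.shape[1] - 1) / 2.0
    # order=3 selects cubic spline interpolation for the sub-pixel shift.
    return ndimage.shift(image, (target_r - r, target_c - c),
                         order=3, mode="nearest")

# Synthetic off-center Gaussian blob as a stand-in for a galaxy image.
rows, cols = np.indices((32, 32))
blob = np.exp(-((rows - 14.3) ** 2 + (cols - 17.6) ** 2) / (2 * 3.0 ** 2))
centered = recenter(blob, threshold=0.05 * blob.max())
new_center = center_of_mass_thresholded(centered, 0.05 * centered.max())
```

After recentering, the principal components would be recomputed on the shifted images.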
Implicit use of additional information about data set:
2D matrices are images of objects.
Objects have meaningful center.
Principal components after center recalculation
Principal components: stars
Components #2 and #3
Color: 2θ
Color: (a-b)/(a+b)
Linear regression using only components 2,3 => RMSE~0.02
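A two-input linear regression of this kind can be sketched with ordinary least squares. The data below are synthetic stand-ins with arbitrary weights and noise level, so the resulting score says nothing about the competition data; `fit_linear`, `predict_linear`, and `rmse` are hypothetical helper names.

```python
import numpy as np

def fit_linear(features, targets):
    """Ordinary least squares with an intercept; returns the weight matrix."""
    A = np.column_stack([features, np.ones(len(features))])
    weights, *_ = np.linalg.lstsq(A, targets, rcond=None)
    return weights

def predict_linear(features, weights):
    A = np.column_stack([features, np.ones(len(features))])
    return A @ weights

def rmse(pred, true):
    return float(np.sqrt(np.mean((pred - true) ** 2)))

rng = np.random.default_rng(1)
pc23 = rng.normal(size=(1000, 2))                 # stand-in for components 2 and 3
true_w = np.array([[0.10, 0.00], [0.00, 0.10]])   # arbitrary synthetic weights
e12 = pc23 @ true_w + rng.normal(scale=0.05, size=(1000, 2))  # stand-in for (e1, e2)
w = fit_linear(pc23, e12)                          # shape (3, 2): two slopes + intercept
score = rmse(predict_linear(pc23, w), e12)
```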
38 (galaxies PC) + 8 (stars PC) inputs
2 hidden layers: 12 neurons (linear transfer function) and 8 neurons (sigmoid transfer function)
2 outputs – e1 and e2 as targets
80% random training subset, 20% validation subset
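The layer sizes and the random split can be sketched as below. Only the forward pass is shown, with untrained random weights; the linear output layer, the initialization scale, the synthetic inputs, and applying the 80/20 split to 40,000 examples are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_params(rng, sizes=(46, 12, 8, 2)):
    """Small random weights and zero biases for each layer."""
    return [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    (w1, b1), (w2, b2), (w3, b3) = params
    h1 = x @ w1 + b1              # 12 neurons, linear transfer function
    h2 = sigmoid(h1 @ w2 + b2)    # 8 neurons, sigmoid transfer function
    return h2 @ w3 + b3           # 2 outputs (e1, e2); linear output is an assumption

def split_indices(rng, n, train_fraction=0.8):
    """Random split into training and validation index arrays."""
    order = rng.permutation(n)
    n_train = int(round(train_fraction * n))
    return order[:n_train], order[n_train:]

rng = np.random.default_rng(2)
params = init_params(rng)
x = rng.normal(size=(40000, 46))  # synthetic stand-in for 38 galaxy + 8 star PCs
train_idx, val_idx = split_indices(rng, len(x))
pred = forward(params, x[val_idx])
n_weights = sum(w.size + b.size for w, b in params)  # 686 trainable parameters
```

With these sizes the network is small, which makes it cheap to train many copies from different random starts.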
•Multiple trainings with numerous networks achieving training RMSE<0.015
•Typical test RMSE = 0.01517-0.0152
•Small score improvement by combining predictions of many networks (simple mean):
Combination of multiple networks, training RMSE ~0.0149
public RMSE ~0.01505-0.01509
private RMSE ~0.01512-0.01516
Benefit of network combination is ~0.00007-0.0001
•Best submission – mean of 35 NN predictions
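Combining by simple mean can be sketched as follows. The predictions here are synthetic: every model shares one common error and adds a small independent one, an assumed setup chosen to show why averaging helps only a little when the networks make mostly the same mistakes. The noise scales are arbitrary.

```python
import numpy as np

def rmse(pred, true):
    return float(np.sqrt(np.mean((pred - true) ** 2)))

def ensemble_mean(predictions):
    """Average predictions stacked as (n_models, n_samples, n_outputs)."""
    return np.mean(predictions, axis=0)

rng = np.random.default_rng(3)
truth = rng.uniform(-0.5, 0.5, size=(5000, 2))       # synthetic (e1, e2) targets
shared = rng.normal(scale=0.02, size=truth.shape)     # error common to all models
preds = np.stack([
    truth + shared + rng.normal(scale=0.005, size=truth.shape)  # per-model error
    for _ in range(35)
])
single = [rmse(p, truth) for p in preds]              # score of each model alone
combined = rmse(ensemble_mean(preds), truth)          # score of the simple mean
```

Averaging removes most of the independent error but leaves the shared error untouched, so the combined score improves on every single model by only a small margin.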
The method is strongly data dependent. How will it perform on a more diverse data set and on real data?
Is there a place for this kind of method in cosmology?