
Columbia University Advanced Machine Learning & Perception – Fall 2006 Term Project

This term project explores the use of kernel PCA and KNN classification to identify the low-dimensional manifold of climate data sets and to make predictions in the original space. The project focuses on monthly sea surface temperature data and discusses the results and conclusions.


Presentation Transcript


  1. Columbia University Advanced Machine Learning & Perception – Fall 2006 Term Project Nonlinear Dimensionality Reduction and K-Nearest Neighbor Classification Applied to Global Climate Data Carlos Henrique Ribeiro Lima New York – Dec/2006

  2. Outline • Goals • Motivation and Dataset • Methodology • Results • Low-Dimensional Manifold • KNN on Low-Dimensional Manifold • Conclusion

  3. 1. Goals • Use kernel PCA based on Semidefinite Embedding to identify the low-dimensional, nonlinear manifold of climate data sets → identification of the main modes of spatial variability; • Classification in the feature space → predictions in the original space (KNN method).

  4. 2. Motivation • Dataset of monthly Sea Surface Temperature (SST). Extreme El Niño events (e.g. 1997) have huge economic and social impacts → need for forecasting models!

  5. 2. Dataset • Monthly Sea Surface Temperature (SST) data from Jan/1856 to Dec/2005 • Latitudinal band: 25°S–25°N • Grid with 599 cells • Training set: Jan/1856 to Dec/1975 = 120 years • Testing set: Jan/1976 to Dec/2005 = 30 years • Input matrix: n = 1440 points, m = 599 dimensions
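
A minimal sketch of how an input matrix with these dimensions could be assembled, assuming the gridded monthly SST fields are available as a NumPy array of shape (1800 months, 599 cells); the file name is hypothetical, and the anomaly step (removing the monthly climatology) is a standard preprocessing assumption, not something stated on the slide:

import numpy as np

# Hypothetical file: monthly SST fields, Jan/1856 to Dec/2005, flattened over
# the 599 grid cells in the 25S-25N band -> array of shape (1800, 599).
sst = np.load("sst_monthly_25S_25N.npy")

# Remove the monthly climatology (computed on the training period only) so
# that each column becomes an anomaly series.
month = np.arange(sst.shape[0]) % 12
clim = np.array([sst[:1440][month[:1440] == m].mean(axis=0) for m in range(12)])
anom = sst - clim[month]

# Training block: Jan/1856 to Dec/1975 = 120 yr x 12 = 1440 rows (n = 1440, m = 599).
# Testing block:  Jan/1976 to Dec/2005 = 30 yr x 12 = 360 rows.
X_train, X_test = anom[:1440], anom[1440:]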

  6. 3. Methodology • 1) Semidefinite Embedding (code from K. Q. Weinberger): positive semidefiniteness, inner products centered on the origin, and isometry (local distances of the input space are preserved in the feature space); • 2) KNN → Euclidean distance; • 3) Probabilistic forecasting → skill score based on the Ranked Probability Score (RPS).
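
The project relies on K. Q. Weinberger's Semidefinite Embedding code; the sketch below is only an illustrative restatement of the optimization the slide describes (a PSD, centered Gram matrix that preserves local distances, followed by kernel PCA on the learned kernel), written with cvxpy and scikit-learn and practical only for small n:

import numpy as np
import cvxpy as cp
from sklearn.neighbors import NearestNeighbors

def sde_embedding(X, n_neighbors=4, n_components=3):
    """Illustrative Semidefinite Embedding (maximum variance unfolding).

    Learns a centered, positive semidefinite Gram matrix K that preserves the
    distances to each point's n_neighbors nearest neighbors, then reads the
    low-dimensional coordinates off the top eigenvectors of K.
    """
    n = X.shape[0]
    nbrs = NearestNeighbors(n_neighbors=n_neighbors + 1).fit(X)
    _, idx = nbrs.kneighbors(X)                  # idx[:, 0] is the point itself

    K = cp.Variable((n, n), PSD=True)            # positive semidefiniteness
    cons = [cp.sum(K) == 0]                      # inner products centered on the origin
    for i in range(n):                           # isometry: preserve local distances
        for j in idx[i, 1:]:
            d2 = float(np.sum((X[i] - X[j]) ** 2))
            cons.append(K[i, i] + K[j, j] - 2 * K[i, j] == d2)

    cp.Problem(cp.Maximize(cp.trace(K)), cons).solve()

    w, V = np.linalg.eigh(K.value)               # kernel PCA on the learned kernel
    order = np.argsort(w)[::-1][:n_components]
    return V[:, order] * np.sqrt(np.maximum(w[order], 0))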

  7. 4. Results Low-Dimensional Manifold

  8. 4. Results Labeling in the feature space

  9. 4. Results Forecasts – Testing Set KNN method and skill score E.g. March 1997: 1) Want to predict the class of the Niño3 index in Dec/1997 → lead time = 9 months; 2) Run KNN in the feature space over the March points of 1856 to 1975; 3) Take the classes and weights of the k neighbors; 4) Compute the skill score (see the sketch below).
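
A minimal sketch of steps 2–4 as read from this slide, assuming the feature-space coordinates of the training Marches and the tercile class of the following December's Niño3 index are already available; the inverse-distance weighting and the climatological reference in the skill score are assumptions, not details given in the slides:

import numpy as np

def knn_forecast_probs(z_query, Z_march, dec_class, k=10, n_classes=3):
    """Distance-weighted KNN vote in the feature space: the k nearest March
    points (1856-1975) vote for the class of the following December's Nino3
    index (lead time = 9 months)."""
    d = np.linalg.norm(Z_march - z_query, axis=1)   # Euclidean distance
    nn = np.argsort(d)[:k]
    w = 1.0 / (d[nn] + 1e-12)                       # assumed inverse-distance weights
    probs = np.zeros(n_classes)
    for i, wt in zip(nn, w):
        probs[dec_class[i]] += wt
    return probs / probs.sum()

def rps(probs, observed_class):
    """Ranked Probability Score: squared differences between cumulative
    forecast and cumulative observed probabilities (lower is better)."""
    F = np.cumsum(probs)
    O = np.cumsum(np.eye(len(probs))[observed_class])
    return np.sum((F - O) ** 2)

# Skill relative to a climatological forecast with equal class probabilities:
# rpss = 1 - rps(probs, obs) / rps(np.full(3, 1 / 3), obs)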

  10. 4. Results Forecasts – Testing Set KNN method and skill score – El Niño of 1982 and 1997

  11. 5. Conclusions • Semidefinite Embedding performs well on the SST data (high-dimensional input → just 3 dimensions capture ~90% of the explained variance); • The KNN method provides very good classification and forecasts; • Need to check sensitivity to changes in some parameters (number of local neighbors in the embedding, number of nearest neighbors k); • Plan to extend to other climate datasets; • Try other metrics, multivariate data, etc.
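
For reference, a small sketch of how a figure like the ~90% above can be computed: the variance explained by the leading components of a kernel-PCA embedding is the corresponding fraction of the eigenvalue sum of the learned kernel (a standard diagnostic, not code from the project):

import numpy as np

def explained_variance_fraction(K, n_components=3):
    """Fraction of the total variance of the learned kernel matrix K that is
    captured by its n_components leading eigenvalues."""
    w = np.maximum(np.linalg.eigvalsh(K), 0)        # clip tiny negative eigenvalues
    w = np.sort(w)[::-1]
    return w[:n_components].sum() / w.sum()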
