
Clustering Solutions FINAL Exploratory Run Full 10’ Resolution – 41,311 samples

Michael A. Lindgren, EWHALE Laboratory, Institute of Arctic Biology, University of Alaska Fairbanks. February 11, 2011.


Presentation Transcript


  1. Clustering Solutions: FINAL Exploratory Run, Full 10’ Resolution – 41,311 samples. Michael A. Lindgren, EWHALE Laboratory, Institute of Arctic Biology, University of Alaska Fairbanks. February 11, 2011

  2. About This Run… • “FINAL” in the title refers to the decision the group must make about which clustering level to use for the final Biome Shift Analysis. • I modified the R code so that the very large proximity matrix created by RandomForests, covering all 41,311 samples at 10’ resolution, can be passed to the PAM clustering algorithm. • The clustering levels shown here, at least for preliminary decisions about the optimal number of clusters, are 5, 10, 15, 20, 25, and 30. • Silhouette plots are included for each clustering level.
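The workflow on this slide (proximity matrix in, medoid-based clusters out, repeated for several cluster counts) can be sketched in a few lines. This is a Python illustration, not the R code used for the run, and it uses a simplified alternating k-medoids loop rather than the full PAM BUILD/SWAP search. The function names, the `1 - proximity` conversion, and the 6-sample toy matrix are all assumptions made for the example.

```python
import random


def proximity_to_dissimilarity(prox):
    """Turn a proximity matrix (1.0 = identical) into dissimilarities."""
    return [[1.0 - p for p in row] for row in prox]


def assign_labels(dist, medoids):
    """Give each sample the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
        for i in range(len(dist))
    ]


def k_medoids(dist, k, seed=0, max_iter=100):
    """Simplified alternating k-medoids on a precomputed distance matrix.

    Hypothetical helper for illustration; not the PAM implementation
    from the R cluster package.
    """
    n = len(dist)
    medoids = random.Random(seed).sample(range(n), k)
    for _ in range(max_iter):
        labels = assign_labels(dist, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Keep the old medoid if a cluster lost all its members.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to the rest of its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members))
            )
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign_labels(dist, medoids)


# Toy proximity matrix: two blocks of three mutually similar samples.
toy_prox = [
    [1.0 if i == j else (0.9 if (i < 3) == (j < 3) else 0.1) for j in range(6)]
    for i in range(6)
]
toy_dist = proximity_to_dissimilarity(toy_prox)

# The run described above used 5, 10, 15, 20, 25, and 30 clusters;
# the toy data only supports very small values.
for k in (2, 3):
    medoids, labels = k_medoids(toy_dist, k)
    print(k, medoids, labels)
```

At the scale of the actual run the full matrix has 41,311 × 41,311 entries, so a nested-list version like this one would be impractical; it only shows the shape of the computation.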

  3. Silhouette Plots • The silhouette value for each point measures how similar that point is to points in its own cluster compared with points in other clusters, and ranges from -1 to +1. It is defined as $s(i) = \dfrac{\min_k b(i,k) - a(i)}{\max\left(a(i),\ \min_k b(i,k)\right)}$, where a(i) is the average distance from the i-th point to the other points in its cluster, and b(i,k) is the average distance from the i-th point to the points in another cluster k.* • See the document attached to this presentation, which discusses silhouette plots as a metric for deciding when an acceptable cluster solution has been achieved. *From the MathWorks website (developers of MATLAB).
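To make the definition concrete, here is a minimal Python sketch that computes silhouette values directly from a distance matrix and a list of cluster labels. It is an illustration only, not the MATLAB or R routine behind the plots. The function name and the four-point example are assumptions, as is the choice to score a point that is alone in its cluster as 0.

```python
def silhouette_values(dist, labels):
    """Silhouette value s(i) for every point.

    dist   -- square matrix of pairwise distances
    labels -- cluster label of each point (at least two distinct labels)
    """
    n = len(dist)
    clusters = sorted(set(labels))
    values = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            # Assumed convention: a point alone in its cluster scores 0.
            values.append(0.0)
            continue
        # a(i): average distance to the other points in the same cluster.
        a = sum(dist[i][j] for j in own) / len(own)
        # min over k of b(i,k): average distance to the nearest other cluster.
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom > 0 else 0.0)
    return values


# Four points on a line: two tight pairs that are far apart.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]

good = silhouette_values(dist, [0, 0, 1, 1])  # pairs kept together
poor = silhouette_values(dist, [0, 1, 0, 1])  # pairs split apart
print(good)
print(poor)
```

For the first point under the well-separated labelling, a = 1 and the nearest-cluster average is 10.5, so s = 9.5 / 10.5, close to +1. Under the split labelling the same point has a = 10 and a nearest-cluster average of 6, giving s = -0.4, which is the kind of negative value that signals a point sitting in the wrong cluster.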

  4. Silhouette Plots

  5. 5 Clusters Returned

  6. 5 Clusters Returned

  7. 5 Clusters Returned

  8. 10 Clusters Returned

  9. 10 Clusters Returned

  10. 10 Clusters Returned

  11. 15 Clusters Returned

  12. 15 Clusters Returned

  13. 15 Clusters Returned

  14. 20 Clusters Returned

  15. 20 Clusters Returned

  16. 20 Clusters Returned

  17. 25 Clusters Returned

  18. 25 Clusters Returned

  19. 25 Clusters Returned

  20. 30 Clusters Returned

  21. 30 Clusters Returned

  22. 30 Clusters Returned
