How to solve a classification problem with 45 class levels using Random Forests. Nicholas L. Crookston and Gerald E. Rehfeldt



Presentation Transcript


  1. How to solve a classification problem with 45 class levels using Random Forests. Nicholas L. Crookston and Gerald E. Rehfeldt, US Forest Service, Rocky Mountain Research Station, Moscow, ID. Western Mensurationists, Missoula, MT, June 20-22, 2010.

  2. Contents. Problem (we have 45 class levels, and that’s a lot). Solution (we broke the problem into many subsets and formed an ensemble classifier). Results (very good, and we have a measure of extrapolation). Discussion.

  3. Problem. We want to predict the biotic community as a function of climate. There are 45 biotic communities of interest. Reference: Brown, D.E., F. Reichenbacher, S.E. Franson. 1998. A classification of North American biotic communities. University of Utah Press, Salt Lake City. 141 pp.

  4. Problem. In a 2006 effort on a subset of these communities, we had great results using Random Forests (Breiman, Leo. 2001. Random Forests. Machine Learning 45:5-32). These results were published in: Rehfeldt, G.E., N.L. Crookston, M.V. Warwell and J.S. Evans. 2006. Empirical analyses of plant-climate relationships for the western United States. Int. J. Plant Sci. 167:1123-1150.

  5. Random Forests. A Random Forest (RF) is a set of classification or regression trees (CART). RF builds many trees, each of which minimizes the classification error on a bootstrap sample of the training data. Up to 32 class levels are supported, but when there are more than 10, RF uses a sampling scheme for each tree.

  6. Random Forests (continued). To classify a new observation, RF puts the new observation down each of the trees in the forest. Each tree gives a classification, and that classification counts as a vote. The forest chooses the class with the most votes over all the trees.
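
The voting rule described on this slide can be sketched in a few lines. This is only an illustration, not the randomForest implementation: the function name `forest_vote` and the class labels "A", "B", "C" are hypothetical, and the tie-breaking rule (first label encountered wins) is an assumption that the slides do not address.

```python
from collections import Counter

def forest_vote(tree_predictions):
    """Return the class with the most votes among the per-tree predictions.

    Ties go to the label encountered first; that rule is an assumption
    made for this sketch, not something stated in the presentation.
    """
    counts = Counter(tree_predictions)
    # most_common(1) returns a one-element list [(label, votes)]
    return counts.most_common(1)[0][0]

# Hypothetical votes from a five-tree forest
votes = ["A", "B", "A", "C", "A"]
print(forest_vote(votes))
```

Each element of `votes` stands for the classification returned by one tree, and the forest's answer is the label that occurs most often.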

  7. Problem (continued). We have 45 class levels, which is over the limit of 32 in the randomForest package. We also want to make predictions using future climates. RF might predict nonsense answers for future climatic conditions that are unique with respect to the training data. These are extrapolations that we need to detect.

  8. Solution: steps. Training data: ~1.6 million observations and 35 climate variables from the Moscow climate model. We created 100 Random Forests. To create one of the forests: (a) sample 9 of the 45 class levels (without replacement); (b) make a copy of the training data; (c) recode the biotic community in this copy, keeping the code as is if it is one of the 9 in the sample and otherwise changing the observed class to “other”.
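
The sampling and recoding steps on this slide might look like the following sketch. It is written in Python for illustration and is not the authors' code: the function names, the integer community codes 1 to 45, the `seed` parameter, and the tiny label vector are all hypothetical. It also assumes that "without replacement" applies within each forest's draw of 9 levels, so that different forests may share levels.

```python
import random

N_FORESTS = 100          # number of Random Forests in the ensemble (from the slides)
CLASSES_PER_FOREST = 9   # class levels kept per forest (from the slides)
OTHER = "other"

def draw_class_subsets(all_classes, n_forests=N_FORESTS,
                       k=CLASSES_PER_FOREST, seed=0):
    """Draw one subset of k class levels per forest.

    Sampling is without replacement inside a subset; subsets for
    different forests are drawn independently (an assumption).
    """
    rng = random.Random(seed)
    pool = sorted(all_classes)
    return [set(rng.sample(pool, k)) for _ in range(n_forests)]

def recode(labels, kept):
    """Copy the labels, keeping codes in `kept` and changing the rest to 'other'."""
    return [lab if lab in kept else OTHER for lab in labels]

# Hypothetical community codes and a tiny hypothetical label vector
all_classes = list(range(1, 46))
subsets = draw_class_subsets(all_classes)
labels = [3, 17, 45, 3, 8]
recoded = recode(labels, subsets[0])
print(sorted(subsets[0]))
print(recoded)
```

With 9 of 45 levels drawn per forest, a given community has a 9/45 = 0.2 chance of being in any one forest, so it is expected to appear in about 20 of the 100 forests.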

  9. Steps (continued). Fit each of the 100 RFs. To make a prediction: (a) put the new case down all 100 RFs, which provides a vector of 100 predictions for the case; (b) count the number of predictions by biotic community code, including “other”. This gives a table of codes and counts that has 46 rows (one for each community code plus “other”).
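
The counting step can be illustrated as follows. This is a minimal sketch with made-up numbers: the function name `tally_predictions`, the integer community codes, and the prediction vector are hypothetical and are not results from the presentation.

```python
from collections import Counter

def tally_predictions(forest_predictions, all_classes, other="other"):
    """Count how many forests predicted each code.

    Codes that no forest predicted get a count of 0, so the table always
    has one row per community code plus one row for 'other'.
    """
    counts = Counter(forest_predictions)
    codes = list(all_classes) + [other]
    return {code: counts.get(code, 0) for code in codes}

all_classes = list(range(1, 46))
# Hypothetical vector of 100 predictions for one new case
predictions = [12] * 18 + [7] * 2 + ["other"] * 80
table = tally_predictions(predictions, all_classes)
print(len(table))        # number of rows in the table
print(table[12], table[7], table["other"])
```

Because most forests do not contain the case's true community in their sample of 9, a large raw count for “other” is expected even for ordinary cases, which is why the counts are rescaled in the next step.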

  10. Steps (continued). Divide the count for each code by the number of RFs that contained the code. The ensemble classification is the class corresponding to the maximum of these quotients.
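
A small worked sketch of this rescaling, using five hypothetical forests rather than 100. The function name, the subsets, and the predictions are invented for illustration. The sketch assumes that “other” is treated as contained in every forest, so its count is divided by the total number of forests; the slides do not spell this out.

```python
from collections import Counter

def ensemble_classify(predictions, class_subsets, other="other"):
    """Return (ensemble class, quotients) for one case.

    predictions[i] is the prediction of forest i, and class_subsets[i] is
    the set of codes that forest i was trained to distinguish from 'other'.
    """
    counts = Counter(predictions)
    # Number of forests whose recoded training data contained each code
    n_containing = Counter()
    for subset in class_subsets:
        n_containing.update(subset)
    # Assumption: every forest contains the 'other' level
    n_containing[other] = len(class_subsets)
    # A forest can only predict a code from its own subset or 'other',
    # so every predicted code has a nonzero divisor here
    quotients = {code: counts[code] / n_containing[code] for code in counts}
    best = max(quotients, key=quotients.get)
    return best, quotients

# Five hypothetical forests, each trained on two community codes
class_subsets = [{1, 2}, {1, 3}, {2, 3}, {1, 4}, {3, 4}]

# Case 1: every forest that knows code 1 predicts it
best1, q1 = ensemble_classify([1, 1, "other", 1, "other"], class_subsets)
print(best1, q1)

# Case 2: almost every forest answers 'other'
best2, q2 = ensemble_classify(["other", "other", "other", "other", 3], class_subsets)
print(best2, q2)
```

In the first case, code 1 is predicted by all three forests that contain it (quotient 3/3), which beats “other” at 2/5. In the second case “other” wins with 4/5 against 1/3 for code 3, the situation the presentation reads as a sign of extrapolation.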

  11. Example 1 (contemporary climate):

  12. Example 2 (future climate 1):

  13. Example 3 (future climate 2):

  14. Results. We interpret predictions of “other” as an indication of extrapolation. For this work, extrapolation means there is no biotic community in our study area that corresponds to the (new) climate. It is not a perfect indicator of extrapolation.

  15. Results. Application to Brown’s biotic communities: all of North America; prediction of community as a function of climatic metrics; mapped at 0.0083333 arc degrees (~1 km²).
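
As a rough check on the stated grid resolution, the sketch below converts 0.0083333 degrees to kilometres. It assumes a spherical Earth with a mean radius of 6371 km, which is a simplification chosen here and not a detail from the presentation; the function name and the 45-degree example latitude are also just for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)

def cell_size_km(resolution_deg, latitude_deg):
    """Approximate east-west width and north-south height of a grid cell in km."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360
    height = resolution_deg * km_per_degree
    # Meridians converge toward the poles, so the width shrinks with cos(latitude)
    width = height * math.cos(math.radians(latitude_deg))
    return width, height

for lat in (0, 45):
    width, height = cell_size_km(0.0083333, lat)
    print(f"latitude {lat}: {width:.3f} km x {height:.3f} km")
```

Under these assumptions a cell is about 0.93 km on a side at the equator and narrower east to west at higher latitudes, consistent with the approximate figure on the slide.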

  16. No analog: contemporary

  17. No analog: 2030

  18. No analog: 2090 (Canadian, Princeton, Hadley)

  19. Discussion / Conclusion. The method can be used on larger problems and perhaps with CART-based methods other than Random Forests. One could add samples that are actually “other”, that is, not any of the classes of interest. Random Forests remains a very important tool in our tool set.
