Identifying Feature Relevance Using a Random Forest
Presentation Transcript

  1. Identifying Feature Relevance Using a Random Forest Jeremy Rogers & Steve Gunn

  2. Overview • What is a Random Forest? • Why do Relevance Identification? • Estimating Feature Importance with a Random Forest • Node Complexity Compensation • Employing Feature Relevance • Extension to Feature Selection

  3. Random Forest • Combination of base learners using Bagging • Uses CART-based decision trees
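As a rough illustration of that construction, the sketch below bags CART-style trees over bootstrap samples. It assumes scikit-learn's DecisionTreeClassifier as the base learner and binary 0/1 labels; these choices are assumptions for illustration and do not come from the slides.

    # Minimal bagging-of-CART sketch (assumes scikit-learn; illustrative only).
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def build_bagged_forest(X, y, n_trees=100, seed=0):
        rng = np.random.default_rng(seed)
        forest = []
        for _ in range(n_trees):
            # Bagging: each tree sees a bootstrap sample drawn with replacement.
            idx = rng.integers(0, len(X), size=len(X))
            tree = DecisionTreeClassifier(criterion="entropy")  # CART-style tree
            forest.append(tree.fit(X[idx], y[idx]))
        return forest

    def forest_predict(forest, X):
        # Majority vote over the bagged trees (assumes binary 0/1 labels).
        votes = np.stack([t.predict(X) for t in forest])
        return (votes.mean(axis=0) >= 0.5).astype(int)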

  4. Random Forest (cont...) • Optimises each split using Information Gain • Selects a feature at random to perform each split • The implicit feature selection of CART is removed
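A sketch of a single split under that scheme: one feature is drawn uniformly at random and only the threshold is optimised by information gain. The function and variable names are illustrative, not from the paper.

    # One split: feature chosen at random, threshold chosen by information gain.
    import numpy as np

    def entropy(y):
        # Empirical entropy of integer class labels.
        p = np.bincount(y) / len(y)
        p = p[p > 0]
        return -(p * np.log2(p)).sum()

    def random_feature_split(X, y, rng):
        f = rng.integers(X.shape[1])            # feature sampled uniformly at random
        best_gain, best_thr = 0.0, None
        for thr in np.unique(X[:, f])[:-1]:     # candidate split thresholds
            left, right = y[X[:, f] <= thr], y[X[:, f] > thr]
            gain = entropy(y) - (len(left) * entropy(left)
                                 + len(right) * entropy(right)) / len(y)
            if gain > best_gain:
                best_gain, best_thr = gain, thr
        return f, best_thr, best_gain

In scikit-learn terms this corresponds roughly to growing trees with criterion="entropy" and max_features=1.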

  5. Feature Relevance: Ranking • Analyse features individually • Measures of correlation to the target • A feature is relevant if: • Assumes no feature interaction • Fails to identify relevant features in the parity problem
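The relevance criterion itself did not survive the transcript; the slide presumably shows the standard marginal (ranking) condition, something like:

    X_i \text{ is relevant if } \; \exists\, x_i, y : \; P(Y = y \mid X_i = x_i) \neq P(Y = y)

Under this marginal test each input of a two-bit parity (XOR) target appears independent of the output, which is the failure named in the last bullet.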

  6. Feature Relevance: Subset Methods • Use implicit feature selection of decision tree induction • Wrapper methods • Subset search methods • Identifying Markov Blankets • Feature is relevant if:
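The condition is again missing from the transcript; given the mention of Markov blankets, it is presumably the subset-conditioned (strong relevance) form, with S_i denoting the set of all features other than X_i:

    X_i \text{ is relevant if } \; \exists\, x_i, s_i, y : \; P(Y = y \mid X_i = x_i, S_i = s_i) \neq P(Y = y \mid S_i = s_i)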

  7. Relevance Identification using Average Information Gain • Can identify feature interaction • Reliability is dependent upon node composition • Irrelevant features give non-zero relevance
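A sketch of the averaging step: while the forest is grown, every split records which feature was drawn and what information gain it achieved, and a feature's relevance estimate is the mean of its recorded gains. The helper below is hypothetical, not the authors' code.

    # Hypothetical relevance estimate: mean information gain per feature over
    # all splits in which that feature was sampled during forest construction.
    from collections import defaultdict
    import numpy as np

    def average_information_gain(split_records, n_features):
        # split_records: iterable of (feature_index, information_gain) pairs
        gains = defaultdict(list)
        for f, ig in split_records:
            gains[f].append(ig)
        return np.array([np.mean(gains[f]) if gains[f] else 0.0
                         for f in range(n_features)])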

  8. Node Complexity Compensation • Some nodes are easier to split • Requires each sample to be weighted by some measure of node complexity • Data are projected onto a one-dimensional space • For Binary Classification:

  9. Unique & Non-Unique Arrangements • Some arrangements are reflections of one another (non-unique) • Some arrangements are symmetrical about their centre (unique)

  10. Node Complexity Compensation (cont…) • A_u - number of unique arrangements
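The closed-form expression for A_u is on the slide image and is not reproduced in the transcript. As an illustration of the quantity being counted, the brute-force sketch below enumerates arrangements of i positive and n - i negative examples on a line and counts an arrangement and its mirror image only once; this is an interpretation of the slides, not the paper's formula.

    # Brute-force count of distinct arrangements up to reflection (illustrative).
    from itertools import combinations

    def unique_arrangements(n, i):
        seen = set()
        for ones in combinations(range(n), i):
            s = tuple(1 if k in ones else 0 for k in range(n))
            if s[::-1] not in seen:   # skip arrangements whose mirror is already counted
                seen.add(s)
        return len(seen)

    # e.g. unique_arrangements(4, 2) == 4, versus C(4, 2) = 6 raw arrangements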

  11. Information Gain Density Functions • Node complexity compensation improves the measure of average IG • The effect is visible when examining the IG density functions for each feature • These are constructed by building a forest and recording the frequencies of the IG values achieved by each feature

  12. Information Gain Density Functions • RF used to construct 500 trees on an artificial dataset • IG density functions recorded for each feature

  13. Employing Feature Relevance • Feature Selection • Feature Weighting • Random Forest uses a Feature Sampling distribution to select each feature. • Distribution can be altered in two ways • Parallel: Update during forest construction • Two-stage: Fixed prior to forest construction
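A sketch of the sampling-distribution mechanism: the split feature is drawn from a weight vector rather than uniformly. In the parallel scheme the weights are revised as trees are added; in the two-stage scheme they are fixed from a first forest's relevance estimates. The proportional weighting here is a placeholder, not the confidence-interval rule described on the next slide.

    # Drawing the split feature from a (possibly non-uniform) sampling distribution.
    import numpy as np

    def sample_split_feature(weights, rng):
        p = np.asarray(weights, dtype=float)
        return rng.choice(len(p), p=p / p.sum())

    # Two-stage use: weights = average IG scores from a first forest, kept fixed
    # while the final forest is grown. Parallel use: weights updated between trees.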

  14. Parallel • Control the update rate using confidence intervals • Assume Information Gain values have a normal distribution • The statistic has a Student's t distribution with n-1 degrees of freedom • Maintain the most uniform distribution within the confidence bounds
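The statistic is presumably the standard one-sample form: with \bar{x} and s the sample mean and standard deviation of the n information-gain values observed for a feature, and \mu the hypothesised mean,

    t = \frac{\bar{x} - \mu}{s / \sqrt{n}} \sim t_{n-1}

so the corresponding confidence interval \bar{x} \pm t_{n-1,\,\alpha/2}\, s/\sqrt{n} is what allows the sampling distribution to remain as close to uniform as the bounds permit.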

  15. Convergence Rates

  16. Results • 90% of data used for training, 10% for testing • Forests of 100 trees were tested and averaged over 100 trials

  17. Irrelevant Features • Average IG is the mean of a non-negative sample. • Expected IG of an irrelevant feature is non-zero. • Performance is degraded when there is a high proportion of irrelevant features.

  18. Expected Information Gain • n_L - number of examples in the left descendant • i_L - number of positive examples in the left descendant
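Using the slide's notation (with n and i the numbers of examples and positive examples at the node), the information gain of a binary split can be written directly; the sketch below is the standard expression that the expectation is taken over, not the expected-value derivation on the slides.

    # Information gain of a binary split in the slide's notation:
    #   n,  i  - examples / positive examples at the node
    #   nL, iL - examples / positive examples in the left descendant
    import math

    def binary_entropy(i, n):
        if n == 0 or i == 0 or i == n:
            return 0.0
        p = i / n
        return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

    def information_gain(n, i, nL, iL):
        nR, iR = n - nL, i - iL
        return (binary_entropy(i, n)
                - (nL / n) * binary_entropy(iL, nL)
                - (nR / n) * binary_entropy(iR, nR))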

  19. Expected Information Gain • Number of positive examples • Number of negative examples

  20. Bounds on Expected Information Gain • An approximation for the upper bound • An expression for the lower bound

  21. Irrelevant Features: Bounds • 100 trees built on artificial dataset • Average IG recorded and bounds calculated

  22. Friedman dataset: FS and CFS results

  23. Simple dataset: FS and CFS results

  24. Results • 90% of data used for training, 10% for testing • Forests of 100 trees were tested and averaged over 100 trials • 100 trees constructed for feature evaluation in each trial

  25. Summary • Node complexity compensation improves the measure of feature relevance by examining node composition • The feature sampling distribution can be updated using confidence intervals to control the update rate • Irrelevant features can be removed by calculating their expected performance