
Decision Trees

Presentation Transcript


  1. Decision Trees. Advanced Statistical Methods in NLP, Ling572. January 10, 2012

  2-5. Information Gain
  • InfoGain(S,A): expected reduction in entropy due to A
  • InfoGain(S,A) = H(S) - Σ_a (|S_a|/|S|) * H(S_a), summing over the values a of A
  • Select the A with max InfoGain
  • Resulting in lowest average entropy

  6. Computing Average Entropy
  • A test splits the |S| instances into branches; |S_i|/|S| is the fraction of samples down branch i, and H(S_i) is the disorder of the class distribution on branch i
  • AvgEntropy(S,A) = Σ_i (|S_i|/|S|) * H(S_i)
  [Figure: |S| instances split into Branch 1 and Branch 2, with the per-class counts in each branch.]
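A minimal Python sketch of these two quantities (the function names and the dict-per-instance encoding are my own, not from the slides):

    import math
    from collections import Counter

    def entropy(labels):
        """H(S) = -Σ_c p(c) log2 p(c) over the class labels in S."""
        total = len(labels)
        return -sum((n / total) * math.log2(n / total)
                    for n in Counter(labels).values())

    def info_gain(instances, labels, feature):
        """InfoGain(S,A) = H(S) - Σ_a (|S_a|/|S|) * H(S_a)."""
        total = len(labels)
        branches = {}
        for x, y in zip(instances, labels):
            branches.setdefault(x[feature], []).append(y)
        avg = sum(len(b) / total * entropy(b) for b in branches.values())
        return entropy(labels) - avg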

  7. Sunburn Example
  Name   Hair    Height   Weight   Lotion   Result
  Sarah  Blonde  Average  Light    No       Sunburned (B)
  Dana   Blonde  Tall     Average  Yes      None (N)
  Alex   Brown   Short    Average  Yes      None (N)
  Annie  Blonde  Short    Average  No       Sunburned (B)
  Emily  Red     Average  Heavy    No       Sunburned (B)
  Pete   Brown   Tall     Heavy    No       None (N)
  John   Brown   Average  Heavy    No       None (N)
  Katie  Blonde  Short    Light    Yes      None (N)

  8. Picking a Test
  • Hair Color: Blonde: Sarah:B, Dana:N, Annie:B, Katie:N | Brown: Alex:N, Pete:N, John:N | Red: Emily:B
  • Height: Tall: Dana:N, Pete:N | Short: Alex:N, Annie:B, Katie:N | Average: Sarah:B, Emily:B, John:N
  • Weight: Heavy: Emily:B, Pete:N, John:N | Light: Sarah:B, Katie:N | Average: Dana:N, Alex:N, Annie:B
  • Lotion: Yes: Dana:N, Alex:N, Katie:N | No: Sarah:B, Annie:B, Emily:B, Pete:N, John:N
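The same data as a small Python structure, reused by the later snippets (feature names follow the table on slide 7):

    # (features, label) pairs; 'B' = sunburned, 'N' = none
    DATA = [
        ({"Hair": "Blonde", "Height": "Average", "Weight": "Light",   "Lotion": "No"},  "B"),  # Sarah
        ({"Hair": "Blonde", "Height": "Tall",    "Weight": "Average", "Lotion": "Yes"}, "N"),  # Dana
        ({"Hair": "Brown",  "Height": "Short",   "Weight": "Average", "Lotion": "Yes"}, "N"),  # Alex
        ({"Hair": "Blonde", "Height": "Short",   "Weight": "Average", "Lotion": "No"},  "B"),  # Annie
        ({"Hair": "Red",    "Height": "Average", "Weight": "Heavy",   "Lotion": "No"},  "B"),  # Emily
        ({"Hair": "Brown",  "Height": "Tall",    "Weight": "Heavy",   "Lotion": "No"},  "N"),  # Pete
        ({"Hair": "Brown",  "Height": "Average", "Weight": "Heavy",   "Lotion": "No"},  "N"),  # John
        ({"Hair": "Blonde", "Height": "Short",   "Weight": "Light",   "Lotion": "Yes"}, "N"),  # Katie
    ]
    instances = [x for x, _ in DATA]
    labels = [y for _, y in DATA]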

  9-12. Entropy in Sunburn Example
  • S = [3B, 5N], so H(S) = -(3/8) log2(3/8) - (5/8) log2(5/8) = 0.954
  • Hair color = 0.954 - (4/8 * (-(2/4) log2(2/4) - (2/4) log2(2/4)) + 1/8 * 0 + 3/8 * 0) = 0.954 - 0.5 = 0.454
  • Height = 0.954 - 0.69 = 0.264
  • Weight = 0.954 - 0.94 = 0.014
  • Lotion = 0.954 - 0.61 = 0.344
  • Hair color gives the largest gain, so it becomes the first test
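Running info_gain over the data above reproduces these numbers; the small deviations (0.266, 0.016, 0.348) arise because the slide rounds the intermediate average entropies to two decimals:

    for f in ["Hair", "Height", "Weight", "Lotion"]:
        print(f, round(info_gain(instances, labels, f), 3))
    # Hair 0.454, Height 0.266, Weight 0.016, Lotion 0.348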

  13. Picking a Test (within the Blonde branch: Sarah, Dana, Annie, Katie)
  • Height: Tall: Dana:N | Short: Annie:B, Katie:N | Average: Sarah:B
  • Weight: Heavy: (none) | Light: Sarah:B, Katie:N | Average: Dana:N, Annie:B
  • Lotion: Yes: Dana:N, Katie:N | No: Sarah:B, Annie:B

  14. Entropy in Sunburn Example
  • S = [2B, 2N], so H(S) = 1
  • Height = 1 - (2/4 * (-(1/2) log2(1/2) - (1/2) log2(1/2)) + 1/4 * 0 + 1/4 * 0) = 1 - 0.5 = 0.5
  • Weight = 1 - (2/4 * (-(1/2) log2(1/2) - (1/2) log2(1/2)) + 2/4 * (-(1/2) log2(1/2) - (1/2) log2(1/2))) = 1 - 1 = 0
  • Lotion = 1 - 0 = 1
  • Lotion splits the remaining instances perfectly
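Restricting the computation to the Blonde branch confirms these values:

    blonde = [(x, y) for x, y in DATA if x["Hair"] == "Blonde"]
    xs, ys = [x for x, _ in blonde], [y for _, y in blonde]
    for f in ["Height", "Weight", "Lotion"]:
        print(f, info_gain(xs, ys, f))
    # Height 0.5, Weight 0.0, Lotion 1.0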

  15-18. Building Decision Trees with Information Gain
  • Until there are no inhomogeneous leaves:
  • Select an inhomogeneous leaf node
  • Replace that leaf node by a test node creating the subsets that yield the highest information gain
  • Effectively creates a set of rectangular regions: repeatedly draws lines in different axes
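A compact recursive sketch of this loop, building on the helpers above (the nested-dict tree representation is my own choice):

    def build_tree(instances, labels, features):
        """Greedy top-down induction: grow until all leaves are homogeneous."""
        if len(set(labels)) == 1:            # homogeneous leaf: done
            return labels[0]
        if not features:                     # no tests left: majority class
            return Counter(labels).most_common(1)[0][0]
        best = max(features, key=lambda f: info_gain(instances, labels, f))
        tree = {"test": best, "branches": {}}
        rest = [f for f in features if f != best]
        for value in sorted({x[best] for x in instances}):
            sub = [(x, y) for x, y in zip(instances, labels) if x[best] == value]
            xs, ys = [x for x, _ in sub], [y for _, y in sub]
            tree["branches"][value] = build_tree(xs, ys, rest)
        return tree

On the sunburn data this yields a root test on Hair, with the Blonde branch tested on Lotion, matching the calculations above.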

  19-22. Alternate Measures
  • Issue with Information Gain: favors features with more values
  • Option: Gain Ratio
  • S_a: elements of S with value A=a
  • GainRatio(S,A) = InfoGain(S,A) / SplitInfo(S,A), where SplitInfo(S,A) = -Σ_a (|S_a|/|S|) log2(|S_a|/|S|)
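A sketch of the ratio, reusing info_gain from above:

    def gain_ratio(instances, labels, feature):
        """GainRatio = InfoGain / SplitInfo; SplitInfo grows with the number
        of (evenly filled) branches, penalizing many-valued features."""
        total = len(labels)
        sizes = Counter(x[feature] for x in instances)
        split_info = -sum((n / total) * math.log2(n / total)
                          for n in sizes.values())
        return info_gain(instances, labels, feature) / split_info if split_info else 0.0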

  23-27. Overfitting
  • Overfitting: the model fits the training data TOO well, fitting noise and irrelevant details
  • Why is this bad? It harms generalization: the model fits the training data too well but fits new data badly
  • For a model m, compare training_error(m) with D_error(m), where D = all data
  • If m overfits, then for some other model m': training_error(m) < training_error(m'), but D_error(m) > D_error(m')

  28-32. Avoiding Overfitting
  • Strategies to avoid overfitting:
  • Early stopping: stop when InfoGain < threshold, when the number of instances < threshold, or when tree depth > threshold
  • Post-pruning: grow the full tree and remove branches
  • Which is better? Unclear; both are used. For some applications, post-pruning is better
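Early stopping amounts to an extra guard in the tree builder; a minimal sketch (the threshold values are illustrative placeholders, not from the slides):

    def should_stop(labels, best_gain, depth,
                    min_gain=0.01, min_instances=5, max_depth=10):
        """Pre-pruning guard: stop growing once any threshold is crossed.
        Default thresholds are illustrative only."""
        return (best_gain < min_gain or
                len(labels) < min_instances or
                depth > max_depth)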

  33-36. Post-Pruning
  • Divide the data into:
  • Training set: used to build the original tree
  • Validation set: used to perform pruning
  • Build the decision tree on the training data
  • Until further pruning reduces validation set performance:
  • Compute performance for pruning each node (and its children)
  • Greedily remove nodes whose removal does not reduce validation set performance
  • Yields a smaller tree with the best performance
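One common realization of this loop is bottom-up reduced-error pruning; a sketch over the nested-dict trees above (classify and leaf_labels are my helpers):

    def classify(tree, x):
        """Route instance x down the tree to a leaf label."""
        while isinstance(tree, dict):
            tree = tree["branches"][x[tree["test"]]]
        return tree

    def leaf_labels(tree):
        """All leaf labels under a (sub)tree."""
        if not isinstance(tree, dict):
            return [tree]
        return [l for sub in tree["branches"].values() for l in leaf_labels(sub)]

    def prune(tree, val_data):
        """Collapse a subtree to its majority leaf label whenever that is at
        least as accurate as the subtree on the validation data reaching it."""
        if not isinstance(tree, dict):
            return tree
        f = tree["test"]
        for v in list(tree["branches"]):
            reach = [(x, y) for x, y in val_data if x[f] == v]
            tree["branches"][v] = prune(tree["branches"][v], reach)
        majority = Counter(leaf_labels(tree)).most_common(1)[0][0]
        if not val_data:                     # no evidence: prefer the smaller tree
            return majority
        leaf_acc = sum(y == majority for _, y in val_data) / len(val_data)
        sub_acc = sum(classify(tree, x) == y for x, y in val_data) / len(val_data)
        return majority if leaf_acc >= sub_acc else tree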

  37-41. Performance Measures
  • Compute accuracy on:
  • A validation set
  • k-fold cross-validation
  • Weighted classification error cost: weight some types of errors more heavily
  • Minimum description length: favor good accuracy on compact models
  • MDL = error(tree) + model_size(tree)

  42-45. Rule Post-Pruning
  • Convert the tree to rules
  • Prune each rule independently
  • Sort the final rule set
  • Probably the most widely used method (toolkits)
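A sketch of the first step for the nested-dict trees above: flatten each root-to-leaf path into a (conditions, label) rule; each rule's conditions can then be dropped greedily when removal does not hurt validation accuracy, and the surviving rules sorted by that accuracy:

    def tree_to_rules(tree, conditions=()):
        """One rule per root-to-leaf path: ([(feature, value), ...], label)."""
        if not isinstance(tree, dict):
            return [(list(conditions), tree)]
        rules = []
        for value, sub in tree["branches"].items():
            rules += tree_to_rules(sub, conditions + ((tree["test"], value),))
        return rules

    # e.g. ([('Hair', 'Blonde'), ('Lotion', 'No')], 'B') on the sunburn tree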

  46-50. Modeling Features
  • Different types of features need different tests:
  • Binary: test branches on true/false
  • Discrete: one branch for each discrete value
  • Continuous? Need to discretize:
  • Enumerate all values
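One standard way to enumerate: sort the observed values and score a binary split at each midpoint between adjacent distinct values, keeping the threshold with the highest information gain (a sketch reusing entropy from above):

    def best_threshold(values, labels):
        """Return (gain, t) for the best binary split on value <= t."""
        pairs = sorted(zip(values, labels))
        n, base = len(pairs), entropy(labels)
        best_gain, best_t = 0.0, None
        for i in range(1, n):
            if pairs[i - 1][0] == pairs[i][0]:
                continue                 # cannot split between equal values
            t = (pairs[i - 1][0] + pairs[i][0]) / 2
            left = [y for v, y in pairs if v <= t]
            right = [y for v, y in pairs if v > t]
            gain = base - (len(left) * entropy(left) +
                           len(right) * entropy(right)) / n
            if gain > best_gain:
                best_gain, best_t = gain, t
        return best_gain, best_t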
