RIPPER: Fast Effective Rule Induction
Presentation Transcript

  1. RIPPER: Fast Effective Rule Induction • Machine Learning 2003 • Merlin Holzapfel & Martin Schmidt • Mholzapf@uos.de, Martisch@uos.de

  2. Rule Sets - advantages • easy to understand • usually better than decision tree learners • representable in first-order logic → easy to implement in Prolog (a sketch follows below) • prior knowledge can be added
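To make the rule-set representation concrete, here is a minimal Python sketch (purely illustrative, not from the slides or the paper; the rules, attributes, and classes are invented):

    # A rule is a list of (attribute, operator, value) conditions that must
    # all hold; a rule set is an ordered list of (conditions, class) pairs.
    RULES = [
        ([("outlook", "==", "sunny"), ("humidity", "<=", 75)], "play"),
        ([("outlook", "==", "overcast")], "play"),
    ]
    DEFAULT = "dont_play"

    OPS = {"==": lambda a, b: a == b, "<=": lambda a, b: a <= b}

    def matches(conditions, example):
        # A rule fires only if every one of its conditions holds.
        return all(OPS[op](example[attr], val) for attr, op, val in conditions)

    def classify(example):
        # Return the class of the first matching rule, else the default.
        for conditions, label in RULES:
            if matches(conditions, example):
                return label
        return DEFAULT

    print(classify({"outlook": "sunny", "humidity": 70}))  # -> play

Each such rule reads directly as a first-order clause, e.g. in Prolog: play(X) :- outlook(X, sunny), humidity(X, H), H =< 75.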

  3. Rule Sets - disadvantages • scale poorly with training set size • problems with noisy data • which is likely in real-world data • goal: • develop a rule learner that is efficient on noisy data • and competitive with C4.5 / C4.5rules

  4. Problem with Overfitting • an overfitted model also covers noisy examples • an underfitted model is too general • solution: pruning • reduced error pruning (REP) • post-pruning • pre-pruning

  5. Post-Pruning (C4.5) • overfit & simplify • construct a tree that overfits • convert the tree to rules • prune every rule separately • sort the rules according to accuracy • consider the order when classifying • bottom-up

  6. Pre-Pruning • some examples are ignored during concept generation • the final concept does not classify all training data correctly • can be implemented in the form of stopping criteria

  7. Reduced Error Pruning • separate and conquer • split the data into a training set and a validation set • construct an overfitting rule set on the training set • until further pruning would reduce accuracy: • evaluate the impact on the validation set of pruning each rule • remove the rule whose deletion improves validation accuracy most (see the sketch below)
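A minimal sketch of this loop in Python (illustrative only; grow_overfit_ruleset and accuracy are assumed helpers, and the rule representation is left abstract):

    def reduced_error_pruning(train, validation, grow_overfit_ruleset, accuracy):
        # Grow a deliberately overfitting rule set, then greedily delete the
        # rule whose removal helps validation accuracy most, until every
        # possible deletion would reduce accuracy.
        rules = grow_overfit_ruleset(train)
        best = accuracy(rules, validation)
        while rules:
            candidates = [rules[:i] + rules[i + 1:] for i in range(len(rules))]
            pruned = max(candidates, key=lambda rs: accuracy(rs, validation))
            if accuracy(pruned, validation) < best:
                break  # further pruning would reduce accuracy
            rules, best = pruned, accuracy(pruned, validation)
        return rules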

  8. Time Complexity • REP has a time complexity of O(n⁴) • the initial phase of overfitting alone has a complexity of O(n²) • alternative concept Grow: • faster in benchmarks • time complexity still O(n⁴) with noisy data

  9. Incremental Reduced Error Pruning - IREP • by Fürnkranz & Widmer (1994) • competitive error rates • faster than REP and Grow

  10. How IREP Works • iterative application of REP • random split of the sets → a bad split has a negative influence (but not as bad as with REP) • immediate pruning after a rule is grown (top-down approach) → no overfitting

  11. Cohen's IREP Implementation • build rules until a new rule results in too large an error rate • divide the data (randomly) into a growing set (2/3) and a pruning set (1/3) • grow a rule from the growing set • immediately prune the rule: delete a final sequence of conditions • delete the condition whose removal maximizes the function v (defined below), until no deletion improves the value of v • add the pruned rule to the rule set • delete every example covered by the rule (both positive and negative)
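The pruning metric v, as given in Cohen (1995), where p and n are the positive and negative examples of the pruning set covered by the rule, and P and N are the total positives and negatives in the pruning set:

    v(Rule) = (p + (N - n)) / (P + N)

Deleting conditions makes a rule more general; under v this pays off exactly when the newly covered positives outnumber the newly covered negatives.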

  12. Cohen's IREP - Algorithm (a sketch follows below)
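A minimal Python sketch of the two-class IREP loop just described (illustrative; grow_rule and covers are assumed helpers, not Cohen's code, and pruning greedily drops final conditions rather than searching all final sequences):

    import random

    def irep(pos, neg, grow_rule, covers):
        ruleset = []
        while pos:
            # Randomly split the remaining data: 2/3 growing, 1/3 pruning.
            random.shuffle(pos); random.shuffle(neg)
            gp, pp = pos[:2 * len(pos) // 3], pos[2 * len(pos) // 3:]
            gn, pn = neg[:2 * len(neg) // 3], neg[2 * len(neg) // 3:]

            rule = grow_rule(gp, gn)  # a rule is a list of conditions

            def v(r):  # the pruning metric from the previous slide
                p = sum(covers(r, x) for x in pp)
                n = sum(covers(r, x) for x in pn)
                return (p + (len(pn) - n)) / max(1, len(pp) + len(pn))

            # Prune: drop the final condition while that does not worsen v.
            while len(rule) > 1 and v(rule[:-1]) >= v(rule):
                rule = rule[:-1]

            # Stop once the pruned rule's error on the pruning set exceeds 50%.
            p = sum(covers(rule, x) for x in pp)
            n = sum(covers(rule, x) for x in pn)
            if p + n > 0 and n / (p + n) > 0.5:
                break

            ruleset.append(rule)
            # Delete every example (positive and negative) the rule covers.
            pos = [x for x in pos if not covers(rule, x)]
            neg = [x for x in neg if not covers(rule, x)]
        return ruleset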

  13. IREP and Multiple Classes • order the classes by increasing prevalence (C1, ..., Ck) • find a rule set that separates C1 from the other classes: IREP(PosData = C1, NegData = C2, ..., Ck) • remove all instances covered by that rule set • find a rule set that separates C2 from C3, ..., Ck, and so on • Ck remains as the default class (see the sketch below)
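A sketch of the class-ordering wrapper (illustrative; irep is a two-class learner such as the sketch above with its helpers bound, and covers_any tests whether any rule in a rule set covers an example):

    def irep_multiclass(data_by_class, irep, covers_any):
        # Classes ordered by increasing prevalence: rarest first.
        order = sorted(data_by_class, key=lambda c: len(data_by_class[c]))
        remaining = {c: list(xs) for c, xs in data_by_class.items()}
        rulesets = {}
        for i, c in enumerate(order[:-1]):  # the last class is the default
            pos = remaining[c]
            neg = [x for o in order[i + 1:] for x in remaining[o]]
            rulesets[c] = irep(pos, neg)
            # Remove every remaining instance the new rule set covers.
            for o in order[i:]:
                remaining[o] = [x for x in remaining[o]
                                if not covers_any(rulesets[c], x)]
        return rulesets, order[-1]  # per-class rule sets plus default class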

  14. IREP and Missing Attributes • handling missing attributes: • for all tests involving an attribute A • if attribute A of an instance is missing, the test fails
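In code this policy is a one-line guard (a minimal sketch; predicate stands for any per-attribute test):

    def test(example, attr, predicate):
        # A test on a missing attribute always fails, per this convention.
        return attr in example and predicate(example[attr])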

  15. Differences: Cohen vs. Original • pruning: delete a final sequence of conditions vs. a single final condition • stopping condition: error rate > 50% vs. accuracy(rule) < accuracy(empty rule) • scope: missing attributes, numerical variables, multiple classes vs. two-class problems only

  16. Time Complexity • IREP: O(m log² m), where m is the number of examples (assuming a fixed rate of classification noise)

  17. 37 Benchmark Problems

  18. Generalization Performance • IREP performs worse on the benchmark problems than C4.5rules • won-lost-tie ratio: 11-23-3 • error ratio (IREP error / C4.5rules error): • 1.13 excluding mushroom • 1.52 including mushroom

  19. Improving IREP • three modifications: • an alternative metric in the pruning phase • a new stopping heuristic for adding rules • post-pruning of the whole rule set (non-incremental pruning)

  20. The Rule-Value Metric • the old metric is not intuitive: R1: p1 = 2000, n1 = 1000; R2: p2 = 1000, n2 = 1 • the metric prefers R1 (for fixed P, N), which also leads to occasional failure to converge • new metric (IREP*), given below
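Both metrics as given in Cohen (1995):

    v(R)  = (p + (N - n)) / (P + N)   (IREP)
    v*(R) = (p - n) / (p + n)         (IREP*)

Working the slide's numbers: under v (with fixed P, N), R1 beats R2 by exactly 1/(P + N) even though R2 is almost perfectly precise. Under v*, R1 scores (2000 - 1000)/3000 ≈ 0.33 while R2 scores (1000 - 1)/1001 ≈ 0.998, so the far more precise R2 is preferred.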

  21. Stopping Condition • the 50% heuristic often stops too soon on moderately sized example sets • sensitive to the 'small disjunct problem' • solution: • after a rule is added, compute the total description length of the rule set plus its misclassifications (DL = C + E) • if DL is more than d bits larger than the smallest length seen so far, stop (min(DL) + d < DLcurrent) • d = 64 in Cohen's implementation → an MDL (Minimum Description Length) heuristic
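As a minimal sketch of the check (the description-length computation itself is assumed):

    D = 64  # bits of slack, per Cohen's implementation

    def should_stop(dl_seen, dl_current, d=D):
        # Stop adding rules once the current total description length
        # exceeds the smallest one seen so far by more than d bits.
        return dl_current > min(dl_seen) + d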

  22. IREP* • IREP* is IREP improved by the new rule-value metric and the new stopping condition • 28-8-1 against IREP • 16-21-0 against C4.5rules • error ratio 1.06 excluding mushroom (IREP: 1.13) and 1.04 including mushroom (IREP: 1.52)

  23. Rule Optimization • post-prunes the rules produced by IREP* • the rules are considered in turn • for each rule Ri, two alternatives are constructed: • Ri' - a replacement rule grown from scratch • Ri'' - a revision based on Ri • the final rule is chosen according to MDL (see the sketch below)
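A rough sketch of the optimization pass (all helpers are assumed; per the slide, the winning variant is the one minimizing the MDL of the whole rule set):

    def optimize(ruleset, grow_replacement, revise, mdl):
        for i, rule in enumerate(ruleset):
            replacement = grow_replacement(ruleset, i)  # Ri': grown from scratch
            revision = revise(ruleset, i)               # Ri'': based on Ri
            ruleset[i] = min(
                [rule, replacement, revision],
                key=lambda r: mdl(ruleset[:i] + [r] + ruleset[i + 1:]),
            )
        return ruleset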

  24. RIPPER • 1. IREP* is used to obtain a rule set • 2. rule optimization takes place • 3. IREP* is used to cover the remaining positive examples • → Repeated Incremental Pruning to Produce Error Reduction

  25. RIPPERk • apply steps 2 and 3 k times (see the sketch below)
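Putting the pieces together, a sketch of RIPPERk under the same assumptions as the earlier sketches (irep_star, optimize, and uncovered are illustrative helpers):

    def ripper_k(pos, neg, k, irep_star, optimize, uncovered):
        ruleset = irep_star(pos, neg)          # step 1: initial rule set
        for _ in range(k):
            ruleset = optimize(ruleset)        # step 2: rule optimization
            rest = uncovered(ruleset, pos)     # step 3: cover leftover positives
            ruleset += irep_star(rest, neg)
        return ruleset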

  26. RIPPER Performance • 28-7-2 against IREP*

  27. Error Rates • RIPPER is clearly competitive

  28. Efficiency of RIPPERk • the modifications do not change the complexity

  29. Reasons for Efficiency • find a model with IREP* first, then improve it • efficient: the first model already has roughly the right size • optimization takes linear time • C4.5's optimization and improvement process is expensive • it starts from too large an initial model • RIPPER is especially more efficient on large, noisy datasets

  30. Conclusions • IREP is an efficient rule learner for large, noisy datasets, but performs worse than C4.5 • IREP was improved to IREP* • IREP* was improved to RIPPER • RIPPER iterated k times is RIPPERk • RIPPERk is more efficient than C4.5 and performs better

  31. References • Fast Effective Rule Induction William W. Cohen [1995] • Incremental Reduced Error Pruning J. Fürnkranz & G. Widmer [1994] • Efficient Pruning Methods William W. Cohen [1993]