Feature transformation through rule induction: A case study with the k-NN classifier


  1. Feature transformation through rule induction: A case study with the k-NN classifier Antal van den Bosch Tilburg University, The Netherlands http://ilk.uvt.nl - Antal.vdnBosch@uvt.nl

  2. Outline • General idea • Feature space transform for k-NN • k-NN classification over rules • An implementation using RIPPER • Intermezzo - parameter optimization • Experiments on UCI data • Conclusions

  3. Feature transformation

  Original instances (features f1, f2, f3; class c):

    f1 f2 f3 | c
    A  C  F  | Z
    B  B  D  | Y
    B  C  C  | X
    B  B  C  | ?  (new instance)

  Induced rules:
  • r1: If f1=A then c=Z
  • r2: If f1=B and f2=B then c=Y
  • r3: If f2=C then c=X
  • r4: If f3=C then c=X

  Transformed instances (one binary feature per rule; 1 = the rule matches):

    r1 r2 r3 r4 | c
    1  0  1  0  | Z
    0  1  0  0  | Y
    0  0  1  1  | X
    0  1  0  1  | ?
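The recoding above is mechanical: each rule becomes a binary feature that fires when all of its condition tests hold. A minimal Python sketch, assuming rules are stored as lists of (feature index, required value) tests; this encoding is an illustration, not RIPPER's native rule format:

```python
# Each rule is a conjunction of (feature_index, required_value) tests.
RULES = [
    [(0, "A")],            # r1: if f1=A then c=Z
    [(0, "B"), (1, "B")],  # r2: if f1=B and f2=B then c=Y
    [(1, "C")],            # r3: if f2=C then c=X
    [(2, "C")],            # r4: if f3=C then c=X
]

def transform(instance, rules=RULES):
    """Recode an instance as one binary feature per rule:
    1 if all of the rule's tests hold, 0 otherwise."""
    return [int(all(instance[i] == v for i, v in rule)) for rule in rules]

for inst in (["A", "C", "F"], ["B", "B", "D"], ["B", "C", "C"], ["B", "B", "C"]):
    print(inst, "->", transform(inst))
# ['A', 'C', 'F'] -> [1, 0, 1, 0]
# ['B', 'B', 'D'] -> [0, 1, 0, 0]
# ['B', 'C', 'C'] -> [0, 0, 1, 1]
# ['B', 'B', 'C'] -> [0, 1, 0, 1]
```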

  4. k-NN over rules • Classification differs from rule-based classification: not "the class of the first rule that matches" • Instead, produce the majority class of the nearest neighbors, i.e. the training instances that share the most matching rules with the new instance (optionally weighted, etc.) • Different outcomes are possible: the rule's own class is not consulted, only the neighbors' classes • Rules become features with weights; they can outweigh and outnumber the original features
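To make the classification step concrete, here is a hedged sketch of k-NN over the rule features, using plain feature overlap as the similarity; the weighted variants mentioned on the slide are omitted:

```python
# k-NN in rule space: the neighbors are the training instances that agree
# with the query on the most rule features; their majority class wins.
from collections import Counter

def knn_over_rules(query_bits, train_bits, train_labels, k=3):
    # Similarity = number of rule features on which two vectors agree.
    overlap = lambda a, b: sum(x == y for x, y in zip(a, b))
    ranked = sorted(range(len(train_bits)),
                    key=lambda i: overlap(query_bits, train_bits[i]),
                    reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Rule-space vectors from slide 3; the query [0,1,0,1] agrees with the
# Y instance [0,1,0,0] on three of four rule features, so the prediction
# follows the neighbor rather than any single rule's class.
train = [[1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 1, 1]]
labels = ["Z", "Y", "X"]
print(knn_over_rules([0, 1, 0, 1], train, labels, k=1))  # -> 'Y'
```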

  5. Related work • Sébag and Schoenauer (1994) • same transformation, but for local regression; interesting dimension reduction • Generalizing instances to rules in k-NN • Salzberg (1990), NGE (hyperrectangles) • Domingos (1996), RISE (merging with wildcards) • Van den Bosch (1999), FAMBL (merging by disjuncting values) • Van den Bosch (2000) • earlier version only on natural language processing tasks

  6. Implementation • RIPPER (Cohen, 1995) • Sequential covering, MDL-driven • Induces sets of rules per class • Uses partitioning to validate and select rules • Many heuristics, many parameters, fast • Procedure: • Apply RIPPER to training set • Recode training and test set using RIPPER rules • Train and test k-NN (IB1 in TiMBL 5.0)

  7. Variants • Transformed IB1 (T-IB1) • new features replace original • IB1 plus new features (IB1+T) • new features are added to original • Compared against RIPPER and IB1 • 10-fold CV • Unpaired one-tailed t-tests
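A minimal sketch of the two recodings, assuming a `transform` function that maps an instance to its binary rule-match vector (as in the earlier sketch):

```python
def t_ib1_recode(instance, transform):
    """T-IB1: the rule-match features replace the original features."""
    return transform(instance)

def ib1_plus_t_recode(instance, transform):
    """IB1+T: the rule-match features are appended to the originals."""
    return list(instance) + transform(instance)

# With the rules from slide 3:
# t_ib1_recode(["B", "B", "C"], transform)      -> [0, 1, 0, 1]
# ib1_plus_t_recode(["B", "B", "C"], transform) -> ['B', 'B', 'C', 0, 1, 0, 1]
```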

  8. UCI data sets • Artificial data sets • Fully known underlying concept • Known conditional dependencies • Natural data sets • Partly understood underlying problem • Unknown conditional dependencies

  9. Data specs

  10. Intermezzo • Parameter settings matter, but • Good setting is unpredictable • Parameters interact • Exhaustive wrapping is not an option • Both k-NN (TiMBL) and RIPPER have lots of parameters • Heuristic: Wrapped progressive sampling (Van den Bosch, 2004)
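A rough sketch of the wrapped progressive sampling loop as described here. The sample schedule and the keep-the-top-half selection rule are assumptions for illustration, not necessarily the criteria of Van den Bosch (2004):

```python
# Wrapped progressive sampling (sketch): score every parameter setting on
# a small training sample, discard the worse-scoring half, grow the
# sample, and repeat until one setting survives or the data runs out.
def wps(settings, train_data, evaluate, start_size=500, growth=2.0):
    survivors = list(settings)
    size = min(start_size, len(train_data))
    while len(survivors) > 1 and size <= len(train_data):
        sample = train_data[:int(size)]
        # evaluate(setting, sample) -> validation score of that setting
        survivors.sort(key=lambda s: evaluate(s, sample), reverse=True)
        survivors = survivors[:max(1, len(survivors) // 2)]  # keep top half
        size *= growth
    return survivors[0]
```

This avoids exhaustive wrapping because poor settings are eliminated on cheap small samples, and only a handful of settings are ever evaluated on the full training set.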

  11. Main idea

  12. WPS parameter spaces • RIPPER: F (min. # instances per rule), a (class ordering), n (negation), S (simplification), O (# optimization passes), L (loss ratio); 648 combinations • IB1 (TiMBL): k (# nearest neighbors), w (feature weighting), m (similarity metric), d (distance weighting), L (metric back-off); 925 combinations
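The combination counts arise from crossing each parameter's tried values. A toy illustration; the value lists below are hypothetical placeholders, not the actual RIPPER/TiMBL value sets:

```python
from itertools import product

# Hypothetical value lists, for illustration only.
param_values = {
    "k": [1, 3, 5, 7, 9],      # number of nearest neighbors
    "w": ["nw", "gr", "ig"],   # feature-weighting scheme
    "m": ["overlap", "mvdm"],  # similarity metric
}
grid = list(product(*param_values.values()))
print(len(grid))  # 5 * 3 * 2 = 30 settings in this toy grid
```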

  13. Effect of WPS

  14. End of intermezzo

  15. RIPPER vs T-IB1 vs IB1

  16. Numbers of rules

  17. IB1 vs IB1+T

  18. Result summary

  19. Discussion • T-IB1 ≈ RIPPER • RIPPER classification can be interchanged with k-NN classification • IB1+T outperforms IB1 and RIPPER • Extra features add useful new views on task • Effects mainly on artificial data • Complex “game” rules help k-NN in finding the correct nearest neighbors

  20. One example: tic-tac-toe • Tic-tac-toe • donated by David Aha to the UCI repository • 958 possible endings of the 3x3 board game • Class: whether the board constitutes a win for X • Yes or no ("no" can be a win for O, or a draw) • A typical 100% correct eight-rule set checks the 2 diagonals, the 3 horizontals, and the 3 verticals for three consecutive X's • RIPPER usually finds these eight rules, but sometimes induces other ones
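The eight-rule concept is easy to write down. A minimal sketch, with the board encoded as a flat list of nine cells (an assumption for illustration):

```python
# One rule per line: a board is a win for X iff some horizontal, vertical,
# or diagonal holds three X's -- eight rules in total.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # three horizontals
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # three verticals
    (0, 4, 8), (2, 4, 6),             # two diagonals
]

def win_for_x(board):
    """Fire if any of the eight lines consists entirely of 'x' cells."""
    return any(all(board[i] == "x" for i in line) for line in LINES)

print(win_for_x(list("xxxoo xo ")))  # True: the top row is x x x
```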

  21. Tic-tac-toe freak rule • [slide shows two example board configurations] • The rule tests on O in three locations that are not three in a row • X may win on such boards • But the same pattern may also mean a draw!

  22. IB1+T saves the day • [slide shows a test board and its nearest neighbor] • IB1+T finds a nearest neighbor that mismatches in two board positions but matches on the rule feature • That neighbor represents a draw

  23. Future work • More relaxed and redundant rule inducer (more rules per instance) • Bigger context: plug k-NN onto RIPPER, maxent, SVM, Winnow, … • AUC instead of accuracy
