
CSC 196k Semester Project: Instance Based Learning






Presentation Transcript


  1. CSC 196k Semester Project: Instance Based Learning Weka Assignment 2 Glynis Hawley

  2. Agenda • Background: Instance-based Learning • Project • Requirements • Data • Progress • Conclusions • References

  3. Background: Instance Based Learning • Learning/classification based on information stored in a “set” of examples • No rules or decision trees • “New” instance classified based on its similarity to one (or more) stored example(s) • e.g. Nearest-neighbor

  4. IBL Algorithm research by David W. Aha • Two papers helpful in understanding this assignment • Instance-based Learning Algorithms • David W. Aha, Dennis Kibler, Marc K. Albert • 1991 • Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms • David W. Aha • 1992 • Three algorithms: IB1, IB2, IB3

  5. IB1: Instance-Based Learner, version 1 • Similar to nearest neighbor algorithm • Differences: • Normalizes all attributes to the range [0,1] • Handles missing attributes • Training: Stores all instances from training set • Classification: Searches all stored instances for nearest neighbor • High computational and spatial expense
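The training and classification steps above can be sketched as follows. This is a minimal illustration, not Weka's actual IB1 code; the class and method names (`IB1Sketch`, `buildClassifier`, `classify`) are assumptions, and missing-attribute handling is omitted for brevity:

```java
// IB1 sketch: normalize attributes to [0,1], store every training
// instance, classify by a linear scan for the nearest stored neighbor.
import java.util.ArrayList;
import java.util.List;

public class IB1Sketch {
    private final List<double[]> stored = new ArrayList<>();
    private final List<Integer> labels = new ArrayList<>();
    private double[] min, max;

    // Training: record per-attribute ranges, then store all instances.
    public void buildClassifier(double[][] data, int[] classes) {
        int n = data[0].length;
        min = new double[n]; max = new double[n];
        java.util.Arrays.fill(min, Double.POSITIVE_INFINITY);
        java.util.Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] x : data)
            for (int j = 0; j < n; j++) {
                min[j] = Math.min(min[j], x[j]);
                max[j] = Math.max(max[j], x[j]);
            }
        for (int i = 0; i < data.length; i++) {
            stored.add(normalize(data[i]));
            labels.add(classes[i]);
        }
    }

    // Classification: O(stored) scan -- the "high expense" on the slide.
    public int classify(double[] x) {
        double[] q = normalize(x);
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < stored.size(); i++) {
            double d = distance(q, stored.get(i));
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return labels.get(best);
    }

    private double[] normalize(double[] x) {
        double[] out = new double[x.length];
        for (int j = 0; j < x.length; j++) {
            double range = max[j] - min[j];
            out[j] = range == 0 ? 0 : (x[j] - min[j]) / range;
        }
        return out;
    }

    private static double distance(double[] a, double[] b) {
        double s = 0;
        for (int j = 0; j < a.length; j++) s += (a[j] - b[j]) * (a[j] - b[j]);
        return Math.sqrt(s);
    }
}
```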

  6. IB2: Instance-Based Learner, version 2 • Attempts to reduce storage requirements and computational complexity • Saves only misclassified instances • Algorithm: Stored instances = {}; For each instance in the training set, tentatively classify the instance based on the nearest stored instance. If classification != true class, add the instance to the stored set • Tends to accumulate noisy instances
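The algorithm on this slide translates almost directly into code. The sketch below (assumed names, not Weka's implementation; normalization from IB1 is omitted to keep the storage rule in focus) keeps an instance only when the current stored set would misclassify it:

```java
// IB2 sketch: classify like IB1, but during training store only
// the instances that the nearest stored neighbor gets wrong.
import java.util.ArrayList;
import java.util.List;

public class IB2Sketch {
    private final List<double[]> stored = new ArrayList<>();
    private final List<Integer> labels = new ArrayList<>();

    public void buildClassifier(double[][] data, int[] classes) {
        for (int i = 0; i < data.length; i++)
            updateClassifier(data[i], classes[i]);
    }

    // Add an instance only when it would be misclassified
    // (the very first instance is always kept).
    public void updateClassifier(double[] x, int trueClass) {
        if (stored.isEmpty() || classify(x) != trueClass) {
            stored.add(x);
            labels.add(trueClass);
        }
    }

    public int classify(double[] x) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < stored.size(); i++) {
            double d = 0;
            for (int j = 0; j < x.length; j++)
                d += (x[j] - stored.get(i)[j]) * (x[j] - stored.get(i)[j]);
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return labels.get(best);
    }

    public int storedCount() { return stored.size(); }
}
```

Because a noisy instance is, by definition, likely to be misclassified by its neighbors, this rule is exactly why IB2 "tends to accumulate noisy instances."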

  7. IB3: Instance-Based Learner, version 3 • Tracks the performance of each exemplar • Uses only those that are “good enough” • Performance exceeds some upper threshold • Discards those that are “not good enough” • Performance falls below some lower threshold • Exemplars “in between” are retained but not used for classification • Performance statistics updated whenever exemplar is the nearest neighbor to a “new” instance • Performance and storage better than IB1 and IB2
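The per-exemplar bookkeeping described above might look like the following. This is a deliberately simplified sketch: Aha's actual IB3 compares confidence intervals on each exemplar's accuracy against the observed frequency of its class, whereas fixed accuracy thresholds are used here for brevity; all names are assumptions:

```java
// Simplified IB3-style record: count how often an exemplar is
// consulted and how often it is right, then accept or drop it
// against fixed thresholds (NOT Aha's confidence-interval test).
public class ExemplarRecord {
    int uses = 0;     // times this exemplar was the nearest neighbor
    int correct = 0;  // times its class matched the true class

    void update(boolean wasCorrect) {
        uses++;
        if (wasCorrect) correct++;
    }

    double accuracy() {
        return uses == 0 ? 0.0 : (double) correct / uses;
    }

    // "Good enough": eligible for use in classification.
    boolean accepted(double upper) {
        return uses > 0 && accuracy() >= upper;
    }

    // "Not good enough": discarded from the stored set;
    // minUses avoids judging an exemplar on too few trials.
    boolean dropped(double lower, int minUses) {
        return uses >= minUses && accuracy() < lower;
    }
}
```

Exemplars that are neither accepted nor dropped are the "in between" case on the slide: they stay in storage and keep accumulating statistics until one of the two tests fires.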

  8. Aha’s Results • Results are averaged over 50 trials. [1:274], [2:57]

  9. The Weka IBL Project • Implement IB2 and IB3 • Compare their performance with that of IB1 and C4.5 (Weka version is called J48) • Data • Iris data: for initial testing of IB2 • LED data • Glass data

  10. LED Dataset • Synthetic dataset created with led-creator.c [3] • 8 attributes • 7 segments of display: 0 or 1 • Class: digits 0 through 9 • Input • Number of instances to be created • Seed • % noise per attribute • 10% noise means each bit has a 10% chance of being flipped
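The noise model on this slide (each segment bit flipped independently with the given probability) can be sketched as below. This is an illustration of the described model only; it does not reproduce led-creator.c's random-number generator or output format:

```java
// Sketch of the LED noise model: flip each of the 7 segment bits
// independently with probability `noise` (e.g. 0.10 for 10% noise).
import java.util.Random;

public class LedNoise {
    public static int[] addNoise(int[] segments, double noise, Random rng) {
        int[] out = segments.clone();
        for (int i = 0; i < out.length; i++)
            if (rng.nextDouble() < noise)
                out[i] = 1 - out[i];  // flip this segment bit
        return out;
    }
}
```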

  11. Glass Identification Dataset • 214 instances • 163 Window glass (building windows and vehicle windows) • 87 float processed • 70 building windows • 17 vehicle windows • 76 non-float processed • 76 building windows • 0 vehicle windows • 51 Non-window glass • 13 containers • 9 tableware • 29 headlamps

  12. Progress Report - Accomplished • Implemented IB2 • Modification of IB1 class methods • buildClassifier( ) • updateClassifier( ) • Preliminary testing with iris data • Compared accuracy of IB1, IB2, and C4.5 on LED data • 10 sets of 700 instances each with 10% noise • training set = first 200 instances of each set • testing set = last 500 instances of each set

  13. Compare with David Aha’s results [2:52] (over 50 trials): IB1: 70.5 ± 0.4 % IB2: 62.5 ± 0.6 %

  14. Progress Report - To Do • Implement IB3 • More involved than IB2 • Even more difficult when you don’t know Java • Test accuracy of IB3 on LED data to compare with that of IB1, IB2, and C4.5 • Test accuracy of IB1, IB2, IB3, and C4.5 on the glass data

  15. Conclusions • Thus far, comparisons of IB1 and IB2 are similar to David Aha’s results. • Weka assignments (except perhaps #1) • Are somewhat vague. • Require some research to determine what actual project requirements should be. • Are valuable in building an understanding of the algorithms and their design.

  16. References [1] Aha, David W. 1992. Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms. International Journal of Man-Machine Studies 36(2):267-287. [2] Aha, David W., Dennis Kibler and Marc K. Albert. 1991. Instance-based learning algorithms. Machine Learning 6:37-66. [3] http://ftp.ics.uci.edu/pub/machine-learning-databases/led-display-creator/led-creator.c
