
Instance-based Learning Algorithms


Presentation Transcript


  1. Instance-based Learning Algorithms Presented by Yan T. Yang

  2. Agenda • Background: what is instance-based learning? • Two simple algorithms • Extensions [Aha, 1994]: • Feedback algorithm • Noise reduction • Irrelevant-attribute elimination • Novel-attribute adoption

  3. Learning Paradigms • Cognitive psychology: how do people, animals, and machines learn? (Jerome Bruner) • Two schools of thought [Bruner, Goodnow and Austin 1967]: • Abstraction-based: • Form a generalized idea from the examples, then use it to classify new objects.

  4. Learning Paradigms • Cognitive psychology: how do people, animals, and machines learn? (Jerome Bruner) • Two schools of thought [Bruner, Goodnow and Austin 1967]: • Abstraction-based examples: • Artificial neural networks • Support vector machines • Rule-based learners/decision trees: if not animated… then not an animal

  5. Learning Paradigms • Cognitive psychology: how do people, animals, and machines learn? (Jerome Bruner) • Two schools of thought [Bruner, Goodnow and Austin 1967]: • Instance-based: • Store all (suitable) training examples, and compare new objects to the stored examples.

  6. Comparison Between the Two Paradigms • Instance-based: • Stores (suitable) examples as saved instances • Most of the work happens at query time • Little work at training time • Abstraction-based: • Generalizes into rules, discriminant planes or functions, or trees • Most of the work happens at training time • Little work at query time

  7. Instance-based Learning • Training set example [Aha, 1994]: attributes are “is enrolled”, “has MS degree”, and “is married”: (<True, True, True>, PhD student) (<False, False, True>, not PhD student) (<True, False, False>, PhD student)
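One way to encode this training set in Python, as (attribute vector, class) pairs; the boolean encoding and the variable name are mine, not the slides':

    # Training set from the slide above; attributes are
    # <is enrolled, has MS degree, is married>.
    training_set = [
        ((True, True, True), "PhD student"),
        ((False, False, True), "not PhD student"),
        ((True, False, False), "PhD student"),
    ]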

  8. Instance-based Learning [diagram: Training Set → Instance-based Learning Algorithm → Concept Description]

  9. Instance-based Learning [diagram, extended: the Concept Description includes a Similarity Function]

  10. Instance-based Learning [diagram, extended: the Concept Description includes a Similarity Function and a Classification Function]

  11. Instance-based Learning Algorithm • Input: training set • Output: concept description (CD), comprising: • Similarity function • Classification function • Optional: • Keep track of each CD instance's correct and incorrect classification counts • CD adder • CD remover

  12. Instance-based Learning Algorithm • Advantages and disadvantages [Mitchell, 1997] • Advantages: • Training is very fast • Can learn complex class memberships • Does not lose information • Disadvantages: • Slow at query time • Easily fooled by irrelevant attributes

  13. Instance-based Learning Algorithm (CD = concept description) • Example, IBL1: • Save all training instances in the CD • Assign the class of the most similar CD instance to the new instance (nearest neighbor)

  14. Instance-based Learning Algorithm [figure: Voronoi tessellation of the training data] • Example, IBL1: save all training instances in the CD; assign the class of the most similar CD instance to the new instance (nearest neighbor)
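A minimal IBL1 sketch in Python, assuming negated Euclidean distance as the similarity function; the helper names (similarity, ibl1_train, ibl1_classify) are mine:

    import math

    def similarity(x, y):
        # Negated Euclidean distance: larger means more similar.
        return -math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

    def ibl1_train(training_set):
        # IBL1 saves every training instance in the concept description.
        return list(training_set)

    def ibl1_classify(cd, query):
        # Nearest neighbor: the class of the most similar saved instance.
        _, label = max(cd, key=lambda inst: similarity(inst[0], query))
        return label

    # Usage with the slide's training set (booleans encoded as 1/0):
    cd = ibl1_train([((1, 1, 1), "PhD"), ((0, 0, 1), "not PhD"),
                     ((1, 0, 0), "PhD")])
    print(ibl1_classify(cd, (1, 1, 0)))  # -> "PhD"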

  15. Instance-based Learning Algorithm (CD = concept description) • Example, IBL2: • Similar to IBL1: nearest neighbor • Save only the incorrectly classified training instances in the CD. Intuition: “these nearly always lie on the boundary between two classes; if they are saved, the rest, which lie far from the boundaries, can easily be deduced using the similarity function” [Karadeniz, 1996]
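A sketch of the IBL2 training loop, reusing ibl1_classify from the IBL1 sketch above; again, the function name is an assumption of mine:

    def ibl2_train(training_set):
        # IBL2 saves an instance only when the CD built so far
        # misclassifies it, so the saved instances concentrate near
        # class boundaries.
        cd = []
        for x, label in training_set:
            if not cd or ibl1_classify(cd, x) != label:
                cd.append((x, label))
        return cd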

  16. Criticisms Mainly because nearest-neighbor algorithms are the basis [Breiman, Friedman, Olshen and Stone, 1984]: • They are expensive in storage • They are sensitive to the choice of the similarity function • They cannot easily handle missing attribute values • They cannot easily handle nominal attributes • They do not yield concise summaries of concepts

  17. Criticisms (responses) [Aha, 1992]: • IBL2 addresses criticism 1 • The extensions on the following slides address criticisms 1-3 • [Stanfill and Waltz, 1986] addresses criticism 4 • [Salzberg, 1990] addresses criticism 5

  18. Extension: Filtering Noisy Training Instances (IBL3) Modifications: 1. Maintain classification records; 2. Save only significantly good instances; 3. Discard noisy saved instances (i.e., those with significantly poor classification performance)

  19. Extension: Filtering Noisy Training Instances (IBL3) [figure-only slide; content not transcribed]

  20. Extension: Filtering Noisy Training Instances (IBL3) “Significantly” good or bad: use statistical confidence intervals (CI): construct a CI for the current instance's classification accuracy, and a CI for its class's current observed relative frequency. “Significantly” good: the classification-accuracy CI lies entirely above the class-frequency CI [figure: the two intervals compared].

  21. Extension: Filtering Noisy Training Instances (IBL3) “Significantly” good or bad: use statistical confidence intervals (CI): construct a CI for the current instance's classification accuracy, and a CI for its class's current observed relative frequency. “Significantly” bad: the classification-accuracy CI lies entirely below the class-frequency CI [figure: the two intervals compared].

  22. Extension: Filtering Noisy Training Instances (IBL3) “Significantly” good or bad: use statistical confidence intervals (CI): construct a CI for the current instance's classification accuracy, and a CI for its class's current observed relative frequency [Hogg and Tanis, 1983].
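A sketch of the two significance tests in Python. The interval below is a plain normal approximation standing in for the exact formula from [Hogg and Tanis, 1983], and the published IBL3 uses different confidence levels for accepting versus dropping instances, a detail omitted here; the function names are mine:

    import math

    def confidence_interval(successes, trials, z=1.645):
        # Normal-approximation 90% CI for a proportion (a stand-in for
        # the interval IBL3 takes from [Hogg and Tanis, 1983]).
        if trials == 0:
            return (0.0, 1.0)
        p = successes / trials
        margin = z * math.sqrt(p * (1 - p) / trials)
        return (max(0.0, p - margin), min(1.0, p + margin))

    def significantly_good(correct, tries, class_count, total):
        # Accept: the instance's accuracy CI lies entirely above the
        # CI of its class's observed relative frequency.
        acc_lo, _ = confidence_interval(correct, tries)
        _, freq_hi = confidence_interval(class_count, total)
        return acc_lo > freq_hi

    def significantly_bad(correct, tries, class_count, total):
        # Drop as noise: the accuracy CI lies entirely below the
        # class-frequency CI.
        _, acc_hi = confidence_interval(correct, tries)
        freq_lo, _ = confidence_interval(class_count, total)
        return acc_hi < freq_lo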

  23. Extension: Tolerate irrelevant attributes (IBL4) • IBL1-IBL3 assume all attributes have equal relevance; • In the real world, some attributes are more discriminative than others; • Irrelevant attributes cause poor performance.

  24. Extension: Tolerate irrelevant attributes (IBL4) • Regular similarity measure: Euclidean distance over all attributes [equation not transcribed] • IBL4's similarity measure: a weighted Euclidean distance with concept-dependent attribute weights [equation not transcribed] • Concept-dependent example: sim(animal, tiger, cat) > sim(pet, tiger, cat)

  25.-26. Extension: Tolerate irrelevant attributes (IBL4) • IBL4's similarity measure (weighted Euclidean distance) [two equation slides; formulas not transcribed]
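A sketch of an IBL4-style weighted similarity; the per-attribute weights are assumed to be learned from classification feedback, and the weight-update rule from the untranscribed equation slides is not reproduced here:

    import math

    def weighted_similarity(weights, x, y):
        # Negated weighted Euclidean distance: concept-dependent weights
        # push irrelevant attributes toward zero so they stop dominating
        # the distance.
        return -math.sqrt(sum(w * (a - b) ** 2
                              for w, a, b in zip(weights, x, y)))

    # e.g. if "is married" has been learned to be nearly irrelevant:
    print(weighted_similarity([1.0, 1.0, 0.05], (1, 1, 1), (1, 1, 0)))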

  27. Extension: Tolerate novel attributes (IBL5) • IBL1-IBL4 assume all attributes are known a priori to the training process; • In everyday situations, instances may not initially be described by all possible attributes; • Missing values are a different issue, typically handled by 1) assigning “don't know”, 2) assigning the most probable value, or 3) assigning all possible values [Gams and Lavrac, 1987]

  28. Extension: Tolerate novel attributes (IBL5) • Extension (IBL5): allow novel attributes to be introduced late in the training process (extra: handles missing values in a novel way) • IBL4's similarity measure vs. IBL5's similarity measure (Euclidean distance) [equations not transcribed]

  29. Extension: Tolerate novel attributes (IBL5) • Extension (IBL5): allow novel attributes to be introduced late in the training process (extra: handles missing values in a novel way) • IBL5's similarity measure (Euclidean distance over the attributes known in both instances) [equation not transcribed]
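A sketch of an IBL5-style similarity, under my reading of the slide that the distance is computed only over attributes whose values are known in both instances, so novel attributes introduced late (and missing values) are skipped rather than guessed:

    import math

    def ibl5_similarity(weights, x, y):
        # Compare only attributes known in BOTH instances; None marks an
        # unknown or not-yet-introduced attribute value.
        known = [(w, a, b) for w, a, b in zip(weights, x, y)
                 if a is not None and b is not None]
        if not known:
            return float("-inf")  # no shared attributes: maximally dissimilar
        return -math.sqrt(sum(w * (a - b) ** 2 for w, a, b in known))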

  30. Results (IB = instance-based learning, IBL) [results table not transcribed]

  31. Results [results figure not transcribed]

  32. Thanks • Q and A
