
Rule induction: Ross Quinlan's ID3 algorithm

Rule induction: Ross Quinlan's ID3 algorithm. Fredda Weinberg CIS 718X Fall 2005 Professor Kopec Assignment #3. The learning problem. You are presented with the data. You have a supervised learning problem (that is, a target variable).


Presentation Transcript


  1. Rule induction: Ross Quinlan's ID3 algorithm. Fredda Weinberg, CIS 718X, Fall 2005, Professor Kopec, Assignment #3

  2. The learning problem • You are presented with the data. • You have a supervised learning problem (that is, there is a target variable). • In practice, there is no such thing as the correct model. • You are looking for a “best approximating” model. • There is no reason to think that linear models provide the “best approximating” model. • Source: SPSS Clementine Users Group

  3. Terms • General: • Decision trees. • “Recursive partitioning”: apply the same splitting rule to smaller and smaller partitions of the sample space. • Classification: • Tree-based classification. • Classification trees. • Source: ibid.

  4. Rule induction
     1. For each attribute, compute the entropy of the conclusion after splitting the data on that attribute (the average entropy of the resulting subsets, weighted by subset size).
     2. Select the attribute (say A) with the lowest entropy.
     3. Divide the data into separate sets so that within a set, A has a fixed value (e.g. Color=green in one set, Color=brown in another, etc.).
     4. Build a tree with branches: if A=a1 then ... (subtree1); if A=a2 then ... (subtree2); ...etc...
     5. For each subtree, repeat this process from step 1.
     6. At each iteration, one attribute gets removed from consideration. The process stops when there are no attributes left to consider, or when all the data being considered in a subtree have the same value for the conclusion (e.g. they all say Conclusion=safe from sunburn).
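The six steps above can be sketched in a few lines of Python. This is a minimal illustration, not Quinlan's original code: the function names (`entropy`, `weighted_entropy`, `id3`, `classify`), the dictionary-based row format, and the toy sunburn rows are all assumptions made for the example. The slide does not say which conclusion to return when the attributes run out but the data still disagree; the sketch assumes a majority vote.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def weighted_entropy(rows, attribute, target):
    """Step 1: entropy of the conclusion after splitting on `attribute`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())


def id3(rows, attributes, target):
    """Return a leaf label, or a tuple (attribute, {value: subtree})."""
    labels = [row[target] for row in rows]
    # Stop when every row in this subtree has the same conclusion.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop when no attributes are left (assumption: fall back to majority vote).
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 2: pick the attribute with the lowest entropy.
    best = min(attributes, key=lambda a: weighted_entropy(rows, a, target))
    # Step 6: the chosen attribute is removed from further consideration.
    remaining = [a for a in attributes if a != best]
    # Steps 3-5: partition on the attribute's values and recurse.
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = id3(subset, remaining, target)
    return (best, branches)


def classify(tree, row):
    """Follow branches until a leaf label is reached."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[row[attribute]]  # KeyError for a value never seen in training
    return tree


# Hypothetical toy data, invented for this sketch.
toy_rows = [
    {"hair": "blonde", "lotion": "no", "result": "sunburned"},
    {"hair": "blonde", "lotion": "yes", "result": "safe"},
    {"hair": "brown", "lotion": "no", "result": "safe"},
    {"hair": "brown", "lotion": "yes", "result": "safe"},
    {"hair": "red", "lotion": "no", "result": "sunburned"},
]
toy_tree = id3(toy_rows, ["hair", "lotion"], "result")
print(toy_tree)
```

On these toy rows, splitting on `hair` leaves less entropy than splitting on `lotion`, so `hair` becomes the root and `lotion` is only consulted inside the `blonde` branch.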

  5. Iterative Dichotomiser The rule induction algorithm was first used by Hunt in his CLS (Concept Learning System) in 1962. Then, with extensions for handling numeric data too, it was used by Ross Quinlan for his ID3 system in 1979. Quinlan's ID3 tried to cut down on effort by inducing a set of rules from a small subset of the data, and then testing to see if those rules explained the other data. Data not explained were then added to the chosen subset, and new rules were induced. This process continued until all the data were accounted for. The letters ID stood for 'iterative dichotomiser', a fancy name for this simple algorithm.
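The subset-and-retest loop described on this slide can be sketched as follows. This is only an illustration under stated assumptions: `induce_with_window`, `induce_table`, and `predict_table` are hypothetical names, the initial subset is simply the first few examples (the slide does not say how it was chosen), and the rule inducer is a trivial lookup-table stand-in rather than a decision tree, so that the example stays self-contained.

```python
from collections import Counter


def induce_with_window(examples, induce, predict, initial_size=2):
    """Induce rules from a small subset, growing it with unexplained examples.

    `examples` is a list of (features, label) pairs. `induce` and `predict`
    are supplied by the caller (hypothetical interface for this sketch).
    """
    window = list(examples[:initial_size])
    rest = list(examples[initial_size:])
    while True:
        rules = induce(window)
        explained, exceptions = [], []
        for features, label in rest:
            if predict(rules, features) == label:
                explained.append((features, label))
            else:
                exceptions.append((features, label))
        if not exceptions:
            # Every example outside the subset is explained by the rules.
            return rules, window
        # Add the unexplained data to the subset and induce new rules.
        window.extend(exceptions)
        rest = explained


# Stand-in learner (assumption): memorise the subset, default to its majority label.
def induce_table(window):
    table = {features: label for features, label in window}
    default = Counter(label for _, label in window).most_common(1)[0][0]
    return table, default


def predict_table(rules, features):
    table, default = rules
    return table.get(features, default)


# Hypothetical examples, invented for this sketch.
toy_examples = [(("a",), "x"), (("b",), "x"), (("c",), "y"), (("d",), "x")]
toy_rules, toy_window = induce_with_window(toy_examples, induce_table, predict_table)
print(len(toy_window), "of", len(toy_examples), "examples were needed")
```

The loop always terminates, because each pass either finds no exceptions or moves at least one example from `rest` into the subset.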

  6. Entropy • $\text{Entropy} = -\sum_i p_i \log_2 p_i$, where $p_i$ is the proportion of cases in class $i$. • Information-theoretic criterion: the minimum number of bits needed, on average, to encode the classification of an arbitrary case. • Ranges from 0 to 1 for a two-class target (in general, from 0 to $\log_2 k$ for $k$ classes). • 0 if p is concentrated in one class. • Maximal if p is uniform across classes. • Entropy gain is the reduction in entropy after a split. Interpretation: the number of bits saved when encoding the target value with knowledge of the predictor. • Entropy gain is biased in favor of attributes with many values. Gain ratio discourages the selection of attributes with many uniformly distributed values. • Source: SPSS Clementine Users Group
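The quantities on this slide can be computed directly. The sketch below is an illustration only: the function names and the four-row toy table are assumptions, and gain ratio is taken in its usual form, entropy gain divided by the entropy of the attribute's own value distribution. It shows the bias mentioned above: a row identifier and a genuinely useful attribute earn the same entropy gain, but the identifier gets a lower gain ratio.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy, in bits, of a list of discrete values."""
    total = len(values)
    return sum(-(n / total) * log2(n / total) for n in Counter(values).values())


def entropy_gain(rows, labels, attribute):
    """Reduction in label entropy after splitting the rows on `attribute`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    after_split = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - after_split


def gain_ratio(rows, labels, attribute):
    """Entropy gain divided by the entropy of the attribute's own values."""
    split_info = entropy([row[attribute] for row in rows])
    if split_info == 0:
        return 0.0  # the attribute has a single value, so it cannot split the data
    return entropy_gain(rows, labels, attribute) / split_info


# Hypothetical toy table: "id" is unique per row, "color" has two values.
toy_rows = [
    {"id": "1", "color": "green"},
    {"id": "2", "color": "green"},
    {"id": "3", "color": "brown"},
    {"id": "4", "color": "brown"},
]
toy_labels = ["safe", "safe", "sunburned", "sunburned"]

for name in ("id", "color"):
    print(name, entropy_gain(toy_rows, toy_labels, name), gain_ratio(toy_rows, toy_labels, name))
```

Both attributes separate the two classes perfectly, so each gains the full 1 bit; dividing by the 2 bits needed to encode `id` versus the 1 bit needed to encode `color` is what penalises the many-valued attribute.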

  7. Tech Support toy database: is it the equipment or the commander? Decision Trees by Computational Intelligence

  8. The Decision Tree produced by the training data

  9. Testing with new examples: Predictions

  10. Applications • Predicting Magnetic Properties of Crystals • Profiling High Income Earners from Census Data • Assessing Churn Risk • Detecting Advertisements on the Web • Identifying Spam • Diagnosing Hypothyroidism
