
Exercises




  1. Exercises

  2. Labor Data, sorted by wage-increase-first

  3. Discretizing “wage-increase-first”

(Here l denotes the natural logarithm, so entropies are measured in nats.)

Split after 2:
• avg(entropy[4/5, 1/5], entropy[6/21, 15/21])
• entropy[4/5, 1/5] = -(4/5)*l(4/5) - (1/5)*l(1/5) = .5
• entropy[6/21, 15/21] = -(6/21)*l(6/21) - (15/21)*l(15/21) = .598
• avg = (5/26)*.5 + (21/26)*.598 = .579

Split after 3:
• avg(entropy[8/9, 1/9], entropy[2/17, 15/17])
• entropy[8/9, 1/9] = -(8/9)*l(8/9) - (1/9)*l(1/9) = .349
• entropy[2/17, 15/17] = -(2/17)*l(2/17) - (15/17)*l(15/17) = .362
• avg = (9/26)*.349 + (17/26)*.362 = .357

Split after 3.5:
• avg(entropy[8/12, 4/12], entropy[2/14, 12/14])
• entropy[8/12, 4/12] = -(8/12)*l(8/12) - (4/12)*l(4/12) = .637
• entropy[2/14, 12/14] = -(2/14)*l(2/14) - (12/14)*l(12/14) = .41
• avg = (12/26)*.637 + (14/26)*.41 = .515

  4. Discretizing “wage-increase-first”

Split after 4:
• avg(entropy[10/14, 4/14], entropy[0/12, 12/12])
• entropy[10/14, 4/14] = -(10/14)*l(10/14) - (4/14)*l(4/14) = .598
• entropy[0/12, 12/12] = -(0/12)*l(0/12) - (12/12)*l(12/12) = 0 (taking 0*l(0) = 0)
• avg = (14/26)*.598 + (12/26)*0 = .322 (the smallest)

No split:
• entropy[10/26, 16/26] = -(10/26)*l(10/26) - (16/26)*l(16/26) = .666

Since .322 is the smallest of all the averages, the first split is after 4. (A short script verifying these numbers follows.)
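
These computations are easy to script. A minimal sketch in Python, using the natural logarithm as the slides do, with the class counts read off the slides:

```python
import math

def entropy(counts):
    """Entropy, in nats (natural log), of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)

def split_avg(left, right):
    """Weighted average entropy of a binary split into two count lists."""
    n = sum(left) + sum(right)
    return sum(left) / n * entropy(left) + sum(right) / n * entropy(right)

# First-level candidate splits for "wage-increase-first":
print(split_avg((4, 1), (6, 15)))   # after 2   -> ~.579
print(split_avg((8, 1), (2, 15)))   # after 3   -> ~.357
print(split_avg((8, 4), (2, 12)))   # after 3.5 -> ~.515
print(split_avg((10, 4), (0, 12)))  # after 4   -> ~.322 (the smallest)
print(entropy((10, 16)))            # no split  -> ~.666
```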

  5. Discretizing “wage-increase-first”

We need to continue in the yellow part: the 14 instances with wage-increase-first ≤ 4, which are still a 10/4 mixture of the two classes.

  6. Discretizing “wage-increase-first”

Split after 2:
• avg(entropy[4/5, 1/5], entropy[6/9, 3/9])
• entropy[4/5, 1/5] = -(4/5)*l(4/5) - (1/5)*l(1/5) = .5
• entropy[6/9, 3/9] = -(6/9)*l(6/9) - (3/9)*l(3/9) = .637
• avg = (5/14)*.5 + (9/14)*.637 = .588

Split after 3:
• avg(entropy[8/9, 1/9], entropy[2/5, 3/5])
• entropy[8/9, 1/9] = -(8/9)*l(8/9) - (1/9)*l(1/9) = .349
• entropy[2/5, 3/5] = -(2/5)*l(2/5) - (3/5)*l(3/5) = .673
• avg = (9/14)*.349 + (5/14)*.673 = .465

Split after 3.5:
• avg(entropy[8/12, 4/12], entropy[2/2, 0/2])
• entropy[8/12, 4/12] = -(8/12)*l(8/12) - (4/12)*l(4/12) = .637
• entropy[2/2, 0/2] = -(2/2)*l(2/2) - (0/2)*l(0/2) = 0
• avg = (12/14)*.637 + (2/14)*0 = .546

  7. Discretizing “wage-increase-first”

No split:
• entropy[10/14, 4/14] = -(10/14)*l(10/14) - (4/14)*l(4/14) = .598

Since .465 is the smallest of the four values, we need to split “after 3.”
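
The second-level numbers can be reproduced the same way. A continuation of the earlier sketch, assuming the entropy and split_avg helpers defined above are in scope:

```python
# Splits inside the yellow part (14 instances, a 10/4 mixture):
print(split_avg((4, 1), (6, 3)))  # after 2   -> ~.588
print(split_avg((8, 1), (2, 3)))  # after 3   -> ~.465 (the smallest)
print(split_avg((8, 4), (2, 0)))  # after 3.5 -> ~.546
print(entropy((10, 4)))           # no split  -> ~.598
```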

  8. Discretizing “wage-increase-first”

We need to continue in the yellow part, as well as in the red part. The yellow part is easy: there is only one choice, and it makes the entropy perfect (zero). What about the red part?

  9. Discretizing “wage-increase-first”

What about the red part? Should we split it further?

  10. Discretizing “wage-increase-first”

Split after 2:
• avg(entropy[4/5, 1/5], entropy[4/4, 0/4])
• entropy[4/5, 1/5] = -(4/5)*l(4/5) - (1/5)*l(1/5) = .5
• entropy[4/4, 0/4] = 0
• avg = (5/9)*.5 + (4/9)*0 = .278

No split:
• entropy[8/9, 1/9] = -(8/9)*l(8/9) - (1/9)*l(1/9) = .349

So it is better to split.

You might get the impression that we split after every value, i.e., that there would be one discrete interval per numeric value. However, had we had 13 instances in the red part instead of 9, distributed 12 to 1, there would be no split “after 2”:
• entropy[12/13, 1/13] = -(12/13)*l(12/13) - (1/13)*l(1/13) = .271
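
Again with the same helpers from the first sketch, the red-part decision and the 13-instance remark can be checked:

```python
# Red part (9 instances): split after 2 vs. no split
print(split_avg((4, 1), (4, 0)))  # split after 2 -> ~.278
print(entropy((8, 1)))            # no split      -> ~.349, so splitting wins

# Hypothetical red part with 13 instances (12 to 1): the no-split entropy
# is already very low, and the slide notes no split "after 2" would be made.
print(entropy((12, 1)))           # -> ~.271
```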

  11. Instance-based learning: IB2

• IB2: save memory, speed up classification
• Works incrementally
• Only incorporates misclassified instances
• Problem: noisy data gets incorporated

Data: “Who buys gold jewelry”
(25,60,no) (45,60,no) (50,75,no) (50,100,no) (50,120,no) (70,110,yes) (85,140,yes) (30,260,yes) (25,400,yes) (45,350,yes) (50,275,yes) (60,260,yes)

  12. Instance-based learning: IB2

Data (in processing order):
• (25,60,no)
• (85,140,yes)
• (45,60,no)
• (30,260,yes)
• (50,75,no)
• (50,120,no)
• (70,110,yes)
• (25,400,yes)
• (50,100,no)
• (45,350,yes)
• (50,275,yes)
• (60,260,yes)

This is the final answer: we memorize only the 5 colored points. Let us, however, build the classifier step by step; a code sketch follows.
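
A minimal sketch of the procedure, assuming a 1-NN classifier with plain, unscaled Euclidean distance; the exact retained set can differ under other attribute scalings or tie-breaking rules:

```python
import math

def dist(a, b):
    """Unscaled Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def ib2(stream):
    """IB2: memorize only the instances the current model misclassifies."""
    memory = []
    for x, y, label in stream:
        if memory:
            nearest = min(memory, key=lambda m: dist((x, y), m[:2]))
            if nearest[2] == label:
                continue              # classified correctly: not memorized
        memory.append((x, y, label))  # first instance, or misclassified
    return memory

data = [(25, 60, 'no'), (85, 140, 'yes'), (45, 60, 'no'), (30, 260, 'yes'),
        (50, 75, 'no'), (50, 120, 'no'), (70, 110, 'yes'), (25, 400, 'yes'),
        (50, 100, 'no'), (45, 350, 'yes'), (50, 275, 'yes'), (60, 260, 'yes')]
print(ib2(data))
```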

  13. Instance-based learning: IB2

Data:
• (25,60,no)

The first instance is always memorized, since there is nothing yet to classify it with.

  14. Instance-based learning: IB2

Data:
• (25,60,no)
• (85,140,yes)

Since so far the model has only the first instance memorized, this second instance gets wrongly classified. So we memorize it as well.

  15. Instance-based learning: IB2

Data:
• (25,60,no)
• (85,140,yes)
• (45,60,no)

So far the model has the first two instances memorized. The third instance gets properly classified, since it is closer to the first. So we don’t memorize it.

  16. Instance-based learning: IB2

Data:
• (25,60,no)
• (85,140,yes)
• (45,60,no)
• (30,260,yes)

So far the model has the first two instances memorized. The fourth instance gets properly classified, since it is closer to the second. So we don’t memorize it.

  17. Instance-based learning: IB2

Data:
• (25,60,no)
• (85,140,yes)
• (45,60,no)
• (30,260,yes)
• (50,75,no)

So far the model has the first two instances memorized. The fifth instance gets properly classified, since it is closer to the first. So we don’t memorize it.

  18. Instance-based learning: IB2

Data:
• (25,60,no)
• (85,140,yes)
• (45,60,no)
• (30,260,yes)
• (50,75,no)
• (50,120,no)

So far the model has the first two instances memorized. The sixth instance gets wrongly classified, since it is closer to the second. So we memorize it. (The distance checks are sketched below.)
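
These “closer to” claims can be checked numerically; a small verification sketch, reusing dist from the IB2 snippet above:

```python
# Model after steps 13-14: the first two instances are memorized.
model = [(25, 60, 'no'), (85, 140, 'yes')]

# Steps 15-18: classify instances 3-6 against that fixed model.
for x, y, true in [(45, 60, 'no'), (30, 260, 'yes'),
                   (50, 75, 'no'), (50, 120, 'no')]:
    nearest = min(model, key=lambda m: dist((x, y), m[:2]))
    verdict = 'correct' if nearest[2] == true else 'misclassified -> memorize'
    print((x, y, true), 'nearest:', nearest, verdict)
```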

  19. Instance-based learning: IB2

Continuing in the same way, we finally get the figure on the right. The colored points are the ones that get memorized. This is the final answer: we memorize only these 5 points.

  20. Instance-based learning: IB3

• IB3: deal with noise
• Discard instances that don’t perform well
• Keep a record of the number of correct and incorrect classification decisions that each exemplar makes.
• Two predetermined thresholds are set on the success ratio.
• An instance is used for training:
• if the number of incorrect classifications is ≤ the first (lower) threshold, and
• if the number of correct classifications is ≥ the second (upper) threshold.

  21. Instance-based learning: IB3

Suppose the lower threshold is 0 and the upper threshold is 1. Shuffle the data first:
• (25,60,no)
• (85,140,yes)
• (45,60,no)
• (30,260,yes)
• (50,75,no)
• (50,120,no)
• (70,110,yes)
• (25,400,yes)
• (50,100,no)
• (45,350,yes)
• (50,275,yes)
• (60,260,yes)

  22. Instance-based learning: IB3

Suppose the lower threshold is 0 and the upper threshold is 1. Each instance now carries its record of classification decisions, shown as [incorrect, correct]:
• (25,60,no) [1,1]
• (85,140,yes) [1,1]
• (45,60,no) [0,1]
• (30,260,yes) [0,2]
• (50,75,no) [0,1]
• (50,120,no) [0,1]
• (70,110,yes) [0,0]
• (25,400,yes) [0,1]
• (50,100,no) [0,0]
• (45,350,yes) [0,0]
• (50,275,yes) [0,1]
• (60,260,yes) [0,0]

  23. Instance-based learning: IB3

The points that will be used in classification are those with incorrect ≤ 0 and correct ≥ 1:
• (45,60,no) [0,1]
• (30,260,yes) [0,2]
• (50,75,no) [0,1]
• (50,120,no) [0,1]
• (25,400,yes) [0,1]
• (50,275,yes) [0,1]
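
The selection step itself is mechanical. A minimal sketch, with the record format [incorrect, correct] and the threshold semantics as read off the slides:

```python
def acceptable(record, lower=0, upper=1):
    """Keep an exemplar if incorrect <= lower threshold and correct >= upper."""
    incorrect, correct = record
    return incorrect <= lower and correct >= upper

records = {(25, 60, 'no'): (1, 1), (85, 140, 'yes'): (1, 1),
           (45, 60, 'no'): (0, 1), (30, 260, 'yes'): (0, 2),
           (50, 75, 'no'): (0, 1), (50, 120, 'no'): (0, 1),
           (70, 110, 'yes'): (0, 0), (25, 400, 'yes'): (0, 1),
           (50, 100, 'no'): (0, 0), (45, 350, 'yes'): (0, 0),
           (50, 275, 'yes'): (0, 1), (60, 260, 'yes'): (0, 0)}

kept = [p for p, r in records.items() if acceptable(r)]
print(kept)  # matches the six points listed above
```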

  24. Rectangular generalizations

• When a new exemplar is classified correctly, it is generalized by simply merging it with the nearest exemplar.
• The nearest exemplar may be either a single instance or a hyper-rectangle.
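
A sketch of the merge step for the two-attribute case, assuming axis-parallel rectangles stored as (xmin, ymin, xmax, ymax); merging simply takes the bounding box:

```python
def merge(rect, other):
    """Bounding box of an axis-parallel rectangle and another rectangle.

    A single instance (x, y) is represented as the degenerate
    rectangle (x, y, x, y).
    """
    xmin, ymin, xmax, ymax = rect
    oxmin, oymin, oxmax, oymax = other
    return (min(xmin, oxmin), min(ymin, oymin),
            max(xmax, oxmax), max(ymax, oymax))

# Example: merge the instance (45, 60) into the exemplar (25, 60):
print(merge((25, 60, 25, 60), (45, 60, 45, 60)))  # -> (25, 60, 45, 60)
```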

  25. Rectangular generalizations

Data:
• (25,60,no)
• (85,140,yes)
• (45,60,no)
• (30,260,yes)
• (50,75,no)
• (50,120,no)
• (70,110,yes)
• (25,400,yes)
• (50,100,no)
• (45,350,yes)
• (50,275,yes)
• (60,260,yes)

  26. Classification

• If the new instance lies within a rectangle, output that rectangle’s class.
• If the new instance lies in the overlap of several rectangles, output the class of the rectangle whose center is closest to the new instance.
• If the new instance lies outside all rectangles, output the class of the rectangle closest to the instance.

The distance of a point from a rectangle is:
• d = 0 if the instance lies within the rectangle;
• otherwise, d = the distance from the closest part of the rectangle, i.e. from some point on the rectangle boundary.

[Figure: two classes of exemplar rectangles and the separation line between them.]
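
A sketch of that point-to-rectangle distance, again assuming axis-parallel rectangles stored as (xmin, ymin, xmax, ymax): clamping each coordinate to the rectangle’s extent yields the closest point of the rectangle, and gives d = 0 for points inside.

```python
import math

def rect_distance(point, rect):
    """Distance from a 2-D point to an axis-parallel rectangle."""
    x, y = point
    xmin, ymin, xmax, ymax = rect
    cx = min(max(x, xmin), xmax)  # closest x-coordinate inside the rectangle
    cy = min(max(y, ymin), ymax)  # closest y-coordinate inside the rectangle
    return math.hypot(x - cx, y - cy)

print(rect_distance((30, 70), (25, 60, 45, 60)))  # outside -> 10.0
print(rect_distance((30, 60), (25, 60, 45, 60)))  # inside  -> 0.0
```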

  27. Aggregated data

  28. Exercise

Assuming that salary is the class attribute:
• Build a decision tree using ID3
• Build a Naïve Bayes model
