
Machine Learning in Practice Lecture 7


Presentation Transcript


  1. Machine Learning in Practice Lecture 7 Carolyn Penstein Rosé Language Technologies Institute/ Human-Computer Interaction Institute

  2. Plan for the Day • Announcements • No new homework this week • No quiz this week • Project Proposal Due by the end of the week • Naïve Bayes Review • Linear Model Review • Tic Tac Toe across models • Weka Helpful Hints

  3. Project proposals • If you are using one of the prefabricated projects on blackboard, let me know which one • Otherwise, tell me what data you are using • Number of instances • What you’re predicting • What features you are working with • 2 sentence description of what your ideas are for improving performance • If convenient, let me know what the baseline performance is

  4. Example of ideas: How could you expand on what’s here?

  5. Example of ideas: How could you expand on what’s here? Add features that describe the source

  6. Example of ideas: How could you expand on what’s here? Add features that describe things that were going on during the time when the poll was taken

  7. Example of ideas: How could you expand on what’s here? Add features that describe personal characteristics of the candidates

  8. Getting the Baseline Performance • Percent correct • Percent correct, controlling for correct by chance • Performance on individual categories • Confusion matrix * Right click in the Result list and select Save Result Buffer to save performance stats.

  9. Clarification about Cohen’s Kappa Assume 2 coders were assigning instances to category A or category B, and you want to measure their agreement.

                   Coder 2: A   Coder 2: B   Total
     Coder 1: A         5            2          7
     Coder 1: B         1            8          9
     Total              6           10         16

     Total agreements = 5 + 8 = 13
     Percent agreement = 13/16 = .81
     Agreement by chance = Σ_i (Row_i * Col_i) / OverallTotal = 7*6/16 + 9*10/16 = 2.63 + 5.63 ≈ 8.3
     Kappa = (Total Agreement - Agreement by Chance) / (Overall Total - Agreement by Chance) = (13 - 8.3) / (16 - 8.3) = 4.7 / 7.7 = .61
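As a quick check of the arithmetic, here is a minimal Python sketch of the same calculation. The function name and table layout are my own; the counts are the ones from the slide.

```python
# Cohen's kappa from a 2x2 agreement table (counts from the slide above).
def cohens_kappa(table):
    """table[i][j] = number of instances Coder 1 labeled i and Coder 2 labeled j."""
    total = sum(sum(row) for row in table)
    agreements = sum(table[i][i] for i in range(len(table)))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Expected agreement by chance, expressed in counts rather than proportions.
    chance = sum(r * c for r, c in zip(row_totals, col_totals)) / total
    return (agreements - chance) / (total - chance)

table = [[5, 2],   # Coder 1 said A: Coder 2 said A 5 times, B 2 times
         [1, 8]]   # Coder 1 said B: Coder 2 said A 1 time,  B 8 times
print(round(cohens_kappa(table), 2))  # 0.61, matching the slide
```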

  10. Naïve Bayes Review

  11. Naïve Bayes Simulation You can modify the Class counts and Counts for each attribute value within each class. You can also turn smoothing on or off. Finally, you can manipulate the attribute values for the instance you want to classify with your model.

  12. Naïve Bayes Simulation

  13. Naïve Bayes Simulation
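The simulation itself lives on the slides, but the arithmetic it animates is simple enough to sketch in Python. Everything below is illustrative: the class and attribute counts are made up, and "smoothing" is assumed to mean add-one (Laplace) smoothing.

```python
# Naive Bayes by hand from raw counts, with optional add-one (Laplace) smoothing.
# All counts below are made up for illustration; in the simulation you would
# edit the class counts and the per-attribute counts directly.

class_counts = {"yes": 9, "no": 5}
# attribute_counts[attribute][class][value] = count of that value within that class
attribute_counts = {
    "outlook": {"yes": {"sunny": 2, "overcast": 4, "rainy": 3},
                "no":  {"sunny": 3, "overcast": 0, "rainy": 2}},
    "windy":   {"yes": {"true": 3, "false": 6},
                "no":  {"true": 3, "false": 2}},
}

def classify(instance, smoothing=True):
    total = sum(class_counts.values())
    scores = {}
    for cls, cls_count in class_counts.items():
        score = cls_count / total                     # prior probability
        for attribute, value in instance.items():
            counts = attribute_counts[attribute][cls]
            k = 1 if smoothing else 0                 # add-one smoothing constant
            score *= (counts.get(value, 0) + k) / (cls_count + k * len(counts))
        scores[cls] = score
    return max(scores, key=scores.get), scores

print(classify({"outlook": "overcast", "windy": "true"}))
print(classify({"outlook": "overcast", "windy": "true"}, smoothing=False))
```

With smoothing off, the zero count for overcast in the "no" class drives that class's score to exactly zero, which is the behavior the simulation lets you explore.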

  14. Linear Model Review

  15. What do linear models do? • Notice that what you want to predict is a number • You use the number to order instances • You want to learn a function that produces the same ordering • Linear models literally add up evidence: Result = 2*A - B - 3*C • Actual values fall between -4 and 2 rather than between 1 and 5, but the order is the same • Order affects correlation; the actual value affects absolute error

  16. What do linear models do? • If what you want to predict is a category, you can assign values to ranges • Sort instances based on predicted value • Cut based on a threshold • i.e., Val1 where f(x) < 0, Val2 otherwise, as in the sketch below • Result = 2*A - B - 3*C • Actual values fall between -4 and 2 rather than between 1 and 5, but the order is the same
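A tiny sketch of both uses of the learned function, using the weights 2, -1, and -3 from the slide; the instances and the threshold of 0 are just for illustration.

```python
# Using the linear function Result = 2*A - B - 3*C from the slide, both to rank
# instances and to classify them by thresholding at 0. The instances are made up.
def f(a, b, c):
    return 2 * a - b - 3 * c

instances = [(1, 0, 0), (0, 1, 0), (1, 1, 1), (0, 0, 1)]

# Ranking: sort instances by predicted value (highest first).
ranked = sorted(instances, key=lambda x: f(*x), reverse=True)

# Classification: cut the ranking at a threshold, here 0.
labels = ["Val1" if f(*x) < 0 else "Val2" for x in instances]

print(ranked)   # [(1, 0, 0), (0, 1, 0), (1, 1, 1), (0, 0, 1)]
print(labels)   # ['Val2', 'Val1', 'Val1', 'Val1']
```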

  17. What do linear models do? • f(X) = C0 + C1X1 + C2X2 + C3X3 • X1 through Xn are our attributes • C0 through Cn are coefficients (C0 is the intercept) • We’re learning the coefficients, which are weights • Think of linear models as imposing a ranking on instances • Features associated with one class get negative weights • Features associated with the other class get positive weights

  18. More on Linear Regression • Linear regression tries to minimize the sum of the squares of the differences between predicted values and actual values over all training instances • Sum over all instances [ Square(predicted value of instance - actual value of instance) ] • Note that this is different from back propagation for neural nets, which minimizes the error at the output nodes considering only one training instance at a time • What is learned is a set of weights (not probabilities!)
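A minimal numpy sketch of that objective, with made-up data; np.linalg.lstsq finds the weights that minimize exactly this sum of squared differences over all training instances at once.

```python
import numpy as np

# Made-up training data: three numeric attributes per instance plus a numeric target.
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0]])
y = np.array([3.0, -1.0, 4.0, 0.0])

# Add a constant column so the intercept C0 is learned along with the other weights.
X1 = np.column_stack([np.ones(len(X)), X])

# Least squares finds the coefficients minimizing
#   sum over all training instances of (predicted - actual)^2
coeffs, residuals, rank, _ = np.linalg.lstsq(X1, y, rcond=None)
print(coeffs)        # [C0, C1, C2, C3]
print(X1 @ coeffs)   # predicted values for the training instances
```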

  19. Limitations of Linear Regressions • Can only handle numeric attributes • What do you do with your nominal attributes? • You could turn them into numeric attributes • For example: red = 1, blue = 2, orange = 3 • But is red really less than blue? • Is red closer to blue than it is to orange? • If you treat your attributes in an unnatural way, your algorithms may make unwanted inferences about relationships between instances • Another option is to turn nominal attributes into sets of binary attributes
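A small sketch of that second option, turning one nominal attribute into a set of binary attributes so no artificial ordering is implied; the color values are the ones from the slide.

```python
# Turning a nominal attribute into one binary attribute per value, instead of
# mapping red = 1, blue = 2, orange = 3 and implying an order that isn't there.
values = ["red", "blue", "orange"]

def one_hot(color):
    return [1 if color == v else 0 for v in values]

for color in ["red", "orange", "blue"]:
    print(color, one_hot(color))
# red    [1, 0, 0]
# orange [0, 0, 1]
# blue   [0, 1, 0]
```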

  20. Performing well with skewed class distributions • Naïve Bayes has trouble with skewed class distributions because of the contribution of prior probabilities • Linear models can compensate for this • They don’t have any notion of prior probability per se • If they can find a good split on the data, they will find it wherever it is • Problem if there is not a good split

  21. Skewed but clean separation

  22. Skewed but clean separation

  23. Skewed but no clean separation

  24. Skewed but no clean separation

  25. Tic Tac Toe

  26. Tic Tac Toe • What algorithm do you think would work best? • How would you represent the feature space? • What cases do you think would be hard? [Board: O X X / X O O / X O X]

  27. Tic Tac Toe [Board: O X X / X O O / X O X]

  28. Tic Tac Toe • Decision Trees: .67 Kappa • SMO: .96 Kappa • Naïve Bayes: .28 Kappa • What do you think is different about what these algorithms are learning?
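The kappas above come from Weka. A rough scikit-learn analogue is sketched below; the file name is hypothetical (the UCI Tic-Tac-Toe Endgame data has nine nominal squares plus a positive/negative class), a linear SVC stands in for Weka's SMO, and the exact numbers will not match the slide.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import cohen_kappa_score

# Hypothetical local copy of the UCI Tic-Tac-Toe Endgame data:
# nine nominal squares (x / o / b) plus a positive/negative class column.
data = pd.read_csv("tic-tac-toe.csv")
X = OneHotEncoder().fit_transform(data.drop(columns="class"))
y = data["class"]

models = {
    "Decision tree": DecisionTreeClassifier(),
    "Linear SVM (stand-in for SMO)": SVC(kernel="linear"),
    "Naive Bayes": BernoulliNB(),
}
for name, model in models.items():
    predictions = cross_val_predict(model, X, y, cv=10)
    print(name, round(cohen_kappa_score(y, predictions), 2))
```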

  29. Decision Trees

  30. Naïve Bayes • Each conditional probability is based on each square in isolation • Can you guess which square is most informative?
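One way to answer that question empirically is to score each square by how much information it carries about the outcome on its own. The sketch below assumes the same hypothetical tic-tac-toe.csv as above and uses mutual information as a rough stand-in for what Naïve Bayes picks up from each square; as the take-home slide notes, the center square is the one Naïve Bayes finds important.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.feature_selection import mutual_info_classif

# Same hypothetical local copy of the UCI Tic-Tac-Toe Endgame data as above.
data = pd.read_csv("tic-tac-toe.csv")
squares = data.drop(columns="class")

# Encode each square's x/o/b values as integers, then score each square by the
# mutual information it shares with the win/lose class, one square at a time.
X = OrdinalEncoder().fit_transform(squares)
scores = mutual_info_classif(X, data["class"], discrete_features=True)

for name, score in sorted(zip(squares.columns, scores), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```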

  31. Linear Function • Counts every X as evidence of winning • If there are more X’s, then it’s a win for X • Usually right, except in the case of a tie [Tie board: O X X / X O O / X O X]

  32. Take Home Message • Naïve Bayes is affected by prior probabilities in two places • Note that prior probabilities have an indirect effect on all conditional probabilities • Linear functions are not directly affected by prior probabilities • So sometimes they can perform better on skewed data sets • Even with the same data representation, different algorithms learn something different • Naïve Bayes learned that the center square is important • Decision trees memorized important trees • Linear function counted Xs

  33. Weka Helpful Hints

  34. Use the visualize tab to view 3-way interactions

  35. Use the visualize tab to view 3-way interactions: click in one of the boxes to zoom in

  36. Use the visualize tab to view 3-way interactions
