
Machine Learning in Practice Lecture 8


Presentation Transcript


  1. Machine Learning in Practice, Lecture 8 • Carolyn Penstein Rosé • Language Technologies Institute / Human-Computer Interaction Institute

  2. Plan for the Day • Announcements • Should be finalizing plans for term project • Weka helpful hints • Spam Dataset • Overcoming some limits of Linear Functions • Discussing ordinal attributes in light of linear functions

  3. Weka Helpful Hints

  4. Feature Selection * Click here to start setting up feature selection • Feature selection algorithms pick out a subset of the features that work best • Usually they evaluate each feature in isolation

  5. Feature Selection * Now click here • Feature selection algorithms pick out a subset of the features that work best • Usually they evaluate each feature in isolation

  6. Feature Selection * Now click here.

  7. Feature Selection

  8. Feature Selection * Now pick your base classifier just like before

  9. Feature Selection * Finally you will configure the feature selection

  10. Setting Up Feature Selection * First click here

  11. Setting Up Feature Selection * Select ChiSquaredAttributeEval

  12. Setting Up Feature Selection * Now click here

  13. Setting Up Feature Selection * Select Ranker

  14. Setting Up Feature Selection * Now click here

  15. Setting Up Feature Selection * Set the number of features you want

  16. Setting Up Feature Selection • The number you pick should not be larger than the number of features available • The number should not be larger than the number of coded examples you have

  17. Examining Which Features are Most Predictive • You can find a ranked list of features in the Performance Report if you use feature selection * Predictiveness score * Frequency
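
  The same pipeline can be set up programmatically. Here is a minimal sketch against the Weka Java API that mirrors the GUI steps above: chi-squared attribute evaluation, a Ranker search, and a base classifier wrapped in AttributeSelectedClassifier so that selection happens inside each cross-validation fold. The ARFF filename, the Naive Bayes base classifier, and the number of features to keep are placeholder choices; note also that in recent Weka releases ChiSquaredAttributeEval is distributed as an add-on package rather than in the core jar.

    import java.util.Random;

    import weka.attributeSelection.ChiSquaredAttributeEval;
    import weka.attributeSelection.Ranker;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.meta.AttributeSelectedClassifier;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class FeatureSelectionDemo {
        public static void main(String[] args) throws Exception {
            // Load the data; "spam.arff" is a placeholder filename
            Instances data = DataSource.read("spam.arff");
            data.setClassIndex(data.numAttributes() - 1);

            // Evaluator: score each attribute in isolation with chi-squared
            ChiSquaredAttributeEval evaluator = new ChiSquaredAttributeEval();

            // Search: rank all attributes and keep the top N
            Ranker ranker = new Ranker();
            ranker.setNumToSelect(50); // must not exceed #features or #examples

            // Wrap the base classifier so selection is redone inside each fold
            AttributeSelectedClassifier asc = new AttributeSelectedClassifier();
            asc.setEvaluator(evaluator);
            asc.setSearch(ranker);
            asc.setClassifier(new NaiveBayes());

            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(asc, data, 10, new Random(1));
            System.out.println(eval.toSummaryString());
        }
    }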

  18. Spam Data Set

  19. Spam Data Set • Word frequencies • Runs of $, !, Capitalization • All numeric • Spam versus NotSpam * Which algorithm will work best?

  20. Spam Data Set • Decision Trees (.85 Kappa) • SMO (linear function) (.79 Kappa) • Naïve Bayes (.6 Kappa)
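
  A sketch of how that comparison can be reproduced with the Weka API; the filename is a placeholder, and exact kappa values depend on the data and the evaluation setup:

    import java.util.Random;

    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.functions.SMO;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class SpamComparison {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("spam.arff"); // placeholder path
            data.setClassIndex(data.numAttributes() - 1);

            // J48 decision trees, SMO (linear kernel by default), Naive Bayes
            Classifier[] learners = { new J48(), new SMO(), new NaiveBayes() };
            for (Classifier learner : learners) {
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(learner, data, 10, new Random(1));
                System.out.printf("%s kappa = %.2f%n",
                        learner.getClass().getSimpleName(), eval.kappa());
            }
        }
    }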

  21. What did SMO learn?

  22. Decision tree model

  23. More on Linear Functions… exploring the idea of nonlinearity

  24. Limits of linear functions

  25. Numeric Prediction with the CPU Data • Predicting CPU performance from computer configuration • Both the attributes and the output are numeric

  26. Numeric Prediction with the CPU Data • Could discretize the output and predict good performance, mediocre performance, or bad performance • Numeric prediction allows you to make arbitrarily many distinctions
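
  If you did want the discretized version, Weka's unsupervised Discretize filter can bin the numeric output before you declare it the class attribute. A minimal sketch, assuming three equal-width bins and the cpu.arff file that ships in Weka's data directory:

    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;
    import weka.filters.Filter;
    import weka.filters.unsupervised.attribute.Discretize;

    public class DiscretizeOutput {
        public static void main(String[] args) throws Exception {
            // Load without setting a class index so the filter may bin it
            Instances data = DataSource.read("cpu.arff");

            // Bin the numeric performance attribute (the last one) into
            // three ranges, e.g. bad / mediocre / good performance
            Discretize disc = new Discretize();
            disc.setAttributeIndices("last");
            disc.setBins(3);
            disc.setInputFormat(data);
            Instances binned = Filter.useFilter(data, disc);
            binned.setClassIndex(binned.numAttributes() - 1);

            // Print the new nominal class attribute and its bin labels
            System.out.println(binned.attribute(binned.classIndex()));
        }
    }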

  27. Linear Regression: R-squared = .87

  28. Outliers * Notice that here it's the really high values that fit the line least well; that's not always the case.

  29. The two most highly weighted features

  30. Exploring the Attribute Space * Identify outliers with respect to typical attribute values.

  31. The two most highly weighted features * Within 1 standard deviation of the mean value
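
  The same check can be scripted: Weka's AttributeStats exposes each numeric attribute's mean and standard deviation, which you can use to flag values far from typical. A minimal sketch, where the filename and the one-standard-deviation threshold are assumptions:

    import weka.core.AttributeStats;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class OutlierScan {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("cpu.arff"); // placeholder path

            for (int a = 0; a < data.numAttributes(); a++) {
                if (!data.attribute(a).isNumeric()) continue;
                AttributeStats stats = data.attributeStats(a);
                double mean = stats.numericStats.mean;
                double sd = stats.numericStats.stdDev;

                // Count values more than one standard deviation from the mean
                int outliers = 0;
                for (int i = 0; i < data.numInstances(); i++) {
                    if (Math.abs(data.instance(i).value(a) - mean) > sd) {
                        outliers++;
                    }
                }
                System.out.printf("%s: mean=%.2f sd=%.2f outliers=%d%n",
                        data.attribute(a).name(), mean, sd, outliers);
            }
        }
    }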

  32. Trees for Numeric Prediction • Looks like we may need a representation that allows for a nonlinear solution • Regression trees can handle a combination of numeric and nominal attributes • M5P: computes a linear regression function at each leaf node of the tree • Look at CPU performance data and compare a simple linear regression (R = .93) with M5P (R = .98)
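
  A sketch of that comparison in code, with LinearRegression and M5P each evaluated by 10-fold cross-validation; the filename is a placeholder and the exact correlation coefficients will vary with the split:

    import java.util.Random;

    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.LinearRegression;
    import weka.classifiers.trees.M5P;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class CpuRegression {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("cpu.arff"); // placeholder path
            data.setClassIndex(data.numAttributes() - 1);

            // Plain linear regression vs. M5P model trees, which fit a
            // separate linear regression at each leaf
            Classifier[] learners = { new LinearRegression(), new M5P() };
            for (Classifier learner : learners) {
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(learner, data, 10, new Random(1));
                System.out.printf("%s R = %.2f%n",
                        learner.getClass().getSimpleName(),
                        eval.correlationCoefficient());
            }
        }
    }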

  33. Results on CPU data with M5P * More data here * Biggest outliers here

  34. Results with M5P * More data here * Biggest outliers here

  35. Multi-Layer Networks can learn arbitrarily complex functions

  36. Multilayer Perceptron
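
  A minimal sketch of running Weka's MultilayerPerceptron on the same numeric-prediction task. The filename, hidden-layer spec, and epoch count are assumptions; "a" is Weka's shorthand for (attributes + classes) / 2 hidden units:

    import java.util.Random;

    import weka.classifiers.Evaluation;
    import weka.classifiers.functions.MultilayerPerceptron;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class MlpCpu {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("cpu.arff"); // placeholder path
            data.setClassIndex(data.numAttributes() - 1);

            MultilayerPerceptron mlp = new MultilayerPerceptron();
            mlp.setHiddenLayers("a");  // one hidden layer of default size
            mlp.setTrainingTime(500);  // number of training epochs

            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(mlp, data, 10, new Random(1));
            System.out.printf("correlation = %.2f%n",
                    eval.correlationCoefficient());
        }
    }

  Setting setHiddenLayers("0") removes the hidden layer entirely, which is one way to force the linear function discussed on slide 38.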

  37. Best Results So Far

  38. Forcing a Linear Function • Note that it weights the features differently than the linear regression, partly because of normalization • Regression trees split on MMAX; the neural network emphasizes MMIN

  39. Review of Ordinal Attributes

  40. Feature Space Design for Linear Functions • Often features will be numeric (continuous values) • A model may be more likely to generalize properly with discretized values • We discussed the fact that discretizing loses ordering and distance • With respect to linear functions, the more important loss may be the ability to reason in terms of ranges • Explicitly coding ranges allows for a simple form of nonlinearity

  41. Ordinal Values • Weka technically does not have ordinal attributes • But you can simulate them with “temperature coding”! • Example: split the observed values .2 .25 .28 .31 .35 .45 .47 .52 .6 .63 into four ordered ranges A, B, C, D and code each value with cumulative binary features: Feat1 = “A”, Feat2 = “A or B”, Feat3 = “A or B or C”, Feat4 = “A or B or C or D”

  42-44. Ordinal Values • Now how would you represent “X <= .35”? • The A and B ranges together cover exactly the values up to .35, so the answer is Feat2 = 1
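
  Temperature coding is easy to apply when preparing a dataset. Below is a small self-contained Java sketch; the .35 boundary comes from the slide, while the other cut points are assumptions inferred from how the values are grouped:

    import java.util.Arrays;

    public class TemperatureCoding {
        // Upper bounds of ranges A, B, and C; values above 0.52 fall in D.
        // Only the .35 boundary is given on the slide; the rest are guesses.
        static final double[] CUTS = { 0.28, 0.35, 0.52 };

        // feats[k] = 1 iff x falls in one of the first k+1 ranges, i.e.
        // Feat1 = "A", Feat2 = "A or B", Feat3 = "A or B or C",
        // Feat4 = "A or B or C or D" (always 1)
        static int[] encode(double x) {
            int[] feats = new int[CUTS.length + 1];
            for (int k = 0; k < CUTS.length; k++) {
                feats[k] = (x <= CUTS[k]) ? 1 : 0;
            }
            feats[CUTS.length] = 1;
            return feats;
        }

        public static void main(String[] args) {
            // Feat2 (index 1) is 1 exactly when X <= .35
            System.out.println(Arrays.toString(encode(0.31))); // [0, 1, 1, 1]
            System.out.println(Arrays.toString(encode(0.47))); // [0, 0, 1, 1]
        }
    }

  A linear function over these cumulative features can express a threshold rule like “X <= .35” with a single weight on Feat2, which is the simple form of nonlinearity the slides describe.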

  45. Take Home Message • Linear functions cannot learn interactions between attributes • If you need to account for interactions: • Multiple layers • Tree-like representations • Attributes that represent ranges • Later in the semester we’ll talk about other approaches
