
Empirical Research Methods in Computer Science

Lecture 7, November 30, 2005. Noah Smith.

Presentation Transcript


  1. Empirical Research Methods in Computer Science. Lecture 7, November 30, 2005. Noah Smith.

  2. Using Data • Data → Model (estimation; regression; learning; training) • Model → Action (classification; decision) • Names for this pipeline: pattern classification, machine learning, statistical inference, ...

  3. Probabilistic Models • Let X and Y be random variables. (continuous, discrete, structured, ...) • Goal: predict Y from X. • A model defines P(Y = y | X = x). • Where do models come from? • If we have a model, how do we use it?

  4. Using a Model • We want to classify a message, x, as spam or mail: y ∈ {spam, mail}. • The model takes x as input and outputs P(spam | x) and P(mail | x).
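
Given the two posteriors, a natural decision rule is to output the label with the larger P(y | x). A minimal sketch, assuming the posteriors are already available in a dictionary; the function name `classify` and the numbers are illustrative, not from the lecture:

```python
def classify(posterior: dict[str, float]) -> str:
    """Return the label y with the highest posterior P(y | x)."""
    # Take the max over the dictionary keys, ranked by their posterior probability
    return max(posterior, key=posterior.get)


# Hypothetical posteriors for one message x
print(classify({"spam": 0.8, "mail": 0.2}))  # spam
```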

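  5. Bayes’ Rule • $P(y \mid x) = \frac{P(x \mid y)\,P(y)}{\sum_{y'} P(x \mid y')\,P(y')}$ • P(y | x) is what we said the model must define. • P(x | y) is the likelihood: one distribution over complex observations per y. • P(y) is the prior. • The denominator normalizes the product into a distribution over y.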
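
Bayes' rule can be applied numerically: multiply likelihood by prior for each label, then divide by the sum. A sketch assuming the likelihoods P(x | y) are given; the likelihood values are made up, and the prior reuses the P(mail) = 0.545, P(spam) = 0.455 example from slide 11:

```python
def posterior(likelihood: dict[str, float], prior: dict[str, float]) -> dict[str, float]:
    """Bayes' rule: P(y | x) = P(x | y) P(y) / sum over y' of P(x | y') P(y')."""
    joint = {y: likelihood[y] * prior[y] for y in prior}  # numerator for each label y
    z = sum(joint.values())                               # normalizer, equal to P(x)
    return {y: p / z for y, p in joint.items()}


# Hypothetical likelihoods P(x | y) for one message x
post = posterior({"spam": 0.02, "mail": 0.005}, {"spam": 0.455, "mail": 0.545})
print(post)
```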

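  6. Naive Bayes Models • Suppose X = (X1, X2, X3, ..., Xm). • Let $P(X = x \mid Y = y) = \prod_{j=1}^{m} P(X_j = x_j \mid Y = y)$, i.e., the Xj are conditionally independent given Y.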
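
With this factorization, scoring a label means adding log P(y) to the per-feature log-likelihoods; working in logs avoids underflow when m is large. Because P(x) is the same for every y, the label with the highest joint score is also the one with the highest posterior. A sketch in which the nested-dictionary layout of `cond` and all feature probabilities are hypothetical:

```python
import math


def naive_bayes_log_joint(x, y, prior, cond):
    """Return log P(y) + sum over j of log P(X_j = x_j | y)."""
    score = math.log(prior[y])
    for j, xj in enumerate(x):
        # cond[y][j] is a hypothetical table mapping each value of X_j to P(X_j = value | y)
        score += math.log(cond[y][j][xj])
    return score


prior = {"spam": 0.455, "mail": 0.545}  # example prior from slide 11
cond = {  # made-up tables for m = 2 binary features
    "spam": [{"yes": 0.7, "no": 0.3}, {"yes": 0.4, "no": 0.6}],
    "mail": [{"yes": 0.1, "no": 0.9}, {"yes": 0.5, "no": 0.5}],
}
x = ["yes", "no"]
best = max(prior, key=lambda y: naive_bayes_log_joint(x, y, prior, cond))
```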

  7. Naive Bayes: Graphical Model [Figure: class node Y connected to each observed node X1, X2, X3, ..., Xm]

  8. Part II: Where do the model parameters come from?

  9. Using Data • Data → Model (estimation; regression; learning; training) • Model → Action

  10. Warning • This is a HUGE topic. • We will barely scratch the surface.

  11. Forms of Models • Recall that a model defines P(x | y) and P(y). • These can have a simple multinomial form, like P(mail) = 0.545, P(spam) = 0.455 • Or they can take on some other form, like a binomial, Gaussian, etc.

  12. Example: Gaussian • Suppose y ∈ {male, female}, and one observed variable is H, height. • $H \mid \text{male} \sim \mathcal{N}(\mu_m, \sigma_m^2)$ • $H \mid \text{female} \sim \mathcal{N}(\mu_f, \sigma_f^2)$ • How do we estimate $\mu_m, \sigma_m^2, \mu_f, \sigma_f^2$?

  13. Maximum Likelihood • Pick the model that makes the data as likely as possible: $\arg\max_{\text{model}} P(\text{data} \mid \text{model})$

  14. Maximum Likelihood (Gaussian) • Estimating the parameters $\mu_m, \sigma_m^2, \mu_f, \sigma_f^2$ can be seen as • fitting the data • estimating an underlying statistic (point estimate)
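
For a Gaussian, maximizing the likelihood has a closed form: the sample mean, and the average squared deviation from it (dividing by n, not n − 1). A sketch of that standard result for one class; the height values are made up:

```python
def gaussian_mle(samples: list[float]) -> tuple[float, float]:
    """Maximum-likelihood estimates (mu, sigma^2) for a univariate Gaussian."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((h - mu) ** 2 for h in samples) / n  # MLE divides by n, not n - 1
    return mu, var


heights_m = [175.0, 180.0, 185.0]  # made-up male heights in cm
mu_m, var_m = gaussian_mle(heights_m)
```

The same function applied to the female heights gives $\mu_f$ and $\sigma_f^2$.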

  15. Using the model

  16. Using the model

  17. Example: Regression • Suppose y is actual runtime, and x is input length. • Regression tries to predict some continuous variables from others.

  18. Regression • Linear: assume linear relationship, fit a line. • We can turn this into a model!

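  19. Linear Model • Given x, predict y: $y = \beta_1 x + \beta_0 + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma^2)$ is the random deviation from the true regression line $\beta_1 x + \beta_0$.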

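  20. Principle of Least Squares • Minimize the sum of squared vertical deviations: $\sum_{i=1}^{n} (y_i - \beta_1 x_i - \beta_0)^2$ • Unique, closed-form solution (provided the $x_i$ are not all equal): $\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$, $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$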
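
The closed-form least-squares solution takes only a few lines. A sketch using the standard formulas for simple linear regression; the data points are synthetic and lie exactly on y = 2x + 1:

```python
def least_squares(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (beta0, beta1) minimizing the sum of squared vertical deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)  # zero if all x are equal (no unique fit)
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


beta0, beta1 = least_squares([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```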

  21. Other kinds of regression • transform one or both variables (e.g., take a log) • polynomial regression (least squares still reduces to solving a linear system) • multivariate regression • logistic regression
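
As one example of transforming both variables, suppose (as an assumption, not a claim from the lecture) that runtime grows as a power of input length, y ≈ c·x^k. Taking logs gives log y = k log x + log c, which is linear, so ordinary least squares applies. Note that this minimizes squared error in log-runtime, not in runtime. A sketch with synthetic, exactly quadratic runtimes:

```python
import math


def fit_power_law(ns: list[float], times: list[float]) -> tuple[float, float]:
    """Fit time ~ c * n**k by least squares on (log n, log time); returns (c, k)."""
    lx = [math.log(n) for n in ns]
    ly = [math.log(t) for t in times]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    k = sum((a - mx) * (b - my) for a, b in zip(lx, ly)) / sum((a - mx) ** 2 for a in lx)
    c = math.exp(my - k * mx)  # intercept in log space is log c
    return c, k


ns = [10, 20, 40, 80]
times = [0.5 * n ** 2 for n in ns]  # synthetic runtimes: exactly 0.5 * n^2
c, k = fit_power_law(ns, times)
```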

  22. Example: text categorization • Bag-of-words model: • x is a histogram of counts for all words • y is a topic
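
A bag-of-words histogram is just a word-count table that ignores word order. A sketch using `collections.Counter`; lowercasing and splitting on whitespace are simplifying assumptions, since real tokenization is usually more careful:

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Histogram of word counts (assumed tokenization: lowercase, split on whitespace)."""
    return Counter(text.lower().split())


x = bag_of_words("Buy now buy cheap")
```

Looking up a word that never occurred returns 0 rather than raising an error, which is convenient when x must cover "all words".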

  23. MLE for Multinomials • “Count and Normalize”
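
"Count and normalize" means P(w | y) = count(w, y) / (total number of word tokens seen with topic y). A sketch assuming each training document is a (topic, list of words) pair; the function name and the tiny corpus are hypothetical:

```python
from collections import Counter, defaultdict


def mle_word_given_topic(docs):
    """Count-and-normalize estimate of P(w | topic) from (topic, words) pairs."""
    counts = defaultdict(Counter)
    for topic, words in docs:
        counts[topic].update(words)  # accumulate word counts per topic
    probs = {}
    for topic, c in counts.items():
        total = sum(c.values())
        probs[topic] = {w: n / total for w, n in c.items()}
    return probs


docs = [("sports", ["goal", "team", "goal"]), ("politics", ["vote", "team"])]
p = mle_word_given_topic(docs)
```

Any word absent from a topic's training documents, such as "vote" for "sports" here, gets no probability mass at all.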

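  24. The Truth about MLE • You will never see all the words. • For many models, MLE isn’t safe; a count-and-normalize estimate, for example, gives probability zero to any word never seen in training. • To understand why this matters, consider a typical evaluation scenario.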

  25. Evaluation • Train your model on some data. • How good is the model? • Test on different data that the system never saw before. • Why?
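
Holding out test data can be as simple as shuffling once and splitting. A sketch with a fixed seed so the split is reproducible; the function name and the 20% test fraction are illustrative choices:

```python
import random


def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle once, then hold out the first test_fraction of items as test data."""
    items = list(examples)
    random.Random(seed).shuffle(items)  # fixed seed makes the split reproducible
    n_test = int(len(items) * test_fraction)
    return items[n_test:], items[:n_test]  # (train, test); the two never overlap


train, test = train_test_split(range(100), test_fraction=0.2)
```

The model is then estimated from `train` only, and accuracy is measured on `test`.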

  26. Tradeoff [Figure labels: low variance; low accuracy; overfits the training data; doesn’t generalize]

  27. Text categorization again • Suppose ‘v1@gra’ never appeared in any document in training, ever. • Under MLE, what is P(‘v1@gra’ | y) for each topic y, and what probability does the model then give a new document containing ‘v1@gra’ at test time?

  28. Solutions • Regularization • Prefer less extreme parameters • Smoothing • “Flatten out” the distribution • Bayesian Estimation • Construct a prior over model parameters, then train to maximize P(data | model) × P(model)
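
One simple smoothing method (chosen here for illustration; it is only one of many) is additive smoothing, called add-one or Laplace smoothing when alpha = 1: P(w | y) = (count(w, y) + alpha) / (N + alpha·V), where N is the topic's token count and V the vocabulary size. A sketch with made-up counts and an assumed vocabulary of five words that includes the unseen ‘v1@gra’:

```python
from collections import Counter


def smoothed_prob(word: str, counts: Counter, vocab_size: int, alpha: float = 1.0) -> float:
    """Additive smoothing: (count(w) + alpha) / (N + alpha * V)."""
    total = sum(counts.values())
    return (counts[word] + alpha) / (total + alpha * vocab_size)


spam_counts = Counter({"cheap": 3, "buy": 1})  # made-up training counts for one topic
vocab = ["cheap", "buy", "v1@gra", "hello", "meeting"]  # assumed vocabulary, V = 5
mle = spam_counts["v1@gra"] / sum(spam_counts.values())  # 0.0 under MLE
smoothed = smoothed_prob("v1@gra", spam_counts, len(vocab))  # (0 + 1) / (4 + 5)
```

The smoothed estimates still sum to 1 over the vocabulary, but no word gets probability zero.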

  29. One More Point • Building models is not the only way to be empirical. • Neural networks, SVMs, instance-based learning • MLE and smoothed/Bayesian estimation are not the only ways to estimate. • Minimize error, for example (“discriminative” estimation)

  30. Assignment 3 • Spam detection • We provide a few thousand examples • Perform exploratory data analysis (EDA) and pick features • Estimate probabilities • Build a Naive Bayes classifier
