Bayes Classifier, Linear Regression


# Bayes Classifier, Linear Regression

10701/15781 Recitation, January 29, 2008. Parts of the slides are from previous years’ recitation and lecture notes, and from Prof. Andrew Moore’s data mining tutorials.


### Bayes Classifier, Linear Regression

10701/15781 Recitation

January 29, 2008

Parts of the slides are from previous years’ recitation and lecture notes, and from Prof. Andrew Moore’s data mining tutorials.

Classification and Regression
• Classification
• Goal: Learn the underlying function

f: X (features) → Y (class, or category)

e.g. words → “spam” or “not spam”

• Regression

f: X (features) → Y (continuous values)

e.g. GPA → salary

Supervised Classification
• How to find an unknown function

f: X → Y

(features → class)

or equivalently P(Y|X)

• Classifier:
• Find P(X|Y), P(Y), and use Bayes rule - generative
• Find P(Y|X) directly - discriminative
Classification

Learn P(Y|X)

1. Bayes rule:

P(Y|X) = P(X|Y)P(Y) / P(X) ∝ P(X|Y)P(Y)

• Learn P(X|Y), P(Y)
• “Generative” classifier

2. Learn P(Y|X) directly

• “Discriminative” (to be covered later in class)
• e.g. logistic regression
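
As a rough illustration of the generative route, the sketch below applies Bayes rule to a toy problem with a single observed feature. The probabilities and the names `prior`, `likelihood`, and `posterior` are invented for this example and do not come from the recitation.

```python
# Toy generative classifier: P(Y|X) is obtained from P(X|Y) and P(Y) via Bayes rule.
# All numbers below are made up for illustration.

prior = {"spam": 0.4, "not spam": 0.6}   # P(Y)
likelihood = {                           # P(X = "Cash" present | Y)
    "spam": 0.5,
    "not spam": 0.1,
}

def posterior(prior, likelihood):
    """Return P(Y | X) for the observed X, normalizing by P(X)."""
    joint = {y: likelihood[y] * prior[y] for y in prior}   # P(X|Y) P(Y)
    evidence = sum(joint.values())                         # P(X)
    return {y: joint[y] / evidence for y in joint}

post = posterior(prior, likelihood)
prediction = max(post, key=post.get)     # class with the largest posterior
print(post, prediction)
```

Because P(X) is the same for every class, the prediction would not change if the division by `evidence` were skipped; it is only needed to report proper probabilities.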
Generative Classifier: Bayes Classifier

Learn P(X|Y), P(Y)

• e.g. email classification problem
• 3 classes for Y = { spam, not spam, maybe }
• 10,000 binary features for X = {“Cash”, “Rolex”,…}
• How many parameters do we have?
• P(Y) :
• P(X|Y) :
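
One way to work out the count, assuming an unrestricted joint table for P(X|Y): P(Y) over 3 classes has 3 − 1 = 2 free parameters, and each class needs a table over all 2^10,000 feature configurations, minus one because the entries sum to 1. The helper name `full_bayes_param_count` is hypothetical.

```python
def full_bayes_param_count(num_classes, num_binary_features):
    """Free parameters for P(Y) and an unrestricted P(X|Y) over binary features."""
    prior_params = num_classes - 1              # class probabilities sum to 1
    per_class = 2 ** num_binary_features - 1    # one full joint table per class
    return prior_params, num_classes * per_class

prior_params, likelihood_params = full_bayes_param_count(3, 10_000)
print(prior_params)                     # free parameters in P(Y)
print(likelihood_params.bit_length())   # size of the P(X|Y) count, in bits
```

The P(X|Y) count is far too large to estimate from any realistic data set, which is the motivation for adding independence assumptions.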
Generative learning: Naïve Bayes
• Introduce conditional independence

P(X1,X2|Y) = P(X1|Y) P(X2|Y)

P(Y|X) = P(X|Y)P(Y) / P(X) for X = (X1,…,Xn)

= P(X1|Y)…P(Xn|Y)P(Y) / P(X)

= [∏i P(Xi|Y)] P(Y) / P(X)

• Learn P(X1|Y), … P(Xn|Y), P(Y)

instead of learning P(X1,…, Xn |Y) directly
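
The factorization can be sketched in a few lines of Python. This is a minimal version for binary features that estimates each P(Xi|Y) by relative frequency (plain MLE, no smoothing); the data set and the function names `train_naive_bayes` and `predict_proba` are made up for illustration.

```python
from collections import Counter

def train_naive_bayes(X, y):
    """Estimate P(Y) and P(Xi = 1 | Y) by relative frequencies (plain MLE)."""
    n = len(y)
    class_counts = Counter(y)
    prior = {c: class_counts[c] / n for c in class_counts}
    num_features = len(X[0])
    cond = {}
    for c in class_counts:
        rows = [x for x, label in zip(X, y) if label == c]
        cond[c] = [sum(r[i] for r in rows) / len(rows) for i in range(num_features)]
    return prior, cond

def predict_proba(x, prior, cond):
    """Return P(Y | x) from prod_i P(xi | Y) * P(Y), normalized by P(x)."""
    scores = {}
    for c in prior:
        p = prior[c]
        for xi, theta in zip(x, cond[c]):
            p *= theta if xi == 1 else 1.0 - theta
        scores[c] = p
    total = sum(scores.values())   # P(x); assumed nonzero for this toy data
    return {c: s / total for c, s in scores.items()}

# Hypothetical data: features are (contains "Cash", contains "Rolex").
X = [(1, 1), (1, 0), (0, 1), (0, 0), (0, 1), (1, 0)]
y = ["spam", "spam", "spam", "not spam", "not spam", "not spam"]

prior, cond = train_naive_bayes(X, y)
probs = predict_proba((1, 1), prior, cond)
print(probs)
```

With unsmoothed estimates, a feature value never seen with a class gives that class probability zero, so practical implementations usually add some form of smoothing.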

Naïve Bayes
• 3 classes for Y = {spam, not spam, maybe}
• 10,000 binary features for X = {“Cash”, “Rolex”,…}
• Now, how many parameters?
• P(Y)
• P(X|Y)
• fewer parameters
• “simpler” – less likely to overfit
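
Under the conditional independence assumption, each (class, feature) pair needs a single Bernoulli parameter, so the count can be checked with a short calculation. The helper name `naive_bayes_param_count` is hypothetical.

```python
def naive_bayes_param_count(num_classes, num_binary_features):
    """Free parameters for P(Y) and for the factored P(Xi|Y) with binary features."""
    prior_params = num_classes - 1                          # class probabilities sum to 1
    likelihood_params = num_classes * num_binary_features   # one P(Xi = 1 | Y) per pair
    return prior_params, likelihood_params

print(naive_bayes_param_count(3, 10_000))   # → (2, 30000)
```

Compared with an unrestricted P(X1,…,Xn|Y), the count grows linearly rather than exponentially in the number of features.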
Full Bayes vs. Naïve Bayes

P(Y=1|(X1,X2)=(0,1))=?

• Full Bayes:

P(Y=1)=?

P((X1,X2)=(0,1)|Y=1)=?

• Naïve Bayes:

P(Y=1)=?

P((X1,X2)=(0,1)|Y=1)=?

• XOR
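
The slide does not list its data, so the sketch below assumes the four XOR points, each observed once, with Y = X1 XOR X2. It computes P(Y=1 | (X1,X2)=(0,1)) twice: once with the full joint P(X1,X2|Y) and once with the Naïve Bayes factorization. All function names are illustrative.

```python
from fractions import Fraction

# Assumed data: the four XOR points, each seen once, with Y = X1 XOR X2.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def p_y(y):
    """P(Y = y) by counting."""
    return Fraction(sum(1 for _, label in data if label == y), len(data))

def p_x_given_y_full(x, y):
    """P((X1, X2) = x | Y = y) from the full joint table."""
    rows = [xs for xs, label in data if label == y]
    return Fraction(sum(1 for xs in rows if xs == x), len(rows))

def p_x_given_y_naive(x, y):
    """P(X1 = x1 | Y = y) * P(X2 = x2 | Y = y), the Naive Bayes factorization."""
    rows = [xs for xs, label in data if label == y]
    p = Fraction(1)
    for i, xi in enumerate(x):
        p *= Fraction(sum(1 for xs in rows if xs[i] == xi), len(rows))
    return p

def posterior(x, y, likelihood):
    """Bayes rule: P(Y = y | X = x) under the given likelihood model."""
    evidence = sum(likelihood(x, c) * p_y(c) for c in (0, 1))
    return likelihood(x, y) * p_y(y) / evidence

x = (0, 1)
full = posterior(x, 1, p_x_given_y_full)
naive = posterior(x, 1, p_x_given_y_naive)
print(full, naive)   # prints "1 1/2"
```

On this data the full Bayes classifier is certain that Y = 1, while Naïve Bayes cannot do better than a coin flip, because each feature on its own carries no information about the XOR label.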
Regression
• Prediction of continuous variables
• e.g. I want to predict salaries from GPA.
• → I can regress that …
• Learn the mapping f: X → Y
• Model is linear in the parameters (+ some noise)

→ linear regression

• Assume Gaussian noise
• Learn MLE Θ
1-parameter linear regression
• Normal linear regression

y_i = Θ x_i + ε_i, with ε_i ~ N(0, σ²)

or equivalently,

y_i | x_i ~ N(Θ x_i, σ²)

• MLE Θ?
• MLE σ²?
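
A sketch of the answers, assuming the model y_i = Θ x_i + ε_i with independent Gaussian noise ε_i ~ N(0, σ²): maximizing the likelihood in Θ is the same as minimizing the sum of squared residuals, which gives Θ = Σ x_i y_i / Σ x_i², and the MLE of σ² is the mean squared residual. The data and the function name `fit_one_parameter` are made up for illustration.

```python
def fit_one_parameter(xs, ys):
    """MLE for y = theta * x + noise, with noise ~ N(0, sigma^2)."""
    theta = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    residuals = [y - theta * x for x, y in zip(xs, ys)]
    sigma2 = sum(r * r for r in residuals) / len(xs)   # MLE divides by n, not n - 1
    return theta, sigma2

# Hypothetical data roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
theta, sigma2 = fit_one_parameter(xs, ys)
print(theta, sigma2)
```

Note that the estimate of Θ does not depend on σ², so the two parameters can be estimated one after the other.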
Multivariate linear regression
• What if the inputs are vectors?
• Write the data as matrices X (n × k) and Y (n × 1):

(n data points, k features for each data point)

• MLE Θ = (XᵀX)⁻¹ XᵀY
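
For vector inputs, the least-squares (and, under Gaussian noise, maximum likelihood) estimate solves the normal equations (XᵀX) Θ = XᵀY. The sketch below does this in plain Python with a small Gaussian elimination routine, assuming XᵀX is invertible; in practice a linear algebra library would normally be used instead. The helper names `solve` and `fit_linear_regression` and the data are illustrative.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):                       # back substitution
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

def fit_linear_regression(X, Y):
    """Return theta solving the normal equations (X^T X) theta = X^T Y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    XtY = [sum(row[i] * y for row, y in zip(X, Y)) for i in range(k)]
    return solve(XtX, XtY)

# Hypothetical data: n = 4 points, k = 2 features, noise-free targets
# generated with theta = (2, -1) so the fit can be checked by eye.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
Y = [2 * a - b for a, b in X]
theta = fit_linear_regression(X, Y)
print(theta)
```

Solving the linear system directly avoids forming the inverse of XᵀX explicitly, which is generally the more numerically stable choice.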
Constant term?
• We may expect linear data that does not go through the origin
• Trick?
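
The slide leaves the question open; one common answer, assumed here, is to add a constant feature x0 = 1 to every input so that its coefficient acts as the intercept. The sketch fits y = θ0 + θ1·x by solving the 2 × 2 normal equations for the augmented inputs (1, x) with Cramer's rule. The function name `fit_with_intercept` and the data are hypothetical.

```python
def fit_with_intercept(xs, ys):
    """Fit y = theta0 + theta1 * x by augmenting each input to (1, x)."""
    rows = [(1.0, x) for x in xs]                 # constant feature handles the offset
    # Entries of X^T X and X^T Y for the augmented two-column design matrix.
    a = sum(r[0] * r[0] for r in rows)            # = n
    b = sum(r[0] * r[1] for r in rows)            # = sum of x
    d = sum(r[1] * r[1] for r in rows)            # = sum of x^2
    e = sum(r[0] * y for r, y in zip(rows, ys))   # = sum of y
    f = sum(r[1] * y for r, y in zip(rows, ys))   # = sum of x * y
    det = a * d - b * b                           # assumed nonzero (xs not all equal)
    theta0 = (e * d - b * f) / det
    theta1 = (a * f - b * e) / det
    return theta0, theta1

# Hypothetical data on the line y = 5 + 2x, which does not pass through the origin.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [5.0, 7.0, 9.0, 11.0]
theta0, theta1 = fit_with_intercept(xs, ys)
print(theta0, theta1)
```

With the extra column of ones in place, the same MLE machinery used for regression through the origin applies unchanged.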
Regression: another example
• Assume the following model to fit the data. The model has one unknown parameter θ to be learned from data.
• What is the maximum likelihood estimate of θ?