Bayes Classifier, Linear Regression

10701/15781 Recitation

January 29, 2008

Parts of the slides are from previous years’ recitation and lecture notes, and from Prof. Andrew Moore’s data mining tutorials.

- Classification
  - Goal: learn the underlying function f: X (features) → Y (class, or category)
  - e.g. words → “spam” or “not spam”
- Regression
  - Goal: learn the underlying function f: X (features) → Y (continuous values)
  - e.g. GPA → salary

- How to find an unknown function f: X → Y (features → class), or equivalently P(Y|X)?
- Classifier:
  - Find P(X|Y) and P(Y), then use Bayes rule (generative)
  - Find P(Y|X) directly (discriminative)

Learn P(Y|X)

1. Bayes rule:

P(Y|X) = P(X|Y)P(Y) / P(X) ∝ P(X|Y)P(Y)

- Learn P(X|Y), P(Y)

2. Learn P(Y|X) directly
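One way to read the proportionality above: P(X) does not depend on Y, so it can be dropped when picking the most probable class. A short sketch:

```latex
\hat{y} = \arg\max_{y} P(Y = y \mid X)
        = \arg\max_{y} \frac{P(X \mid Y = y)\, P(Y = y)}{P(X)}
        = \arg\max_{y} P(X \mid Y = y)\, P(Y = y)
```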

Learn P(X|Y), P(Y)

- e.g. email classification problem
- 3 classes for Y = {spam, not spam, maybe}
- 10,000 binary features for X = {“Cash”, “Rolex”, …}
- How many parameters do we have? (see the counting sketch below)
  - P(Y):
  - P(X|Y):
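A sketch of the standard counting argument for the full (unfactored) model:

```latex
% P(Y): a distribution over 3 classes needs 3 - 1 = 2 free parameters.
% P(X|Y): for each class, the joint table over 10,000 binary features has
%         2^{10000} entries, hence 2^{10000} - 1 free parameters.
\underbrace{(3 - 1)}_{P(Y)} \;+\; \underbrace{3\,(2^{10000} - 1)}_{P(X \mid Y)}
\;\approx\; 3 \cdot 2^{10000} \ \text{parameters}
```

Far too many parameters to estimate from any realistic amount of data, which motivates the conditional-independence assumption below.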

- Introduce conditional independence (the Naïve Bayes assumption)
P(X1, X2|Y) = P(X1|Y) P(X2|Y)

For X = (X1, …, Xn):

P(Y|X) = P(X|Y)P(Y) / P(X)

= P(X1|Y) ⋯ P(Xn|Y) P(Y) / P(X)

= ∏i P(Xi|Y) P(Y) / P(X)

- Learn P(X1|Y), …, P(Xn|Y), P(Y)
instead of learning P(X1, …, Xn|Y) directly

- 3 classes for Y = {spam, not spam, maybe}
- 10,000 binary features for X = {“Cash”, “Rolex”, …}
- Now, how many parameters? (see the counting note and code sketch below)
  - P(Y)
  - P(X|Y)

- Fewer parameters
- “Simpler”: less likely to overfit
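Under the Naïve Bayes factorization the counts shrink dramatically: P(Y) still needs 3 − 1 = 2 parameters, while P(X|Y) now needs only one Bernoulli parameter P(Xi = 1 | Y = y) per feature per class, i.e. 10,000 × 3 = 30,000 instead of roughly 3 · 2^10,000. A minimal Python sketch of the corresponding MLE by counting (the function name and toy data are made up for illustration):

```python
import numpy as np

def naive_bayes_mle(X, y, n_classes):
    """MLE for Naive Bayes with binary features.

    X: (n_samples, n_features) array of 0/1 features
    y: (n_samples,) array of labels in {0, ..., n_classes - 1}
    Returns class priors P(Y=c) and per-class probabilities P(X_i=1 | Y=c).
    """
    n_samples, n_features = X.shape
    priors = np.zeros(n_classes)              # P(Y = c)
    cond = np.zeros((n_classes, n_features))  # P(X_i = 1 | Y = c)
    for c in range(n_classes):
        Xc = X[y == c]
        priors[c] = len(Xc) / n_samples       # fraction of examples labeled c
        cond[c] = Xc.mean(axis=0)             # fraction of those with X_i = 1
    return priors, cond

# toy data: 2 binary features, 2 classes
X = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
y = np.array([1, 1, 0, 0])
priors, cond = naive_bayes_mle(X, y, n_classes=2)
print(priors)  # [0.5 0.5]
print(cond)    # one Bernoulli parameter per (class, feature) pair
```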

Example: P(Y=1 | (X1, X2) = (0,1)) = ?

- Full Bayes:
P(Y=1) = ?

P((X1, X2) = (0,1) | Y=1) = ?

- Naïve Bayes:
P(Y=1) = ?

P((X1, X2) = (0,1) | Y=1) = ?

- XOR: if Y = X1 XOR X2, then X1 and X2 are not conditionally independent given Y, so Naïve Bayes cannot represent P(Y|X) correctly no matter how much data it sees (see the sketch below)
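A quick way to see the XOR failure, assuming the four equally likely XOR examples as a toy training set (hypothetical data for illustration):

```python
import numpy as np

# the four XOR examples: y = x1 XOR x2
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Naive Bayes MLE: class priors and per-class Bernoulli parameters
priors = np.array([np.mean(y == c) for c in (0, 1)])       # P(Y=c) = 0.5
cond = np.array([X[y == c].mean(axis=0) for c in (0, 1)])  # P(X_i=1 | Y=c) = 0.5 everywhere

def nb_posterior(x):
    """P(Y=c | x) under the Naive Bayes factorization."""
    like = np.prod(cond ** x * (1 - cond) ** (1 - x), axis=1)
    post = like * priors
    return post / post.sum()

for x in X:
    print(x, nb_posterior(x))  # always [0.5 0.5]: Naive Bayes cannot separate XOR
```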

- Prediction of continuous variables
- e.g. I want to predict salaries from GPA
- I can regress that …
- Learn the mapping f: X → Y
- Model is linear in the parameters (+ some noise): linear regression

- Assume Gaussian noise
- Learn MLE Θ

- Normal linear regression:
Y = ΘX + ε, with noise ε ~ N(0, σ²)

or equivalently,

P(Y|X) = N(ΘX, σ²)

- MLE Θ? MLE σ²? (worked out in the sketch below)
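For the scalar, no-intercept version of this model (an assumption; the slide's exact form may differ), the MLE works out as follows:

```latex
\hat{\Theta}_{\mathrm{MLE}}
  = \arg\max_{\Theta} \prod_{i=1}^{n} N\!\left(y_i;\, \Theta x_i,\, \sigma^2\right)
  = \arg\min_{\Theta} \sum_{i=1}^{n} \left(y_i - \Theta x_i\right)^2
  = \frac{\sum_i x_i y_i}{\sum_i x_i^2},
\qquad
\hat{\sigma}^2_{\mathrm{MLE}} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{\Theta} x_i\right)^2
```

Maximizing the Gaussian likelihood over Θ is the same as minimizing the sum of squared errors, which is why least squares appears here.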

- What if the inputs are vectors?
- Write matrices X and Y:
(n data points, k features for each data point)

- MLE Θ = (XᵀX)⁻¹ XᵀY  (the normal equations; see the numpy sketch below)
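A minimal numpy sketch of that closed form (variable names and data are made up; `lstsq` is the numerically safer alternative to forming the inverse explicitly):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3                       # n data points, k features each
X = rng.normal(size=(n, k))         # design matrix, one row per data point
true_theta = np.array([2.0, -1.0, 0.5])
Y = X @ true_theta + rng.normal(scale=0.1, size=n)   # linear model + Gaussian noise

# MLE Theta = (X^T X)^{-1} X^T Y, solved without forming the inverse
theta_mle = np.linalg.solve(X.T @ X, X.T @ Y)
# equivalent and more numerically stable:
theta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

print(theta_mle)    # close to [2.0, -1.0, 0.5]
print(theta_lstsq)
```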

- We may expect linear data that does not go through the origin
- Trick? Append a constant feature (a column of 1s) to X; its learned coefficient acts as the intercept (see the sketch below)
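A sketch of the intercept trick, continuing the hypothetical numpy example above:

```python
import numpy as np

rng = np.random.default_rng(1)
gpa = rng.uniform(2.0, 4.0, size=50)
salary = 20000.0 * gpa + 10000.0 + rng.normal(scale=2000.0, size=50)  # does not pass through the origin

# append a column of ones so the model becomes salary = theta_1 * gpa + theta_0
X_aug = np.column_stack([gpa, np.ones_like(gpa)])
theta, *_ = np.linalg.lstsq(X_aug, salary, rcond=None)
print(theta)  # roughly [20000, 10000]: slope and intercept
```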

- Assume the following model to fit the data. The model has one unknown parameter θ to be learned from data.
- What is the maximum likelihood estimate of θ?