
### Bayes Classifier, Linear Regression

10701/15781 Recitation

January 29, 2008

Parts of the slides are from previous years’ recitation and lecture notes, and from Prof. Andrew Moore’s data mining tutorials.

Classification and Regression

- Classification
- Goal: Learn the underlying function

f: X (features) → Y (class, or category)

e.g. words → “spam” or “not spam”

- Regression

f: X (features) → Y (continuous values)

e.g. GPA → salary

Supervised Classification

- How to find an unknown function

f: X → Y

(features → class)

or equivalently P(Y|X)

- Classifier:
- Find P(X|Y), P(Y), and use Bayes rule - generative
- Find P(Y|X) directly - discriminative

Classification

Learn P(Y|X)

1. Bayes rule:

P(Y|X) = P(X|Y)P(Y) / P(X) ∝ P(X|Y)P(Y)

- Learn P(X|Y), P(Y)
- “Generative” classifier

2. Learn P(Y|X) directly

- “Discriminative” (to be covered later in class)
- e.g. logistic regression
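
To see why P(X) can be dropped when only the most probable class is needed, here is a minimal sketch. The priors and likelihoods are made-up numbers for a single observed x, not values from the recitation.

```python
# Hypothetical numbers for one observed feature vector x (not from the slides)
prior = {"spam": 0.3, "not spam": 0.7}          # P(Y)
likelihood = {"spam": 0.08, "not spam": 0.01}   # P(X = x | Y)

# Unnormalized scores: P(X = x | Y) * P(Y)
score = {y: likelihood[y] * prior[y] for y in prior}

# P(X = x) is the sum of the scores over all classes
evidence = sum(score.values())
posterior = {y: s / evidence for y, s in score.items()}

# Dividing every score by the same positive constant keeps the argmax
best_by_score = max(score, key=score.get)
best_by_posterior = max(posterior, key=posterior.get)
print(posterior, best_by_score, best_by_posterior)
```

Both rankings pick the same class; the normalization only matters when the actual probability value is needed.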

Generative Classifier: Bayes Classifier

Learn P(X|Y), P(Y)

- e.g. email classification problem
- 3 classes for Y = { spam, not spam, maybe }
- 10,000 binary features for X = {“Cash”, “Rolex”,…}
- How many parameters do we have?
- P(Y) :
- P(X|Y) :
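
One way to work out the count, as a sketch: assume each distribution is stored as a full probability table, and that one entry per table is fixed by the sum-to-one constraint. The variable names are illustrative.

```python
n_classes = 3
n_features = 10_000

# P(Y): one probability per class, minus one for the sum-to-one constraint
prior_params = n_classes - 1

# P(X|Y): per class, a table over all 2**n_features binary vectors,
# again minus one entry per class
joint_configs = 2 ** n_features
full_bayes_params = n_classes * (joint_configs - 1)

print(prior_params)                   # 2
print(len(str(full_bayes_params)))    # number of decimal digits in the count
```

Under these assumptions P(Y) needs 2 parameters and P(X|Y) needs 3 · (2^10000 − 1), far more than any data set could support.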

Generative learning: Naïve Bayes

- Introduce conditional independence

P(X1,X2|Y) = P(X1|Y) P(X2|Y)

P(Y|X) = P(X|Y)P(Y) / P(X) for X=(X1,…,Xn)

= P(X1|Y)…P(Xn|Y)P(Y) / P(X)

= [∏i P(Xi|Y)] P(Y) / P(X)

- Learn P(X1|Y), … P(Xn|Y), P(Y)

instead of learning P(X1,…, Xn |Y) directly
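
A minimal sketch of learning P(Xi|Y) and P(Y) by counting, then combining them with the product rule above. The tiny data set and the helper names (`prior`, `cond`, `posterior`) are made up for illustration, and no smoothing is applied.

```python
from collections import Counter, defaultdict

# Tiny made-up training set: (binary feature vector, label). Not from the slides.
data = [
    ((1, 1), "spam"),
    ((1, 0), "spam"),
    ((0, 1), "not spam"),
    ((0, 0), "not spam"),
    ((0, 0), "not spam"),
]

n_features = 2
class_counts = Counter(y for _, y in data)

# ones[y][i] = number of class-y examples whose feature i equals 1
ones = defaultdict(lambda: [0] * n_features)
for x, y in data:
    for i, xi in enumerate(x):
        ones[y][i] += xi

def prior(y):
    # MLE of P(Y = y)
    return class_counts[y] / len(data)

def cond(i, xi, y):
    # MLE of P(X_i = xi | Y = y), without smoothing
    p1 = ones[y][i] / class_counts[y]
    return p1 if xi == 1 else 1 - p1

def posterior(x):
    # prod_i P(X_i | Y) * P(Y), normalized over the classes
    scores = {}
    for y in class_counts:
        s = prior(y)
        for i, xi in enumerate(x):
            s *= cond(i, xi, y)
        scores[y] = s
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}

print(posterior((1, 1)))
```

Because the estimates are raw frequencies, a feature value never seen with a class gets probability 0 and vetoes that class; smoothing the counts is a common remedy.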

Naïve Bayes

- 3 classes for Y = {spam, not spam, maybe}
- 10,000 binary features for X = {“Cash”, “Rolex”, …}
- Now, how many parameters?
- P(Y)
- P(X|Y)

- fewer parameters
- “simpler” – less likely to overfit
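
A sketch of the count under the Naïve Bayes assumption, taking each binary feature to need one number P(Xi = 1 | Y = y) per class. The variable names are illustrative.

```python
n_classes = 3
n_features = 10_000

# P(Y): one probability per class, minus one for the sum-to-one constraint
prior_params = n_classes - 1

# P(X|Y): one Bernoulli parameter per (feature, class) pair
nb_params = n_classes * n_features

print(prior_params, nb_params)  # 2 30000
```

The count grows linearly in the number of features instead of exponentially.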

Full Bayes vs. Naïve Bayes

P(Y=1|(X1,X2)=(0,1))=?

- Full Bayes:

P(Y=1)=?

P((X1,X2)=(0,1)|Y=1)=?

- Naïve Bayes:

P(Y=1)=?

P((X1,X2)=(0,1)|Y=1)=?

- XOR
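
The slide's data table is not in this transcript. As a sketch, assume the data is the XOR truth table, Y = X1 XOR X2, with all four rows equally likely. The helper names are hypothetical.

```python
from itertools import product

# Assumed data: XOR truth table, every row equally likely (an assumption;
# the slide's own table is not available)
rows = [(x1, x2, x1 ^ x2) for x1, x2 in product((0, 1), repeat=2)]

def p(event, given=lambda r: True):
    # Empirical probability of `event` among rows satisfying `given`
    pool = [r for r in rows if given(r)]
    return sum(1 for r in pool if event(r)) / len(pool)

query = (0, 1)

def full_likelihood(y):
    # P((X1, X2) = query | Y = y), estimated jointly
    return p(lambda r: (r[0], r[1]) == query, lambda r: r[2] == y)

def naive_likelihood(y):
    # P(X1 = query[0] | Y = y) * P(X2 = query[1] | Y = y)
    return (p(lambda r: r[0] == query[0], lambda r: r[2] == y)
            * p(lambda r: r[1] == query[1], lambda r: r[2] == y))

def posterior(likelihood, y=1):
    # Bayes rule, normalized over both classes
    scores = {c: likelihood(c) * p(lambda r, c=c: r[2] == c) for c in (0, 1)}
    return scores[y] / sum(scores.values())

full = posterior(full_likelihood)
naive = posterior(naive_likelihood)
print(full, naive)  # 1.0 0.5
```

Under this assumed table, Full Bayes recovers P(Y=1|(X1,X2)=(0,1)) = 1. Naïve Bayes returns 0.5, because each feature on its own carries no information about Y.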

Regression

- Prediction of continuous variables
- e.g. I want to predict salaries from GPA.
- I can regress that …
- Learn the mapping f: X → Y
- Model is linear in the parameters (+ some noise)

linear regression

- Assume Gaussian noise
- Learn MLE Θ
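
The formulas for this slide are not in the transcript. A sketch of the standard derivation follows, assuming a scalar input, no constant term, and i.i.d. Gaussian noise with variance σ². The symbols σ and n are assumptions, and θ stands for the parameter written Θ above.

```latex
\begin{align*}
y_i &= \theta x_i + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2) \\
\log P(y_1,\dots,y_n \mid x_1,\dots,x_n, \theta)
  &= -\frac{n}{2}\log(2\pi\sigma^2)
     - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - \theta x_i)^2 \\
\hat{\theta}_{\mathrm{MLE}}
  &= \arg\min_{\theta} \sum_{i=1}^{n} (y_i - \theta x_i)^2
   = \frac{\sum_{i} x_i y_i}{\sum_{i} x_i^2}
\end{align*}
```

Maximizing the Gaussian likelihood is therefore the same as minimizing the squared error.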

Multivariate linear regression

- What if the inputs are vectors?
- Write matrix X and Y :

(n data points, k features for each data point)

- MLE Θ = (XᵀX)⁻¹ XᵀY
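
With Gaussian noise, the maximum likelihood Θ is the least-squares solution, usually written Θ = (XᵀX)⁻¹XᵀY. Below is a minimal standard-library sketch that solves (XᵀX)Θ = XᵀY on made-up, noise-free data. The helpers `transpose`, `matmul` and `solve` are illustrative.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z

# Made-up data: n = 4 points, k = 2 features, generated from theta = (2, -1), no noise
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
Y = [2.0, -1.0, 1.0, 3.0]

Xt = transpose(X)
XtX = matmul(Xt, X)                                        # k x k
XtY = [sum(x * y for x, y in zip(col, Y)) for col in Xt]   # length k
theta_hat = solve(XtX, XtY)                                # (X^T X) theta = X^T Y
print(theta_hat)
```

Solving the linear system avoids forming the inverse explicitly. In practice a library least-squares routine would normally replace the hand-written solver.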

Constant term?

- We may expect linear data that does not go through the origin
- Trick?
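
The usual trick, presumably the one intended here, is to append a constant feature 1 to every input so that its weight acts as the intercept. A minimal sketch on made-up 1-D data follows; the variable names are illustrative.

```python
# Made-up 1-D data lying exactly on y = 3x + 2, a line that misses the origin
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0, 5.0, 8.0, 11.0]

# Without a constant term the fit is forced through the origin
theta_no_bias = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Trick: append a constant feature 1, so each row of X is [x, 1]
X = [[x, 1.0] for x in xs]

# Normal equations (X^T X) theta = X^T y for the 2 x 2 case, via Cramer's rule
a = sum(r[0] * r[0] for r in X)
b = sum(r[0] * r[1] for r in X)
d = sum(r[1] * r[1] for r in X)
u = sum(r[0] * y for r, y in zip(X, ys))
v = sum(r[1] * y for r, y in zip(X, ys))
det = a * d - b * b
slope = (u * d - b * v) / det
intercept = (a * v - b * u) / det
print(theta_no_bias, slope, intercept)
```

The augmented fit recovers slope 3 and intercept 2, while the fit without the constant feature gives a slope of about 3.86 to compensate for the missing offset.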

Regression: another example

- Assume the following model to fit the data. The model has one unknown parameter θ to be learned from data.
- What is the maximum likelihood estimate of θ?
