
Regression



[Figure: three example scatterplots with correlations -.04, .98, and -.79]

[Figure: data with a fitted regression line]

[Figure: data with a fitted regression curve]

Fit the line by minimizing the sum of squared errors (SSE) over possible parameter values
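As a sketch of this idea: for simple linear regression, the parameters that minimize the SSE have a closed form. The data below is invented for illustration.

```python
# Minimal least-squares sketch: fit y = a + b*x by minimizing
# SSE = sum((y_i - (a + b*x_i))^2). The minimizer has a closed form.

def fit_line(x, y):
    """Return intercept a and slope b of the least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return a, b

# toy data (invented, not from the slides)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(x, y)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
```

Any other choice of intercept and slope on this data gives a larger SSE, which is exactly what "minimize SSE over possible parameter values" means.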

The intercept parameter is significant only at the .0623 level

The slope parameter is significant at the .001 level, so we reject the null hypothesis that the slope is zero

Residual standard error: an estimate of the standard deviation of the errors; for simple linear regression it is sqrt(SSE / (n - 2))

R-squared is the correlation squared; equivalently, it is the percentage of variation explained by the linear regression
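The two descriptions of R-squared agree, which a quick check on toy data (invented here) confirms:

```python
# R-squared two equivalent ways: as the squared correlation between
# x and y, and as the fraction of variation explained, 1 - SSE/SST.
import math

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)

r = sxy / math.sqrt(sxx * syy)      # correlation coefficient

slope = sxy / sxx                   # least-squares fit
intercept = my - slope * mx
sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
sst = syy                           # total sum of squares

r2_from_corr = r ** 2
r2_from_sse = 1 - sse / sst         # fraction of variation explained
```

Both computations give the same number (up to floating point), which is why "correlation squared" and "% of variation explained" are interchangeable descriptions.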

Example: we could try to predict change in diameter using change in height, starting height, and fertilizer

- All variables are significant at the .05 level
- The residual standard error went down and R-squared went up (both good signs)
- Regression can even handle categorical variables
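A hedged sketch of such a multiple regression in pure Python. The tree measurements below are invented, and the 0/1 "fertilizer" column is the dummy coding that R's lm() would create for a two-level factor:

```python
# Multiple regression via the normal equations X'X b = X'y,
# with a dummy-coded (0/1) categorical predictor.

def solve(A, b):
    """Gaussian elimination with partial pivoting for small systems."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        p = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[p] = M[p], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def lstsq(X, y):
    """Least-squares coefficients for design matrix X and response y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# columns: intercept, change in height, starting height, fertilizer (0/1)
X = [
    [1, 1.2, 10, 0],
    [1, 1.5, 12, 0],
    [1, 2.0, 11, 1],
    [1, 2.3, 14, 1],
    [1, 1.1,  9, 0],
    [1, 2.5, 13, 1],
]
y = [0.30, 0.42, 0.65, 0.80, 0.28, 0.90]

coefs = lstsq(X, y)
pred = [sum(c * v for c, v in zip(coefs, row)) for row in X]
sse = sum((p - yi) ** 2 for p, yi in zip(pred, y))
```

The fertilizer coefficient then reads as the estimated shift in diameter change between the two fertilizer groups, holding the other predictors fixed.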

Predicting a song's release year from its timbre (90 audio attributes):

http://archive.ics.uci.edu/ml/datasets/YearPredictionMSD

- Let’s “train” (fit) different models to a training data set
- Then see how well they do at predicting a different “validation” data set (this is how ML competitions on Kaggle work)

- Create a random sample of 10,000 songs from the original 515,345
- Assign the first 5,000 to the training set; the second 5,000 are saved for validation
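The sampling step can be sketched as follows, assuming the songs live in a list of rows (the slides use R; this is the same idea in Python):

```python
# Random sample of 10,000 songs without replacement; first 5,000
# become the training set, the second 5,000 the validation set.
import random

random.seed(1)                          # reproducible split
songs = list(range(515_345))            # stand-in for the 515,345 song rows
sample = random.sample(songs, 10_000)   # sample without replacement
train, valid = sample[:5_000], sample[5_000:]
```

Sampling without replacement guarantees that no song appears in both sets, so validation performance is measured on data the models never saw.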

- Fit a linear model and a generalized boosted regression model (other popular choices include random forests and neural networks)
- In the R formula, the period after the tilde says to use all of the variables; the -V1 then drops V1 (the year), since that is what we're predicting

- Next we make predictions for the validation data set
- We compare the models by calculating the sum of squares error (SSE) for each model
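The comparison step can be sketched like this: given each model's predictions on the validation set, the model with the smaller SSE wins. The predictions below are fabricated stand-ins, not real lm/gbm output.

```python
# Compare two models on a validation set by sum of squared errors.

def sse(pred, actual):
    """Sum of squared prediction errors."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual))

# fabricated validation years and model predictions
actual   = [1998, 2005, 1972, 2010]
lm_pred  = [1995, 2001, 1980, 2006]
gbm_pred = [1997, 2004, 1975, 2009]

sse_lm = sse(lm_pred, actual)
sse_gbm = sse(gbm_pred, actual)
better = "gbm" if sse_gbm < sse_lm else "lm"
```

Because both models are scored on the same held-out songs, the SSE comparison is apples-to-apples, just as in a Kaggle leaderboard.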