Regression
AMCS/CS 340: Data Mining

Xiangliang Zhang

King Abdullah University of Science and Technology

Outline
  • What is regression?
  • Minimizing sum-of-square error
    • Linear regression
    • Nonlinear regression
    • Statistical models for regression
  • Overfitting and Cross validation for regression


Classification (reminder)

X  Y

  • Anything:
  • continuous (,d, …)
  • discrete ({0,1}, {1,…k}, …)
  • structured (tree, string, …)
  • discrete:
    • {0,1} binary
    • {1,…k} multi-class
    • tree, etc. structured


Regression

X  Y

  • continuous:
    • , d
  • Anything:
  • continuous (,d, …)
  • discrete ({0,1}, {1,…k}, …)
  • structured (tree, string, …)
  • discrete:
    • {0,1} binary
    • {1,…k} multi-class
    • tree, etc. structured


Examples
  • Data → Prediction or Forecast:
    • Processes, memory → Power consumption
    • Protein structure → Energy
    • Heart-beat rate, age, speed, duration → Fat
    • Oil supply, consumption, etc. → Oil price
    • …


Linear regression

[Figure: scatter plot of temperature measurements with a fitted straight line.]

Given examples (x_i, y_i), i = 1, …, n, and a new point x, predict the corresponding value y by fitting a line to the examples and reading off its value at x.

Linear regression: prediction

[Figure: the fitted line is used to predict the temperature at new input values.]

Outline
  • What is regression?
  • Minimizing sum-of-square error
    • Linear regression
    • Nonlinear regression
    • Statistical models for regression
  • Overfitting and Cross validation for regression


Sum-of-Squares Error Function

For each training point, the error or "residual" is the difference between the observation y_n and the prediction y(x_n, w).

We fit the model by minimizing the sum squared error:

E(w) = Σ_n (y(x_n, w) − y_n)²

Sum-of-Squares Error Function

We minimize E(w) to find the optimal weights w*.

E(w) is a quadratic function of w, so its derivative with respect to w is linear in w. Setting that derivative to zero therefore yields a unique solution w* that minimizes E(w).

Least Squares

[Figure: data points, the fitted line, and the vertical residuals between each observation and its prediction.]

Estimate the unknown parameter w by minimizing the sum squared error E(w).

Minimize the sum squared error

Set the derivative of the sum squared error to zero, ∂E(w)/∂w = 0. Because this condition is linear in w, solving it gives the optimal weights, which are then used to predict y for new inputs.

Demo: http://www.lri.fr/~xlzhang/KAUST/CS340_slides/linear_regression_demo1.m
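The linked demo is a MATLAB script. As a rough, language-neutral companion, here is a minimal NumPy sketch of the same closed-form least-squares fit; the data values are invented for illustration:

```python
import numpy as np

# Fit y ~ w0 + w1*x by minimizing the sum of squared errors.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # hypothetical inputs
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])     # hypothetical observations

X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]
w, *_ = np.linalg.lstsq(X, y, rcond=None)   # solves min_w ||Xw - y||^2

x_new = 6.0
y_pred = w[0] + w[1] * x_new                # predict at a new point
print(w, y_pred)
```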


Linear regression models

Generally,

y(x, w) = Σ_j w_j Ф_j(x)

where the Ф_j(x) are known as basis functions.

Typically Ф_0(x) = 1, so that w_0 acts as a bias.

e.g. Ф(x) = [1  x  x²  x³]

Linear regression models

Example: polynomial curve fitting.

[Figure: polynomial fits to noisy data.]
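A minimal sketch of polynomial curve fitting treated as a linear model in w; the data here is synthetic (a noisy sinusoid), not the slide's:

```python
import numpy as np

def poly_design_matrix(x, degree):
    """Columns [1, x, x^2, ..., x^degree]; phi_0(x) = 1 supplies the bias w0."""
    return np.vander(x, degree + 1, increasing=True)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)  # noisy targets

Phi = poly_design_matrix(x, degree=3)        # [1, x, x^2, x^3]
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # least-squares weights
y_fit = Phi @ w                              # fitted values
```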

Basis Functions

Polynomial basis functions, Ф_j(x) = x^j: these are global; a small change in x affects all basis functions.

Gaussian basis functions: Ф_j(x) = exp(−(x − μ_j)² / (2s²))

Sigmoidal basis functions: Ф_j(x) = σ((x − μ_j)/s), where σ(a) = 1 / (1 + exp(−a))

Gaussian and sigmoidal basis functions are local; a small change in x only affects nearby basis functions. μ_j and s control location and scale (width or slope).
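A small sketch of the two local basis families; the argument names mu and s mirror the slide's μ_j and s, and any specific values are the caller's choice:

```python
import numpy as np

def gaussian_basis(x, mu, s):
    # Local bump centered at mu with width s.
    return np.exp(-((x - mu) ** 2) / (2 * s ** 2))

def sigmoid_basis(x, mu, s):
    # Smooth step located at mu, with slope controlled by s.
    return 1.0 / (1.0 + np.exp(-(x - mu) / s))
```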

Minimize the sum squared error

With basis functions the fit works the same way: set the derivative of the sum squared error to zero, solve for the weights, and predict y(x) = Σ_j w_j Ф_j(x) for new inputs.

Demo: http://www.lri.fr/~xlzhang/KAUST/CS340_slides/linear_regression_demo2.m


Batch gradient descent algorithm (1)

If the derivative of E(w) with respect to w is not linear in w, there is no closed-form solution. Instead, use batch gradient descent: starting from an initial w, repeatedly update

w ← w − η ∇E(w)

where η is the learning rate.

Batch gradient descent algorithm (2)

Batch gradient descent example: at each step, update the value of w according to the gradient of the error computed on the full dataset, as in the sketch below.
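A minimal batch gradient descent sketch for the sum-of-squares error E(w) = ‖Фw − y‖²; the learning rate eta and iteration count here are illustrative defaults, not values from the course:

```python
import numpy as np

def batch_gradient_descent(Phi, y, eta=0.01, n_iters=1000):
    """Minimize E(w) = ||Phi @ w - y||^2 by full-batch gradient steps."""
    w = np.zeros(Phi.shape[1])
    for _ in range(n_iters):
        grad = 2.0 * Phi.T @ (Phi @ w - y)  # dE/dw over the whole dataset
        w -= eta * grad                     # step against the gradient
    return w
```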


Batch gradient descent algorithm (3)

Batch gradient descent example (continued): the updates iteratively approach the minimum of the error function.

Statistical models for regression
  • Machine Learning paradigms
    • Regression Tree (CART)
    • Neural Networks
    • Support Vector Machine
  • Strength
    • flexibility within a nonparametric, assumption-free framework that accommodates big data
  • Weakness
    • difficulty in interpreting the effects of each covariate


Regression Tree

[Figure: a regression tree; each leaf stores a target value.]

http://chem-eng.utoronto.ca/~datamining/dmc/decision_tree_reg.htm
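As a hedged illustration (using scikit-learn's CART-style tree rather than the linked demo), a regression tree partitions the input space and predicts the mean target value of each leaf:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.sort(5 * rng.random((80, 1)), axis=0)   # synthetic 1-D inputs
y = np.sin(X).ravel()                          # synthetic targets

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
y_pred = tree.predict(X)                       # piecewise-constant prediction
```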

SVM Regression

Given training data (x_1, y_1), …, (x_n, y_n)

Find w and b such that f(x) = w·x + b optimally describes the data. In the standard ε-insensitive formulation, minimize a trade-off between flatness and the sum of errors:

min_{w,b}  ½‖w‖² + C Σ_i (ξ_i + ξ_i*)

Subject to:  y_i − (w·x_i + b) ≤ ε + ξ_i,   (w·x_i + b) − y_i ≤ ε + ξ_i*,   ξ_i, ξ_i* ≥ 0

The training points on or outside the ε-tube determine the solution; these are the "support vectors".
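A hedged sketch using scikit-learn's ε-SVR, an off-the-shelf implementation of this kind of formulation; residuals smaller than epsilon cost nothing, and C trades the sum of errors against flatness:

```python
import numpy as np
from sklearn.svm import SVR

X = np.linspace(0, 5, 40).reshape(-1, 1)   # synthetic inputs
y = np.sin(X).ravel()                      # synthetic targets

svr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)
y_pred = svr.predict(X)
n_sv = len(svr.support_)                   # indices of the support vectors
```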


NN Regression

Minimize the sum squared error between the network outputs and the target values with respect to the network weights.
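A hedged sketch with scikit-learn's multilayer perceptron regressor; the one-hidden-layer architecture is an assumption, since the slide does not specify one:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.linspace(0, 1, 50).reshape(-1, 1)   # synthetic inputs
y = np.sin(2 * np.pi * X).ravel()          # synthetic targets

# MLPRegressor trains the weights to minimize squared error.
nn = MLPRegressor(hidden_layer_sizes=(10,), max_iter=5000,
                  random_state=0).fit(X, y)
y_pred = nn.predict(X)
```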


Outline
  • What is regression?
  • Minimizing sum-of-square error
  • Overfitting and Cross validation for regression


Regression overfitting

[Figure: several fits of increasing complexity to the same data points.]

  • Which model is the best?
  • The one that best fits the data?
  • How well is it going to predict future data drawn from the same distribution?


Outline
  • What is regression?
  • Minimizing sum-of-square error
  • Overfitting and Cross validation for regression
    • Test set method
    • Leave-one-out
    • K-fold cross validation


The test set method

1. Randomly choose about 30% of the data to be the test set; the remainder is the training set.
2. Perform the regression on the training set.
3. Estimate future performance by the mean error on the test set.

[Figures: the data split, the fit on the training points, and the errors on the held-out test points.]

The test set method
  • Pros
    • very simple
    • Can then simply choose the method with the best test-set score
  • Cons
    • Wastes data: we get an estimate of the best method to apply to 30% less data
    • If we don't have much data, our test set might just be lucky or unlucky ("the test-set estimator of performance has high variance"); see the sketch below
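A minimal sketch of the test set method on synthetic data, holding out 30% as above:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.random((100, 1))                            # synthetic inputs
y = 3 * X.ravel() + 0.1 * rng.standard_normal(100)  # synthetic targets

# Hold out 30% of the data as the test set, train on the rest.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
test_mse = mean_squared_error(y_te, model.predict(X_te))
```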


Outline
  • What is regression?
  • Minimizing sum-of-square error
  • Overfitting and Cross validation for regression
    • Test set method
    • Leave-one-out
    • K-fold cross validation


LOOCV (Leave-one-out Cross Validation)

Take each data point in turn: set that single point aside, train on the remaining N−1 points, and record the error on the held-out point.

[Figures: the procedure repeated for each point in turn.]

When you have done all points, report the mean error.

LOOCV for Linear Regression

For k = 1 to N:

1. Let (x_k, y_k) be the k-th record.
2. Temporarily remove (x_k, y_k) from the dataset.
3. Train on the remaining N−1 data points.
4. Record the prediction error on (x_k, y_k).

When you've done all points, report the mean error; a sketch follows below.

MSE_LOOCV = 2.12
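A minimal LOOCV sketch following this procedure; the data is synthetic, so the MSE will not match the slide's 2.12:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((30, 1))                             # synthetic inputs
y = 2 * X.ravel() + 0.2 * rng.standard_normal(30)   # synthetic targets

errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    # Train on the remaining N-1 points, score on the held-out point.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    errors.append((model.predict(X[test_idx])[0] - y[test_idx][0]) ** 2)

mse_loocv = np.mean(errors)   # mean error over all N points
```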


LOOCV for Quadratic Regression

Same procedure, fitting a quadratic model: for k = 1 to N, remove (x_k, y_k), train on the remaining N−1 points, and record the error on (x_k, y_k). Report the mean error.

MSE_LOOCV = 0.962


LOOCV for Join the Dots

Same procedure with a join-the-dots model: for k = 1 to N, remove (x_k, y_k), train on the remaining N−1 points, and record the error on (x_k, y_k). Report the mean error.

MSE_LOOCV = 3.33


Which validation method?

k-fold Cross Validation gets the best of both worlds: it is much cheaper than leave-one-out while wasting far less data than a single held-out test set.


Outline
  • What is regression?
  • Minimizing sum-of-square error
  • Overfitting and Cross validation for regression
    • Test set method
    • Leave-one-out
    • K-fold cross validation


k-fold Cross Validation

Randomly break the dataset into k partitions (in our example we'll have k = 3 partitions, colored Red, Green, and Blue).


For the red partition: train on all the points not in the red partition, then find the test-set sum of errors on the red points.

For the green partition: train on all the points not in the green partition, then find the test-set sum of errors on the green points.

For the blue partition: train on all the points not in the blue partition, then find the test-set sum of errors on the blue points.

Then report the mean error.

Linear Regression: MSE_3FOLD = 2.05

The same 3-fold procedure applied to the other models:

Quadratic Regression: MSE_3FOLD = 1.11

Join the Dots: MSE_3FOLD = 2.93
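A minimal 3-fold cross validation sketch mirroring the procedure above; the data is synthetic, so the fold errors will not match the slides' numbers:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((60, 1))                             # synthetic inputs
y = 2 * X.ravel() + 0.3 * rng.standard_normal(60)   # synthetic targets

fold_mse = []
for train_idx, test_idx in KFold(n_splits=3, shuffle=True,
                                 random_state=0).split(X):
    # Train on the other partitions, score on this one.
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_mse.append(np.mean((model.predict(X[test_idx]) - y[test_idx]) ** 2))

mse_3fold = np.mean(fold_mse)   # mean error over the k folds
```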

CV-based Regression Algorithm Choice
  • Choosing which regression algorithm to use
    • Step 1: Compute the 10-fold CV error for six different models
    • Step 2: Whichever model gave the best CV score: train it with all the data, and that's the predictive model you'll use (see the sketch below).
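A hedged sketch of CV-based model choice; the two candidate models and the synthetic data are assumptions standing in for the six models on the slide:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.random((100, 1))
y = np.sin(3 * X.ravel()) + 0.1 * rng.standard_normal(100)

candidates = {"linear": LinearRegression(),
              "tree": DecisionTreeRegressor(max_depth=3)}

# Step 1: 10-fold CV error for each candidate model.
cv_mse = {name: -cross_val_score(m, X, y, cv=10,
                                 scoring="neg_mean_squared_error").mean()
          for name, m in candidates.items()}

# Step 2: refit the winner on all of the data.
best = min(cv_mse, key=cv_mse.get)
final_model = candidates[best].fit(X, y)
```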
