AMCS/CS 340: Data Mining

Regression
AMCS/CS 340: Data Mining

Xiangliang Zhang

King Abdullah University of Science and Technology

Outline
• What is regression?
• Minimizing sum-of-square error
• Linear regression
• Nonlinear regression
• Statistical models for regression
• Overfitting and Cross validation for regression


Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Classification (reminder)

X → Y

• X: anything
• continuous (ℝ, ℝᵈ, …)
• discrete ({0,1}, {1,…,k}, …)
• structured (tree, string, …)
• Y: discrete
• {0,1} binary
• {1,…,k} multi-class
• tree, etc. structured


Regression

X → Y

• Y: continuous
• ℝ, ℝᵈ
• X: anything
• continuous (ℝ, ℝᵈ, …)
• discrete ({0,1}, {1,…,k}, …)
• structured (tree, string, …)


Examples
• Data → Prediction or Forecast:
• Processes, memory → Power consumption
• Protein structure → Energy
• Heart-beat rate, age, speed, duration → Fat
• Oil supply, consumption, etc. → Oil price
• ……


Linear regression

[Figure: scatter plot of temperature data with a fitted regression line; vertical axis: Temperature]
Given examples (x₁, y₁), …, (x_N, y_N), and given a new point x, predict its output y.


Linear regression

[Figure: fitted regression line used to predict Temperature at new input values]

Outline
• What is regression?
• Minimizing sum-of-square error
• Linear regression
• Nonlinear regression
• Statistical models for regression
• Overfitting and Cross validation for regression


Sum-of-Squares Error Function

Observation: y_n

Prediction: ŷ(x_n, w)

Error or “residual”: y_n − ŷ(x_n, w)

Minimize the sum squared error:

E(w) = Σ_n (y_n − ŷ(x_n, w))²


Sum-of-Squares Error Function

Minimize E(w) to find w*.

E(w) is a quadratic function of w, so the derivative of E(w) w.r.t. w is linear in w; setting the derivative to zero therefore gives a unique solution minimizing E(w).


Least Squares

Sum squared error: E(w) = Σ_n (y_n − ŷ(x_n, w))²

[Figure: observations, predictions, and residuals around the fitted line]

Estimate the unknown parameter w by minimizing E(w).


Minimize the sum squared error

Sum squared error: E(w) = Σ_n (y_n − wᵀx_n)²

Setting ∂E/∂w = 0 (linear in w) gives the closed-form solution

w* = (XᵀX)⁻¹ Xᵀ y,

where X stacks the inputs x_n as rows and y = (y₁, …, y_N)ᵀ.

Predict: ŷ = w*ᵀ x

http://www.lri.fr/~xlzhang/KAUST/CS340_slides/linear_regression_demo1.m
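The closed-form solution can be sketched for the 1-D case (one input plus a bias). This is a minimal Python illustration with made-up data, not the course's MATLAB demo:

```python
# Least-squares fit of y ≈ w0 + w1*x: the 1-D closed-form case.
# Setting dE/dw = 0 yields the usual normal-equation solution.

def fit_line(xs, ys):
    """Return (w0, w1) minimizing sum_n (y_n - w0 - w1*x_n)^2."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    w1 = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
         / sum((x - mean_x) ** 2 for x in xs)
    w0 = mean_y - w1 * mean_x      # bias absorbs the means
    return w0, w1

def predict(w0, w1, x):
    return w0 + w1 * x

xs = [0, 1, 2, 3, 4]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]     # roughly y = 1 + 2x
w0, w1 = fit_line(xs, ys)
```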


Linear regression models

Generally,

y(x, w) = Σ_j w_j Φ_j(x)

where the Φ_j(x) are known as basis functions.

Typically Φ₀(x) = 1, so that w₀ acts as a bias.

e.g. Φ(x) = [1, x, x², x³]

Linear regression models

Example: Polynomial Curve Fitting


Basis Functions

Polynomial basis functions: Φ_j(x) = xʲ. These are global; a small change in x affects all basis functions.

Gaussian basis functions: Φ_j(x) = exp(−(x − μ_j)² / (2s²))

Sigmoidal basis functions: Φ_j(x) = σ((x − μ_j) / s), where σ(a) = 1 / (1 + e⁻ᵃ)

Gaussian and sigmoidal basis functions are local; a small change in x only affects nearby basis functions. μ_j and s control location and scale (width or slope).
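The three basis-function families can be sketched as follows; the centres `mus` and scale `s` are illustrative choices, not values from the course:

```python
import math

def poly_basis(x, degree=3):
    # Global: [1, x, x^2, x^3]; phi_0(x) = 1 provides the bias term.
    return [x ** j for j in range(degree + 1)]

def gauss_basis(x, mus=(-1.0, 0.0, 1.0), s=0.5):
    # Local: each phi_j responds only near its centre mu_j.
    return [math.exp(-(x - mu) ** 2 / (2 * s ** 2)) for mu in mus]

def sigmoid_basis(x, mus=(-1.0, 0.0, 1.0), s=0.5):
    # Local: sigma(a) = 1 / (1 + e^-a), shifted by mu_j and scaled by s.
    return [1.0 / (1.0 + math.exp(-(x - mu) / s)) for mu in mus]
```

A Gaussian basis function peaks at its own centre (value 1 at x = μ_j), and a sigmoidal one crosses 0.5 there, which is what makes them local.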

Minimize the sum squared error

Sum squared error: E(w) = Σ_n (y_n − wᵀΦ(x_n))²

Setting ∂E/∂w = 0 gives w* = (ΦᵀΦ)⁻¹ Φᵀ y, where Φ is the design matrix with rows Φ(x_n)ᵀ.

Predict: ŷ = w*ᵀ Φ(x)

http://www.lri.fr/~xlzhang/KAUST/CS340_slides/linear_regression_demo2.m


If the derivative of E(w) w.r.t. w is not linear in w, there is no closed-form solution. Instead, update the value of w according to the gradient,

w ← w − η ∇E(w),

which iteratively approaches the optimum of the error function.
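The update rule can be sketched for a toy one-parameter model ŷ = w·x; the learning rate and step count below are illustrative choices:

```python
# Gradient descent on the sum-squared error for the model y_hat = w*x.
# eta (learning rate) and steps are illustrative, not course values.

def grad_descent(xs, ys, eta=0.01, steps=500):
    w = 0.0
    for _ in range(steps):
        # dE/dw where E(w) = sum_n (y_n - w*x_n)^2
        grad = sum(-2 * x * (y - w * x) for x, y in zip(xs, ys))
        w -= eta * grad            # step against the gradient
    return w

xs = [1, 2, 3]
ys = [2, 4, 6]                      # true relation: y = 2x
w = grad_descent(xs, ys)
```

Here E(w) is still quadratic, so the closed form would work too; the point is only that the same iterative update applies when it does not.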

Statistical models for regression
• Regression Tree (CART)
• Neural Networks
• Support Vector Machine
• Strength
• flexibility within a nonparametric, assumption-free framework that accommodates big data
• Weakness
• difficulty in interpreting the effects of each covariate


Regression Tree

Target value


http://chem-eng.utoronto.ca/~datamining/dmc/decision_tree_reg.htm
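The core CART idea (choose the split that minimizes squared error, then predict the mean target value in each leaf) can be sketched with a single-split "stump"; the data are made up:

```python
# One-split regression "stump": the CART regression idea in miniature.
# Each leaf predicts the mean target value of its training points.

def fit_stump(xs, ys):
    best = None
    for t in sorted(set(xs))[1:]:              # candidate thresholds
        left  = [y for x, y in zip(xs, ys) if x <  t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + \
              sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x < t else mr       # leaf = mean target value

xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.2, 0.9, 5.0, 5.2, 4.8]
tree = fit_stump(xs, ys)
```

A full regression tree applies this split search recursively inside each leaf.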

SVM Regression

Given training data (x₁, y₁), …, (x_N, y_N)

Find w, b such that f(x) = wᵀx + b optimally describes the data:

minimize ½‖w‖² + C Σ_n (ξ_n + ξ_n*)  (sum of errors)

Subject to: y_n − f(x_n) ≤ ε + ξ_n,  f(x_n) − y_n ≤ ε + ξ_n*,  ξ_n, ξ_n* ≥ 0

The points lying on or outside the ε-tube are the “support vectors”.


NN Regression

Minimize the sum-of-squares error between the network outputs and the targets.


Outline
• What is regression?
• Minimizing sum-of-square error
• Overfitting and Cross validation for regression


Regression overfitting
• Which one is the best?
• The one with the best fit to the data?
• How well is it going to predict future data drawn from the same distribution?


Outline
• What is regression?
• Minimizing sum-of-square error
• Overfitting and Cross validation for regression
• Test set method
• Leave-one-out
• K-fold cross validation


The test set method

• Pros
• very simple
• Can then simply choose the method with the best test-set score
• Cons
• Wastes data: we get an estimate of the best method to apply to 30% less data
• If we don’t have much data, our test set might just be lucky or unlucky (“the test-set estimator of performance has high variance”)
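A minimal sketch of the hold-out split; the 70/30 ratio matches the “30% less data” remark above, and the fixed seed is just for reproducibility:

```python
import random

# Test-set method: hold out a fraction of the data for evaluation only.

def train_test_split(data, test_frac=0.3, seed=0):
    rng = random.Random(seed)          # fixed seed -> reproducible split
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)

data = list(range(10))
train, test = train_test_split(data)
```

The model is then fit on `train` only, and its error measured on `test`.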


Outline
• What is regression?
• Minimizing sum-of-square error
• Overfitting and Cross validation for regression
• Test set method
• Leave-one-out
• K-fold cross validation


LOOCV (Leave-one-out Cross Validation)

When you have done all points, report the mean error
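The LOOCV loop can be sketched as follows, using the 1-D least-squares line as the model under test (any fit/predict pair would do; the data here are made up):

```python
# LOOCV: leave each point out in turn, train on the other N-1 points,
# and report the mean of the squared prediction errors.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w1 = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
         / sum((x - mx) ** 2 for x in xs)
    return my - w1 * mx, w1

def loocv_mse(xs, ys):
    errs = []
    for k in range(len(xs)):
        tx = xs[:k] + xs[k+1:]         # temporarily remove (x_k, y_k)
        ty = ys[:k] + ys[k+1:]
        w0, w1 = fit_line(tx, ty)      # train on the remaining N-1 points
        errs.append((ys[k] - (w0 + w1 * xs[k])) ** 2)
    return sum(errs) / len(errs)       # mean error over all points

xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 3.0]              # perfectly linear data
mse = loocv_mse(xs, ys)
```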


LOOCV for Linear Regression

For k=1 to N

1. Let (xk,yk) be the k-th record

2. Temporarily remove (xk,yk) from the dataset

3. Train on the remaining N-1 data points

When you’ve done all points, report the mean error.

MSE_LOOCV = 2.12


The same procedure with a different regression model gives:

MSE_LOOCV = 0.962

LOOCV for Join the Dots

For k=1 to N

1. Let (xk,yk) be the k-th record

2. Temporarily remove (xk,yk) from the dataset

3. Train on the remaining N-1 data points

When you’ve done all points, report the mean error.

MSE_LOOCV = 3.33

Which validation method?

k-fold Cross Validation gets the best of both worlds.

Outline
• What is regression?
• Minimizing sum-of-square error
• Overfittingand Cross validation for regression
• Test set method
• Leave-one-out
• K-fold cross validation


k-fold Cross Validation

Randomly break the dataset into k partitions (in our example we’ll have k = 3 partitions colored Red, Green, and Blue)


For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.

For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points.

For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points.

Then report the mean error.

Linear Regression: MSE_3FOLD = 2.05

Join the Dots: MSE_3FOLD = 2.93
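The three-fold procedure can be sketched as follows. For simplicity this assigns folds deterministically (every k-th point) rather than randomly, and evaluates a 1-D least-squares line on made-up data:

```python
# k-fold cross-validation: each fold plays the held-out ("red/green/
# blue") role once; report the mean of the per-fold errors.

def kfold_mse(xs, ys, k=3):
    n = len(xs)
    fold_errs = []
    for i in range(k):
        test_idx = set(range(i, n, k))   # every k-th point -> fold i
        tx = [x for j, x in enumerate(xs) if j not in test_idx]
        ty = [y for j, y in enumerate(ys) if j not in test_idx]
        # 1-D least-squares line as the model under evaluation
        mx, my = sum(tx) / len(tx), sum(ty) / len(ty)
        w1 = sum((x - mx) * (y - my) for x, y in zip(tx, ty)) \
             / sum((x - mx) ** 2 for x in tx)
        w0 = my - w1 * mx
        err = [(ys[j] - (w0 + w1 * xs[j])) ** 2 for j in test_idx]
        fold_errs.append(sum(err) / len(err))
    return sum(fold_errs) / k            # mean error over the folds

xs = [float(i) for i in range(9)]
ys = [2 * x + 1 for x in xs]             # exactly linear data
mse = kfold_mse(xs, ys)
```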

CV-based Regression Algorithm Choice
• Choosing which regression algorithm to use
• Step 1: Compute the 10-fold-CV error for six different models
• Step 2: Whichever algorithm gave the best CV score: train it with all the data, and that’s the predictive model you’ll use.
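The two-step recipe can be sketched with two toy candidates (a constant and a straight line) scored here by leave-one-out CV; the slide's six models and 10-fold protocol would slot into the same loop:

```python
# Model selection by CV: score each candidate, pick the lowest score,
# then retrain the winner on ALL the data.

def cv_score(fit, predict, xs, ys):      # leave-one-out for simplicity
    errs = []
    for k in range(len(xs)):
        model = fit(xs[:k] + xs[k+1:], ys[:k] + ys[k+1:])
        errs.append((ys[k] - predict(model, xs[k])) ** 2)
    return sum(errs) / len(errs)

def fit_const(xs, ys):                   # candidate 1: constant model
    return sum(ys) / len(ys)

def pred_const(m, x):
    return m

def fit_line(xs, ys):                    # candidate 2: straight line
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w1 = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) \
         / sum((a - mx) ** 2 for a in xs)
    return my - w1 * mx, w1

def pred_line(m, x):
    return m[0] + m[1] * x

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.1, 2.0, 3.9, 6.1, 8.0]           # roughly y = 2x
scores = {"constant": cv_score(fit_const, pred_const, xs, ys),
          "line": cv_score(fit_line, pred_line, xs, ys)}
best = min(scores, key=scores.get)       # step 2: winner is retrained
final_model = fit_line(xs, ys) if best == "line" else fit_const(xs, ys)
```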
