- 204 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about 'AMCS/CS 340: Data Mining' - manton

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

RegressionAMCS/CS 340: Data Mining

Xiangliang Zhang

King Abdullah University of Science and Technology

Outline

- What is regression?
- Minimizing sum-of-square error
- Linear regression
- Nonlinear regression
- Statistic models for regression
- Overfitting and Cross validation for regression

2

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Classification (reminder)

X Y

- Anything:
- continuous (,d, …)
- discrete ({0,1}, {1,…k}, …)
- structured (tree, string, …)
- …

- discrete:
- {0,1} binary
- {1,…k} multi-class
- tree, etc. structured

3

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Regression

X Y

- continuous:
- , d

- Anything:
- continuous (,d, …)
- discrete ({0,1}, {1,…k}, …)
- structured (tree, string, …)
- …

- discrete:
- {0,1} binary
- {1,…k} multi-class
- tree, etc. structured

4

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Examples

- Data Predicting or Forecasting:
- Processes, memory Power consumption
- Protein structure Energy
- Heart-beat rate, age, speed, duration Fat
- Oil supply, consumption, etc Oil price
- ……

5

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Linear regression

40

26

24

Temperature

22

20

20

30

40

20

30

20

10

0

10

0

10

20

0

0

Given examples

given a new point

Predict

6

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Outline

- What is regression?
- Minimizing sum-of-square error
- Linear regression
- Nonlinear regression
- Statistic models for regression
- Overfitting and Cross validation for regression

8

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Sum-of-Squares Error Function

Observation

Prediction

Error or “residual”

minimizing

Sum squared error

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Sum-of-Squares Error Function

Minimizing E(w)

to find w*

E(w) ---- quadratic function of w

If derivative of E(w) w.r.t. w--- linear of w

unique solution for minimizing E(w)

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Least Squares

Error or “residual”

Observation

Prediction

0

0

20

Estimate the unknown parameter w by minimizing

11

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Minimize the sum squared error

Sum squared error

=0

Predict

http://www.lri.fr/~xlzhang/KAUST/CS340_slides/linear_regression_demo1.m

12

Linear regression models

Generally

where Фj(x)are known as basis functions.

Typically, Ф0(x) = 1, so that w0 acts as a bias.

e.g. = [0 x x2 x3 ]

Linear regression models

Example: Polynomial Curve Fitting

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Basis Functions

Polynomial basis functions: These are global; a small change in xaffectsall basis functions.

Gaussianbasis functions:

Sigmoidal basis functions:

where

Gaussian and Sigmoidal basis functions are local; a small change in x only affect nearby basis functions. μj and s control location and scale (width or slope).

Minimize the sum squared error

Sum squared error

=0

Predict

http://www.lri.fr/~xlzhang/KAUST/CS340_slides/linear_regression_demo2.m

16

Batch gradient descent algorithm (1)

If derivative of E(w) w.r.t. wis not linear of w,

Batch gradient descent

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Batch gradient descent algorithm (2)

Batch gradient descent example

Update the value of w according to the gradient

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Batch gradient descent algorithm (3)

Batch gradient descent example

Iteratively approaches the optimum of the Error function

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Statistical models for regression

- Machine Learning paradigms
- Regression Tree (CART)
- Neural Networks
- Support Vector Machine
- Strength
- flexibility within a nonparametric, assumption-free openwork that accommodates big data
- Weakness
- difficulty in interpreting the effects of each covariate

20

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

SVM Regression

Given training data

Find: ,

such that optimally describes the data:

Sum of errors

●

●

●

Subject to:

●

●

●

●

“Support vectors”

22

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Outline

- What is regression?
- Minimizing sum-of-square error
- Overfittingand Cross validation for regression

24

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Regression overfitting

- Which one is the best ?
- The one with best fit to the data ?
- How well is it going to predict future data drawn from the same distribution?

x

25

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Outline

- What is regression?
- Minimizing sum-of-square error
- Overfittingand Cross validation for regression
- Test set method
- Leave-one-out
- K-fold cross validation

26

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

The test set method

- Pros
- very simple
- Can then simply choose the method with the best test-set score
- Cons
- Wastes data: we get an estimate of the best method to apply to 30% less data
- If we don’t have much data, our test-set might just be lucky or unlucky (“test-set estimator of performancehas high variance”)

30

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Outline

- What is regression?
- Minimizing sum-of-square error
- Overfittingand Cross validation for regression
- Test set method
- Leave-one-out
- K-fold cross validation

31

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

LOOCV (Leave-one-out Cross Validation)

When you have done all points, report the mean error

36

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

LOOCV for Linear Regression

For k=1 to N

1. Let (xk,yk) be the k-th record

2. Temporarily remove (xk,yk) from the dataset

3. Train on the remaining N-1 data points

4. Note your error (xk,yk).

When you’ve done all points, report the mean error.

MSELOOCV = 2.12

37

LOOCV for Quadratic Regression

For k=1 to N

1. Let (xk,yk) be the k-th record

2. Temporarily remove (xk,yk) from the dataset

3. Train on the remaining N-1 data points

4. Note your error (xk,yk).

When you’ve done all points, report the mean error.

MSELOOCV = 0.962

38

LOOCV for Join the Dots

For k=1 to N

1. Let (xk,yk) be the k-th record

2. Temporarily remove (xk,yk) from the dataset

3. Train on the remaining N-1 data points

4. Note your error (xk,yk).

When you’ve done all points, report the mean error.

MSELOOCV = 3.33

39

Which one of validations ?

k-fold Cross Validation

gets the best of both worlds

40

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Outline

- What is regression?
- Minimizing sum-of-square error
- Overfittingand Cross validation for regression
- Test set method
- Leave-one-out
- K-fold cross validation

41

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

k-fold Cross Validation

Randomly break the dataset into k partitions (in our example we’ll have k=3 partitions colored RedGreen and Blue)

42

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

k-fold Cross Validation

Randomly break the dataset into k partitions (in our example we’ll have k=3 partitions colored RedGreen and Blue)

For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.

43

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

k-fold Cross Validation

Randomly break the dataset into k partitions (in our example we’ll have k=3 partitions colored RedGreen and Blue)

For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.

For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points.

44

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

k-fold Cross ValidationRandomly break the dataset into k partitions (in our example we’ll have k=3 partitions colored RedGreen and Blue)

For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.

For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points.

For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points.

45

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

k-fold Cross ValidationRandomly break the dataset into k partitions (in our example we’ll have k=3 partitions colored RedGreen and Blue)

For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.

For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points.

For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points.

Then report the mean error.

Linear Regression MSE3FOLD = 2.05

46

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

k-fold Cross ValidationRandomly break the dataset into k partitions (in our example we’ll have k=3 partitions colored RedGreen and Blue)

For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.

For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points.

For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points.

Then report the mean error.

Quadratic Regression MSE3FOLD = 1.11

47

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

k-fold Cross ValidationRandomly break the dataset into k partitions (in our example we’ll have k=3 partitions colored RedGreen and Blue)

For the red partition: Train on all the points not in the red partition. Find the test-set sum of errors on the red points.

For the green partition: Train on all the points not in the green partition. Find the test-set sum of errors on the green points.

For the blue partition: Train on all the points not in the blue partition. Find the test-set sum of errors on the blue points.

Then report the mean error.

Join the Dots

MSE3FOLD = 2.93

48

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

CV-based Regression Algorithm Choice

- Choosing which regression algorithm to use
- Step 1: Compute 10-fold-CV error for six different model
- Step 2: Whichever algorithm gave best CV score: train it with all the data, and that’s the predictive model you’ll use.

49

Xiangliang Zhang, KAUST AMCS/CS 340: Data Mining

Download Presentation

Connecting to Server..