
Linear Regression Models

Andy Wang

CIS 5930-03

Computer Systems Performance Analysis



Linear Regression Models

  • What is a (good) model?

  • Estimating model parameters

  • Allocating variation

  • Confidence intervals for regressions

  • Verifying assumptions visually



What Is a (Good) Model?

  • For correlated data, model predicts response given an input

  • Model should be equation that fits data

  • Standard definition of “fits” is least-squares

    • Minimize squared error

    • Keep mean error zero

    • Minimizes variance of errors



Least-Squared Error

  • If the estimate is ŷi = b0 + b1xi, then the error in the estimate for xi is ei = yi − ŷi

  • Minimize the Sum of Squared Errors (SSE): SSE = Σei²

  • Subject to the constraint that the mean error is zero: Σei = 0
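
A minimal sketch of this computation in Python (the data values and the candidate parameters below are hypothetical, just to show what is being minimized):

    # Hypothetical observations and a candidate line y = b0 + b1*x (illustration only)
    xs = [2.0, 4.0, 6.0, 8.0, 10.0]
    ys = [0.9, 1.6, 2.0, 2.8, 3.2]
    b0, b1 = 0.3, 0.3

    errors = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]   # e_i = y_i - y_hat_i
    sse = sum(e * e for e in errors)                       # quantity least squares minimizes
    mean_error = sum(errors) / len(errors)                 # least squares drives this to zero

    print(sse, mean_error)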



Estimating Model Parameters

  • Best regression parameters are

        b1 = (Σxy − n·x̄·ȳ) / (Σx² − n·x̄²)

        b0 = ȳ − b1·x̄

    where x̄ and ȳ are the sample means of x and y

  • Note error in book!
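
As a sketch, these formulas translate directly into Python (the function and variable names here are mine, not from the slides):

    def fit_line(xs, ys):
        """Least-squares estimates of b0 and b1 for y = b0 + b1*x."""
        n = len(xs)
        x_bar = sum(xs) / n
        y_bar = sum(ys) / n
        sum_xy = sum(x * y for x, y in zip(xs, ys))
        sum_x2 = sum(x * x for x in xs)
        b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
        b0 = y_bar - b1 * x_bar
        return b0, b1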


Parameter Estimation Example

  • Execution time of a script for various loop counts:

  • x̄ = 6.8, ȳ = 2.32, Σxy = 88.7, Σx² = 264

  • b1 = (88.7 − 5(6.8)(2.32)) / (264 − 5(6.8)²) = 0.30

  • b0 = 2.32 − (0.30)(6.8) = 0.28
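
Plugging the sums above into those formulas is a quick arithmetic check (n = 5 for this example):

    n, x_bar, y_bar = 5, 6.8, 2.32
    sum_xy, sum_x2 = 88.7, 264.0

    b1 = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
    b0 = y_bar - b1 * x_bar
    print(round(b1, 2), round(b0, 2))   # 0.3 0.28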



Graph of Parameter Estimation Example



Allocating Variation

  • If no regression, best guess of y is the mean ȳ

  • Observed values of y differ from ȳ, giving rise to errors (variance)

  • Regression gives better guess, but there are still errors

  • We can evaluate quality of regression by allocating sources of errors



The Total Sum of Squares

  • Without regression, squared error is SST = Σ(yi − ȳ)² = Σyi² − n·ȳ²


The Sum of Squares from Regression

  • Recall that regression error is SSE = Σ(yi − ŷi)² = Σei²

  • Error without regression is SST

  • So regression explains SSR = SST - SSE

  • Regression quality measured by coefficient of determination R² = SSR/SST


Evaluating Coefficient of Determination

  • Compute SST = Σy² − n·ȳ²

  • Compute SSE = Σy² − b0·Σy − b1·Σxy

  • Compute R² = (SST − SSE) / SST

  • where R = R(x,y) = correlation(x,y)
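
A sketch of the same three steps in Python (it reuses the b0, b1 estimates from the earlier fit; names are illustrative):

    def allocate_variation(xs, ys, b0, b1):
        """Return SST, SSE, SSR, and R^2 for a fitted line y = b0 + b1*x."""
        n = len(ys)
        y_bar = sum(ys) / n
        sum_y2 = sum(y * y for y in ys)
        sst = sum_y2 - n * y_bar ** 2
        sse = sum_y2 - b0 * sum(ys) - b1 * sum(x * y for x, y in zip(xs, ys))
        ssr = sst - sse
        return sst, sse, ssr, ssr / sst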


Example of Coefficient of Determination

  • For previous regression example

    • Σy = 11.60, Σy² = 29.88, Σxy = 88.7, n·ȳ² = 26.9

    • SSE = 29.88 − (0.28)(11.60) − (0.30)(88.7) = 0.028

    • SST = 29.88 − 26.9 = 2.97

    • SSR = 2.97 − 0.03 = 2.94

    • R² = (2.97 − 0.03)/2.97 = 0.99


Standard Deviation of Errors

  • Variance of errors is SSE divided by degrees of freedom

    • DOF is n − 2 because we’ve calculated 2 regression parameters from the data

    • So variance (mean squared error, MSE) is SSE/(n − 2)

  • Standard dev. of errors is the square root: se = √(SSE/(n − 2)) (minor error in book)
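
In code this is one line on top of the SSE already computed (a sketch; the 0.03 and n = 5 come from the running example):

    import math

    def stddev_of_errors(sse, n):
        """s_e = sqrt(SSE / (n - 2)); n - 2 because b0 and b1 were estimated from the data."""
        return math.sqrt(sse / (n - 2))

    print(round(stddev_of_errors(0.03, 5), 2))   # 0.1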


Checking Degrees of Freedom

  • Degrees of freedom always equate:

    • SS0 has 1 (computed from ȳ)

    • SST has n − 1 (computed from data and ȳ, which uses up 1)

    • SSE has n − 2 (needs 2 regression parameters)

    • So SSR has (n − 1) − (n − 2) = 1


Example of Standard Deviation of Errors

  • For regression example, SSE was 0.03, so MSE is 0.03/3 = 0.01 and se = 0.10

  • Note high quality of our regression:

    • R2 = 0.99

    • se = 0.10



Confidence Intervals for Regressions

  • Regression is done from a single population sample (size n)

    • Different sample might give different results

    • True model is y = β0 + β1x

    • Parameters b0 and b1 are really means taken from a population sample


Calculating Intervals for Regression Parameters

  • Standard deviations of parameters:

        sb0 = se · √(1/n + x̄² / (Σx² − n·x̄²))

        sb1 = se / √(Σx² − n·x̄²)

  • Confidence intervals are bi ∓ t·sbi, where t has n − 2 degrees of freedom

    • Not divided by sqrt(n)
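
A sketch of these intervals in Python; it assumes scipy is available for the t-quantile, and the helper names are mine:

    import math
    from scipy import stats   # used only for the t-quantile

    def parameter_intervals(xs, b0, b1, se, confidence=0.90):
        """Confidence intervals for b0 and b1 (t-quantile with n - 2 DOF)."""
        n = len(xs)
        x_bar = sum(xs) / n
        sxx = sum(x * x for x in xs) - n * x_bar ** 2
        s_b0 = se * math.sqrt(1.0 / n + x_bar ** 2 / sxx)
        s_b1 = se / math.sqrt(sxx)
        t = stats.t.ppf(1 - (1 - confidence) / 2, n - 2)
        return ((b0 - t * s_b0, b0 + t * s_b0),
                (b1 - t * s_b1, b1 + t * s_b1))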



Example of Regression Confidence Intervals

  • Recall se = 0.10, n = 5, Σx² = 264, x̄ = 6.8

  • So sb0 = 0.10·√(1/5 + 6.8²/(264 − 5(6.8)²)) = 0.13 and sb1 = 0.10/√(264 − 5(6.8)²) = 0.017

  • Using 90% confidence level, t0.95;3 = 2.353



Regression Confidence Example, cont’d

  • Thus, b0 interval is 0.28 ∓ (2.353)(0.13) ≈ (−0.03, 0.59)

    • Not significant at 90%

  • And b1 is 0.30 ∓ (2.353)(0.017) ≈ (0.26, 0.34)

    • Significant at 90% (and would survive even 99.9% test)


Confidence Intervals for Predictions

  • Previous confidence intervals are for parameters

    • How certain can we be that the parameters are correct?

  • Purpose of regression is prediction

    • How accurate are the predictions?

    • Regression gives mean of predicted response, based on sample we took



Predicting m Samples

  • Standard deviation for mean of future sample of m observations at xp is

        sŷp = se · √(1/m + 1/n + (xp − x̄)² / (Σx² − n·x̄²))

  • Note deviation drops as m → ∞

  • Variance minimal at xp = x̄

  • Use t-quantiles with n–2 DOF for interval
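
A sketch of the prediction interval under the same assumptions (scipy for the t-quantile; m = 1 gives the interval for a single future observation):

    import math
    from scipy import stats

    def prediction_interval(x_p, m, xs, b0, b1, se, confidence=0.90):
        """Interval for the mean of m future observations at x_p."""
        n = len(xs)
        x_bar = sum(xs) / n
        sxx = sum(x * x for x in xs) - n * x_bar ** 2
        y_hat = b0 + b1 * x_p
        s_pred = se * math.sqrt(1.0 / m + 1.0 / n + (x_p - x_bar) ** 2 / sxx)
        t = stats.t.ppf(1 - (1 - confidence) / 2, n - 2)
        return y_hat - t * s_pred, y_hat + t * s_pred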


Example of Confidence of Predictions

  • Using previous equation, what is predicted time for a single run of 8 loops?

  • Time = 0.28 + 0.30(8) = 2.68

  • Standard deviation of errors se = 0.10

  • 90% interval is then 2.68 ∓ (2.353)(0.10)·√(1 + 1/5 + (8 − 6.8)²/(264 − 5(6.8)²)) ≈ (2.42, 2.94)



Prediction Confidence



Verifying Assumptions Visually

  • Regressions are based on assumptions:

    • Linear relationship between response y and predictor x

      • Or nonlinear relationship used in fitting

    • Predictor x nonstochastic and error-free

    • Model errors statistically independent

      • With distribution N(0,c) for constant c

  • If assumptions violated, model misleading or invalid



Testing Linearity

  • Scatter plot x vs. y to see basic curve type

    (Example scatter plots: Linear, Piecewise Linear, Outlier, Nonlinear (Power))


Testing Independence of Errors

  • Scatter-plot residuals ei versus predicted values ŷi

  • Should be no visible trend

  • Example from our curve fit:
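
One way to produce this kind of plot (a sketch using matplotlib; the residuals come from whatever fit you have):

    import matplotlib.pyplot as plt

    def residual_plot(xs, ys, b0, b1):
        """Scatter residuals e_i against predicted values to look for trends."""
        preds = [b0 + b1 * x for x in xs]
        resid = [y - p for y, p in zip(ys, preds)]
        plt.scatter(preds, resid)
        plt.axhline(0.0, linewidth=1)
        plt.xlabel("predicted value")
        plt.ylabel("residual")
        plt.show()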



More on Testing Independence

  • May be useful to plot error residuals versus experiment number

    • In previous example, this gives same plot except for x scaling

  • No foolproof tests

    • “Independence” test really disproves particular dependence

    • Maybe next test will show different dependence



Testing for Normal Errors

  • Prepare quantile-quantile plot of errors

  • Example for our regression:
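
One way to make such a plot (a sketch; scipy's probplot draws a normal quantile-quantile plot of the residuals against theoretical quantiles):

    import matplotlib.pyplot as plt
    from scipy import stats

    def qq_plot(residuals):
        """Quantile-quantile plot of residuals against the normal distribution."""
        stats.probplot(residuals, dist="norm", plot=plt)
        plt.show()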



Testing for Constant Standard Deviation

  • Tongue-twister: homoscedasticity

  • Return to independence plot

  • Look for trend in spread

  • Example:


Linear Regression Can Be Misleading

  • Regression throws away some information about the data

    • To allow more compact summarization

  • Sometimes vital characteristics are thrown away

    • Often, looking at data plots can tell you whether you will have a problem


Example of Misleading Regression

       I             II             III            IV
   x      y      x      y      x      y      x      y
  10   8.04     10   9.14     10   7.46      8   6.58
   8   6.95      8   8.14      8   6.77      8   5.76
  13   7.58     13   8.74     13  12.74      8   7.71
   9   8.81      9   8.77      9   7.11      8   8.84
  11   8.33     11   9.26     11   7.81      8   8.47
  14   9.96     14   8.10     14   8.84      8   7.04
   6   7.24      6   6.13      6   6.08      8   5.25
   4   4.26      4   3.10      4   5.39     19  12.50
  12  10.84     12   9.13     12   8.15      8   5.56
   7   4.82      7   7.26      7   6.42      8   7.91
   5   5.68      5   4.74      5   5.73      8   6.89



What Does Regression Tell Us About These Data Sets?

  • Exactly the same thing for each!

  • N = 11

  • Mean of y = 7.5

  • y = 3 + 0.5x

  • Standard error of the slope estimate is 0.118

  • All the sums of squares are the same

  • Correlation coefficient = 0.82

  • R² = 0.67
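
These summaries are easy to reproduce from the table above (a sketch; the regression and R² formulas are the ones from the earlier slides):

    x_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
    quartet = {
        "I":   (x_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
        "II":  (x_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
        "III": (x_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
        "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
    }

    for name, (xs, ys) in quartet.items():
        n = len(xs)
        x_bar, y_bar = sum(xs) / n, sum(ys) / n
        sxx = sum(x * x for x in xs) - n * x_bar ** 2
        b1 = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / sxx
        b0 = y_bar - b1 * x_bar
        sst = sum(y * y for y in ys) - n * y_bar ** 2
        sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
        r2 = (sst - sse) / sst
        print(f"{name:>3}: y = {b0:.2f} + {b1:.2f}x, R^2 = {r2:.2f}")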



Now Look at the Data Plots

    (Scatter plots of data sets I, II, III, and IV)

