Lab 11




PowerPoint Slideshow about 'Lab 11' - starbuck


Lab 11

Multiple Regression

Residuals and Influence


Multiple regression syntax

Influence analysis

Standardized B weights

Residuals analysis

Multiple Regression Syntax

Proc Reg;

Model dv = iv1 iv2 / stb R influence;

Plot dv*iv1;

Plot dv*iv2;

Plot dv*p.;

Plot p.*r.;

Run;


Example

A record company wants to know whether (1) airplay, (2) attractiveness of the band, and (3) advertising budget each account for significant variance in record sales.


Example Program

data d2;

infile 'C:\WINDOWS\Desktop\lab11.txt';

input adverts sales airplay attract;

Proc Reg;

Model sales = adverts airplay attract / stb R influence;

Plot sales*adverts;

Plot sales*airplay;

Plot sales*attract;

Plot sales*p.;

Plot p.*r.;

Run;







Example Output

Model: MODEL1

Dependent Variable: sales

Analysis of Variance

                           Sum of          Mean
Source             DF     Squares        Square    F Value    Pr > F
Model               3     1722754        574251     261.64    <.0001
Error             396      869150    2194.82212
Corrected Total   399     2591904

Root MSE           46.84893    R-Square    0.6647
Dependent Mean    193.20000    Adj R-Sq    0.6621
Coeff Var          24.24893

Parameter Estimates

                 Parameter    Standard                          Standardized
Variable    DF    Estimate       Error    t Value    Pr > |t|       Estimate
Intercept    1   -26.61294    12.20619      -2.18      0.0298              0
adverts      1     0.08488     0.00487      17.43      <.0001        0.51085
airplay      1     3.36742     0.19542      17.23      <.0001        0.51199
attract      1    11.08634     1.71509       6.46      <.0001        0.19168
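As a quick sanity check on how the summary lines relate to the Analysis of Variance table above, the sketch below recomputes F, Root MSE, R-Square, and Adj R-Sq from the printed sums of squares. It uses Python rather than SAS purely for the arithmetic; the variable names are illustrative, and because the printed sums of squares are rounded, the results agree with the SAS output only to the displayed precision.

```python
import math

# Values copied from the Analysis of Variance table above (rounded as printed).
ss_model, ss_error, ss_total = 1722754, 869150, 2591904
df_model, df_error = 3, 396
df_total = df_model + df_error  # 399, the Corrected Total DF

ms_model = ss_model / df_model   # mean square for the model
ms_error = ss_error / df_error   # mean square error

f_value = ms_model / ms_error                 # F Value
root_mse = math.sqrt(ms_error)                # Root MSE
r_square = ss_model / ss_total                # R-Square
adj_r_square = 1 - ms_error / (ss_total / df_total)  # Adj R-Sq

print(round(f_value, 2), round(root_mse, 2),
      round(r_square, 4), round(adj_r_square, 4))
```

Adjusted R-square is computed here as 1 minus the ratio of mean square error to the total variance of the dependent variable, which is the usual definition.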


Output Residual Analysis (cont)

Model: MODEL1

Dependent Variable: sales

Output Statistics

        Dep Var    Predicted      Std Error                 Std Error     Student
 Obs      sales        Value   Mean Predict     Residual     Residual    Residual
   1   330.0000     229.9206         7.1963     100.0794       46.293       2.162
   2   330.0000     229.9206         7.1963     100.0794       46.293       2.162
   3   120.0000     228.9494         2.9642    -108.9494       46.755      -2.330
   4   120.0000     228.9494         2.9642    -108.9494       46.755      -2.330
   5   360.0000     291.5573         4.7662      68.4427       46.606       1.469
   6   360.0000     291.5573         4.7662      68.4427       46.606       1.469
   7   270.0000     262.9757         3.7127       7.0243       46.702       0.150
   8   270.0000     262.9757         3.7127       7.0243       46.702       0.150
   9   220.0000     225.7525         5.3483      -5.7525       46.543      -0.124
  10   220.0000     225.7525         5.3483      -5.7525       46.543      -0.124
  11   170.0000     141.0950         3.9487      28.9050       46.682       0.619
  12   170.0000     141.0950         3.9487      28.9050       46.682       0.619


What to look for in your outlier results

  • Studentized residuals (each residual divided by its own standard error, which depends on how far the case's X values lie from the mean): identify cases with absolute values greater than 2.

  • Residuals: look for absolute values greater than 2 x Root MSE; here 2 x 46.85 ≈ 94.

  • Once you identify the outliers, investigate their influence on your results.
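The two outlier rules above can be applied to the 12 cases shown in the residual output. The following is a minimal Python sketch (not part of the lab's SAS program) with the values typed in from that output; only these 12 of the 400 observations are listed in the slides, so the result says nothing about the remaining cases.

```python
ROOT_MSE = 46.84893  # Root MSE from the first regression output

# (obs, residual, studentized residual) copied from the Output Statistics table
cases = [
    (1, 100.0794, 2.162), (2, 100.0794, 2.162),
    (3, -108.9494, -2.330), (4, -108.9494, -2.330),
    (5, 68.4427, 1.469), (6, 68.4427, 1.469),
    (7, 7.0243, 0.150), (8, 7.0243, 0.150),
    (9, -5.7525, -0.124), (10, -5.7525, -0.124),
    (11, 28.9050, 0.619), (12, 28.9050, 0.619),
]

raw_cutoff = 2 * ROOT_MSE  # about 94

# Rule 1: absolute studentized residual greater than 2
by_student = [obs for obs, _, student in cases if abs(student) > 2]
# Rule 2: absolute raw residual greater than 2 x Root MSE
by_raw = [obs for obs, resid, _ in cases if abs(resid) > raw_cutoff]

print("studentized rule:", by_student)
print("raw residual rule:", by_raw)
```

Both rules pick out observations 1 through 4 among the cases shown.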


Influence Output

Model: MODEL1

Dependent Variable: sales

Output Statistics

                       Cook's              Hat Diag      Cov
 Obs    -2-1 0 1 2          D   RStudent          H    Ratio    DFFITS
   1  |      |****  |    0.028     2.1720     0.0236   0.9866    0.3376
   2  |      |****  |    0.028     2.1720     0.0236   0.9866    0.3376
   3  |  ****|      |    0.005    -2.3434     0.0040   0.9597   -0.1486
   4  |  ****|      |    0.005    -2.3434     0.0040   0.9597   -0.1486
   5  |      |**    |    0.006     1.4707     0.0104   0.9987    0.1504
   6  |      |**    |    0.006     1.4707     0.0104   0.9987    0.1504
   7  |      |      |    0.000     0.1502     0.0063   1.0163    0.0119
   8  |      |      |    0.000     0.1502     0.0063   1.0163    0.0119
   9  |      |      |    0.000    -0.1234     0.0130   1.0233   -0.0142
  10  |      |      |    0.000    -0.1234     0.0130   1.0233   -0.0142
  11  |      |*     |    0.001     0.6187     0.0071   1.0135    0.0523
  12  |      |*     |    0.001     0.6187     0.0071   1.0135    0.0523


What to look for in your influence results

  • Leverage: an index of how much an observation can affect the regression equation. It is a function solely of the X values.

    • Denoted by Hat Diag H in the output. Look for values greater than 2(k+1)/N, where k = number of independent variables and N = number of observations. Recall that (k+1)/N is the average leverage.

    • Here, 2(3+1)/400 = .02
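To make the leverage rule concrete, here is a small Python sketch (an illustration, not part of the SAS lab) that computes the 2(k+1)/N cutoff and applies it to the Hat Diag H values of the 12 cases listed in the influence output. N = 400 is taken from the Corrected Total DF of 399 plus 1.

```python
K = 3    # independent variables: adverts, airplay, attract
N = 400  # observations (Corrected Total DF 399 + 1)

average_leverage = (K + 1) / N
cutoff = 2 * average_leverage  # 2(k+1)/N

# Hat Diag H for the 12 cases shown in the influence output
hat_diag = {
    1: 0.0236, 2: 0.0236, 3: 0.0040, 4: 0.0040,
    5: 0.0104, 6: 0.0104, 7: 0.0063, 8: 0.0063,
    9: 0.0130, 10: 0.0130, 11: 0.0071, 12: 0.0071,
}

high_leverage = [obs for obs, h in hat_diag.items() if h > cutoff]

print("cutoff:", round(cutoff, 4))
print("high-leverage cases among those shown:", high_leverage)
```

Among the listed cases, only observations 1 and 2 exceed the cutoff. Observations 3 and 4 have large studentized residuals but low leverage.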


Influence (cont)

  • Cook’s D: a measure of the overall influence of a single case on the model. Look for values greater than .2.

  • DFBETA and standardized DFBETA: the change in the regression coefficients when that case is deleted. You can evaluate the influence on both X and Y. Look for values that are large relative to the other values, or look for values greater than:


Influence Output (cont)

Model: MODEL1

Dependent Variable: sales

Output Statistics

        -------------------DFBETAS-------------------
 Obs    Intercept      adverts      airplay      attract
   1      -0.2177      -0.1672       0.1088       0.2438
   2      -0.2177      -0.1672       0.1088       0.2438
   3       0.0089      -0.0889       0.0066      -0.0131
   4       0.0089      -0.0889       0.0066      -0.0131
   5      -0.0267       0.1228       0.0327      -0.0038
   6      -0.0267       0.1228       0.0327      -0.0038
   7      -0.0018       0.0086       0.0024       0.0001
   8      -0.0018       0.0086       0.0024       0.0001
   9      -0.0060       0.0008      -0.0100       0.0095
  10      -0.0060       0.0008      -0.0100       0.0095
  11       0.0465       0.0016      -0.0147      -0.0362
  12       0.0465       0.0016      -0.0147      -0.0362
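One way to apply the "large relative to the other values" rule to the DFBETAS above is to find, for each coefficient, the listed case with the largest absolute value. The Python sketch below does only that, for the 12 cases shown; it does not use a numeric cutoff, and the names in it are illustrative.

```python
names = ["Intercept", "adverts", "airplay", "attract"]

# DFBETAS copied from the table above, one tuple per observation
dfbetas = {
    1: (-0.2177, -0.1672, 0.1088, 0.2438),
    2: (-0.2177, -0.1672, 0.1088, 0.2438),
    3: (0.0089, -0.0889, 0.0066, -0.0131),
    4: (0.0089, -0.0889, 0.0066, -0.0131),
    5: (-0.0267, 0.1228, 0.0327, -0.0038),
    6: (-0.0267, 0.1228, 0.0327, -0.0038),
    7: (-0.0018, 0.0086, 0.0024, 0.0001),
    8: (-0.0018, 0.0086, 0.0024, 0.0001),
    9: (-0.0060, 0.0008, -0.0100, 0.0095),
    10: (-0.0060, 0.0008, -0.0100, 0.0095),
    11: (0.0465, 0.0016, -0.0147, -0.0362),
    12: (0.0465, 0.0016, -0.0147, -0.0362),
}

# For each coefficient, the first observation with the largest |DFBETAS|
most_influential = {
    name: max(dfbetas, key=lambda obs: abs(dfbetas[obs][i]))
    for i, name in enumerate(names)
}

for name, obs in most_influential.items():
    print(name, "-> obs", obs, "value", dfbetas[obs][names.index(name)])
```

Observation 1 (and its duplicate, observation 2) has the largest absolute DFBETAS on every coefficient among the cases shown.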


Delete outliers with studentized residuals greater than 2

data d2;

infile 'C:\WINDOWS\Desktop\lab11.txt';

input adverts sales airplay attract;

if _n_ = 1 then delete;

if _n_ = 2 then delete;

if _n_ = 3 then delete;

if _n_ = 4 then delete;

Proc Reg;

Model sales = adverts airplay attract / stb R influence;

Run;


Output with 4 outliers deleted

Model: MODEL1

Dependent Variable: sales

Analysis of Variance

                           Sum of          Mean
Source             DF     Squares        Square    F Value    Pr > F
Model               3     1719469        573156     272.58    <.0001
Error             392      824249    2102.67719
Corrected Total   395     2543718

Root MSE           45.85496    R-Square    0.6760
Dependent Mean    192.87879    Adj R-Sq    0.6735
Coeff Var          23.77398

Parameter Estimates

                 Parameter    Standard                          Standardized
Variable    DF    Estimate       Error    t Value    Pr > |t|       Estimate
Intercept    1   -21.41404    12.06981      -1.77      0.0768              0
adverts      1     0.08741     0.00480      18.20      <.0001        0.52814
airplay      1     3.32150     0.19177      17.32      <.0001        0.50771
attract      1    10.27947     1.70029       6.05      <.0001        0.17695
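To see what deleting the four outliers changed, the Python sketch below compares this output with the original 400-case output. The numbers are copied from the two listings; the .05 significance level is an assumption, since the slides do not state one.

```python
ALPHA = 0.05  # assumed significance level; not stated in the slides

# Values copied from the two PROC REG listings
full = {"r_square": 0.6647, "intercept_p": 0.0298,
        "b": {"adverts": 0.08488, "airplay": 3.36742, "attract": 11.08634}}
trimmed = {"r_square": 0.6760, "intercept_p": 0.0768,
           "b": {"adverts": 0.08741, "airplay": 3.32150, "attract": 10.27947}}

r_square_change = round(trimmed["r_square"] - full["r_square"], 4)

# Change in each unstandardized slope after deleting the 4 outliers
slope_change = {name: round(trimmed["b"][name] - full["b"][name], 5)
                for name in full["b"]}

intercept_sig_before = full["intercept_p"] < ALPHA
intercept_sig_after = trimmed["intercept_p"] < ALPHA

print("R-square change:", r_square_change)
print("slope changes:", slope_change)
print("intercept significant before/after:",
      intercept_sig_before, intercept_sig_after)
```

R-square rises slightly and the three predictors stay significant at p < .0001 in both runs. Under the assumed .05 level, only the intercept moves from significant to non-significant.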


In class example

The dataset “Data11” contains the following 10 variables:

id na typeAas typeAii errorC learn errorS errorCmm think mngmt

Use PROC REG to regress errorC (error competence; higher values mean more competent) on learn (learning from errors; higher values mean you learn from errors more quickly), think (thinking about errors; higher values mean more time spent thinking about your errors), and mngmt (management’s orientation toward errors; higher values mean more consequences when errors are made).

Which variables are significant predictors of error competence? How many outliers (studentized residuals greater than 2) did you identify? Delete the outliers. Are there changes in the significance of the beta weights or in R-square?

