LECTURE 14 OUTLIERS AND MULTICOLLINEARITY





## PowerPoint Slideshow about ' LECTURE 14 OUTLIERS AND MULTICOLLINEARITY' - leonard-rowe


• OUTLIER ANALYSIS
• 1. VISUAL DISPLAY
• 2. INTERACTIVE INSPECTION:

http://www.stat.uiuc.edu/~stat100/java/guess/PPApplet.html

OUTLIERS
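• LEVERAGE (single predictor):

$h_{ii} = \frac{1}{n} + \frac{(X_i - M_X)^2}{\sum_{j=1}^{n} (X_j - M_X)^2}$

where $X_i$ is case i's score and $M_X$ is the predictor mean. $h_{ii}$ should be close to its minimum, 1/n, which is reached when $X_i = M_X$; it grows as the score moves away from the mean.

• Centered: $h^*_{ii} = h_{ii} - 1/n$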
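The leverage formula can be checked with a short Python sketch, assuming a single predictor with an intercept. The function names `leverage` and `centered_leverage` and the scores are hypothetical.

```python
# Leverage for a single-predictor regression with an intercept (illustrative sketch).

def leverage(x):
    """Return h_ii = 1/n + (x_i - mean)^2 / sum of squared deviations, for each case."""
    n = len(x)
    mean_x = sum(x) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)  # sum of squared deviations of the predictor
    return [1 / n + (xi - mean_x) ** 2 / ss_x for xi in x]


def centered_leverage(x):
    """Return h*_ii = h_ii - 1/n, which is 0 for a case sitting at the predictor mean."""
    n = len(x)
    return [h - 1 / n for h in leverage(x)]


# Hypothetical predictor scores; the last case lies far from the others.
scores = [2.0, 3.0, 4.0, 5.0, 6.0, 15.0]
h = leverage(scores)
h_centered = centered_leverage(scores)
for case, (hi, hc) in enumerate(zip(h, h_centered), start=1):
    print(f"case {case}: h = {hi:.3f}, centered h = {hc:.3f}, 1/n = {1 / len(scores):.3f}")
```

With one predictor the leverages sum to 2 (intercept plus slope), so their average is 2/n; the last case, far from the mean, stands well above 1/n.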
OUTLIERS
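• Test (studentized deleted residual): $t_i = \frac{e_i/(1 - h_{ii})}{\sqrt{MS_{res(i)}/(1 - h_{ii})}} = \frac{e_i}{\sqrt{MS_{res(i)}\,(1 - h_{ii})}}$
• Where $e_i = Y_i - \hat{Y}_i$ is the ordinary residual, $e_i/(1 - h_{ii})$ equals the deleted residual $Y_i - \hat{Y}_{i(i)}$ (case i predicted from the model fitted with case i removed), and $MS_{res(i)}$ is the residual mean square of that model; with k predictors, $t_i$ has n - k - 2 df
• SPSS: take case i out, run the analysis, and use SAVE to store the residuals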
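As a sketch of the deleted-case test, assuming one predictor with an intercept, the statistic can be computed two ways: by actually dropping case i and refitting, and from the full fit as t_i = e_i / sqrt(MS_res(i) (1 - h_ii)), using the identity SSE(i) = SSE - e_i^2 / (1 - h_ii). All function names and data are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def leverage_of(x, i):
    """h_ii for case i in a single-predictor model."""
    n = len(x)
    mx = sum(x) / n
    return 1 / n + (x[i] - mx) ** 2 / sum((xj - mx) ** 2 for xj in x)


def t_deleted_by_refit(x, y, i):
    """Drop case i, refit, and studentize the deleted residual Y_i - Yhat_i(i)."""
    xd, yd = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
    b0, b1 = fit_line(xd, yd)
    deleted_resid = y[i] - (b0 + b1 * x[i])
    ms_res_del = sum((yj - b0 - b1 * xj) ** 2 for xj, yj in zip(xd, yd)) / (len(xd) - 2)
    # The deleted residual has variance sigma^2 / (1 - h_ii), with h_ii from the full data.
    return deleted_resid / math.sqrt(ms_res_del / (1 - leverage_of(x, i)))


def t_deleted_closed_form(x, y, i):
    """Same statistic from the full fit: e_i / sqrt(MS_res(i) * (1 - h_ii))."""
    b0, b1 = fit_line(x, y)
    e = [yj - b0 - b1 * xj for xj, yj in zip(x, y)]
    h_ii = leverage_of(x, i)
    sse_del = sum(ej ** 2 for ej in e) - e[i] ** 2 / (1 - h_ii)  # SSE without case i
    ms_res_del = sse_del / (len(x) - 3)  # n - k - 2 df with k = 1 predictor
    return e[i] / math.sqrt(ms_res_del * (1 - h_ii))


# Hypothetical data; the last Y lies well above the trend of the others.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 2.0, 2.9, 4.2, 4.8, 6.1, 7.0, 12.0]
t_refit = [t_deleted_by_refit(x, y, i) for i in range(len(x))]
t_closed = [t_deleted_closed_form(x, y, i) for i in range(len(x))]
for case, t in enumerate(t_closed, start=1):
    print(f"case {case}: t = {t:.2f}")
```

The closed form avoids n refits, which is why packages can report this statistic for every case in one pass.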
OUTLIERS
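• MAHALANOBIS distance of a case's IV scores from the centroid (vector of means) of the IVs; unlike Euclidean distance, it is scaled by the IV covariance matrix $S$: $D_i^2 = (\mathbf{x}_i - \bar{\mathbf{x}})^\top S^{-1} (\mathbf{x}_i - \bar{\mathbf{x}}) = (n - 1)\,h^*_{ii}$
• Cook's D: $C_i = \frac{\sum_{j=1}^{n} (\hat{Y}_j - \hat{Y}_{j(i)})^2}{(k + 1)\,MS_{res}}$, where $\hat{Y}_{j(i)}$ is the prediction for case j from the model fitted without case i and k + 1 is the number of coefficients (k predictors plus the intercept)
• $DFFITS_i = \frac{\hat{Y}_i - \hat{Y}_{i(i)}}{\sqrt{MS_{res(i)}\,h_{ii}}}$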
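A minimal sketch of these three influence measures for one predictor, computed by deleting each case in turn. It assumes an intercept, so Cook's D divides by p = 2 coefficients; with a single IV the Mahalanobis distance reduces to a squared z-score, which equals (n - 1) times the centered leverage. `influence_table` and the data are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b1 * mx, b1


def influence_table(x, y):
    """Per case: (h_ii, Mahalanobis D^2, Cook's D, DFFITS), refitting without each case."""
    n, p = len(x), 2  # p = number of coefficients (intercept + slope)
    b0, b1 = fit_line(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    ms_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted)) / (n - p)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    var_x = sxx / (n - 1)  # sample variance of the single IV
    table = []
    for i in range(n):
        h_ii = 1 / n + (x[i] - mx) ** 2 / sxx
        d2 = (x[i] - mx) ** 2 / var_x  # Mahalanobis D^2 reduces to a squared z-score here
        xd, yd = x[:i] + x[i + 1:], y[:i] + y[i + 1:]
        c0, c1 = fit_line(xd, yd)
        fitted_del = [c0 + c1 * xj for xj in x]  # predictions for every case without case i
        ms_res_del = sum((yj - c0 - c1 * xj) ** 2 for xj, yj in zip(xd, yd)) / (n - 1 - p)
        cooks_d = sum((f - fd) ** 2 for f, fd in zip(fitted, fitted_del)) / (p * ms_res)
        dffits = (fitted[i] - fitted_del[i]) / math.sqrt(ms_res_del * h_ii)
        table.append((h_ii, d2, cooks_d, dffits))
    return table, ms_res


# Hypothetical data: the last case is far out on X and below the line of the rest.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 15.0]
y = [1.2, 1.9, 3.1, 3.9, 5.2, 5.8, 7.1, 9.0]
table, ms_res = influence_table(x, y)
for case, (h_ii, d2, cooks_d, dffits) in enumerate(table, start=1):
    print(f"case {case}: h={h_ii:.3f} D2={d2:.2f} Cook={cooks_d:.3f} DFFITS={dffits:.2f}")
```

Brute-force deletion is slow for large n but makes the definitions explicit; the results match the usual closed form for Cook's D, e_i^2 h_ii / (p MS_res (1 - h_ii)^2).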
OUTLIERS
• SPSS: GENERAL LINEAR MODEL OPTIONS: ‘SAVE’

(check ‘Leverage values’ and ‘Cook’s distance’ to get $h_{ii}$ and C)

Plot C and h against case number

OUTLIERS – WHAT TO DO
• DELETE
• REVISE MODEL
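• TRANSFORM VARIABLES (LOG, SQRT, LOGIT, ARCSINE, ETC.)
• ROBUST METHODS:
• LTS (LEAST TRIMMED SQUARES): minimize the sum of only the smallest squared residuals, so the worst-fitting cases do not affect the fit
• VARIANT: WINSORIZE (replace the top 5% and bottom 5% of values with the most extreme remaining values; removing them outright is trimming)
• M-ESTIMATION: weight each case's contribution to least squares by its deviation from the regression line, down-weighting cases with large residuals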
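Two of these robust options can be sketched in Python. Winsorizing here follows its usual definition (extreme values are replaced, not deleted). For M-estimation the sketch swaps in one common choice, Huber weights with tuning constant c = 1.345 and a median-absolute-residual scale, fitted by iteratively reweighted least squares; these settings, the function names, and the data are assumptions of this sketch, not taken from the lecture.

```python
import statistics


def winsorize(values, proportion=0.05):
    """Pull the lowest and highest `proportion` of values in to the nearest retained value."""
    n = len(values)
    k = int(proportion * n + 1e-9)  # cases replaced in each tail; epsilon guards float error
    if k == 0:
        return list(values)
    ordered = sorted(values)
    low, high = ordered[k], ordered[n - k - 1]
    return [min(max(v, low), high) for v in values]


def weighted_line(x, y, w):
    """Weighted least squares for y = b0 + b1*x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def huber_line(x, y, c=1.345, iterations=50):
    """M-estimation with Huber weights, fitted by iteratively reweighted least squares."""
    b0, b1 = weighted_line(x, y, [1.0] * len(x))  # first pass is ordinary least squares
    for _ in range(iterations):
        resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        scale = statistics.median([abs(r) for r in resid]) / 0.6745  # robust residual scale
        if scale == 0:
            break
        # Weight 1 for small standardized residuals, c/|u| for large ones.
        w = [1.0 if abs(r / scale) <= c else c / abs(r / scale) for r in resid]
        b0, b1 = weighted_line(x, y, w)
    return b0, b1


# Hypothetical data near y = 2x + 1, with one gross outlier in the last case.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.1, 0.0]
y = [2 * xi + 1 + ei for xi, ei in zip(x, noise)]
y[-1] += 30.0
ols = weighted_line(x, y, [1.0] * len(x))
robust = huber_line(x, y)
print("OLS slope:", round(ols[1], 3), "Huber slope:", round(robust[1], 3))
print("Winsorized Y:", winsorize(y, 0.10))
```

The outlier drags the OLS slope well above 2, while the Huber fit gives that case a tiny weight and stays close to the trend of the other nine.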
MULTICOLLINEARITY
• EXACT COLLINEARITY: one IV is predicted perfectly (R² = 1) from another IV or a set of other IVs
• MULTICOLLINEARITY: high but not perfect correlation between one IV and another IV or a set of other IVs
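Exact collinearity with a set of IVs need not show up in any single pairwise correlation. In this hypothetical Python sketch, x3 is exactly x1 + x2: every pairwise r stays below 1, yet the determinant of the IVs' correlation matrix is zero, so the regression coefficients cannot be uniquely estimated.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    saa = sum((ai - ma) ** 2 for ai in a)
    sbb = sum((bi - mb) ** 2 for bi in b)
    return sab / math.sqrt(saa * sbb)


def det3_correlation(r12, r13, r23):
    """Determinant of the 3 x 3 correlation matrix of three IVs."""
    return 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2


# Hypothetical IVs: x3 is exactly x1 + x2, so it is predicted perfectly from the set {x1, x2}.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
x3 = [a + b for a, b in zip(x1, x2)]
r12, r13, r23 = correlation(x1, x2), correlation(x1, x3), correlation(x2, x3)
print(f"pairwise r: {r12:.3f}, {r13:.3f}, {r23:.3f}")
print(f"determinant of correlation matrix: {det3_correlation(r12, r13, r23):.2e}")
```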
MULTICOLLINEARITY Measures
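• VIF - Variance Inflation Factor

$VIF_i = \frac{1}{1 - R^2_{i \cdot 1,2,3,\ldots,k}}$

where $R^2_{i \cdot 1,2,3,\ldots,k}$ is the R-square from predicting IV i from all the other predictors

• TOLERANCE

$= 1 - R^2_{i \cdot 1,2,3,\ldots,k} = 1 / VIF_i$

• CONDITION INDEX

$= \sqrt{\lambda_{max} / \lambda_{min}}$

= square root of the largest eigenvalue over the smallest, using the eigenvalues of the scaled cross-products matrix of the predictors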
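A self-contained Python sketch of the three measures, assuming they are computed from the predictors' correlation matrix: VIF_i is the i-th diagonal element of the inverse correlation matrix (equivalent to 1 / (1 - R²_i)), and the condition index uses its eigenvalues, found here by Jacobi rotations. Statistics packages such as SPSS may instead use a scaled, uncentered cross-products matrix that includes the intercept, so their condition indices can differ. All names and data are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Correlation matrix of predictor columns (each a list of scores)."""
    n = len(columns[0])
    unit = []
    for col in columns:
        m = sum(col) / n
        length = math.sqrt(sum((v - m) ** 2 for v in col))
        unit.append([(v - m) / length for v in col])  # centered, scaled to unit length
    k = len(columns)
    return [[sum(a * b for a, b in zip(unit[i], unit[j])) for j in range(k)] for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def jacobi_eigenvalues(matrix, sweeps=100, tol=1e-20):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, in ascending order."""
    a = [list(row) for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def collinearity_diagnostics(columns):
    """VIF, tolerance, and condition index computed from the predictors' correlation matrix."""
    r = correlation_matrix(columns)
    r_inv = invert(r)
    vif = [r_inv[j][j] for j in range(len(r))]  # diagonal of R^-1 equals 1 / (1 - R^2_j)
    tolerance = [1 / v for v in vif]
    eig = jacobi_eigenvalues(r)
    condition_index = math.sqrt(eig[-1] / eig[0])
    return vif, tolerance, condition_index


# Hypothetical predictors: x2 nearly duplicates x1; x3 is a third IV.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.1, 2.3, 2.8, 4.2, 5.1, 5.7, 7.2, 7.9]
x3 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
vif, tolerance, condition_index = collinearity_diagnostics([x1, x2, x3])
for j, (v, tol_j) in enumerate(zip(vif, tolerance), start=1):
    print(f"x{j}: VIF = {v:.2f}, tolerance = {tol_j:.3f}")
print(f"condition index = {condition_index:.2f}")
```

With exactly two IVs the sketch reduces to the familiar closed forms: VIF = 1 / (1 - r²) and, from eigenvalues 1 ± r, a condition index of sqrt((1 + |r|) / (1 - |r|)).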

CRITICAL CONDITIONS
• VIF - Variance Inflation Factor > 10
• TOLERANCE = 1 / VIF < .10
• CONDITION INDEX > 30
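These rules of thumb can be applied mechanically. The sketch below, with a hypothetical function name and hypothetical diagnostic values, flags each IV whose VIF exceeds 10 (the same test as tolerance below .10) and a condition index above 30.

```python
def collinearity_flags(vif, condition_index, vif_cutoff=10.0, ci_cutoff=30.0):
    """List the IVs and the condition index that exceed the critical values."""
    flags = []
    for j, v in enumerate(vif, start=1):
        tolerance = 1 / v
        if v > vif_cutoff:  # same test as tolerance < 1 / vif_cutoff = .10
            flags.append(f"IV {j}: VIF = {v:.1f} (tolerance = {tolerance:.3f})")
    if condition_index > ci_cutoff:
        flags.append(f"condition index = {condition_index:.1f}")
    return flags


# Hypothetical diagnostics for three IVs.
for line in collinearity_flags([1.8, 14.2, 12.5], condition_index=34.0):
    print(line)
```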
FIXING MULTICOLLINEARITY
• REVISE MODEL
• NEW DATA
• RIDGE REGRESSION: SPSS Macro
• PRINCIPAL COMPONENTS REGRESSION
• STANDARDIZE PREDICTORS
• GET PRINCIPAL COMPONENT WEIGHTS
• CREATE NEW PRINCIPAL COMPONENT SCORES, USE AS PREDICTORS (dropping components with near-zero eigenvalues, which carry the collinearity)
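For the special case of exactly two IVs, both fixes have closed forms, sketched below in Python. Ridge coefficients are computed on standardized variables as (R + λI)⁻¹ r_xy, with λ = 0.5 chosen arbitrarily; this is not a reproduction of the SPSS ridge macro. For principal components regression, the two standardized IVs have a correlation matrix whose leading eigenvector is (1, ±1)/√2, so the steps listed above reduce to a few lines. With more IVs a general eigen-decomposition would be needed. All names and data are hypothetical.

```python
import math


def standardize(values):
    """z-scores using the sample standard deviation."""
    n = len(values)
    m = sum(values) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / sd for v in values]


def corr(a, b):
    """Pearson correlation via z-scores."""
    za, zb = standardize(a), standardize(b)
    return sum(p * q for p, q in zip(za, zb)) / (len(a) - 1)


def ridge_betas(x1, x2, y, lam):
    """Standardized ridge coefficients for two IVs: (R + lam*I)^-1 r_xy."""
    r12, r1y, r2y = corr(x1, x2), corr(x1, y), corr(x2, y)
    a, b = 1 + lam, r12  # R + lam*I = [[a, b], [b, a]]
    det = a * a - b * b
    return ((a * r1y - b * r2y) / det, (a * r2y - b * r1y) / det)


def pcr_first_component(x1, x2, y):
    """Principal components regression keeping only the first component of two IVs."""
    z1, z2, zy = standardize(x1), standardize(x2), standardize(y)  # step 1: standardize
    sign = 1.0 if corr(x1, x2) >= 0 else -1.0
    # Step 2: for two standardized IVs, the largest-eigenvalue eigenvector of R is (1, +-1)/sqrt(2).
    w1, w2 = 1 / math.sqrt(2), sign / math.sqrt(2)
    scores = [w1 * a + w2 * b for a, b in zip(z1, z2)]  # step 3: component scores
    gamma = sum(s * t for s, t in zip(scores, zy)) / sum(s * s for s in scores)
    return (gamma * w1, gamma * w2), scores  # implied betas for z1 and z2


# Hypothetical, strongly correlated IVs.
x1 = [2.0, 3.1, 4.2, 4.8, 6.1, 7.0, 7.9, 9.2]
x2 = [2.2, 2.9, 4.5, 5.0, 5.8, 7.3, 8.1, 8.8]
y = [5.1, 6.8, 9.0, 9.9, 12.2, 14.1, 15.8, 18.0]
print("OLS betas (lam = 0):", ridge_betas(x1, x2, y, 0.0))
print("ridge betas (lam = 0.5):", ridge_betas(x1, x2, y, 0.5))
betas_pcr, scores = pcr_first_component(x1, x2, y)
print("PCR betas:", betas_pcr)
```

Setting λ = 0 recovers the ordinary standardized betas; larger λ shrinks the coefficient vector, trading a little bias for much lower variance when the IVs are nearly collinear.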