# Influential Points and Outliers

Debbi Amanti

OUTLIERS:
• Data points more than two or three standard deviations from the mean of the data.
• Observations that differ significantly from the pattern of the REST OF THE DATA
• Observations that lie outside the overall pattern of the other observations.
OUTLIERS IN TERMS OF REGRESSION:
• Observations with large (in absolute value) residuals.
• Observations falling far from the regression line while not following the pattern of the relationship apparent in the others.
• Residual = actual - fitted
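
The residual definition above can be sketched in a few lines of Python. This is only an illustration: the data values and the helper names `fit_line` and `residuals` are made up for the example and are not from the presentation.

```python
def fit_line(xs, ys):
    """Return the least-squares slope and intercept for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys):
    """Residual = actual - fitted, for every observation."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data: the point at x = 4 sits well above the trend of the others.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.1, 5.9, 14.0, 10.1, 12.0]

res = residuals(xs, ys)
# Index of the observation with the largest residual in absolute value.
largest = max(range(len(res)), key=lambda i: abs(res[i]))
print(f"Largest |residual| is at x = {xs[largest]}: {res[largest]:.2f}")
```

The observation with the largest absolute residual is the natural first candidate for a regression outlier, although how large counts as "large" remains a judgment call.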

Find the interquartile range, a.k.a. IQR (Q3 - Q1), and multiply this value by 1.5. An outlier for a data set is any point:

• Greater than Q3+1.5*(IQR)
• Less than Q1-1.5*(IQR)
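
As a rough sketch, the 1.5*IQR rule above could be coded as follows. The data set is hypothetical, and the quartiles are taken as the medians of the lower and upper halves of the sorted data (one common textbook convention, assumed here; other software may compute quartiles slightly differently).

```python
from statistics import median


def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data.

    The middle value is left out of both halves when n is odd.
    Assumes at least two values.
    """
    s = sorted(data)
    half = len(s) // 2
    return median(s[:half]), median(s[-half:])


def iqr_outliers(data):
    """Return the lower fence, upper fence, and the values outside them."""
    q1, q3 = quartiles(data)
    step = 1.5 * (q3 - q1)
    low_fence = q1 - step
    high_fence = q3 + step
    flagged = [v for v in data if v < low_fence or v > high_fence]
    return low_fence, high_fence, flagged


# Hypothetical data set (not from the presentation).
values = [2, 3, 4, 5, 6, 7, 8, 9, 30]
low, high, flagged = iqr_outliers(values)
print(f"fences: {low} to {high}, outliers: {flagged}")
```

Note that a value exactly equal to a fence is not flagged, because the rule uses strict inequalities.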
INFLUENTIAL POINTS ARE:
• Points whose removal would greatly affect the association of two variables
• Points whose removal would significantly change the slope of an LSR line
• Points with a large moment (i.e., they are far away from the rest of the data in the x direction).
• Usually outliers in the x direction.

The two graphs below show the same data – the one on the right with the removal of the green data point. As you can see, the removal of this point significantly affects the slope of the regression line. This is an influential point!
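
The same idea can be checked numerically by fitting the least-squares line with and without a suspect point. The sketch below uses made-up data (not the data from the slides) in which the last point lies far to the right of the others; `ls_slope` is a hypothetical helper name.

```python
def ls_slope(points):
    """Least-squares slope for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Hypothetical data: the last point is far from the rest in the x direction.
points = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6), (15, 3)]

slope_all = ls_slope(points)
slope_without = ls_slope(points[:-1])  # refit after removing the last point

print(f"slope with the point:    {slope_all:.3f}")
print(f"slope without the point: {slope_without:.3f}")
```

With this data the slope is close to zero when the far-right point is included and clearly positive once it is removed, which is the behavior the slide describes for an influential point.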

Using the same data as shown on the previous slide, let’s compare the x and y data sets for the presence of outliers:

X DATA
• Q1 = 3, Q3 = 8, IQR = 5
• MIN = 1, MAX = 15.5
• An outlier is any point > Q3 + 1.5*IQR = 15.5 or < Q1 - 1.5*IQR = -4.5
• The maximum (15.5) equals the upper cutoff but does not exceed it, so THERE ARE NO OUTLIERS IN THIS DATA SET!

Y DATA
• Q1 = 4, Q3 = 9, IQR = 5
• MIN = 2, MAX = 10
• An outlier is any point > Q3 + 1.5*IQR = 16.5 or < Q1 - 1.5*IQR = -3.5
• THERE ARE NO OUTLIERS IN THIS DATA SET!
!!!REMEMBER!!!

An observation does NOT have to be an Outlier to be an Influential Point!! Nor does an observation need to be an Influential Point in order to be an Outlier!!

Given the five-number summary {8 21 35 43 77}, which of the following is correct?

A. There are no outliers

B. There are at least two outliers

C. There is not enough data to make any conclusion

D. There is exactly one outlier

E. There is at least one outlier

The correct answer is E

The five number summary gives you

{Min Q1 Median Q3 Max}

The IQR is calculated by Q3-Q1

So, the IQR for the given data is 43-21=22

An outlier for this data would be:

>Q3+1.5*IQR or <Q1-1.5*IQR

>43+(22*1.5)=76 or <21-(22*1.5)=-12

Since the max is 77, which is greater than 76, there must be at least one outlier in this data set, but we cannot conclude how many outliers without more data.
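
The arithmetic in this answer can be double-checked with a short sketch that works directly from a five-number summary. The function name `check_summary` is made up for illustration. Because only the summary is available, the code can say whether at least one outlier exists on each side, never how many.

```python
def check_summary(minimum, q1, med, q3, maximum):
    """Apply the 1.5*IQR rule to a five-number summary.

    The median is accepted for completeness but is not needed for the fences.
    """
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "iqr": iqr,
        "low_fence": low_fence,
        "high_fence": high_fence,
        # The min (max) is an outlier exactly when it falls outside a fence.
        "low_outlier": minimum < low_fence,
        "high_outlier": maximum > high_fence,
    }


# Five-number summary from the question: {Min Q1 Median Q3 Max}
result = check_summary(8, 21, 35, 43, 77)
print(result)
```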

Given the following scatterplot and residual plot, which of the following is true about the yellow data point?

I. It is an influential point

II. It is an outlier with respect to the regression model

III. It appears to be an outlier in the x direction

A. I only

B. I and II

C. I and III

D. None of the above

E. All of the above

The correct answer is C

I. Because this point has a large moment and is far from the rest of the data, it is an influential point. If this point were removed, the slope of the line would markedly change.

II. This point is not an outlier with respect to the model because as you can see in the residual plot, it does not have a large residual (It follows the regression pattern of the data).

III. By looking at both the scatterplot and the residual plot, you can see that the yellow point is an outlier in the x direction (far right of the rest of the data).

Resources used in this presentation include:
• Workshop Statistics by Allan Rossman
• The Basic Practice of Statistics by David S. Moore
• AMSCO’s AP Statistics by James Bohan
• For any further questions, email me at: debora_amanti@bbns.org