
# STA 291 Spring 2010 - PowerPoint PPT Presentation



### STA 291 Spring 2010

Lecture 5

Dustin Lueker

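Measures of Central Tendency

• Mean: the arithmetic average, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ for a sample, or $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$ for a population

• Median: the midpoint of the observations when they are arranged in increasing order (the average of the two middle observations when the number of observations is even)

• Mode: the most frequent value

• Notation (subscripted variables):

• n = number of units in the sample

• N = number of units in the population

• x = variable to be measured

• $x_i$ = measurement of the ith unit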
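A minimal Python sketch of the mean and median definitions above, written out by hand so each step is visible. The function names `sample_mean` and `median` are illustrative choices, not part of the lecture; the data are the seven-value set {3, 7, 12, 13, 15, 19, 24} from this lecture's percentile examples and the eight-value set from its mode example.

```python
def sample_mean(xs):
    """Arithmetic average: the sum of the observations divided by n."""
    return sum(xs) / len(xs)


def median(xs):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


data = [3, 7, 12, 13, 15, 19, 24]
print(round(sample_mean(data), 4))         # 13.2857
print(median(data))                        # 13
print(median([2, 2, 4, 5, 5, 6, 10, 11]))  # 5.0 (average of the 4th and 5th values)
```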

STA 291 Spring 2010 Lecture 5


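• Sample variance: $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$

• Sample standard deviation: $s = \sqrt{s^2}$

• Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$

• Population standard deviation: $\sigma = \sqrt{\sigma^2}$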


• Calculate the mean

• For each observation, calculate the deviation from the mean, $x_i - \bar{x}$

• For each observation, calculate the squared deviation

• Add up all the squared deviations

• Divide the result by (n-1) for the sample variance, or by N if you are finding the population variance

• To get the standard deviation, take the square root of the variance
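The steps above, sketched in Python under the assumption that the data arrive as a plain list of numbers. The `population` flag is a hypothetical parameter (not from the lecture) that switches the divisor from n - 1 to N, and the data set is made up so the arithmetic is easy to check by hand.

```python
import math


def variance(xs, population=False):
    """Variance computed step by step, following the recipe on the slide."""
    n = len(xs)
    mean = sum(xs) / n                    # step 1: the mean
    deviations = [x - mean for x in xs]   # step 2: deviation of each observation
    squared = [d * d for d in deviations] # step 3: squared deviations
    total = sum(squared)                  # step 4: add them up
    divisor = n if population else n - 1  # step 5: N for a population, n - 1 for a sample
    return total / divisor


def std_dev(xs, population=False):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(xs, population))


data = [2, 4, 4, 4, 5, 5, 7, 9]              # made-up data: mean 5, squared deviations sum to 32
print(variance(data, population=True))       # 4.0
print(std_dev(data, population=True))        # 2.0
print(round(variance(data), 4))              # 4.5714 (32 / 7)
```

As a cross-check, Python's standard `statistics` module provides `variance` (divides by n - 1) and `pvariance` (divides by N).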


• Empirical rule: if the data are approximately symmetric and bell-shaped, then

• About 68% of the observations are within one standard deviation of the mean

• About 95% of the observations are within two standard deviations of the mean

• About 99.7% of the observations are within three standard deviations of the mean
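Where the 68/95/99.7 figures come from, sketched in Python under the assumption that the data follow an exactly normal distribution: the proportion of observations within k standard deviations of the mean is erf(k/√2), computed here with the standard library's `math.erf`. The helper name `within_k_sd` is illustrative.

```python
import math


def within_k_sd(k):
    """Proportion of a normal distribution lying within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

For real data that are only approximately bell-shaped, the observed proportions will be close to, but not exactly, these values.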


• The pth percentile (Xp) is a number such that p% of the observations take values below it, and (100-p)% take values above it

• 50th percentile = median

• 25th percentile = lower quartile

• 75th percentile = upper quartile

• The index (position in the ordered data) of the pth percentile is $L_p = \frac{(n+1)p}{100}$


• 25th percentile = lower quartile = Q1: (approximately) the median of the observations below the median

• 75th percentile = upper quartile = Q3: (approximately) the median of the observations above the median


• Find the 25th percentile of this data set: {3, 7, 12, 13, 15, 19, 24}


• Interpolation is used when the index is not a whole number

• Start at the value whose position is the whole-number part of the index, then move the decimal part of the way toward the next value

• If the index is 5.4, go to the 5th value, then add 0.4 times the difference between the 6th value and the 5th value

• In essence, we are going to the “5.4th” value


• Find the 40th percentile of the same data set: {3, 7, 12, 13, 15, 19, 24}

• Must use interpolation
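A sketch of the (n+1)p/100 index rule with interpolation in Python, applied to the two percentile exercises above. The function name `percentile` is illustrative, and clamping indices below 1 or above n to the smallest or largest value is an assumption the lecture does not address. Other textbooks and software packages use slightly different percentile conventions, so results may differ from a spreadsheet's or library's built-in function.

```python
import math


def percentile(xs, p):
    """pth percentile via the index L_p = (n + 1)p / 100 with linear interpolation."""
    s = sorted(xs)
    n = len(s)
    index = (n + 1) * p / 100     # 1-based position in the ordered data
    if index <= 1:                # assumption: clamp to the smallest value
        return s[0]
    if index >= n:                # assumption: clamp to the largest value
        return s[-1]
    lower = math.floor(index)     # closest whole-number position below the index
    frac = index - lower          # how far to move toward the next value
    low_val = s[lower - 1]        # 1-based position converted to a 0-based list index
    high_val = s[lower]
    return low_val + frac * (high_val - low_val)


data = [3, 7, 12, 13, 15, 19, 24]
print(percentile(data, 25))            # index 2.0, so the 2nd value: 7.0
print(round(percentile(data, 40), 2))  # index 3.2: 12 + 0.2 * (13 - 12) = 12.2
```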


• Five-number summary: minimum, lower quartile (Q1), median, upper quartile (Q3), maximum

• Example: minimum = 4, Q1 = 256, median = 530, Q3 = 1105, maximum = 320,000

• What does this suggest about the shape of the distribution?


Interquartile Range (IQR)

• The interquartile range (IQR) is the difference between the upper and lower quartiles: IQR = Q3 – Q1

• The IQR is the width of the range of values that contains the middle 50% of the data

• The IQR increases as variability increases

• Example (murder rate data): Q1 = 3.9, Q3 = 10.3

• IQR =


• A box plot displays the five-number summary (and more) graphically

• It consists of a box that contains the central 50% of the distribution (from the lower quartile to the upper quartile), a line within the box that marks the median, and whiskers that extend to the minimum and maximum values

• This assumes there are no outliers in the data set


• An observation is an outlier if it falls more than 1.5 IQR above the upper quartile or more than 1.5 IQR below the lower quartile
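The outlier rule above as a Python sketch; `outlier_fences` and `is_outlier` are illustrative names. It is applied to the five-number summary example from this lecture (Q1 = 256, Q3 = 1105), where the fences show that the maximum of 320,000 is an outlier while the minimum of 4 is not.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence): 1.5 IQR below Q1 and 1.5 IQR above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(x, q1, q3):
    """True if x falls strictly beyond either fence."""
    low, high = outlier_fences(q1, q3)
    return x < low or x > high


# Five-number summary example: min 4, Q1 256, median 530, Q3 1105, max 320,000
low, high = outlier_fences(256, 1105)
print(low, high)                        # -1017.5 2378.5
print(is_outlier(320_000, 256, 1105))   # True
print(is_outlier(4, 256, 1105))         # False
```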


• When there are outliers, the whiskers extend only to the most extreme observations within 1.5 IQR beyond the quartiles

• Each outlier is marked individually by an x, +, or some other identifier


• Values: Min = 148, Q1 = 158, Median = Q2 = 162, Q3 = 182, Max = 204

• Create a box plot
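One way to sketch this box plot without plotting software is a one-line text rendering in Python. The function `ascii_boxplot` and its character scheme (`|` for the minimum and maximum, `[` and `]` for the quartiles, `M` for the median) are illustrative choices, not a standard format. It assumes no outliers, which holds here: IQR = 24, so 1.5 IQR = 36 puts the fences at 122 and 218, outside both 148 and 204.

```python
def ascii_boxplot(minimum, q1, median, q3, maximum, width=41):
    """Draw a one-line text box plot scaled so the minimum is column 0 and the maximum the last column.

    Assumes there are no outliers, so the whiskers run all the way to the min and max.
    """
    def col(value):
        # Map a data value linearly onto a character column 0 .. width - 1.
        return round((value - minimum) / (maximum - minimum) * (width - 1))

    chars = [" "] * width
    for c in range(col(minimum), col(maximum) + 1):
        chars[c] = "-"          # whisker line spans the whole range first
    for c in range(col(q1), col(q3) + 1):
        chars[c] = "="          # box body covers Q1 to Q3 (the middle 50%)
    chars[col(minimum)] = "|"
    chars[col(maximum)] = "|"
    chars[col(q1)] = "["
    chars[col(q3)] = "]"
    chars[col(median)] = "M"    # median drawn last so it stays visible
    return "".join(chars)


print(ascii_boxplot(148, 158, 162, 182, 204))
# |------[==M=============]---------------|
```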


• In right-skewed distributions, the minimum, Q1, and median will be “bunched up”, while Q3 and the maximum will be farther away

• In left-skewed distributions, the mirror image holds: the maximum, Q3, and median will be relatively close together compared to the distances down to Q1 and the minimum

• Symmetric distributions?
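A rough Python heuristic that turns the description above into a check on a five-number summary: it calls a distribution right-skewed when both the median-to-Q3 and Q3-to-maximum gaps exceed their mirror-image gaps on the lower side, and left-skewed in the reverse case. The function name and this all-or-nothing rule are assumptions for illustration, not a formal test of skewness.

```python
def skew_direction(minimum, q1, median, q3, maximum):
    """Compare the gaps above the median with the matching gaps below it."""
    inner_right, inner_left = q3 - median, median - q1
    outer_right, outer_left = maximum - q3, q1 - minimum
    if inner_right > inner_left and outer_right > outer_left:
        return "right-skewed"
    if inner_left > inner_right and outer_left > outer_right:
        return "left-skewed"
    return "no clear skew"   # roughly symmetric, or mixed signals


print(skew_direction(4, 256, 530, 1105, 320_000))  # right-skewed
print(skew_direction(148, 158, 162, 182, 204))     # right-skewed
print(skew_direction(1, 2, 3, 4, 5))               # no clear skew
```

For a symmetric distribution, the matching gaps on either side of the median are about equal, so neither condition holds.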


• The mode is the value that occurs most frequently

• It does not need to be near the center of the distribution, so it is not really a measure of central tendency

• It can be used for all types of data (nominal, ordinal, interval)

• Special cases:

• Data set {2, 2, 4, 5, 5, 6, 10, 11}: Mode =

• Data set {2, 6, 7, 10, 13}: Mode =
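A small Python sketch for finding the mode(s), using `collections.Counter` from the standard library. The function name `modes` is illustrative, and returning an empty list when every value occurs only once encodes one common convention (no mode); Python's own `statistics.multimode`, by contrast, returns every value in that case.

```python
from collections import Counter


def modes(xs):
    """Return every value tied for the highest frequency, or [] if no value repeats."""
    counts = Counter(xs)
    top = max(counts.values())
    if top == 1:
        return []            # convention assumed here: all values unique means no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([2, 2, 4, 5, 5, 6, 10, 11]))  # [2, 5] (two modes: bimodal)
print(modes([2, 6, 7, 10, 13]))           # [] (no value repeats)
```

Because it only counts values, the same function works for nominal data such as category labels.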


• Mean: appropriate for interval data with an approximately symmetric distribution

• Median: appropriate for interval or ordinal data

• Mode: appropriate for all types of data


• The mean is sensitive to outliers; the median and mode are not. Why?

• In general, the median is more appropriate than the mean for skewed data. Why?

• In some situations, the median may be too insensitive to changes in the data

• The mode may not be unique
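A quick Python illustration of the first point, using the standard library's `statistics.mean` and `statistics.median`. The second list is hypothetical: it is the lecture's seven-value data set with 24 misrecorded as 240.

```python
from statistics import mean, median

data = [3, 7, 12, 13, 15, 19, 24]
misrecorded = [3, 7, 12, 13, 15, 19, 240]   # hypothetical typo: 24 entered as 240

print(round(mean(data), 2), median(data))                # 13.29 13
print(round(mean(misrecorded), 2), median(misrecorded))  # 44.14 13
```

The single extreme value pulls the mean from about 13.3 to about 44.1, while the median, which depends only on the middle of the ordered data, stays at 13.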


• “How often do you read the newspaper?”

• Identify the mode

• Identify the median response


• Statistics that describe variability measure the spread of the data

• Two distributions may have the same mean and/or median but different variability

• The mean and median describe only a typical value, not the spread of the data

• Common measures: range, variance, standard deviation, and interquartile range

• All of these can be computed for a sample or for a population


• The range is the difference between the largest and smallest observations

• It is very much affected by outliers; a misrecorded observation may produce an outlier and distort the range

• The range does not always reveal differences in variation about the mean
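A Python sketch of the last point: two made-up samples with the same mean (5) and the same range (10) can still differ in how tightly they cluster about the mean, which the standard deviation picks up and the range does not. `data_range` is an illustrative helper name; `statistics.stdev` is the standard library's sample (n - 1) standard deviation.

```python
from statistics import stdev


def data_range(xs):
    """Range: largest observation minus smallest observation."""
    return max(xs) - min(xs)


# Two made-up samples, both with mean 5 and range 10
a = [0, 5, 5, 5, 10]    # most values sit at the mean
b = [0, 0, 5, 10, 10]   # most values sit at the extremes

print(data_range(a), data_range(b))            # 10 10
print(round(stdev(a), 3), round(stdev(b), 3))  # 3.536 5.0
```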


• Sample 1: smallest observation 112, largest observation 797, Range =

• Sample 2: smallest observation 15033, largest observation 16125, Range =
