## PowerPoint Slideshow about 'STA 291 Spring 2010' - gypsy


Measures of Central Tendency

- Mean: arithmetic average
- Median: midpoint of the observations when they are arranged in increasing order
- Mode: most frequent value

Notation: Subscripted variables

- n = number of units in the sample
- N = number of units in the population
- x = variable to be measured
- $x_i$ = measurement of the ith unit
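The mean and median definitions above can be computed directly. This is a minimal Python sketch. The function names `mean` and `median` are illustrative, not from the lecture. For an even number of observations, the sketch takes the median as the average of the two middle values, a common convention the slide does not spell out.

```python
def mean(data):
    """Arithmetic average: sum of the observations divided by their count."""
    return sum(data) / len(data)


def median(data):
    """Midpoint of the observations once sorted in increasing order."""
    xs = sorted(data)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                  # odd n: the single middle value
    return (xs[mid - 1] + xs[mid]) / 2  # even n: average of the two middle values


sample = [3, 7, 12, 13, 15, 19, 24]
print(mean(sample), median(sample))
```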

STA 291 Spring 2010 Lecture 5

Symbols


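Variance and Standard Deviation

- Sample
  - Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$, where $\bar{x}$ is the sample mean
  - Standard Deviation: $s = \sqrt{s^2}$
- Population
  - Variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$, where $\mu$ is the population mean
  - Standard Deviation: $\sigma = \sqrt{\sigma^2}$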


Variance Step By Step

- Calculate the mean
- For each observation, calculate its deviation from the mean
- For each observation, square the deviation
- Add up all the squared deviations
- Divide the sum by n - 1 for the sample variance, or by N for the population variance

To get the standard deviation, take the square root of the variance.
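The five steps above can be sketched in Python as follows. The function names and the small data set are made up for illustration. The `population` flag switches the final divisor between n - 1 and N.

```python
import math


def variance(data, population=False):
    """Variance following the step-by-step recipe.

    population=False divides by n - 1 (sample variance s^2);
    population=True divides by N (population variance sigma^2).
    """
    n = len(data)
    m = sum(data) / n                        # step 1: the mean
    deviations = [x - m for x in data]       # step 2: deviation of each observation
    squared = [d * d for d in deviations]    # step 3: squared deviations
    total = sum(squared)                     # step 4: sum of squared deviations
    return total / (n if population else n - 1)  # step 5: divide by N or n - 1


def std_dev(data, population=False):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(data, population))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical illustrative data
print(variance(data), variance(data, population=True), std_dev(data, population=True))
```

For this made-up data set the mean is 5 and the squared deviations sum to 32. The population variance is therefore 32 / 8 = 4 (standard deviation 2), and the sample variance is 32 / 7.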


Empirical Rule

- If the data are approximately symmetric and bell-shaped, then
  - about 68% of the observations are within one standard deviation of the mean
  - about 95% of the observations are within two standard deviations of the mean
  - about 99.7% of the observations are within three standard deviations of the mean
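One way to see the rule in action is to simulate bell-shaped data and count how many observations fall within k standard deviations of the mean. This sketch assumes normally distributed data generated with Python's `random.gauss`. The mean 100 and standard deviation 15 are arbitrary choices, and the helper name `fraction_within` is hypothetical. The printed fractions should land near 0.68, 0.95 and 0.997.

```python
import random
import statistics


def fraction_within(data, k):
    """Fraction of observations within k standard deviations of the mean."""
    m = statistics.fmean(data)
    s = statistics.pstdev(data)  # population SD of the simulated values
    return sum(1 for x in data if abs(x - m) <= k * s) / len(data)


random.seed(0)  # fixed seed so the run is reproducible
# Simulated bell-shaped data (assumption: normal with mean 100, SD 15).
bell = [random.gauss(100, 15) for _ in range(20_000)]

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(bell, k):.3f}")
```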



Percentiles

- The pth percentile ($X_p$) is a number such that p% of the observations take values below it and (100 - p)% take values above it
  - 50th percentile = median
  - 25th percentile = lower quartile
  - 75th percentile = upper quartile
- The index (position in the ordered data) of the pth percentile is
  - $L_p = \frac{(n+1)p}{100}$


Quartiles

- 25th percentile
  - lower quartile
  - Q1
  - (approximately) median of the observations below the median
- 75th percentile
  - upper quartile
  - Q3
  - (approximately) median of the observations above the median


Example

- Find the 25th percentile of this data set:
  - {3, 7, 12, 13, 15, 19, 24}
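Applying the index formula from the percentile slide to this data set ($n = 7$, $p = 25$) gives a whole-number index, so no interpolation is needed:

$$
L_{25} = \frac{(7+1)(25)}{100} = 2 \quad\Rightarrow\quad X_{25} = \text{2nd ordered value} = 7
$$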


Interpolation

- Use when the index is not a whole number
- Start at the observation whose position is the whole-number part of the index, then move the fractional part of the way toward the next observation
- If the index is 5.4, go to the 5th value, then add 0.4 of the difference between the 5th and 6th values
- In essence, we are going to the "5.4th" value
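The index rule and the interpolation step can be combined into one small Python function. This is a sketch, and the name `percentile` is hypothetical. Clamping an index below 1 or above n to the smallest or largest observation is an assumption, since the slides do not say what to do in those cases.

```python
import math


def percentile(data, p):
    """pth percentile using the (n + 1)p/100 index with linear interpolation."""
    xs = sorted(data)
    n = len(xs)
    index = (n + 1) * p / 100   # 1-based position in the ordered data
    if index <= 1:
        return xs[0]            # assumption: clamp to the smallest value
    if index >= n:
        return xs[-1]           # assumption: clamp to the largest value
    lower = math.floor(index)   # whole-number part, e.g. 5 for 5.4
    frac = index - lower        # fractional part, e.g. 0.4 for 5.4
    below, above = xs[lower - 1], xs[lower]  # the lower-th and (lower+1)-th values
    return below + frac * (above - below)


data = [3, 7, 12, 13, 15, 19, 24]
print(percentile(data, 25), percentile(data, 40))
```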


Example

- Find the 40th percentile of the same data set:
  - {3, 7, 12, 13, 15, 19, 24}
  - Must use interpolation
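Here the index falls between the 3rd value (12) and the 4th value (13), so the percentile lies 0.2 of the way from 12 to 13:

$$
L_{40} = \frac{(7+1)(40)}{100} = 3.2 \quad\Rightarrow\quad X_{40} = 12 + 0.2\,(13 - 12) = 12.2
$$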


Data Summary

- Five Number Summary
  - Minimum
  - Lower Quartile
  - Median
  - Upper Quartile
  - Maximum
- Example
  - minimum = 4
  - Q1 = 256
  - median = 530
  - Q3 = 1105
  - maximum = 320,000
  - What does this suggest about the shape of the distribution?
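A five-number summary can be sketched with Python's standard library. `statistics.quantiles` with `method="exclusive"` (its default) places cut points at positions (n + 1)p/100 with linear interpolation, so its quartiles should agree with the index rule on the percentile slides. The helper name `five_number_summary` is hypothetical.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # The "exclusive" method uses (n + 1)-based positions with interpolation.
    q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
    return min(data), q1, q2, q3, max(data)


print(five_number_summary([3, 7, 12, 13, 15, 19, 24]))
```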


Interquartile Range (IQR)

- The interquartile range (IQR) is the difference between the upper and lower quartiles
  - IQR = Q3 - Q1
  - The IQR is the length of the interval that contains the middle 50% of the data
  - The IQR increases as variability increases
- Murder Rate Data
  - Q1 = 3.9
  - Q3 = 10.3
  - IQR =
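For the murder rate data, the subtraction works out as:

$$
\text{IQR} = Q_3 - Q_1 = 10.3 - 3.9 = 6.4
$$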


Box Plot

- Displays the five-number summary (and more) graphically
- Consists of
  - a box that contains the central 50% of the distribution (from the lower quartile to the upper quartile),
  - a line within the box that marks the median,
  - and whiskers that extend to the maximum and minimum values
- This assumes there are no outliers in the data set


Outliers

- An observation is an outlier if it falls
  - more than 1.5 IQR above the upper quartile, or
  - more than 1.5 IQR below the lower quartile
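The outlier rule translates into two fences, Q1 - 1.5 IQR and Q3 + 1.5 IQR. In this sketch the function names are hypothetical. The check uses the five values from the earlier five-number-summary example (minimum 4, Q1 256, median 530, Q3 1105, maximum 320,000).

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) = (Q1 - 1.5 IQR, Q3 + 1.5 IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Observations that fall strictly beyond either fence."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]


# Five-number-summary example: minimum, Q1, median, Q3, maximum.
summary = [4, 256, 530, 1105, 320_000]
print(outlier_fences(256, 1105))          # fences are -1017.5 and 2378.5
print(find_outliers(summary, 256, 1105))  # only the maximum lies beyond a fence
```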


Box Plot

- Whiskers only extend to the most extreme observations within 1.5 IQR beyond the quartiles
- If an observation is an outlier, it is marked by an x, +, or some other identifier


Example: Create a box plot

- Values
  - Min = 148
  - Q1 = 158
  - Median = Q2 = 162
  - Q3 = 182
  - Max = 204
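Before drawing the box plot, it helps to check these values against the outlier rule:

$$
\begin{aligned}
\text{IQR} &= 182 - 158 = 24, \qquad 1.5 \times \text{IQR} = 36\\
\text{lower fence} &= 158 - 36 = 122, \qquad \text{upper fence} = 182 + 36 = 218
\end{aligned}
$$

The minimum (148) is above 122 and the maximum (204) is below 218, so no observation is flagged as an outlier. The whiskers therefore run all the way out to 148 and 204.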


5 Number Summary/Box Plot

- On right-skewed distributions, the minimum, Q1, and the median will be “bunched up”, while Q3 and the maximum will be farther away.
- For left-skewed distributions, the mirror image holds: the maximum, Q3, and the median will be relatively close together compared to the corresponding distances to Q1 and the minimum.
- Symmetric distributions?
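The comparison described above can be turned into a rough heuristic. The sketch below compares the spread above the median with the spread below it, once for the box (Q1 to Q3) and once for the full range (minimum to maximum). The function name `skew_hint` is hypothetical. The 10% tolerance for calling two spreads "about equal" is an arbitrary assumption. This is a reading aid, not a formal skewness measure.

```python
def skew_hint(minimum, q1, median, q3, maximum, tol=0.1):
    """Heuristic skewness label from a five-number summary.

    tol is a relative tolerance (arbitrary choice) for treating the spreads
    above and below the median as about equal.
    """
    def side(above, below):
        if abs(above - below) <= tol * max(above, below, 1e-12):
            return 0                  # spreads about equal
        return 1 if above > below else -1

    votes = side(q3 - median, median - q1) + side(maximum - median, median - minimum)
    if votes > 0:
        return "right-skewed"
    if votes < 0:
        return "left-skewed"
    return "roughly symmetric or mixed"


# Five-number-summary example from the Data Summary slide.
print(skew_hint(4, 256, 530, 1105, 320_000))
```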


Mode

- Value that occurs most frequently
- Does not need to be near the center of the distribution
  - Not really a measure of central tendency
- Can be used for all types of data (nominal, ordinal, interval)
- Special Cases
  - Data Set
    - {2, 2, 4, 5, 5, 6, 10, 11}
    - Mode =
  - Data Set
    - {2, 6, 7, 10, 13}
    - Mode =
  - Data Set
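A small Python helper can check special cases like these. The name `modes` is hypothetical, and the sketch rests on one stated assumption. When every value occurs exactly once, it reports no mode (an empty list), which is one common convention; textbooks differ on this case. When several values tie for the highest count, it returns all of them. Because it only counts values, it also works for nominal data.

```python
from collections import Counter


def modes(data):
    """All values tied for the highest frequency.

    Assumed convention: if every value occurs exactly once, return [] (no mode).
    """
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)


print(modes([2, 2, 4, 5, 5, 6, 10, 11]))
print(modes([2, 6, 7, 10, 13]))
```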


Mean vs. Median vs. Mode

- Mean
  - Interval data with an approximately symmetric distribution
- Median
  - Interval or ordinal data
- Mode
  - All types of data


Mean vs. Median vs. Mode

- Mean is sensitive to outliers
  - Median and mode are not
  - Why?
- In general, the median is more appropriate for skewed data than the mean
  - Why?
- In some situations, the median may be too insensitive to changes in the data
- The mode may not be unique
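A quick experiment makes the sensitivity difference concrete. The sketch uses Python's `statistics` module on a made-up data set and replaces its largest value with an extreme one, such as a misrecorded observation.

```python
import statistics

values = [1, 2, 3, 4, 5]
# Same data with the largest value replaced by an extreme (e.g. misrecorded) one.
with_outlier = [1, 2, 3, 4, 500]

for label, data in (("original", values), ("with outlier", with_outlier)):
    print(label, "mean:", statistics.mean(data), "median:", statistics.median(data))
```

The single extreme value pulls the mean from 3 up to 102. The median stays at 3, because it depends only on the middle of the ordered data.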


Example

- “How often do you read the newspaper?”

- Identify the mode
- Identify the median response


Measures of Variation

- Statistics that describe variability
  - Two distributions may have the same mean and/or median but different variability
  - The mean and median only describe a typical value, not the spread of the data
- Range
- Variance
- Standard Deviation
- Interquartile Range
- All of these can be computed for the sample or the population


Range

- Difference between the largest and smallest observation
- Very much affected by outliers
  - A misrecorded observation may lead to an outlier and affect the range
- The range does not always reveal different variation about the mean


Example

- Sample 1
  - Smallest Observation: 112
  - Largest Observation: 797
  - Range =
- Sample 2
  - Smallest Observation: 15033
  - Largest Observation: 16125
  - Range =
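The two ranges work out as:

$$
\text{Range}_1 = 797 - 112 = 685, \qquad \text{Range}_2 = 16125 - 15033 = 1092
$$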

