## PowerPoint Slideshow about 'Section 1.2' - zuzela


Presentation Transcript

### Section 1.2

Describing Distributions with Numbers

Quantitative Data

- Measuring Center
- Mean
- Median

- Measuring Spread
- Quartiles
- Five Number Summary
- Standard deviation

- Boxplots

Measures of Center

- The mean
- The arithmetic mean of a data set (average value)
- Denoted by $\bar{x}$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Calculations

- Mean highway mileage for the 19 2-seaters:
- Average: 25.8 miles/gallon

- Issue here: the Honda Insight gets 68 miles/gallon!
- Excluding it, the mean mileage is only 23.4 mpg
- What does this say about the mean?

- Problem: The mean is easily influenced by outliers. It is NOT a resistant measure of center.

Median

- Median is the midpoint of a distribution.
- Resistant or robust measure of center.
- i.e., not sensitive to extreme observations
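To make the contrast concrete, the sketch below first re-derives the mean without the Honda Insight from the rounded figures quoted on the slide (19 cars, mean 25.8 mpg, one car at 68 mpg). It then compares mean and median on a small sample. The `sample` values are hypothetical and chosen only for illustration; they are not the actual two-seater data.

```python
from statistics import mean, median

# Rounded figures quoted on the slide: 19 two-seaters, mean 25.8 mpg,
# one of which (the Honda Insight) gets 68 mpg.
n, mean_all, insight = 19, 25.8, 68

# Remove the Insight from the total and average the remaining 18 cars.
mean_without = (n * mean_all - insight) / (n - 1)
print(round(mean_without, 1))  # close to the slide's 23.4 mpg, up to rounding

# Hypothetical sample (NOT the real data) with one extreme value at the end.
sample = [21, 23, 24, 25, 26, 68]
trimmed = sample[:-1]  # same sample without the extreme value

# The mean shifts a lot when the extreme value is dropped; the median barely moves.
print(mean(sample), median(sample))
print(mean(trimmed), median(trimmed))
```

Dropping a single extreme observation moves the mean by several mpg, while the median changes only slightly. That is what "resistant" means here.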

Mean vs. Median

- In a symmetric distribution, mean = median
- In a skewed distribution, the mean is further out in the long tail than the median.
- Example: house prices are usually right skewed
- The mean price of existing houses sold in 2000 in Indiana was 176,200. (Mean chases the right tail)
- The median price of these houses was only 139,000.

Measures of spread

- Quartiles: together with the median, they divide the ordered data into four equal parts
- pth percentile – p percent of the observations fall at or below it.
- Median – 50th percentile
- First Quartile (Q1) – 25th percentile (median of the lower half of data)
- Third Quartile (Q3) – 75th percentile (median of the upper half of data)

Calculating median

The median is always at position (n+1)/2 in the ordered data

Example: Data: 1 2 3 4 5 6 7 8 9

(n+1)/2 = 5, so the median is the 5th observation

Median = 5

Example: Data: 1 2 3 4 5 6 7 8 9 10

(n+1)/2 = 5.5, so the median falls halfway between the 5th and 6th observations

Median = average of 5 and 6 = 5.5
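The position rule above can be sketched in a few lines of Python. The function name `median_by_position` is made up for this illustration; in practice `statistics.median` from the standard library gives the same result.

```python
def median_by_position(data):
    """Median located at position (n + 1) / 2 of the sorted data (1-based)."""
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    pos = (n + 1) / 2          # ends in .5 when n is even
    lo = int(pos)              # whole-number position at or just below pos
    if pos == lo:              # n odd: a single middle observation
        return xs[lo - 1]      # convert 1-based position to 0-based index
    # n even: average the two observations on either side of pos
    return (xs[lo - 1] + xs[lo]) / 2

print(median_by_position(range(1, 10)))   # 5
print(median_by_position(range(1, 11)))   # 5.5
```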

Calculating Quartiles:

Example: Data: 1 2 3 4 5 6 7 8 9

Median = 5 = “Q2”

Q1 is the median of the lower half =

Q3 is the median of the upper half =

Example: Data: 1 2 3 4 5 6 7 8 9 10

Median = 5.5

Q1 =

Q3 =
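One way to check answers to the two exercises above is the short sketch below. It assumes the convention that, when n is odd, the overall median is left out of both halves; the slides do not say which convention they follow, and software packages often use slightly different rules. The helper name `quartiles` is illustrative.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    Assumption: for odd n the middle observation belongs to neither half.
    Requires at least two observations.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]        # observations below the median position
    upper = xs[n - half:]    # observations above the median position
    return median(lower), median(xs), median(upper)

print(quartiles(range(1, 10)))   # data 1..9
print(quartiles(range(1, 11)))   # data 1..10
```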

Five-Number Summary

- 5 numbers
- Minimum
- Q1
- Median
- Q3
- Maximum

Find the 5-Number Summaries

Example:

Data: 26 13 35 76 44 58

Example:

Data: 84 89 89 64 78
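A minimal sketch for checking both exercises, assuming the median-of-halves rule for the quartiles (with the middle observation excluded from both halves when n is odd). The function name `five_number_summary` is chosen for this illustration.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for at least two observations."""
    xs = sorted(data)
    half = len(xs) // 2
    q1 = median(xs[:half])              # median of the lower half
    q3 = median(xs[len(xs) - half:])    # median of the upper half
    return xs[0], q1, median(xs), q3, xs[-1]

# The two data sets from the slide.
for data in ([26, 13, 35, 76, 44, 58], [84, 89, 89, 64, 78]):
    print(five_number_summary(data))
```

Sorting first matters: both data sets are given unordered, and every entry of the summary is defined on the ordered data.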

Boxplot

- Visual representation of the five-number summary.
- Central box: Q1 to Q3
- Line inside box: Median
- Extended straight lines: lowest to highest observation, except outliers

Find the five-number summary and make a boxplot

Numbers of home runs that Hank Aaron hit in each of his 23 years in the Major Leagues:

10 12 13 20 24 26 27 29 30 32 34 34 38 39 39 40 40 44 44 44 44 45 47

Criterion for suspected outliers

- Interquartile Range (IQR) = Q3 - Q1
- An observation is a suspected outlier if it is:
- greater than Q3 + 1.5*IQR, or
- less than Q1 – 1.5*IQR

Criterion for suspected outliers

- Are there any outliers?

Criterion for suspected outliers

- Find 5 number summary:

| Min | Q1 | Median | Q3 | Max |
| --- | --- | --- | --- | --- |
| 1 | 54.5 | 103.5 | 200 | 2631 |

- Are there any outliers?
- Q3 – Q1 = 200 – 54.5 = 145.5
- Multiply by 1.5: 145.5*1.5 = 218.25

- Add to Q3: 200 + 218.25 = 418.25
- Anything higher is a high outlier: 7 observations.

- Subtract from Q1: 54.5 – 218.25 = -163.75
- Anything lower is a low outlier: no observations.
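The arithmetic on this slide can be reproduced directly from the five-number summary. This is only a sketch: the raw observations are not included in the transcript, so the code can confirm the fences and whether each side has at least one suspected outlier, but not the count of seven.

```python
# Five-number summary as given on the slide.
minimum, q1, med, q3, maximum = 1, 54.5, 103.5, 200, 2631

iqr = q3 - q1              # interquartile range
step = 1.5 * iqr           # distance from each quartile to its fence
high_fence = q3 + step
low_fence = q1 - step
print(iqr, step, high_fence, low_fence)

# The extremes alone show whether each side has at least one suspected outlier.
has_high_outlier = maximum > high_fence
has_low_outlier = minimum < low_fence
print(has_high_outlier, has_low_outlier)
```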

Criterion for suspected outliers

- Seven high outliers circled…

Modified Boxplot

- Has outliers as dots or stars.
- The line extends only to the first non-outlier.

Standard deviation

- Deviation: $x_i - \bar{x}$
- Variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
- Standard deviation: $s = \sqrt{s^2}$

Data: 1792 1666 1362 1614 1460 1867 1439

Mean = 1600

- Find the deviations from the mean:

Deviation1 = 1792 – 1600 = 192

Deviation2 = 1666 – 1600 = 66

…

Deviation7 = 1439 – 1600 = -161

- Square the deviations.
- Add them up and divide the sum by n – 1 = 6; this gives $s^2$.
- Take the square root: standard deviation = s = 189.24
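The same steps, written out in Python on the seven values from the slide, may help in checking the hand calculation. It is a step-by-step sketch rather than the recommended way to compute s; a library routine would normally be used.

```python
from math import sqrt

data = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
n = len(data)

x_bar = sum(data) / n                       # mean of the observations
deviations = [x - x_bar for x in data]      # step 1: deviations from the mean
squared = [d ** 2 for d in deviations]      # step 2: square each deviation
variance = sum(squared) / (n - 1)           # step 3: divide the sum by n - 1
s = sqrt(variance)                          # step 4: take the square root

print(x_bar, round(s, 2))
```

The deviations always sum to zero, which is why they are squared before averaging.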

Properties of the standard deviation

- Standard deviation is always non-negative
- s = 0 when there is no spread
- s is not resistant to presence of outliers
- 5-number summary usually better describes a skewed distribution or a distribution with outliers.

- Use s as the measure of spread when the mean is the measure of center
- Mean and standard deviation are usually used for reasonably symmetric distributions without outliers.

Find the mean and standard deviation.

Numbers of home runs that Hank Aaron hit in each of his 23 years in the Major Leagues:

13 27 26 44 30 39 40 34 45 44 24 32 44 39 29 44 38 47 34 40 20 12 10
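For checking the exercise, a short sketch using Python's standard `statistics` module. Its `stdev` function divides by n − 1, matching the definition of s used in these slides.

```python
from statistics import mean, stdev

# Hank Aaron's home runs per season, in the order listed on the slide.
aaron = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32,
         44, 39, 29, 44, 38, 47, 34, 40, 20, 12, 10]

print(len(aaron), sum(aaron))        # number of seasons and career total
print(round(mean(aaron), 2))         # mean home runs per season
print(round(stdev(aaron), 2))        # sample standard deviation (n - 1 divisor)
```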

Linear Transformations: changing units of measurement

- $x_{\text{new}} = a + b\,x_{\text{old}}$
- Common conversions
- Distance: 100 km is equivalent to 62 miles
- $x_{\text{miles}} = 0 + 0.62\,x_{\text{km}}$

- Weight: 1 ounce is equivalent to 28.35 grams
- $x_{\text{g}} = 0 + 28.35\,x_{\text{oz}}$

- Temperature: degrees Fahrenheit to degrees Celsius
- $x_{\text{C}} = -\frac{160}{9} + \frac{5}{9}\,x_{\text{F}}$

Linear Transformations

- Do not change shape of distribution
- However, change center and spread

Example: weights of newly hatched pythons:

- Ounces
- Mean weight = (1.13+…+1.16)/5 = 1.12 oz
- Standard deviation = 0.084 oz

- Grams
- Mean weight = (32+…+33)/5 = 31.8 g
- or 1.12 * 28.35 = 31.8

- Standard deviation = 2.38 g
- or 28.35 * 0.084 = 2.38

Effect of a linear transformation

- Multiplying each observation by a positive number b multiplies both measures of center (mean and median) and measures of spread (IQR and standard deviation) by b.
- Adding the same number a to each observation adds a to measures of center and to quartiles and other percentiles but does not change measures of spread (IQR and standard deviation).

Effects of Linear Transformations

- Transformation: $x_{\text{new}} = a + b\,x_{\text{old}}$
- $\text{mean}_{\text{new}} = a + b \cdot \text{mean}$
- $\text{median}_{\text{new}} = a + b \cdot \text{median}$
- $\text{stdev}_{\text{new}} = |b| \cdot \text{stdev}$
- $\text{IQR}_{\text{new}} = |b| \cdot \text{IQR}$

$|b|$ = absolute value of b (value without sign)
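These four rules can be checked numerically. The sketch below reuses the seven observations from the standard deviation slide and applies an arbitrary transformation; the constants `a` and `b` are made up for illustration, with `b` negative so that the role of the absolute value is visible. The `iqr` helper assumes the median-of-halves rule for quartiles.

```python
from math import isclose
from statistics import mean, median, stdev

def iqr(data):
    """IQR via median-of-halves (middle observation excluded when n is odd)."""
    xs = sorted(data)
    half = len(xs) // 2
    return median(xs[len(xs) - half:]) - median(xs[:half])

old = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
a, b = 32.0, -1.8                      # arbitrary illustrative constants
new = [a + b * x for x in old]         # apply x_new = a + b * x_old

# Center: shifted by a and scaled by b.
print(isclose(mean(new), a + b * mean(old)))
print(isclose(median(new), a + b * median(old)))

# Spread: scaled by |b| only; the shift a has no effect.
print(isclose(stdev(new), abs(b) * stdev(old)))
print(isclose(iqr(new), abs(b) * iqr(old)))
```

All four comparisons print `True`. With a negative `b` the order of the data is reversed, which is why spread scales by `|b|` rather than `b`.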

Example

- Winter temperature recorded in Fahrenheit
- mean = 20
- stdev = 10
- median = 22
- IQR = 11

- Convert into Celsius:
- mean = -160/9 + 5/9 * 20 = -6.67 C
- stdev = 5/9 * 10 = 5.56
- median =
- IQR =
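The remaining two conversions follow the same pattern as the mean and standard deviation lines above. As a hedged sketch for checking the exercise, using the same constants as in the mean calculation (a = −160/9, b = 5/9):

```python
# Fahrenheit-to-Celsius as a linear transformation: x_C = a + b * x_F.
a, b = -160 / 9, 5 / 9

# Summary statistics in Fahrenheit, as given on the slide.
f_stats = {"mean": 20, "stdev": 10, "median": 22, "iqr": 11}

c_stats = {
    "mean": a + b * f_stats["mean"],        # center: shift and scale
    "median": a + b * f_stats["median"],    # center: shift and scale
    "stdev": abs(b) * f_stats["stdev"],     # spread: scale only
    "iqr": abs(b) * f_stats["iqr"],         # spread: scale only
}

for name, value in c_stats.items():
    print(name, round(value, 2))
```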

SAS tips

- The “proc univariate” procedure generates all the descriptive summaries.
- For the time being, draw boxplots by hand from the 5-number summary
- Optional: proc boxplot.

- See plot.doc

Summary (1.2)

- Measures of location: Mean, Median, Quartiles
- Measures of spread: stdev, IQR
- Mean, stdev
- affected by extreme observations

- Median, IQR
- robust to extreme observations

- Five number summary and boxplot
- Linear Transformations
