# Section 1.2: Describing Distributions with Numbers


## Section 1.2

Describing Distributions with Numbers

### Quantitative Data

• Measuring Center

• Mean

• Median

• Measuring Spread

• Quartiles

• Five Number Summary

• Standard deviation

• Boxplots

### Measures of Center

• The mean

• The arithmetic mean of a data set (average value)

• Denoted by $\bar{x}$, where $\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum x_i$

### Calculations

• Mean highway mileage for the 19 two-seater cars:

• Average: 25.8 miles/gallon

• Issue: one car, the Honda Insight, gets 68 miles/gallon!

• Excluding it, the mean mileage is only 23.4 miles/gallon

• What does this say about the mean?

• Problem: The mean is easily influenced by outliers. It is NOT a resistant measure of center.

### Median

• Median is the midpoint of a distribution.

• Resistant or robust measure of center.

• i.e. not sensitive to extreme observations
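A minimal sketch of what "resistant" means in practice. The mileage values below are made up for illustration (they are not the 19 two-seaters from the slide); only the single 68 mpg value echoes the Honda Insight.

```python
from statistics import mean, median

# Hypothetical highway mileages in mpg; NOT the data set from the slides.
typical = [21, 22, 23, 24, 24, 25, 26, 27]
with_outlier = typical + [68]  # add one extreme observation

# The mean moves noticeably; the median stays where it was.
print("mean:  ", mean(typical), "->", round(mean(with_outlier), 2))
print("median:", median(typical), "->", median(with_outlier))
```

One extreme value pulls the mean up by several mpg, while the midpoint of the ordered data does not move.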

### Mean vs. Median

• In a symmetric distribution, mean = median

• In a skewed distribution, the mean is further out in the long tail than the median.

• Example: house prices are usually right skewed

• The mean price of existing houses sold in 2000 in Indiana was 176,200. (Mean chases the right tail)

• The median price of these houses was only 139,000.

### Quartiles

• Quartiles divide the data into four parts (together with the median)

• pth percentile – p percent of the observations fall at or below it.

• Median – 50th percentile

• First Quartile (Q1) – 25th percentile (median of the lower half of data)

• Third Quartile (Q3) – 75th percentile (median of the upper half of data)

### Calculating median

The median is always the observation at position (n+1)/2 in the ordered data.

Example: Data: 1 2 3 4 5 6 7 8 9

(n+1)/2 = 5, so the median is the 5th observation

Median = 5

Example: Data: 1 2 3 4 5 6 7 8 9 10

(n+1)/2 = 5.5, so the median falls halfway between the 5th and 6th observations

Median = average of 5 and 6 = 5.5
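The position rule can be sketched in a few lines of Python. The function name `median_by_position` is only an illustrative choice; the standard library's `statistics.median` does the same job.

```python
def median_by_position(data):
    """Median as the observation at position (n+1)/2 of the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2  # 0-based index of position (n+1)/2 when n is odd
    if n % 2 == 1:
        return ordered[mid]
    # Even n: position (n+1)/2 falls between two observations, so average them.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_by_position(range(1, 10)))  # 5
print(median_by_position(range(1, 11)))  # 5.5
```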

### Calculating Quartiles:

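Example: Data: 1 2 3 4 5 6 7 8 9

Median = 5 = “Q2”

Q1 is the median of the lower half (1 2 3 4, leaving out the median itself) = 2.5

Q3 is the median of the upper half (6 7 8 9) = 7.5

Example: Data: 1 2 3 4 5 6 7 8 9 10

Median = 5.5

Q1 = median of 1 2 3 4 5 = 3

Q3 = median of 6 7 8 9 10 = 8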
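The "median of each half" rule can be sketched as follows. This assumes the convention that the overall median is left out of both halves when n is odd; statistical software often interpolates percentiles differently, so its quartiles may not match hand calculations exactly.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: with odd n, the middle observation belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

print(quartiles(range(1, 10)))
print(quartiles(range(1, 11)))
```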

### Five-Number Summary

• 5 numbers:

• Minimum

• Q1

• Median

• Q3

• Maximum

### Find the 5-Number Summaries

Example:

Data: 26 13 35 76 44 58

Example:

Data: 84 89 89 64 78
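One way to check hand calculations for these two data sets is a short Python helper. The name `five_number_summary` is illustrative, and the quartiles use the median-of-halves rule, leaving out the middle observation when n is odd (an assumption about the intended convention).

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], median(lower), median(ordered),
            median(upper), ordered[-1])

print(five_number_summary([26, 13, 35, 76, 44, 58]))
print(five_number_summary([84, 89, 89, 64, 78]))
```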

### Boxplot

• Visual representation of the five-number summary.

• Central box: Q1 to Q3

• Line inside box: Median

• Extended straight lines: lowest to highest observation, except outliers

### Find the 5 # summary and make a boxplot

Numbers of home runs that Hank Aaron hit in each of his 23 years in the Major Leagues:

10 12 13 20 24 26 27 29 30 32 34 34 38 39 39 40 40 44 44 44 44 45 47

### Criterion for suspected outliers

• Interquartile Range (IQR) = Q3 - Q1

• Observation is a suspected outlier IF it is:

• greater than Q3 + 1.5*IQR

OR

• less than Q1 – 1.5*IQR
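The 1.5 × IQR criterion can be sketched in Python and applied to the Hank Aaron home run counts from the earlier slide. The function name `suspected_outliers` is illustrative, and the quartiles again assume the median-of-halves rule.

```python
from statistics import median

def suspected_outliers(data):
    """Return (low fence, high fence, flagged values) for the 1.5*IQR rule."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + 1:] if n % 2 else ordered[half:])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    flagged = [x for x in ordered if x < low_fence or x > high_fence]
    return low_fence, high_fence, flagged

home_runs = [10, 12, 13, 20, 24, 26, 27, 29, 30, 32, 34, 34,
             38, 39, 39, 40, 40, 44, 44, 44, 44, 45, 47]
print(suspected_outliers(home_runs))
```

For these data Q1 = 26 and Q3 = 44, so the fences sit at -1 and 71 and no season is flagged.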


### Criterion for suspected outliers

• Find 5 number summary:

| Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|
| 1 | 54.5 | 103.5 | 200 | 2631 |

• Are there any outliers?

• IQR = Q3 – Q1 = 200 – 54.5 = 145.5

• Multiply by 1.5: 145.5*1.5 = 218.25

• Add to Q3: 200 + 218.25 = 418.25

• Anything higher is a high outlier → 7 observations

• Subtract from Q1: 54.5 – 218.25 = -163.75

• Anything lower is a low outlier → no observations

### Criterion for suspected outliers

• Seven high outliers circled…

### Modified Boxplot

• Has outliers as dots or stars.

• The line extends only to the first non-outlier.

### Standard deviation

• Deviation: $x_i - \bar{x}$

• Variance: $s^2 = \frac{1}{n-1}\sum (x_i - \bar{x})^2$

• Standard Deviation: $s = \sqrt{s^2}$

DATA: 1792 1666 1362 1614 1460 1867 1439

Mean = 1600

• Find the deviations from the mean:

Deviation1 = 1792 – 1600 = 192

Deviation2 = 1666 – 1600 = 66

…Deviation7 = 1439 – 1600 = -161

• Square the deviations.

• Add them up and divide the sum by n – 1 = 6; this gives the variance s² = 214870/6 ≈ 35811.67.

• Take the square root: Standard Deviation = s = 189.24
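The same steps can be followed in Python as a check on the arithmetic. This is a plain step-by-step sketch; `statistics.stdev` from the standard library gives the same result in one call.

```python
from math import sqrt

data = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
n = len(data)

x_bar = sum(data) / n                      # mean
deviations = [x - x_bar for x in data]     # deviations from the mean
variance = sum(d ** 2 for d in deviations) / (n - 1)  # s^2, divide by n - 1
s = sqrt(variance)                         # standard deviation

print(x_bar, round(variance, 2), round(s, 2))  # 1600.0 35811.67 189.24
```

The deviations always sum to zero, which is why they are squared before averaging.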

### Properties of the standard deviation

• Standard deviation is always non-negative

• s = 0 when there is no spread

• s is not resistant to presence of outliers

• 5-number summary usually better describes a skewed distribution or a distribution with outliers.

• s is used when we use the mean

• Mean and standard deviation are usually used for reasonably symmetric distributions without outliers.

### Find the mean and standard deviation.

Numbers of home runs that Hank Aaron hit in each of his 23 years in the Major Leagues:

13 27 26 44 30 39 40 34 45 44 24 32 44 39 29 44 38 47 34 40 20 12 10

### Linear Transformations: changing units of measurements

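• $x_{new} = a + b\,x_{old}$

• Common conversions

• Distance: 100 km is equivalent to 62 miles

• $x_{miles} = 0 + 0.62\,x_{km}$

• Weight: 1 ounce is equivalent to 28.35 grams

• $x_{g} = 0 + 28.35\,x_{oz}$

• Temperature: degrees Fahrenheit to degrees Celsius

• $x_{C} = -\frac{160}{9} + \frac{5}{9}\,x_{F}$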

### Linear Transformations

• Do not change shape of distribution

• However, change center and spread

Example: weights of newly hatched pythons:

• Ounces

• Mean weight = (1.13+…+1.16)/5 = 1.12 oz

• Standard deviation = 0.084 oz

• Grams

• Mean weight = (32+…+33)/5 = 31.8 g

• or 1.12 * 28.35 = 31.8 g

• Standard deviation = 2.38 g

• or 28.35 * 0.084 = 2.38 g

### Effect of a linear transformation

• Multiplying each observation by a positive number b multiplies both measures of center (mean and median) and measures of spread (IQR and standard deviation) by b.

• Adding the same number a to each observation adds a to measures of center and to quartiles and other percentiles but does not change measures of spread (IQR and standard deviation)

### Effects of Linear Transformations

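• Your transformation: $x_{new} = a + b\,x_{old}$

• $\text{mean}_{new} = a + b \cdot \text{mean}$

• $\text{median}_{new} = a + b \cdot \text{median}$

• $\text{stdev}_{new} = |b| \cdot \text{stdev}$

• $\text{IQR}_{new} = |b| \cdot \text{IQR}$

$|b|$ = absolute value of b (value without sign)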
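These rules can be checked numerically. The sketch below reuses the seven observations from the standard deviation example; the constants `a = 10` and `b = -2` are hypothetical, chosen with a negative b to show why the spread rules need $|b|$. The `iqr` helper assumes the median-of-halves rule for quartiles.

```python
from math import isclose
from statistics import mean, median, stdev

def iqr(data):
    """IQR = Q3 - Q1, with quartiles as medians of the lower/upper halves."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + 1:] if n % 2 else ordered[half:])
    return q3 - q1

old = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
a, b = 10, -2  # hypothetical constants, for illustration only
new = [a + b * x for x in old]

# Each comparison should print True if the rule holds for this data.
print(isclose(mean(new), a + b * mean(old)))
print(isclose(median(new), a + b * median(old)))
print(isclose(stdev(new), abs(b) * stdev(old)))
print(isclose(iqr(new), abs(b) * iqr(old)))
```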

### Example

• Winter temperature recorded in Fahrenheit

• mean = 20

• stdev = 10

• median = 22

• IQR = 11

• Convert into Celsius:

• mean = -160/9 + 5/9 * 20 = -6.67 °C

• stdev = 5/9 * 10 = 5.56 °C

• median = -160/9 + 5/9 * 22 = -5.56 °C

• IQR = 5/9 * 11 = 6.11 °C
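The conversion can also be done directly on the summary statistics, without the raw temperatures. In this sketch the helper name `transform_summary` is hypothetical; it applies $a + b\,x$ to the measures of center and $|b|$ to the measures of spread.

```python
def transform_summary(a, b, mean, median, stdev, iqr):
    """Summary statistics after the linear transformation x_new = a + b*x_old."""
    return {
        "mean": a + b * mean,      # center: shifted by a and scaled by b
        "median": a + b * median,
        "stdev": abs(b) * stdev,   # spread: scaled by |b| only
        "iqr": abs(b) * iqr,
    }

# Fahrenheit to Celsius: C = -160/9 + (5/9) * F
a, b = -160 / 9, 5 / 9
celsius = transform_summary(a, b, mean=20, median=22, stdev=10, iqr=11)
for name, value in celsius.items():
    print(f"{name}: {value:.2f}")
```

The shift a = -160/9 affects only the mean and median; the standard deviation and IQR depend on b alone.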

### SAS tips

• “proc univariate” procedure generates all the descriptive summaries.

• For the time being, draw boxplots by hand from the 5-number summary

• Optional: proc boxplot.

• See plot.doc

### Summary (1.2)

• Measures of location: Mean, Median, Quartiles

• Measures of spread: stdev, IQR

• Mean, stdev

• affected by extreme observations

• Median, IQR

• robust to extreme observations

• Five number summary and boxplot

• Linear Transformations