
Section 1.2 PowerPoint PPT Presentation





Section 1.2

Describing Distributions with Numbers



Quantitative Data

  • Measuring Center

    • Mean

    • Median

  • Measuring Spread

    • Quartiles

    • Five Number Summary

    • Standard deviation

  • Boxplots



Measures of Center

  • The mean

    • The arithmetic mean of a data set (average value)

    • Denoted by x̄: x̄ = (x1 + x2 + … + xn) / n



Calculations

  • Mean highway mileage for the 19 2-seaters:

    • Average: 25.8 miles/gallon

  • Issue here: the Honda Insight gets 68 miles/gallon!

    • Excluding it, the mean mileage is only 23.4 mpg

    • What does this say about the mean?


  • Problem: The mean is easily influenced by outliers. It is NOT a resistant measure of center.

Median

  • Median is the midpoint of a distribution.

    • Resistant or robust measure of center.

    • i.e. not sensitive to extreme observations
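
A small illustration of resistance, using made-up mileage figures (hypothetical values, not the 19 two-seaters from the slides): one extreme observation pulls the mean noticeably, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical highway mileages (mpg); NOT the data set from the slides.
typical = [22, 23, 24, 25, 26]
with_outlier = typical + [68]  # add one extreme observation

# Center without the outlier: mean and median agree.
print(mean(typical), median(typical))

# Center with the outlier: the mean jumps, the median shifts only slightly.
print(round(mean(with_outlier), 2), median(with_outlier))
```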



Mean vs. Median

  • In a symmetric distribution, mean = median

  • In a skewed distribution, the mean lies farther out in the long tail than the median.

  • Example: house prices are usually right-skewed

    • The mean price of existing houses sold in 2000 in Indiana was $176,200. (The mean chases the right tail.)

    • The median price of these houses was only $139,000.



Measures of spread

  • Quartiles: together with the median, they divide the ordered data into four equal parts

  • pth percentile – the value such that p percent of the observations fall at or below it.

    • Median – 50th percentile

    • First Quartile (Q1) – 25th percentile (median of the lower half of data)

    • Third Quartile (Q3) – 75th percentile (median of the upper half of data)



Calculating median

The median is always at position (n+1)/2 in the ordered data

Example: Data: 1 2 3 4 5 6 7 8 9

(n+1)/2 = 5, so the median is the 5th observation

Median = 5

Example: Data: 1 2 3 4 5 6 7 8 9 10

(n+1)/2 = 5.5, so the median lies halfway between the 5th and 6th observations

Median = average of 5 and 6 = 5.5
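
The position rule can be sketched in a few lines of Python. The function name `median_by_position` is illustrative only; it handles the whole-number and the ".5" case of (n+1)/2 separately.

```python
def median_by_position(data):
    """Median located at position (n + 1) / 2 of the ordered data (1-based)."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    if n % 2 == 1:
        # (n + 1) / 2 is a whole number: take that observation.
        return ordered[(n + 1) // 2 - 1]
    # (n + 1) / 2 ends in .5: average the two observations around it.
    lower = ordered[n // 2 - 1]
    upper = ordered[n // 2]
    return (lower + upper) / 2


print(median_by_position(range(1, 10)))  # data 1..9
print(median_by_position(range(1, 11)))  # data 1..10
```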



Calculating Quartiles:

Example: Data: 1 2 3 4 5 6 7 8 9

Median = 5 = “Q2”

Q1 is the median of the lower half (1 2 3 4) = 2.5

Q3 is the median of the upper half (6 7 8 9) = 7.5

Example: Data: 1 2 3 4 5 6 7 8 9 10

Median = 5.5

Q1 = median of 1 2 3 4 5 = 3

Q3 = median of 6 7 8 9 10 = 8
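
The "median of each half" rule can be sketched as below. One assumption is made explicit: when the number of observations is odd, the overall median is left out of both halves. Statistical software often uses other quartile rules, so its results may differ slightly.

```python
from statistics import median


def quartiles_by_halves(data):
    """Return (Q1, median, Q3) using medians of the lower and upper halves.

    Assumption: for odd n, the overall median belongs to neither half.
    """
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]      # observations below the median position
    upper = ordered[n - half:]  # observations above the median position
    return median(lower), median(ordered), median(upper)


print(quartiles_by_halves(range(1, 10)))  # data 1..9
print(quartiles_by_halves(range(1, 11)))  # data 1..10
```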



Five-Number Summary

  • 5 numbers

    • Minimum

    • Q1

    • Median

    • Q3

    • Maximum



Find the 5-Number Summaries

Example:

Data: 26 13 35 76 44 58

Example:

Data: 84 89 89 64 78



Boxplot

  • Visual representation of the five-number summary.

    • Central box: Q1 to Q3

    • Line inside box: Median

    • Extended straight lines: lowest to highest observation, except outliers


Find the 5-number summary and make a boxplot

Numbers of home runs that Hank Aaron hit in each of his 23 years in the Major Leagues:

10 12 13 20 24 26 27 29 30 32 34 34 38 39 39 40 40 44 44 44 44 45 47
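
One way to check a hand-computed summary for the 23 home-run counts is the sketch below. It assumes quartiles are medians of the lower and upper halves, with the overall median excluded when n is odd; `five_number_summary` is an illustrative helper, not a library function.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) with quartiles as medians of halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    q1 = median(ordered[:half])                  # lower half
    q3 = median(ordered[len(ordered) - half:])   # upper half
    return ordered[0], q1, median(ordered), q3, ordered[-1]


# Hank Aaron's home runs per season, as listed on the slide.
home_runs = [10, 12, 13, 20, 24, 26, 27, 29, 30, 32, 34, 34,
             38, 39, 39, 40, 40, 44, 44, 44, 44, 45, 47]

summary = five_number_summary(home_runs)
print(summary)
```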



Criterion for suspected outliers

  • Interquartile Range (IQR) = Q3 - Q1

  • Observation is a suspected outlier IF it is:

    • greater than Q3 + 1.5*IQR

      OR

    • less than Q1 – 1.5*IQR
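
The 1.5*IQR criterion translates directly into code. The sketch below uses illustrative function names and, as inputs, the quartiles from the worked example in this section (Q1 = 54.5, Q3 = 200).

```python
def outlier_fences(q1, q3):
    """Return (low_fence, high_fence) for the 1.5*IQR criterion."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_suspected_outlier(x, q1, q3):
    """True if x is below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    low, high = outlier_fences(q1, q3)
    return x < low or x > high


# Quartiles taken from the worked example in this section.
print(outlier_fences(54.5, 200))
print(is_suspected_outlier(2631, 54.5, 200))  # the maximum in that example
```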



Criterion for suspected outliers

  • Are there any outliers?



Criterion for suspected outliers

  • Find 5 number summary:

    Min    Q1      Median    Q3     Max

    1      54.5    103.5     200    2631

  • Are there any outliers?

    • Q3 – Q1 = 200 – 54.5 = 145.5

    • Multiply by 1.5: 145.5*1.5 = 218.25

  • Add to Q3: 200 + 218.25 = 418.25

    • Anything higher is a high outlier → 7 observations

  • Subtract from Q1: 54.5 – 218.25 = -163.75

    • Anything lower is a low outlier → no observations



Criterion for suspected outliers

  • Seven high outliers circled…



Modified Boxplot

  • Has outliers as dots or stars.

  • The line extends only to the first non-outlier.
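
A minimal sketch of how the pieces of a modified boxplot could be computed: the whiskers stop at the most extreme observations that are not suspected outliers, and the outliers are listed separately for plotting as dots or stars. The data and quartiles below are hypothetical, and `modified_boxplot_parts` is an illustrative name.

```python
def modified_boxplot_parts(data, q1, q3):
    """Return (low_whisker, high_whisker, outliers) for a modified boxplot."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Observations inside the fences determine where the whiskers end.
    inside = [x for x in data if low_fence <= x <= high_fence]
    # Observations outside the fences are drawn individually.
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    if not inside:
        raise ValueError("no observations fall inside the fences")
    return min(inside), max(inside), outliers


# Hypothetical data with one large value; Q1 = 3 and Q3 = 8 are the
# medians of the lower half (1..5) and the upper half (6, 7, 8, 9, 30).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
parts = modified_boxplot_parts(data, q1=3, q3=8)
print(parts)
```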



Standard deviation

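  • Deviation: xi – x̄ (distance of an observation from the mean x̄)

  • Variance: s² = [(x1 – x̄)² + (x2 – x̄)² + … + (xn – x̄)²] / (n – 1)

  • Standard deviation: s = √(s²)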


DATA: 1792 1666 1362 1614 1460 1867 1439

Mean = 1600

  • Find the deviations from the mean:

    Deviation1 = 1792 – 1600 = 192

    Deviation2 = 1666 – 1600 = 66

    …

    Deviation7 = 1439 – 1600 = -161

  • Square the deviations.

  • Add them up and divide the sum by n – 1 = 6; this gives you s².

  • Take the square root: Standard Deviation = s = 189.24
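
The same steps can be followed in Python on the seven values above. This is a sketch of the hand calculation, written out step by step rather than calling a library routine.

```python
from math import sqrt

# The seven observations from the slide.
data = [1792, 1666, 1362, 1614, 1460, 1867, 1439]

n = len(data)
mean = sum(data) / n                      # 11200 / 7 = 1600
deviations = [x - mean for x in data]     # step 1: deviations from the mean
squared = [d ** 2 for d in deviations]    # step 2: square each deviation
variance = sum(squared) / (n - 1)         # step 3: divide the sum by n - 1
s = sqrt(variance)                        # step 4: take the square root

print(round(s, 2))
```

The deviations always sum to zero, which is a quick check on the arithmetic in step 1.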



Properties of the standard deviation

  • Standard deviation is always non-negative

  • s = 0 when there is no spread

  • s is not resistant to presence of outliers

    • 5-number summary usually better describes a skewed distribution or a distribution with outliers.

  • s is used when we use the mean

    • Mean and standard deviation are usually used for reasonably symmetric distributions without outliers.



Find the mean and standard deviation.

Numbers of home runs that Hank Aaron hit in each of his 23 years in the Major Leagues:

13 27 26 44 30 39 40 34 45 44 24 32 44 39 29 44 38 47 34 40 20 12 10



Linear Transformations: changing units of measurements

  • x_new = a + b*x_old

  • Common conversions

    • Distance: 100 km is equivalent to 62 miles

      • x_miles = 0 + 0.62*x_km

    • Weight: 1 ounce is equivalent to 28.35 grams

      • x_g = 0 + 28.35*x_oz

    • Temperature: Fahrenheit to Celsius

      • x_C = -160/9 + (5/9)*x_F



Linear Transformations

  • Do not change shape of distribution

  • However, change center and spread

    Example: weights of newly hatched pythons:



  • Ounces

    • Mean weight = (1.13+…+1.16)/5 = 1.12 oz

    • Standard deviation = 0.084 oz

  • Grams

    • Mean weight = (32+…+33)/5 = 31.8 g

      • or 1.12 * 28.35 = 31.8 g

    • Standard deviation = 2.38 g

      • or 28.35 * 0.084 = 2.38 g



Effect of a linear transformation

  • Multiplying each observation by a positive number b multiplies both measures of center (mean and median) and measures of spread (IQR and standard deviation) by b.

  • Adding the same number a to each observation adds a to measures of center and to quartiles and other percentiles but does not change measures of spread (IQR and standard deviation)



Effects of Linear Transformations

  • Your transformation: x_new = a + b*x_old

  • mean_new = a + b*mean

  • median_new = a + b*median

  • stdev_new = |b|*stdev

  • IQR_new = |b|*IQR

    |b| = absolute value of b (value without sign)
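
These rules can be checked numerically. The sketch below transforms a small hypothetical data set with a negative b and compares each summary of the new data against the corresponding formula. The `iqr` helper is illustrative and assumes quartiles are medians of the lower and upper halves.

```python
from math import isclose
from statistics import mean, median, stdev


def iqr(data):
    """IQR with quartiles taken as medians of the lower and upper halves."""
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[len(ordered) - half:]) - median(ordered[:half])


# Hypothetical observations and coefficients, chosen only for illustration.
old = [2.0, 4.0, 4.0, 6.0, 9.0]
a, b = 10.0, -3.0
new = [a + b * x for x in old]  # x_new = a + b*x_old

checks = {
    "mean": isclose(mean(new), a + b * mean(old)),
    "median": isclose(median(new), a + b * median(old)),
    "stdev": isclose(stdev(new), abs(b) * stdev(old)),
    "IQR": isclose(iqr(new), abs(b) * iqr(old)),
}
print(checks)
```

Using a negative b shows why the spread formulas need |b|: the order of the data reverses, but the spread cannot become negative.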



Example

  • Winter temperature recorded in Fahrenheit

    • mean = 20

    • stdev = 10

    • median = 22

    • IQR = 11

  • Convert into Celsius:

    • mean = -160/9 + 5/9 * 20 = -6.67 C

    • stdev = 5/9 * 10 = 5.56 C

    • median = -160/9 + 5/9 * 22 = -5.56 C

    • IQR = 5/9 * 11 = 6.11 C
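
The conversion of all four summaries can be sketched as follows, with a = -160/9 and b = 5/9 for Fahrenheit to Celsius. The function name and dictionary keys are illustrative choices.

```python
A = -160 / 9  # intercept of the Fahrenheit-to-Celsius conversion
B = 5 / 9     # slope of the conversion


def to_celsius_summary(mean_f, median_f, stdev_f, iqr_f):
    """Convert Fahrenheit summary statistics to Celsius."""
    return {
        "mean": A + B * mean_f,      # centers shift by a and scale by b
        "median": A + B * median_f,
        "stdev": abs(B) * stdev_f,   # spreads only scale by |b|
        "iqr": abs(B) * iqr_f,
    }


# Winter temperature summaries from the example (in Fahrenheit).
summary_c = to_celsius_summary(mean_f=20, median_f=22, stdev_f=10, iqr_f=11)
for name, value in summary_c.items():
    print(f"{name}: {value:.2f}")
```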



SAS tips

  • “proc univariate” procedure generates all the descriptive summaries.

  • For the time being, draw boxplots by hand from the 5-number summary

    • Optional: proc boxplot.

  • See plot.doc



Summary (1.2)

  • Measures of location: Mean, Median, Quartiles

  • Measures of spread: stdev, IQR

  • Mean, stdev

    • affected by extreme observations

  • Median, IQR

    • robust to extreme observations

  • Five number summary and boxplot

  • Linear Transformations

