
Section 1.2 - PowerPoint PPT Presentation



PowerPoint Slideshow about 'Section 1.2' - zuzela


Presentation Transcript

Section 1.2

Describing Distributions with Numbers


Quantitative Data

  • Measuring Center

    • Mean

    • Median

  • Measuring Spread

    • Quartiles

    • Five Number Summary

    • Standard deviation

  • Boxplots


Measures of Center

  • The mean

    • The arithmetic mean of a data set (average value)

    • Denoted by x̄: x̄ = (x1 + x2 + … + xn) / n


Calculations

  • Mean highway mileage for the 19 2-seaters:

    • Average: 25.8 miles/gallon

  • Issue here: the Honda Insight gets 68 miles/gallon!

    • Excluding it, the mean mileage is only 23.4 mpg

    • What does this say about the mean?




Mean vs. Median

  • The mean is NOT a resistant measure of center.

  • In a symmetric distribution, mean = median

  • In a skewed distribution, the mean is further out in the long tail than the median.

  • Example: house prices are usually right skewed

    • The mean price of existing houses sold in 2000 in Indiana was $176,200. (The mean chases the right tail.)

    • The median price of these houses was only $139,000.
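To make the point about resistance concrete, here is a minimal sketch in Python. The mileage values are hypothetical (they are not the 19 two-seaters from the slides); the only thing borrowed from the example is the idea of a single 68 mpg car sitting far from the rest.

```python
from statistics import mean, median

# Hypothetical highway mileages in mpg; NOT the data set from the slides.
typical = [22, 23, 24, 25, 26]

# Add one extreme observation, in the spirit of the 68 mpg car.
with_outlier = typical + [68]

# Without the extreme value, mean and median agree.
print("without outlier:", mean(typical), median(typical))

# With it, the mean is pulled toward the long tail; the median barely moves.
print("with outlier:   ", round(mean(with_outlier), 2), median(with_outlier))
```

One extreme value shifts the mean by several mpg while the median changes by half a unit, which is what "not resistant" means for the mean.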


Measures of spread

  • Quartiles: divide the data into four parts (together with the median)

  • pth percentile – p percent of the observations fall at or below it.

    • Median – 50th percentile

    • First Quartile (Q1) – 25th percentile (median of the lower half of data)

    • Third Quartile (Q3) – 75th percentile (median of the upper half of data)


Calculating median

The median is always the (n+1)/2-th observation in the ordered data.

Example: Data: 1 2 3 4 5 6 7 8 9

(n+1)/2 = 5, so the median is the 5th observation

Median = 5

Example: Data: 1 2 3 4 5 6 7 8 9 10

(n+1)/2 = 5.5, so the median lies halfway between the 5th and 6th observations

Median = average of 5 and 6 = 5.5
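The position rule can be sketched in Python as follows. The function name `median_by_position` is made up for this illustration; it simply follows the (n+1)/2 rule from the slide, averaging the two neighbouring observations when the position ends in .5.

```python
import math

def median_by_position(data):
    """Median via the (n+1)/2 position rule on the ordered data."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    position = (n + 1) / 2                        # 1-based position, may end in .5
    lower = ordered[math.floor(position) - 1]     # observation just at or below the position
    upper = ordered[math.ceil(position) - 1]      # observation just at or above the position
    return (lower + upper) / 2                    # same value twice when n is odd

print(median_by_position([1, 2, 3, 4, 5, 6, 7, 8, 9]))
print(median_by_position([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

For an odd number of observations the floor and ceiling coincide, so the function returns the middle observation itself.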


Calculating Quartiles

Example: Data: 1 2 3 4 5 6 7 8 9

Median = 5 = “Q2”

Q1 is the median of the lower half (1 2 3 4) = 2.5

Q3 is the median of the upper half (6 7 8 9) = 7.5

Example: Data: 1 2 3 4 5 6 7 8 9 10

Median = 5.5

Q1 = median of 1 2 3 4 5 = 3

Q3 = median of 6 7 8 9 10 = 8
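A minimal Python sketch of the "median of each half" rule follows. It assumes one particular convention: when n is odd, the middle observation is left out of both halves. Other textbooks and software packages use different quartile rules and can give slightly different answers, so treat the helper `quartiles` (a name invented here) as an illustration of this convention only.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    Assumption: for odd n the middle observation belongs to neither half.
    """
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]             # observations below the median position
    upper = ordered[half + n % 2:]     # observations above it (skips the middle value if n is odd)
    return median(lower), median(ordered), median(upper)

print(quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9]))
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```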


Five-Number Summary

  • 5 numbers

    • Minimum

    • Q1

    • Median

    • Q3

    • Maximum


Find the 5-Number Summaries

Example:

Data: 26 13 35 76 44 58

Example:

Data: 84 89 89 64 78
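One way to check hand calculations for these two data sets is the short Python sketch below. The function `five_number_summary` is a hypothetical helper, and it assumes the median-of-halves quartile rule with the middle observation excluded from both halves when n is odd; software using another quartile rule may report different Q1 and Q3.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) under the median-of-halves rule."""
    ordered = sorted(data)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]             # values left of the median position
    upper = ordered[half + n % 2:]     # values right of it (middle value skipped if n is odd)
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

print(five_number_summary([26, 13, 35, 76, 44, 58]))
print(five_number_summary([84, 89, 89, 64, 78]))
```

Sorting first matters: both data sets on the slide are listed out of order.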


Boxplot

  • Visual representation of the five-number summary.

    • Central box: Q1 to Q3

    • Line inside box: Median

    • Extended straight lines: lowest to highest observation, except outliers


Find the 5-number summary and make a boxplot

Numbers of home runs that Hank Aaron hit in each of his 23 years in the Major Leagues:

10 12 13 20 24 26 27 29 30 32 34 34 38 39 39 40 40 44 44 44 44 45 47


Criterion for suspected outliers

  • Interquartile Range (IQR) = Q3 - Q1

  • Observation is a suspected outlier IF it is:

    • greater than Q3 + 1.5*IQR

      OR

    • less than Q1 – 1.5*IQR


Criterion for suspected outliers

  • Are there any outliers?


Criterion for suspected outliers

  • Find the five-number summary:

    Min    Q1     Median   Q3    Max

    1      54.5   103.5    200   2631

  • Are there any outliers?

    • Q3 – Q1 = 200 – 54.5 = 145.5

    • Multiply by 1.5: 145.5*1.5 = 218.25

    • Add to Q3: 200 + 218.25 = 418.25

      • Anything higher is a high outlier → 7 obs.

    • Subtract from Q1: 54.5 – 218.25 = -163.75

      • Anything lower is a low outlier → no obs.
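The arithmetic of the 1.5*IQR criterion can be sketched in Python as below. The helper name `outlier_fences` is invented for this example. Only the quartiles, minimum, and maximum come from the slide; the full data set is not reproduced in the transcript, so the sketch checks just the two extremes.

```python
def outlier_fences(q1, q3):
    """Return (low_fence, high_fence) for the 1.5*IQR criterion."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Values taken from the five-number summary on the slide.
minimum, q1, q3, maximum = 1, 54.5, 200, 2631

low, high = outlier_fences(q1, q3)
print("fences:", low, high)

# An observation is a suspected outlier if it lies outside the fences.
print("maximum is a suspected high outlier:", maximum > high)
print("minimum is a suspected low outlier: ", minimum < low)
```

Because the low fence is negative while the smallest observation is 1, no low outliers are possible here, which matches the slide.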


Criterion for suspected outliers

  • Seven high outliers circled…


Modified Boxplot

  • Has outliers as dots or stars.

  • The line extends only to the first non-outlier.


Standard deviation

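  • Deviation: x_i – x̄ (each observation minus the mean)

  • Variance: s² = (sum of the squared deviations) / (n – 1)

  • Standard deviation: s = square root of the variance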


DATA: 1792 1666 1362 1614 1460 1867 1439

Mean = 1600

  • Find the deviations from the mean:

    Deviation1 = 1792 – 1600 = 192

    Deviation2 = 1666 – 1600 = 66

    …

    Deviation7 = 1439 – 1600 = -161

  • Square the deviations.

  • Add them up and divide the sum by n – 1 = 6; this gives s².

  • Take the square root: standard deviation = s = 189.24
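The steps above can be followed line by line in a short Python sketch. It uses the seven data values from the slide; the variable names are chosen here for readability and are not part of the original material.

```python
import math

data = [1792, 1666, 1362, 1614, 1460, 1867, 1439]
n = len(data)

# Step 0: the mean.
mean = sum(data) / n

# Step 1: deviations from the mean (these always sum to zero).
deviations = [x - mean for x in data]

# Step 2: square the deviations.
squared = [d ** 2 for d in deviations]

# Step 3: add them up and divide by n - 1 to get the variance s^2.
variance = sum(squared) / (n - 1)

# Step 4: the standard deviation is the square root of the variance.
s = math.sqrt(variance)

print("mean =", mean)
print("deviations =", deviations)
print("s =", round(s, 2))
```

Dividing by n - 1 rather than n gives the sample variance, which is the version the slide uses.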


Properties of the standard deviation

  • Standard deviation is always non-negative

  • s = 0 when there is no spread

  • s is not resistant to the presence of outliers

    • 5-number summary usually better describes a skewed distribution or a distribution with outliers.

  • s is used when we use the mean

    • Mean and standard deviation are usually used for reasonably symmetric distributions without outliers.


Find the mean and standard deviation

Numbers of home runs that Hank Aaron hit in each of his 23 years in the Major Leagues:

13 27 26 44 30 39 40 34 45 44 24 32 44 39 29 44 38 47 34 40 20 12 10
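For checking a hand calculation, Python's standard `statistics` module provides `mean` and `stdev`; `stdev` is the sample standard deviation, dividing by n - 1. The sketch below simply applies both to the 23 values listed above.

```python
from statistics import mean, stdev

home_runs = [13, 27, 26, 44, 30, 39, 40, 34, 45, 44, 24, 32,
             44, 39, 29, 44, 38, 47, 34, 40, 20, 12, 10]

x_bar = mean(home_runs)   # arithmetic mean
s = stdev(home_runs)      # sample standard deviation (divides by n - 1)

print(f"n = {len(home_runs)}, total = {sum(home_runs)}")
print(f"mean = {x_bar:.2f}, s = {s:.2f}")
```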


Linear Transformations: changing units of measurement

  • x_new = a + b*x_old

  • Common conversions

    • Distance: 100 km is equivalent to 62 miles

      • x_miles = 0 + 0.62*x_km

    • Weight: 1 ounce is equivalent to 28.35 grams

      • x_g = 0 + 28.35*x_oz

    • Temperature: Fahrenheit to Celsius

      • x_C = -160/9 + (5/9)*x_F


Linear Transformations

  • Do not change the shape of a distribution

  • However, they do change its center and spread

    Example: weights of newly hatched pythons:


  • Ounces

    • Mean weight = (1.13+…+1.16)/5 = 1.12 oz

    • Standard deviation = 0.084

  • Grams

    • Mean weight =(32+…+33)/5 = 31.8 g

      • or 1.12 * 28.35 = 31.8

    • Standard deviation = 2.38

      • or 28.35 * 0.084 = 2.38


Effect of a linear transformation

  • Multiplying each observation by a positive number b multiplies both measures of center (mean and median) and measures of spread (IQR and standard deviation) by b.

  • Adding the same number a to each observation adds a to measures of center and to quartiles and other percentiles but does not change measures of spread (IQR and standard deviation)


Effects of Linear Transformations

  • Your transformation: x_new = a + b*x_old

  • mean_new = a + b*mean

  • median_new = a + b*median

  • stdev_new = |b|*stdev

  • IQR_new = |b|*IQR

    |b| = absolute value of b (value without sign)
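These rules can be checked numerically with a small Python sketch. The data values and the constants a and b below are hypothetical, chosen only for illustration; b is negative on purpose, to show why the spread rule needs |b| rather than b. The IQR is left out to keep the sketch short, but by the rule above it scales the same way as the standard deviation.

```python
import math
from statistics import mean, median, stdev

# Hypothetical observations and transformation constants (not from the slides).
x_old = [2.0, 4.0, 4.0, 7.0, 13.0]
a, b = 10.0, -3.0

x_new = [a + b * x for x in x_old]

# Center: shifted by a and scaled by b.
print(mean(x_new), a + b * mean(x_old))
print(median(x_new), a + b * median(x_old))

# Spread: scaled by |b| only; adding a has no effect.
print(stdev(x_new), abs(b) * stdev(x_old))
```

Each printed pair should agree up to floating-point rounding.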


Example

  • Winter temperature recorded in Fahrenheit

    • mean = 20

    • stdev = 10

    • median = 22

    • IQR = 11

  • Convert into Celsius:

    • mean = -160/9 + 5/9 * 20 = -6.67 C

    • stdev = 5/9 * 10 = 5.56

    • median = -160/9 + 5/9 * 22 = -5.56 C

    • IQR = 5/9 * 11 = 6.11
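The same conversion can be written as a few lines of Python. This is a sketch that applies the transformation rules directly to the four summary statistics given above, with a = -160/9 and b = 5/9 for Fahrenheit to Celsius; the variable names are chosen here for illustration.

```python
# Fahrenheit -> Celsius as a linear transformation: x_C = a + b * x_F
a, b = -160 / 9, 5 / 9

# Summary statistics in Fahrenheit, taken from the example.
mean_f, stdev_f, median_f, iqr_f = 20, 10, 22, 11

# Measures of center use both a and b.
mean_c = a + b * mean_f
median_c = a + b * median_f

# Measures of spread use only |b|.
stdev_c = abs(b) * stdev_f
iqr_c = abs(b) * iqr_f

print(round(mean_c, 2), round(median_c, 2), round(stdev_c, 2), round(iqr_c, 2))
```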


SAS tips

  • The “proc univariate” procedure generates all the descriptive summaries.

  • For the time being, draw boxplots by hand from the 5-number summary

    • Optional: proc boxplot.

  • See plot.doc


Summary (1.2)

  • Measures of location: Mean, Median, Quartiles

  • Measures of spread: stdev, IQR

  • Mean, stdev

    • affected by extreme observations

  • Median, IQR

    • robust to extreme observations

  • Five number summary and boxplot

  • Linear Transformations

