Sta 291 spring 2010

STA 291 Spring 2010

Lecture 5

Dustin Lueker



Measures of Central Tendency

Mean - Arithmetic Average

Median - Midpoint of the observations when they are arranged in increasing order

Notation: Subscripted variables

n = # of units in the sample

N = # of units in the population

x = Variable to be measured

x_i = Measurement of the i-th unit

Mode - Most frequent value.
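The three definitions above can be checked on a small data set. This is a minimal sketch using Python's standard `statistics` module; the sample `x` is hypothetical (not from the lecture) and was chosen so that it has a single mode.

```python
import statistics

# Hypothetical sample x_1, ..., x_n with n = 6 (illustration only)
x = [2, 3, 3, 5, 7, 10]

mean = sum(x) / len(x)          # arithmetic average: 30 / 6
median = statistics.median(x)   # midpoint of the sorted values: (3 + 5) / 2
mode = statistics.mode(x)       # most frequent value

print(mean, median, mode)  # 5.0 4.0 3
```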

STA 291 Spring 2010 Lecture 5



Symbols


Variance and Standard Deviation

  • Sample

    • Variance

    • Standard Deviation

  • Population

    • Variance

    • Standard Deviation
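The formulas on this slide did not survive extraction. The standard textbook versions are given below; they agree with the step-by-step recipe (divide by n − 1 for a sample, by N for a population). The symbols x̄, s, μ and σ are the usual conventions and are assumed here rather than taken from the slide.

```latex
\begin{aligned}
\text{Sample:}\quad & \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, &
s^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, & s &= \sqrt{s^2} \\
\text{Population:}\quad & \mu = \frac{1}{N}\sum_{i=1}^{N} x_i, &
\sigma^2 &= \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2, & \sigma &= \sqrt{\sigma^2}
\end{aligned}
```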


Variance Step By Step

  • Calculate the mean

  • For each observation, calculate the deviation from the mean

  • For each observation, calculate the squared deviation

  • Add up all the squared deviations

  • Divide the result by (n-1)

    Or N if you are finding the population variance

    (To get the standard deviation, take the square root of the result)
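The recipe above translates directly into code. This is a sketch: the function names `variance` and `std_dev` are illustrative, and the `population` flag switches the divisor from n − 1 to N. The data set is the one used in the percentile examples later in the lecture.

```python
import math

def variance(data, population=False):
    """Variance computed step by step, following the recipe on the slide."""
    n = len(data)
    mean = sum(data) / n                          # 1. calculate the mean
    deviations = [x - mean for x in data]         # 2. deviation of each observation
    squared = [d ** 2 for d in deviations]        # 3. squared deviations
    total = sum(squared)                          # 4. add them up
    return total / (n if population else n - 1)  # 5. divide by n - 1 (or by N)

def std_dev(data, population=False):
    """Standard deviation: square root of the variance."""
    return math.sqrt(variance(data, population))

data = [3, 7, 12, 13, 15, 19, 24]
print(variance(data), std_dev(data))  # sample variance is 2082/42, about 49.57
```

Python's standard library also provides `statistics.variance` (divide by n − 1) and `statistics.pvariance` (divide by N), which can be used to cross-check results like these.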


Empirical Rule

  • If the data are approximately symmetric and bell-shaped, then

    • About 68% of the observations are within one standard deviation from the mean

    • About 95% of the observations are within two standard deviations from the mean

    • About 99.7% of the observations are within three standard deviations from the mean
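The 68/95/99.7 figures are rounded areas under a normal curve. As a quick check (assuming the data really follow a normal distribution), the exact shares can be computed with `statistics.NormalDist` from the standard library (Python 3.8+):

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Share of a normal distribution within k standard deviations of the mean
shares = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")
```

This prints roughly 68.3%, 95.4% and 99.7%. For data that are skewed or not bell-shaped, the actual shares can differ noticeably.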


Empirical Rule


Percentiles

  • The pth percentile (Xp) is a number such that p% of the observations take values below it, and (100-p)% take values above it

    • 50th percentile = median

    • 25th percentile = lower quartile

    • 75th percentile = upper quartile

  • The index (position in the ordered data) of Xp is

    • Lp = (n+1)p/100


Quartiles

  • 25th percentile

    • lower quartile

    • Q1

    • (approximately) median of the observations below the median

  • 75th percentile

    • upper quartile

    • Q3

    • (approximately) median of the observations above the median
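The "median of each half" description can be sketched as follows. Textbooks differ on whether the median itself belongs to the halves when n is odd; this sketch assumes it is left out of both. The function name `quartiles` is illustrative and requires at least two observations.

```python
import statistics

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-each-half approach.

    Assumption: for odd n, the middle observation is excluded from both halves.
    """
    x = sorted(data)
    n = len(x)
    half = n // 2
    lower = x[:half]            # observations below the median
    upper = x[half + n % 2:]    # observations above the median
    return statistics.median(lower), statistics.median(x), statistics.median(upper)

print(quartiles([3, 7, 12, 13, 15, 19, 24]))  # (7, 13, 19)
```

For this data set, the result matches the (n+1)p/100 index method (indexes 2 and 6), but the two approaches can give slightly different quartiles on other data.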


Example

  • Find the 25th percentile of this data set

    • {3, 7, 12, 13, 15, 19, 24}
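Applying the index formula Lp = (n+1)p/100 to this data set, which is already in increasing order with n = 7, gives a whole-number index, so no interpolation is needed. Here x_(i) denotes the i-th ordered value; this notation is assumed for the worked answer.

```latex
L_{25} = \frac{(n+1)\,p}{100} = \frac{(7+1)(25)}{100} = 2
\quad\Longrightarrow\quad X_{25} = x_{(2)} = 7
```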


Interpolation

  • Use when the index is not a whole number

  • Start at the ordered value whose position is the whole-number part of the index, then move the decimal part of the way toward the next ordered value

  • If the index is 5.4, go to the 5th value, then add 0.4 of the difference between the 6th value and the 5th value

    • In essence we are going to the 5.4th value
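The index formula and the interpolation rule combine into a single function. This is a sketch: the name `percentile` is illustrative, and clamping indexes below 1 or above n to the minimum or maximum is an assumption, since the lecture does not cover those edge cases.

```python
def percentile(data, p):
    """pth percentile via the index Lp = (n + 1) p / 100 with linear interpolation.

    Assumption: indexes below 1 or above n are clamped to the min or max.
    """
    x = sorted(data)
    n = len(x)
    index = (n + 1) * p / 100      # 1-based position in the ordered data
    if index <= 1:
        return x[0]
    if index >= n:
        return x[-1]
    whole = int(index)             # e.g. 5 for an index of 5.4
    frac = index - whole           # e.g. 0.4
    lower = x[whole - 1]           # the whole-th ordered value (0-based list)
    upper = x[whole]               # the next ordered value
    return lower + frac * (upper - lower)

data = [3, 7, 12, 13, 15, 19, 24]
print(percentile(data, 25), percentile(data, 40))  # 7.0 and about 12.2
```

Python's `statistics.quantiles` uses a similar (n + 1)-based positioning by default (`method='exclusive'`) and can be used for cross-checking.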


Example

  • Find the 40th percentile of the same data set

    • {3, 7, 12, 13, 15, 19, 24}

      • Must use interpolation
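Working the interpolation by hand, with x_(i) denoting the i-th ordered value (assumed notation), the index 3.2 means "the 3rd value plus 0.2 of the way to the 4th":

```latex
\begin{aligned}
L_{40} &= \frac{(7+1)(40)}{100} = 3.2 \\
X_{40} &= x_{(3)} + 0.2\,\bigl(x_{(4)} - x_{(3)}\bigr) = 12 + 0.2\,(13 - 12) = 12.2
\end{aligned}
```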


Data Summary

  • Five Number Summary

    • Minimum

    • Lower Quartile

    • Median

    • Upper Quartile

    • Maximum

  • Example

    • minimum=4

    • Q1=256

    • median=530

    • Q3=1105

    • maximum=320,000.

      • What does this suggest about the shape of the distribution?
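One way to read a five-number summary is to compare the distances between consecutive values. The sketch below does this for the example on the slide; comparing gaps is a rough heuristic, not a formal test of skewness.

```python
minimum, q1, median, q3, maximum = 4, 256, 530, 1105, 320_000

# Distances between consecutive values of the five-number summary
gaps = [q1 - minimum, median - q1, q3 - median, maximum - q3]
print(gaps)  # [252, 274, 575, 318895]

# Spread of the lower half versus the upper half
lower_spread = median - minimum   # 526
upper_spread = maximum - median   # 319470
```

The lower values are bunched together while Q3 and especially the maximum lie far above the median, which suggests a long right tail (right skew).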


Interquartile Range (IQR)

  • The Interquartile Range (IQR) is the difference between the upper and lower quartiles

    • IQR = Q3 – Q1

    • IQR = width of the range of values that contains the middle 50% of the data

    • IQR increases as variability increases

  • Murder Rate Data

    • Q1= 3.9

    • Q3 = 10.3

    • IQR =
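Filling in the blank with the definition IQR = Q3 – Q1 and the quartiles given for the murder rate data:

```latex
\text{IQR} = Q_3 - Q_1 = 10.3 - 3.9 = 6.4
```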


Box Plot

  • Displays the five number summary (and more) graphically

  • Consists of a box that contains the central 50% of the distribution (from lower quartile to upper quartile)

  • A line within the box that marks the median,

  • And whiskers that extend to the maximum and minimum values

    • This assumes there are no outliers in the data set


Outliers

  • An observation is an outlier if it falls

    • more than 1.5 IQR above the upper quartile

      or

    • more than 1.5 IQR below the lower quartile
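The 1.5 IQR rule amounts to computing two "fences" and flagging anything outside them. This is a sketch: the function names `outlier_fences` and `find_outliers` are illustrative, and the quartiles must be supplied by the caller.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k * IQR rule (k = 1.5 on the slide)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, q1, q3):
    """Observations strictly more than 1.5 IQR below Q1 or above Q3."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]

# Murder rate quartiles from the IQR slide: fences are about -5.7 and 19.9
low, high = outlier_fences(3.9, 10.3)
print(low, high)
```

With these quartiles, any state with a murder rate above about 19.9 would be flagged as an outlier; the lower fence is negative, so no rate can fall below it.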


Box Plot

  • When there are outliers, the whiskers extend only to the most extreme observations within 1.5 IQR beyond the quartiles

  • If an observation is an outlier, it is marked by an x, +, or some other identifier


Example

  • Values

    • Min = 148

    • Q1 = 158

    • Median = Q2 = 162

    • Q3 = 182

    • Max = 204

  • Create a box plot
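Before drawing the box plot, it helps to check whether any whisker has to stop short of the minimum or maximum. This sketch applies the 1.5 IQR rule to the values on this slide:

```python
minimum, q1, median, q3, maximum = 148, 158, 162, 182, 204

iqr = q3 - q1                    # 24
lower_fence = q1 - 1.5 * iqr     # 122.0
upper_fence = q3 + 1.5 * iqr     # 218.0

# Both extremes lie inside the fences, so there are no outliers
# and the whiskers run all the way to the minimum and the maximum.
has_outliers = minimum < lower_fence or maximum > upper_fence
print(iqr, lower_fence, upper_fence, has_outliers)  # 24 122.0 218.0 False
```

The box then runs from 158 to 182 with the median line at 162, the lower whisker reaches 148, and the upper whisker reaches 204. The long upper whisker and the median sitting close to Q1 point to right skew.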


5 Number Summary/Box Plot

  • In right-skewed distributions, the minimum, Q1, and median are “bunched up,” while Q3 and the maximum lie farther away.

  • In left-skewed distributions, the mirror image holds: the maximum, Q3, and median are relatively close together compared with the corresponding distances to Q1 and the minimum.

  • Symmetric distributions?


Mode

  • Value that occurs most frequently

    • Does not need to be near the center of the distribution

      • Not really a measure of central tendency

    • Can be used for all types of data (nominal, ordinal, interval)

  • Special Cases

    • Data Set

      • {2, 2, 4, 5, 5, 6, 10, 11}

      • Mode =

    • Data Set

      • {2, 6, 7, 10, 13}

      • Mode =
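The two special cases can be checked with `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency, in the order first encountered:

```python
from statistics import multimode

modes_a = multimode([2, 2, 4, 5, 5, 6, 10, 11])
modes_b = multimode([2, 6, 7, 10, 13])

print(modes_a)  # [2, 5]
print(modes_b)  # [2, 6, 7, 10, 13]
```

The first data set is bimodal (2 and 5 each occur twice). In the second, every value occurs once, so all values tie; in that situation the data set is usually said to have no mode.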


Mean vs. Median vs. Mode

  • Mean

    • Interval data with an approximately symmetric distribution

  • Median

    • Interval or ordinal data

  • Mode

    • All types of data


Mean vs. Median vs. Mode

  • Mean is sensitive to outliers

    • Median and mode are not

      • Why?

  • In general, the median is more appropriate for skewed data than the mean

    • Why?

  • In some situations, the median may be too insensitive to changes in the data

  • The mode may not be unique
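A quick experiment shows why the mean is sensitive to outliers while the median is not. This sketch reuses the data set from the percentile examples; the second list is hypothetical, with 24 misrecorded as 240.

```python
import statistics

data = [3, 7, 12, 13, 15, 19, 24]
with_outlier = [3, 7, 12, 13, 15, 19, 240]  # hypothetical misrecorded value

for name, x in (("original", data), ("with outlier", with_outlier)):
    print(name, round(statistics.mean(x), 2), statistics.median(x))
```

The mean jumps from about 13.29 to about 44.14 because every value enters the sum, while the median stays at 13 because it depends only on the middle position of the ordered data.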


Example

  • “How often do you read the newspaper?”

  • Identify the mode

  • Identify the median response


Measures of Variation

  • Statistics that describe variability

    • Two distributions may have the same mean and/or median but different variability

      • The mean and median describe only a typical value, not the spread of the data

    • Range

    • Variance

    • Standard Deviation

    • Interquartile Range

  • All of these can be computed for the sample or for the population


Range

  • Difference between the largest and smallest observations

    • Strongly affected by outliers

      • A misrecorded observation may produce an outlier and distort the range

  • The range does not always reveal differences in variation about the mean: two data sets with the same range can still differ in how their observations spread around the mean


Example

  • Sample 1

    • Smallest Observation: 112

    • Largest Observation: 797

    • Range =

  • Sample 2

    • Smallest Observation: 15033

    • Largest Observation: 16125

    • Range =
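Working the two blanks with range = largest observation minus smallest observation:

```latex
\begin{aligned}
\text{Range}_1 &= 797 - 112 = 685 \\
\text{Range}_2 &= 16125 - 15033 = 1092
\end{aligned}
```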
