SESSION 19 & 20

# SESSION 19 & 20 - PowerPoint PPT Presentation

### SESSION 19 & 20

Last Update: 16th March 2011

Measures of Dispersion

Measures of Variability - Grouped Data -

Learning Objectives

All measures for grouped data:

• Measures of relative standing: Median, Quartiles, Deciles and Percentiles
• Measures of dispersion: Range
• Measures of variability: Variance and Standard Deviation
• Empirical Rule and Chebysheff’s Theorem
• Coefficient of Variation
Percentiles

We can determine any percentile for grouped data using the following formula:

For quartiles, the formula ‘simplifies’ to:

where m = 1, 2, 3 or 4 for the first, second, third and fourth quartile.
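The formulas themselves are not reproduced above. As a hedged sketch, one common textbook form is given below, written with the location Lp and the class width C that these slides use. The symbols L (lower limit of the interval that Lp falls into), F (less-than cumulative frequency up to the preceding interval) and f (frequency of that interval) are assumed names and may differ from the notation on the original slide.

```latex
\[
  P\text{th percentile} = L + \frac{L_p - F}{f} \cdot C,
  \qquad L_p = (n + 1) \cdot \frac{P}{100}
\]
\[
  Q_m = L + \frac{\frac{m\,(n + 1)}{4} - F}{f} \cdot C,
  \qquad m = 1, 2, 3, 4
\]
```

The quartile form follows from setting P = 25m, so that Lp = m(n + 1)/4.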

Calculation of Percentile
• Calculate the less than cumulative frequencies f(<) from the observed frequencies f
• Use the following formula to determine the location of the Pth percentile:

Lp = (n + 1) * (P / 100)

• Locate the interval Lp falls into
Calculation of Percentile
• Determine the following parameters
• Apply formula for Pth Percentile
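The steps above can be sketched in Python as follows. This is a minimal illustration, not the manual's own procedure: it assumes equal class widths and linear interpolation inside the located interval, and the function name, its parameters and the data in the usage lines are hypothetical.

```python
def grouped_percentile(lower_limits, freqs, width, p):
    """Pth percentile of grouped data by linear interpolation.

    lower_limits: lower class limit of each interval, in ascending order
    freqs:        observed frequency f of each interval
    width:        class width C (assumed equal for all intervals)
    p:            percentile to compute, 0 < p < 100
    """
    n = sum(freqs)
    location = (n + 1) * p / 100  # Lp as defined in the slides

    cumulative = 0  # less-than cumulative frequency before the current interval
    for lower, f in zip(lower_limits, freqs):
        # The percentile interval is the first one whose cumulative
        # frequency reaches the location Lp.
        if f > 0 and location <= cumulative + f:
            return lower + (location - cumulative) / f * width
        cumulative += f

    # Lp lies beyond the last cumulative frequency: use the top of the last class.
    return lower_limits[-1] + width


# Hypothetical data: intervals 40 to < 50, 50 to < 60, 60 to < 70, 70 to < 80
limits = [40, 50, 60, 70]
freqs = [4, 8, 5, 2]  # n = 19, cumulative frequencies 4, 12, 17, 19

print(grouped_percentile(limits, freqs, 10, 50))  # median: 57.5
print(grouped_percentile(limits, freqs, 10, 25))  # first quartile: 51.25
```

For the median in this made-up data, Lp = 20 * 0.5 = 10 falls between the cumulative frequencies 4 and 12, so the median interval is 50 to < 60.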
Percentile: An example

Let us assume the following grouped data is to be assessed:

C = Upper + 1 – Lower

C = 49 + 1 – 40 = 10

Percentile: An example

If the data are interval-scaled (student marks approximately are), class limits written as inequalities (e.g. 40 to < 50) may be more appropriate.

This example comes from your student manual. The intervals on the right, which include inequalities, may be somewhat more intuitive.

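For intervals with inclusive limits (e.g. 40 to 49):

C = Upper + 1 – Lower = 49 + 1 – 40 = 10

For intervals with inequalities (e.g. 40 to < 50):

C = Upper – Lower = 50 – 40 = 10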
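The two ways of computing the class width C can be written as one small helper. This is only a sketch: the function name and the `inclusive` flag are hypothetical, and the inclusive case assumes integer-valued data such as whole marks.

```python
def class_width(lower, upper, inclusive=True):
    """Class width C of a single interval.

    inclusive=True:  both limits belong to the class, e.g. 40 to 49
    inclusive=False: the upper limit is excluded, e.g. 40 to < 50
    """
    return upper - lower + (1 if inclusive else 0)


print(class_width(40, 49))                   # 10
print(class_width(40, 50, inclusive=False))  # 10
```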

Solution – Step 1

Use the location formula Lp to determine which interval the median falls into. Since 6 < 9.75 < 20, the median interval is 50 to < 60. Beware that the median interval is to be looked up in the cumulative frequency column, not the interval column!

Solution – Step 2

Read off the parameters required for the median formula for grouped data. The formula:

Now yields:

It is left as an exercise to confirm that the quartile formula with m = 2 yields the same result.

Variance

Using the midpoints allows us to calculate the variance of grouped data as well. In the case of interval data, as with the mean, the original data is to be preferred to the grouped data. For ordinal or nominal data the variance has no probabilistic meaning! Measures of relative standing (i.e. percentiles) may be used for ordinal data. There are no measures of variability for nominal data (Example: 1 = married, 2 = single, 3 = divorced, 4 = widowed).

Calculation of Variance
1. Determine the interval midpoints x
2. Multiply the observed frequencies f by the interval midpoints (fx)
3. Sum the results from step 2 and divide by n (steps 1 to 3 are identical to calculating the mean for grouped data)
4. Square x and multiply by f, yielding fx²
Calculation of Variance
• Use the following formula to determine the variance for grouped data (sample):

And for the population:

Note that x denotes the midpoints here and not the actual observations.
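The formulas are not reproduced above. As a hedged reconstruction, the usual shortcut form, which uses exactly the column sums of fx and fx² prepared in the preceding steps, is sketched below. Here n is the sum of the frequencies; writing N for the population size is an assumption about notation.

```latex
\[
  s^2 = \frac{1}{n - 1}\left[\sum f x^2 - \frac{\left(\sum f x\right)^2}{n}\right]
\]
\[
  \sigma^2 = \frac{1}{N}\left[\sum f x^2 - \frac{\left(\sum f x\right)^2}{N}\right]
\]
```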

Variance: An example

Let us assume the following grouped data is to be assessed:

Solution – Step 2

Using the formula

yields:

As before, the square root yields the standard deviation.
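The calculation can be sketched in Python as follows. This assumes the common shortcut formula based on the sums of fx and fx²; the function name and the data are hypothetical and are not the example from the student manual.

```python
import math


def grouped_variance(midpoints, freqs, sample=True):
    """Variance of grouped data computed from interval midpoints.

    midpoints: interval midpoints x
    freqs:     observed frequencies f
    sample:    divide by n - 1 (sample) or by n (population)
    """
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(midpoints, freqs))       # sum of fx
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, freqs))  # sum of fx^2
    numerator = sum_fx2 - sum_fx ** 2 / n
    return numerator / (n - 1 if sample else n)


# Hypothetical data: three intervals with midpoints 10, 20 and 30
midpoints = [10, 20, 30]
freqs = [1, 3, 1]

variance = grouped_variance(midpoints, freqs)
print(variance)                                           # 50.0
print(math.sqrt(variance))                                # standard deviation
print(grouped_variance(midpoints, freqs, sample=False))   # 40.0
```

With these numbers the sum of fx is 100 and the sum of fx² is 2200, so the numerator is 2200 – 100²/5 = 200.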

Empirical Rule

In normal bell-shaped frequency distribution polygons, we find the following:

• Approx. 68.2% of all observations fall within one standard deviation of the mean
• Approx. 95.4% of all observations fall within two standard deviations of the mean
• Approx. 99.7% of all observations fall within three standard deviations of the mean

[Figure: bell-shaped curve centred on the mean, with 68.26% of observations between –1s and +1s and 95.44% between –2s and +2s]
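The percentages of the Empirical Rule can be checked numerically. The sketch below assumes an exactly normal distribution and uses the identity P(|Z| < k) = erf(k / √2); the function name is hypothetical.

```python
import math


def normal_share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {normal_share_within(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```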

Chebycheff’s Theorem

Chebycheff’s Theorem is a more general alternative to the Empirical Rule: it applies to histograms of any shape.

The proportion of observations that lie within k standard deviations of the mean is at least:

$1 - \dfrac{1}{k^2}$ for $k > 1$

where k denotes the number of standard deviations away from the mean.
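A few values of the bound make the statement concrete. The function below is a minimal sketch with a hypothetical name; it simply evaluates 1 – 1/k².

```python
def chebycheff_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebycheff_lower_bound(k):.1%}")
# k = 1.5: at least 55.6%
# k = 2: at least 75.0%
# k = 3: at least 88.9%
```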

Chebycheff’s Theorem - Example

The Empirical Rule provides approximate proportions under the assumption of a bell-shaped normal distribution, whereas Chebycheff’s Theorem provides lower bounds on those proportions for any type of distribution. Consequently, the interval needed to capture a given proportion is wider (the tail-ends are further apart): for k = 2, for example, the theorem guarantees only at least 75%, compared with approx. 95.4% under the Empirical Rule. Chebycheff is not relevant to your examination!

Coefficient of Variation

The coefficient of variation of a set of observations is their standard deviation divided by their mean: $CV = s / \bar{x}$ for a sample (and $CV = \sigma / \mu$ for a population).

By relating the standard deviation to the mean, one can make a statement about the relative variability of the data. Compare a standard deviation of 10 with a mean of 100 (CV = 0.1) and with a mean of 1,000,000 (CV = 0.00001)!
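The comparison can be reproduced in a few lines. This is a sketch with a hypothetical function name; it rejects a mean of zero, for which the ratio is undefined.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed relative to the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a mean of 0")
    return std_dev / mean


print(coefficient_of_variation(10, 100))        # 0.1
print(coefficient_of_variation(10, 1_000_000))  # 1e-05
```

The same standard deviation of 10 represents 10% of the mean in the first case and only 0.001% in the second.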