
Chapter 3


Using Numbers to Describe Distributions of Data

With one data point, the central location is clearly at the point itself.

Measures of Central Location

- The measure of central location reflects the locations of all the actual data points.
- How?

With two data points, the central location should fall in the middle between them (in order to reflect the location of both of them).

But if a third data point appears on the left-hand side of the midrange, it should “pull” the central location to the left.

$$\text{Mean} = \frac{\text{Sum of the observations}}{\text{Number of observations}}$$

The Arithmetic Mean

- This is the most popular and useful measure of central location

- This is often called the average.

Useful Notation

x: lowercase letter x – represents any measurement in a sample of data.

n: lowercase letter n – number of measurements in a sample

∑: uppercase Greek letter sigma – represents sum

∑x: “add all the measurements in a sample.”

x̄: lowercase x with a bar over it – denotes the sample mean

µ: lowercase Greek letter mu – denotes the population mean

The Arithmetic Mean

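Sample mean: $$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \quad n = \text{sample size}$$

Population mean: $$\mu = \frac{\sum_{i=1}^{N} x_i}{N}, \quad N = \text{population size}$$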

The Arithmetic Mean

- Example 1

The reported times spent on the Internet by 10 adults are 0, 7, 12, 5, 33, 14, 8, 0, 9, 22 hours. Find the mean time on the Internet.

$$\bar{x} = \frac{0 + 7 + 12 + 5 + 33 + 14 + 8 + 0 + 9 + 22}{10} = \frac{110}{10} = 11.0 \text{ hours}$$
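As a quick check of the arithmetic, the sample mean formula can be sketched in Python. The function name `sample_mean` is hypothetical, chosen only for illustration.

```python
def sample_mean(data):
    """Return x-bar: the sum of the observations divided by n."""
    if not data:
        raise ValueError("need at least one observation")
    return sum(data) / len(data)

# Hours on the Internet reported by the 10 adults in Example 1
internet_hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(sample_mean(internet_hours))  # 11.0
```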

The Median

- The median of a set of observations is the value that falls in the middle when the observations are arranged in order of magnitude.

Find the median of the time on the Internet for the 10 adults of Example 1.

Even number of observations (n = 10):

0, 0, 5, 7, 8, 9, 12, 14, 22, 33

The two middle values are 8 and 9, so the median is (8 + 9)/2 = 8.5 hours.

Odd number of observations: suppose only 9 adults were sampled (exclude, say, the longest time, 33).

0, 0, 5, 7, 8, 9, 12, 14, 22

The middle (5th) value is 8, so the median is 8 hours.
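The two cases of the median rule can be sketched as a small Python function. `sample_median` is a hypothetical name used only for this illustration; it sorts first, then picks the middle value (odd n) or averages the two middle values (even n).

```python
def sample_median(data):
    """Middle value of the sorted data (n odd), or the average
    of the two middle values (n even)."""
    if not data:
        raise ValueError("need at least one observation")
    ordered = sorted(data)          # first, put the data in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                  # odd n: the single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values

internet_hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]
print(sample_median(internet_hours))                          # 8.5
print(sample_median([x for x in internet_hours if x != 33]))  # 8
```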

Measures of Center

1) Sample mean: $$\bar{x} = \frac{\sum x}{n}$$ where n is the sample size.

2) Sample median: First, put the data in order. Then

median = the middle number for odd sample sizes, or the average of the two middle values for even sample sizes.

Examples – Time to Complete an Exam

A random sample of times, in minutes, to complete a statistics exam yielded the following times. Compute the mean and median for this data.

33, 29, 45, 60, 42, 19, 52, 38, 36

The mean is $\bar{x} = \frac{354}{9} \approx 39.3$ minutes.

Recall, we must rank (sort) the data before finding the median.

19, 29, 33, 36, 38, 42, 45, 52, 60

Since there are 9 (odd) data points, the 5th point is the median.

The median is 38 minutes.
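Both results can be cross-checked with Python's standard `statistics` module, whose `median` function does the sorting internally. This is only a verification sketch, not part of the StatCrunch workflow used in the course.

```python
import statistics

exam_minutes = [33, 29, 45, 60, 42, 19, 52, 38, 36]

mean = statistics.mean(exam_minutes)      # 354 / 9
median = statistics.median(exam_minutes)  # sorts, then takes the 5th of 9 values
print(round(mean, 1), median)  # 39.3 38
```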

Examples – Miles Jogged Last Week

A random sample of 12 joggers were asked to keep track of the distance they ran (in miles) over a week’s time.

Compute the mean and median for this data.

5.5, 7.2, 1.6, 22.0, 8.7, 2.8, 5.3, 3.4, 12.5, 18.6, 8.3, 6.6

The mean is $\bar{x} = \frac{102.5}{12} \approx 8.54$ miles.


Recall, we must rank (sort) the data before finding the median.

1.6, 2.8, 3.4, 5.3, 5.5, 6.6, 7.2, 8.3, 8.7, 12.5, 18.6, 22.0

Since there are 12 (even) data points, the median is the average of the 6th and 7th points.

The median is 6.9 miles.

Mean and Median Comparisons

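Recall that the mean (8.54 miles) is larger than the median (6.9 miles) for this data. This typically occurs when the data are skewed to the right: the few large values (18.6 and 22.0 miles) pull the mean upward, while the median does not depend on how large they are.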


If the data is symmetric, the mean and the median are approximately the same.

If the data is skewed to the right, the mean is larger than the median.

If the data is skewed to the left, the mean is smaller than the median.

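[Figure: histograms of a symmetric distribution (mean = -0.0373, median = -0.0173), a right-skewed distribution (mean = 10.71, median = 7.75), and a left-skewed distribution (mean = 4.829, median = 6.629).]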
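The three rules above can be turned into a rough rule-of-thumb check. The sketch below is an assumption-laden illustration: the function `suggest_skew` and its `tolerance` cutoff (the mean-median gap measured in standard deviations) are hypothetical choices, not a standard test. It uses Python's `statistics` module and the jogging data.

```python
import statistics

def suggest_skew(data, tolerance=0.05):
    """Rule-of-thumb skew direction from the mean-median gap.
    `tolerance` (in standard deviations) is an arbitrary, hypothetical
    cutoff for calling the mean and median 'about the same'."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    gap = (mean - median) / statistics.stdev(data)
    if abs(gap) <= tolerance:
        return "roughly symmetric"
    return "skewed right" if gap > 0 else "skewed left"

# Jogging data: mean about 8.54 miles, median 6.9 miles
miles = [5.5, 7.2, 1.6, 22.0, 8.7, 2.8, 5.3, 3.4, 12.5, 18.6, 8.3, 6.6]
print(suggest_skew(miles))  # skewed right
```

A histogram remains the better guide; the gap only hints at the direction of skew.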

Relationship among Mean, Median, and Mode

- If a distribution is symmetrical, the mean, median and mode coincide

- If a distribution is asymmetrical (skewed to the left or to the right), the three measures differ.

[Figure: a positively skewed distribution (“skewed to the right”) with the mode, median, and mean marked.]
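The Internet-time data from Example 1 give a small numerical illustration of the right-skewed ordering. This sketch uses Python's `statistics.mode` (the most frequent value; here 0, which occurs twice) alongside the median and mean.

```python
import statistics

internet_hours = [0, 7, 12, 5, 33, 14, 8, 0, 9, 22]

mode = statistics.mode(internet_hours)      # most frequent value
median = statistics.median(internet_hours)  # average of 8 and 9
mean = statistics.mean(internet_hours)      # 110 / 10
# mode < median < mean, the ordering expected for right-skewed data
print(mode, median, mean)
```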

3.2 Measures of variability

- Measures of central location fail to tell the whole story about the distribution.
- A question of interest still remains unanswered:

How much are the observations spread out around the mean value?


Observe two hypothetical data sets:

Small variability: the average value provides a good representation of the observations in the data set.

Larger variability: the same average value does not provide as good a representation of the observations in the data set as before.

The Range

- The range of a set of observations is the difference between the largest and smallest observations.
- Its major advantage is the ease with which it can be computed.
- Its major shortcoming is its failure to provide information on the dispersion of the observations between the two end points.

But how are all the observations spread out? The range cannot help answer this question.

[Figure: the range is the distance from the smallest observation to the largest observation.]
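The shortcoming is easy to demonstrate with two hypothetical data sets that share the same end points but distribute their values very differently. `sample_range` is an illustrative helper name.

```python
def sample_range(data):
    """Range = largest observation - smallest observation."""
    return max(data) - min(data)

# Two hypothetical data sets with the same end points (1 and 9)
clustered = [1, 5, 5, 5, 9]   # most values sit at the center
spread_out = [1, 1, 5, 9, 9]  # most values sit at the ends
print(sample_range(clustered), sample_range(spread_out))  # 8 8
```

Both ranges equal 8, even though the values fall very differently between the end points.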

Notation for Samples and Populations

Recall that we will use sample statistics to make inferences about population values.

| Sample Descriptive Measures | Population Descriptive Measures |
|---|---|
| x̄ = sample mean | µ = population mean |
| s² = sample variance | σ² = population variance |
| s = sample standard deviation | σ = population standard deviation |

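The Variance

- This measure reflects the dispersion of all the observations.
- The variance of a population of size $N$, $x_1, x_2, \ldots, x_N$, whose mean is $\mu$, is defined as

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

- The variance of a sample of $n$ observations $x_1, x_2, \ldots, x_n$, whose mean is $\bar{x}$, is defined as

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$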

Why not use the sum of deviations?

Consider two small populations, A and B. The mean of both populations is 10.

Population A (values between 8 and 12): 8 - 10 = -2, 9 - 10 = -1, 11 - 10 = +1, 12 - 10 = +2. Sum = 0.

Population B (values between 4 and 16): 4 - 10 = -6, 7 - 10 = -3, 13 - 10 = +3, 16 - 10 = +6. Sum = 0.

The measurements in B are more dispersed than those in A, and a measure of dispersion should agree with this observation. Can the sum of deviations be a good measure of dispersion?

The sum of deviations is zero for both populations; therefore, it is not a good measure of dispersion.
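A short Python sketch confirms that the deviations cancel for both populations. It assumes each population also contains the center value 10; the sum is zero whether or not 10 is included, since its deviation is 0. The helper name `deviations` is illustrative.

```python
def deviations(data):
    """Deviation of each observation from the population mean."""
    mu = sum(data) / len(data)
    return [x - mu for x in data]

# Assumed populations: values read off the two panels, including the center value 10
pop_a = [8, 9, 10, 11, 12]
pop_b = [4, 7, 10, 13, 16]
print(sum(deviations(pop_a)), sum(deviations(pop_b)))  # 0.0 0.0
```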

The Variance

Let us calculate the variance of the two populations

Why is the variance defined as the average squared deviation? Why not use the sum of squared deviations as a measure of variation instead? After all, the sum of squared deviations increases in magnitude when the variation of a data set increases!

The Variance

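Let us calculate the sum of squared deviations for both data sets. Which data set has a larger dispersion?

[Figure: data set A has observations at 1 and 3 (mean 2); data set B has observations at 1 and 5 (mean 3). Data set B is more dispersed around the mean.]

$$\text{Sum}_A = (1-2)^2 + \cdots + (1-2)^2 + (3-2)^2 + \cdots + (3-2)^2 = 10$$

$$\text{Sum}_B = (1-3)^2 + (5-3)^2 = 8$$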

$\text{Sum}_A > \text{Sum}_B$. This is inconsistent with the observation that set B is more dispersed.

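However, when calculated on a “per observation” basis (the variance), the data set dispersions are properly ranked. Data set A has $N_A = 10$ observations (each of its squared deviations equals 1, and they sum to 10), while data set B has $N_B = 2$.

$$\sigma_A^2 = \frac{\text{Sum}_A}{N_A} = \frac{10}{10} = 1$$

$$\sigma_B^2 = \frac{\text{Sum}_B}{N_B} = \frac{8}{2} = 4$$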
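The comparison can be reproduced in Python. The composition of data set A (five 1s and five 3s) is an assumption inferred from the ten unit terms in Sum_A and its mean of 2; data set B is taken as the two values 1 and 5. Helper names are illustrative.

```python
def sum_sq_dev(data):
    """Sum of squared deviations from the mean."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data)

def pop_variance(data):
    """Population variance: sum of squared deviations divided by N."""
    return sum_sq_dev(data) / len(data)

# Assumed composition: A has five 1s and five 3s (mean 2); B has a 1 and a 5 (mean 3)
set_a = [1] * 5 + [3] * 5
set_b = [1, 5]
print(sum_sq_dev(set_a), sum_sq_dev(set_b))      # 10.0 8.0
print(pop_variance(set_a), pop_variance(set_b))  # 1.0 4.0
```

The raw sums rank A above B, while the per-observation averages rank B above A.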

The Variance

- Example
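- The following sample consists of the number of jobs six students applied for: 17, 15, 23, 7, 9, 13. Find its mean and variance.

- Solution

$$\bar{x} = \frac{17 + 15 + 23 + 7 + 9 + 13}{6} = \frac{84}{6} = 14 \text{ jobs}$$

$$s^2 = \frac{(17-14)^2 + (15-14)^2 + (23-14)^2 + (7-14)^2 + (9-14)^2 + (13-14)^2}{6 - 1} = \frac{166}{5} = 33.2 \text{ jobs}^2$$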
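Python's standard `statistics` module distinguishes the two divisors: `variance` divides by n - 1 (sample), while `pvariance` divides by n (population). This sketch applies both to the jobs data for comparison.

```python
import statistics

jobs = [17, 15, 23, 7, 9, 13]

mean_jobs = statistics.mean(jobs)   # 84 / 6
var_s = statistics.variance(jobs)   # sample variance: 166 / (6 - 1)
var_p = statistics.pvariance(jobs)  # population variance: 166 / 6
sd_s = statistics.stdev(jobs)       # sample standard deviation: sqrt(var_s)
print(mean_jobs, var_s, round(var_p, 3), round(sd_s, 3))
```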

Standard Deviation

- The standard deviation of a set of observations is the square root of the variance: $s = \sqrt{s^2}$ for a sample and $\sigma = \sqrt{\sigma^2}$ for a population.

Properties of the Standard Deviation, s

1. s measures the variability in a sample of measurements. It is a measure of how much the sample values deviate from the sample mean.

2. s is a nonnegative number. If all the numbers in a sample are equal, the value of the standard deviation will be zero. This is the smallest possible value for the standard deviation.

3. When comparing 2 samples of data, the sample that is more variable will have a larger standard deviation.
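Properties 2 and 3 can be checked on a few hypothetical samples, all with mean 5, using `statistics.stdev` from Python's standard library. The sample values are invented purely for illustration.

```python
import statistics

# Hypothetical samples chosen only to illustrate the properties
equal_values = [5, 5, 5, 5]    # every measurement identical
less_variable = [4, 5, 5, 6]   # values close to the mean of 5
more_variable = [1, 3, 7, 9]   # values far from the mean of 5

s_equal = statistics.stdev(equal_values)  # zero: no deviation at all
s_less = statistics.stdev(less_variable)
s_more = statistics.stdev(more_variable)
print(s_equal, round(s_less, 3), round(s_more, 3))
```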

Standard Deviation

- Example: To examine the consistency of shots for a new innovative golf club, a golfer was asked to hit 150 shots, 75 with a currently used (7-iron) club, and 75 with the new club.
- The distances were recorded.
- Which 7-iron is more consistent?

Standard Deviation

- Example – solution

[Excel printout from the “Descriptive Statistics” sub-menu.]

The innovative club has the smaller standard deviation, so it is more consistent; because the two means are close, it is considered the better club.

- For sets of quantitative data that result from real-life experiments, the following statements are generally true:
- 1. Most of the measurements will be within 2 standard deviations of the mean
- 2. All, or almost all of the measurements will be within 3 standard deviations of the mean.

Interpreting Standard Deviation

- The standard deviation can be used to
  - compare the variability of several distributions
  - make a statement about the general shape of a distribution.

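- The empirical rule: If a sample of observations has a mound-shaped distribution, the interval
  - $(\bar{x} - s, \bar{x} + s)$ contains approximately 68% of the measurements
  - $(\bar{x} - 2s, \bar{x} + 2s)$ contains approximately 95% of the measurements
  - $(\bar{x} - 3s, \bar{x} + 3s)$ contains approximately 99.7% of the measurements.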
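One way to see the empirical rule at work is to simulate a mound-shaped sample and count how many values fall within 1, 2, and 3 standard deviations of the mean. This is a sketch under stated assumptions: the data are drawn from a normal distribution with mean 50 and standard deviation 10 (arbitrary choices) via `random.gauss`, with a fixed seed for reproducibility.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible
# Simulated mound-shaped sample (normal, mean 50, sd 10: arbitrary choices)
data = [random.gauss(50, 10) for _ in range(10_000)]

xbar = statistics.fmean(data)
s = statistics.stdev(data)

fractions = {}
for k in (1, 2, 3):
    inside = sum(1 for x in data if xbar - k * s <= x <= xbar + k * s)
    fractions[k] = inside / len(data)
print(fractions)  # roughly 0.68, 0.95, and 0.997
```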

Empirical Rule Example

A sample of n=40 students asked for their one-way commute times to campus yielded a mean of 13.6 minutes with a standard deviation of 2.1 minutes.

Empirical Rule:

Most students commute between 9.4 and 17.8 minutes to campus (13.6 ± 2 × 2.1).

Almost all students commute between 7.3 and 19.9 minutes to campus (13.6 ± 3 × 2.1).
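The interval arithmetic can be sketched as a small helper. `empirical_rule_intervals` is a hypothetical name; it simply returns mean ± k·sd for k = 1, 2, 3 and assumes the data are mound-shaped, as the rule requires.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals mean +/- k*sd for k = 1, 2, 3 (mound-shaped data assumed)."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}

commute = empirical_rule_intervals(13.6, 2.1)
for k, (low, high) in commute.items():
    print(f"mean +/- {k}s: {low:.1f} to {high:.1f} minutes")
```

The same call with the house-construction figures, `empirical_rule_intervals(84, 7)`, gives (70, 98) for k = 2 and (63, 105) for k = 3.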

Empirical Rule Example #2

The construction time for a 3-bedroom house for a local builder is known to follow a mound-shaped and symmetric distribution with a mean of 84 days and a standard deviation of 7 days.

a) Most 3-bedroom houses take between 70 and 98 days to be completed for this builder (84 ± 2 × 7).

b) Almost all 3-bedroom houses take between 63 and 105 days to be completed for this builder (84 ± 3 × 7).

Measures of Variability (Spread) for Samples

We wish to quantify how spread out from the center the data is.

Sample range: R = largest value – smallest value

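Sample variance: $$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

Sample standard deviation: $$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$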

StatCrunch will be used to calculate the standard deviation for most of our data sets.
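For readers who want to see the three sample formulas computed directly, this Python sketch applies them to the jogging data and uses the standard `statistics` module only as a cross-check. It is an illustration, not a replacement for StatCrunch in the course.

```python
import math
import statistics

miles = [5.5, 7.2, 1.6, 22.0, 8.7, 2.8, 5.3, 3.4, 12.5, 18.6, 8.3, 6.6]

n = len(miles)
xbar = sum(miles) / n
R = max(miles) - min(miles)                         # sample range
s2 = sum((x - xbar) ** 2 for x in miles) / (n - 1)  # sample variance, divisor n - 1
s = math.sqrt(s2)                                   # sample standard deviation

print(round(R, 1), round(s2, 3), round(s, 3))
print(math.isclose(s, statistics.stdev(miles)))     # cross-check against the library
```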

A Complete Analysis for a Data Set

Bone density loss measurements were taken for a sample of 125 women aged 50 or over. Complete an analysis of the data and describe the results.

This data was entered into StatCrunch. First, we generated the basic descriptive statistics by the commands:

Stat > Basic Statistics > Display Descriptive Statistics

With the cursor in the variables box, double click the variable “Bone Density Loss”. Then click OK.

Descriptive Statistics: Bone Density Loss

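| Variable | N | Mean | Median | TrMean | StDev | SE Mean |
|---|---|---|---|---|---|---|
| Bone Den | 125 | 35.008 | 36.000 | 35.071 | 7.684 | 0.687 |

| Variable | Minimum | Maximum | Q1 | Q3 |
|---|---|---|---|---|
| Bone Den | 15.000 | 53.000 | 30.000 | 41.000 |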

A Complete Analysis for a Data Set (Cont)

Descriptive Statistics: Bone Density Loss (Modified)

| Variable | N | Mean | Median | StDev | Minimum | Maximum |
|---|---|---|---|---|---|---|
| Bone Den | 125 | 35.008 | 36.000 | 7.684 | 15.000 | 53.000 |

The sample mean is 35.008 and the median is 36, so we expect a roughly symmetric or slightly left-skewed distribution. The typical bone density loss is around 35 to 36 units.

[Figure: histogram of the bone density loss measurements.]


$$35.008 - 2(7.684) = 19.640$$

$$35.008 + 2(7.684) = 50.376$$

Most women aged 50 and over have between 19.640 and 50.376 units of bone density loss.

Out of the 125 measurements in the sample, 118 were between these two numbers. This represents 118/125 = 94.4% of the data points, which agrees closely with the empirical rule (about 95% within two standard deviations).

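As a rough check, recall that the range of mound-shaped data typically spans about four standard deviations, so s should be roughly R/4. The range is 53 - 15 = 38, and 38/4 = 9.5. Since 7.684 and 9.5 are not drastically different values, s was probably calculated correctly.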
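The range check can be written as a tiny helper. `range_rule_check` is a hypothetical name; it returns the R/4 approximation and the ratio of the reported s to that approximation, leaving the judgment of "not drastically different" to the reader.

```python
def range_rule_check(minimum, maximum, s):
    """Rough sanity check: s is often close to range / 4."""
    approx_s = (maximum - minimum) / 4
    return approx_s, s / approx_s

# Bone density loss: minimum 15, maximum 53, reported StDev 7.684
approx, ratio = range_rule_check(15.000, 53.000, 7.684)
print(approx, round(ratio, 2))  # 9.5 0.81
```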