Doing Statistics for Business Data, Inference, and Decision Making Marilyn K. Pelosi Theresa M. Sandifer



### Doing Statistics for Business: Data, Inference, and Decision Making, Marilyn K. Pelosi, Theresa M. Sandifer

Chapter 4

Numerical Descriptors of Data

Chapter 4 Objectives

• Numerical Measures of Center: The Mean, the Median, and the Mode
• Numerical Measures of Variability: The Range & the Standard Deviation
• Describing a Set of Data: The Empirical Rule & Boxplots

Chapter 4 Objectives (con’t)

• Measures of Relative Standing: Percentiles, Percentile Rank
• Identifying Outliers: z-scores, Boxplots

A Statistic is a numerical descriptor that is calculated from sample data and is used to describe the sample. Statistics are usually represented by Roman letters.

A Parameter is a numerical descriptor that is used to describe a population. Parameters are usually represented by Greek letters.

The Sample Mean is the center of balance of a set of data, and is found by adding up all of the data values and dividing by the number of observations.

The Population Mean is represented by the Greek letter μ (mu).

TRY IT NOW!

Restaurant Table Times

Calculating the Sample Mean

A restaurant is trying to decide whether it has an adequate number of tables available. The restaurant owner decides that she would like some information on the amount of time a table is occupied by a customer. She collects data on the length of time a customer occupies a table for a random sample of 10 customers and obtains the following data.

TRY IT NOW!

Restaurant Table Times

Calculating the Sample Mean (con’t)

Time (min) 59.3 58.6 62.7 65.4 59.0 67.3 62.8 68.1 59.4 63.7

Calculate the sample mean for the length of time a table is occupied.
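As a quick check on the hand calculation, here is a minimal Python sketch. It assumes the ten table times are the ones listed for this restaurant in the chapter's range exercise; the variable names are illustrative only.

```python
from statistics import mean

# Table times in minutes, as listed for the restaurant in this chapter
# (assumed to be the same sample of 10 customers).
table_times = [59.3, 58.6, 62.7, 65.4, 59.0, 67.3, 62.8, 68.1, 59.4, 63.7]

# Sample mean: add up all of the values and divide by the number of observations.
manual_mean = sum(table_times) / len(table_times)

# The standard library computes the same quantity.
library_mean = mean(table_times)

print(f"Sample mean: {manual_mean:.1f} minutes")
```

The result agrees with the 62.6 minutes quoted later in the chapter.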

The Sample Median is the value of the middle observation in an ordered set of data.

TRY IT NOW!

Town Hall Traffic

Calculating the Sample Median

In the past few years the town council of a small town has received complaints that it has become increasingly difficult to cross the main street in town near the library. The council decides to look at traffic flow on the street. It selects a site directly in front of the library where most people try to cross the road and records the number of cars that pass the point in a two-minute period.

TRY IT NOW!

Town Hall Traffic

Calculating the Sample Median (con’t)

This is done for 10 two-minute periods at 3:00 p.m. over several weeks and the following data are obtained.

Number of cars 20 27 29 28 37 23 21 28 29 28

Find the median number of cars that pass the site in two minutes.

Remember to SORT the data before you locate the median!
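The sort-then-find-the-middle procedure can be sketched in Python as follows. With an even number of observations, the sketch averages the two middle values, which is the usual convention and is assumed here.

```python
from statistics import median

# Number of cars passing the site in each two-minute period.
cars = [20, 27, 29, 28, 37, 23, 21, 28, 29, 28]

# Step 1: sort the data.
ordered = sorted(cars)
n = len(ordered)

# Step 2: take the middle value, or the average of the two middle
# values when the number of observations is even.
if n % 2 == 1:
    manual_median = ordered[n // 2]
else:
    manual_median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print("Sorted data:", ordered)
print("Median:", manual_median)
print("Library check:", median(cars))
```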

Figure 4.2 Mean and Median for a Symmetric Distribution

Figure 4.3 Mean and Median for Skewed Distributions (left skew, right skew)

TRY IT NOW!

Airline Cancellations

Comparing the Mean and the Median

An airline company is wondering about the number of cancellations that it

random sample of 15 days from the first quarter of the year and obtains

the following data:

# of cancellations 4 9 9 12 12 13 14 14 15 15 16 16 17 17 24

TRY IT NOW!

Airline Cancellations

Comparing the Mean and the Median (con’t)

Find the mean and median for the # of cancellations for the commuter flight.

When compared, do the data appear symmetric or skewed?

Make a dotplot of the data.

From the dotplot, do the data appear symmetric or skewed?

Note: the data have been sorted for you.
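One way to work through this exercise is sketched below in Python. The text dotplot is only a rough stand-in for a hand-drawn one, and the reading of the result is left to you: a mean below the median hints at left skew, and a mean above it hints at right skew.

```python
from collections import Counter
from statistics import mean, median

# Number of cancellations on each of the 15 sampled days (already sorted).
cancellations = [4, 9, 9, 12, 12, 13, 14, 14, 15, 15, 16, 16, 17, 17, 24]

sample_mean = mean(cancellations)
sample_median = median(cancellations)
print(f"Mean = {sample_mean:.1f}, median = {sample_median}")

# Rough text dotplot: one '*' per observation at each value.
counts = Counter(cancellations)
for value in range(min(cancellations), max(cancellations) + 1):
    print(f"{value:>2} | {'*' * counts[value]}")
```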

Discovery Exercise 4.1

The Trimmed Mean

Part I. Investigating the Data

In a report to the administration of a large university, the Psychology Department states that the average class size is greater than the 35 students per class allowed by the university charter. The report indicates that the mean class size is 39.4.

Discovery Exercise 4.1

The Trimmed Mean

Part I. Investigating the Data (con’t)

No data are appended to the report, but you can obtain the current enrollments easily. The data you find are:

3 14 22 26 42

3 15 23 27 45

5 15 24 28 45

9 17 24 28 190

11 21 25 36 193

13 22 26 38 193

Discovery Exercise 4.1

The Trimmed Mean

Part I. Investigating the Data (con’t)

A. Do you think that the mean is a good measure of center for these data? Why or why not?

B. By simply studying the data, what do you think a typical class size for the Psychology Department is?

C. What is the median of the data? Is this closer to what you thought?

D. Compare the mean and median. What does the comparison lead you to believe about the data?

E. Display the data graphically. Do you still think the same thing?
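The sketch below compares the mean, the median, and a trimmed mean for the class sizes. The helper `trimmed_mean` and the choice to trim 10% of the observations from each end are assumptions made for illustration; the slides do not specify a trimming fraction.

```python
from statistics import mean, median

# Current Psychology Department enrollments (30 classes).
class_sizes = [
    3, 14, 22, 26, 42,
    3, 15, 23, 27, 45,
    5, 15, 24, 28, 45,
    9, 17, 24, 28, 190,
    11, 21, 25, 36, 193,
    13, 22, 26, 38, 193,
]

def trimmed_mean(values, fraction):
    """Mean after dropping `fraction` of the observations from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * fraction)  # observations dropped per end
    kept = ordered[k:len(ordered) - k]
    return mean(kept)

full_mean = mean(class_sizes)
middle = median(class_sizes)
trimmed = trimmed_mean(class_sizes, 0.10)  # assumed 10% trim per end

print(f"Mean: {full_mean:.1f}")
print(f"Median: {middle}")
print(f"10% trimmed mean: {trimmed:.1f}")
```

The three very large classes pull the mean well above the median, while the trimmed mean stays close to the median.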

The Sample Mode is the data value that has the highest frequency of occurrence in the sample.

The Modal Class is the class interval in a frequency distribution or histogram that has the highest frequency.
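As an illustration of the sample mode, this short Python sketch counts frequencies in the town hall traffic data from earlier in the chapter. `multimode` returns every value that ties for the highest frequency, which is useful if the data turn out to be bimodal.

```python
from collections import Counter
from statistics import multimode

# Town hall traffic counts from the median exercise.
cars = [20, 27, 29, 28, 37, 23, 21, 28, 29, 28]

# Frequency of each distinct value.
frequencies = Counter(cars)

# All values that share the highest frequency.
modes = multimode(cars)

print("Frequencies:", dict(frequencies))
print("Mode(s):", modes)
```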

Figure 4.4 Histogram of Bimodal Data

Discovery Exercise 4.2

Investigating Variability

The table contains air-quality data collected by the Environmental Protection Agency. The data show the number of days in which the ozone level was dangerous for 14 major U.S. cities in 2000.

City Number of unhealthy days

Atlanta 18

Boston 0

Chicago 0

Dallas 5

Denver 0

Houston 94

Kansas City 0

Discovery Exercise 4.2

Investigating Variability (con’t)

City Number of unhealthy days

Los Angeles 1

New York 13

Pittsburgh 3

San Francisco 0

Seattle 0

Washington, DC 0

Discovery Exercise 4.2

Investigating Variability (con’t)

A. Display these data using a dotplot.

B. Find the typical number of unhealthy days by calculating the average value.

C. Can you expect every observation to be typical? Why not?
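A minimal Python sketch for part B follows. It uses only the 13 cities that appear in the table as transcribed here, although the text mentions 14, so the average is for the listed cities only.

```python
from statistics import mean

# Unhealthy ozone days in 2000 for the cities listed in the table above.
unhealthy_days = {
    "Atlanta": 18, "Boston": 0, "Chicago": 0, "Dallas": 5, "Denver": 0,
    "Houston": 94, "Kansas City": 0, "Los Angeles": 1, "New York": 13,
    "Pittsburgh": 3, "San Francisco": 0, "Seattle": 0, "Washington, DC": 0,
}

average = mean(unhealthy_days.values())

# Cities whose count lies above the average.
above_average = [city for city, days in unhealthy_days.items() if days > average]

print(f"Average unhealthy days: {average:.1f}")
print("Above average:", above_average)
```

Most cities sit well below the average and a few sit far above it, which is the variability the exercise asks you to notice.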

A Sample Range, R, is the difference between the maximum and minimum observations in the sample.

TRY IT NOW!

Restaurant Table Time

Calculating the Sample Range

The restaurant looking at the turnaround time for its tables wonders how variable the occupation time for a table really is. The data the restaurant collected are:

Time (min) 59.3 58.6 62.7 65.4 59.0 67.3 62.8 68.1 59.4 63.7

TRY IT NOW!

Restaurant Table Time

Calculating the Sample Range (con’t)

What is the range of turnaround times?

Previously you calculated the mean turnaround time to be 62.6 minutes. Using this information and the value for the range, what would the restaurant expect as its lowest turnaround time? Its highest turnaround time?
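The range calculation is short enough to check in a few lines of Python. This is a sketch using the table times above.

```python
# Table times in minutes from the restaurant exercise.
table_times = [59.3, 58.6, 62.7, 65.4, 59.0, 67.3, 62.8, 68.1, 59.4, 63.7]

# Sample range R: maximum observation minus minimum observation.
shortest = min(table_times)
longest = max(table_times)
sample_range = longest - shortest

print(f"Minimum = {shortest}, maximum = {longest}, R = {sample_range:.1f} minutes")
```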

The Sample Variance, s², is the average of the squared deviations of the data values from the sample mean, where the sum of the squared deviations is divided by n − 1 rather than n.

The Sample Standard Deviation, s, is the positive square root of the sample variance.

The population variance and standard deviation are represented by the Greek letter σ (sigma), where σ² is the population variance and σ is the population standard deviation.

The Empirical Rule says that for a mound-shaped, symmetric distribution:

• about 68% of all data values are within one standard deviation of the mean
• about 95% of all observations are within two standard deviations of the mean
• almost all (more than 99%) of the observations are within three standard deviations of the mean.

TRY IT NOW!

Town Hall Traffic Flow

Calculating the Sample Variance and Standard Deviation

The town council looking at the traffic flow problem has seen reports that use the standard deviation, and wants to use it to describe the variability of traffic flow. The data are:

Number of Cars 20 27 29 28 37 23 21 28 29 28

What is the sample standard deviation of the traffic flow?

Use whatever method you feel most comfortable with. If you have a statistical calculator, learn how to use it now.
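If you prefer to check the calculation in code, the following Python sketch works through it step by step. It uses the usual sample formula, in which the sum of squared deviations is divided by n − 1, and compares the result with the standard library.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Number of cars passing the site in each two-minute period.
cars = [20, 27, 29, 28, 37, 23, 21, 28, 29, 28]

x_bar = mean(cars)

# Squared deviation of each value from the sample mean.
squared_deviations = [(x - x_bar) ** 2 for x in cars]

# Sample variance: sum of squared deviations divided by n - 1.
s2 = sum(squared_deviations) / (len(cars) - 1)

# Sample standard deviation: positive square root of the variance.
s = sqrt(s2)

print(f"Mean = {x_bar}, s^2 = {s2:.2f}, s = {s:.2f}")
print(f"Library check: variance = {variance(cars):.2f}, stdev = {stdev(cars):.2f}")
```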

Figure 4.5 The Empirical Rule

TRY IT NOW!

Loan Processing

The Empirical Rule

Errors in filling out loan applications can lead to delays in having the loans approved. Bank employees must contact the applicants to correct the errors. This sometimes requires multiple contacts. To understand the extent to which the errors affect the application process a bank collected data on the number of follow-up contacts required before a loan could be processed.

TRY IT NOW!

Loan Processing

The Empirical Rule (con’t)

The bank looked at 25 different applications and found:

0 1 2 3 4

0 2 2 4 4

1 2 3 4 5

1 2 3 4 5

1 2 3 4 7

Make a dotplot of the data.

TRY IT NOW!

Loan Processing

The Empirical Rule (con’t)

From the dotplot, do you think that the assumption that the data have a symmetric, bell-shaped distribution is a reasonable one?

Find the mean and standard deviation of the data.

According to the empirical rule, between what two values should 68% of the observations fall?

Between what two values should 95% of the observations fall?

Between what two values should more than 99% of the observations fall?
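The sketch below, in Python, computes the three empirical-rule intervals for the loan data and counts how many observations actually fall inside each one. The helper name `interval_and_count` is illustrative only.

```python
from statistics import mean, stdev

# Follow-up contacts needed for each of the 25 loan applications.
contacts = [
    0, 1, 2, 3, 4,
    0, 2, 2, 4, 4,
    1, 2, 3, 4, 5,
    1, 2, 3, 4, 5,
    1, 2, 3, 4, 7,
]

x_bar = mean(contacts)
s = stdev(contacts)  # sample standard deviation (n - 1 divisor)

def interval_and_count(k):
    """Return the interval mean +/- k*s and how many values fall inside it."""
    low, high = x_bar - k * s, x_bar + k * s
    inside = sum(low <= x <= high for x in contacts)
    return low, high, inside

print(f"Mean = {x_bar:.2f}, s = {s:.2f}")
for k in (1, 2, 3):
    low, high, inside = interval_and_count(k)
    share = 100 * inside / len(contacts)
    print(f"Within {k} sd: {low:.2f} to {high:.2f}, {inside} of 25 ({share:.0f}%)")
```

Comparing the observed percentages with 68%, 95%, and 99% gives a rough sense of how well the rule fits these data.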

A z-score measures the number of standard deviations that a data value is from the mean.

TRY IT NOW!

Town Hall Traffic

Calculating z-Scores

The town that was looking at traffic flow in front of the town hall wonders if the observation of 37 cars is unusual. Although the town officials know that their sample size of 10 observations is not large enough to ensure accuracy, they want to use z-scores to look at the data:

Number of Cars 20 27 29 28 37 23 21 28 29 28

What is the z-score for the observation of 37 cars?

TRY IT NOW!

Town Hall Traffic

Calculating z-Scores

Comparing the z-score to the empirical rule, do you think that the value is unusual?
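A z-score for a sample value x can be computed as (x − mean) / s. The Python sketch below applies this to the observation of 37 cars, using the sample mean and sample standard deviation of the ten traffic counts.

```python
from statistics import mean, stdev

# Number of cars passing the site in each two-minute period.
cars = [20, 27, 29, 28, 37, 23, 21, 28, 29, 28]

x_bar = mean(cars)
s = stdev(cars)  # sample standard deviation

def z_score(x):
    """Number of standard deviations that x lies from the sample mean."""
    return (x - x_bar) / s

z_37 = z_score(37)
print(f"z-score for 37 cars: {z_37:.2f}")
```

A z-score of about 2 puts the value near the edge of the interval that the empirical rule says should hold about 95% of the observations.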

The pth Percentile of a data set is the value that has p% of the data at or below it.

The Percentile Rank of a value is the percentage of the data in the sample that are at or below the value of interest.

TRY IT NOW!

Aptitude Test Scores

Calculating the Percentile Rank

A group of employees at a manufacturing facility take a test to determine their aptitude for training. The tests are scored on a 400-point scale and are shown here in increasing order:

185 227 241 257 281 299 314 329

195 228 243 261 283 304 318 333

196 234 248 269 283 307 319 335

199 238 250 271 291 309 322 349

223 241 253 272 297 310 328 353

TRY IT NOW!

Aptitude Test Scores

Calculating the Percentile Rank

One of the employees who scored 283 wants to know how he stands relative to the other employees who took the exam.

What is the percentile rank for the employee’s score?

What is the percentile rank of the employee that scored 319?
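Following the definition above (the percentage of the data at or below the value of interest), the percentile rank can be sketched in Python as shown here. The function name `percentile_rank` is illustrative only.

```python
# Aptitude test scores for the 40 employees, entered row by row as printed.
scores = [
    185, 227, 241, 257, 281, 299, 314, 329,
    195, 228, 243, 261, 283, 304, 318, 333,
    196, 234, 248, 269, 283, 307, 319, 335,
    199, 238, 250, 271, 291, 309, 322, 349,
    223, 241, 253, 272, 297, 310, 328, 353,
]

def percentile_rank(value, data):
    """Percentage of the observations that are at or below `value`."""
    at_or_below = sum(x <= value for x in data)
    return 100 * at_or_below / len(data)

rank_283 = percentile_rank(283, scores)
rank_319 = percentile_rank(319, scores)

print(f"Percentile rank of 283: {rank_283}")
print(f"Percentile rank of 319: {rank_319}")
```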

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it.

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it.

TRY IT NOW!

Training Aptitude

Finding the Quartiles

The company looking at training aptitude wants to give employees who scored in the top 25% on the test the opportunity to attend a seminar on training. The test scores are:

185 227 241 257 281 299 314 329

195 228 243 261 283 304 318 333

196 234 248 269 283 307 319 335

199 238 250 271 291 309 322 349

223 241 253 272 297 310 328 353

TRY IT NOW!

Training Aptitude

Finding the Quartiles (con’t)

In the sample, what is the cutoff score for those people who will be able to attend the seminar?

Hint: the value that defines the top 25% is the same as the value that defines the bottom 75%.

Suppose that the company decides that the employees who scored in the bottom 25% need some additional classes on team building. What is the cutoff score for those employees who need the classes on team building?
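Here is a minimal Python sketch of the quartile calculation. It applies the "at or below" definition literally, taking the smallest sorted value that has at least p% of the data at or below it. This is one convention among several; methods that interpolate between neighboring values can give slightly different cutoffs, so the textbook's answer may differ. The function name `value_at_percentile` is illustrative only.

```python
from math import ceil

# Aptitude test scores for the 40 employees, entered row by row as printed.
scores = [
    185, 227, 241, 257, 281, 299, 314, 329,
    195, 228, 243, 261, 283, 304, 318, 333,
    196, 234, 248, 269, 283, 307, 319, 335,
    199, 238, 250, 271, 291, 309, 322, 349,
    223, 241, 253, 272, 297, 310, 328, 353,
]

def value_at_percentile(data, p):
    """Smallest sorted value with at least p% of the data at or below it."""
    ordered = sorted(data)
    position = max(ceil(p / 100 * len(ordered)), 1)  # 1-based position
    return ordered[position - 1]

q1 = value_at_percentile(scores, 25)
q3 = value_at_percentile(scores, 75)

print(f"Q1 (bottom 25% cutoff): {q1}")
print(f"Q3 (top 25% cutoff): {q3}")
```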

A Boxplot or Box and Whisker diagram is a graphical display that uses summary statistics to display the distribution of a set of data.

The Interquartile Range (IQR) is the difference between the third and first quartiles, Q3 - Q1.

Figure 4.6 Box Portion of Boxplot

Figure 4.7 Boxplot with Whiskers

The Inner Fences of a boxplot are located at Q1 - 1.5 (IQR) and Q3 + 1.5 (IQR).

The Outer Fences of a boxplot are located at Q1 - 3 (IQR) and Q3 + 3 (IQR).

Figure 4.8 Boxplots for Skewed Data

TRY IT NOW!

Training Aptitude

Finding the Quartiles

The company that administered the training aptitude test to its employees would like a better picture of how the employees performed on the test. The data are:

185 227 241 257 281 299 314 329

195 228 243 261 283 304 318 333

196 234 248 269 283 307 319 335

199 238 250 271 291 309 322 349

223 241 253 272 297 310 328 353

TRY IT NOW!

Training Aptitude

Finding the Quartiles (con’t)

In the previous exercise, you found the first and third quartiles of the data set. Use these values to complete the calculations needed for a boxplot.

Draw a complete boxplot of the data.

Were there any outliers? If so, which data values were they?
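The boxplot calculations can be sketched in Python as follows. Two assumptions are made: the quartiles come from the "at or below" rule (other quartile conventions shift the fences slightly), and any value outside the inner fences is flagged as a possible outlier, which is the common convention.

```python
from math import ceil

# Aptitude test scores for the 40 employees, entered row by row as printed.
scores = [
    185, 227, 241, 257, 281, 299, 314, 329,
    195, 228, 243, 261, 283, 304, 318, 333,
    196, 234, 248, 269, 283, 307, 319, 335,
    199, 238, 250, 271, 291, 309, 322, 349,
    223, 241, 253, 272, 297, 310, 328, 353,
]

def value_at_percentile(data, p):
    """Smallest sorted value with at least p% of the data at or below it."""
    ordered = sorted(data)
    position = max(ceil(p / 100 * len(ordered)), 1)  # 1-based position
    return ordered[position - 1]

q1 = value_at_percentile(scores, 25)
q3 = value_at_percentile(scores, 75)
iqr = q3 - q1

# Fences as defined above.
inner_fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outer_fences = (q1 - 3 * iqr, q3 + 3 * iqr)

# Values outside the inner fences are treated as possible outliers (assumed convention).
beyond_inner = [x for x in scores if x < inner_fences[0] or x > inner_fences[1]]
beyond_outer = [x for x in scores if x < outer_fences[0] or x > outer_fences[1]]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"Inner fences: {inner_fences}, outer fences: {outer_fences}")
print("Beyond inner fences:", beyond_inner)
print("Beyond outer fences:", beyond_outer)
```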

The basics of creating a chart in Excel, using the Chart Wizard.

1. Highlight the data (Frequency table) that you want to graph.

2. Invoke the Chart Wizard by clicking on the icon on the toolbar.

3. Follow the directions and hints from the Chart Wizard.

4. Edit the graph to include any other features or changes you want.

Calculating Summary Statistics in Excel

1. Position the cursor in the textbox labeled Input Range and highlight the range of data for which you want to calculate summary statistics.

2. Specify location for the output, either a section of the current worksheet, or a new worksheet or workbook. Click on the radio button for your choice. If you select Output Range, you must specify a location on the worksheet.

Calculating Summary Statistics in Excel (con’t)

3. Position the cursor in the textbox for Output Range and click on the cell where you want the upper left corner of the results to appear. If you want to put the results in a new worksheet, you have the option of giving the sheet a name in the textbox or just letting Excel create a new, numbered sheet.

4. Click on the box labeled Summary statistics and finally click on OK. The output does not include the quartiles. When a statistic cannot be computed, the output will read N/A.

Figure 4.10 Descriptive Statistics Dialog Box

Figure 4.11 Output from Tools>Data Analysis>Descriptive Statistics

Making a Boxplot in KaddStat (note: Excel does not include boxplots as part of the graphs it can create)

1. From the KADD menu select Boxplots. The Boxplot Dialog Box will open.

2. Position the cursor in the textbox labeled Input Range and highlight the cells that contain the data.

3. Indicate where you want the boxplot to appear.

4. Click OK.

Figure 4.13 The Boxplot Dialog Box

Figure 4.14 Finished Boxplot for Golf Ball Data

Chapter 4 Summary

In this chapter you have learned:

• There are many ways to describe a set of data using sample statistics. No single number will do the job, nor is there any standard way to proceed.
• The measures that you choose must reflect the characteristics of the data itself.