Doing Statistics for Business Data, Inference, and Decision Making Marilyn K. Pelosi Theresa M. Sandifer

Doing Statistics for Business Chapter 4 Numerical Descriptors of Data

Doing Statistics for Business Chapter 4 Objectives
• Numerical Measures of Center: The Mean, the Median, and the Mode
• Numerical Measures of Variability: The Range & the Standard Deviation
• Describing a Set of Data: The Empirical Rule & Boxplots

Doing Statistics for Business Chapter 4 Objectives (cont’d)
• Measures of Relative Standing: Percentiles, Percentile Rank
• Identifying Outliers: z-scores, Boxplots

Doing Statistics for Business A Statistic is a numerical descriptor that is calculated from sample data and is used to describe the sample. Statistics are usually represented by Roman letters.

Doing Statistics for Business A Parameter is a numerical descriptor that is used to describe a population. Parameters are usually represented by Greek letters.

Doing Statistics for Business The Sample Mean is the center of balance of a set of data, and is found by adding up all of the data values and dividing by the number of observations.

Doing Statistics for Business The Population Mean is represented by the Greek letter μ (mu).

Doing Statistics for Business TRY IT NOW! Restaurant Table Times Calculating the Sample Mean A restaurant is trying to decide whether it has an adequate number of tables available. The restaurant owner decides that she would like some information on the amount of time a table is occupied by a customer. She collects data on the length of time a customer occupies a table for a random sample of 10 customers and obtains the following data.

Doing Statistics for Business TRY IT NOW! Restaurant Table Times Calculating the Sample Mean (cont’d)
Time (min): 59.3 58.6 62.7 65.4 59.0 67.3 62.8 68.1 59.4 63.7
Calculate the sample mean for the length of time a table is occupied.
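The definition of the sample mean can be checked with a few lines of Python. This is only a sketch: the ten times are taken from the later "Calculating the Sample Range" slide for the same restaurant, and the variable names are illustrative.

```python
# Table-occupancy times in minutes (from the chapter's later
# "Restaurant Table Time" slide for the same restaurant).
times = [59.3, 58.6, 62.7, 65.4, 59.0, 67.3, 62.8, 68.1, 59.4, 63.7]

# Sample mean: add up all the data values and divide by
# the number of observations.
sample_mean = sum(times) / len(times)

print(round(sample_mean, 1))  # 62.6
```

The result agrees with the 62.6 minutes quoted later in the chapter.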

Doing Statistics for Business The Sample Median is the value of the middle observation in an ordered set of data.

Doing Statistics for Business TRY IT NOW! Town Hall Traffic Calculating the Sample Median In the past few years the town council of a small town has received complaints that it has become increasingly difficult to cross the main street in town near the library. The council decides to look at traffic flow on the street. It selects a site directly in front of the library where most people try to cross the road and records the number of cars that pass the point in a two-minute period.

Doing Statistics for Business TRY IT NOW! Town Hall Traffic Calculating the Sample Median (cont’d) This is done for 10 two-minute periods at 3:00 p.m. over several weeks and the following data are obtained.
Number of cars: 20 27 29 28 37 23 21 28 29 28
Find the median number of cars that pass the site in two minutes. Remember to SORT the data before you locate the median!
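A minimal sketch of the sort-then-locate procedure in Python follows. With an even number of observations there is no single middle value; averaging the two middle values is the usual convention and is assumed here, since the slide does not spell out that case.

```python
# Number of cars passing the site in each two-minute period.
cars = [20, 27, 29, 28, 37, 23, 21, 28, 29, 28]

# Step 1: sort the data.
ordered = sorted(cars)

# Step 2: locate the middle. For an even count, average the two
# middle observations (assumed convention, not stated on the slide).
n = len(ordered)
mid = n // 2
if n % 2 == 1:
    median = ordered[mid]
else:
    median = (ordered[mid - 1] + ordered[mid]) / 2

print(ordered)  # [20, 21, 23, 27, 28, 28, 28, 29, 29, 37]
print(median)   # 28.0
```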

Doing Statistics for Business Figure 4.2 Mean and Median for a Symmetric Distribution

Doing Statistics for Business Figure 4.3 Mean and Median for Skewed Distributions (panels: left skew, right skew)

Doing Statistics for Business TRY IT NOW! Airline Cancellations Comparing the Mean and the Median An airline company is wondering about the number of cancellations that it receives for a particular business commuter flight. The airline takes a random sample of 15 days from the first quarter of the year and obtains the following data:
# of cancellations: 4 9 9 12 12 13 14 14 15 15 16 16 17 17 24

Doing Statistics for Business TRY IT NOW! Airline Cancellations Comparing the Mean and the Median (cont’d) Find the mean and median for the # of cancellations for the commuter flight. When compared, do the data appear symmetric or skewed? Make a dotplot of the data. From the dotplot, do the data appear symmetric or skewed? Note: the data have been sorted for you.
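One way to carry out the comparison is sketched below in Python, using the standard-library `statistics` module. The text dotplot is a rough stand-in for a drawn one, and the variable names are illustrative.

```python
from collections import Counter
from statistics import mean, median

# Cancellations per day for the 15 sampled days (already sorted).
cancellations = [4, 9, 9, 12, 12, 13, 14, 14, 15, 15, 16, 16, 17, 17, 24]

sample_mean = mean(cancellations)
sample_median = median(cancellations)
print(round(sample_mean, 1), sample_median)  # 13.8 14

# Rough text dotplot: one row per possible value, one star per observation.
counts = Counter(cancellations)
for value in range(min(cancellations), max(cancellations) + 1):
    print(f"{value:2d} | " + "*" * counts[value])
```

The mean and median are close together here, so the dotplot is the better guide to how the isolated values at 4 and 24 affect the shape.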

Doing Statistics for Business Discovery Exercise 4.1 The Trimmed Mean Part I. Investigating the Data In a report to the administration of a large university, the Psychology Department states that the average class size is greater than the 35 students per class allowed by the university charter. The report indicates that the mean class size is 39.4.

Doing Statistics for Business Discovery Exercise 4.1 The Trimmed Mean Part I. Investigating the Data (cont’d) No data are appended to the report, but you can obtain the current enrollments easily. The data you find are:
3 14 22 26 42
3 15 23 27 45
5 15 24 28 45
9 17 24 28 190
11 21 25 36 193
13 22 26 38 193

Doing Statistics for Business Discovery Exercise 4.1 The Trimmed Mean Part I. Investigating the Data (cont’d)
A. Do you think that the mean is a good measure of center for these data? Why or why not?
B. By simply studying the data, what do you think a typical class size for the Psychology Department is?
C. What is the median of the data? Is this closer to what you thought?
D. Compare the mean and median. What does the comparison lead you to believe about the data?
E. Display the data graphically. Do you still think the same thing?
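The exercise title refers to the trimmed mean, which drops a share of the smallest and largest observations before averaging. The Python sketch below is an illustration only: the helper name `trimmed_mean` and the 10% trimming proportion are assumptions, since this part of the exercise does not state how much to trim.

```python
from statistics import mean, median

# The 30 class enrollments from the exercise.
class_sizes = [
    3, 3, 5, 9, 11, 13, 14, 15, 15, 17,
    21, 22, 22, 23, 24, 24, 25, 26, 26, 27,
    28, 28, 36, 38, 42, 45, 45, 190, 193, 193,
]

def trimmed_mean(values, proportion):
    """Mean after dropping `proportion` of the observations from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # observations cut from each tail
    kept = ordered[k:len(ordered) - k]
    return mean(kept)

print(round(mean(class_sizes), 1))                # 39.4, as in the report
print(median(class_sizes))                        # 24.0
print(round(trimmed_mean(class_sizes, 0.10), 1))  # 24.8 (10% is a hypothetical choice)
```

Removing the three largest and three smallest classes pulls the average down to roughly the median, which shows how strongly the three very large sections drive the reported mean.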

Doing Statistics for Business The Sample Mode is the data value that has the highest frequency of occurrence in the sample.
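As a small illustration, the mode can be found by counting how often each value occurs. The sketch below reuses the Town Hall traffic counts from earlier in the chapter and relies on `statistics.multimode` (Python 3.8 or later), which returns every value tied for the highest frequency.

```python
from collections import Counter
from statistics import multimode

# Town Hall traffic counts from the earlier slide.
cars = [20, 27, 29, 28, 37, 23, 21, 28, 29, 28]

frequencies = Counter(cars)   # value -> number of occurrences
modes = multimode(cars)       # list of the most frequent value(s)

print(frequencies[28], frequencies[29])  # 3 2
print(modes)                             # [28]
```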

Doing Statistics for Business The Modal Class is the class interval in a frequency distribution or histogram that has the highest frequency.

Doing Statistics for Business Figure 4.4 Histogram of Bimodal Data

Doing Statistics for Business Discovery Exercise 4.2 Investigating Variability The table contains air-quality data collected by the Environmental Protection Agency. The data show the number of days in which the ozone level was dangerous for 14 major U.S. cities in 2000.
City | Number of unhealthy days
Atlanta | 18
Boston | 0
Chicago | 0
Dallas | 5
Denver | 0
Houston | 94
Kansas City | 0

Doing Statistics for Business Discovery Exercise 4.2 Investigating Variability (cont’d)
City | Number of unhealthy days
Los Angeles | 1
New York | 13
Philadelphia | 2
Pittsburgh | 3
San Francisco | 0
Seattle | 0
Washington, DC | 0

Doing Statistics for Business Discovery Exercise 4.2 Investigating Variability (cont’d)
A. Display these data using a dotplot.
B. Find the typical number of unhealthy days by calculating the average value.
C. Can you expect every observation to be typical? Why not?

Doing Statistics for Business A Sample Range, R, is the difference between the maximum and minimum observations in the sample.

Doing Statistics for Business TRY IT NOW! Restaurant Table Time Calculating the Sample Range The restaurant looking at the turnaround time for its tables wonders how variable the occupation time for a table really is. The data the restaurant had collected are:
Time (min): 59.3 58.6 62.7 65.4 59.0 67.3 62.8 68.1 59.4 63.7

Doing Statistics for Business TRY IT NOW! Restaurant Table Time Calculating the Sample Range (cont’d) What is the range of turnaround times? Previously you calculated the mean turnaround time to be 62.6 minutes. Using this information and the value for the range, what would the restaurant expect as its lowest turnaround time? Its highest turnaround time?
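The range calculation itself is a one-liner. A short Python sketch with illustrative variable names:

```python
# Table-occupancy times in minutes.
times = [59.3, 58.6, 62.7, 65.4, 59.0, 67.3, 62.8, 68.1, 59.4, 63.7]

# Sample range: maximum observation minus minimum observation.
longest = max(times)
shortest = min(times)
sample_range = longest - shortest

print(shortest, longest)       # 58.6 68.1
print(round(sample_range, 1))  # 9.5
```

Note that the range reports only the total spread; it does not say where the mean of 62.6 minutes sits between the two extremes.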

Doing Statistics for Business The Sample Variance, s², is the average of the squared deviations of the data values from the sample mean.

Doing Statistics for Business The Sample Standard Deviation, s, is the positive square root of the sample variance.

Doing Statistics for Business The population variance and standard deviation are represented by the Greek letter σ (sigma), where σ² is the population variance and σ is the population standard deviation.

Doing Statistics for Business The Empirical Rule says that for a mound-shaped, symmetric distribution:
• about 68% of all data values are within one standard deviation of the mean
• about 95% of all observations are within two standard deviations of the mean
• almost all (more than 99%) of the observations are within three standard deviations of the mean.

Doing Statistics for Business TRY IT NOW! Town Hall Traffic Flow Calculating the Sample Variance and Standard Deviation The town council looking at the traffic flow problem has seen reports that use the standard deviation, and wants to use it to describe the variability of traffic flow. The data are:
Number of cars: 20 27 29 28 37 23 21 28 29 28
What is the sample standard deviation of the traffic flow? Use whatever method you feel most comfortable with. If you have a statistical calculator, learn how to use it now.
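For readers working in software rather than on a calculator, the sketch below computes the variance step by step in Python and then checks it against the standard library. It assumes the usual sample-variance convention of dividing the sum of squared deviations by n - 1, which is also what `statistics.variance` does; the slides do not show the divisor explicitly.

```python
from statistics import mean, stdev, variance

# Number of cars passing the site in each two-minute period.
cars = [20, 27, 29, 28, 37, 23, 21, 28, 29, 28]

x_bar = mean(cars)  # sample mean

# Squared deviation of each observation from the sample mean.
squared_deviations = [(x - x_bar) ** 2 for x in cars]

# Sample variance with the n - 1 divisor (assumed convention),
# and the standard deviation as its positive square root.
s2 = sum(squared_deviations) / (len(cars) - 1)
s = s2 ** 0.5

print(x_bar)                      # 27
print(round(s2, 2), round(s, 2))  # 23.56 4.85

# Cross-check against the standard library.
print(round(variance(cars), 2), round(stdev(cars), 2))  # 23.56 4.85
```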

Doing Statistics for Business Figure 4.5 The Empirical Rule

Doing Statistics for Business TRY IT NOW! Loan Processing The Empirical Rule Errors in filling out loan applications can lead to delays in having the loans approved. Bank employees must contact the applicants to correct the errors. This sometimes requires multiple contacts. To understand the extent to which the errors affect the application process, a bank collected data on the number of follow-up contacts required before a loan could be processed.

Doing Statistics for Business TRY IT NOW! Loan Processing The Empirical Rule (cont’d) The bank looked at 25 different applications and found:
0 1 2 3 4
0 2 2 4 4
1 2 3 4 5
1 2 3 4 5
1 2 3 4 7
Make a dotplot of the data.

Doing Statistics for Business TRY IT NOW! Loan Processing The Empirical Rule (cont’d) From the dotplot, do you think that the assumption that the data have a symmetric, bell-shaped distribution is a reasonable one? Find the mean and standard deviation of the data. According to the empirical rule, between what two values should 68% of the observations fall? Between what two values should 95% of the observations fall? Between what two values should more than 99% of the observations fall?
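The empirical-rule intervals can be built directly from the mean and standard deviation. The Python sketch below also counts how much of the sample actually falls inside each interval, which is one way to judge whether the bell-shaped assumption looks reasonable. Variable names are illustrative, and the standard deviation uses the n - 1 divisor of `statistics.stdev`.

```python
from statistics import mean, stdev

# Follow-up contacts for the 25 loan applications.
contacts = [0, 1, 2, 3, 4, 0, 2, 2, 4, 4, 1, 2, 3, 4, 5,
            1, 2, 3, 4, 5, 1, 2, 3, 4, 7]

x_bar = mean(contacts)
s = stdev(contacts)
print(round(x_bar, 2), round(s, 2))  # 2.76 1.69

# For k = 1, 2, 3 standard deviations: the interval the rule predicts
# and the percentage of the sample that actually lies inside it.
shares = {}
for k, claimed in [(1, "about 68%"), (2, "about 95%"), (3, "more than 99%")]:
    low, high = x_bar - k * s, x_bar + k * s
    inside = sum(low <= x <= high for x in contacts)
    shares[k] = 100 * inside / len(contacts)
    print(f"{k} sd: {low:.2f} to {high:.2f} "
          f"(rule: {claimed}, sample: {shares[k]:.0f}%)")
```

For these data the observed shares come out at 64%, 96%, and 100%, reasonably close to the rule's 68%, 95%, and more than 99%.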

Doing Statistics for Business A z-score measures the number of standard deviations that a data value is from the mean.

Doing Statistics for Business TRY IT NOW! Town Hall Traffic Calculating z-Scores The town that was looking at traffic flow in front of the town hall wonders if the observation of 37 cars is unusual. Although the town officials know that their sample of 10 observations is not large enough to ensure accuracy, they want to use z-scores to look at the data:
Number of cars: 20 27 29 28 37 23 21 28 29 28
What is the z-score for the observation of 37 cars?

Doing Statistics for Business TRY IT NOW! Town Hall Traffic Calculating z-Scores (cont’d) Comparing the z-score to the empirical rule, do you think that the value is unusual?
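Following the definition, a z-score is the deviation from the mean divided by the standard deviation. A minimal Python sketch, assuming the sample mean and the n - 1 sample standard deviation are the ones intended (the helper name `z_score` is illustrative):

```python
from statistics import mean, stdev

# Number of cars passing the site in each two-minute period.
cars = [20, 27, 29, 28, 37, 23, 21, 28, 29, 28]

def z_score(x, data):
    """Number of sample standard deviations that x lies from the sample mean."""
    return (x - mean(data)) / stdev(data)

z = z_score(37, cars)
print(round(z, 2))  # 2.06
```

The observation of 37 cars lies just over two standard deviations above the mean, which is the figure to compare against the empirical rule's 95% band.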

Doing Statistics for Business The pth Percentile of a data set is the value that has p% of the data at or below it.

Doing Statistics for Business The Percentile Rank of a value is the percentage of the data in the sample that are at or below the value of interest.

Doing Statistics for Business TRY IT NOW! Aptitude Test Scores Calculating the Percentile Rank A group of employees at a manufacturing facility take a test to determine their aptitude for training. The tests are scored on a 400-point scale and are shown here in increasing order, reading down each column:
185 227 241 257 281 299 314 329
195 228 243 261 283 304 318 333
196 234 248 269 283 307 319 335
199 238 250 271 291 309 322 349
223 241 253 272 297 310 328 353

Doing Statistics for Business TRY IT NOW! Aptitude Test Scores Calculating the Percentile Rank (cont’d) One of the employees who scored 283 wants to know how he stands relative to the other employees who took the exam. What is the percentile rank for the employee’s score? What is the percentile rank of the employee who scored 319?
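The percentile rank defined above is a simple count. The Python sketch below applies the "at or below" definition literally; the function name `percentile_rank` is illustrative.

```python
# The 40 aptitude test scores, in increasing order.
scores = [
    185, 195, 196, 199, 223, 227, 228, 234, 238, 241,
    241, 243, 248, 250, 253, 257, 261, 269, 271, 272,
    281, 283, 283, 291, 297, 299, 304, 307, 309, 310,
    314, 318, 319, 322, 328, 329, 333, 335, 349, 353,
]

def percentile_rank(values, x):
    """Percentage of the values that are at or below x."""
    at_or_below = sum(v <= x for v in values)
    return 100 * at_or_below / len(values)

print(percentile_rank(scores, 283))  # 57.5
print(percentile_rank(scores, 319))  # 82.5
```

Because two employees scored 283, both tied scores count as "at or below", so 23 of the 40 scores are included for that employee.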

Doing Statistics for Business The first quartile, Q1, is the value in the sample that has 25% of the data at or below it.

Doing Statistics for Business The third quartile, Q3, is the value in the sample that has 75% of the data at or below it.

Doing Statistics for Business TRY IT NOW! Training Aptitude Finding the Quartiles The company looking at training aptitude wants to give employees who scored in the top 25% on the test the opportunity to attend a seminar on training. The test scores are:
185 227 241 257 281 299 314 329
195 228 243 261 283 304 318 333
196 234 248 269 283 307 319 335
199 238 250 271 291 309 322 349
223 241 253 272 297 310 328 353

Doing Statistics for Business TRY IT NOW! Training Aptitude Finding the Quartiles (cont’d) In the sample, what is the cutoff score for those people who will be able to attend the seminar? Hint: the value that defines the top 25% is the same as the value that defines the bottom 75%. Suppose that the company decides that the employees who scored in the bottom 25% need some additional classes on team building. What is the cutoff score for those employees who need the classes on team building?
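Quartile rules differ between textbooks and software packages, and the slides do not show which rule the text uses. The Python sketch below applies the slide definition literally: it returns the smallest data value that has at least p% of the data at or below it (sometimes called the nearest-rank method). Treat the cutoffs as one reasonable reading; an interpolating rule can give slightly different numbers.

```python
import math

# The 40 aptitude test scores, in increasing order.
scores = [
    185, 195, 196, 199, 223, 227, 228, 234, 238, 241,
    241, 243, 248, 250, 253, 257, 261, 269, 271, 272,
    281, 283, 283, 291, 297, 299, 304, 307, 309, 310,
    314, 318, 319, 322, 328, 329, 333, 335, 349, 353,
]

def percentile(values, p):
    """Smallest data value with at least p% of the data at or below it.

    Nearest-rank rule, assumed here; not necessarily the textbook's rule.
    """
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based position
    return ordered[rank - 1]

q1 = percentile(scores, 25)  # cutoff for the bottom 25%
q3 = percentile(scores, 75)  # cutoff for the top 25%
print(q1, q3)  # 241 310
```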

Doing Statistics for Business A Boxplot or Box and Whisker diagram is a graphical display that uses summary statistics to display the distribution of a set of data.

Doing Statistics for Business The Interquartile Range (IQR) is the difference between the third and first quartiles: IQR = Q3 - Q1.
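As an illustration, the sketch below computes the IQR for the aptitude scores together with the five values a boxplot is conventionally drawn from (minimum, Q1, median, Q3, maximum). The quartiles use the same literal "at or below" rule as the definitions above, which is an assumption; the textbook's own quartile rule may give slightly different values.

```python
import math
from statistics import median

# The 40 aptitude test scores, in increasing order.
scores = [
    185, 195, 196, 199, 223, 227, 228, 234, 238, 241,
    241, 243, 248, 250, 253, 257, 261, 269, 271, 272,
    281, 283, 283, 291, 297, 299, 304, 307, 309, 310,
    314, 318, 319, 322, 328, 329, 333, 335, 349, 353,
]

def percentile(values, p):
    """Smallest data value with at least p% of the data at or below it (assumed rule)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

q1 = percentile(scores, 25)
q3 = percentile(scores, 75)
iqr = q3 - q1  # interquartile range: Q3 - Q1

five_number_summary = (min(scores), q1, median(scores), q3, max(scores))
print(iqr)                  # 69
print(five_number_summary)  # (185, 241, 276.5, 310, 353)
```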
