Chapter 2 Methods for Describing Sets of Data
Objectives • Describe Data using Graphs • Describe Data using Numerical Measures
Describing Qualitative Data • Summary Table • Bar Graph • Pie Chart
Describing Qualitative Data • Qualitative data are nonnumeric in nature • Best described by using Classes
Describing Qualitative Data • A Class is one of the categories into which qualitative data can be classified. • The Class Frequency is the number of observations in the data set falling in a particular class. • The Class Relative Frequency is the class frequency divided by the total number of observations in the data set. • The Class Percentage is the class relative frequency multiplied by 100.
Describing Qualitative Data – Displaying Descriptive Measures • Summary Table: lists each class with its class frequency and class percentage (class percentage = class relative frequency × 100).
Describing Qualitative Data – Qualitative Data Displays • A Bar Graph is a graphical device for depicting qualitative data. • On the horizontal axis we specify the labels that are used for each of the classes. • A frequency, relative frequency, or percent frequency scale can be used for the vertical axis. • Using a bar of fixed width drawn above each class label, we extend the height appropriately. • The bars are separated to emphasize the fact that each class is a separate category.
Describing Qualitative Data – Qualitative Data Displays • The Pie Chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data. • First draw a circle; then use the relative frequencies to subdivide the circle into sectors that correspond to the relative frequency for each class. • Since there are 360 degrees in a circle, a class with a relative frequency of .25 would consume .25(360) = 90 degrees of the circle.
Describing Qualitative Data – Example: Problem 2.5, p. 37 • A qualitative variable with three classes (X, Y, and Z) is measured for each of 20 units randomly sampled from a target population. The data (observed class for each unit) are listed below: Y X X Z X Y Y Y X X Z X Y Y X Z Y Y Y X
Describing Qualitative Data – Example: Problem 2.5, p. 37 (continued) a. Compute the frequency for each of the three classes. b. Compute the relative frequency for each of the three classes. c. Display the results of part a in a frequency bar graph. d. Display the results of part b in a pie chart. e. Use MINITAB to construct a bar graph and a pie chart.
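The frequency, relative frequency, percentage, and pie-chart angle for each class in Problem 2.5 can be tabulated with a few lines of Python. This is a minimal sketch rather than the MINITAB procedure the slides use; the variable names (`data`, `summary`) are illustrative assumptions.

```python
from collections import Counter

# Observed classes for the 20 sampled units, copied from the problem statement.
data = "Y X X Z X Y Y Y X X Z X Y Y X Z Y Y Y X".split()

n = len(data)
frequency = Counter(data)  # class frequency: number of observations per class

summary = {}
for cls in sorted(frequency):
    rel = frequency[cls] / n  # class relative frequency
    summary[cls] = {
        "frequency": frequency[cls],
        "relative_frequency": rel,
        "percentage": rel * 100,   # class percentage
        "pie_degrees": rel * 360,  # sector angle in a pie chart
    }

for cls, row in summary.items():
    print(
        f"{cls}: freq={row['frequency']:2d}  "
        f"rel={row['relative_frequency']:.2f}  "
        f"pct={row['percentage']:.0f}%  "
        f"pie={row['pie_degrees']:.0f} deg"
    )
```

For these data the class frequencies are X = 8, Y = 9, and Z = 3, so the relative frequencies are 0.40, 0.45, and 0.15.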
Graphical Methods for Describing Quantitative Data • Dot Plots • Stem-and-Leaf Displays • Frequency (Relative Frequency) Distributions • Histograms
Graphical Methods for Describing Quantitative Data • Dot Plots are displays where each data point is represented by a dot over its location on the number line. • Usually, dot plots are used only with small sample sizes.
Graphical Methods for Describing Quantitative Data • The Stem-and-Leaf Display is a useful tool for displaying (and ordering) moderate amounts of data. • We make a stem-and-leaf display by dividing each of the numbers we are working with into two parts - a stem and a leaf. • We display horizontal rows of leaves attached to a vertical column of stems.
Graphical Methods for Describing Quantitative Data: Example • Consider the following data set: 32 59 39 43 21 27 42 29 37 36 35 31 39 Construct a stem-and-leaf display for the given data set.
Graphical Methods for Describing Quantitative Data: Example (continued)
Stem | Leaf
2 | 1 7 9
3 | 1 2 5 6 7 9
4 | 2 3 8
5 | 9
Graphical Methods for Describing Quantitative Data • In constructing a stem-and-leaf display, we include all possible stems between the largest and the smallest stems, even if some of them do not have leaves. • If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display by splitting each stem into two halves. • Whenever a stem value is stated twice, the first occurrence corresponds to leaf values of 0-4, and the second corresponds to leaf values of 5-9.
Graphical Methods for Describing Quantitative Data: Example (continued) Stem Leaf 2 1 2 7 9 3 1 2 3 5 6 7 8 4 2 3 4 8 5 5 9
Graphical Methods for Describing Quantitative Data • Leaf Units: • A single digit is used to define each leaf. • In the preceding example, the leaf unit was 1. • Leaf units may be 100, 10, 1, 0.1, and so on. • Where the leaf unit is not shown, it is assumed to equal 1.
Graphical Methods for Describing Quantitative Data: Example • If we have data with values such as: 8.6 11.7 9.4 9.1 10.2 11.0 8.8 a stem-and-leaf display of these data will be:
Leaf Unit = 0.1
Stem | Leaf
8 | 6 8
9 | 1 4
10 | 2
11 | 0 7
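The stem-and-leaf construction for the leaf-unit example can be sketched in Python. The helper name `stem_and_leaf` and its `leaf_unit` parameter are assumptions made for illustration, and the sketch assumes non-negative values; empty stems between the smallest and largest stem are kept, as the guidelines require.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=1.0):
    """Return {stem: sorted leaves}, keeping empty stems between min and max."""
    rows = defaultdict(list)
    for v in values:
        scaled = round(v / leaf_unit)     # express the value in leaf units
        stem, leaf = divmod(scaled, 10)   # the last digit is the leaf
        rows[stem].append(leaf)
    stems = range(min(rows), max(rows) + 1)
    return {s: sorted(rows.get(s, [])) for s in stems}

values = [8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8]
display = stem_and_leaf(values, leaf_unit=0.1)

print("Leaf Unit = 0.1")
for stem, leaves in display.items():
    print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")
```

Rounding after dividing by the leaf unit guards against floating-point results such as 85.99999999999999 for 8.6 / 0.1.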
Graphical Methods for Describing Quantitative Data A Frequency Distribution is a tabular summary of data showing the frequency (or number) of items in each of several non-overlapping classes. Guidelines for Constructing a Frequency Distribution: First step: select the classes of equal length. Second step: count the number of measurements that fall within each class. Third step: list the classes and their associated frequencies in a table.
Graphical Methods for Describing Quantitative Data • Guidelines for Selecting Number of Classes: • There is no unique way of selecting classes! • Classes must include all data values. • Use between 5 and 20 classes. • Data sets with a larger number of elements usually require a larger number of classes. • An observation that falls on the borderline of a class interval should be classified into the next highest interval.
Graphical Methods for Describing Quantitative Data • Histograms can be used to display either the frequency or relative frequency distributions. • To construct a histogram, place the variable of interest on the horizontal axis and draw a rectangle above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency.
Graphical Methods for Describing Quantitative Data • More on Histograms
Graphical Methods for Describing Quantitative Data: Example • Using data given below, construct a frequency distribution, relative frequency distribution, and a histogram plot. 5.5 14.5 6.0 5.5 5.3 5.8 11.0 6.1 7.0 14.5 10.4 4.6 4.3 7.2 10.5 6.5 3.3 7.0 4.1 6.2 10.4 4.9
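One possible way to work this example is sketched below in Python. The class boundaries are an assumption, not part of the original problem: six classes of width 2.0 starting at 3.0, with each interval closed on the left so that a borderline observation falls into the next higher class.

```python
data = [5.5, 14.5, 6.0, 5.5, 5.3, 5.8, 11.0, 6.1, 7.0, 14.5, 10.4,
        4.6, 4.3, 7.2, 10.5, 6.5, 3.3, 7.0, 4.1, 6.2, 10.4, 4.9]

# Assumed classes: [3,5), [5,7), [7,9), [9,11), [11,13), [13,15)
lower, width, num_classes = 3.0, 2.0, 6

counts = [0] * num_classes
for x in data:
    index = int((x - lower) // width)  # borderline values go to the higher class
    counts[index] += 1

n = len(data)
relative = [c / n for c in counts]

# Text rendering of the frequency histogram: one '*' per observation.
for i, (c, r) in enumerate(zip(counts, relative)):
    left = lower + i * width
    print(f"[{left:4.1f}, {left + width:4.1f})  freq={c:2d}  rel={r:.3f}  {'*' * c}")
```

A different starting point or class width would give a different, equally valid frequency distribution, since there is no unique way of selecting classes.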
Graphical Methods for Describing Quantitative Data using MINITAB • The Data
Graphical Methods for Describing Quantitative Data using MINITAB • All graphs must have proper titles and labels. • The title should clearly describe the measurement being displayed and the items upon which the measurement is made. A generic form of title is “Distribution of [Name of Measurement] for [Describe the Items]”. You must substitute the correct words for the problem at hand for the items in brackets. • The label for the column containing the counts of items should clearly describe the items. A generic label is “Number of [Describe the Items]”. You must substitute the correct words for the problem at hand for the items in brackets.
Graphical Methods for Describing Quantitative Data using MINITAB • Stem-and-Leaf Display
Graphical Methods for Describing Quantitative Data using MINITAB • Histogram
Summation Notation • Used to simplify summation instructions • Each observation in a data set is identified by a subscript: $x_1, x_2, x_3, x_4, x_5, \ldots, x_n$ • Notation used to sum the above numbers together is $\sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n$
Summation Notation • Data set of 1, 2, 3, 4 • Are these the same? $\sum_{i=1}^{4} x_i^2$ and $\left(\sum_{i=1}^{4} x_i\right)^2$
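A comparison that often causes confusion with summation notation is between the sum of the squared values and the square of the summed values. The sketch below, assuming those are the two expressions in question, evaluates both for the data set 1, 2, 3, 4.

```python
data = [1, 2, 3, 4]

sum_of_squares = sum(x ** 2 for x in data)  # 1 + 4 + 9 + 16
square_of_sum = sum(data) ** 2              # (1 + 2 + 3 + 4) squared

print(sum_of_squares)  # 30
print(square_of_sum)   # 100
print(sum_of_squares == square_of_sum)  # False
```

The two quantities differ, which matters later because the shortcut formula for the sample variance uses both.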
Numerical Measures of Central Tendency • Central Tendency – tendency of data to center about certain numerical values • Three commonly used measures of Central Tendency: - Mean - Median - Mode
Numerical Measures of Central Tendency The Mean: • Arithmetic average of the elements of the data set • Sample mean denoted by $\bar{x}$ • Population mean denoted by $\mu$ • Calculated as $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ and $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Numerical Measures of Central Tendency The Median: • Middle number when n observations are arranged in increasing order. • Median is denoted by M. • If n is odd, M is the middle number. • If n is even, M is the mean of the middle two numbers.
Numerical Measures of Central Tendency The Mode: • The most frequently occurring value in the data set • Data set can be multi-modal – have more than one mode • Data displayed in a histogram will have a modal class – the class with the largest frequency
Numerical Measures of Central Tendency • The Data set: 1 3 5 6 8 8 9 11 12 • Mean: $\bar{x} = \frac{1+3+5+6+8+8+9+11+12}{9} = \frac{63}{9} = 7$ • Median: since there are nine observations in the data set, the median is the middle (fifth) number, M = 8 • Mode: 8
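The three measures for this data set can be checked with Python's standard `statistics` module. This is only a quick verification sketch; `multimode` is used because a data set may have more than one mode.

```python
import statistics

data = [1, 3, 5, 6, 8, 8, 9, 11, 12]

mean = statistics.mean(data)        # sum of the values divided by n
median = statistics.median(data)    # middle value of the sorted data (n is odd)
modes = statistics.multimode(data)  # list of the most frequent value(s)

print(mean, median, modes)
```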
Numerical Measures of Variability • Variability – the spread of the data across possible values • Three commonly used measures of Variability - Range - Variance - Standard Deviation
Numerical Measures of Variability The Range: • The Range of a data set is the difference between the largest and smallest data values. • It is very sensitive to extreme values. • It loses sensitivity when data sets are large.
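Numerical Measures of Variability • The Sample Variance ($s^2$) • The sum of the squared deviations from the mean divided by $(n-1)$: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$. Expressed in squared units. • Why square the deviations? The sum of the deviations from the mean is always zero, so the unsquared deviations cannot measure spread.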
Numerical Measures of Variability • The Sample Standard Deviation (s) is the positive square root of the sample variance • Expressed in the original units of measurement
Numerical Measures of Variability: Shortcut Formula for Variance • We will be using a shortcut formula to determine the sample variance: $s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2 / n}{n-1}$
Numerical Measures of Variability: Example • Calculate the range, variance, and standard deviation for the following sample: 4, 2, 1, 0, 1
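The example can be worked in Python as a check. The sketch computes the sample variance twice, once from the definition (squared deviations divided by n - 1) and once from the standard shortcut form that uses the sum of squares and the square of the sum, to show that the two agree. Variable names are illustrative.

```python
import math

sample = [4, 2, 1, 0, 1]
n = len(sample)

data_range = max(sample) - min(sample)  # largest minus smallest value

# Variance from the definition: squared deviations from the mean over (n - 1).
mean = sum(sample) / n
var_definition = sum((x - mean) ** 2 for x in sample) / (n - 1)

# Variance from the shortcut form: (sum of x^2 - (sum of x)^2 / n) over (n - 1).
sum_x = sum(sample)
sum_x2 = sum(x * x for x in sample)
var_shortcut = (sum_x2 - sum_x ** 2 / n) / (n - 1)

std_dev = math.sqrt(var_shortcut)  # positive square root of the variance

print(f"range={data_range}  variance={var_shortcut:.2f}  std dev={std_dev:.2f}")
```

Here the sum of the values is 8 and the sum of their squares is 22, so the variance is (22 - 64/5) / 4 = 2.3 and the standard deviation is about 1.52.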
Interpreting the Standard Deviation • The standard deviation is the most common way of measuring the variation in a data set. • For two data sets, the one that has the greater variability will have the larger standard deviation. • The standard deviation is also useful in describing a single distribution of measurements. This description will be accomplished by examining two statements – the Empirical Rule and Chebyshev’s Theorem.
Interpreting the Standard Deviation: The Empirical Rule If a set of measurements has a mound-shaped distribution, then: • The interval from $\bar{x} - s$ to $\bar{x} + s$ will contain approximately 68% of the measurements. • The interval from $\bar{x} - 2s$ to $\bar{x} + 2s$ will contain approximately 95% of the measurements. • The interval from $\bar{x} - 3s$ to $\bar{x} + 3s$ will contain almost all (approximately 99.7%) of the measurements.
Interpreting the Standard Deviation: Chebyshev’s Theorem • For any set of measurements and any number $k > 1$, the interval from $\bar{x} - ks$ to $\bar{x} + ks$ will contain at least $(1 - 1/k^2) \times 100\%$ of the measurements. • Chebyshev’s Theorem applies to all possible distributions. It is very conservative. • Chebyshev’s Theorem gives the minimum proportion of the measurements that will lie within k standard deviations of their mean.
Interpreting the Standard Deviation: Example 1 • The following is a sample of 25 measurements: 7 6 6 11 8 9 11 9 10 8 7 7 5 9 10 7 7 7 7 9 12 10 10 8 6 For this sample, the mean is 8.24 and the standard deviation is 1.83.
Interpreting the Standard Deviation: Example 1 (continued) a. Count the number of measurements within 1, 2, and 3 standard deviations of the mean. Express each count as a percentage of the total number of measurements. b. Compare the percentages found in part a to the percentages given by the Empirical Rule and Chebyshev’s Theorem.
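The counting in Example 1 can be sketched in Python as follows. The Empirical Rule percentages (68, 95, 99.7) and the Chebyshev bound (1 - 1/k^2) × 100% are hard-coded from their standard statements; for k = 1 the Chebyshev bound is 0%, which carries no information. Names such as `results` are illustrative.

```python
import statistics

data = [7, 6, 6, 11, 8, 9, 11, 9, 10, 8, 7, 7, 5,
        9, 10, 7, 7, 7, 7, 9, 12, 10, 10, 8, 6]

n = len(data)
mean = statistics.mean(data)  # 8.24
s = statistics.stdev(data)    # sample standard deviation, about 1.83

empirical = {1: 68.0, 2: 95.0, 3: 99.7}  # approximate, mound-shaped data only

results = {}
for k in (1, 2, 3):
    low, high = mean - k * s, mean + k * s
    count = sum(low <= x <= high for x in data)
    percent = 100 * count / n
    chebyshev = (1 - 1 / k ** 2) * 100  # guaranteed minimum percentage
    results[k] = (count, percent)
    print(f"k={k}: {count} of {n} ({percent:.0f}%)  "
          f"Empirical Rule ~{empirical[k]}%  Chebyshev >= {chebyshev:.1f}%")
```

For this sample, 18, 24, and 25 measurements (72%, 96%, and 100%) fall within 1, 2, and 3 standard deviations of the mean. These are close to the Empirical Rule values and well above the Chebyshev minimums.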
Interpreting the Standard Deviation: Example 2 • A manufacturer of compact fluorescent light bulbs claims that the average life length of its bulbs is 500 hours, that the standard deviation is 25 hours, and that the frequency distribution of life length is mound-shaped. a. Approximately what percentage of the manufacturer’s bulbs will last more than 475 hours? b. Approximately what percentage of the manufacturer’s bulbs will last less than 450 hours? c. Suppose your bulb lasts 445 hours. What could you infer about the manufacturer’s claim?
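One way to reason through Example 2 is with the Empirical Rule and the symmetry of a mound-shaped distribution. The sketch below is an approximation under those assumptions, not an exact calculation, and the variable names are illustrative.

```python
mean, sd = 500, 25  # manufacturer's claimed mean and standard deviation (hours)

within_1sd = 0.68  # Empirical Rule: about 68% within 1 standard deviation
within_2sd = 0.95  # Empirical Rule: about 95% within 2 standard deviations

# a. 475 hours is 1 standard deviation below the mean: everything inside
#    the 1-sd interval plus the upper tail beyond it lasts longer than 475.
p_more_than_475 = within_1sd + (1 - within_1sd) / 2

# b. 450 hours is 2 standard deviations below the mean: only the lower
#    tail outside the 2-sd interval lasts less than 450.
p_less_than_450 = (1 - within_2sd) / 2

# c. Distance of a 445-hour bulb from the claimed mean, in standard deviations.
deviations_445 = (445 - mean) / sd

print(f"a. about {p_more_than_475:.0%} last more than 475 hours")
print(f"b. about {p_less_than_450:.1%} last less than 450 hours")
print(f"c. 445 hours is {deviations_445:.1f} standard deviations from the mean")
```

A bulb lasting 445 hours lies more than 2 standard deviations below the claimed mean. If the claim were true, fewer than about 2.5% of bulbs would fail that early, so either a rare event occurred or the claim deserves some doubt.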
Numerical Measures of Relative Standing • Descriptive measures of relationship of a measurement to the rest of the data • Common measures: - percentile ranking or percentile score - z-score