Business Statistics for Managerial Decision Making Examining Distributions
Introduction • Descriptive Statistics • Methods that organize and summarize data aid in effective presentation and increased understanding. • Bar charts, tabular displays, various plots of economic data, averages and percentages. • Often the individuals or objects studied by an investigator come from a much larger collection, and the researcher’s interest goes beyond just data summarization.
Introduction • Population • The entire collection of individuals or objects about which information is desired. • Sample • A subset of the population selected in some prescribed manner for study.
Introduction • Inferential Statistics • Involves generalizing from a sample to the population from which it was selected. • This type of generalization involves some risk, since a conclusion about the population will be reached on the basis of available, but incomplete, information. • An important aspect in the development of inference techniques involves quantifying the associated risks.
Individuals and variables • Individuals • are the objects described by a set of data. • They may be people, but they may also be business firms, common stocks, or other objects. • A Variable • is any characteristic of an individual. • A variable can take different values for different individuals.
Categorical & Quantitative Variables • A Categorical Variable places an individual into one of several groups or categories. • A Quantitative Variable takes numerical values for which arithmetic operations such as adding and averaging make sense. • The distribution of a variable tells us what values it takes and how often it takes these values.
Discrete and Continuous Variables • With numerical data (quantitative variables), it is useful to make a further distinction. • Numerical data is discrete if the possible values are isolated points on the number line. • Numerical data is continuous if the set of possible values forms an entire interval on the number line.
Stem plot • To make a stem plot: • Separate each observation into a stem consisting of all but the final (rightmost) digit and a leaf, the final digit. Stems may have as many digits as needed, but each leaf contains only a single digit. • Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column. • Write each leaf in the row to the right of its stem, in increasing order out from the stem.
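The steps above can be sketched in Python. This is a minimal illustration, assuming non-negative whole-number observations; the function name `stem_plot` and the data are hypothetical, and printing stems that have no leaves is a choice made here, not something the slide prescribes.

```python
from collections import defaultdict

def stem_plot(values):
    """Return the rows of a stem plot for a non-empty list of non-negative integers."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # stem: all but the final digit; leaf: final digit
        leaves[stem].append(leaf)   # sorted input keeps leaves in increasing order
    rows = []
    # Stems run from smallest to largest; empty stems are kept (an assumption).
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {row}")
    return rows

# Hypothetical observations, for illustration only.
scores = [23, 25, 31, 38, 38, 40, 52, 57]
print("\n".join(stem_plot(scores)))
```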
Frequency Distribution • A frequency distribution for categorical data is a table that displays the categories, frequencies, and relative frequencies. • Example • The increasing emphasis on exercise has resulted in an increase in sports-related injuries. A listing of the 82 sample observations would look something like this: F, Sp, Sp, Co, F, L, F, Ch, De, L, Sp, Di, St, Cn,…
Frequency Distribution • The following coding is used: • Sp = sprain, St = strain, Di = dislocation, Co = contusion, L = laceration, Cn = concussion, F = fracture, Ch = chronic, De = dental
Frequency Distribution for Discrete Numerical Data • Discrete numerical data almost always results from counting. • In such cases, each observation is a whole number. • For example, if the possible values are 0, 1, 2, 3, …, then these are listed in a column, and a running tally is kept as a single pass is made through the data.
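A frequency distribution of this kind can be sketched as follows. Only the 14 codes shown in the listing are used, not the full sample of 82, so the resulting numbers are not the example's actual table; the function name `frequency_table` is hypothetical.

```python
from collections import Counter

def frequency_table(observations):
    """Map each category to (frequency, relative frequency)."""
    counts = Counter(observations)
    n = len(observations)
    return {cat: (freq, freq / n) for cat, freq in counts.most_common()}

# Only the first 14 observations shown in the listing (the full sample has 82).
listed = ["F", "Sp", "Sp", "Co", "F", "L", "F", "Ch", "De", "L", "Sp", "Di", "St", "Cn"]

for cat, (freq, rel) in frequency_table(listed).items():
    print(f"{cat:<3} {freq:>3} {rel:.3f}")
```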
Frequency Distribution for Discrete Numerical Data • Example • A sample of 708 bus drivers employed by public corporations was selected, and the number of traffic accidents in which each was involved during a 4-year period was determined. A listing of the 708 sample observations would look something like this: 3, 0, 6, 0, 0, 2, 1, 4, 1, …
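The running tally can be sketched as a single pass over the data. The sketch below uses only the nine observations shown in the listing, not all 708, and the function name `tally` is hypothetical.

```python
def tally(observations):
    """Return (value, frequency) pairs for every whole number from 0 to the maximum."""
    counts = {}
    for x in observations:  # a single pass through the data
        counts[x] = counts.get(x, 0) + 1
    # List every possible value in order, including values that never occurred.
    return [(value, counts.get(value, 0)) for value in range(max(observations) + 1)]

# Only the nine observations shown in the listing (the full sample has 708).
listed = [3, 0, 6, 0, 0, 2, 1, 4, 1]

for value, freq in tally(listed):
    print(f"{value}: {freq}")
```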
Frequency Distributions for Continuous Data • The difficulty with continuous data, such as observations on the unemployment rate by state, is that there are no natural categories. • Therefore we define our own categories by marking off intervals on a horizontal unemployment-rate axis, as pictured below. [Figure: unemployment-rate axis running from 1.00 to 9.00]
Frequency Distributions for Continuous Data • If the smallest rate were 1.5% and the largest were 8.9%, we might use intervals of width 1%, with the first one starting at 1% and the last one ending at 9%. • Each data value should fall in exactly one of these intervals.
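One way to guarantee that each value falls in exactly one interval is to treat every interval as closed on the left and open on the right. That convention is an assumption of this sketch, as are the function name `class_frequencies` and the rates, which are made up for illustration.

```python
import math

def class_frequencies(values, start, width, n_classes):
    """Count values in the intervals [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        k = math.floor((v - start) / width)  # index of the interval containing v
        if not 0 <= k < n_classes:
            raise ValueError(f"{v} lies outside the class intervals")
        counts[k] += 1
    return counts

# Hypothetical unemployment rates (percent), for illustration only.
rates = [1.5, 2.3, 3.0, 3.7, 4.1, 4.4, 4.9, 5.2, 6.0, 8.9]

# Eight intervals of width 1, the first starting at 1 and the last ending at 9.
for k, count in enumerate(class_frequencies(rates, start=1, width=1, n_classes=8)):
    print(f"{1 + k} to {2 + k}: {count}")
```

A value sitting exactly on a boundary, such as 3.0, is counted in the interval that starts there.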
Histograms • Mark the boundaries of the class intervals on a horizontal axis. • Draw a vertical scale marked with either relative frequencies or frequencies. • The rectangle corresponding to a particular interval is drawn directly above the interval. • The height of each rectangle is then the class frequency or relative frequency.
Examining a Distribution • In any graph of data, look for the overall pattern and for striking deviations from that pattern. • You can describe the overall pattern of a histogram by its shape, center, and spread. • An important kind of deviation is an outlier, an individual value that falls outside the overall pattern.
Symmetric & Skewed Distribution • A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other. • A distribution is skewed to the right if the right side of the histogram (containing the half of the observations with larger values) extends much farther out than the left side. • It is skewed to the left if the left side of the histogram extends much farther out than the right side.
Numerical Summary Measures • Describing the center of a data set. • Mean • Median • Describing the variability in a data set. • Variance, standard deviation • Quartiles
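The Mean • To find the mean of a set of observations, add their values and divide by the number of observations. If the n observations are $x_1, x_2, \ldots, x_n$, their mean is $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$ • In a more compact notation, $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$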
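As a small worked illustration of the rule "add the values and divide by the number of observations", the sketch below uses a hypothetical function name `mean` and made-up data.

```python
def mean(observations):
    """Add the values and divide by the number of observations."""
    return sum(observations) / len(observations)

# Hypothetical observations: the sum is 25 and n is 5.
data = [2, 4, 4, 5, 10]
print(mean(data))
```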
The Median • The Median M is the midpoint of a distribution, the number such that half of the observations are smaller and the other half are larger. To find the median of a distribution: • Arrange all observations in order of size, from smallest to largest. • If the number of observations n is odd, the median M is the center observation in the ordered list. • If the number of observations n is even, the median M is the mean of the two center observations in the ordered list.
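The odd and even cases above can be sketched as follows; the function name `median` and the sample lists are illustrative assumptions.

```python
def median(observations):
    """Midpoint of the ordered observations."""
    ordered = sorted(observations)  # arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # n odd: the center observation
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: mean of the two center observations

print(median([5, 1, 3]))     # odd number of observations
print(median([4, 1, 3, 2]))  # even number of observations
```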
The Quartiles Q1 and Q3 • To calculate the quartiles: • Arrange the observations in increasing order and locate the median M in the ordered list of observations. • The first quartile Q1 is the median of the observations whose position in the ordered list is to the left of the location of the overall median. • The third quartile Q3 is the median of the observations whose position in the ordered list is to the right of the location of the overall median.
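A minimal sketch of this rule is given below. It assumes at least two observations, and it leaves the overall median out of both halves when n is odd, which is how "to the left of" and "to the right of" the median's location are read here. Statistical software often uses slightly different quartile rules, so results may not match exactly. The function names are hypothetical.

```python
def _median_of_sorted(ordered):
    """Median of a list that is already in increasing order."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(observations):
    """Return (Q1, M, Q3) for at least two observations."""
    ordered = sorted(observations)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]  # positions to the left of the median's location
    # When n is odd, the center observation is the median and belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return _median_of_sorted(lower), _median_of_sorted(ordered), _median_of_sorted(upper)

print(quartiles([1, 2, 3, 4, 5, 6, 7]))
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))
```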
The Five Number Summary and Box-Plot • The five number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest. In symbols, the five number summary is Minimum Q1 M Q3 Maximum
The Five Number Summary and Box-Plot • A box-plot is a graph of the five number summary. • A central box spans the quartiles. • A line in the box marks the median. • Lines extend from the box out to the smallest and largest observations. • Box-plots are most useful for side-by-side comparison of several distributions.
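The numbers that a box-plot displays can be computed and compared side by side as in the sketch below. The two groups are made-up data, the function names are hypothetical, and the quartiles are found by taking the median of each half of the ordered list, leaving out the center observation when n is odd.

```python
def _median_of_sorted(ordered):
    """Median of a list that is already in increasing order."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(observations):
    """Return (Minimum, Q1, M, Q3, Maximum) for at least two observations."""
    ordered = sorted(observations)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0], _median_of_sorted(lower), _median_of_sorted(ordered),
            _median_of_sorted(upper), ordered[-1])

# Two hypothetical groups, for illustration only.
groups = {
    "Group A": [12, 15, 15, 18, 21, 24, 30],
    "Group B": [8, 14, 19, 22, 22, 35, 41, 50],
}

# Printing the summaries one under the other mimics a side-by-side comparison.
for name, values in groups.items():
    print(name, five_number_summary(values))
```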
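The Standard Deviation s • The variance $s^2$ of a set of observations is the average of the squares of the deviations of the observations from their mean. In symbols, the variance of n observations $x_1, x_2, \ldots, x_n$ is $s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$ or, more compactly, $s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
The Standard Deviation s • The standard deviation $s$ is the square root of the variance $s^2$: $s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$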
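A short worked sketch of the calculation follows. It assumes the usual sample convention in which the sum of squared deviations is divided by n - 1 rather than n; the function names and the data are hypothetical.

```python
import math

def variance(observations):
    """Sample variance: squared deviations from the mean, summed and divided by n - 1."""
    n = len(observations)
    xbar = sum(observations) / n
    return sum((x - xbar) ** 2 for x in observations) / (n - 1)  # n - 1 divisor is an assumption

def standard_deviation(observations):
    """Square root of the sample variance."""
    return math.sqrt(variance(observations))

# Hypothetical observations with mean 5; the squared deviations are 9, 1, 1, 0, 25.
data = [2, 4, 4, 5, 10]
print(variance(data), standard_deviation(data))
```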
Choosing a Summary • The five number summary is usually better than the mean and standard deviation for describing a skewed distribution or a distribution with extreme outliers. Use $\bar{x}$ and $s$ only for reasonably symmetric distributions that are free of outliers.
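The effect of an outlier on these summaries can be illustrated with a small sketch. The two lists are made-up data that differ only in their largest value, and `summarize` is a hypothetical helper built on Python's standard `statistics` module.

```python
import statistics

# Hypothetical data: the second list replaces the largest value with an extreme outlier.
typical = [2, 4, 4, 5, 10]
with_outlier = [2, 4, 4, 5, 100]

def summarize(observations):
    """Collect the mean, standard deviation, and median of the observations."""
    return {
        "mean": statistics.mean(observations),
        "stdev": statistics.stdev(observations),  # sample standard deviation (n - 1 divisor)
        "median": statistics.median(observations),
    }

print(summarize(typical))
print(summarize(with_outlier))
```

In this example the median is the same for both lists, while the mean and standard deviation change substantially once the outlier is present.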
Strategies for Exploring Data • Plot the data • Make a graph, usually a histogram or a stem-plot. • Look at the distribution of the variable for: • overall pattern (shape, center, spread). • striking deviations such as outliers. • Calculate a numerical summary to briefly describe center and spread. • Describe the overall pattern with a smooth curve.
Density Curves • Sometimes the overall pattern (the distribution of the variable) of a large number of observations is so regular that we can describe it by a smooth curve, called a density curve. • The curve is a mathematical model for the distribution.
Density Curve • Histogram of the city gas mileage (miles per gallon) of 856 model year 2001 motor vehicles. • The smooth curve, the density curve, shows the overall shape of the distribution.
Density Curve • The proportion of cars with gas mileage less than 20 from the histogram is
Density Curve • The proportion of cars with gas mileage less than 20 from the density curve is 0.410. • The area under the density curve gives a good approximation of the areas given by the histogram.
Density Curve • A density curve is a curve that • Is always on or above the horizontal axis. • Has area exactly 1 underneath it. • A density curve describes the overall pattern of a distribution. • The area under the curve and above any range of values is the proportion of all observations that fall in that range.
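These properties can be checked numerically on a simple example. The triangular density f(x) = 2x on the interval from 0 to 1 is a hypothetical choice made for this sketch, not a curve from the slides, and `area_under` is an illustrative helper that applies the trapezoid rule.

```python
def area_under(density, a, b, steps=10_000):
    """Approximate the area under `density` between a and b with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (density(a) + density(b))
    for i in range(1, steps):
        total += density(a + i * h)
    return total * h

def triangle_density(x):
    """Hypothetical density: f(x) = 2x on [0, 1], zero elsewhere (never below the axis)."""
    return 2 * x if 0 <= x <= 1 else 0.0

# The total area should be 1; the area over a range is the proportion in that range.
print(area_under(triangle_density, 0, 1))
print(area_under(triangle_density, 0, 0.5))
```

For this density, the proportion of observations between 0 and 0.5 is one quarter, because the area of that small triangle is 0.25.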
Median and Mean of a Density Curve • The median of a density curve is the point that divides the area under the curve in half.
Median and Mean of a Density Curve • The mean of a density curve is the balance point, at which the curve would balance if made of solid material.
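Both quantities can be approximated numerically. The sketch below assumes the hypothetical density f(x) = 2x on the interval from 0 to 1 and uses the standard fact that the balance point of a density is the integral of x times the density. The median is located by bisection on the area to its left. All function names are illustrative.

```python
def integrate(f, a, b, steps=10_000):
    """Approximate the integral of f from a to b with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

def density(x):
    """Hypothetical density: f(x) = 2x on [0, 1]."""
    return 2 * x

def mean_of_density(f, a, b):
    """Balance point: the integral of x * f(x) over [a, b]."""
    return integrate(lambda x: x * f(x), a, b)

def median_of_density(f, a, b, tol=1e-9):
    """Point with half of the area to its left, found by bisection."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if integrate(f, a, mid) < 0.5:
            lo = mid  # too little area to the left: move right
        else:
            hi = mid  # at least half the area to the left: move left
    return (lo + hi) / 2

print(mean_of_density(density, 0, 1))
print(median_of_density(density, 0, 1))
```

This density is skewed to the left, and its mean (about 0.667) is smaller than its median (about 0.707), so the mean sits farther toward the long tail.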