BCOR 1020 Business Statistics Lecture 3 – January 24, 2008
Overview • Chapter 3 – Describing Data Visually… • Visual Description • Dot Plots • Frequency Distribution and Histograms • Simple Line Charts & Bar Charts • Scatter Plots • Tables • Pie Charts • Maps and Pictograms • Deceptive Graphs
Chapter 3 – Visual Description Methods of organizing, exploring and summarizing data include: • Visual (charts and graphs) – provides insight into characteristics of a data set without using mathematics. • Numerical (statistics or tables) – provides insight into characteristics of a data set using mathematics.
Chapter 3 – Visual Description Beginning with univariate data (a set of n observations on one variable), consider the following: • Measurement – What are the units of measurement? Are the data integer or continuous? Any missing observations? Any concerns with accuracy or sampling methods? • Central Tendency – Where are the data values concentrated? What seem to be typical or middle data values? • Dispersion – How much variation is there in the data? How spread out are the data values? Are there unusual values? • Shape – Are the data values distributed symmetrically? Skewed? Sharply peaked? Flat? Bimodal?
Chapter 3 – Visual Description • Example: Price/Earnings Ratios: • P/E ratios are current stock price divided by earnings per share in the last 12 months. For example:
Chapter 3 – Visual Description Measurement – Look at the data and visualize how it was collected and measured. Sorting – Sort the data and then summarize in a graphical display. • Here are the sorted P/E ratios: • Sorting allows you to observe central tendency, dispersion and shape as well as minimum, maximum and range.
Chapter 3 – Dot Plots A dot plot is the simplest graphical display of n individual values of numerical data. • Easy to understand. • Not good for large samples (e.g., > 5,000). Steps in Making a Dot Plot: • Make a scale that covers the data range • Mark the axes and label them • Plot each data value as a dot above the scale at its approximate location. If more than one data value lies at about the same axis location, the dots are piled up vertically. * Figure 3.4 in your text details the MegaStat menus for creating a dotplot.
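The steps above can be sketched in a few lines of Python. This is only a text-based illustration, not the MegaStat procedure from the text. The sample values are made up, and the helper name `dot_plot_rows` is hypothetical.

```python
from collections import Counter

# Made-up sample values for illustration only (not the P/E data from the text).
sample = [8, 10, 10, 11, 13, 13, 13, 16, 20]

def dot_plot_rows(values, mark="o"):
    """Return text rows of a dot plot; repeated values pile up vertically."""
    counts = Counter(round(v) for v in values)   # one axis position per integer
    lo, hi = min(counts), max(counts)            # the scale covers the data range
    tallest = max(counts.values())
    rows = []
    for level in range(tallest, 0, -1):          # top of the tallest pile first
        row = "".join(mark if counts[x] >= level else " "
                      for x in range(lo, hi + 1))
        rows.append(row.rstrip())
    return rows, lo, hi

rows, lo, hi = dot_plot_rows(sample)
for row in rows:
    print(row)
print("-" * (hi - lo + 1))          # the axis
print(f"scale: {lo} to {hi}")       # label for the axis
```

The piles make clustering visible, but with thousands of values the rows become too tall to read, which is why dot plots suit small samples.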
Chapter 3 – Dot Plots • Range of data shows dispersion. • Clustering shows central tendency. • Dot plots do not tell much of shape of distribution. • Can add annotations (text boxes) to call attention to specific features.
Chapter 3 – Frequency Distributions and Histograms Bins and Bin Limits: • A frequency distribution is a table formed by classifying n data values into k classes (bins). • Bin limits define the values to be included in each bin. Widths must all be the same. • Frequencies are the number of observations within each bin. • Frequencies can also be expressed as relative frequencies (frequency divided by the total) or percentages (relative frequency times 100).
Chapter 3 – Frequency Distributions and Histograms Constructing a Frequency Distribution: • Sort data in ascending order (e.g., P/E ratios) • Choose the number of bins (k). • k should be much smaller than n. • Too many bins results in sparsely populated bins, too few and dissimilar data values are lumped together. • Herbert Sturges proposes the following rule: k = 1 + log2(n)
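As a quick check of Sturges' rule, the sketch below evaluates k = 1 + log2(n) for a given sample size. Rounding to the nearest integer is an assumption made here for convenience; the slide gives only the formula, and the result is a starting suggestion, not a requirement. The function name `sturges_bins` is hypothetical.

```python
import math

def sturges_bins(n):
    """Suggested number of bins from Sturges' rule, k = 1 + log2(n).

    Rounding to the nearest integer is an assumption of this sketch.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return round(1 + math.log2(n))

for n in (44, 64, 1000):
    print(n, sturges_bins(n))
```

Because the rule grows with the logarithm of n, even a sample of 1,000 values calls for only about a dozen bins.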
Chapter 3 – Frequency Distributions and Histograms Constructing a Frequency Distribution: • Set the bin limits: In our example, we will use k = 7 bins to get convenient bin limits. The approximate bin width is: bin width ≈ (largest data value − smallest data value) / k. To obtain “nice” limits, we round the width to 10 and start the first bin at 0 to get bin limits: 0, 10, 20, 30, 40, 50, 60, 70
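A minimal sketch of this step, assuming the usual approximation that bin width is the data range divided by k. The minimum and maximum used in the example call are hypothetical, not the values from the P/E data. The second helper reproduces the rounded limits given on the slide. Both function names are hypothetical.

```python
def approximate_bin_width(x_min, x_max, k):
    """Approximate bin width: the data range divided by the number of bins."""
    return (x_max - x_min) / k

def bin_limits(start, width, k):
    """k + 1 equally spaced limits that define k bins of equal width."""
    return [start + i * width for i in range(k + 1)]

# Hypothetical minimum and maximum, for illustration only.
print(approximate_bin_width(5, 65, 7))

# The "nice" choice from the slide: width 10, first bin starting at 0, k = 7.
print(bin_limits(0, 10, 7))
```

Rounding the width up to a convenient number usually means the bins cover slightly more than the observed range, which is harmless.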
Chapter 3 – Frequency Distributions and Histograms Constructing a Frequency Distribution: • Put the data values in the appropriate bin. • In general, the lower limit is included in the bin while the upper limit is excluded. • Create the table. You can include: • Frequencies – counts for each bin • Relative frequencies – absolute frequency divided by total number of data values. • Cumulative frequencies – accumulated relative frequency values as bin limits increase. Example: Back to the P/E ratio data…
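The table construction described above might look like the following in Python. The data values are made up for illustration and are not the P/E ratios from the text. The function name `frequency_table` is hypothetical. Each value is counted in the bin whose lower limit it meets or exceeds and whose upper limit it falls below.

```python
def frequency_table(values, limits):
    """Rows of (lower, upper, frequency, relative frequency, cumulative relative frequency).

    The lower limit is included in each bin and the upper limit is excluded.
    """
    n = len(values)
    rows = []
    cumulative = 0
    for lower, upper in zip(limits, limits[1:]):
        freq = sum(1 for v in values if lower <= v < upper)
        cumulative += freq
        rows.append((lower, upper, freq, freq / n, cumulative / n))
    return rows

# Made-up data values, for illustration only.
data = [8, 10, 12, 19, 20, 25, 31, 38, 40, 55]
for lower, upper, freq, rel, cum in frequency_table(data, [0, 10, 20, 30, 40, 50, 60]):
    print(f"{lower:>2} to under {upper:<2}  {freq:>2}  {rel:.2f}  {cum:.2f}")
```

Note that the value 10 lands in the second bin, not the first, and that the last cumulative relative frequency equals 1 when the limits cover all of the data.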
Chapter 3 – Frequency Distributions and Histograms What are the bin limits for the P/E ratio data?
Chapter 3 – Frequency Distributions and Histograms Histograms: • A histogram is a graphical representation of a frequency distribution. • A histogram is a bar chart. • X-axis ticks show the end points of each bin. • Y-axis shows frequency (or relative/cumulative frequency) within each bin. Consider 3 histograms for the P/E ratio data with different bin widths. Do they give you different impressions of the data? (Histogram panels: k = 4, k = 7, k = 13) * Figures 3.8 & 3.9 in your text detail the MegaStat menus for creating a histogram.
Chapter 3 – Frequency Distributions and Histograms Modal Class – a histogram bar that is higher than those on either side: • Monomodal – a single modal class. • Bimodal – two modal classes. • Multimodal – more than two modal classes. Caution: Modal classes may be artifacts of the way bin limits are chosen.
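The definition of a modal class can be applied directly to a list of bin frequencies. In the sketch below, treating an end bar as having no neighbor on its outer side is an assumption, and flat runs of equal bars are not counted as modal. The frequencies are made up, and the function name `modal_classes` is hypothetical.

```python
def modal_classes(freqs):
    """Indexes of bars that are strictly higher than the bars on either side."""
    peaks = []
    for i, f in enumerate(freqs):
        # Assumption: a missing neighbor at either end counts as lower than any bar.
        left = freqs[i - 1] if i > 0 else -1
        right = freqs[i + 1] if i < len(freqs) - 1 else -1
        if f > left and f > right:
            peaks.append(i)
    return peaks

# Made-up bin frequencies, for illustration only.
print(modal_classes([1, 3, 2, 2, 1, 1]))   # one modal class
print(modal_classes([2, 5, 3, 6, 1]))      # two modal classes
```

Re-binning the same data with different limits can change the result, which is the caution raised on the slide.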
Chapter 3 – Frequency Distributions and Histograms Shape: • A histogram suggests the shape of the population. • Skewness – indicated by the direction of the longer tail of the histogram. • Left-skewed – (negatively skewed) a longer left tail. • Right-skewed – (positively skewed) a longer right tail. • Symmetric – both tail areas approximately the same. Some examples…
Clickers Consider the histogram of the P/E ratio data that was displayed earlier in this lecture. How would you describe the skewness of this histogram? A = symmetric B = left-skewed C = right-skewed
Chapter 3 – Simple Line Charts Simple Line Charts – Used to display a time series or spot trends, or to compare time periods. • Can display several variables at once.
Chapter 3 – Simple Line Charts Two-scale line chart – used to compare variables that differ in magnitude or are measured in different units. Grid Lines – A line graph usually has no vertical grid lines. Horizontal lines can be added to make it easier to establish the y value. Which is easier to read?
Chapter 3 – Simple Line Charts Log Scales: • Arithmetic scale – distances on the Y-axis are proportional to the magnitude of the variable being displayed. • Logarithmic scale – (ratio scale) equal distances represent equal ratios. • Use a log scale for the vertical axis when data vary over a wide range, say, by more than an order of magnitude. This will reveal more detail for small data values. • Log scale is only suited for positive data values. • Reveals whether the quantity is growing at an increasing percent (concave upward), constant percent (straight line), or declining percent (concave downward).
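To see why constant percent growth plots as a straight line on a log scale, the sketch below builds a made-up series that grows 10% per period and compares its period-to-period steps on each scale. The series and the function names are hypothetical.

```python
import math

def arithmetic_steps(series):
    """Period-to-period changes on an arithmetic scale."""
    return [b - a for a, b in zip(series, series[1:])]

def log_steps(series):
    """Period-to-period changes on a log (base 10) scale; values must be positive."""
    logs = [math.log10(v) for v in series]
    return [b - a for a, b in zip(logs, logs[1:])]

# Made-up series growing at a constant 10% per period.
constant_growth = [100 * 1.10 ** t for t in range(6)]

print(arithmetic_steps(constant_growth))  # steps keep getting larger
print(log_steps(constant_growth))         # steps are all (nearly) equal
```

Equal steps on the log scale mean equal ratios between successive values, so the plotted points fall on a straight line. On the arithmetic scale the same series curves upward.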
Example… Consider the following graphs illustrating U.S. Trade from 1959 to 2002. What does the log scale graph tell you about growth rate for both series? (Graph panels: Log scale, Arithmetic scale)
Chapter 3 – Simple Line Charts When to Use Log Scales: • Useful for… • time series data that might be expected to grow at a compound annual percentage rate (e.g., GDP, national debt, future income) • financial charts that cover long periods of time, or data that grow rapidly (e.g., revenues)
Chapter 3 – Simple Line Charts Tips for Effective Line Charts: • Line charts are used for time series data (never for cross-sectional data). • Y-axis shows numerical variable while X-axis shows time units with time increasing left to right. • Use a zero origin on the Y-axis unless more detail is needed. • Omit numerical labels on a line chart to avoid clutter. Use gridlines if needed. • Use data markers (squares, triangles, circles) if they don’t clutter the graph. • Don’t make lines too thick.
Chapter 3 – Bar Charts Plain Bar Charts – Most common way to display attribute data. • Bars represent categories or attributes. • Lengths of bars represent frequencies.
Chapter 3 – Bar Charts Pareto Charts – Special type of bar chart used in quality management to display the frequency of defects or errors of different types. • Categories are displayed in descending order of frequency. • Focus on significant few (i.e., few categories that account for most defects or errors).
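The ordering behind a Pareto chart is easy to sketch. The defect categories and counts below are made up. The cumulative percent column is an addition commonly shown alongside Pareto charts and is not stated on the slide. The function name `pareto_order` is hypothetical.

```python
def pareto_order(defect_counts):
    """Rows of (category, count, cumulative percent), most frequent category first."""
    total = sum(defect_counts.values())
    ordered = sorted(defect_counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for category, count in ordered:
        running += count
        rows.append((category, count, 100 * running / total))
    return rows

# Made-up defect counts, for illustration only.
defects = {"scratch": 12, "dent": 50, "misprint": 30, "other": 8}
for category, count, cum_pct in pareto_order(defects):
    print(f"{category:<10}{count:>4}{cum_pct:>8.1f}%")
```

In this made-up example the first two categories account for 80% of all defects, which is the "significant few" the chart is meant to highlight.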
Chapter 3 – Bar Charts Stacked Bar Chart – Bar height is the sum of several subtotals. Areas may be compared by color to show patterns in the subgroups and total.
Chapter 3 – Bar Charts Bar Charts for Time Series Data – Bar charts can be (and often are) used for time series data although it may be harder to compare trends.
Chapter 3 – Bar Charts Tips for Effective Bar Charts: • Show the numerical variable of interest with vertical bars on the Y-axis, category labels on the X-axis. • For time series quantities, display the category labels on the horizontal X-axis with time increasing from left to right. • The height or length of each bar should be proportional to the quantity displayed. • Put numerical values at the top of each bar, except if too cluttered.
Chapter 3 – Scatter Plots Example: Aircraft Fuel Consumption: • Consider five observations on flight time and fuel consumption for a twin-engine Piper Cheyenne aircraft. • A causal relationship is assumed since a longer flight would consume more fuel.
Chapter 3 – Scatter Plots • Example: Aircraft Fuel Consumption: • Here is the scatter plot with flight time on the X-axis and fuel use on the Y-axis. • Is there an association between variables? * Figure 3.31 in your text details the Excel menus for creating a scatter plot.
Chapter 3 – Scatter Plots Degree of Association/Correlation: • Very strong association • Strong association • Moderate association • Little or no association
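One common way to put a number on the degree of association is the Pearson correlation coefficient, which is near +1 or −1 for a very strong linear association and near 0 for little or none. The slides judge association visually, so this sketch is offered as a supplement under that assumption. The data are made up and the function name `pearson_r` is hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up paired data, for illustration only.
x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))   # points fall exactly on a rising line
print(pearson_r(x, [2, 1, 3, 1, 2]))    # no linear pattern
```

The coefficient measures only linear association, so a scatter plot should still be inspected for curved patterns and unusual points.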
Clickers Consider the scatter plot (below) comparing birthrates and life expectancies in several countries. True or False: This graph shows a strong association between these two variables. A = True B = False
Chapter 3 – Tables Tables are the simplest form of data display. A compound table is a table that contains time series data down the columns and variables across the rows. Example: School Expenditures • Arrangement of data is in rows and columns to enhance meaning. • The data can be viewed by focusing on the time pattern (down the columns) or by comparing the variables (across the rows). • Units of measure are stated in the footnote. • Note merged headings to group columns.
Chapter 3 – Tables Tips for Effective Tables: • Keep the table simple, consistent with its purpose. • Summary tables go in the main body. • Detailed tables go in an appendix. • In a slide show, main point of table should be clear within 10 seconds, otherwise, break up table. • Display the data to be compared in columns. • Round off data to 3 or 4 significant figures. • Table layout should guide the eye towards the desired comparison. • Use spaces or shading to separate rows or columns. • Use lines sparingly. • Keep row and column headings simple yet descriptive. • Use a consistent number of decimal digits within a column. • Right-justify or decimal align the data.
Chapter 3 – Pie Charts An Oft-Abused Chart: • A pie chart can only convey a general idea of the data. • Pie charts should be used to portray data which sum to a total (e.g., percent market shares). • If frequency counts are important, use a bar chart or histogram. • A pie chart should only have a few (i.e., 2 or 3) slices. • Each slice should be labeled with data values or percents.
Chapter 3 – Maps and Pictograms Spatial Variation and GIS: • Maps can be used for displaying many kinds of data. • Appropriate when patterns of variation across space are of interest. • Self-explanatory and revealing. • Assess patterns based on geography. • GIS (geographic information systems) combines statistics, geography and graphics.
Chapter 3 – Maps and Pictograms Example: U.S. population change by county, 1990/2000
Chapter 3 – Maps and Pictograms Example: U.S. presidential election results, 2004 • On election night 2004 and in the months and years since then, we have seen many maps that look like this. • The amount of red on the map is skewed because there are a lot of large states (geographically) in which a majority voted Republican. One possible way to allow for this, suggested by Robert Vanderbei at Princeton University, is to use not just two colors on the map, red and blue, but instead to use red, blue, and shades of purple to indicate percentages of voters. Here is what the normal map looks like if you do this. Source: http://www-personal.umich.edu/~mejn/election/
Chapter 3 – Maps and Pictograms Example: U.S. presidential election results, 2004 We can also correct for this by making use of a cartogram, a map in which the sizes of states have been rescaled according to their population. That is, states are drawn with a size proportional not to their sheer topographic acreage -- which has little to do with politics -- but to the number of their inhabitants, states with more people appearing larger than states with fewer, regardless of their actual area on the ground. Source: http://www-personal.umich.edu/~mejn/election/
Chapter 3 – Maps and Pictograms Pictograms – A visual display in which data values are replaced by pictures. • Although entertaining, they can create visual distortion. What do you think?
Chapter 3 – Deceptive Graphs Error 1: Nonzero Origin • A nonzero origin will exaggerate the trend.
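A small calculation shows how much a nonzero origin can exaggerate a change. The sketch compares how tall the last bar is drawn relative to the first for a given axis origin. The numbers are made up and the function name `drawn_height_ratio` is hypothetical.

```python
def drawn_height_ratio(first, last, axis_origin=0.0):
    """Ratio of the drawn heights of two values when the Y-axis starts at axis_origin."""
    if axis_origin >= min(first, last):
        raise ValueError("axis origin must lie below both values")
    return (last - axis_origin) / (first - axis_origin)

# Made-up values: an actual increase of 10%.
print(drawn_height_ratio(100, 110))                  # zero origin
print(drawn_height_ratio(100, 110, axis_origin=95))  # origin moved up to 95
```

With a zero origin the second bar is drawn 10% taller, matching the data. With the axis starting at 95 it is drawn three times as tall, although the data have not changed.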
Chapter 3 – Deceptive Graphs Error 2: Elastic Graph Proportions • Keep the aspect ratio (width/height) below 2.00 so as not to exaggerate the graph. By default, Excel uses an aspect ratio of 1.8.
Chapter 3 – Deceptive Graphs Error 3: Dramatic Title • Keep the title short while still grabbing the reader's attention. Error 4: Distracting Pictures • Avoid so as not to distract readers or impart an emotional slant. Error 5: Authority Figures • Pictures of authority figures can be used to impart credibility to self-serving commercial claims.
Chapter 3 – Deceptive Graphs Error 6: 3-D and Rotated Graphs • Can make trends appear to dwindle into the distance or loom towards you. (Graph panels: Correct, Deceptive)
Chapter 3 – Deceptive Graphs Error 7: Missing Axis Demarcations • If tick marks are missing, you cannot identify individual data values. Error 8: Missing Measurement Units or Definitions • Missing or unclear units of measurement can render a chart useless. Error 9: Vague Source • May indicate lost citation, unknown source, or mixed data sources. Use complete source citations.
Chapter 3 – Deceptive Graphs Error 10: Complex Graphs • Avoid if possible. Keep your main objective in mind. If necessary, break graph into smaller parts.
Chapter 3 – Deceptive Graphs Error 11: Gratuitous Effects • Avoid too many annoying special effects when using slide shows. Error 12: Estimated Data • Estimated points should be noted when used or avoided if possible.
Chapter 3 – Deceptive Graphs Error 13: Area Trick • As figure height increases, so does width, distorting the area.
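The arithmetic behind the area trick is short. If a picture is scaled so that both its height and its width grow in proportion to the data value, its area grows with the square of that value. The sketch below assumes exactly that proportional scaling. The function name `pictogram_area_ratio` is hypothetical.

```python
def pictogram_area_ratio(value_ratio):
    """Area ratio of two figures whose height AND width both scale with the data.

    Assumes the picture is enlarged proportionally in both directions.
    """
    height_ratio = value_ratio
    width_ratio = value_ratio
    return height_ratio * width_ratio

for ratio in (2, 3):
    print(ratio, pictogram_area_ratio(ratio))
```

A value that doubles is drawn with four times the area, so the eye sees a much larger increase than the data support.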
Clickers Consider the graph given below. What error is present that makes this a deceptive graph? A = Non-Zero Origin B = Dramatic Title C = 3-D or Rotated D = Complex Graph