1 / 54

1.2 Displaying and Describing Categorical & Quantitative Data

1.2 Displaying and Describing Categorical & Quantitative Data. You should be able to:. Recognize when a variable is categorical or quantitative Choose an appropriate display for a categorical variable and a quantitative variable

Download Presentation

1.2 Displaying and Describing Categorical & Quantitative Data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. 1.2 Displaying and Describing Categorical & Quantitative Data

  2. You should be able to: • Recognize when a variable is categorical or quantitative • Choose an appropriate display for a categorical variable and a quantitative variable • Summarize the distribution with a bar, pie chart, stem-leaf plot, histogram, dot plot, box plots • Know how to make a contingency table • Describe the distribution of categorical variables in terms of relative frequencies • Be able to describe the distribution of quantitative variables in terms of its shape, center and spread • Describe abnormalities or extraordinary features of distribution • Discuss outliers and how they deviate from the overall pattern

  3. 3 RULES of EXPLORATORY DATA ANALYSIS • MAKE A PICTURE – find patterns in difficult to see from a chart • MAKE A PICTURE – show important features in graph • MAKE A PICTURE- communicates your data to others

  4. Concepts to know! • Bar graph • Histogram • Dot plot • Stem leaf plot • Scatterplots • Boxplots

  5. Which graph to use? • Depends on type of data • Depends on what you want to illustrate

  6. Categorical Data • The objects being studied are grouped into categories based on some qualitative trait. • The resulting data are merely labels or categories.

  7. Categorical Data(Single Variable)

  8. Pie Chart(Data is Counts or Percentages)

  9. Bar Graph(Shows distribution of data)

  10. Bar Graph • Summarizes categorical data. • Horizontal axis represents categories, while vertical axis represents either counts(“frequencies”) or percentages (“relative frequencies”). • Used to illustrate the differences in percentages (or counts) between categories.

  11. Contingency Table(How data is distributed across multiple variables)

  12. What can go wrong when working with categorical data? • Pay attention to the variables and what the percentages represent (9.4% of passengers who were in first class survived is different from 67% of survivors were first class passengers!!!) • Make sure you have a reasonably large data set (67% of the rats tested died and 1 lived)

  13. Analogy Bar chart is to categorical data as histogram is to ... quantitative data.

  14. Histogram

  15. Histogram • Divide measurement up into equal-sized categories (BIN WIDTH) • Determine number (or percentage) of measurements falling into each category. • Draw a bar for each category so bars’ heights represent number (or percent) falling into the categories. • Label and title appropriately. • http://www.stat.sc.edu/~west/javahtml/Histogram.html

  16. Use common sense in determining number of categories to use. Between 6 & 15 intervals is preferable Histogram (Trial-and-error works fine, too.)

  17. Too few categories

  18. Too many categories

  19. Dot Plot

  20. Dot Plot • Summarizes quantitative data. • Horizontal axis represents measurement scale. • Plot one dot for each data point.

  21. Stem-and-Leaf Plot Stem-and-leaf of Shoes N = 139 Leaf Unit = 1.0 12 0 223334444444 63 0 555555555555566666666677777778888888888888999999999 (33) 1 000000000000011112222233333333444 43 1 555555556667777888 25 2 0000000000023 12 2 5557 8 3 0023 4 3 4 4 00 2 4 2 5 0 1 5 1 6 1 6 1 7 1 7 5

  22. Stem-and-Leaf Plot • Summarizes quantitative data. • Each data point is broken down into a “stem” and a “leaf.” • First, “stems” are aligned in a column. • Then, “leaves” are attached to the stems.

  23. Box Plot

  24. Box Plot • Summarizes quantitative data. • Vertical (or horizontal) axis represents measurement scale. • Lines in box represent the 25th percentile (“first quartile”), the 50th percentile (“median”), and the 75th percentile (“third quartile”), respectively.

  25. 5 Number Summary • Minimum • Q1 (25th percentile) • Median (50th percentile) • Q3 (75th percentile) • Maximum

  26. An aside... • Roughly speaking: • The “25th percentile” is the number such that 25% of the data points fall below the number. • The “median” or “50th percentile” is the number such that half of the data points fall below the number. • The “75th percentile” is the number such that 75% of the data points fall below the number.

  27. Box Plot (cont’d) • “Outliers” are drawn to the most extreme data points that are not more than 1.5 times the length of the box beyond either quartile(IQR). IQR = Q3 - Q1 Outliers(upper) > Q3+1.5 * IQR Outliers(lower)<Q1-1.5*IQR

  28. Using Box Plots to Compare Outliers

  29. Strengths and Weaknesses of Graphs for Quantitative Data • Histograms • Uses intervals • Good to judge the “shape” of a data • Not good for small data sets • Stem-Leaf Plots • Good for sorting data (find the median) • Not good for large data sets

  30. Strengths and Weaknesses of Graphs for Quantitative Data • Dotplots • Uses individual data points • Good to show general descriptions of center and variation • Not good for judging shape for large data sets • Boxplots • Good for showing exact look at center, spread and outliers • Not good for judging shape

  31. Analogy Contingency table is to categorical data with two variables as scatterplot is to .. quantitative data with two variables.

  32. Scatter Plots

  33. Scatter Plots • Summarizes the relationship between two quantitative variables. • Horizontal axis represents one variable and vertical axis represents second variable. • Plot one point for each pair of measurements.

  34. No relationship

  35. Summary • Many possible types of graphs. • Use common sense in reading graphs. • When creating graphs, don’t summarize your data too much or too little. • When creating graphs, label everything for others. Remember you are trying to communicate something to others!

  36. GRAPHICAL ANALYSIS

  37. CENTER • SPREAD

  38. Graphical Analysis Center (Location) Spread (Variation) Shape

  39. Interesting Features Identified by Graphs • Center (Location) • Spread (Variability) • Shape • Individual Values • Compare Groups • Identify Outliers • LOOK FOR PATTERNS, CLUSTERS, GAPS!!!!! • LOOK FOR DEVIATIONS FROM THE GENERAL PATTERN!!!!! http://bcs.whfreeman.com/bps3e/

  40. Shape of Graph

  41. Shape of Graph • Symmetry-Skewness • Modes(peaks)

  42. SYMMETRY & SKEWNESS • Skewness - a measure of symmetry, or more precisely, the lack of symmetry. • A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.

  43. Kurtosis • Kurtosis - a measure of whether the data are peaked or flat relative to a normal distribution. • High kurtosis tend to have a distinct peak near the mean, decline rather rapidly, and have heavy tails. L • Low kurtosis tend to have a flat top near the mean rather than a sharp peak.

  44. Symmetric -- Mesokurtic

  45. Symmetric -- Platykurtic

  46. Symmetric -- Leptokurtic

  47. Nonsymmetric –Skewed Positive -Skewed Right

  48. Nonsymmetric – Skewed Negative -- Skewed Left

  49. Symmetric -- Bimodal

  50. Describing Distributions – SOCS Rock! • Shape • Outliers • Center • Spread

More Related