1 / 66

Graphs, Charts, and Tables – Describing Your Data

Graphs, Charts, and Tables – Describing Your Data. DNA/JKA. Goals. After completing this session, you should be able to: Construct a frequency distribution both manually and with a computer Construct and interpret a histogram

kasia
Download Presentation

Graphs, Charts, and Tables – Describing Your Data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Graphs, Charts, and Tables – Describing Your Data DNA/JKA

  2. Goals After completing this session, you should be able to: • Construct a frequency distribution both manually and with a computer • Construct and interpret a histogram • Create and interpret bar charts, pie charts, box and whiskers and stem-and-leaf diagrams • Present and interpret data in line charts and scatter diagrams

  3. The Focus • Describe data using frequency distribution and relative frequency distribution. • Discrete • Continuous • Present data using a chart • Universal and popular way: Histogram • Bar chart • Box and Whiskers Plot • Stem and Leaf Plot

  4. Variable (Data) Types

  5. Detail View Data Quantitative Data Qualitative Data Graphic Methods Tabular Methods Graphic Methods Tabular Methods • Frequency Distr. 2) Relative/Percent Frequency Distr. • 3) Cross tabulation 1) Dot Plot 2) Histogram 3) Ogive 4) Stem & Leaf Display 5) Scatter Diagram Charts: 1. Column 2. Pie 1) Frequency Distr. 2) Relative/Percent Frequency Distr. 3) Cumulative Frequency Distr. 4) Cumulative Relative/Percent Frequency Distr. 5) Cross tabulation

  6. Frequency Distribution (FD) • It is a tabulation of the values.. • Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, • and in this way the table summarizes the distribution of values in the sample.

  7. Why Use FD? • A frequency distribution is a way to summarize data • The distribution condenses the raw data into a more useful form... • and allows for a quick visual interpretation of the data

  8. Frequency Distribution: Discrete Data • Discrete data: possible values are countable Example:A researcher asks 200 diabetic patients how many days per week they take the daily dosage.

  9. Relative Frequency Relative Frequency: What proportion (%) is in each category? 22% of the people in the sample report that they take the daily dosage 0 days per week

  10. Frequency Distribution: Continuous Data • Continuous Data:uncountable…..may take on any value in some interval Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature • (Temperature is a continuous variablebecause it could be measured to any degree of precision desired – 98.58697 F) 24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27

  11. Grouping Data by Classes Sort raw data from low to high (easy using Excel): 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58 • Find range: 58 (Max) – 12 (Min) = 46 (use for class width) • Determine number of classes: • Rule of thumb: between 5 and 20; taken as k • Calculation of class: follow 2^k≥ n (remember n=20) • Two to the power of four and five (Use calculator: 2^4=16and 2^5=32). Then, take 5. • Thus, there should be 5 classes.

  12. Grouping Data by Classes • Compute class width: 10 (46/5 = 9.2 then round off 10) • Determine intervals:10, 20, 30, 40, 50 • (Sometimes class midpoints are reported: 15, 25, 35, 45, 55 – if calculation result is 13.5) • Construct frequency distribution • count number of values in each class Largest Value - Smallest Value W = Number of Classes

  13. Frequency Distribution Data from low to high: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58 Frequency Distribution ClassFrequency Relative Frequency 10 but under 19.99 3 .15 20 but under 29.99 6 .30 30 but under 39.99 5 .25 40 but under 49.99 4 .20 50 but under 59.99 2 .10 Total 20 1.00

  14. Histogram based on FD Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58 No gaps between bars, since continuous data 0 10 20 30 40 50 60 Class Endpoints Class Midpoints

  15. Histogram • The classesorintervals are shown on the horizontal axis • frequency is measured on the vertical axis • Bars of the appropriate heights can be used to represent the number of observations within each class • Such a graph is called a histogram

  16. How Many Class Intervals? • Many (Narrow class intervals) • may yield a very jagged distribution with gaps from empty classes • Can give a poor indication of how frequency varies across classes • Few (Wide class intervals) • may compress variation too much and yield a blocky distribution • can obscure important patterns of variation. (X axis labels are upper class endpoints)

  17. General Guidelines • Number of Data Points Number of Classes under 50 5 - 7 50 – 100 6 - 10 100 – 250 7 - 12 over 250 10 - 20

  18. Ogives • An Ogive is a graph of the cumulative relative frequencies from a relative frequency distribution • Ogivesare sometime shown in the same graph as a relative frequency histogram

  19. Ogives (continued) 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58 Add a cumulative relative frequency column: Frequency Distribution Cumulative Relative Frequency Relative Frequency ClassFrequency 10 but under 20 3 .15 .15 20 but under 30 6 .30 .45 30 but under 40 5 .25 .70 40 but under 50 4 .20 .90 50 but under 60 2 .10 1.00 Total 20 1.00

  20. Ogive Example / Ogive 100 80 60 40 20 0 Cumulative Frequency (%) 0 10 20 30 40 50 60 Class Endpoints Class Midpoints

  21. Other GraphicalPresentation Tools ** Try the rest of them by yourself ** Qualitative(Categorical) Data Quantitative(Numerical)Data Bar Chart Pie Charts Stem and Leaf Diagram

  22. Bar and Pie Charts • Bar charts and Pie charts are often used for qualitative (category) data • Height of bar or size of pie slice shows the frequency or percentage for each category

  23. Bar Chart Example 1 (Note that bar charts can also be displayed with vertical bars)

  24. Bar Chart Example 2

  25. Pie Chart Example Current Investment Portfolio Investment Amount Percentage Type(in thousands $) Stocks 46.5 42.27 Bonds 32.0 29.09 CD 15.5 14.09 Savings 16.0 14.55 Total110 100 Savings 15% Stocks 42% CD 14% Bonds 29% Percentages are rounded to the nearest percent (Variables are Qualitative)

  26. Tabulating and Graphing Multivariate Categorical Data • Investment in thousands of dollars Investment Investor A Investor BInvestor CTotal Category Stocks 46.5 55 27.5 129 Bonds 32.0 44 19.0 95 CD 15.5 20 13.5 49 Savings 16.0 28 7.0 51 Total 110.0 147 67.0 324

  27. Tabulating and Graphing Multivariate Categorical Data (continued) • Side by side charts

  28. Side-by-Side Chart Example • Sales by quarter for three sales territories:

  29. Stem and Leaf Diagram • A simple way to see distribution details from qualitative data METHOD • Separate the sorted data series into leading digits (the stem) and the trailing digits (theleaves) • List all stems in a column from low to high • For each stem, list all associated leaves

  30. Example: Data sorted from low to high: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58 • Here, use the 10’s digit for the stem unit: Stem Leaf 1 2 3 5 • 12 is shown as • 35 is shown as

  31. Example: Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 28, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58 • Completed Stem-and-leaf diagram:

  32. Using other stem units • Using the 100’s digit as the stem: • Round off the 10’s digit to form the leaves • 613 would become 6 1 • 776 would become 7 8 • . . . • 1224 becomes 12 2 Stem Leaf

  33. Stem and Leaf Diagrams How to Draw One: 1. Put the first digits of each piece of data in numerical order down the left-hand side 2. Go through each piece of data in turn and put the remaining digits in the proper row 3. Re-draw the diagram putting the pieces of data in the right order 4. Add a key

  34. Stem and Leaf Diagrams Here are the marks gained by 30 students in an examination: 63 58 61 52 59 65 69 75 70 54 57 63 76 81 64 68 59 40 65 74 80 44 47 53 70 81 68 49 57 61 Write the tens figures in the left hand column of a diagram. These are the ‘STEMS’ Key a|b = ab 4 5 6 7 8

  35. Stem and Leaf Diagrams Here are the marks gained by 30 students in an examination: 63 58 61 52 59 65 69 75 70 54 57 63 76 81 64 68 59 40 65 74 80 44 47 53 70 81 68 49 57 61 Go through the marks in turn and put in the units figures of each mark in the proper row. These are the ‘LEAVES’ 8 1 3 4 5 6 7 8

  36. Stem and Leaf Diagrams Here are the marks gained by 30 students in an examination: 63 58 61 52 59 65 69 75 70 54 57 63 76 81 64 68 59 40 65 74 80 44 47 53 70 81 68 49 57 61 0 4 7 9 2 9 4 7 9 3 7 When all the marks are entered the diagram will look like this: 5 9 3 4 8 5 8 1 5 0 6 4 0 8 1 0 1 1 3 4 5 6 7 8

  37. Stem and Leaf Diagrams Here are the marks gained by 30 students in an examination: 63 58 61 52 59 65 69 75 70 54 57 63 76 81 64 68 59 40 65 74 80 44 47 53 70 81 68 49 57 61 0 4 7 9 2 3 4 7 7 8 9 9 1 1 3 3 4 5 5 8 8 9 Rewrite the diagram so that the units figures in each row are in order: 0 0 4 5 6 0 1 1 4 5 6 7 8

  38. Stem and Leaf Diagrams Here are the marks gained by 30 students in an examination: 63 58 61 52 59 65 69 75 70 54 57 63 76 81 64 68 59 40 65 74 80 44 47 53 70 81 68 49 57 61 0 4 7 9 2 3 4 7 7 8 9 9 1 1 3 3 4 5 5 8 8 9 Add a KEY: 0 0 4 5 6 0 1 1 4 5 6 7 8

  39. Stem and Leaf Diagrams Remember: - Always put in a Key - Always put your data in Order Median: - to work out the median, you must find the middle value - if there are two middle values, you need the average Range: - to work out the Range, subtract the smallest number from the biggest

  40. The stem & leaf diagram below shows the masses in kg of some people in a lift. • How many people were weighed? • What is the range of the masses? • Find the median mass. units Stem Leaf tens 4 Median is the mean of the 8th and 9th data values. 1 3 3 3 6 4 8 4 3 0 5 2 1 6 7 2 2 7 6 1 (c) 56 kg 8 Stem and Leaf Diagrams (a) 16 people. (b) 86 – 31 = 55 kg

  41. Outliers • An outlier is an observation which does not appear to belong with the other data • Outliers can arise because of a measurement or recording error or because of equipment failure during an experiment, etc. • An outlier might be indicative of a sub-population, e.g. an abnormally low or high value in a medical test could indicate presence of an illness in the patient.

  42. Outlier Boxplot • Re-define the upper and lower limits of the boxplots (the whisker lines) as: Lower limit = Q1-1.5IQR, and Upper limit = Q3+1.5IQR • Note that the lines may not go as far as these limits • If a data point is < lower limit or > upper limit, the data point is considered to be an outlier.

  43. Example – CK data outliers

  44. Hints On Box Plot The Five-Number Summary and Boxplots

  45. Objectives • Compute the five-number summary • Draw and interpret boxplots

  46. Vocabulary • Five-number Summary – the minimum data value, Q1, median, Q3 and the maximum data value

  47. Five-number summary Min Q1 M Q3 Max smallest value largest value First, Second and Third Quartiles (Second Quartile is the Median, M) LowerFence UpperFence Boxplot ] [ * Smallest Data Value > Lower Fence Largest Data Value < Upper Fence (Min unless min is an outlier) (Max unless max is an outlier) Outlier

  48. Distribution Shape Based on Boxplots: • If the median is near the center of the box and each horizontal line is of approximately equal length, then the distribution is roughly symmetric • If the median is to the left of the center of the box or the right line is substantially longer than the left line, then the distribution is skewed right • If the median is to the right of the center of the box or the left line is substantially longer than the right line, then the distribution is skewed left

  49. Why Use a Boxplot? • A boxplot provides an alternative to a histogram, a dotplot, and a stem-and-leaf plot. Among the advantages of a boxplot over a histogram are ease of construction and convenient handling of outliers. In addition, the construction of a boxplot does not involve subjective judgements, as does a histogram. That is, two individuals will construct the same boxplot for a given set of data - which is not necessarily true of a histogram, because the number of classes and the class endpoints must be chosen. On the other hand, the boxplot lacks the details the histogram provides. • Dotplots and stemplots retain the identity of the individual observations; a boxplot does not. Many sets of data are more suitable for display as boxplots than as a stemplot. A boxplot as well as a stemplot are useful for making side-by-side comparisons.

  50. Example 1 Consumer Reports did a study of ice cream bars (sigh, only vanilla flavored) in their August 1989 issue. Twenty-seven bars having a taste-test rating of at least “fair” were listed, and calories per bar was included. Calories vary quite a bit partly because bars are not of uniform size. Just how many calories should an ice cream bar contain? Construct a boxplot for the data above.

More Related