Graphs, Charts, and Tables – Describing Your Data

175 Views

Download Presentation
## Graphs, Charts, and Tables – Describing Your Data

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -

**Goals**After completing this session, you should be able to: • Construct a frequency distribution both manually and with a computer • Construct and interpret a histogram • Create and interpret bar charts, pie charts, box and whiskers and stem-and-leaf diagrams • Present and interpret data in line charts and scatter diagrams**The Focus**• Describe data using frequency distribution and relative frequency distribution. • Discrete • Continuous • Present data using a chart • Universal and popular way: Histogram • Bar chart • Box and Whiskers Plot • Stem and Leaf Plot**Detail View**Data Quantitative Data Qualitative Data Graphic Methods Tabular Methods Graphic Methods Tabular Methods • Frequency Distr. 2) Relative/Percent Frequency Distr. • 3) Cross tabulation 1) Dot Plot 2) Histogram 3) Ogive 4) Stem & Leaf Display 5) Scatter Diagram Charts: 1. Column 2. Pie 1) Frequency Distr. 2) Relative/Percent Frequency Distr. 3) Cumulative Frequency Distr. 4) Cumulative Relative/Percent Frequency Distr. 5) Cross tabulation**Frequency Distribution (FD)**• It is a tabulation of the values.. • Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval, • and in this way the table summarizes the distribution of values in the sample.**Why Use FD?**• A frequency distribution is a way to summarize data • The distribution condenses the raw data into a more useful form... • and allows for a quick visual interpretation of the data**Frequency Distribution: Discrete Data**• Discrete data: possible values are countable Example:A researcher asks 200 diabetic patients how many days per week they take the daily dosage.**Relative Frequency**Relative Frequency: What proportion (%) is in each category? 22% of the people in the sample report that they take the daily dosage 0 days per week**Frequency Distribution: Continuous Data**• Continuous Data:uncountable…..may take on any value in some interval Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature • (Temperature is a continuous variablebecause it could be measured to any degree of precision desired – 98.58697 F) 24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27**Grouping Data by Classes**Sort raw data from low to high (easy using Excel): 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58 • Find range: 58 (Max) – 12 (Min) = 46 (use for class width) • Determine number of classes: • Rule of thumb: between 5 and 20; taken as k • Calculation of class: follow 2^k≥ n (remember n=20) • Two to the power of four and five (Use calculator: 2^4=16and 2^5=32). Then, take 5. • Thus, there should be 5 classes.**Grouping Data by Classes**• Compute class width: 10 (46/5 = 9.2 then round off 10) • Determine intervals:10, 20, 30, 40, 50 • (Sometimes class midpoints are reported: 15, 25, 35, 45, 55 – if calculation result is 13.5) • Construct frequency distribution • count number of values in each class Largest Value - Smallest Value W = Number of Classes**Frequency Distribution**Data from low to high: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58 Frequency Distribution ClassFrequency Relative Frequency 10 but under 19.99 3 .15 20 but under 29.99 6 .30 30 but under 39.99 5 .25 40 but under 49.99 4 .20 50 but under 59.99 2 .10 Total 20 1.00**Histogram based on FD**Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58 No gaps between bars, since continuous data 0 10 20 30 40 50 60 Class Endpoints Class Midpoints**Histogram**• The classesorintervals are shown on the horizontal axis • frequency is measured on the vertical axis • Bars of the appropriate heights can be used to represent the number of observations within each class • Such a graph is called a histogram**How Many Class Intervals?**• Many (Narrow class intervals) • may yield a very jagged distribution with gaps from empty classes • Can give a poor indication of how frequency varies across classes • Few (Wide class intervals) • may compress variation too much and yield a blocky distribution • can obscure important patterns of variation. (X axis labels are upper class endpoints)**General Guidelines**• Number of Data Points Number of Classes under 50 5 - 7 50 – 100 6 - 10 100 – 250 7 - 12 over 250 10 - 20**Ogives**• An Ogive is a graph of the cumulative relative frequencies from a relative frequency distribution • Ogivesare sometime shown in the same graph as a relative frequency histogram**Ogives**(continued) 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58 Add a cumulative relative frequency column: Frequency Distribution Cumulative Relative Frequency Relative Frequency ClassFrequency 10 but under 20 3 .15 .15 20 but under 30 6 .30 .45 30 but under 40 5 .25 .70 40 but under 50 4 .20 .90 50 but under 60 2 .10 1.00 Total 20 1.00**Ogive Example**/ Ogive 100 80 60 40 20 0 Cumulative Frequency (%) 0 10 20 30 40 50 60 Class Endpoints Class Midpoints**Other GraphicalPresentation Tools**** Try the rest of them by yourself ** Qualitative(Categorical) Data Quantitative(Numerical)Data Bar Chart Pie Charts Stem and Leaf Diagram**Bar and Pie Charts**• Bar charts and Pie charts are often used for qualitative (category) data • Height of bar or size of pie slice shows the frequency or percentage for each category**Bar Chart Example 1**(Note that bar charts can also be displayed with vertical bars)**Pie Chart Example**Current Investment Portfolio Investment Amount Percentage Type(in thousands $) Stocks 46.5 42.27 Bonds 32.0 29.09 CD 15.5 14.09 Savings 16.0 14.55 Total110 100 Savings 15% Stocks 42% CD 14% Bonds 29% Percentages are rounded to the nearest percent (Variables are Qualitative)**Tabulating and Graphing Multivariate Categorical Data**• Investment in thousands of dollars Investment Investor A Investor BInvestor CTotal Category Stocks 46.5 55 27.5 129 Bonds 32.0 44 19.0 95 CD 15.5 20 13.5 49 Savings 16.0 28 7.0 51 Total 110.0 147 67.0 324**Tabulating and Graphing Multivariate Categorical Data**(continued) • Side by side charts**Side-by-Side Chart Example**• Sales by quarter for three sales territories:**Stem and Leaf Diagram**• A simple way to see distribution details from qualitative data METHOD • Separate the sorted data series into leading digits (the stem) and the trailing digits (theleaves) • List all stems in a column from low to high • For each stem, list all associated leaves**Example:**Data sorted from low to high: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58 • Here, use the 10’s digit for the stem unit: Stem Leaf 1 2 3 5 • 12 is shown as • 35 is shown as**Example:**Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 28, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58 • Completed Stem-and-leaf diagram:**Using other stem units**• Using the 100’s digit as the stem: • Round off the 10’s digit to form the leaves • 613 would become 6 1 • 776 would become 7 8 • . . . • 1224 becomes 12 2 Stem Leaf**Stem and Leaf Diagrams**How to Draw One: 1. Put the first digits of each piece of data in numerical order down the left-hand side 2. Go through each piece of data in turn and put the remaining digits in the proper row 3. Re-draw the diagram putting the pieces of data in the right order 4. Add a key**Stem and Leaf Diagrams**Here are the marks gained by 30 students in an examination: 63 58 61 52 59 65 69 75 70 54 57 63 76 81 64 68 59 40 65 74 80 44 47 53 70 81 68 49 57 61 Write the tens figures in the left hand column of a diagram. These are the ‘STEMS’ Key a|b = ab 4 5 6 7 8**Stem and Leaf Diagrams**Here are the marks gained by 30 students in an examination: 63 58 61 52 59 65 69 75 70 54 57 63 76 81 64 68 59 40 65 74 80 44 47 53 70 81 68 49 57 61 Go through the marks in turn and put in the units figures of each mark in the proper row. These are the ‘LEAVES’ 8 1 3 4 5 6 7 8**Stem and Leaf Diagrams**Here are the marks gained by 30 students in an examination: 63 58 61 52 59 65 69 75 70 54 57 63 76 81 64 68 59 40 65 74 80 44 47 53 70 81 68 49 57 61 0 4 7 9 2 9 4 7 9 3 7 When all the marks are entered the diagram will look like this: 5 9 3 4 8 5 8 1 5 0 6 4 0 8 1 0 1 1 3 4 5 6 7 8**Stem and Leaf Diagrams**Here are the marks gained by 30 students in an examination: 63 58 61 52 59 65 69 75 70 54 57 63 76 81 64 68 59 40 65 74 80 44 47 53 70 81 68 49 57 61 0 4 7 9 2 3 4 7 7 8 9 9 1 1 3 3 4 5 5 8 8 9 Rewrite the diagram so that the units figures in each row are in order: 0 0 4 5 6 0 1 1 4 5 6 7 8**Stem and Leaf Diagrams**Here are the marks gained by 30 students in an examination: 63 58 61 52 59 65 69 75 70 54 57 63 76 81 64 68 59 40 65 74 80 44 47 53 70 81 68 49 57 61 0 4 7 9 2 3 4 7 7 8 9 9 1 1 3 3 4 5 5 8 8 9 Add a KEY: 0 0 4 5 6 0 1 1 4 5 6 7 8**Stem and Leaf Diagrams**Remember: - Always put in a Key - Always put your data in Order Median: - to work out the median, you must find the middle value - if there are two middle values, you need the average Range: - to work out the Range, subtract the smallest number from the biggest**The stem & leaf diagram below shows the masses in kg of some**people in a lift. • How many people were weighed? • What is the range of the masses? • Find the median mass. units Stem Leaf tens 4 Median is the mean of the 8th and 9th data values. 1 3 3 3 6 4 8 4 3 0 5 2 1 6 7 2 2 7 6 1 (c) 56 kg 8 Stem and Leaf Diagrams (a) 16 people. (b) 86 – 31 = 55 kg**Outliers**• An outlier is an observation which does not appear to belong with the other data • Outliers can arise because of a measurement or recording error or because of equipment failure during an experiment, etc. • An outlier might be indicative of a sub-population, e.g. an abnormally low or high value in a medical test could indicate presence of an illness in the patient.**Outlier Boxplot**• Re-define the upper and lower limits of the boxplots (the whisker lines) as: Lower limit = Q1-1.5IQR, and Upper limit = Q3+1.5IQR • Note that the lines may not go as far as these limits • If a data point is < lower limit or > upper limit, the data point is considered to be an outlier.**Example – CK data**outliers**Hints On Box Plot**The Five-Number Summary and Boxplots**Objectives**• Compute the five-number summary • Draw and interpret boxplots**Vocabulary**• Five-number Summary – the minimum data value, Q1, median, Q3 and the maximum data value**Five-number summary**Min Q1 M Q3 Max smallest value largest value First, Second and Third Quartiles (Second Quartile is the Median, M) LowerFence UpperFence Boxplot ] [ * Smallest Data Value > Lower Fence Largest Data Value < Upper Fence (Min unless min is an outlier) (Max unless max is an outlier) Outlier**Distribution Shape Based on Boxplots:**• If the median is near the center of the box and each horizontal line is of approximately equal length, then the distribution is roughly symmetric • If the median is to the left of the center of the box or the right line is substantially longer than the left line, then the distribution is skewed right • If the median is to the right of the center of the box or the left line is substantially longer than the right line, then the distribution is skewed left**Why Use a Boxplot?**• A boxplot provides an alternative to a histogram, a dotplot, and a stem-and-leaf plot. Among the advantages of a boxplot over a histogram are ease of construction and convenient handling of outliers. In addition, the construction of a boxplot does not involve subjective judgements, as does a histogram. That is, two individuals will construct the same boxplot for a given set of data - which is not necessarily true of a histogram, because the number of classes and the class endpoints must be chosen. On the other hand, the boxplot lacks the details the histogram provides. • Dotplots and stemplots retain the identity of the individual observations; a boxplot does not. Many sets of data are more suitable for display as boxplots than as a stemplot. A boxplot as well as a stemplot are useful for making side-by-side comparisons.**Example 1**Consumer Reports did a study of ice cream bars (sigh, only vanilla flavored) in their August 1989 issue. Twenty-seven bars having a taste-test rating of at least “fair” were listed, and calories per bar was included. Calories vary quite a bit partly because bars are not of uniform size. Just how many calories should an ice cream bar contain? Construct a boxplot for the data above.