Statistics. Chapter 10. 10.1 Organizing and Picturing Information.
Line Plot
[Figure: Student Test Scores, with score values from 10 to 100 on the horizontal axis and Frequency from 2 to 16 on the vertical axis.]
A line plot is a basic and intuitive visual representation of data.
Student Test Scores
Raw scores: 22, 23, 14, 45, 39, 11, 9, 46, 22, 25, 6, 28, 33, 36, 16, 39, 49, 17, 22, 32, 34, 22, 18, 21, 27, 34, 26, 41, 28, 25
Step 1: Put the data in ascending order.
6, 9, 11, 14, 16, 17, 18, 21, 22, 22, 22, 22, 23, 25, 25, 26, 27, 28, 28, 32, 33, 34, 34, 36, 39, 39, 41, 45, 46, 49
Step 2: Place one dot for each score, stacking the dots for repeated scores.
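The two steps above can be sketched in a few lines of Python. This is only an illustration: the function name `line_plot_rows` is made up here, and the plot is printed sideways (one text row per score) rather than as dots stacked over a number line.

```python
from collections import Counter

# Student test scores from the slide above.
scores = [22, 23, 14, 45, 39, 11, 9, 46, 22, 25, 6, 28, 33, 36, 16,
          39, 49, 17, 22, 32, 34, 22, 18, 21, 27, 34, 26, 41, 28, 25]

def line_plot_rows(data):
    """Return one text row per distinct value: the value, then one dot per occurrence."""
    counts = Counter(data)                      # how often each score occurs
    return [f"{value:>3} | " + "." * counts[value]
            for value in sorted(counts)]        # Step 1: ascending order

rows = line_plot_rows(scores)
for row in rows:                                # Step 2: one dot for each score
    print(row)
```

The row for 22 carries four dots, which makes the most frequent score easy to spot.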
Science Test Scores
A stem and leaf plot is an effective way to present two sets of data side by side for analysis.
For a two-digit number, the “stem” is the tens digit and the “leaf” is the ones digit.
Stems are listed once in ascending order vertically. Leaves are placed in increasing order away from the stem, and may be repeated if necessary.
Example data: 94, 105, 107, 108, 108, 120, 121, 122, 123
For three-digit numbers, the stem is the hundreds and tens digits together, and the leaf is the ones digit.
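As a rough sketch of the stem and leaf rules, the code below splits each value into everything but its last digit (the stem) and its last digit (the leaf). The function name `stem_and_leaf` is hypothetical, and the code assumes non-negative whole numbers. Printing a stem that has no leaves (11 here) is one common convention, not something the slides specify.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Map each stem (all digits but the last) to its leaves (ones digits) in increasing order."""
    plot = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)   # 108 -> stem 10, leaf 8
        plot[stem].append(leaf)          # repeated leaves are kept
    return dict(plot)

values = [94, 105, 107, 108, 108, 120, 121, 122, 123]
plot = stem_and_leaf(values)

# Stems are listed once, in ascending order, one per row.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```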
A bar graph used to graph frequency distributions of continuous variables is called a histogram.
The graph is similar, but no spaces are allowed between the bars.
When there are many different values in a data set, we may group the data values into classes to better understand the information.
Typically we use between 8 and 12 classes, but there is no rule that dictates the number we must use. Choose wisely.
Bar graph: Specify the classes on the horizontal axis and the frequencies on the vertical axis.
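To illustrate grouping into classes, here is a minimal sketch that counts the student test scores from earlier in classes of width 10. The class width and the function name `class_frequencies` are assumptions made for this example; a width of 10 gives only five classes for this small data set, so a narrower width might be chosen in practice.

```python
# Student test scores from the line plot example.
scores = [22, 23, 14, 45, 39, 11, 9, 46, 22, 25, 6, 28, 33, 36, 16,
          39, 49, 17, 22, 32, 34, 22, 18, 21, 27, 34, 26, 41, 28, 25]

def class_frequencies(data, width):
    """Count how many values fall in each class [start, start + width)."""
    counts = {}
    for value in data:
        start = (value // width) * width      # lower boundary of the value's class
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

width = 10                                    # assumed class width
freq = class_frequencies(scores, width)

# Classes down the side, frequencies shown as bars of '#'.
for start, count in freq.items():
    print(f"{start:>2}-{start + width - 1:<2} | " + "#" * count)
```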
A pictograph uses a picture or icon to symbolize the quantities being represented.
Pictorial embellishments are used to make the graph more visually appealing.
Data that occurs in pairs, such as date and temperature or the selling price of a home and its appraised value, can be plotted on a set of axes similar to an xy-plane. Such a plot is called a scatterplot.
One of the ways to summarize data numerically is to calculate measures of center.
The measures we will use are the mean, median, mode and quartiles.
We will also be examining the Five Number Summary.
The arithmetic mean is what we usually refer to as the “average.”
To calculate the mean, we add up all the data points and divide by the number of data points.
If we arrange a set of numbers in order, the median is the middle value in the list of numbers.
Case 1: Odd number of data points: The median is the data point in the middle position.
Case 2: Even number of data points: The median is the average of the two middle numbers and is not a data point.
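The mean and both median cases can be written out directly. The sketch below uses the student test scores from earlier; the function names are chosen here for illustration (Python's standard `statistics` module offers `mean` and `median` as well).

```python
scores = [22, 23, 14, 45, 39, 11, 9, 46, 22, 25, 6, 28, 33, 36, 16,
          39, 49, 17, 22, 32, 34, 22, 18, 21, 27, 34, 26, 41, 28, 25]

def mean(data):
    """Add up all the data points and divide by how many there are."""
    return sum(data) / len(data)

def median(data):
    """Middle value of the ordered data; average of the two middle values if the count is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                   # Case 1: odd number of data points
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2     # Case 2: even number of data points

print(mean(scores))     # 27.0
print(median(scores))   # 25.5
```

With 30 scores, the median is the average of the 15th and 16th ordered values (25 and 26), so 25.5 is not itself one of the scores.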
The mode is the most frequent data point in the set.
There can be more than one mode. If there are two modes, the data set is “bimodal”.
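Because a data set can have more than one mode, a sketch of the calculation should return a list. The function name `modes` and the second, bimodal data set are made up for illustration.

```python
from collections import Counter

def modes(data):
    """Return every value that occurs with the highest frequency, in ascending order."""
    counts = Counter(data)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)

scores = [22, 23, 14, 45, 39, 11, 9, 46, 22, 25, 6, 28, 33, 36, 16,
          39, 49, 17, 22, 32, 34, 22, 18, 21, 27, 34, 26, 41, 28, 25]

print(modes(scores))             # [22]
print(modes([1, 1, 2, 2, 3]))    # [1, 2], a bimodal data set
```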
The median divides the data set into two halves. The set below the median is the lower half, and the set above the median is the upper half.
The median of the lower half is the first quartile, Q1. The median of the upper half is the third quartile, Q3.
The low data point, Q1, the median, Q3, and the high data point form the five number summary.
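A minimal sketch of the five number summary, following the definition above: the lower and upper halves exclude the median itself when the number of data points is odd. Other quartile conventions exist, so software may report slightly different values. The function names are hypothetical, and the code assumes at least two data points.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in ascending order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(data):
    """Return (low, Q1, median, Q3, high)."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                                  # values below the median
    upper = ordered[half + 1:] if n % 2 else ordered[half:] # values above the median
    return (ordered[0], median_of_sorted(lower), median_of_sorted(ordered),
            median_of_sorted(upper), ordered[-1])

scores = [22, 23, 14, 45, 39, 11, 9, 46, 22, 25, 6, 28, 33, 36, 16,
          39, 49, 17, 22, 32, 34, 22, 18, 21, 27, 34, 26, 41, 28, 25]

print(five_number_summary(scores))   # (6, 21, 25.5, 34, 49)
```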
The graph of the five number summary is called a box and whisker plot.
If we were to divide the data into 100 equal parts, percentiles could be used to mark the dividing points in the data.
A number is in the nth percentile of some data if it is greater than or equal to n% of the data.
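One way to read this definition is to compute the percent of data values that a given number is greater than or equal to. The sketch below does that for the student test scores; the name `percentile_rank` is an assumption, and textbooks differ on the exact percentile convention.

```python
def percentile_rank(data, x):
    """Percent of the data values that x is greater than or equal to."""
    at_or_below = sum(1 for value in data if value <= x)
    return 100 * at_or_below / len(data)

scores = [22, 23, 14, 45, 39, 11, 9, 46, 22, 25, 6, 28, 33, 36, 16,
          39, 49, 17, 22, 32, 34, 22, 18, 21, 27, 34, 26, 41, 28, 25]

print(percentile_rank(scores, 25))   # 50.0
print(percentile_rank(scores, 49))   # 100.0
```

Under this reading, a score of 25 is greater than or equal to half of the scores, so it sits at the 50th percentile.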
The range is the difference between the largest and the smallest data values in the set.
If $x$ is a data value in a set whose mean is $\bar{x}$, then $x - \bar{x}$ is called $x$'s deviation from the mean.
The standard deviation measures how far off the mean a data point is “on average”. Think of standard deviation as the “average deviation” of a data set.
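The slides do not show the formula, so the sketch below uses the usual textbook definition: square each deviation from the mean, average the squares (dividing by n - 1 for a sample or n for a population), and take the square root. Squaring is needed because the plain deviations always add up to zero. The data set and function names are made up for illustration.

```python
import math

def data_range(data):
    """Difference between the largest and smallest data values."""
    return max(data) - min(data)

def deviations(data):
    """Each data value minus the mean of the set."""
    m = sum(data) / len(data)
    return [x - m for x in data]

def std_dev(data, sample=True):
    """Standard deviation; divides by n - 1 for a sample, by n for a population."""
    squared = [d * d for d in deviations(data)]
    divisor = len(data) - 1 if sample else len(data)
    return math.sqrt(sum(squared) / divisor)

data = [2, 4, 4, 4, 5, 5, 7, 9]           # hypothetical data, mean 5

print(data_range(data))                   # 7
print(sum(deviations(data)))              # 0.0
print(std_dev(data, sample=False))        # 2.0
```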
The z-score, $z$, for a particular score, $x$, is $z = \frac{x - \bar{x}}{s}$, where $\bar{x}$ is the mean and $s$ is the standard deviation.
The z-score indicates how many standard deviations the number is away from the mean. Numbers above the mean = positive z-score. Numbers below the mean = negative z-score.
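As a quick illustration, the usual z-score definition (the score minus the mean, divided by the standard deviation) is a one-line function. The mean of 70 and standard deviation of 10 below are invented values, not figures from the slides.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

# Hypothetical test with mean 70 and standard deviation 10.
print(z_score(85, 70, 10))   # 1.5, above the mean
print(z_score(60, 70, 10))   # -1.0, below the mean
print(z_score(70, 70, 10))   # 0.0, exactly at the mean
```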
Definitions: A collection of numerical information is called data or a distribution. A set of data listed with their frequencies is called a frequency distribution.
When a frequency distribution lists the percent of the time each item occurs, we call it a relative frequency distribution.
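The two kinds of distribution can be sketched side by side. The letter grades below are made-up data, and the function names are chosen for this example only.

```python
from collections import Counter

def frequency_distribution(data):
    """Each distinct item paired with the number of times it occurs."""
    return dict(sorted(Counter(data).items()))

def relative_frequency_distribution(data):
    """Each distinct item paired with the percent of the time it occurs."""
    total = len(data)
    return {item: 100 * count / total
            for item, count in frequency_distribution(data).items()}

grades = ["A", "B", "B", "C", "B", "A", "C", "B", "D", "B"]   # hypothetical data

print(frequency_distribution(grades))            # {'A': 2, 'B': 5, 'C': 2, 'D': 1}
print(relative_frequency_distribution(grades))   # {'A': 20.0, 'B': 50.0, 'C': 20.0, 'D': 10.0}
```

The relative frequencies always add up to 100%, which makes distributions of different sizes easier to compare.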
Bar graphs can be drawn using frequency distributions or relative frequency distributions.
Bar graph using Relative Frequency
Bar graph using Frequency Distribution
When describing a set of data, statisticians often look to the shape of the data. One special shape that occurs frequently is a bell curve. The bell curve indicates that a distribution is “normal”.
When discussing normal distributions, we assume we are dealing with an entire population rather than a sample. To indicate this, we change the symbols: the mean is written μ and the standard deviation is written σ.
Areas under the curve represent percentages (or probabilities) of values in a distribution.
To address this idea properly and generally, we need something called the standard normal distribution.
This distribution is also called a “z distribution.”
About 68% of the data lies between z = -1 and z = 1
About 95% of the data lies between z = -2 and z = 2
About 99.7% of the data lies between z = -3 and z = 3
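These three percentages can be checked numerically as areas under the standard normal curve. The sketch below uses `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later); the helper name `area_between` is made up here.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)   # the z distribution

def area_between(z_low, z_high):
    """Area under the standard normal curve between two z-scores."""
    return standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

for k in (1, 2, 3):
    percent = 100 * area_between(-k, k)
    print(f"between z = -{k} and z = {k}: {percent:.1f}%")
```

The computed areas round to 68.3%, 95.4%, and 99.7%, which is why the 68% and 95% figures are stated as approximations.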
Scaling and Axis Manipulation
To make the differences among bars of a histogram or bar chart more dramatic, the axes are often manipulated, either by changing the scales or omitting the scale values.
To manipulate a line graph, one could either compress the vertical axis scaling or extend the scaling, whichever fits the desired effect.
A circle graph can mislead by omitting the percent amounts, by drawing sectors with incorrect central angles, or by “exploding” sectors.
The entire group in question is called the population. The subset of the population that is actually questioned is called a sample.
A bias is a flaw in the sampling procedure that makes it more likely the sample will not represent the entire population.