MISLEADING STATISTICS. Twisting information to your advantage…. Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write. – H.G. Wells.
Indeed, statistics may be one of our most effective and efficient vehicles for communicating information. People are naturally inclined to trust numbers over words, and statistics present numbers in an attractive format that even the most innumerate person can follow. In addition, statistics can be presented in a wide variety of forms, from line graphs to tables to pie charts. Each performs its own function and offers information from a new perspective.
Yet with every benefit comes a drawback. Many people do not realize that numbers in a graph can be easily manipulated to reflect the author’s own wishes. The problem with graphs is that even with missing information, incomplete figures, and vague captions, they can still look reasonably convincing. People have grown so accustomed to seeing graphs that they accept their information unquestioningly.
In the following presentation, we will show you two such misleading graphs, point out their errors, and attempt to recreate each one using more accurate forms of presentation. You will see how the same set of information can produce completely different graphs, and learn about the many ways in which statistics can deceive you.
This graph is misleading in many ways. Here are some examples of the most commonly used graph-manipulation tactics.
First of all, there is the title to consider. While retail sales do go down in April 2000, the title doesn’t accurately reflect what the rest of the graph shows. Yes, the sales do rise and fall over a period of a year and a half, but in general, they have been steadily rising since November 1998.
Second, notice that the y-axis does not begin at zero, but at $225 billion. This has the unfortunate effect of making the rising slope shown in the graph much steeper than it actually is.
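To see how much a truncated axis can exaggerate a change, it helps to compare the ratio of two bar heights as drawn with the ratio of the underlying values. The short Python sketch below does this. The function name `drawn_ratio` and the two sales figures are invented for illustration; only the $225 billion baseline comes from the graph discussed here.

```python
def drawn_ratio(first, last, baseline=0.0):
    """Ratio of two bar heights as drawn when the y-axis starts at `baseline`."""
    if first <= baseline:
        raise ValueError("first value must lie above the baseline")
    return (last - baseline) / (first - baseline)

# Hypothetical monthly sales in billions of dollars (not read off the graph).
first_month, last_month = 230.0, 270.0

honest = drawn_ratio(first_month, last_month)            # y-axis starts at 0
truncated = drawn_ratio(first_month, last_month, 225.0)  # y-axis starts at 225

print(f"axis from 0:   last bar is {honest:.2f}x the first")
print(f"axis from 225: last bar is {truncated:.2f}x the first")
```

With these made-up numbers, a rise of about 17% is drawn as a bar nine times taller than the first one.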
Third, the little white box that shows the rate of change from previous months only includes the last three months in the graph. This immediately biases the graph in favor of the title, as it shows that sales have actually gone down since February. A reader just looking at the box will not know that sales also went down in May and September 1999, and that these dips did not affect the overall rise in sales one bit.
Fourth, note that the year 1999 is written under June and July, and not January. This may be a minor transgression, but it will certainly lead some readers to believe that the time period spans three whole, consecutive years and not fragments of a year.
One final observation: Is it fair to compare retail sales of all the months of a year together? Christmas in December, for example, would prompt gift buying, but slower months like February might not have any at all. Wouldn’t it be much fairer to compare the same months and calculate how much sales have grown over the year?
On this second graph, the y-axis begins at zero, making the rising slope much less dramatic. When presented like this it is also harder to tell which bars are higher and lower. The last two bars, for example (March and April), look almost exactly the same on this graph. If the reader weren’t told that sales had actually gone down from March to April, he would never know. The title, likewise, has been changed to something that can encompass all aspects of this graph. In addition, instead of labeling a group of months with one year, we have given each month its own year so that it’s easier to read.
However, this graph still does not address the problem of comparing all months together as equals. In our next graph, we will show you what that might look like.
Comparing the same months of consecutive years adds yet another perspective to the picture. From this graph, it is easy to see that sales have steadily risen for each month, and by a fairly predictable percentage at that. Nearly all the months are rising by the same margin from one year to the next. Even April sales, which the original graph proclaimed were falling, have risen compared with April of the previous year.
Once again, while the original graph seems to be trying to convince us that April sales have very obviously fallen, these two graphs tell us the opposite. Appropriately, the title for this third graph has been changed completely to give the opposite message.
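The difference between comparing a month with the one before it and comparing it with the same month a year earlier can be sketched in a few lines of Python. All of the sales figures, the years, and the helper names below are hypothetical and chosen only to show how the two comparisons can point in opposite directions.

```python
# Hypothetical retail sales in billions of dollars, keyed by (year, month).
# These figures are invented for illustration only.
sales = {
    (1999, 3): 240.0,
    (1999, 4): 238.0,
    (2000, 3): 262.0,
    (2000, 4): 260.0,
}

def percent_change(old, new):
    """Percentage change from `old` to `new`."""
    return (new - old) / old * 100.0

def month_over_month(data, year, month):
    """Change from the previous month of the same year (month must be 2..12)."""
    return percent_change(data[(year, month - 1)], data[(year, month)])

def year_over_year(data, year, month):
    """Change from the same month one year earlier."""
    return percent_change(data[(year - 1, month)], data[(year, month)])

print(f"April vs March:      {month_over_month(sales, 2000, 4):+.1f}%")
print(f"April vs last April: {year_over_year(sales, 2000, 4):+.1f}%")
```

In this invented data set, April is slightly below March yet well above April of the year before, which is the pattern the third graph brings out.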
Of course, there are many different ways to lie with statistics, and now we’ll show you how it can be done with a pictograph.
The most deceptive aspect of this graph is the way in which it was drawn. Firstly, the perspective puts barrel 1979 at the forefront and barrel 1973 at the back. This effectively draws the reader’s eye to the 1979 barrel first and then forces him to read the rest of the years in descending order. Supporting this deceptive tactic is the fact that only the foremost barrels have the complete year written on them. The rest are indicated with only the last two digits, as in ’76. Obviously, the makers of the graph intend for the audience to read in reverse chronological order, which has the effect of making oil prices seem to fall.
Secondly, the perspective makes it hard to judge the numerical difference between barrels. For example, barrel 1975 appears to be just over two thirds the height of barrel 1976, even though the difference between them is only $0.95. Likewise, barrel 1973 seems less than half the height of barrel 1974, yet they differ by a whopping $8.54!
A third misleading aspect is that this pictograph doesn’t contain a scale or axes of any kind. Without one, the reader’s attention might be directed to the area of each barrel instead. Numerically, the smallest barrel should only be about 1/5 the size of the largest barrel, but in terms of area, the ratio is about 1/25. This makes the difference between the two look much larger than it actually is.
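The jump from 1/5 to 1/25 follows from simple geometry: when a picture is shrunk uniformly, its width shrinks along with its height, so its area shrinks by the square of the height ratio. A minimal Python check, assuming the barrels are scaled uniformly (the function name `area_ratio` is ours):

```python
from fractions import Fraction

def area_ratio(height_ratio):
    """Area ratio of two similar pictures whose heights differ by `height_ratio`.

    Uniform scaling changes the width by the same factor as the height,
    so the area changes by the square of that factor.
    """
    return height_ratio ** 2

height = Fraction(1, 5)      # smallest barrel vs largest, by value
print(area_ratio(height))    # 1/25
```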
Lastly, the way in which the barrels are labeled seems somewhat awkward. Shouldn’t the prices be on the barrels instead of the years? Prices written on the barrels would clarify that it is the cost that is changing, not the years. And with more space to indicate years, readers wouldn’t be forced to read in reverse.
As soon as the information is transferred to a bar graph instead of a pictograph, most of the major problems, such as perspective, are eliminated. This graph neatly depicts the steadily rising prices of crude oil, and doesn’t hesitate to show sudden rises or drops. Each bar represents a number by its height without using fancy images to distract the reader. The presence of the x- and y-axes also makes it much more organized. While the original graph tended to overstate small differences and gloss over wide gaps, this graph is much more honest. One can see that the largest rise occurs between 1973 and 1974, and that the price continues to rise steadily by smaller amounts over the next five years. The years on the x-axis are all clearly marked in chronological order as well, so that it is easy for readers to understand.
For this type of information, a line graph may be even more useful than a bar graph. With a line to define the rise and fall of oil prices, the shape of the changing prices is all the more obvious. This graph even seems to accentuate the huge rise between 1973 and 1974. The biggest benefit of using a line graph, however, lies in the fact that each point is marked with a small, accurate dot. These are much easier to read than bars, and the line between them outlines the contour of the rise.
What makes statistical information reliable and accurate?
To make sure statistics are accurate and reliable, one must keep a number of things in mind. Here are some of the most important points to remember:
The first and most important is the collection of information. It’s alarmingly easy to make graphs with missing figures, and this only produces inaccurate results. Before making any graph, it is wise to make sure that the data is sufficient. This is especially true in surveys, where the accuracy of the results improves as the number of people surveyed grows. Next to quantity in importance is quality. There is little point in making a graph with inaccurate information.
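How strongly sample size affects a survey can be illustrated with the usual textbook approximation for the margin of error of a surveyed proportion. The sketch below is not part of the original presentation; it assumes a simple random sample, a 95% confidence level (z = 1.96), and the worst case of a 50/50 split, and the function name `margin_of_error` is our own.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a surveyed proportion `p`
    from a simple random sample of size `n` (normal approximation)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)

for n in (100, 400, 1600):
    print(f"n = {n:5d}: about ±{margin_of_error(n) * 100:.1f} percentage points")
```

Under these assumptions, surveying four times as many people only halves the margin of error, so more data always helps but with diminishing returns.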
Even with accurate information, however, you must know the best way of using and presenting it. Many perfectly accurate statistics become misleading when they are unfairly compared. You would not, for example, compare the average grades of a small school to the average grades of a large school without making allowances for the larger diversity of students. Therefore, when presenting data, care must be taken to compare like with like.
Although this graph is pleasing to look at, it can also be confusing. The author meant for the Number of Buyers to be calculated by the height of each picture, but the reader’s attention will be more focused on area. What makes it even more biased is that each monitor on the graph is a Macintosh.
Colors should also be used with care, and this applies to most graphs: they should enhance a good presentation, not act as a crutch for a poor one. Likewise, be very careful if you are drawing in perspective. Perspective tends to be hard to read and understand, and can easily confuse.
One of the most common ways of deceiving with graphs is to (a) cut off the y-axis, or (b) have the numbers on it rise in an illogical way. Take the following two examples, graphs (a) and (b):
As you can see, graph (a) begins the y-axis at 80 000, making the increase between the two years seem much larger than it actually is. On graph (b), the numbers on the y-axis start at zero, but then double with each consecutive tick mark. This makes it seem as if the greatest increase occurred between x-values 1 and 2, and not 3 and 4. These two mistakes should be avoided at all costs.
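The effect of a doubling axis like the one in graph (b) can be reproduced numerically. The Python sketch below maps each value to the height at which it would be drawn when unevenly valued tick marks are spaced evenly on the page. The tick values and the four data points are hypothetical, chosen only to mimic the situation described; `drawn_height` is our own helper.

```python
from bisect import bisect_right

def drawn_height(value, ticks):
    """Height, in tick spacings, at which `value` is drawn when the tick
    marks in `ticks` are placed at equal distances along the axis."""
    if not ticks[0] <= value <= ticks[-1]:
        raise ValueError("value lies outside the axis")
    # Index of the tick at or just below the value (clamped for the top tick).
    i = min(bisect_right(ticks, value) - 1, len(ticks) - 2)
    low, high = ticks[i], ticks[i + 1]
    return i + (value - low) / (high - low)

ticks = [0, 100, 200, 400, 800]   # hypothetical axis: starts at 0, then doubles
values = [50, 200, 300, 700]      # hypothetical data for x = 1, 2, 3, 4

heights = [drawn_height(v, ticks) for v in values]
actual_steps = [b - a for a, b in zip(values, values[1:])]
drawn_steps = [b - a for a, b in zip(heights, heights[1:])]

print("actual increases:", actual_steps)  # [150, 100, 400]
print("drawn increases: ", drawn_steps)   # [1.5, 0.5, 1.25]
```

In this example the largest real increase (400) falls between x-values 3 and 4, but the tallest step on the page is the one between 1 and 2.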
The last point to remember pertains to averages. Averages are a tricky business because many people apply the mean in places where it should not be applied. That is where medians and modes come in. In cases where a survey turns up many small numbers but one enormous figure, it is probably better to use the median. In cases where there are a great many figures within a narrow range, it might be wiser to use the mode. All three of these tools (mean, median, and mode) should be used only when appropriate.
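A small Python example shows how far apart the three measures can land when one figure is enormous. The survey answers below are invented for illustration; the calculations use Python’s standard `statistics` module.

```python
import statistics

# Hypothetical survey answers: many small numbers and one enormous figure.
answers = [2, 3, 3, 4, 5, 100]

print("mean:  ", statistics.mean(answers))    # pulled upward by the 100
print("median:", statistics.median(answers))  # middle of the sorted values
print("mode:  ", statistics.mode(answers))    # most frequent value
```

Here the mean is 19.5, larger than five of the six answers, while the median (3.5) and the mode (3) stay close to the typical response.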
All of these are factors that make some statistical information accurate and reliable. Put together, they can make a powerful tool in presenting information. In the end, however, whether or not a graph is accurate depends on the maker, and whether he or she wishes the graph to be honest or misleading. That is why we must be true to the data when dealing with statistics, and always remember what makes the difference between an accurate graph and an inaccurate one.