
INTRODUCTION TO CLINICAL RESEARCH How To Make A Bad Plot Karen Bandeen-Roche, Ph.D. July 13, 2010


Presentation Transcript


  1. INTRODUCTION TO CLINICAL RESEARCH: How To Make A Bad Plot. Karen Bandeen-Roche, Ph.D. July 13, 2010

  2. How to display data badly Karl W Broman Department of Biostatistics and Medical Informatics University of Wisconsin

  3. Using Microsoft Excel to obscure your data and annoy your readers Karl W Broman Department of Biostatistics http://www.biostat.jhsph.edu/~kbroman

  4. Inspiration: This lecture was inspired by H Wainer (1984) How to display data badly. American Statistician 38(2):137-147. Dr. Wainer was the first to elucidate the principles of the bad display of data. The now widespread use of Microsoft Excel has resulted in remarkable advances in the field.

  5. General principles: The aim of good data graphics: display data accurately and clearly. Some rules for displaying data badly: • Display as little information as possible. • Obscure what you do show (with chart junk). • Use pseudo-3d and color gratuitously. • Label badly. • Use a poorly chosen scale. • Ignore sig figs.

  6. Displaying data well • Be accurate and clear. • Let the data speak. • Show as much information as possible, taking care not to obscure the message. • Science not sales. • Avoid unnecessary frills — esp. gratuitous 3d. • In tables, every digit should be meaningful. Don’t drop ending 0’s.
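The rule about meaningful digits can be sketched in a few lines of Python. This is only an illustration and is not part of the original slides: `format_cell` is a hypothetical helper, and the two values are the interquartile ranges quoted on the "Measures of Spread" slide. Formatting every cell to the same number of decimals keeps the ending 0 that a default conversion of a whole number would drop.

```python
def format_cell(value, decimals=1):
    """Format a table cell with a fixed number of decimals (hypothetical helper)."""
    return f"{value:.{decimals}f}"

# Interquartile ranges quoted in the deck (in %), printed to one decimal each.
iqr_by_group = {"Men": 40.0, "Women": 40.5}
for group, iqr in iqr_by_group.items():
    print(f"{group:<6}{format_cell(iqr):>6}")

# A whole number keeps its ending zero instead of printing as "40".
print(format_cell(40))  # 40.0
```

The number of decimals should match the precision of the measurement; one decimal is an assumption made here for the example.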

  7. Displaying data well • Show “typical”, “average” values • Convey extent of “spread”, “variability” in values • Compare groups clearly • Label explicitly

  8. Supersaturation of Bile Data Set • Einarsson K, et al (NEJM 313:277, 1985; reprinted in D-S & T, p. 28 1st ed) • Supersaturation of bile with cholesterol necessary for cholesterol gall stones • Female gender and increasing age are risk factors for gall stones • Is either gender or age associated with percentage cholesterol saturation of bile? • Cross-sectional data on 60 healthy Swedish subjects (31 men, 29 women) who were not obese

  9. Bile Data Set

  10. Measures of the “Average” • “Average” -- typical or representative value; where the distribution is “centered” • Different measures of the center -- usually all the same for symmetric distributions (ones that look the same on the right and left of center) • Median -- value such that half the observations are less than it and half are greater than it (50th percentile): Males 86%, Females 84% • Mode -- value where the distribution achieves its maximum; the most likely value: Males 80-90% (85%), Females 80-90% (85%) • Mean -- sum of values divided by the number of values, $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$: Males 84.5%, Females 88.5%
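The three measures of center can be computed with Python's standard `statistics` module. The sketch below uses a small made-up sample, not the bile data, chosen so that one large value pulls the mean above the median and mode.

```python
import statistics

# Hypothetical sample of percentages (NOT the bile data set).
values = [72, 85, 85, 90, 118]

median = statistics.median(values)  # middle value of the sorted sample
mode = statistics.mode(values)      # most frequent value
mean = statistics.mean(values)      # sum of values divided by their count

print("median:", median)
print("mode:  ", mode)
print("mean:  ", mean)
```

For this skewed sample the median and mode agree while the mean is larger, which is the kind of disagreement the slide says is unusual for symmetric distributions.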

  11. Measures of Spread • Spread -- variability among the observations • Different measures of spread, like averages, represent distinct aspects of the distribution • Interquartile range -- 75th minus 25th percentile; the range of values that contains the middle 50% of the data: Men 106 - 66 = 40.0%, Women 111.5 - 71 = 40.5%
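The interquartile range can be sketched with `statistics.quantiles` from the Python standard library. The sample is hypothetical (not the bile data), and `method="inclusive"` is an assumption: the slides do not say which percentile convention was used, and different conventions can give slightly different quartiles on small samples.

```python
import statistics

# Hypothetical sample (NOT the bile data set).
values = [60, 70, 80, 90, 100, 110, 120, 130, 140]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

iqr = q3 - q1  # width of the interval holding the middle 50% of the data
print(f"25th percentile: {q1}, 75th percentile: {q3}, IQR: {iqr}")
```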

  12. Measures of Spread (cont’d) • Variance = (standard deviation)^2 = mean squared deviation from the mean, built from the sum of squared deviations $\sum_{i=1}^{n} (x_i - \bar{x})^2$ • Standard deviation = square root of variance • Men: (24.0%)^2 = 574 (%^2); Women: (26.6%)^2 = 761 (%^2)
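The relationship between variance and standard deviation can be checked numerically. The sketch below uses a hypothetical sample, not the bile data. Because the transcript does not show whether the slide divides the sum of squared deviations by n or by n - 1, both versions from the standard `statistics` module are shown.

```python
import statistics

# Hypothetical sample (NOT the bile data set); its mean is 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(values)
sum_sq_dev = sum((x - mean) ** 2 for x in values)  # sum of squared deviations

pop_var = statistics.pvariance(values)    # divides by n
sample_var = statistics.variance(values)  # divides by n - 1
pop_sd = statistics.pstdev(values)        # square root of pop_var
sample_sd = statistics.stdev(values)      # square root of sample_var

print("sum of squared deviations:", sum_sq_dev)
print("variance (divide by n):    ", pop_var, " sd:", pop_sd)
print("variance (divide by n - 1):", sample_var, " sd:", sample_sd)
```

Note that the variance is in squared units (%^2 on the slide), while the standard deviation is back in the units of the data.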

  13. Some common data displays • Displays for continuous data • Histograms / Stem and leaf plots • Boxplots • Displays for categorical data: tables • Displays for relationships of two variables (on same “people”) to each other • Continuous data: scatterplots • Categorical data: cross-tabulations

  14. Stem and Leaf Plot: % Saturation, Men
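Since the plot itself did not survive in this transcript, here is a minimal sketch of how a stem-and-leaf plot is built. The function `stem_and_leaf` is a hypothetical helper and the values are made up (not the men's saturation data); it assumes non-negative whole-number values, with the tens as the stem and the units digit as the leaf.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return the text rows of a stem-and-leaf plot (stem = tens, leaf = units)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        leaves[stem].append(str(leaf))
    rows = []
    # Include empty stems so gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        rows.append(f"{stem:>3} | {''.join(leaves.get(stem, []))}")
    return rows

# Hypothetical percentages (NOT the bile data set).
sample = [66, 71, 78, 84, 86, 86, 93, 106, 111]
rows = stem_and_leaf(sample)
print("\n".join(rows))
```

Each row reads like a histogram bar turned on its side, but every individual value can still be recovered from the display.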

  15. Boxplots of Bile Data

  16. Scatterplot: SBP vs DBP (axes: SBP, DBP)

  17. Some really bad plots

  18. Example 1

  19. Example 2: Distribution of genotypes: AA 21%, AB 48%, BB 22%, missing 9%
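The original figure for this example is not reproduced in the transcript, but the four percentages are. As a contrast with a decorated chart, the sketch below prints them as a plain labeled text display, one `#` per percentage point. `text_bars` is a hypothetical helper, and this is one possible clean display, not the one recommended in the original lecture.

```python
def text_bars(percentages):
    """Return one labeled text bar per category, one '#' per percentage point."""
    lines = []
    for label, pct in percentages.items():
        lines.append(f"{label:<8}{'#' * pct} {pct}%")
    return lines

# Genotype percentages as listed on the slide.
genotypes = {"AA": 21, "AB": 48, "BB": 22, "missing": 9}

lines = text_bars(genotypes)
print("\n".join(lines))
```

With only four numbers, a small table would serve equally well; the point is that every mark on the display carries data.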

  20. Example 3

  21. Example 4

  22. Example 5

  23. Example 6

  24. Example 7

  25. Example 8

  26. Main points once again • Be accurate and clear. • Let the data speak. • Show as much data as possible, taking care not to obscure the message. • Science not sales. • Avoid unnecessary frills. • Go for the cleanest display that conveys the necessary info. • In tables, every digit should be meaningful. Don’t drop ending 0’s.

  27. Displaying data well • Show “typical”, “average” values • Convey extent of “spread”, “variability” in values • Compare groups clearly • Label explicitly

  28. Further reading • ER Tufte (1983) The visual display of quantitative information. Graphics Press. • ER Tufte (1990) Envisioning information. Graphics Press. • ER Tufte (1997) Visual explanations. Graphics Press. • WS Cleveland (1993) Visualizing data. Hobart Press. • WS Cleveland (1994) The elements of graphing data. CRC Press.
