Experimental Lifecycle

This guide explores common mistakes in data visualization, such as using multiple scales and symbols instead of text, and provides best practices for creating clear and informative graphics.

Presentation Transcript


  1. Experimental Lifecycle • “Groping around” experiences • Vague idea • Initial observations • Hypothesis • Model • Experiment • Data, analysis, interpretation • Results & final presentation

  2. Common Mistakes in Graphics • Excess information • Multiple scales • Using symbols in place of text • Poor scales • Using lines incorrectly

  3. Multiple Scales • Another way to meet length limits • Basically, two graphs overlaid on each other • Confuses reader (which line goes with which scale?) • Misstates relationships • Implies equality of magnitude that doesn’t exist

  4. Some Especially Bad Multiple Scales

  5. Using Symbols in Place of Text • Graphics should be self-explanatory • Remember that the graphs often draw the reader in • So use explanatory text, not symbols • This means no Greek letters! • Unless your conference is in Athens...

  6. It’s All Greek To Me...

  7. Explanation is Easy

  8. Poor Scales • Plotting programs love non-zero origins • But people are used to zero • Fiddle with axis ranges (and logarithms) to get your message across • But don’t lie or cheat • Sometimes trimming off high ends makes things clearer • Brings out low-end detail

  9. Nonzero Origins (Chosen by Microsoft)

  10. Proper Origins

  11. A Poor Axis Range

  12. A Logarithmic Range

  13. A Truncated Range

  14. Using Lines Incorrectly • Don’t connect points unless interpolation is meaningful • Don’t smooth lines that are based on samples • Exception: fitted non-linear curves

  15. Incorrect Line Usage

  16. Pictorial Games • Non-zero origins and broken scales • Double-whammy graphs • Omitting confidence intervals • Scaling by height, not area • Poor histogram cell size

  17. Non-Zero Origins and Broken Scales • People expect (0,0) origins • Subconsciously • So non-zero origins are a great way to lie • More common than not in popular press • Also very common to cheat by omitting part of scale • “Really, Your Honor, I included (0,0)”

  18. Non-Zero Origins

  19. The Three-Quarters Rule • Highest point should be 3/4 of scale or more
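
The rule above can be checked mechanically. This is a minimal sketch, assuming the axis starts at zero; the function names are illustrative and not from the slides.

```python
def violates_three_quarters_rule(values, axis_max):
    """Return True if the highest point sits below 3/4 of the axis top.

    Assumes the axis origin is zero (an assumption, not stated on the slide).
    """
    return max(values) < 0.75 * axis_max


def largest_allowed_axis_max(values):
    """Largest axis maximum that still keeps the highest point at 3/4 of scale."""
    return max(values) / 0.75


data = [2, 5, 9]
print(violates_three_quarters_rule(data, axis_max=20))  # axis far too tall
print(violates_three_quarters_rule(data, axis_max=12))  # highest point at exactly 3/4
print(largest_allowed_axis_max(data))
```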

  20. Double-Whammy Graphs • Put two related measures on same graph • One is (almost) function of other • Hits reader twice with same information • And thus overstates impact

  21. Omitting Confidence Intervals • Statistical data is inherently fuzzy • But means appear precise • Giving confidence intervals can make it clear there’s no real difference • So liars and fools leave them out

  22. Graph Without Confidence Intervals

  23. Graph With Confidence Intervals

  24. Confidence Intervals • Sample mean value is only an estimate of the true population mean • Bounds $c_1$ and $c_2$ such that there is a high probability, $1-\alpha$, that the population mean $\mu$ is in the interval $(c_1, c_2)$: $\mathrm{Prob}\{c_1 < \mu < c_2\} = 1-\alpha$, where $\alpha$ is the significance level and $100(1-\alpha)$ is the confidence level • Overlapping confidence intervals are interpreted as “not statistically different”
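
The interval and the overlap test described above can be sketched as follows. This is only an illustration: it assumes a large-sample normal (z) approximation, whereas small samples would normally call for a t quantile, and the sample values and function names are made up.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, alpha=0.05):
    """Approximate 100(1 - alpha)% confidence interval (c1, c2) for the mean.

    Uses the normal quantile (an assumption; a t quantile is more
    appropriate for small samples).
    """
    n = len(sample)
    center = mean(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * stdev(sample) / math.sqrt(n)
    return center - half_width, center + half_width


def intervals_overlap(a, b):
    """True if intervals a = (c1, c2) and b = (c1, c2) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical measurements from two systems.
system_x = [10, 12, 11, 13, 9, 11, 10, 12]
system_y = [11, 13, 12, 12, 10, 14, 11, 13]
ci_x = mean_confidence_interval(system_x)
ci_y = mean_confidence_interval(system_y)
print(ci_x, ci_y, intervals_overlap(ci_x, ci_y))
```

Lowering `alpha` (raising the confidence level) widens the interval, which makes overlap more likely.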

  25. Graph With Confidence Intervals

  26. Reporting Only One Run (tell-tale sign) • Probably a fluke (it’s likely that with multiple trials this would go away)

  27. Scaling by Height Instead of Area • Clip art is popular with illustrators: [Figure: Women in the Workforce, 1960 vs. 1980] • Any guesses? w1980/w1960 = ?

  28. The Trouble with Height Scaling • Previous graph had heights of 2:1 • But people perceive areas, not heights • So areas should be what’s proportional to data • Tufte defines a lie factor: size of effect in graphic divided by size of effect in data • Not limited to area scaling • But especially insidious there (quadratic effect)
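
A small sketch of the lie factor for the height-scaled case. It takes "size of effect" to mean relative change, (second - first) / first, and assumes the underlying data ratio is 2:1 as the slide implies; the function names are illustrative.

```python
import math


def effect_size(first, second):
    """Relative change from first to second (assumed meaning of 'size of effect')."""
    return (second - first) / first


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Size of effect shown in the graphic divided by size of effect in the data."""
    return effect_size(graphic_first, graphic_second) / effect_size(data_first, data_second)


# Data doubles (1 -> 2). Scaling a picture to twice the height also doubles
# its width, so the perceived area goes from 1 to 4.
height_scaled_area = 2 ** 2
print(lie_factor(1, height_scaled_area, 1, 2))  # quadratic effect inflates the change

# Scaling each side by sqrt(2) instead gives an area ratio of 2:1.
area_scaled_area = math.sqrt(2) ** 2
print(lie_factor(1, area_scaled_area, 1, 2))  # approximately 1: graphic matches data
```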

  29. Scaling by Area • Here’s the same graph with 2:1 area: [Figure: Women in the Workforce, 1960 vs. 1980]

  30. Histogram Cell Size • Picking bucket size is always a problem • Prefer 5 or more observations per bucket • Choice of bucket size can affect results:
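
One way to see how bucket size affects a histogram is to count observations per bucket at several widths and flag buckets that fall below the suggested minimum of 5. This is a sketch on made-up data; the function names are not from the slides.

```python
import math
from collections import Counter


def bucket_counts(data, width):
    """Count observations per bucket [k*width, (k+1)*width), keyed by k."""
    return Counter(math.floor(x / width) for x in data)


def sparse_buckets(data, width, minimum=5):
    """Lower edges of non-empty buckets holding fewer than `minimum` observations."""
    counts = bucket_counts(data, width)
    return sorted(k * width for k, c in counts.items() if c < minimum)


# Hypothetical observations.
observations = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 8]
for width in (2, 4, 10):
    print(width, dict(bucket_counts(observations, width)), sparse_buckets(observations, width))
```

Narrow buckets leave most cells with too few observations, while one very wide bucket satisfies the rule but hides the shape of the data.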

  33. Don’t Quote Data Out of Context

  34. The Same Data in Context

  35. Tell the Whole Truth

  36. Tell the Whole Truth

  37. Special-Purpose Charts • Histograms • Scatter plots • Gantt charts • Kiviat graphs

  38. Tukey’s Box Plot • Shows range, median, quartiles all in one: [Figure: box plot labeled minimum, quartile, median, quartile, maximum] • Variations:
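
The five values a box plot displays can be computed directly. A minimal sketch using the standard library; quartile conventions differ between tools, and the "inclusive" method used here is just one choice.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


# Hypothetical sample.
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```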

  39. Histograms

  40. Scatter Plots • Useful in statistical analysis • Also excellent for huge quantities of data • Can show patterns otherwise invisible

  41. Gantt Charts • Shows relative duration of Boolean conditions • Arranged to make lines continuous • Each level after first follows FTTF pattern

  42. Gantt Charts • Shows relative duration of Boolean conditions • Arranged to make lines continuous • Each level after first follows FTTF pattern [Figure: Gantt chart with segments labeled F and T]

  43. Kiviat Graphs • Also called “star charts” or “radar plots” • Useful for looking at balance between HB (higher-is-better) and LB (lower-is-better) metrics

  44. Useful Reference Works • Edward R. Tufte, The Visual Display of Quantitative Information, Graphics Press, Cheshire, Connecticut, 1983. • Edward R. Tufte, Envisioning Information, Graphics Press, Cheshire, Connecticut, 1990. • Edward R. Tufte, Visual Explanations, Graphics Press, Cheshire, Connecticut, 1997. • Darrell Huff, How to Lie With Statistics, W.W. Norton & Co., New York, 1954

  45. Ratio Games • Choosing a Base System • Using Ratio Metrics • Relative Performance Enhancement • Ratio Games with Percentages • Strategies for Winning a Ratio Game • Correct Analysis of Ratios

  46. Choosing a Base System • Run workloads on two systems • Normalize performance to chosen system • Take average of ratios • Presto: you control what’s best
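
The steps above can be sketched with two hypothetical systems and two workloads. The numbers are invented and assumed to be run times, where lower is better; whichever system is chosen as the base ends up with the best average ratio.

```python
def mean_normalized(times, base):
    """Average, per system, of per-workload measurements divided by the base system's."""
    result = {}
    for system, values in times.items():
        ratios = [v / b for v, b in zip(values, times[base])]
        result[system] = sum(ratios) / len(ratios)
    return result


# Hypothetical run times (seconds) for two workloads on two systems.
times = {"A": [10, 20], "B": [20, 10]}

print(mean_normalized(times, base="A"))  # A scores 1.0, B looks 25% slower
print(mean_normalized(times, base="B"))  # B scores 1.0, A looks 25% slower
```

The raw totals are identical (30 seconds each), so the apparent winner comes entirely from the choice of base.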

  47. Code Size Example

  48. Simple Example

  49. Using Ratio Metrics • Pick a metric that is itself a ratio • power = throughput / response time • cost / performance • improvement ratio • Handy because division is “hidden”

  50. Relative Performance Enhancement • Compare systems with incomparable bases • Turn into ratios • Example: compare Ficus 1 vs. 2 replicas with UFS vs. NFS (1 run on chosen day): • “Proves” adding Ficus replica costs less than going from UFS to NFS
