
Statistical Reasoning and Analysis



Presentation Transcript


  1. Statistical Reasoning and Analysis Tony Panzarella Department of Biostatistics, Princess Margaret Cancer Center Tony.Panzarella@uhnres.utoronto.ca September 2014

  2. “Lies, damned lies, and statistics” (British Prime Minister Benjamin Disraeli, 1804–1881) • “Talked politics, scandal, and the three classes of witnesses—liars, d—d liars, and experts.” (The Life and Letters of Thomas Henry Huxley by L. Huxley, 1900) https://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics

  3. Statistics – the new Sexy!? • Hal Varian, Google's chief economist, says statistician will be 'the sexy job in the next 10 years.' Chad Schafer explains why. (August 4, 2013, Daniel Marsula/Post-Gazette) http://www.post-gazette.com/opinion/Op-Ed/2013/08/04/The-Next-Page-Data-Driven-Why-statistics-is-sexy/stories/201308040172#ixzz3CFcEr9IV • Google’s prediction: What will be the "sexy" job in the next ten years? Here’s a strange prediction from Google’s Chief Economist: “I keep saying that the sexy job in the next 10 years will be statisticians. And I’m not kidding.”

  4. Some Keywords • Study Designs, Confounding • Pitfalls of Data Analysis • Bias (representative sampling, statistical assumptions) • Errors in methodology (statistical power, multiple comparisons, measurement error) • Interpretation (precision and accuracy, causality, graphical representation) • Era of Big Data

  5. Prelude: Design and Analysis • Objective: Design the ultimate Intro to PHS talk… and the worst one that I can still get away with… • Methods: Identify topic(s) and delivery with visuals • Examples; no formulas • Take-home messages

  6. Correlations, Cluster analysis Source: Sebastian Wernicke, 2010, TED Talks

  7. Nonparametric methods using ranks, Discriminant analysis Source: Sebastian Wernicke, 2010, TED Talks

  8. Confidence Intervals, Associations, Time-to-event analysis Source: Sebastian Wernicke, 2010, TED Talks

  9. Data Mining, Pattern Recognition Source: Sebastian Wernicke, 2010, TED Talks

  10. Pattern Recognition (Evidence: Hans Rosling) Source: Sebastian Wernicke, 2010, TED Talks; Hans Rosling, “The best stats you’ve ever seen”, “New insights on poverty”

  11. Motivating Example: Smoking & Survival • 20-year follow-up study, Whickham, UK (Tunbridge et al. 1977) • 1972–1974, one-in-six survey of the electoral roll, largely concerned with thyroid disease and heart disease • For simplicity, consider women aged 45 to 75 at the start of the study • Smoking status: current smoker (Y/N) • 20-year survival info: determined for all women in the study

  12. Smoking & Survival (Cont’d) • Protective effect of smoking? (data adapted from Appleton et al. 1996, Am. Stat.)

  13. Smoking & Survival (Cont’d) • Consider 10-year ranges: 45-54, 55-64, 65-75 • Non-smoking group does better in each case!
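The reversal above is Simpson’s paradox in miniature: smokers were concentrated in the younger, lower-risk age bands, so the pooled comparison flips. Below is a minimal sketch of the pooled-versus-stratified calculation, assuming pandas; the counts are invented to reproduce the qualitative pattern and are not the study’s actual figures.

```python
# Illustrative counts only (not the Whickham study's actual data),
# chosen so that the pooled and stratified comparisons disagree.
import pandas as pd

rows = [
    # age_band, smoker, dead, total
    ("45-54", "yes",  10, 200),
    ("45-54", "no",    4, 100),
    ("55-64", "yes",  20, 100),
    ("55-64", "no",   15, 100),
    ("65-75", "yes",  18,  30),
    ("65-75", "no",  100, 200),
]
df = pd.DataFrame(rows, columns=["age_band", "smoker", "dead", "total"])

# Crude (pooled) 20-year death rate: smokers look better, because most
# smokers in this table fall in the younger, lower-risk bands.
pooled = df.groupby("smoker")[["dead", "total"]].sum()
print(pooled["dead"] / pooled["total"])

# Death rate within each age band: smokers do worse in every stratum.
df["death_rate"] = df["dead"] / df["total"]
print(df[["age_band", "smoker", "death_rate"]])
```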

  14. Gender Bias, or Not? • 1973, UC Berkeley was sued for discrimination against women in graduate school admissions • Percent acceptance: Male vs. Female, 44% vs. 35%

  15. Gender Bias, or Not? (cont’d) Bickel, P. J., Hammel, E. A., and O'Connell, J. W. (1975). Sex Bias in Graduate Admissions: Data from Berkeley. Science, 187(4175), 398–404.
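The Berkeley data follow the same logic: women applied disproportionately to the more selective departments. The two-department table below is entirely hypothetical (departments and counts invented only to mimic the qualitative pattern reported by Bickel et al.), with pandas assumed for the arithmetic.

```python
# Hypothetical two-department admissions table; departments and counts are
# invented to mimic the pattern in Bickel et al. (1975), not real data.
import pandas as pd

df = pd.DataFrame({
    "dept":     ["A", "A", "B", "B"],
    "sex":      ["M", "F", "M", "F"],
    "applied":  [400, 100, 100, 400],   # women apply mostly to the selective dept B
    "admitted": [240,  65,  10,  60],
})

# Pooled acceptance rates: men appear strongly favoured (M 0.50 vs F 0.25).
pooled = df.groupby("sex")[["admitted", "applied"]].sum()
print(pooled["admitted"] / pooled["applied"])

# Department-specific rates: women do at least as well within each department.
df["rate"] = df["admitted"] / df["applied"]
print(df.pivot(index="dept", columns="sex", values="rate"))
```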

  16. Message #1 • Be aware of the dangers of ignoring a covariate that is correlated with both the outcome variable and an explanatory variable. • Simpson, E. H. (1951). “The Interpretation of Interaction in Contingency Tables”, Journal of the Royal Statistical Society, Series B, 13, 238–241. • Simpson’s Paradox; many other examples

  17. Guard Against Biases

  18. Biases due to … • Selection of subjects: web surveys • Responses: e.g. questions on income • Contamination in controls: non-blind study • Recall: food intake • Attrition: dropout • Reporting: negative findings • Publication: meta-analysis • Over thirty kinds of bias

  19. Guard Against Biases • [BACKUP REFERENCE] Bias in design • Concato et al. (2001). A nested case–control study of the effectiveness of screening for prostate cancer: research design. • Concato et al. (2001) report another type of design bias in prostate cancer detection: the groups were asymptomatic men who received digital rectal examination (DRE), prostate-specific antigen (PSA) screening, and transrectal ultrasound, but there was no ‘control’ group with no screening, so the effectiveness of screening could not be evaluated. • Although PSA and DRE are commonly used to screen for prostate cancer, available data do not confirm that either test improves survival. The report describes the methodological aspects of a nested case–control study addressing whether PSA screening, with or without DRE, is effective in increasing survival; potential sources of bias are discussed, along with the strategies used to avoid them.

  20. Possible steps to minimize bias • Assess the validity of the identified target population, and of the groups to be included in the study, in the context of the objectives and the methodology. • Evaluate the reliability and validity of the measurements required to assess the antecedents and outcomes, as well as any other tools you plan to deploy. • Carry out a pilot study and pretest the tools; make changes as needed. • Identify possible confounding factors and other sources of bias; develop an appropriate design that can take care of most of these biases. • Use matching, blinding, masking, and random allocation as needed. • Analyze the data with proper statistical methods: use standardized or adjusted rates where needed, carry out stratified analyses, or use mathematical models such as regression to take care of biases that could not be ruled out by design (see the sketch below). • Report only the evidence-based results – enthusiastically but dispassionately
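One of the remedies listed above is regression adjustment for confounders that cannot be removed by design. A minimal sketch on simulated data, assuming statsmodels; the variable names, effect sizes, and sample size are all hypothetical.

```python
# Regression adjustment for a confounder on simulated data; the variables
# and effect sizes here are hypothetical, for illustration only.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000

age = rng.uniform(45, 75, n)                                   # confounder
smoker = rng.binomial(1, 1 / (1 + np.exp((age - 55) / 5)))     # younger -> more likely to smoke
p_death = 1 / (1 + np.exp(-(-8 + 0.12 * age + 0.5 * smoker)))  # age and smoking both raise risk
died = rng.binomial(1, p_death)

# Crude model: exposure only (confounded by age).
crude = sm.Logit(died, sm.add_constant(smoker)).fit(disp=0)

# Adjusted model: exposure plus the confounder.
adjusted = sm.Logit(died, sm.add_constant(np.column_stack([smoker, age]))).fit(disp=0)

print("crude log-odds for smoking:   ", crude.params[1])
print("adjusted log-odds for smoking:", adjusted.params[1])
```

Comparing the two coefficients shows how far a crude estimate can drift when the confounder is left out.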

  21. Multiple Testing (Large p & small n)

  22. Data Dredging
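Both slides describe the same mechanism: test enough hypotheses on small samples and some will look significant purely by chance. A minimal simulation sketch, assuming NumPy and SciPy; the number of tests and the group sizes are arbitrary choices for illustration.

```python
# Every null hypothesis is true in this simulation, yet unguarded testing
# still yields "significant" raw p-values at roughly the 5% rate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_tests, n_per_group = 200, 20          # many features, small samples ("large p, small n")

pvals = np.array([
    stats.ttest_ind(rng.normal(size=n_per_group),
                    rng.normal(size=n_per_group)).pvalue
    for _ in range(n_tests)
])

print("raw p < 0.05:        ", int((pvals < 0.05).sum()))            # expected ~10 of 200
print("Bonferroni p < 0.05: ", int((pvals < 0.05 / n_tests).sum()))  # family-wise control
```

The same arithmetic drives data dredging: scan enough subgroups, endpoints, or model specifications and a “finding” is nearly guaranteed unless the multiplicity is accounted for.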

  23. “Deming, data and observational studies: a process out of control and needing fixing”, Young and Karr (2011), Significance, pp. 116–120.

  24. “Deming, data and observational studies: a process out of control and needing fixing”

  25. “Deming, data and observational studies: a process out of control and needing fixing”, Young and Karr (2011), Significance, pp. 116–120. • Young, S. S. and Yu, M. (2009). To the Editor. Journal of the American Medical Association, 301, 720–721.

  26. Visual Display of Quantitative Information • Effectiveness of traffic enforcement in 1955–56, before vs. after. Source: Tufte (1983)

  27. In the Age of BIG Data • Does Big Data make Statistics obsolete? • NO! • BIG data, Big mistake? (Google Flu) • http://www.ft.com/cms/s/2/21a6e7d8-b479-11e3-a09a-00144feabdc0.html#axzz3CHWGduc7

  28. Statistical Truisms • Correlation does not imply Causation. In fact, causal relationships are among the most significant discoveries that big-data analytics practitioners seek; finding causes for observed effects would truly be a gold mine of value for any business, science, government, healthcare, or security group analyzing big data. • Sample variance does not go to zero, even with Big Data

  29. Statistical Truisms (cont’d) • Sample bias does not necessarily go to zero, even with Big Data: biased sampling can lead to models whose results are slanted against the full diversity of the original population (see the sketch below) • Absence of Evidence is not the same as Evidence of Absence • Reference: http://www.amstat.org/publications/jse/v10n3/chance.html
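A minimal sketch of that second truism, with an invented population and an invented response rule: when the sampling scheme systematically excludes part of the population, adding more observations shrinks the variance of the estimate but leaves the bias untouched.

```python
# Invented population and selection rule, for illustration only: units below
# the 40th percentile never respond, so the sample mean stays biased upward
# no matter how large the sample grows.
import numpy as np

rng = np.random.default_rng(2)
population = rng.normal(loc=50, scale=10, size=1_000_000)   # true mean is about 50
eligible = population[population > np.percentile(population, 40)]

for n in (100, 10_000, 1_000_000):
    sample = rng.choice(eligible, size=n, replace=True)
    print(f"n={n:>9,}: biased sample mean = {sample.mean():.2f}")
```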

  30. Acknowledgements • Prof. Wendy Lou

  31. Q & A
