
Review


mariko

Presentation Transcript


  1. Review

  2. Review • We’ve covered three main topics thus far • Data collection • Data summarization • Probability

  3. Data Collection • We’ve talked about three ways of collecting data • Survey • Sampling frame, questionnaire, probability sample, convenience sample, non-response bias, other types of bias • Observational study • No assignment of treatments. No causal conclusions • Randomized experiment • Random assignment of units/subjects to treatments. If done properly, causal conclusions are possible (though the conclusions might not generalize). • Why randomize?
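
As a minimal sketch of what "random assignment of units/subjects to treatments" can look like in practice, the Python snippet below shuffles a list of subjects and deals them into groups. The function name `randomize`, the subject labels, and the seed are all hypothetical, chosen only for illustration; they are not from the slides.

```python
import random


def randomize(units, treatments, seed=None):
    """Randomly assign units to treatments in (nearly) equal-sized groups."""
    rng = random.Random(seed)      # seed only makes the example reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)          # chance, not the researcher, decides the order
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        # Deal the shuffled units out to the treatments in turn.
        groups[treatments[i % len(treatments)]].append(unit)
    return groups


# Hypothetical subjects and treatment labels.
subjects = [f"subject_{i}" for i in range(1, 9)]
assignment = randomize(subjects, ["treatment", "control"], seed=1)
print(assignment)
```

Because chance alone decides who lands in which group, lurking variables tend to balance out across groups, which is what makes causal conclusions defensible.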

  4. Data summarization • We talked about graphical and numerical summaries for one variable and for two variables. It is important to identify the type of each variable. • One categorical/qualitative variable • graphical: pie chart, bar graph • numerical: counts/percents/frequencies • One quantitative variable • graphical: histogram/boxplot (shape, center, spread, outliers) • numerical: mean, median, standard deviation, inter-quartile range, range, percentiles
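
The numerical summaries for one quantitative variable can be sketched with Python's standard `statistics` module. The data values below are made up for illustration, and the quartiles use the module's default convention; other software (JMP included) may use a slightly different quartile rule and report a slightly different IQR.

```python
import statistics

# Hypothetical data with one large value (a possible outlier).
data = [2, 4, 4, 5, 7, 9, 30]

mean = statistics.mean(data)
median = statistics.median(data)
sd = statistics.stdev(data)                    # sample standard deviation
q1, q2, q3 = statistics.quantiles(data, n=4)   # quartiles, default method
iqr = q3 - q1
data_range = max(data) - min(data)

print(f"mean={mean:.2f} median={median} sd={sd:.2f}")
print(f"IQR={iqr} range={data_range}")
```

The single large value pulls the mean well above the median, which is why the median and IQR are often preferred for skewed data or data with outliers.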

  5. Data summarization • Two variables • One categorical/qualitative and one quantitative • graphical: side-by-side boxplots • numerical: means, medians, SDs, IQRs, etc. for each category • Two quantitative • graphical: scatterplot (form, direction, strength, outliers) • numerical: means, SDs, etc. for both; correlation coefficient • If the association is linear, model it with a straight line: slope and intercept of the regression line (prediction, interpretation, extrapolation, etc.) • Two categorical/qualitative • graphical: plots we didn’t talk about • numerical: contingency tables; marginal frequencies, conditional frequencies • Also relative risk and odds ratios
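
For two categorical variables, the marginal frequencies, conditional frequencies, relative risk, and odds ratio can all be read off a 2x2 contingency table. The sketch below uses invented counts and hypothetical row/column labels ("exposed", "event"); it only illustrates the arithmetic.

```python
# Hypothetical 2x2 contingency table: rows = group, columns = outcome.
table = {
    "exposed":   {"event": 30, "no_event": 70},
    "unexposed": {"event": 10, "no_event": 90},
}


def risk(row):
    """Conditional frequency of the event within one row (group)."""
    return row["event"] / (row["event"] + row["no_event"])


def odds(row):
    """Odds of the event within one row (group)."""
    return row["event"] / row["no_event"]


total = sum(sum(row.values()) for row in table.values())

# Marginal frequency of the event, ignoring group.
marginal_event = sum(row["event"] for row in table.values()) / total

relative_risk = risk(table["exposed"]) / risk(table["unexposed"])
odds_ratio = odds(table["exposed"]) / odds(table["unexposed"])

print(f"marginal P(event) = {marginal_event:.2f}")
print(f"relative risk = {relative_risk:.2f}, odds ratio = {odds_ratio:.2f}")
```

With these counts the relative risk is about 3 while the odds ratio is about 3.86; the two measures agree closely only when the event is rare in both groups.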

  6. Probability • To find the probability of event A • Enumerate the sample space. Count the number of outcomes in event A. Divide by the total number of outcomes (this assumes all outcomes are equally likely) • Easy to do if the sample space is small • Use probability laws to push symbols around • Independence, mutually exclusive, joint = marginal × conditional • When the sample space is large, this is the only way to approach things
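
Both approaches can be sketched on a small example. The Python snippet below enumerates the 36 equally likely outcomes of rolling two fair dice (an example chosen here, not one from the slides), counts outcomes to get probabilities, and then checks the law joint = marginal × conditional. The events A and B are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs from two fair six-sided dice.
sample_space = list(product(range(1, 7), repeat=2))


def is_a(outcome):
    """Event A: the two dice sum to 8."""
    return outcome[0] + outcome[1] == 8


def is_b(outcome):
    """Event B: the first die is even."""
    return outcome[0] % 2 == 0


def prob(event, space):
    """Count outcomes in the event and divide by the size of the space."""
    return Fraction(sum(1 for o in space if event(o)), len(space))


p_a = prob(is_a, sample_space)
p_b = prob(is_b, sample_space)
p_a_and_b = prob(lambda o: is_a(o) and is_b(o), sample_space)

# Conditional probability by restricting the sample space to B.
space_given_b = [o for o in sample_space if is_b(o)]
p_a_given_b = prob(is_a, space_given_b)

print(p_a, p_b, p_a_and_b, p_a_given_b)
print("joint == marginal * conditional:", p_a_and_b == p_b * p_a_given_b)
print("independent:", p_a_and_b == p_a * p_b)
```

Here P(A and B) equals P(B) × P(A | B), but not P(A) × P(B), so A and B are not independent.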

  7. Duke b-ball • What type of study is this? • Survey? Randomized experiment? Observational study? • Might it be reasonable to assume that the opponents are a random sample of all types of opponents Duke could potentially face? • If not, then what we see can’t be generalized to teams Duke might play in the future. (In other words, the population is the teams that Duke has played so far, and we have observations on all of them.)

  8. Limitations • Since this is not a designed experiment, what are the limitations? • Can we make causal conclusions? • Nope • Is there potential for lurking variables? • Yup. In fact, I’d bet there are some. • What type of information does looking at this type of data provide?

  9. JMP • Let’s look at a few variables and summarize them graphically and numerically.

  10. Regression vs correlation coefficient • Does a change of units change the value? • Correlation coefficient (no) • Regression slope (yes) • Does it matter which variable is defined as the response and which as the explanatory variable? • Correlation coefficient (no) • Regression slope (yes) • Provides direction and strength of linear association • Correlation coefficient (yes, yes) • Regression slope (yes, no) • Quantifies linear association between two quantitative variables • Correlation coefficient (no) • Regression slope (yes)
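
The first two comparisons can be checked numerically. The sketch below computes the correlation coefficient and the least-squares slope from their textbook formulas, then converts one variable from inches to centimeters and swaps the roles of the two variables. The height/weight values are invented for illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def correlation(x, y):
    """Pearson correlation coefficient r."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def slope(x, y):
    """Least-squares slope for predicting y (response) from x (explanatory)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical data: height in inches, weight in pounds.
height_in = [60, 62, 65, 68, 70, 72]
weight_lb = [110, 120, 140, 155, 165, 180]
height_cm = [h * 2.54 for h in height_in]   # same heights, new units

print("r (inches):", correlation(height_in, weight_lb))
print("r (cm):    ", correlation(height_cm, weight_lb))
print("slope (lb per inch):", slope(height_in, weight_lb))
print("slope (lb per cm):  ", slope(height_cm, weight_lb))
print("slope with roles swapped:", slope(weight_lb, height_in))
```

The correlation is identical in both unit systems and does not care which variable comes first; the slope shrinks by a factor of 2.54 when inches become centimeters and changes entirely when the response and explanatory variables are swapped.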

  11. Correlation coefficient vs regression • Influenced by outliers? • Correlation coefficient (yes) • Regression slope (yes); such outliers are sometimes called influential points • Can we conclude that the explanatory variable causes change in the response variable? • Correlation coefficient (no) • Regression slope (no) • Although under a well-designed experiment it is possible • Must both variables be quantitative? • Correlation coefficient (yes) • Regression slope (not necessarily, but I don’t think we’ll be able to cover the quantitative/qualitative regression, often called ANOVA)
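
To see how much a single outlier can matter, the sketch below fits the same made-up, nearly linear data twice: once as is, and once with one hypothetical extra point far from the pattern. The helper `r_and_slope` is an illustrative name, not a library function.

```python
import math


def r_and_slope(x, y):
    """Return (correlation coefficient, least-squares slope of y on x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy), sxy / sxx


# Hypothetical data lying close to a straight line.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 8.0, 9.8]
r_clean, slope_clean = r_and_slope(x, y)

# Add one point far from the pattern, extreme in x (an influential point).
r_out, slope_out = r_and_slope(x + [10], y + [2.0])

print(f"without outlier: r={r_clean:.3f}, slope={slope_clean:.3f}")
print(f"with outlier:    r={r_out:.3f}, slope={slope_out:.3f}")
```

One added point is enough to take a nearly perfect positive association to essentially none, which is why a scatterplot should always be examined before trusting either number.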
