
Statistics [1/2,3/2]


thisbe

Presentation Transcript


  1. Statistics [1/2,3/2] • The Essential Mathematics

  2. Standard Error • What standard deviation is to an individual (relative to a population mean), standard error is to a sample mean (relative to a population mean) • Standard error = standard deviation / sqrt(n), where n is the sample size • Every parameter estimate has a standard error associated with it...we use it to “normalize” statistical tests

  3. Short Exercise • What is the mean of {1,2,3,4,5}? • Now, let’s take all possible triplets: • {1,2,3}, {1,2,4}, {1,2,5}, {1,3,4}, {1,3,5}, {1,4,5}, {2,3,4}, {2,3,5}, {2,4,5}, {3,4,5}

  4. Short Exercise • Mean of {1,2,3,4,5} = 3 • Std. Dev (sample) = 1.58114 • Now, let’s take the means of all possible triplets: • 2, 7/3, 8/3, 8/3, 3, 10/3, 3, 10/3, 11/3, 4 • Mean = 3, Std. Dev (sample) = .60858 • Maximum offset from the mean: 1 (was originally 2) • Message: averaging over a group reduces the Std. Dev, hence we have standard error
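The numbers in this exercise can be checked by brute force. The following is a minimal Python sketch using only the standard library; the variable names are illustrative and do not come from the slides.

```python
from itertools import combinations
from statistics import mean, stdev

population = [1, 2, 3, 4, 5]

# Mean of every possible triplet drawn without replacement (10 in total).
triplet_means = [mean(t) for t in combinations(population, 3)]

pop_mean = mean(population)
pop_sd = stdev(population)        # sample standard deviation (divides by n - 1)
means_mean = mean(triplet_means)
means_sd = stdev(triplet_means)   # spread of the triplet means

print(pop_mean, round(pop_sd, 5))
print(round(means_mean, 5), round(means_sd, 5))
```

Note that .60858 is not 1.58114/sqrt(3), which is about .91287. The sd/sqrt(n) formula assumes independent draws from a large population, whereas these triplets are drawn without replacement from only five values, so the means vary even less.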

  5. Statistical Tests • Null hypothesis: A hypothesis of no change • Alternative hypothesis: A hypothesis of change • All stats tests assume “no change from something”...the goal is to find enough evidence to reject that assumption...

  6. Common Tests • Skewness and Kurtosis • Z-Test / T-Test • ANOVA / F-Test • Correlation Test

  7. Standard Error Skewing

  8. Who’s Skewed?

  9. Standard Error Skewing

  10. Who’s Kurtic?

  11. Central Limit Theorem • Population distribution X • Take many random samples, each of (large) size n, and compute the mean of each sample • The distribution of these sample means will approximately follow the Gaussian distribution, regardless of the shape of X (provided X has finite variance), hence we call it Normal
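A small simulation can make the theorem concrete. This is only a sketch under stated assumptions: the exponential population, the sample size of 50, the 2000 replicates, and the seed are all choices made for illustration, not values from the slides.

```python
import random
from statistics import mean, stdev

random.seed(0)       # fixed seed so the run is repeatable

n = 50               # size of each sample (assumed value)
replicates = 2000    # number of samples drawn (assumed value)

# Exponential population with rate 1: mean 1, standard deviation 1, strongly skewed.
sample_means = [
    mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(replicates)
]

center = mean(sample_means)          # should sit near the population mean, 1
spread = stdev(sample_means)         # should sit near 1 / sqrt(n)
expected_spread = 1.0 / n ** 0.5

# Share of sample means within 1.96 standard errors of the population mean;
# a Gaussian would put roughly 95% of them there.
within = sum(abs(m - 1.0) <= 1.96 * expected_spread for m in sample_means) / replicates

print(round(center, 3), round(spread, 3), round(expected_spread, 3), round(within, 3))
```

Even though the population is far from bell-shaped, the sample means cluster around the population mean with a spread close to the standard error.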

  12. Normal Distribution

  13. Z-Test • Assumes normality • Either you know it should be normal, or you have enough of a sample size to use the Central Limit Theorem • z = (observed - mean)/(std. dev / sqrt(n)), where mean and std. dev are the population values • This equation is generalized for sample means of sample size n (an individual is n = 1)

  14. Example • A group of 9 people takes an IQ test. The population is known to follow a normal distribution with average score of 100 on the same test with a standard deviation of 15. The group of 9 averaged a score of 105. Should we assume that this group differs from the population of test takers?

  15. Calculation • (sample mean - population mean) = 105 - 100 = 5 • (std. dev)/sqrt(n) = 15/sqrt(9) = 5 • z = 5/5 = 1 • What does this 1 mean?
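The same calculation as a short Python sketch. The function name `z_score` and its parameter names are hypothetical, chosen only for this illustration.

```python
from math import sqrt

def z_score(sample_mean, pop_mean, pop_sd, n):
    """Z statistic for a sample mean of size n (n = 1 gives an individual's z)."""
    standard_error = pop_sd / sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# IQ example: 9 people, group average 105, population mean 100, std. dev 15.
z = z_score(sample_mean=105, pop_mean=100, pop_sd=15, n=9)
print(z)  # 1.0
```

Calling the same function with n = 1 gives the z-score of a single person who scored 105, which is only one third of a standard deviation above the mean.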

  16. Generalization • The z-score maps an arbitrary Gaussian distribution down to a Gaussian distribution with mean 0 and standard deviation 1 • It’s a value that helps us find another value (the p-value)

  17. p-value • Every statistical test has a p-value • The probability, assuming the null hypothesis is true, of an observation at least as extreme as the one you got • In other words, how extreme the observation is relative to others of its kind • z = 1 links to a cumulative probability of .8413, i.e. an upper-tail p-value of .1587 • Not something very extreme

  18. α-level • Every statistical test has an alpha level • The threshold for the p-value below which you reject the null hypothesis in favor of the alternate hypothesis • This defines how you handle the p-value • Equal to the probability of a Type I error (falsely rejecting a true null hypothesis)
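To connect the z-score, the p-value, and the alpha level, here is a minimal sketch using the standard library's `statistics.NormalDist`. The alpha of 0.05 is an assumed, conventional value; the slides do not fix one.

```python
from statistics import NormalDist

z = 1.0        # z-score from the IQ example
alpha = 0.05   # assumed alpha level, not given in the slides

standard_normal = NormalDist(mu=0.0, sigma=1.0)

lower_tail = standard_normal.cdf(z)             # P(Z <= z)
upper_tail = 1.0 - lower_tail                   # P(Z >= z), one-sided p-value
two_sided = 2.0 * min(lower_tail, upper_tail)   # p-value for "differs in either direction"

reject_null = two_sided < alpha

print(round(lower_tail, 4), round(upper_tail, 4), round(two_sided, 4), reject_null)
```

Under either the one-sided or the two-sided reading, the p-value is well above 0.05, so the group of 9 gives no grounds to reject the null hypothesis.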

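  19. T-test • A test for when the population standard deviation is unknown and must be estimated from the sample (typically with small samples) • Behaves just like a z-test, but has a different distribution to work from • The shape of that distribution depends on the degrees of freedom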

  20. ANOVA • A way to test whether or not there is a difference based upon some factor in a study • Partitions variance into sources and uses the ratio as the determining factor
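As a rough illustration of "partitions variance into sources and uses the ratio", the sketch below computes a one-way ANOVA F ratio by hand. The function name `one_way_anova_f` and the three small groups are made up for this example and are not data from the presentation.

```python
from statistics import mean

def one_way_anova_f(groups):
    """F ratio = (between-group mean square) / (within-group mean square)."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Variation explained by the factor: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Leftover variation: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical data: three levels of one factor, three observations each.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_ratio = one_way_anova_f(groups)
print(f_ratio)  # 3.0
```

The F ratio is then compared against an F distribution with (df_between, df_within) degrees of freedom to obtain a p-value; that lookup is left out here because the standard library has no F distribution.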

  21. One-Way ANOVA

  22. Two-Way ANOVA

  23. Example ANOVA

  24. Example • It has always been said that a hitter who bats from the opposite side to the pitcher’s throwing hand will succeed at a higher rate • Does this claim hold water?

  25. Example • Managers frequently set their lineups on the principle that they do not want left-handed hitters back-to-back because a left-handed specialist (almost always an LHP) can be used to get consecutive outs, yet righties are frequently stacked without concern. • Are these managers paranoid, or is there some merit to this?

  26. Sample Set • 30 of the top 75 qualifying hitters for MLB batting titles in 2012 were selected • Top 10 right-handed hitters • Top 10 left-handed hitters • Top 10 switch hitters (both left and right) • Batting average against LHP and batting average against RHP were recorded for each of these 30 hitters

  27. Let’s check it out!

  28. Correlation Test • I got an r-value from a regression that I performed • What does it tell me? • Long story short, it depends on the sample size

  29. Correlation Test Statistic • H0: correlation (r) = ρ • HA: correlation is < or > ρ

  30. Interesting Picture • [Figure: regions of the chart, labeled in order] • Positively Correlated, but could be perfect positive model • Positively Correlated • Not Correlated, but could be perfect positive model • Not Correlated • No Clue • Not Correlated, but could be perfect negative model • Negatively Correlated • Negatively Correlated, but could be perfect negative model

  31. What did we learn? • When dealing with correlation studies, make sure you have at least 13 observations • At this sample size you can distinguish no correlation from the possibility of a perfect model (at 95% confidence) • With more confidence, you will need more observations to achieve this • A little correlation goes a long way in large samples • With small samples, more correlation is required to make a claim
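The last two points can be illustrated with the usual t statistic for testing a correlation against zero, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. This is an assumption on my part: the slides do not show which statistic they used, and the function name `correlation_t` is hypothetical.

```python
from math import sqrt

def correlation_t(r, n):
    """t statistic for H0: correlation = 0, given sample correlation r and sample size n."""
    return r * sqrt((n - 2) / (1.0 - r * r))

# The same modest correlation looks stronger and stronger as n grows.
for n in (5, 13, 50, 500):
    print(n, round(correlation_t(0.3, n), 3))
```

With r held at 0.3, the statistic grows roughly with sqrt(n), which is why a small correlation can be significant in a large sample while a small sample needs a much larger r.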

  32. Assignment • Given the definition of outliers for a population: • Below 25th percentile - 1.5(IQR) • Above 75th percentile + 1.5(IQR) • Determine what the z-scores of the minimum outliers on either side would be • I will send you an ANOVA table: • Tell me the factorial environment • A has a levels • B has b levels • How many subjects per block (n)?
