
Confidence Levels, Dinosaur Footprints, and Humanity’s Quest For Truth


Presentation Transcript


  1. Confidence Levels, Dinosaur Footprints, and Humanity’s Quest For Truth • Chapter 19 Rocks!

  2. Formulaic Consequences • In our formula, fractionwise, z-star is in the top and n is in the bottom. • This makes the width of the interval directly proportional to z-star and inversely related to n. • Technically the width is inversely proportional to the square root of n, but there is still inverse-type stuff going on there.

  3. Formulaic Consequences • This means that when z-star gets bigger, the confidence interval gets wider. • A bigger z-star is the result of a higher confidence level. • This means that the more confident we want to be in our answer, the more generously vague our answer needs to be.

  4. Formulaic Consequences • If we use a lower confidence level, the z-star gets smaller and our interval gets narrower. • We are, however, more likely to be wrong. • This is another reason we use 95% confidence most of the time, since it gives a good balance generally.
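To see the z-star effect in numbers, here is a minimal Python sketch. It assumes "our formula" is the usual one-proportion margin of error, z-star times sqrt(p-hat × (1 − p-hat) / n), which the slides do not spell out. The sample values (p-hat = 0.5, n = 400) and the function names are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def z_star(confidence):
    """Critical value for a two-sided interval at the given confidence level."""
    # A 95% level leaves 2.5% in each tail, so we look up the 97.5th percentile.
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def margin_of_error(p_hat, n, confidence):
    """Assumed formula: z-star * sqrt(p_hat * (1 - p_hat) / n)."""
    return z_star(confidence) * sqrt(p_hat * (1 - p_hat) / n)


# Same (hypothetical) sample, three confidence levels.
for level in (0.90, 0.95, 0.99):
    z = z_star(level)
    margin = margin_of_error(0.5, 400, level)
    print(f"{level:.0%} confidence: z* = {z:.3f}, margin of error = {margin:.4f}")
```

The sample never changes here, only the confidence level does, so any widening of the interval comes entirely from z-star.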

  5. Formulaic Consequences • If n gets larger, the confidence interval is narrower. • This means larger samples let us draw more precise conclusions about the population. • Because n is inside a square root, though, the effect a larger sample has is not constant. • So the larger our sample, the less useful each additional subject is.

  6. Formulaic Consequences • This idea that sooner or later additional subjects are not worth studying is an example of the Law of Diminishing Returns. • What this means for you is that a larger sample makes a narrower (or smaller) confidence interval without changing the confidence level. • It is not a complete fix, however, as samples can only help so much before it becomes ridiculous to sample more.
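The diminishing returns can be sketched the same way. This assumes the one-proportion margin of error, z-star times sqrt(p-hat × (1 − p-hat) / n), with z-star fixed at 1.96 for 95% confidence. The sample sizes and p-hat = 0.5 are hypothetical.

```python
from math import sqrt

Z_STAR_95 = 1.96  # usual critical value for 95% confidence


def margin_of_error(p_hat, n, z=Z_STAR_95):
    """Assumed formula: z * sqrt(p_hat * (1 - p_hat) / n)."""
    return z * sqrt(p_hat * (1 - p_hat) / n)


# Each sample is four times the size of the one before it.
sizes = [100, 400, 1600, 6400]
margins = [margin_of_error(0.5, n) for n in sizes]

for n, m in zip(sizes, margins):
    print(f"n = {n:>5}: margin of error = {m:.4f}")
```

Because n sits under a square root, quadrupling the sample only cuts the margin in half. Going from 100 to 400 subjects buys the same halving as going from 1600 to 6400, which costs far more subjects.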

  7. Confidence Level • Many people assume that a 95% confidence level means that it is 95% likely that your interval includes the true value. • This is wrong! • Wrong, wrong, wrong, wrong, wrong! • Wrong, wrong, wrongity, wrong-wrong!

  8. Dinosaur Footprints • In my Philosophy 101 class, one of the most insightful ideas I caught was about dinosaur footprints. • What I am about to share changed my life forever. • I was almost 2 decades into having an IQ well above other people, having spent my whole life being an information sponge, and this idea still changed me in a way that even now continues to adjust how I see the world.

  9. Dinosaur Footprints • Consider the spot where I am standing. • Did a dinosaur ever occupy that space? • Let’s discuss this.

  10. On Discussion Later… • The simple fact of the matter comes down to this: • Either the answer is yes or the answer is no and there is absolutely no way for me to determine the truth of it without utilizing some kind of superhuman power. • If you have such powers do NOT admit to them in public school.

  11. The Hidden Truth of Dinosaur Footprints • Sometimes in life just because you cannot know if something is true does not mean there is no truth. • Some truths absolutely exist and humans simply have no access to them. • They still exist and still are true. • Even most regular people can appreciate this idea when introduced to it through dinosaur footprints.

  12. Statistics Cannot Prove Anything • Ever. • However, the truth is there, and is just inaccessible to humans through the methods of statistics. • Often a census would be required, and in many cases, even a census is inaccessible to humans.

  13. Back to Confidence Levels • When we make a confidence interval, one of two things has happened. • Option 1: We correctly identified a range that does include the true population value and we have no way to verify it. • Option 2: We did not correctly identify a range that includes the true population value but we have no way to know we failed (other than a census).

  14. Confidence Levels • So on a 95% confidence interval, it is not a 95% chance that we were correct. • We either are correct or we are not. • We have no sensible way to verify it either. • So if it isn’t probability, what significance does the 95% have?

  15. Dumber Than A Census • For that we have to refer back to sampling distributions. • If we took every possible sample, we would have a sampling distribution. • We would also have failed at life, most likely. • If we did the 95% confidence interval on each and every single possible sample, then 95% of them would contain the true population value.
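Nobody can take every possible sample, but a computer can fake a lot of them. The sketch below is a rough simulation, not the full sampling distribution: it assumes a made-up true proportion of 0.75, samples of 200, and the one-proportion interval p-hat ± 1.96 × sqrt(p-hat × (1 − p-hat) / n). All names and numbers are hypothetical.

```python
import random
from math import sqrt


def interval(successes, n, z=1.96):
    """95% interval for a proportion (assumed one-proportion z formula)."""
    p_hat = successes / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def coverage(true_p, n, trials, seed=0):
    """Fraction of simulated intervals that contain the true proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Draw one random sample and count the successes in it.
        successes = sum(rng.random() < true_p for _ in range(n))
        low, high = interval(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / trials


rate = coverage(true_p=0.75, n=200, trials=2000)
print(f"{rate:.1%} of the simulated intervals contain the true value")
```

Each individual interval either contains 0.75 or it does not. The 95% describes how often the method succeeds across many samples, so the printed rate should land close to 95%, not exactly on it.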

  16. Dumber Than A Census • Generating a sampling distribution is dumber than a census. • Generating every possible confidence interval for each sample in that sampling distribution is even dumber than a sampling distribution. • Pretty much it is going from ludicrous speed straight to plaid. • Spaceballs reference.

  17. Confidence Level • So, since we have no way to verify if we succeeded or not with our interval, we do the next best thing. • We confidently trust that 95% is often enough that right now is not the 5%. • In other words, we simply arrogantly assume we are right, because 5% is much less than 95%, plus, since someone as awesome as us was researching, it is obviously correct, right?

  18. Interpreting the Confidence Interval • We will use a script. • Part of the reason this script is so important is that the phrasing bypasses all of this “what does the confidence level really mean” business. • Failing to use the script can make it seem like you do not actually understand what confidence level means.

  19. Interpretation Script • “I am <confidence level> confident that the true <population mean/percent of the population> for <restate the population> is between <lower limit> and <upper limit>.” • Example: “I am 95% confident that the true percent of the population of highland students who oppose school uniforms is between 69.3% and 81.1%.”
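If it helps to see the script as a fill-in-the-blanks template, here is a small Python sketch for the proportion case. The function name and parameters are invented for illustration. It follows the template wording ("for <restate the population>"), so the output differs by one word from the slide's example, which says "of".

```python
def interpret_proportion(confidence, population, lower, upper):
    """Fill in the interpretation script for a proportion interval.

    confidence, lower, and upper are given as decimals (0.95, not 95).
    """
    return (
        f"I am {confidence:.0%} confident that the true percent of the "
        f"population for {population} is between {lower:.1%} and {upper:.1%}."
    )


sentence = interpret_proportion(
    0.95, "highland students who oppose school uniforms", 0.693, 0.811
)
print(sentence)
```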

  20. A Gentlestatisticians’ Agreement • Statisticians have all basically agreed that when someone says they are 95% confident, it does not mean that they are right with a 95% probability. • We have all agreed to simply appreciate that they mean they used a method which is correct 95% of the time and that they have no freaking clue if they are right or wrong.

  21. The 4 Conditions • There are 4 conditions to check. • They are Random, Independent, Less Than 10%, and Large Enough sample. • For proportional data, Large Enough becomes Success/Failure instead. • Some of these conditions are obviously met in a problem, but you are still expected to address them.

  22. Random • For the “The sample was selected randomly” condition, it will generally be addressed in one of three ways. • Option 1: “Random – The problem said it was random.” • Option 2: “Random – The problem did not specify that it was random, but this is a reasonable assumption because <reason>” • Often a good claim is that, since the study was done by experienced researchers, they obviously would have randomized. • Option 3: “Random – The sample fails to be random because <reason>.”

  23. Consequences of Failure • We need the sample to be random because if it is not random, we introduce bias, and that means our sample might be even less representative than a random one. • This makes generalizing it to the population an unreliable process. • Therefore if it is not random, our results are not able to be fully generalized to the population with the same degree of accuracy. • In other words, a 95% confidence interval might actually carry with it even less confidence than the level suggests.

  24. Independent • For the “The subjects of the sample are independent from one another” condition, it will generally be handled just one way. • Option 1: “Independent – It is reasonable to assume independence because one <subject>’s <measurement> does not determine another.” • Ex: “Independent – One person’s preference in hair color does not determine another person’s preference.” • For truly random phenomena such as die rolling and coin flipping, there is another option. • Option 2: “Independent – Coin flips/Rolls of a die are independent.”

  25. Consequences of Failure • Unless there is some reason that the method used is generating truly dependent values, this assumption pretty much always works. • If, however, you were not generating independent values, just like with a randomization failure, the results are less reliable. • Meaning that using the methods of these two units is doomed to failure in this case.

  26. Less Than 10% • When we get an outlier, we can generally just assume that the outlier is unique and is a freak occurrence as it were. • The problem is that in a large enough segment of the population, freak occurrences are going to happen regularly. • Consider the problem about the 200 coin tosses done in a large introductory stats class. • Even though the girl got a very unlikely result (42% and below was only about 1% likely), in such a large class where all results were recorded, we would actually expect someone to get a result like that.
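The coin-toss claim can be checked with a quick sketch. It uses the Normal model for a sample proportion with a fair coin (p = 0.5) and n = 200. The class size of 100 is a pure assumption, since the slide only says the class was large, and it treats each student's tosses as independent.

```python
from math import sqrt
from statistics import NormalDist

n_tosses = 200
p_fair = 0.5

# Normal model for the sample proportion of heads in 200 fair tosses.
model = NormalDist(mu=p_fair, sigma=sqrt(p_fair * (1 - p_fair) / n_tosses))

# Chance that one student sees 42% heads or fewer.
p_low = model.cdf(0.42)

# Hypothetical class size: the slide does not give the real number.
class_size = 100

# Chance that at least one student in the class gets a result that low.
p_any = 1 - (1 - p_low) ** class_size

print(f"One student at 42% or below: {p_low:.3f}")
print(f"At least one of {class_size} students: {p_any:.3f}")
```

A result that is about 1% likely for one person becomes more likely than not once enough people try it.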

  27. Less Than 10% • To go from 10% to 100%, we must multiply by 10. • So, for this condition, we first need to take the sample size and multiply it by 10. • We meet the condition if the population size is presumably larger than that. • We do not meet the condition if it is not. • Consider Mr. Sanford testing 16 bags of microwave popcorn to see what percent of the sample pops at least 99% of the kernels when popped for 2 minutes.

  28. Less Than 10% • There would be two ways to handle this. • Option 1: “Less than 10% - There are more than 160 popcorn bags out there.” • Option 1 Alternative (Use sass): “Less than 10% - There are more than 160 popcorn bags in my basement alone.” • Option 2: “Less than 10% - The number of popcorn bags is less than 160.” • Keep in mind this would mean…the full total.
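The check itself is one multiplication and one comparison. Here is a minimal sketch with an invented function name. The population figures are guesses for illustration, since nobody actually knows how many popcorn bags exist.

```python
def less_than_ten_percent(sample_size, population_size):
    """True when the sample is less than 10% of the population."""
    # Multiply the sample by 10; the population must be larger than that.
    return population_size > 10 * sample_size


sample = 16  # bags of popcorn tested
print("Population must exceed:", 10 * sample)

# Hypothetical population sizes, just to show both outcomes.
print(less_than_ten_percent(sample, 1_000_000))  # plenty of bags out there
print(less_than_ten_percent(sample, 150))        # fewer than 160 bags in total
```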

  29. Consequences of Failure • If there is noticeable skew or outliers in a sample that is more than 10% of the population, it does not fully normalize. • For proportional data, our methods similarly break down and confidence does not mean exactly the same thing. • This condition is, however, the least harmful to violate, as its effect is smaller on the validity of our processes.

  30. Large Enough – Success/Failure • If n times p is more than 10 and n times the complement of p is more than 10, you meet this condition. • You are expected to show the product of both multiplications. • I do not expect you to write a sentence for this. • I feel like the multiplication speaks for itself. • You do, however, need to say “Success/Failure – ” or “Large Enough – ” before you start the multiplying so that I know you know you are doing it to check this condition.
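Here is a minimal sketch of the two multiplications. The function name is invented, and the values of n and p are hypothetical examples, not taken from any assigned problem. It uses "more than 10" as the cutoff, as stated on the slide.

```python
def success_failure(n, p, cutoff=10):
    """Return both products and whether each is more than the cutoff."""
    successes = n * p          # n times p
    failures = n * (1 - p)     # n times the complement of p
    return successes, failures, successes > cutoff and failures > cutoff


# Hypothetical examples: one that passes and one that fails.
for n, p in [(200, 0.5), (16, 0.75)]:
    successes, failures, ok = success_failure(n, p)
    verdict = "met" if ok else "not met"
    print(f"Success/Failure: {n}({p}) = {successes:g}, "
          f"{n}({1 - p:g}) = {failures:g}, condition {verdict}")
```

The second example fails because only one of the two products clears the cutoff, and both have to.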

  31. Consequences of Failure • If you fail this condition, rather than using z scores to find probability, you need to construct the entire binomial distribution in order to discuss confidence intervals or hypothesis tests. • I will not go over how to do this since you will not be expected to actually manage this.

  32. When We Fail • In a more perfect world, if we failed a condition, we might fix the failure and resample. • On the homework problems, quiz questions, and test questions we will instead go ahead with the rest of the process anyway. • In the real world if you failed an assumption like that you would either use a model that worked around it or you would not be able to continue further.

  33. Assignments • Chapter 18 problems (5, 17, 25, and 27) are due tomorrow. • Chapter 19 problems (7, 11, 13, 15, 23) are due Tuesday. • Chapters 18+19 quiz will be Tuesday. • There is a practice quiz online. • Midterm project presentations will be next week.

  34. Quiz Bulletpoints • Be able to use z-scores to find probabilities for individuals. • Be able to use z-scores to find probabilities for sample averages. • Be able to use z-scores to find probabilities for sample proportions. • Be able to find a confidence interval for the true proportion based on a sample. • Be able to find the sample size in order to get a desired margin of error.
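For the last bulletpoint, here is a hedged sketch of the sample size calculation. It assumes the one-proportion margin of error, z-star times sqrt(p × (1 − p) / n), solved for n, and it uses p = 0.5 as the cautious guess when no estimate is available. The function name and the target margins are made up for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size(margin, confidence=0.95, p_guess=0.5):
    """Smallest n whose margin of error is at most `margin`.

    Solves margin = z * sqrt(p(1 - p) / n) for n, then rounds up,
    because you cannot survey a fraction of a person.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z / margin) ** 2 * p_guess * (1 - p_guess))


print(sample_size(0.05))  # margin of error of 5 percentage points
print(sample_size(0.03))  # margin of error of 3 percentage points
```

Always round up, never down: rounding down would leave the margin of error slightly larger than the one you wanted.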
