Please turn off cell phones, pagers, etc.


Welcome to Stat 100.001
Joe Schafer
Associate Professor
Department of Statistics and The Methodology Center
The Pennsylvania State University

Course website All official information pertaining to this course (syllabus, announcements, lecture notes, assignments, etc.) is posted at the website: http://www.stat.psu.edu/~jls/stat100/ Be sure to visit this website early and often.

Textbook Utts, J. M. (2005) Seeing Through Statistics, Third edition. Duxbury Press. We will follow this book closely, covering about one chapter per lecture. Additional material will be presented by the instructor as needed. Students are responsible for all material covered in the lectures, whether or not it can be found in the textbook.

Quizzes Near the end of every lecture, we will conduct a random experiment to determine whether or not a quiz will take place (probability between 1/3 and 2/3). If “yes,” you will receive a short quiz.
• quiz based solely on that day’s lecture
• no make-up quizzes for any reason
• missed quiz will be scored as zero
• ten best quiz scores will contribute to your final grade; remaining scores will be dropped

Exams Three in-class exams will be given during the semester (dates to be announced soon). The final exam will be given during finals week.
• no make-up exams for any reason
• an exam may be missed only by prior written request and approval by the instructor

Homework Regular homework exercises will be assigned (approximately 5 problems per lecture).
• not collected or graded
• solutions to some problems available in the book or posted on the website
• each exam will include some questions derived from the homework

Grading Scheme Overall grade based on a possible 500 points:
• ten best quizzes worth 50 points (10%)
• three in-class exams worth 100 points (20%) each
• final exam worth 150 points (30%)
Extra credit problems (worth 1 point each) may occasionally appear on the course website. Answers must be emailed to the instructor by the specified deadline.
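The point totals above add up as 10 × 5 + 3 × 100 + 150 = 500. The sketch below shows that arithmetic, including dropping all but the ten best quizzes. It assumes that each quiz is scored out of 5 points, which the slide does not state directly. The function name `course_total` and the sample scores are hypothetical, and extra credit is not modeled.

```python
# Hypothetical sketch of the grading arithmetic described above.
# Assumption (not stated on the slide): each quiz is scored out of 5 points,
# so the ten best quizzes add up to the 50 quiz points.

QUIZ_POINTS = 5          # assumed maximum per quiz
BEST_QUIZZES_KEPT = 10   # only the ten best quiz scores count
EXAM_POINTS = 100        # each of the three in-class exams
FINAL_POINTS = 150       # final exam


def course_total(quiz_scores, exam_scores, final_score):
    """Return the total out of 500, keeping only the ten best quizzes.

    Missed quizzes should be passed in as 0, matching the policy that a
    missed quiz is scored as zero. Extra credit is not included.
    """
    if len(exam_scores) != 3:
        raise ValueError("expected three in-class exam scores")
    best = sorted(quiz_scores, reverse=True)[:BEST_QUIZZES_KEPT]
    return sum(best) + sum(exam_scores) + final_score


# Maximum possible total: 10*5 + 3*100 + 150
MAX_TOTAL = BEST_QUIZZES_KEPT * QUIZ_POINTS + 3 * EXAM_POINTS + FINAL_POINTS
print("maximum possible:", MAX_TOTAL)

# Made-up example: twelve quizzes given (two missed, scored 0).
quizzes = [5, 4, 0, 3, 5, 5, 2, 4, 5, 1, 0, 5]
print("total:", course_total(quizzes, [85, 90, 78], 130))
```

In this made-up example, the two zeros are among the dropped scores. Only the ten highest quiz scores (summing to 39) enter the total.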

Course rules
• Don’t disrupt the learning environment.
• Pay attention in class.
• You are responsible for all material and announcements given in class and/or posted on the website.
For a complete list of rules and policies, see the syllabus posted online at http://www.stat.psu.edu/~jls/stat100

Office Hours
• Monday and Wednesday 2-3:30 pm in 422A Thomas
• By appointment at The Methodology Center (204 E. Calder Way, Suite 400)

Lecture 1 This lecture is not strictly based on material in the textbook, but it covers some of the same concepts that appear in Chapter 1. We will cover Chapter 1 next time. Topic for today: “The Three C’s and the Three R’s”
• co-occurrence, causation, counterfactuals
• replication, randomization, representativeness

Co-occurrence and causation Many questions in scientific research are of the following nature:
• We observe something (A) together with something else (B). (co-occurrence)
• We want to know, “Does A cause B?” (causation)
Causation usually requires co-occurrence, but co-occurrence does not automatically imply causation.

Example 1 Light switch You walk into a room and flip the switch on the wall. The lights come on. Can you immediately conclude that A (flipping the switch) caused B (the lights go on)? No, because you cannot be sure that the lights would not have gone on anyway.

Example 2 Global warming and atmospheric CO2 Average surface temperatures around the world have apparently been increasing. Levels of carbon dioxide in the atmosphere have also been increasing. Can we conclude that increasing CO2 levels cause global warming? No, because we cannot be sure that global warming would not have taken place anyway.

Meaning of causation Suppose A and B co-occur. What does it mean for A to have caused B? “A caused B” means that B would not have occurred if A had not occurred. Or, more generally, it means that B would have been less likely to occur if A had not occurred.

Counterfactual outcomes Causation, therefore, is really a statement about counterfactual outcomes. If A and B occur together, the counterfactual outcome is what would have happened (either B or not B) if A had not occurred.
• Counterfactual outcomes cannot be observed. We cannot simultaneously observe what happens under A and under not A.
• Yet, using careful experimentation and statistical analysis, we can sometimes infer or prove beyond a reasonable doubt that A causes B. How?

The Three R’s To infer or prove causation in a scientifically defensible manner, we rely on three R’s:
• replication: use multiple subjects or experimental units
• randomization: randomly select some units to receive A, and the others to receive not A
• representativeness: make sure that the units in your experiment are similar to (representative of) the population to which you want to generalize

Example 1 (continued) Light switch experiment Designate ten moments in time. Randomly select five and flip the switch at those times, but not the others. Suppose you observe that the light is on at the five moments at which you flipped the switch, and off at the other five moments. This is powerful evidence that flipping the switch caused the light to go on.
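The randomization step of this experiment can be sketched in a few lines of Python. This is only an illustration, assuming Python's standard `random` module as the source of randomness; any fair lottery (drawing five of ten slips from a hat) does the same job. The function name `assign_switch_flips` and the seed value are hypothetical.

```python
import random


def assign_switch_flips(n_moments=10, n_flips=5, seed=None):
    """Randomly choose which of the designated moments receive A
    (flip the switch). Returns the chosen moment indices (0-based), sorted.
    Every set of n_flips moments is equally likely to be chosen."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_moments), n_flips))


flip_times = assign_switch_flips(seed=2024)  # treatment A
control_times = [t for t in range(10) if t not in flip_times]  # not A
print("flip switch at:", flip_times)
print("leave switch alone at:", control_times)
```

The key design point is that the choice of moments comes from a chance device, not from the experimenter. So the experimenter cannot, even unconsciously, pick moments when the lights were already likely to come on.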

Analysis Number of ways to select five units (without replacement) out of ten = 252. If the lights were going to be on at five of the ten moments regardless of what you did, the probability that they would have been on at exactly the five moments you selected is 1/252 = .00397 = 0.397%. Beyond a reasonable doubt!
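The count of 252 and the probability 1/252 can be checked by brute force: list every possible way of choosing the five "flip" moments and count how many line up exactly with the moments at which the lights were on anyway. In the sketch below, the particular set `lights_on` is a made-up choice. By symmetry, any set of five moments gives the same answer.

```python
from itertools import combinations

N_MOMENTS = 10
N_FLIPS = 5

# "No effect" scenario: the lights were going to be on at some fixed set of
# five moments no matter what we did. This particular set is an arbitrary,
# made-up choice; any set of five moments gives the same probability.
lights_on = frozenset({0, 2, 3, 7, 9})

# Every way we could have randomly chosen the five "flip" moments.
all_selections = list(combinations(range(N_MOMENTS), N_FLIPS))

# How many of those choices match the "lights on" moments exactly?
matches = sum(1 for sel in all_selections if frozenset(sel) == lights_on)

prob = matches / len(all_selections)
print(f"{len(all_selections)} possible selections, {matches} perfect match, p = {prob:.5f}")
```

Because every selection is equally likely under randomization, the probability is simply (number of matching selections) / (number of possible selections).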

For now, the details of this analysis are not important. The important thing is that we were able to get strong evidence that A (flip switch) caused B (lights on) by using replication (multiple moments in time) and randomization (randomly selecting the moments at which to apply the treatment A). In order to conclude that A causes B, not just at those ten moments but in general (all the time), we must believe that the ten moments in time were not unusual, i.e., that they are representative.

Why the three R’s are important Why replication? Using multiple subjects or experimental units reduces the probability that the effects we observe are a product of mere chance. Probability of a perfect match by chance alone, when half of the units are randomly selected to receive A:
• 1 out of 2: 1/2 = .5
• 5 out of 10: 1/252 = .00397
• 10 out of 20: 1/184756 = .00000541
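The three probabilities in the list all come from the same formula. With 2k units and k of them randomly selected for A, there are C(2k, k) equally likely selections, and only one matches perfectly by chance. A short sketch using Python's `math.comb` (the helper name `chance_of_perfect_match` is hypothetical):

```python
from math import comb


def chance_of_perfect_match(k):
    """Probability that randomly selecting k of 2k units for treatment A
    lines up exactly with a pre-existing pattern, assuming A has no effect."""
    return 1 / comb(2 * k, k)


for k in (1, 5, 10):
    print(f"{k} out of {2 * k}: 1/{comb(2 * k, k)} = {chance_of_perfect_match(k):.3g}")
```

Doubling the number of units from 10 to 20 shrinks this probability by a factor of more than 700, which is the point of replication.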

Why the three R’s are important Why randomization? Using randomization to select the units to receive the alternative treatments ensures that the units receiving A are, on average, no different from those receiving not A. Without randomization, the two groups could be systematically different even before the treatments are applied. Then we could not be sure that the different responses in the two groups were not caused by these prior differences.
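A small simulation can make "on average, no different" concrete. The sketch below uses made-up baseline measurements for 20 hypothetical units (for example, body weight before any treatment) and compares two assignments. The first is a non-random assignment that gives A to the 10 lightest units. The second is random assignment, repeated many times. The names `group_mean` and `average_imbalance`, the data, and the number of repetitions are all illustrative assumptions.

```python
import random


def group_mean(values, indices):
    """Mean of the values at the given positions."""
    return sum(values[i] for i in indices) / len(indices)


def average_imbalance(baseline, n_reps=10_000, seed=1):
    """Split the units at random into two equal halves (A and not A)
    n_reps times, and return the average over all splits of
    (mean baseline in A) - (mean baseline in not A)."""
    rng = random.Random(seed)
    n = len(baseline)
    idx = list(range(n))
    total = 0.0
    for _ in range(n_reps):
        rng.shuffle(idx)
        a, not_a = idx[: n // 2], idx[n // 2 :]
        total += group_mean(baseline, a) - group_mean(baseline, not_a)
    return total / n_reps


# Hypothetical baseline measurements for 20 units, lightest to heaviest.
baseline = [52, 55, 58, 60, 61, 63, 65, 66, 68, 70,
            71, 73, 75, 78, 80, 82, 85, 88, 91, 95]

# Non-random assignment: the 10 lightest units get A, the 10 heaviest get not A.
fixed_split_diff = group_mean(baseline, range(10)) - group_mean(baseline, range(10, 20))

# Random assignment, averaged over many possible randomizations.
random_split_diff = average_imbalance(baseline)

print(f"non-random split: groups differ by {fixed_split_diff:.1f} before treatment")
print(f"random splits:    average difference {random_split_diff:.2f}")
```

With the non-random split, the two groups differ by 20 units before any treatment is applied, so any later difference in responses could be due to that prior difference. Under random assignment, the difference averages out close to zero. Any single randomization can still be somewhat unbalanced, which is another reason replication matters.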

Why the three R’s are important Why representativeness? Representativeness allows you to generalize from the units in your study to a broader population. Studies with units that are not representative of the population may be criticized for lacking “external validity.”

Example 2 (continued) Global warming and atmospheric CO2 Should we conclude that increasing CO2 levels have caused global warming? From the standpoint of the three R’s, this will be very difficult to prove or disprove, because it is based on a sample of just one unit (our world). Nevertheless, it is an important question!

Today’s Quiz (not given)
(a) 3 points: List the three R’s.
(b) 2 points: “Last year, I decided to go on a diet (A). Now I weigh more than I did one year ago (B). Therefore, dieting made me fat.” Is this conclusion correct? What is the counterfactual outcome?

See you on Wednesday.
