
Stat 155, Section 2, Last Time

Stat 155, Section 2, Last Time. Linear Regression Fit a line to data Least Squares Prediction Residual Diagnostic Plot Producing Data How to Sample? History of Presidential Election Polls. Reading In Textbook. Approximate Reading for Today’s Material: Pages 198-210, 218-225


Presentation Transcript


  1. Stat 155, Section 2, Last Time • Linear Regression • Fit a line to data • Least Squares Prediction • Residual Diagnostic Plot • Producing Data • How to Sample? • History of Presidential Election Polls

  2. Reading In Textbook Approximate Reading for Today’s Material: Pages 198-210, 218-225 Approximate Reading for Next Class: Pages 231-240, 256-257

  3. Common Problem Adding lines to an Excel Plot E.g. Textbook problem 2.17 • Plot Data • Add line with “Add trendline” • Add line: y = 35+.5x • Explicitly add least squares fit line
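Outside Excel, the comparison on the slide above (a given line such as y = 35 + .5x versus the least squares line) can be sketched in Python. This is a minimal illustration, not the textbook solution: the data values are made up (they are NOT the data from problem 2.17), and the function names are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data for illustration only (not from the textbook).
xs = [10, 20, 30, 40, 50]
ys = [41, 44, 52, 54, 61]

a, b = least_squares_line(xs, ys)
ssr_fit = sum_squared_residuals(xs, ys, a, b)
ssr_given = sum_squared_residuals(xs, ys, 35, 0.5)  # the line y = 35 + .5x

print(f"least squares line: y = {a:.2f} + {b:.2f}x")
print(f"sum of squared residuals: fit = {ssr_fit:.2f}, given line = {ssr_given:.2f}")
```

By construction, no other line can have a smaller sum of squared residuals than the least squares line, so the first number printed on the last line is never larger than the second.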

  4. Chapter 3: Producing Data (how this is done is critical to conclusions) Section 3.1: Statistical Settings 2 Main Types: • Observational Study • Designed Experiment

  5. Producing Data 2 Main Types: • Observational Study • Experiment (Make Changes, & Study Effect) Apply “treatment” to individuals & measure “responses” e.g. Clinical trials for drugs (safe? effective?), agricultural trials (max yield?)

  6. Producing Data 2 Main Types: • Observational Study • Experiment (common sense) Caution: Thinking is required for each. Both if you do statistics & if you need to understand somebody else’s results

  7. Helpful Distinctions (Critical Issue of “Good” vs. “Bad”) I. Observational Studies: A. Anecdotal Evidence Idea: Study just a few cases Problem: may not be representative (or worse: chosen precisely because they are unusual) e.g. Cures for hiccups Key Question: how were data chosen? (early medicine: this gave crazy attempts at cures)

  8. Helpful Distinctions I. Observational Studies: B. Sampling Idea: Seek sample representative of population Challenge: How to sample? (turns out: not easy)

  9. How to sample? History of Presidential Election Polls During Campaigns, constantly hear in news “polls say …” How good are these? Why? • Landon vs. Roosevelt (1936) Literary Digest Poll: 43% for R Result: 62% for R What happened? Sample size not big enough? No: 2.4 million, the biggest poll ever done (before or since)

  10. Bias in Sampling Bias: Systematically favoring one outcome (need to think carefully) Selection Bias: Addresses from L. D. readers, phone books, club memberships (representative of population?) Non-Response Bias: Return-mail survey (who had time?)

  11. How to sample? • Presidential Election (cont.) Interesting Alternative Poll: Gallup: 56% for R (sample size ~ 50,000) Gallup’s prediction of the L. D. result: 44% for R (sample size ~ 3,000) Gallup predicted both the correct winner (actual: 62% for R) and the L. D. error (actual L. D. figure: 43% for R)! (what was better?)

  12. Improved Sampling Gallup’s Improvements: (i) Personal Interviews (attacks non-response bias) (ii) Quota Sampling (attacks selection bias)

  13. Quota Sampling Idea: make “sample like population” So surveyor chooses people to give: • Right % male • Right % “young” • Right % “blue collar” • … This worked well, until …

  14. How to sample? • Dewey vs. Truman (1948)
                 Dewey   Truman   Sample size
      Crossley    50%     45%
      Gallup      50%     44%      50,000
      Roper       53%     38%      15,000
      Actual      45%     50%      -
      Note: Embarrassing for polls; famous photo of Truman holding up the headline “Dewey Defeats Truman”

  15. What went wrong? Problem: Unintentional Bias (surveyors understood bias, but still made choices) Lesson: Human Choice cannot give a Representative Sample Surprising Improvement: Random Sampling Now called “scientific sampling” Random = Scientific???

  16. Random Sampling Key Idea: “random error” is smaller than “unintentional bias”, for large enough sample sizes How large? Current sample sizes: ~1,000 - 3,000 Note: now << 50,000 used in 1948. So surveys are much cheaper (thus many more done now….)
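A small simulation can illustrate the key idea on the slide above. It is only a sketch under stated assumptions: the 62% true share and the 19-point shortfall mirror the 1936 figures quoted earlier, while modeling bias as a fixed shift in each respondent's probability, the function name simulate_poll, and the scaled-down biased sample of 200,000 are illustrative choices, not part of the lecture.

```python
import random


def simulate_poll(true_share, sample_size, bias=0.0, rng=None):
    """Simulate a poll result as a fraction of the sample.

    Each respondent supports the candidate with probability
    true_share + bias. bias = 0 models a simple random sample;
    a nonzero bias models a sampling method that systematically
    favors one outcome (an assumed, simplified model).
    """
    rng = rng or random.Random()
    p = true_share + bias
    votes = sum(1 for _ in range(sample_size) if rng.random() < p)
    return votes / sample_size


rng = random.Random(155)  # fixed seed so the run is repeatable
true_share = 0.62

# Huge sample, but drawn with a systematic bias (assumed: -19 points).
big_biased = simulate_poll(true_share, 200_000, bias=-0.19, rng=rng)

# Small sample, but drawn at random (no bias).
small_random = simulate_poll(true_share, 1_000, rng=rng)

print(f"true share:            {true_share:.3f}")
print(f"biased, n = 200,000:   {big_biased:.3f}")
print(f"random, n = 1,000:     {small_random:.3f}")
```

Increasing the biased sample's size only makes it settle more tightly around the wrong value; the random sample's error shrinks toward zero as its size grows.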

  17. Random Sampling How Accurate? • Can (& will) calculate using “probability” • Justifies term “scientific sampling” • 2nd improvement over quota sampling

  18. And now for something completely different Recall Distribution of majors of students in this course:

  19. And now for something completely different A man goes into a drugstore and asks the pharmacist if he can give him something for the hiccups. The pharmacist promptly reaches out and slaps the man's face. "What did you do that for?" the man asks.

  20. And now for something completely different "What did you do that for?" the man asks. "Well, you don't have the hiccups anymore, do you?" The man says, "No, but my wife out in the car still does!"

  21. And now for something completely different An elderly woman went into the doctor's office. When the doctor asked why she was there, she replied, "I'd like to have some birth control pills." Taken aback, the doctor thought for a minute and then said, "Excuse me, Mrs. Smith, but you're 75 years old. What possible use could you have for birth control pills?" The woman responded, "They help me sleep better."

  22. And now for something completely different The woman responded, "They help me sleep better." The doctor thought some more and continued, "How in the world do birth control pills help you to sleep?" The woman said, "I put them in my granddaughter's orange juice and I sleep better at night."

  23. Random Sampling How Accurate? • Can (& will) calculate using “probability” • Justifies term “scientific sampling” • 2nd improvement over quota sampling
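As a preview of the “how accurate?” calculation, here is a hedged sketch using the standard normal-approximation margin of error for a proportion from a simple random sample. The formula is a textbook result rather than something derived in these slides, and the function name margin_of_error, the worst-case choice p = 0.5, and the 95% multiplier 1.96 are assumptions of this sketch.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    n: sample size (simple random sample assumed)
    p: assumed true proportion (0.5 is the worst case)
    z: normal multiplier (1.96 for about 95% confidence)
    """
    return z * math.sqrt(p * (1 - p) / n)


# Sample sizes mentioned in the slides: today's ~1,000 - 3,000 vs. 50,000 in 1948.
for n in (1_000, 3_000, 50_000):
    print(f"n = {n:>6}: about +/- {100 * margin_of_error(n):.1f} percentage points")
```

The margin shrinks only like one over the square root of n, which is one way to see why a random sample of a few thousand is usually considered enough.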

  24. Random Sampling What is random? Simple Random Sampling: Each member of population is equally likely to be in sample Key Idea: Different from “just choose some”

  25. Random Sampling An old (but still fun?) experiment: Choose a number among 1,2,3,4 Old typical results: about 70% choose “3” (perhaps you have seen this before…) Main lesson: human choice does not give “equally likely” (i.e. random sample)

  26. Random Sampling How to choose a random sample? Old Approaches: • Random Number Table • Roll Dice Modern Approach: • Computer Generated

  27. Random Sampling EXCEL generation of random samples: http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155Eg16.xls Goal 1: Generate Random Numbers EXCEL approaches: • RAND function • Tools → Data Analysis → Random Number Generation

  28. EXCEL Random Sampling Goal 2: Randomly Reorder List EXCEL approach: • Highlight block with list & random num’s • Sort whole thing on numbers Goal 3: Random Sample from List • Choose 1st subset from random re-order • Works since each item is equally likely to land in each spot
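The same “attach random numbers, sort, take the first few” recipe can be sketched in Python. The names list and the function names below are hypothetical, chosen only for illustration; the seed is fixed so the run is repeatable.

```python
import random


def random_reorder(items, rng=None):
    """Randomly reorder a list by sorting on attached random numbers."""
    rng = rng or random.Random()
    keyed = [(rng.random(), item) for item in items]  # one random number per item
    keyed.sort(key=lambda pair: pair[0])              # sort the whole block on the numbers
    return [item for _, item in keyed]


def random_sample(items, k, rng=None):
    """Simple random sample of size k: first k items of a random reorder."""
    return random_reorder(items, rng)[:k]


# Hypothetical list of names.
names = ["Ann", "Bo", "Cy", "Di", "Ed", "Flo", "Gus", "Hal"]

rng = random.Random(2007)  # fixed seed for a repeatable run
order = random_reorder(names, rng)
sample = random_sample(names, 3, rng)

print("random order: ", order)
print("random sample:", sample)
```

Python's standard library also offers random.shuffle and random.sample for the same jobs; the version above is written to mirror the spreadsheet steps.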

  29. EXCEL Details RAND: • Not available among “Statistical” functions • But can find on “All” menu • Note no (explicit) inputs • Just put in desired cell • Drag downwards for several random #s • Caution: these change on each recomputation • Thus not recommended for choosing a random sample

  30. EXCEL Details Tools → Data Analysis → Random Number Generation: • Set: # Variables: 1 Distribution: Uniform (over [0,1]) • Generates Fixed List (doesn’t change with re-computation) (note entries are “just numbers”) • Thus stable for later interpretation • Recommended for random sample choice

  31. EXCEL Details Sorting Lists: • Highlight Block with Both: • Names to sort • Random numbers • Data → Sort → Choose Column • Result is random re-ordering of List

  32. Random Sampling HW HW: C7: For the letters A – L, use EXCEL to: (a) Put in a random order. (b) Choose a random sample of 6. (Hints: for (a), want each equally likely, for (b), reorder, and choose a subset)

  33. Random Sampling HW Interesting Question: What is the % of Male Students at UNC? (Your chance of a date, or take 100% minus this to get your chance) HW: C8: Print Class Handout http://stat-or.unc.edu/webspace/postscript/marron/Teaching/stor155-2007/Stor155HWC8.doc

  34. Random Sampling HW Notes on HW C8: • 3 dumb ways to sample, 1 good one • Goal is to learn about sampling, Not “get right answer” • Part 1, put symbol for yourself, Ms and Fs for others • Put both count & % (% = 100 × count / 25) • Part 2, “tally” is: • Part 4, student phone directory available in Student Union?

  35. Random Sampling HW Notes on HW C8, • Hints on Part 4: • For each draw, first draw a “random page” • Tools → Data Analysis → Random Number Generation → Uniform is one way to do this • In “Uniform”, you need to set “Parameters” to 0 and “number of pages”. • This gives a random decimal; to get an integer, round up using CEILING • In CEILING, set “significance” to 1.
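The “uniform decimal, then round up” step for drawing a random page can be sketched in Python as follows. The function name random_page and the page count of 250 are made up for illustration; math.ceil plays the role of CEILING with significance 1.

```python
import math
import random


def random_page(number_of_pages, rng=None):
    """Pick a page number from 1 to number_of_pages, each equally likely."""
    rng = rng or random.Random()
    u = rng.uniform(0, number_of_pages)  # random decimal between 0 and number_of_pages
    page = math.ceil(u)                  # round up to the next integer
    return max(page, 1)                  # guard: u == 0.0 exactly would round to page 0


rng = random.Random(8)  # fixed seed for a repeatable run
pages = [random_page(250, rng) for _ in range(10_000)]  # 250 pages is hypothetical

print("first five draws:", pages[:5])
print("smallest and largest page drawn:", min(pages), max(pages))
```

Rounding up (rather than to the nearest integer) matters: it gives every page an interval of the same length, so first and last pages are as likely as those in the middle.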

  36. Random Sampling HW Notes on HW C8, • Hints on Part 4 (cont.): • Next Choose Random Column • Next Choose Random Name • Caution: Different numbers of names on each page. • Challenge: still make each name equally likely • Approach: draw a position using the larger (largest) count • Approach: when that position is not there, just toss it out • Approach: then do a “redraw” • Also redraw if can’t tell gender
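The “choose the larger number, toss out, redraw” approach can be sketched as below. This is one reading of the hints, not the official solution: the tiny directory is invented, the nested-list layout (pages, then columns, then names) is an assumption, and draw_name is a hypothetical helper.

```python
import random
from collections import Counter


def draw_name(directory, rng=None):
    """Draw one name so that every name in the directory is equally likely.

    directory: list of pages; each page is a list of columns;
    each column is a list of names (lengths may differ).
    """
    rng = rng or random.Random()
    max_cols = max(len(page) for page in directory)
    max_rows = max(len(col) for page in directory for col in page)
    while True:
        p = rng.randrange(len(directory))  # random page
        c = rng.randrange(max_cols)        # random column, using the largest column count
        r = rng.randrange(max_rows)        # random position, using the largest name count
        if c < len(directory[p]) and r < len(directory[p][c]):
            return directory[p][c][r]
        # Nothing at that spot: toss this draw out and redraw from scratch.


# Invented directory: page 1 has columns of 3 names and 1 name; page 2 has one column of 2.
directory = [[["A", "B", "C"], ["D"]], [["E", "F"]]]

rng = random.Random(36)  # fixed seed for a repeatable run
counts = Counter(draw_name(directory, rng) for _ in range(12_000))
print(sorted(counts.items()))
```

Without the toss-and-redraw step, names on sparse pages would be favored: picking a page, then a column on it, then a name in it would select "E" three times as often as "A" in this small example.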

  37. More On Surveys More Common Sense: How you ask the question makes a big difference HW: 3.57, 3.59

  38. And Now for Something Completely Different Extreme Bicycling Need a bicycle helmet there?

  39. And Now for Something Completely Different

  40. And Now for Something Completely Different

  41. And Now for Something Completely Different

  42. And Now for Something Completely Different

  43. More about Sampling The “simple random sample” (recall “each equally likely”) can be expensive (e.g. nationwide political poll, collected by personal interview) So there are many cheaper variations: • Stratified Sampling • Multi Stage Sampling • See text • And there are many others as well

  44. Sampling for Experiments II. Experiments (Recall: I was Observational Studies; now take a similar look at II) Terminology: “treatments” are applied to “individuals” i.e. to “subjects” i.e. to “experimental units”

  45. Sampling for Experiments A “treatment” is: a combination of “levels”, of explanatory variables (quantities), called “factors”. E.g. Medicine, Agriculture, …

  46. Sampling for Experiments Agriculture Example: Study how plant growth depends on: fertilizer and water So plants = “experiment’l units”, i.e. “subjects” “Factors” are fertilizer and water, Each plant gets some “level” of each.
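To make the terminology concrete, the sketch below lists every treatment as a combination of one level per factor. The factors (fertilizer and water) come from the slide; the particular levels are invented for illustration.

```python
from itertools import product

# Factors from the agriculture example; the levels are hypothetical.
factors = {
    "fertilizer": ["none", "low", "high"],
    "water": ["dry", "wet"],
}

# A treatment is one level of each factor, so the full list of
# treatments is every combination of levels.
treatments = [
    dict(zip(factors, combo)) for combo in product(*factors.values())
]

for number, treatment in enumerate(treatments, start=1):
    print(number, treatment)
```

With 3 levels of one factor and 2 of the other there are 3 × 2 = 6 treatments; each plant (experimental unit) receives exactly one of them.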

  47. HW on Sampling Terminology HW: 3.9 3.11

  48. Design of Experiments The “design” of an experiment is the assignment of levels and treatments to experimental units (just as “choice of sample” was critical for sampling, this is too. There is a huge literature on this, including current research)

  49. Design of Experiments Key Design Issues: • Control Idea: Eliminate “lurking variable” effects, by comparing treatments on groups of similar experimental units.

  50. Controlled Experiments Common Type: compare “treatment” with “placebo”, a “sham treatment” that controls for psychological effects (think you are better, just because you are treated, so you are better…) Called a “blind” experiment
