
Statistics is the study of how to: Collect Organize Analyze Interpret reliable data


dextra


Presentation Transcript


  1. Statistics is the study of how to:
     • Collect
     • Organize
     • Analyze
     • Interpret
     reliable data (pg. 4).
     Much better: Statistics is the science of reasoning in the face of uncertainty.

  2. Statistical Inference (covered in detail in Chap 9)
     • Without inference (which includes hypothesis testing), statistics is essentially bookkeeping
       • The technical phrase for this is Descriptive Statistics
       • The word “statistics” originally meant “state data” such as records of births, marriages, deaths, etc. This is the same sense in which we say “to become a statistic”
     • When combined with probability, data can become predictive: What do we expect will happen in situations we haven’t yet studied?
       • The technical phrase for this is Inferential Statistics
     • Statistical inference is never certain:
       • We can be 95% sure that a new medication is effective but we can never be 100% sure.
       • We can convict someone “beyond a reasonable doubt” but not beyond any possible doubt.
     A statistical hypothesis is a statement about the probabilities and/or parameters of a population of interest.
     Often a statistical hypothesis is put forth precisely in order to reject it, by showing through experimental tests that it is extremely improbable.

  3. The Three Languages of Statistics
     Ordinary vernacular: Most statistics questions are word problems.
     Technical English: Many apparently ordinary words are used in statistics but actually have a very precise, specific sense which must be learned. Examples: Success, average, probable, rare, extreme, expected.
     Mathematical Symbols: Often using the Greek alphabet (μ, σ) or the English alphabet with subscripts (xᵢ) or overbars (x̄).
     To learn statistics, you have to learn all three languages and practice going back and forth between them. You need to use both sides of your brain as well as simple common sense.
     The key: Homework, homework, homework!

  4. Mathematical Advice
     • Greek letters are often used. You should force yourself to write and say them correctly so you will have a sense of mastery:
       • μ the small Greek m, pronounced “myoo” (Not: “The little u thing”)
       • σ the small Greek s, pronounced “sigma” (Not: “That thing that looks like an o”)
       • Σ the uppercase form of σ
     • Many subscripts on letters are used. They qualify the letter. Think of something like xᵢ as having last name x and first name i: All the data are from the “family” x but x₂ is the 2nd one of them.
       • Practice saying this as “x two” or “x-sub-two”
       • σ always denotes standard deviation but σₓ (“sigma-sub-x”) denotes the particular standard deviation associated with x. Etc...
     • Be very careful not to confuse subscripts and superscripts with multiplication: x₂ means “x-sub-2”, x∙2 means “x times 2”, x² means “x-to-the-2-power” or “x-squared”. All three forms appear in statistics.
     • Function notation is used extensively: P(xᵢ) means “the probability of xᵢ”, not “P times xᵢ”
     • Remember: Conversion from percents to decimal fractions happens by moving the decimal point two places to the left: 13.52% is the same as .1352.
       • Be careful with single-digit percentages: 5% is .05, not .50
       • Always use decimals for multiplying, never percentages

  5. Formulas Are Interpreted Inside-Out
     This must be mastered or you’ll drown in statistics: You need to be able to read complicated formulas and convert them into a sequence of simple steps.
     Formulas in statistics are just shorthand for recipes. The key is to work inside-out.
     Example: x̄ = Σ xᵢ ∙ P(xᵢ), read as “x-bar equals the sum of all the x-sub-i’s times P-of-x-sub-i”
     • First get your xᵢ’s and P(xᵢ)’s into separate columns
     • The next operation is multiplication, so multiply them
     • The “sigma” means “add them all up”, so now add the products from the previous step and set the result equal to x-bar
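
The inside-out recipe on this slide can be sketched in a few lines of Python. The values and probabilities below are made up for illustration and do not come from the slides; the point is only the order of operations (columns first, then multiply, then add).

```python
# Hypothetical data (an assumption for illustration, not from the text):
# the possible values x_i and their probabilities P(x_i), as two "columns".
x = [0, 1, 2, 3]
p = [0.1, 0.3, 0.4, 0.2]

# Step 1 (innermost operation): multiply each x_i by its P(x_i).
products = [xi * pi for xi, pi in zip(x, p)]

# Step 2 (the sigma): add up all the products.
x_bar = sum(products)

print(products)
print(x_bar)  # approximately 1.7
```

Keeping the intermediate `products` list around mirrors the "separate columns" advice: each step can be checked by hand before moving outward.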

  6. Stages of a Statistical Study (§1.3)
     • Design
       • Identify variables of interest
       • Think of a variable as a question you ask about each individual
       • Plan how to get information on those variables
       • Decide on methodologies/protocols
     • Obtain/Analyze
       • Observe vs. treat (Experiment)
       • Census
       • Survey
       • Explore/describe the data
       • Perform mathematical/probability analysis
     • Interpret
       • Draw inferences about the variables of interest
       • Note concerns and future plans (cf. pg. 20)

  7. Contrasting Concepts (§1.1)
     • Population vs. sample
       • A population consists of all individuals of interest
       • Read the description carefully and precisely to identify exactly what is being studied. Watch for restrictions such as “adults”, “imported cars”, etc.
       • Identifying the population is the most important step in designing or interpreting a study
       • A census attempts to measure variables on all individuals in a population
       • A sample consists of some individuals selected from the population
       • The best samples are representative of the population; e.g., a simple random sample.
       • But the word “sample” does not imply the sample was well-chosen. It might be biased.
     • Individuals vs. variables
       • Populations and samples are built from individuals
       • A variable is a quality, attribute, or numerical measure of each individual (height, age, gender, automobile color, etc.). Always think of these as questions.
       • To a statistician, an individual is nothing but the infinite list of answers (values) to all questions (variables) which apply to that individual
     • Parameter vs. statistic
       • Both are numerical measures which apply to a group of individuals, like “average weight” or “median income”
       • The only difference is that parameters are calculated for populations, statistics for samples

  8. §1.3 Some Additional Concepts
     • Double-blind Experiment
       • Neither the experimenters nor the subjects know which individuals are in the control group and which are in the treated group.
     • Placebo
       • A fake medical treatment administered to human subjects in the control group to make them believe they are being treated.
     • Simulation
       • A numerical facsimile of a real-world experiment, for example by using random-number generators.

  9. Some Pitfalls of Surveys
     The results of surveys can be manipulated in many ways. For instance:
     • Having too few possible responses
       • “Have you stopped beating your wife?”
     • How the question is phrased
       • “Do you support the continued loss of young Americans’ lives in order to prevent foreigners from killing one another?” vs.
       • “Do you support the building of strong democratic institutions in friendly nations in the Middle East?”
     • The sampling technique (voluntary or random)
       • “95% of the callers to our radio talk show believe we should impeach the President.” vs.
       • Questioning a random sample of Americans
     • Ambiguity in the meaning of words
       • E.g., “democracy”, “ethical”
     • The order of questions
       • “Do you think tax evaders should be severely punished?” preceded by:
       • “Have you ever lied on your income tax form?”
     • Interviewing respondents who know nothing about the questions
       • “Most Americans believe that Einstein’s Theory of General Relativity is wrong”
     • The race, gender, or other attributes of the questioner
       • A famous example of this was a study done by the US Army after WW II on racial prejudice in the ranks. The study concluded there was none! Later analysis proved the main effect was the reluctance of black soldiers to affirm the existence of racism to white survey-takers.
     Moral: Be skeptical!

  10. Classifying Variables (§1.1)
      Quantitative vs. qualitative
      • Think: Quantity vs. Quality
      • “Qualitative” is sometimes called “categorical”; i.e., putting things in categories
      • Ask yourself: “What are the possible answers?” If the answers can be yes/no, ... red/blue/green, etc., the question is qualitative.
      • If the answers can be numerical quantities, the question is quantitative
      • But the distinction can be very ambiguous
        • Is your student ID number quantitative or qualitative?

  11. Classifying Variables: Levels of Measurement (§1.1)
      • Nominal: “Names”/categories only
        • Names, phone numbers, colors
        • Nominal data is always qualitative
      • Ordinal: “Order” only
        • Typical: small, medium, large ... good, better, best ... etc.
        • Class ranks
      • Interval: Differences make sense but not ratios
        • Temperature, times, dates
        • Negative values are possible
      • Ratio: Ratios make sense
        • Ask yourself: “Does it make verbal sense to say something of this quantity is twice another or half as much as another?” If it does make sense, the variable is at the ratio level.
        • True zero: Negatives are (usually) not possible
        • Salaries, length, height, age, weight

  12. §1.2 pg. 17 Statistical Errors
      • Undercoverage
        • Omitting population individuals from the potential sample
      • Bias
        • Selecting an unrepresentative sample
      • Sampling error
        • Difference between measurements in a sample vs. the entire population.
      Undercoverage and bias are non-sampling errors; i.e., mistakes. Sampling error is not a mistake; it’s inevitable and is handled using probability theory.
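
To see why sampling error is unavoidable rather than a mistake, here is a small simulation sketch. The population (the whole numbers 1 to 1000), the sample size of 25, and the seed are all assumptions chosen for illustration; none of them come from the text.

```python
import random
from statistics import mean

# Hypothetical population: the whole numbers 1..1000 (illustrative assumption).
population = list(range(1, 1001))
mu = mean(population)  # the parameter: computed from the whole population

rng = random.Random(2023)  # arbitrary seed so the run is repeatable
errors = []
for _ in range(5):
    sample = rng.sample(population, 25)  # a simple random sample of 25
    x_bar = mean(sample)                 # the statistic: computed from the sample
    errors.append(x_bar - mu)            # sampling error for this sample

print(mu)      # 500.5
print(errors)  # five different errors, even though every sample was drawn properly
```

Each sample is drawn correctly, yet each sample mean misses the population mean by a different amount. That variation is the sampling error that probability theory is used to describe.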

  13. Simple Random Sample (§1.2) (pg. 12)
      A simple random sample of n measurements from a population is a subset of the population selected in a manner such that every sample of size n from the population has an equal chance of being selected.
      This is a lot stronger than just saying every individual is equally likely to be chosen.
      Example: Suppose the 28 STA 2023 students are arranged evenly in the 4 rows of S219 and I need to select a random sample of 7 of you. I toss a coin twice and if I get HH, I use the 1st row, HT I use the 2nd row, etc.
      Each of you individually has the same chance of being in the sample (i.e., a 25% chance) but not every sample of size 7 has an equal chance. For instance, if you are chosen, the person behind you can’t be chosen.
      There are exactly 1,184,040 ways of picking a sample of size 7 out of 28 students. The way I proposed could choose only 4 of them. The other 1,184,036 samples were impossible. This was not a random sample.
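
The counts quoted on this slide can be checked directly. This is a minimal sketch using Python's standard `math.comb`; the variable names are only illustrative.

```python
import math

class_size = 28
sample_size = 7

# Number of distinct samples of 7 students that can be drawn from 28.
total_samples = math.comb(class_size, sample_size)

# The coin-toss scheme can only ever produce one of the 4 rows.
row_scheme_samples = 4
impossible_samples = total_samples - row_scheme_samples

# Chance that a given student is picked under a true simple random sample.
individual_chance = sample_size / class_size

print(total_samples)       # 1184040
print(impossible_samples)  # 1184036
print(individual_chance)   # 0.25
```

Note that the individual chance is 25% under both the row scheme and a true simple random sample, which is exactly why "every individual is equally likely" is not enough to define one.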

  14. The Correct Approach: Use a Random Number Table (Text) or Generator (Excel or Calculator)
      • The idea is a precise form of writing everyone’s name on a piece of paper, putting them in a hat, then having a blindfolded person select them.
      • First assign numbers to every individual.
      • You need to know 3 things:
        • The size of the population from which you’re sampling
          • This tells you how many digits you need for the random numbers you generate
          • If there are 642 individuals, you need 3-digit numbers; if there are 67 individuals, you need 2-digit numbers; if there are 8 individuals, you need only 1-digit numbers, etc.
        • The size of the sample you’re creating
          • This tells how long to keep generating new random numbers
        • Whether or not an individual is allowed to be selected more than once
          • If an individual can only be selected once, repeated numbers are thrown away
          • The technical term is “sampling with or without replacement”.
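
The slide mentions Excel or a calculator as the generator; as a rough equivalent, the three inputs can be sketched with Python's standard `random` module instead. The population size of 642 is taken from the slide's example, while the sample size of 10 and the seed are assumptions made for illustration.

```python
import random

population_size = 642  # from the slide's example
sample_size = 10       # hypothetical choice

# With individuals numbered 1..642, the largest label has this many digits.
digits_needed = len(str(population_size))

rng = random.Random(1)  # arbitrary seed so the run is repeatable
ids = range(1, population_size + 1)

# Without replacement: nobody can be selected twice.
without_replacement = rng.sample(ids, sample_size)

# With replacement: the same individual may appear more than once.
with_replacement = rng.choices(ids, k=sample_size)

print(digits_needed)  # 3
print(without_replacement)
print(with_replacement)
```

`random.sample` plays the role of "throw away repeated numbers", while `random.choices` keeps them.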

  15. Selecting 7 Students from 28 (Continued)
      I first number the students from 1 to 28. (Alternatively, I could start at 0 and go only up to 27.)
      The size of the population (the STA 2023 class) is 28, so I need to generate 2-digit numbers.
      I go to the Random Number Table in the Appendix and start anywhere I like. Because it’s random, it doesn’t matter where I begin. I’ll start at the 6th row, 3rd column:
        94456 48396 73780 06436 86641 69239 57662 80181
      The digits are originally broken into groups of 5 only to make them easy to read. I can re-group them however I want. I need 2-digit numbers, so I’ll re-group them like this:
        94 45 64 83 96 73 78 00 64 36 86 64 16 92 39 57 66 28 01 81
      I only need numbers from 1 to 28, so I throw away any outside this range. That leaves 16, 28, and 01.
      This gave me only 3 of the numbers I need: 16, 28, and 1. I need 4 more.

  16. Selecting 7 Students from 28 (Continued)
      So I get another row from the table. Because it’s random, I can use the next row or any other row, so long as I don’t use the same row twice. I’ll use the 7th row:
        68108 89266 94730 95761 75023 48464 65544 96583 18911 16391
      Re-group to 2 digits:
        68 10 88 92 66 94 73 09 57 61 75 02 34 84 64 65 54 49 65 83 18 91 11 63 91
      I throw out bad numbers until I get the 4 more I need. This added 10, 9, 2, and 18 to the list.
      Since a student can only be selected once, if any of these numbers had repeated, I would have ignored it and kept going until I had a total of 7 distinct numbers.
      So my random sample of size 7 consists of the students with numbers 16, 28, 1, 10, 9, 2, 18.

  17. Use of Random Numbers: Ex. 3 pg. 13
      Select 30 cars, without replacement, from 500 cars.
      Start at row 15, column 5:
        99281 59640 15221 96079 09961 05371
      Re-group to 3 digits:
        992 815 964 015 221 960 790 996 105 371
      Keep only the numbers in range:
        15 221 105 371 . . . until you get 30 different numbers
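
Both table walk-throughs (7 students from 28, and the cars example) follow the same recipe, which can be sketched as one small function. The function name and parameters are hypothetical, and numbering the cars from 1 to 500 is an assumption; the digit rows are the ones quoted on the slides.

```python
def select_from_table(rows, width, low, high, n):
    """Re-group each row of random digits into width-digit numbers and keep
    the first n distinct values in [low, high] (sampling without replacement)."""
    chosen = []
    for row in rows:
        digits = row.replace(" ", "")  # the groups of 5 are only for readability
        for i in range(0, len(digits) - width + 1, width):
            value = int(digits[i:i + width])
            # Throw away out-of-range numbers and repeats.
            if low <= value <= high and value not in chosen:
                chosen.append(value)
                if len(chosen) == n:
                    return chosen
    return chosen  # fewer than n if the supplied rows run out


# Slides 15-16: rows 6 and 7 of the table, 2-digit numbers from 1 to 28.
students = select_from_table(
    ["94456 48396 73780 06436 86641 69239 57662 80181",
     "68108 89266 94730 95761 75023 48464 65544 96583 18911 16391"],
    width=2, low=1, high=28, n=7)
print(students)  # [16, 28, 1, 10, 9, 2, 18]

# Slide 17: the single row shown, 3-digit numbers from 1 to 500.
cars = select_from_table(
    ["99281 59640 15221 96079 09961 05371"],
    width=3, low=1, high=500, n=30)
print(cars)  # [15, 221, 105, 371]
```

The cars call returns only 4 numbers because only one row of the table is shown; in practice more rows would be supplied until 30 distinct numbers are found.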
