1.3 Experimental Design

Designing a Statistical Study • Identify the variable(s) of interest (the focus) and the population of the study. • Develop a detailed plan for collecting data. Make sure the data are representative of the population.

Designing a Statistical Study • Collect the data. • Describe the data with descriptive statistics and techniques. • Make decisions using inferential statistics. Identify any possible errors.

Data Collection • Perform an experiment • Apply a treatment to part of the population. • Do nothing to, or give a placebo to, the other part of the population. This is the control group. • Collect data and compare the results between the two groups. • Example: an experiment could be used to evaluate the benefits of a new drug or medical procedure.

Data Collection • Use a simulation • A simulation uses a mathematical or physical model to reproduce the conditions of a situation or process. • Allows the study of something that is impractical or dangerous to create in real life. • Simulations often save time and money. • Example: the destructive characteristics of a bomb or fire. • Example: 50/50 odds game with pins on a board at a carnival

Data Collection • Take a census • A census is a count or measure of the entire population. • A census provides complete information, but is expensive, time consuming, and difficult to perform. • In the case of destructive testing (think of testing bombs), a census may leave nothing usable when it is done.

Data Collection • Use sampling – this is what you are going to do. • A sample is a count or measure of part of the population. • Use the sample to predict the behavior of the population. • A sample of the bombs can be tested for potency, and the results can be used to predict the potency of the untested bombs.

Examples from the book • Try it Yourself – pg 16 1a) Focus: Effect of exercise on senior citizens. Population: Collection of all senior citizens. 1b) Experiment 2a) Focus: Effect of radiation fallout on senior citizens. Population: Collection of all senior citizens 2b) Sampling

Dictionary Word Chase • What percent of the English words do you know? • Randomly open the book and pick a word • Is this truly random? • This would be like convenience sampling

Dictionary Word Chase • Simple Random Sample (SRS): all the words must have the same probability of being selected. • Use a number generator (math, probability, randInt(#)) to randomly pick a word from all the words in the dictionary (Webster’s Ninth New Collegiate Dictionary has almost 160,000 entries). • Is this feasible?

Dictionary Word Chase • Stratification – divide the words into strata by first letter, then use the number generator to randomly select words within each letter. • Is this feasible? • Cluster – randomly pick a page (or pages), then a column, then take all the words in that column (or those columns).

Dictionary Word Chase • Systematic – randomly select a page, randomly select a starting word, select words at a specified interval. • An advantage to systematic sampling is that it is easy to use.

Resource: • Much of the information for the following slides was taken from: http://stattrek.com/

Data Collection Methods: Pros and Cons • Each method of data collection has advantages and disadvantages. • Resources. When the population is large, a sample survey has a big resource advantage over a census. A well-designed sample survey can provide very precise estimates of population parameters - quicker, cheaper, and with less manpower than a census.

Data Collection Methods: Pros and Cons • Generalizability. Generalizability refers to the appropriateness of applying findings from a study to a larger population. Generalizability requires random selection. If participants in a study are randomly selected from a larger population, it is appropriate to generalize study results to the larger population; if not, it is not appropriate to generalize. Observational studies often do not feature random selection; when they do not, it is not appropriate to generalize from their results to a larger population.

Data Collection Methods: Pros and Cons • Causal inference. Cause-and-effect relationships can be teased out when subjects are randomly assigned to groups. Therefore, experiments, which allow the researcher to control assignment of subjects to treatment groups, are the best method for investigating causal relationships.
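
Random assignment can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function name `randomly_assign`, the subject IDs, and the even split into two groups are all assumptions made for the example.

```python
import random

def randomly_assign(subjects, seed=None):
    """Split subjects into a treatment group and a control group by chance alone."""
    rng = random.Random(seed)          # seed only makes the example repeatable
    shuffled = list(subjects)          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject IDs, invented for illustration.
subjects = [f"subject_{i}" for i in range(1, 21)]
treatment, control = randomly_assign(subjects, seed=1)
print(len(treatment), len(control))
```

Because chance alone decides who lands in each group, systematic differences between the groups other than the treatment itself become unlikely, which is what supports a causal reading of the comparison.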

Test Your Understanding of This Lesson • Which of the following statements are true? • I. A sample survey is an example of an experimental study. II. An observational study requires fewer resources than an experiment. III. The best method for investigating causal relationships is an observational study. • (A) I only (B) II only (C) III only (D) All of the above. (E) None of the above.

Test Your Understanding of This Lesson • Solution • The correct answer is (E). In a sample survey, the researcher does not assign treatments to survey respondents. Therefore, a sample survey is not an experimental study; rather, it is an observational study. An observational study may or may not require fewer resources (time, money, manpower) than an experiment. The best method for investigating causal relationships is an experiment - not an observational study - because an experiment features randomized assignment of subjects to treatment groups.

Survey Sampling Methods • Probability vs. Non-Probability Samples • As a group, sampling methods fall into one of two categories. • Probability samples. With probability sampling methods, each population element has a known (non-zero) chance of being chosen for the sample. • Non-probability samples. With non-probability sampling methods, we do not know the probability that each population element will be chosen, and/or we cannot be sure that each population element has a non-zero chance of being chosen.

Survey Sampling Methods • Non-probability sampling methods offer two potential advantages - convenience and cost. The main disadvantage is that non-probability sampling methods do not allow you to estimate the extent to which sample statistics are likely to differ from population parameters. Only probability sampling methods permit that kind of analysis.

Non-Probability Sampling Methods • Two of the main types of non-probability sampling methods are voluntary samples and convenience samples. • Voluntary sample. A voluntary sample is made up of people who self-select into the survey. Often, these people have a strong interest in the main topic of the survey. Suppose, for example, that a news show asks viewers to participate in an on-line poll. This would be a voluntary sample. The sample is chosen by the viewers, not by the survey administrator.

Non-Probability Sampling Methods • Convenience sample. A convenience sample is made up of people who are easy to reach. Consider the following example. A pollster interviews shoppers at a local mall. If the mall was chosen because it was a convenient site from which to solicit survey participants and/or because it was close to the pollster's home or business, this would be a convenience sample.

Probability Sampling Methods • The main types of probability sampling methods are simple random sampling, stratified sampling, cluster sampling, multistage sampling, and systematic random sampling. The key benefit of probability sampling methods is that, because selection is left to chance, the sample tends to be representative of the population and the likely gap between sample statistics and population parameters can be estimated. This is what makes valid statistical conclusions possible.

Probability Sampling Methods • Simple random sampling. Simple random sampling refers to any sampling method that has the following properties. • The population consists of N objects. • The sample consists of n objects. • All possible samples of n objects are equally likely to occur. • There are many ways to obtain a simple random sample. One way would be the lottery method. Each of the N population members is assigned a unique number. The numbers are placed in a bowl and thoroughly mixed. Then, a blind-folded researcher selects n numbers. Population members having the selected numbers are included in the sample.
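
The lottery method maps directly onto drawing without replacement. The sketch below uses Python's standard `random.sample`; the population of 100 numbered members and the sample size of 10 are assumptions chosen only for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every n-member subset is equally likely."""
    rng = random.Random(seed)          # seed only makes the example repeatable
    return rng.sample(population, n)   # sampling without replacement

# Hypothetical population: members numbered 1..N with N = 100.
population = list(range(1, 101))
srs = simple_random_sample(population, 10, seed=42)
print(sorted(srs))
```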

Probability Sampling Methods • Stratified sampling. With stratified sampling, the population is divided into groups, based on some characteristic. Then, within each group, a probability sample (often a simple random sample) is selected. In stratified sampling, the groups are called strata. As an example, suppose we conduct a national survey. We might divide the population into groups or strata, based on geography - north, east, south, and west. Then, within each stratum, we might randomly select survey respondents.
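
A minimal sketch of the geographic example, assuming four strata of 50 made-up respondents each and an equal allocation of 5 per stratum (the names, sizes, and allocation are illustrative assumptions, not part of the lesson):

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum members from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical strata: 50 respondents in each of four regions.
strata = {region: [f"{region}_{i}" for i in range(50)]
          for region in ("north", "east", "south", "west")}
stratified = stratified_sample(strata, 5, seed=0)
for region, chosen in stratified.items():
    print(region, chosen)
```

Every stratum contributes to the sample, which is the feature that distinguishes this design from cluster sampling.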

Probability Sampling Methods • Cluster sampling. With cluster sampling, every member of the population is assigned to one, and only one, group. Each group is called a cluster. A sample of clusters is chosen, using a probability method (often simple random sampling). Only individuals within sampled clusters are surveyed. Note the difference between cluster sampling and stratified sampling. With stratified sampling, the sample includes elements from each stratum. With cluster sampling, in contrast, the sample includes elements only from sampled clusters.
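
Cluster sampling can be sketched the same way. Here the clusters are hypothetical classrooms of 4 students each, and 2 of the 6 classrooms are chosen; all names and sizes are assumptions for the example.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and keep every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)   # sample cluster names
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical clusters: six classrooms with four students each.
clusters = {f"room_{r}": [f"room_{r}_student_{s}" for s in range(4)]
            for r in range(6)}
clustered = cluster_sample(clusters, 2, seed=3)
print(sorted(clustered))
```

Only the sampled classrooms appear in the result, and each one appears in full.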

Probability Sampling Methods • Systematic random sampling. With systematic random sampling, we create a list of every member of the population. From the list, we randomly select the first sample element from the first k elements on the population list. Thereafter, we select every kth element on the list. This method is different from simple random sampling because not every possible sample of n elements is equally likely.
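
A short sketch of the procedure, assuming a population list of 100 numbered members and an interval of k = 10 (both values are illustrative assumptions):

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k elements, then take every kth element."""
    rng = random.Random(seed)
    start = rng.randrange(k)           # index 0..k-1, i.e. one of the first k elements
    return population[start::k]

# Hypothetical population list of 100 numbered members.
members = list(range(1, 101))
systematic = systematic_sample(members, 10, seed=7)
print(systematic)

# Only k distinct samples can ever be produced, one per possible start.
all_possible = {tuple(members[s::10]) for s in range(10)}
print(len(all_possible))
```

The second part makes the contrast with simple random sampling concrete: only k samples are possible under this design, so most subsets of the same size can never be drawn.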

Probability Sampling Methods • Multistage sampling. With multistage sampling, we select a sample by using combinations of different sampling methods. For example, in Stage 1, we might use cluster sampling to choose clusters from a population. Then, in Stage 2, we might use simple random sampling to select a subset of elements from each chosen cluster for the final sample.
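
The two-stage example above can be sketched as follows. The clusters (hypothetical city blocks of 20 households), the number of clusters chosen, and the number of households per cluster are all assumptions for illustration.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly choose clusters. Stage 2: SRS within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)             # stage 1
    return {name: rng.sample(clusters[name], n_per_cluster)       # stage 2
            for name in chosen}

# Hypothetical clusters: eight city blocks with twenty households each.
blocks = {f"block_{b}": [f"block_{b}_house_{h}" for h in range(20)]
          for b in range(8)}
multistage = two_stage_sample(blocks, 3, 5, seed=11)
for block, houses in multistage.items():
    print(block, houses)
```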

Test Your Understanding • An auto analyst is conducting a satisfaction survey, sampling from a list of 10,000 new car buyers. The list includes 2,500 Ford buyers, 2,500 GM buyers, 2,500 Honda buyers, and 2,500 Toyota buyers. The analyst selects a sample of 400 car buyers, by randomly sampling 100 buyers of each brand. • Is this an example of a simple random sample? • (A) Yes, because each buyer in the sample was randomly sampled. (B) Yes, because each buyer in the sample had an equal chance of being sampled. (C) Yes, because car buyers of every brand were equally represented in the sample. (D) No, because every possible 400-buyer sample did not have an equal chance of being chosen. (E) No, because the population consisted of purchasers of four different brands of car.

Test Your Understanding • Solution • The correct answer is (D). A simple random sample requires that every sample of size n (in this problem, n is equal to 400) have an equal chance of being selected. In this problem, there was a 100 percent chance that the sample would include 100 purchasers of each brand of car. There was zero percent chance that the sample would include, for example, 99 Ford buyers, 101 Honda buyers, 100 Toyota buyers, and 100 GM buyers. Thus, all possible samples of size 400 did not have an equal chance of being selected; so this cannot be a simple random sample.

Test Your Understanding of This Lesson • The fact that each buyer in the sample was randomly sampled is a necessary condition for a simple random sample, but it is not sufficient. Similarly, the fact that each buyer in the sample had an equal chance of being selected is characteristic of a simple random sample, but it is not sufficient. The sampling method in this problem used random sampling and gave each buyer an equal chance of being selected; but the sampling method was actually stratified random sampling. • The fact that car buyers of every brand were equally represented in the sample is irrelevant to whether the sampling method was simple random sampling. Similarly, the fact that the population consisted of buyers of different car brands is irrelevant.
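
The argument in this solution can be checked with a little arithmetic. The sketch below uses only the numbers given in the problem; the variable names are made up for the example.

```python
from fractions import Fraction
from math import comb

N, n = 10_000, 400                     # population and sample size from the problem
brands = {"Ford": 2_500, "GM": 2_500, "Honda": 2_500, "Toyota": 2_500}
per_brand = 100                        # buyers sampled from each brand

# Chance that any one buyer is included under each design.
inclusion_srs = Fraction(n, N)
inclusion_stratified = Fraction(per_brand, brands["Ford"])

# Number of distinct 400-buyer samples each design can produce.
srs_samples = comb(N, n)
stratified_samples = 1
for size in brands.values():
    stratified_samples *= comb(size, per_brand)

print(inclusion_srs == inclusion_stratified)   # same chance for every buyer
print(stratified_samples < srs_samples)        # but fewer samples are possible
```

Each buyer has the same 1-in-25 chance of selection under both designs, yet the stratified design can produce only a subset of the 400-buyer samples that simple random sampling allows, which is why answer (D) holds.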

Bias in Survey Sampling • In survey sampling, bias refers to the tendency of a sample statistic to systematically over- or under-estimate a population parameter.

Bias Due to Unrepresentative Samples • A good sample is representative. This means that each sample point represents the attributes of a known number of population elements. • Bias often occurs when the survey sample does not accurately represent the population. The bias that results from an unrepresentative sample is called selection bias. Some common examples of selection bias are described below.

Bias Due to Unrepresentative Samples • Undercoverage. Undercoverage occurs when some members of the population are inadequately represented in the sample. A classic example of undercoverage is the Literary Digest voter survey, which predicted that Alfred Landon would beat Franklin Roosevelt in the 1936 presidential election. The survey sample suffered from undercoverage of low-income voters, who tended to be Democrats. How did this happen? The survey relied on a convenience sample, drawn from telephone directories and car registration lists. In 1936, people who owned cars and telephones tended to be more affluent. Undercoverage is often a problem with convenience samples.
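
To see how undercoverage distorts an estimate, consider a toy electorate. Every number below is invented for illustration and is not historical data from the 1936 survey; the point is only that a sampling frame limited to one group reports that group's opinion, not the population's.

```python
# Hypothetical electorate (invented numbers): (group, voters, supporters of candidate A)
groups = [
    ("listed in the sampling frame", 3_000, 1_800),
    ("missing from the sampling frame", 7_000, 2_100),
]

def support_share(rows):
    """Share of voters supporting candidate A across the given groups."""
    voters = sum(size for _, size, _ in rows)
    supporters = sum(count for _, _, count in rows)
    return supporters / voters

true_share = support_share(groups)        # whole population
frame_share = support_share(groups[:1])   # only the voters the frame can reach
print(true_share, frame_share)
```

Even a perfect census of the frame would report majority support for candidate A here, while the population as a whole does not give A a majority.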

Bias Due to Unrepresentative Samples • Nonresponse bias. Sometimes, individuals chosen for the sample are unwilling or unable to participate in the survey. Nonresponse bias is the bias that results when respondents differ in meaningful ways from nonrespondents. The Literary Digest survey illustrates this problem. Respondents tended to be Landon supporters; and nonrespondents, Roosevelt supporters. Since only 25% of the sampled voters actually completed the mail-in survey, survey results overestimated voter support for Alfred Landon. The Literary Digest experience illustrates a common problem with mail surveys. Response rate is often low, making mail surveys vulnerable to nonresponse bias.

Bias Due to Unrepresentative Samples • Voluntary response bias. Voluntary response bias occurs when sample members are self-selected volunteers, as in voluntary samples. An example would be call-in radio shows that solicit audience participation in surveys on controversial topics (abortion, affirmative action, gun control, etc.). The resulting sample tends to overrepresent individuals who have strong opinions.

Bias Due to Unrepresentative Samples • Random sampling is a procedure for sampling from a population in which (a) the selection of a sample unit is based on chance and (b) every element of the population has a known, non-zero probability of being selected. Random sampling helps produce representative samples by eliminating voluntary response bias and guarding against undercoverage bias. All probability sampling methods rely on random sampling.

Bias Due to Measurement Error • A poor measurement process can also lead to bias. In survey research, the measurement process includes the environment in which the survey is conducted, the way that questions are asked, and the state of the survey respondent.

Bias Due to Measurement Error • Response bias refers to the bias that results from problems in the measurement process. Some examples of response bias are given below. • Leading questions. The wording of the question may be loaded in some way to unduly favor one response over another. For example, a satisfaction survey may ask the respondent to indicate whether she is satisfied, dissatisfied, or very dissatisfied. By giving the respondent one response option to express satisfaction and two response options to express dissatisfaction, this survey question is biased toward getting a dissatisfied response.

Bias Due to Measurement Error • Social desirability. Most people like to present themselves in a favorable light, so they will be reluctant to admit to unsavory attitudes or illegal activities in a survey, particularly if survey results are not confidential. Instead, their responses may be biased toward what they believe is socially desirable.

Test Your Understanding • Which of the following statements are true? • I. Random sampling is a good way to reduce response bias. II. To guard against bias from undercoverage, use a convenience sample. III. Increasing the sample size tends to reduce survey bias. IV. To guard against nonresponse bias, use a mail-in survey. • (A) I only (B) II only (C) III only (D) IV only (E) None of the above.

Test Your Understanding • The correct answer is (E). None of the statements is true. Random sampling provides strong protection against undercoverage bias and voluntary response bias; but it is not effective against response bias. A convenience sample does not protect against undercoverage bias; in fact, it sometimes causes undercoverage bias. Increasing sample size does not affect survey bias. And finally, using a mail-in survey does not prevent nonresponse bias. In fact, mail-in surveys are quite vulnerable to nonresponse bias.
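
The claim that a larger sample does not remove bias can be illustrated with a small simulation. The population below is entirely hypothetical: support is 60% among the people the survey can reach and 30% among those it cannot. Samples are drawn only from the reachable group, mimicking undercoverage.

```python
import random

rng = random.Random(0)                 # fixed seed so the run is repeatable

# Hypothetical population (1 = supports the proposal, 0 = does not).
reachable = [1] * 6_000 + [0] * 4_000          # 60% support, covered by the survey
unreachable = [1] * 9_000 + [0] * 21_000       # 30% support, never sampled
everyone = reachable + unreachable
true_support = sum(everyone) / len(everyone)   # 15,000 of 40,000

estimates = {}
for size in (100, 1_000, 10_000):
    drawn = rng.sample(reachable, size)        # random, but from the wrong frame
    estimates[size] = sum(drawn) / size

print(true_support)
print(estimates)
```

As the sample grows, the estimate settles on the reachable group's 60%, not on the population value. A larger sample narrows the random scatter around the wrong target; it does not move the target.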
