Objectives (IPS Chapters 3.1). Design of experimentsAnecdotal and available dataComparative experimentsRandomizationRandomized comparative experimentsCautions about experimentationMatched pairs designsBlock designs . Obtaining data. Available data are data that were produced in the past for
1. Producing Data Design of Experiments IPS Chapters 3.1
2. Objectives (IPS Chapters 3.1) Design of experiments
Anecdotal and available data
Randomized comparative experiments
Cautions about experimentation
Matched pairs designs
3. Obtaining data Available data are data that were produced in the past for some other purpose but that may help answer a present question inexpensively. The library and the Internet are sources of available data.
Look for primary data…
For example, government statistical offices are the primary source for demographic, economic, and social data (visit the Fed-Stats site at www.fedstats.gov).
4. A new 4-letter (or 9-letter) Word: Anecdotal Anecdotal evidence refers to isolated stories / incidents /situations, etc. These are the kinds of things we hear from friends, case-studies in the press, “crazy coincidences”, etc.
Anecdotes are based on selected individual cases, which we tend to remember because they are often unusual in some way. There is a very good chance that anecdotes are in no way representative of any larger group of cases.
Beware of drawing conclusions from personal experiences or hearsay!!
5. Anecdote: Smoking doesn't cause lung cancer, "My grandmother lived to 95, smoked constantly, and didn't die of lung cancer."
The data proves than not all smokers die of lung cancer, but fails to see the data that 90 percent of the lung cancer cases are smokers.
People wearing top hats seemed to live longer. This fact, was supported by a great deal of anecdotal evidence. “My grandpa wore a top hat and lived until 97 years old!”
People that wear top hats are usually richer, therefore can afford better food, shelter, sanity, and medical resources. A wider study including people's income identified the false claim.
6. Population versus sample Sample: The part of the population we actually examine and for which we do have data.
How well the sample represents the population depends on the sample design.
A statistic is a number describing a characteristic of a sample.
8. Observational studies vs. Experiments Observational studies are essential sources of data on a variety of topics. However, if our goal is to understand cause and effect, then experiments are the only source of fully convincing data.
Two variables are confounded when their effects on a response variable cannot be distinguished from each other. Similar idea to a lurking variable.
Example: If we simply observe cell phone use and brain cancer, any effect of radiation on the occurrence of brain cancer is confounded with lurking variables such as age, occupation, and place of residence
Well designed experiments take steps to defeat confounding.
9. (Terminology) The individuals in an experiment are the experimental units. If they are human, we call them subjects.
In an experiment, we do something to the subject and measure the response. The “something” we do is a called a treatment, or factor.
The factor may be the administration of a drug.
One group of people may be placed on a diet/exercise program for six months (treatment), and their blood pressure (response variable) would be compared with that of people who did not diet or exercise.
10. If the experiment involves giving two different doses of a drug, we say that we are testing two levels of the factor.
A response to a treatment is statistically significant if it is larger than you would expect by chance (due to random variation among the subjects). We will learn how to determine this later.
11. ** Comparative experiments Experiments are comparative in nature: We compare the response to a treatment to:
No treatment (a control)
Older / original treatment (another form of control)
A placebo (another form of control)
Any combination of the above
*** A control is a group to which the new treatment is NOT administered. It serves as a reference mark for comparison (e.g., a group of subjects that do not receive any drug or pill of any kind).
A placebo is a fake treatment, such as a sugar pill. This is to test the hypothesis that the response is due to the actual treatment and not to the subject’s belief that they were treated.
*** Without a control group, you should be very suspicious about any conclusions drawn by the experiment.
12. About the placebo effect The “placebo effect” is an improvement in health not due to any treatment, but only to the patient’s belief that he or she will improve.
The “placebo effect” is not understood, but it is believed to have therapeutic results on up to a whopping 35% of patients.
It can sometimes ease the symptoms of a variety of ills, from asthma to pain to high blood pressure, and even to heart attacks.
An opposite, or “negative placebo effect,” has been observed when patients believe their health will get worse.
14. Caution about experimentation
15. Let’s play: Find the Bias What toothpaste do people prefer? Experiment: In order to determine which brand of toothpaste Americans perfer, researchers wait outside of Walgreens and ask everyone who bought toothpaste, which brand they preferred. What are some biases present in this experiment?
Colgate is on sale
Crest just had an advertising blitz during the Superbowl
Oprah mentioned in an interview that she likes Aquafresh
17. Assume that all data is biased – it’s just a matter of degree…
Survey: Obtained 36,000 physician office fax numbers, delivered ~16,000 faxes and received ~700 replies. Their respondents were mostly private practice physicians, and mostly mid-career. .” (Source: http://www.dpmafoundation.org/physician-attitudes-on-medicine.html).
The Doctor Patient Medical Association (DPMA) and the Patient Power Alliance (PPA) work to repeal health care reform and call themselves a "a nonpartisan association of doctors and patients dedicated to preserving free choice in medicine." The organization is a member of the National Tea Party Federation and the "American Grassroots Coalition
Note which magazine published this article - hardly a fly-by-night magazine!
19. Lack of realism aka ‘Validity’
20. Designing “controlled” experiments Fisher found that the data from experiments that had been going on for decades was basically worthless because of poor experimental design.
Fertilizer had been applied to a field one year and not the following year, in order to compare the yield of grain produced with v.s. without the fertilizer.
What are the flaws in this research methodology?
It may have rained more or been sunnier during different years.
The seeds used may have differed between years as well.
In one case, fertilizer was applied to one field and not applied to a nearby field in the same year.
The two fields might have had different soil, water, drainage, and history (that is, the two fields may have been farmed differently in previous years).
Too many factors affecting the results were “uncontrolled.”
** Any suggestions for a valid control?
21. Fisher’s solution: In the same field and same year, apply fertilizer to randomly spaced plots within the field. Analyze plants from similarly treated plots together.
This minimizes the effect of variation within the field, in drainage and soil composition on yield, as well as controls for weather.
22. The control and experimental treatments must be given to similar groups of individuals An experiment to compare the growth rates of two species of corn:
Be sure that both varieties of corn are planted in equally fertile soil.
Testing one cancer treatment vs another:
Both treatments must be given to patients with similar severity of disease.
Eg: Don’t give one of the drugs to a more seriously ill group of patients
23. * Randomization Any decent study will randomize which subjects are in the control group vs which are in the experimental group.
For example, if you are comparing a new cancer treatment vs the ‘older’ treatment, which patients get the new treatment and which get the older treatment must be decided at random.
24. Principles of Experimental Design The KEY ideas of experimental design:
Control the effects of lurking variables on the response, by comparing the treatment you are interested in with a second group who either receives a placebo, or a different treatment.
Randomize – use some kind of randomization technique to assign subjects to treatments – in other words, the researcher does not pick who goes in the treatment group and who goes in the control group.
Replicate each treatment on enough subjects to reduce chance variation in the results.
Blind: This is another major factor – particularly in medical trials. Neither the experimenter nor the subjects should be aware which subjects are receiving the experimental treatment and which subjects are receiving the control treatment.
26. Completely randomized designs
28. ** Block aka “stratified” designs
29. A stratified experiment:
31. Matched pairs designs
32. Matched pairs in medical experimentation Medical studies will frequently stratify on matched groups such as
This helps minimize/mitigate the effect of lurkng variables
See Jama Claudication study from course page.
33. Blinding Eg: In hospital studies, a patient is sometimes given a bar-code which they wear on a wristband. The medications also are not labeled, and instead have a bar-code. The researcher/nurse giving the medication will scan the wristband and match it with an appropriate medication bar-code. So neither the patient nor the researcher knows if they are getting the treatment or the placebo/control.
If the patient doesn’t know if they are receiving the treatment vs the control, the study is said to be ‘blinded’.
If the researcher/physician also doesn’t know, the study is said to be double-blinded. This is much more ideal.
34. Weaknesses in experimental design There is no such thing as the perfect experiment. Your goal is to decide whether any of the limitations in the design are significant enough to limit the validity of the conclusions.
Outside of reputable journals, this is extremely common!
35. Example of a randomized, double-blind controlled trial A major cancer center is excited to hear about a promising new treatment for pancreatic cancer. So:
They contact all of the patients in their files with this condition.
They find 408 patients who agree to be in their trial.
They exclude from their trial 11 patients who say they moving out of state since that group cannot be monitored by the center.
They exclude 43 others from the trial because they have other significant medical ilnesses which would be confounding
Of the remaining group, they stratify based on gender and age
Stratification: Now they have 354 patients remaining.
Gender: 190 are female and 164 are male.
Age: They further stratify into the following age groups: 20-40 / 40-60 / 60-80
Randomization and Control: Within each of these 6 groups, the patients are randomly assigned to receive their usual treatment (the control group) vs the new treatment (the experimental group)
Double Blind: Neither the patients nor the physicans know which patient is receiving which treatment. They will not find out until the study has been completed.
Very good, but there are still some flaws in the design of this study…
36. Limitations/Flaws in the pancreatic cancer study? Stage of cancer
Choice of age groups
Lack of placebo control (ethics)
Why couldn’t we use a placebo as the control?
If the new treatment showed staggeringly effective results, the experiment would be halted and all patients would probably be changed to the new drug. This is very rare, however.
37. Producing Data Sampling designs – Intro to Inference IPS Chapters 3.2 and 3.3
38. Objectives (IPS Chapters 3.2 and 3.3) Sampling designs; Toward statistical inference
Simple random samples
Caution about sampling surveys
Population versus sample
Toward statistical inference
39. * Population vs Sample A political scientist wants to know what % of college students consider themselves conservatives
An automaker highers a market research firm to learn what % of adults 18-35 recall seeing TV ads for a new SUV
Government economists want to know about average household income
It would be impossible to ask these questions of every single college student / adult / household. Instead, we ask a sample of college students / adults / households.
The population refers to the entire group that we want information about
The sample is the small section of the population that we actually examine
The GOAL of a study is for the information we derive from the sample to generalize accurately to the population as accurately as possible.
Identify the population of interest for the three examples above.
40. If you’re biased and you know it… Biases are everywhere
It is very important to be aware of the different types of bias and where they tend to show up
41. 1. Convenience sampling: Just ask whoever is around.
Example: “Man on the street” survey (cheap, convenient, often quite opinionated, or emotional => now very popular with TV “journalism”)
Which men, and on which street?
Ask about gun control or legalizing marijuana “on the street” in Berkeley or in some small town in Idaho and you would probably get totally different answers.
Even within an area, answers would probably differ if you did the survey outside a high school or a country western bar.
Bias: Opinions limited to individuals who are present.
42. 2. Voluntary Response Sampling:
Individuals choose to be involved. These samples are very susceptible to being biased because different people are motivated to respond or not. Often called “public opinion polls,” these are not considered valid or scientific.
Bias: Sample design systematically favors a particular outcome.
44. Ideally: RANDOM Sampling
Individuals are randomly selected. No one group should be over-represented.
45. * Simple random samples A Simple Random Sample (SRS) is made of randomly selected individuals. Each individual in the population has the same probability of being in the sample. All possible samples of size n have the same chance of being drawn. - You’ll see ‘SRS’ frequently throughout the course.
46. A sample that is not random is essentially useless.
47. Stratified samples A common, and important form of random sampling:
A stratified random sample is essentially a series of SRSs performed on subgroups of a given population. The subgroups are chosen to contain all the individuals with a certain characteristic. For example:
Divide the population of DePaul students into males and females.
Divide the population of California by major ethnic group.
Divide the counties in America as either urban or rural based on population density.
* The SRS taken within each group in a stratified random sample need not be of the same size. For example:
A stratified random sample of students may end up with 100 male and 150 female students
A stratified random sample of a total of 100 Californians, will likely have a higher percentage of Hispanics, than, say, a SRS taken in Arkansas.
49. * Common biases in surveys: Nonresponse Bias: People who feel they have something to hide or who don’t like their privacy being invaded probably won’t answer. Yet they are absolutely part of the population under study!
Remember that the most important objective of a good sample is for that sample to accurately represent the population.
Response Bias: Fancy term for lying. This is particularly important when the questions are very personal (e.g., “How much do you drink?”)
Wording effects Bias: Questions worded like “Do you agree that it is awful that…” are prompting you to give a particular response.
Etc, Etc ? Bias can show up in all kinds of unexpected ways.
51. 1. To assess the opinion of students at the Ohio State University about campus safety, a reporter interviews 15 students he meets walking on the campus late at night who are willing to give their opinion.
? What is the sample here? What is the population? Why?
All those students walking on campus late at night
All students at universities with safety issues
The 15 students interviewed
All students approached by the reporter
52. Flaws? To assess the opinion of students at the Ohio State University about campus safety, a reporter interviews 15 students he meets walking on the campus late at night who are willing to give their opinion.
People who feel safe are more likely to walk out at night. People who don’t feel safe probably won’t do so as often. They would be under-represented in the sample.
Non-Response: Entirely possible that some people would hurry away or refuse to answer if someone approaches them with a question at night.
53. Population versus sample Sample: The part of the population we actually examine and for which we do have data.
How well the sample represents the population depends on the sample design.
A statistic is a number describing a characteristic of a sample.
54. ** Statistical Inference Inferential statistics refers to the process of drawing conclusions about a population based on the information determined from a sample.
Inference is the bread-and-butter of statistics.
However, ALL of the following concepts are very important to bear in mind:
Your estimate of the population is only as good as your sampling design. ? Work hard to eliminate biases.
Trying to determine relationship between # beers and BAC by sampling a group of NFL athletes will not properly generalize to the population of all Americans.
** Your sample is only an estimate—and if you randomly sampled again you would probably get a somewhat different result.
Calculate the average height of 20 people. Then do it again – you will almost certainly get a different result.
The bigger the sample the better. However, big sample sizes have their own problems.
Eg. Studies with large sample sizes can become very expensive
55. “Yeah, yeah” Often we look at topics that seem like “common sense” and say to ourselves, ‘yeah, yeah, I get that’. BE CAREFUL! Often there are many ways in which we think you understand something, but there still remain (many!!) gaps in our knowledge and understanding.
This is not only true of statistics – happens in all kinds of places/studies.
However, it is a particularly common pitfall in stats.
Be sure to do lots of problems.
56. * Sampling variability Recall that your sample is only an estimate. Each time we take a random sample from a population, we are likely to get a different set of individuals and, therefore, calculate a different statistic. This is called sampling variability.
The good news is that, if we take lots of random samples of the same size from a given population, the variation from sample to sample—the sampling distribution—will follow a predictable pattern. All of statistical inference is based on this knowledge. Understanding this concept (not difficult) is very important to understanding a slightly trickier concept coming up later in the course, the Central Limit Theorem.Understanding this concept (not difficult) is very important to understanding a slightly trickier concept coming up later in the course, the Central Limit Theorem.
57. Be sure to understand what is meant by a ‘sampling distribution’… Refers to the results taken from many, many samples.
A sample of the heights of 20 college-aged women would give one result. Of course, this is only one sample. If we repeat this sample again, we’d almost certainly obtain a different result.
If we repeat this sample again, we will have 2 results. If we repeat this sample 100 times, we will have 100 results.
If we plot these 100 results on a histogram, we have a sampling distribution.
58. The variability of a statistic is described by the spread of its sampling distribution. Recall that each sample will give a different result. So what if you took many, many samples of people’s heights and calculated the mean from each sample?
Answer: You would get a distribution of sample values. (Specifically, you’d get a distribution of sample means).
In a later lecture, we will show that if you graphed all of these means on a histogram, they will turn out to follow a normal distribution pattern.
Recall that a normal distribution has a spread (aka variability) which we quantify as the standard deviation. How wide the spread (sd) is depends on the sampling design and the sample size. Larger sample sizes lead to lower spread. Sampling variability contd
59. Statistical Variability contd
Statistics from large samples are almost always close estimates of the true population parameter.
Of course, this only applies to random samples. If there is bias, the results will NOT be accurate!
I repeat: The higher the bias the less trust you can place in the results.
Remember: There will always be bias. A good study, however, will do everything they can to minimize it.
60. A very effective way to reduce sampling variability is to use a larger sample size However, large samples are not always attainable.
Sometimes the cost, difficulty, or preciousness of what is studied limits drastically any possible sample size
Blood samples/biopsies: No more than a handful of repetitions acceptable. We often even make do with just one.
Opinion polls have a limited sample size due to time and cost of operation. During election times though, sample sizes are increased for better accuracy.
There are several techniques used to get around the problem of a limited sample size. One is discussed on the following slide.
61. Capture–recapture sampling Repeated sampling can be used to estimate the size N of a population (e.g., animals). Here is an example of capture-recapture sampling:
62. Producing Data Ethics IPS Chapters 3.4
63. Objectives (IPS Chapters 3.4) Ethics
Institutional review boards
Behavioral and social science experiments
64. Institutional review boards (IRBs) An organization that carries out the study often has an institutional review board that reviews all planned studies in advance in order to protect the subjects from possible harm.
Every medical school / teaching hospital in the United States has an IRB.
“The purpose of an institutional review board is to protect the rights and welfare of human subjects (including patients) recruited to participate in research activities”
The institutional review board:
reviews the plan of study
can require changes
reviews the consent form
monitors progress at regular intervals
Sometimes they will close a study right in the middle!
65. Informed consent All subjects must give their informed consent before data are collected.
Subjects must be informed in advance about the nature of a study and any risk of harm it might bring.
Subjects must then consent in writing.
Who can’t give informed consent?
very young children
people with mental disorders
66. Confidentiality All individual data must be kept confidential. Only statistical summaries may be made public.
Confidentiality is not the same as anonymity. Anonymity prevents follow-ups to improve non-response or inform subjects of results.
Separate the identity of the subjects from the rest of the data immediately!
67. Clinical trials Clinical trials study the effectiveness of medical treatments on actual patients – these treatments can harm as well as heal.
Points for a discussion:
Randomized comparative experiments are the only way to see the true effects of new treatments.
Most benefits of clinical trials go to future patients. We must balance future benefits against present risks.
The interests of the subject must always prevail over the interests of science and society.
In the 1930s, the Public Health Service Tuskegee study recruited 399 poor blacks with syphilis and 201 without the disease in order to observe how syphilis progressed without treatment. The Public Health Service prevented any treatment until word leaked out and forced an end to the study in the 1970s.
68. Behavioral and social science experiments Many behavioral experiments rely on hiding the true purpose of the study.
Subjects would change their behavior if told in advance what investigators were looking for. (This is why there is nothing “real” about Reality TV).
The “Ethical Principals” of the American Psychological Association require consent unless a study merely observes behavior in a public space.
69. Do the ‘questions’ The material from this lecture will be included on exams.
Watch out for the yeah-yeahs.
Go through the examples in this textbook and make sure you can come up with convincing answers that are similar to or representative of the solutions in the text.
70. Example – Claudication Study (on web page) Methods: first thing they mention is IRB approval; Randomized; Design: 3 groups; Location (Northwestern)
Inclusion & Exclusion Criteria: defining the population
Measurement: How they measured the results – sometimes straight-forward, sometimes can be a huge and contentious issue. How do you measure pain symptoms? How do you measure improvement?
Blinding: Obviously could not be double-blinded since patients knew their ‘treatment’. However, researchers were blinded. They just saw the data results. They did not know which patients were in which group as the experiment was going on.
Details: Many other issues and techniques employed by the study are explained in careful detail.
Stratifications (Blocks): Claudication vs No Claudication.
Control group: Nutritional consulting, regular meetings with data-gathering team, etc, but NO exercise.
Outcomes: In particular note the very frequent mention of p-values, and confidence intervals. Very important and we will be learning about them.
Charts and graphs:
p159: Breakdown of stratifications. Also note the ‘exclusion’ disclaimer at the bottom of the graph. If you’re gonna leave people out of your analysis, you’d better explain why. In this case, 4 were left out in the end because they did not respond to following up.
Table 1, p.170: A careful breakdown and description of the people in each strata (block)
Conclusion: A study should at some point summarize the researchers’ recommendations on what the study can tell us. In this study it is in the very last paragraph: “Physicians should recommend supervised treadmill exercise programs for PAD patients regardless of whether they have classic symptoms of intermittent claudication”.