Surveys FPP Chapter 19
Plan of Study • Conducting good surveys is not trivial. We will cover these three main topics today. • Issues in questionnaire design • Methods for selecting units to survey • Administration of surveys
Statistical Literacy • Statistics are often used to validate arguments, philosophies, public policy decisions etc. • It behooves us to be educated consumers of statistics • Lying with statistics • “There are three kinds of lies: lies, damned lies and statistics” • “If you torture data enough it will tell you what you want it to.” • Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists by Joel Best • Statistics: Concepts and Controversies by David Moore
Statistical Literacy • An example from Joel Best’s book • In a dissertation prospectus, a student quoted the following from an article published in 1995 in a reputable peer-reviewed journal • “Every year since 1950 the number of American children gunned down has doubled” • Best says that “I think that it may be the worst – that is, the most inaccurate -- social statistic ever” • What’s wrong with it?
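One way to check the claim is to run the arithmetic. The sketch below assumes, as the most conservative starting point, that a single child was gunned down in 1950; the function name and the starting count are illustrative assumptions, not figures from the journal article.

```python
# Assumption: exactly one child gunned down in 1950 (the smallest possible start).
START_YEAR = 1950
END_YEAR = 1995


def doubled_count(start_count: int, start_year: int, year: int) -> int:
    """Count implied by doubling once per year from start_year up to year."""
    return start_count * 2 ** (year - start_year)


# Print the implied yearly count for a few years to see how fast doubling grows.
for year in (1960, 1970, 1980, END_YEAR):
    print(year, f"{doubled_count(1, START_YEAR, year):,}")
```

Even from a start of one, 45 doublings give a count in the tens of trillions by 1995, which is far more than the number of people on Earth.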
Statistical Literacy • Understand the measure • Two reports in same newspaper (2/90) • Teens’ sexual activity on the rise. • Teens’ sexual activity on the decline • First article: avg. age of first intercourse (17.2 females, 16.5 males), which was younger • Second article: avg. number of partners (6) and frequency (3 per month), which were lower. • The two analyses really tackled different questions, but the conclusions were presented the same way • Always find out what variable is being measured when judging or making statistical conclusions.
Statistical Literacy • Which U.S. states are the worst polluters? • E.P.A. in 1993 ranked New Jersey 22nd worst in the nation in release of toxic chemicals. E.P.A. used total pounds (38.6 million). • When using pounds per square mile, New Jersey was the 4th worst in the nation. • The two analyses assess pollution, but they use different variables. • Pay attention to the variable used when judging statistical claims
Statistical Literacy • “Statistics are no substitute for judgments” Henry Clay
Statistical Literacy • Beware Hidden Agendas • Survey paid for by disposable diaper company • “It is estimated that disposable diapers account for less than 2% of the trash in today’s landfills. In contrast, beverage containers, third-class mail and yard wastes are estimated to account for about 21% of the trash in landfills. Given this, in your opinion would it be fair to ban disposable diapers?” • Question is prefaced with claims that favor only diapers, which will lead respondents’ answers • When judging surveys, get the exact wording of the questions
Statistical Literacy • Beware hidden agendas 2 • “Levi’s 501 Report, a fall fashion survey conducted annually on 100 campuses” sponsored by Levi’s • Report: 90% of college students chose Levi’s 501 jeans as “in” on campus • Levi’s 501 jeans was the only type of blue jeans on the list
Statistical Literacy • Beware of hidden agendas 3 • Advertisement for Triumph cigarettes: • “TRIUMPH BEATS MERIT – an amazing 60% said that Triumph tastes as good or better than Merit.” • Actual survey results • 36% Triumph, 40% Merit, 24% no pref. • The company reported the survey results to make Triumph look as good as possible.
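The advertised 60% can be rebuilt from the actual survey results, and the same trick works in Merit's favor. A minimal sketch using the three percentages on the slide (the variable names are illustrative):

```python
# Survey results from the slide, in percent.
prefer_triumph = 36
prefer_merit = 40
no_preference = 24

# "As good or better" lumps the no-preference group in with one brand's fans.
triumph_as_good_or_better = prefer_triumph + no_preference
merit_as_good_or_better = prefer_merit + no_preference

print("Triumph as good or better:", triumph_as_good_or_better, "%")
print("Merit as good or better:", merit_as_good_or_better, "%")
```

By the advertisement's own logic, Merit "beats" Triumph with 64%.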
Statistical Literacy • Economic questions • Did wages increase during the Reagan-Bush years (1980-1992)? • Average wages in private nonagricultural production: $235 in 1980, $345 in 1992 • Adjusted for inflation, the 1980 wage was equivalent to $388 in 1992 dollars. People were worse off! • Consider inflation when judging statistical statements about money.
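The comparison can be worked through with the three dollar figures on the slide. This sketch assumes that $388 is the 1980 wage expressed in 1992 dollars; the variable names are illustrative.

```python
# Figures from the slide. Assumption: 388 is the 1980 wage in 1992 dollars.
WAGE_1980 = 235.0
WAGE_1992 = 345.0
WAGE_1980_IN_1992_DOLLARS = 388.0

# Price increase from 1980 to 1992 implied by the slide's two 1980 figures.
price_factor = WAGE_1980_IN_1992_DOLLARS / WAGE_1980

# Nominal change compares raw dollars; real change compares like with like.
nominal_change = WAGE_1992 / WAGE_1980 - 1
real_change = WAGE_1992 / WAGE_1980_IN_1992_DOLLARS - 1

# The same comparison expressed in 1980 dollars instead.
wage_1992_in_1980_dollars = WAGE_1992 / price_factor

print(f"Nominal change: {nominal_change:+.1%}")
print(f"Real change:    {real_change:+.1%}")
print(f"1992 wage in 1980 dollars: ${wage_1992_in_1980_dollars:.2f}")
```

The nominal wage rises by nearly half, but the real change is negative, which is the slide's point.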
Statistical Literacy • Consider raw numbers • “Planes get closer in midair as traffic control errors rise.” • 18% increase in errors, 1997 to 1998. • Wow! Huge increase. Flying was much more dangerous in 1998 compared to 1997 • Actual error rates: • 5.5 errors per million in 1998 • 4.8 errors per million in 1997 • 0.7 more errors per million isn’t as bad
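The gap between a percentage headline and the underlying rates can be seen by computing both the absolute and the relative change. This sketch uses only the two rates on the slide. The relative change in these rates need not equal the 18% headline figure, which may have been computed from raw error counts (an assumption; the slide does not say).

```python
# Error rates per million, from the slide.
RATE_1997 = 4.8
RATE_1998 = 5.5

# Absolute change: how many additional errors per million.
absolute_change = RATE_1998 - RATE_1997

# Relative change: the same difference as a share of the 1997 rate.
relative_change = absolute_change / RATE_1997

print(f"Absolute change: {absolute_change:.1f} errors per million")
print(f"Relative change: {relative_change:.1%}")
```

A double-digit percentage increase corresponds here to less than one additional error per million.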
Statistical Literacy • Use the proper base • 80% of all accidents happen within 10 miles of home • Well duh! Most driving is within ten miles of home, so most accidents should occur within 10 miles of home.
Statistical Literacy • Use proper base • 12th grade students ranking on math performance in various regions. (I.A.E.E.A. rankings 1991) • Hong Kong ranked first • Hungary ranked near the bottom of list. • But … • 50% of Hungary’s 12th graders took math • 3% of Hong Kong’s 12th graders took math • It is likely that only the mathematically inclined students took math in Hong Kong while math education was more universal in Hungary • Average score in Hungary should be lower when including performance of weaker math students.
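A small simulation shows how much the base matters. It is a sketch under strong assumptions that are not from the rankings themselves: both regions share the same distribution of math ability (mean 500, standard deviation 100), and the students who take math are exactly the strongest 50% or 3%.

```python
import random
import statistics

rng = random.Random(19)  # fixed seed so the sketch is repeatable
N_STUDENTS = 100_000


def mean_of_top_share(scores, share):
    """Average score among the strongest `share` of students."""
    ranked = sorted(scores, reverse=True)
    cutoff = max(1, int(len(ranked) * share))
    return statistics.mean(ranked[:cutoff])


# Hypothetical ability scores, identical for both regions by assumption.
scores = [rng.gauss(500, 100) for _ in range(N_STUDENTS)]

hungary_mean = mean_of_top_share(scores, 0.50)    # 50% take math
hong_kong_mean = mean_of_top_share(scores, 0.03)  # 3% take math

print(f"Mean of top 50%: {hungary_mean:.0f}")
print(f"Mean of top 3%:  {hong_kong_mean:.0f}")
```

With identical underlying ability, the region that tests only its top 3% still comes out well ahead.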
Statistical Literacy • Good and bad graphs
Statistical Literacy • What to do? • Ask yourself as many questions as possible • Try to think of all possible interpretations • Is it suspicious? • If it is suspicious then investigate • Make sure that it makes sense
General Idea • [Diagram labels: Population, Parameter, Sample, Statistic, Inference]
Some new vocabulary • Population • Sample • Parameter • Statistic • Inference • Bias • Selection bias • Non-response bias • Frame coverage bias • Simple random sample • Quota sampling • Convenience sampling • Judgment sampling • Voluntary sampling • Probably others that I’ve missed
Plan of Study • Conducting surveys is not trivial. We will cover these three main topics today. • Issues in questionnaire design • Methods for selecting units to survey • Administration of surveys
Challenges to writing good questions • Defining objectives and specifying the kind of answers needed to meet objectives of the question • Ensuring all respondents have a shared, common understanding of the question • Ensuring people are asked questions to which they know the answers • Asking questions respondents are able to answer in the terms required by the question • Asking questions respondents are willing to answer accurately • These come from the book by Floyd Fowler, Jr., Improving Survey Questions, Sage Publications, 1995 • Dr. Jerry Reiter wrote a fantastic article about asking good questions. It is on Blackboard and I encourage you to read it and use it as a resource.
Challenge to writing a good question #1 • Objective: Assess the mental health of people with arthritis (Pincus, 1993). • To meet this objective, the researchers used the Minnesota Multiphasic Personality Inventory, which is a standard battery of questions used to judge whether people are depressed. The Inventory contains a series of true/false questions, for example • (a) “I am about as able to work as I ever was.” • (b) “I am in just as good physical health as my friends.” • (c) “I have few or no pains.” • People who answer “false” many times are considered to be depressed. • These questions are not informative for the objective of interest. • For example, people with arthritis answer “false” to the questions because of the nature of their disease.
Challenge to writing a good question #2 • It is surprisingly easy to interpret questions differently, and it is surprisingly hard to write questions that are interpreted consistently. • Here are questions from two separate 1992 polls of U.S. residents (Moore, 1995): • (a) “Does it seem possible or does it seem impossible that the Nazi extermination of the Jews never happened?” • (b) “Does it seem possible to you that the Nazi extermination of the Jews never happened, or do you feel certain that it happened?” • For the first question, about 22% of respondents answered that it seems possible. • But, the question has a double negative, which easily could confuse people. • In the second poll, which has a much clearer phrasing, only 1% of people answered that it seems possible.
Challenge to writing a good question #3 • Ability to answer question • 1. Is your health plan a PPO, HMO, or fee for service plan? • It is unreasonable to expect people to know the terms PPO (preferred provider organization), HMO (health maintenance organization), or fee for service. • What can be done instead is to ask people questions that lead them to describe their plan. The researcher then can categorize the responses once the survey is complete.
Challenge to writing a good question #4 • The form of the answer • Here is a question that was asked to AIDS patients in a health survey. • “In the past 30 days, were you able to climb a flight of stairs with no difficulty, with some difficulty, or were you not able to climb stairs at all?” • The problem with this question is that AIDS patients cannot answer it! Their condition varies tremendously from day to day. Some days they have the strength to climb stairs, and other days they do not.
Challenge to writing a good question #5 • It’s surprising how many questions people are unwilling to answer truthfully, or even answer at all, because they don’t want to be perceived as doing something socially undesirable. • “Did you vote in the presidential election of November 2000?” • “How many alcoholic drinks did you have altogether yesterday?”
Steps to running a survey • Establish the target population (this isn’t always easy) • Obtain a sampling frame (this can be very difficult) • Select a sample (this can be difficult) • Obtain data from the sampled units (this can be difficult) • In addition to being difficult all of the above could potentially be very expensive
Think about it • What percent of folks think that we should legalize drugs? • How can we answer this question? • Will legalizing drugs reduce crime? • How can we answer this question? • What percent of adults believe that the TSA makes us safer? • How can we answer this question? • Does the TSA prevent terror attacks? • How can we answer this question?
Misspecifying target population • 1994 Democratic gubernatorial primary in Arizona • All polls predicted Eddie Basha would trail the front-runner by at least 9 points • Result of election: Basha won • Target population used in polls: registered voters who had voted in previous primaries
Surveys that use census as sampling frame • U.S. census often used as frame for many federal and social surveys • target population here is folks living in the U.S. • U.S. census misses some people • can you think of any examples? • The frame is therefore non-representative of the target population even before any sample is drawn
Selecting samples • Units sampled should be representative of the target population • How do we ensure this? • Select a subset of units from the frame at random • Most common method is to obtain a “simple random sample” • Simple random sample: Every possible sample of size n has the same chance of being selected • How can we carry this out? • If the random sample is large enough, it should have characteristics that mirror the characteristics of the population in the frame.
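One way to carry out a simple random sample is to let software draw n units from the frame without replacement. The sketch below uses Python's standard `random.Random.sample`; the frame of household labels, the function name, and the seed are hypothetical.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units so that every size-n subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Hypothetical sampling frame: a list of 10,000 household identifiers.
frame = [f"household_{i}" for i in range(1, 10_001)]

# Seeded only so the draw can be reproduced and audited later.
sample = simple_random_sample(frame, 100, seed=19)
print(sample[:5])
```

The quality of the result still depends on the frame: the draw is random only among the units that the list actually contains.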
Obtaining survey data • Remember the following when designing a survey • Imperative that purpose of survey is stated clearly • Confidentiality should be promised and kept • At ISU there is a group that verifies that a survey’s confidentiality requirements are met • Method for asking questions should be the same for all sampled units
Unreliable methods of selecting samples • What follows are examples of how NOT to select a sample • Convenience sampling: • Picking units that are easy to measure • Judgment sampling: • Picking units you judge as representative of the population • Voluntary response sampling: • Picking units who respond voluntarily • What are some examples of each?
Additional potential pitfalls • Nonresponse bias: • Units that do not respond differ from those that do. These folks will be underrepresented. • Frame coverage bias: • Frame doesn’t include all of target population • Can we think of some examples?
Think about it • A survey is carried out to determine the distribution of household size in a certain city. • They draw a simple random sample of 1000 households • The interviewers find people home in 635 of the homes • What is an issue here? • They draw another sample and visit homes until they have 1000 responses. • The average household size in the city turns out to be 3.1 persons • Is this estimate likely to be too high, too low, or about right?
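A simulation can help in thinking about the direction of the bias. This is a sketch under assumptions that are not from the survey: a made-up distribution of household sizes, and each household member being home independently with probability 0.4, so that larger households are more likely to have someone home.

```python
import random
import statistics

rng = random.Random(34)  # fixed seed so the sketch is repeatable

# Hypothetical city: household sizes and their relative frequencies.
SIZES = [1, 2, 3, 4, 5, 6]
SIZE_WEIGHTS = [28, 34, 15, 13, 6, 4]
P_ONE_PERSON_HOME = 0.4  # assumed chance that any one member is home

households = rng.choices(SIZES, weights=SIZE_WEIGHTS, k=100_000)


def someone_home(size):
    """True if at least one of `size` members is home when visited."""
    p_home = 1 - (1 - P_ONE_PERSON_HOME) ** size
    return rng.random() < p_home


# Only households where someone answers the door end up in the data.
responders = [size for size in households if someone_home(size)]

true_mean = statistics.mean(households)
responder_mean = statistics.mean(responders)
print(f"True mean household size:      {true_mean:.2f}")
print(f"Mean among households reached: {responder_mean:.2f}")
```

Under these assumptions the households reached are larger on average than the city as a whole, and visiting more homes until 1000 respond does not remove that tilt.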
Example of voluntary response survey • Nightline call-in poll: Ted Koppel asked people to call his show to express their opinion on whether the United Nations should continue to have its headquarters in New York • 186,000 people called in, with 67% saying no • Independent random sample: 72% said yes
Examples of problematic survey designs • Shere Hite’s book, Women and Love: A Cultural Revolution in Progress (1987), claims: • 84% of women “not satisfied emotionally with their relationships” (pg. 804) • 95% of women “report forms of emotional and psychological harassment from men with whom they are in love relationships” (pg. 810) • 70% of women “married five or more years are having sex outside of their marriages” (pg. 856)
Hite’s survey • To whom did she send a survey? • 100,000 questionnaires mailed to professional women’s groups, counseling centers, church societies, and senior citizens’ centers. • Her target population was women. What was her actual represented population?
Hite’s survey • What did the survey look like? • 127 essay questions on questionnaire • 4.5% of these questionnaires returned • What was not taken into account?
Hite’s survey • How did she ask the questions? • Questions use vague words like “love”. • People have different interpretations of such words • Questions were leading • “Does your husband/lover treat you as an equal? Or are there times when he seems to treat you as an inferior? Leave you out of decisions? Act superior?” (pg. 795)
Random sampling comment 1 • Say you collect data on units using a method other than a random sample, and you know these data are not representative of the population of interest. Then, you take a random sample from these collected data. This random sample is representative of the population. • Wrongo !! • Large random samples are representative of the population in the frame. • Effectively, this method uses the unrepresentative, collected data as a frame. • By randomly sampling from an unrepresentative sample, you just get a smaller unrepresentative sample.
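The point can be checked with a quick simulation. This is a sketch with made-up numbers: a hypothetical population measured on some scale with mean near 50, and a collection method that only reaches units scoring above 55.

```python
import random
import statistics

rng = random.Random(40)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100,000 measurements.
population = [rng.gauss(50, 10) for _ in range(100_000)]

# Unrepresentative collection: only units above 55 are ever reached.
collected = [x for x in population if x > 55]

# A proper random sample, but drawn from the unrepresentative data.
subsample = rng.sample(collected, 200)

print(f"Population mean: {statistics.mean(population):.1f}")
print(f"Collected mean:  {statistics.mean(collected):.1f}")
print(f"Subsample mean:  {statistics.mean(subsample):.1f}")
```

The subsample mirrors the collected data, not the population: the random draw treats the collected data as its frame.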
Random sampling comment 2 • Say you obtain data that are representative of the target population. Should you take a random sample from these collected data? • This question arises when researchers use data collected by others, for example in a Stat 101 project. • No! • If you have a representative sample, use it. • This sub-sampling method just reduces the amount of data you work with
Random sampling comment 3 • A census is a measurement of outcomes for all units in the population. For example, the U.S. government does a census of the population every 10 years to apportion seats in the House of Representatives. It also takes censuses of agriculture and business. • Why do a survey instead of a census? • Surveys are cheaper • They require contacting far fewer people • Survey results can be obtained more quickly • Same reason as above • This is important because we want to make policy decisions on current answers, not answers that are months or years old. • Surveys can be more accurate • Fewer people to contact, fewer problems with interviewer effects and non-response bias • Upshot: less data of high quality is better than more data of poor quality
Random sampling comment 4 • Most major surveys are not simple random samples • They involve multiple stages of random selection • e.g., randomly pick 100 cities. From these cities randomly pick 500 households, then randomly pick 1 person from each household • Data collected like this are NOT representative of the population. However, because units are selected randomly, statisticians can account for the non-representation. • This is done by assigning a weight to each observation that reflects how many units it represents in the population • A good question to ask here would be: Where do the weights come from? • Generally, when analyzing data from surveys that are not simple random samples, it is wise to contact a professional statistician
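One common source of weights is the inverse of each unit's probability of selection. The sketch below illustrates that idea only. The respondents, the 1% household selection chance, and the function name are all hypothetical, and real surveys usually adjust weights further.

```python
def weighted_mean(values, weights):
    """Average of values where each value counts in proportion to its weight."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical data: (household size, answer) for one person per household.
respondents = [(1, 10.0), (2, 20.0), (4, 40.0), (1, 10.0)]
HOUSEHOLD_PROB = 0.01  # assumed chance that any given household is selected

# One person is picked at random within each selected household, so a person
# in a household of size k is selected with probability HOUSEHOLD_PROB / k.
selection_probs = [HOUSEHOLD_PROB / size for size, _ in respondents]

# Inverse-probability weight: how many people each respondent stands for.
weights = [1 / p for p in selection_probs]

answers = [answer for _, answer in respondents]
unweighted = sum(answers) / len(answers)
weighted = weighted_mean(answers, weights)
print(f"Unweighted mean: {unweighted:.1f}")
print(f"Weighted mean:   {weighted:.1f}")
```

People in large households are less likely to be picked, so each one stands for more people, and the weighted mean shifts toward their answers.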
Upshot • Conducting GOOD surveys isn’t trivial • Requires tons of work in the preparation phase • Identifying population • Obtaining representative sampling frame • Creating a well-written questionnaire • Lots of work collecting data • Depending on the above, analysis may be difficult • Final projects