MEASUREMENT ISSUES
Leonie Huddy, Stony Brook University
firstname.lastname@example.org
Outline
• I. Measurement Error
  • Definitions & Sources
  • Need for reliable measurement of DVs and moderators in survey experiments
  • Examples: measurement of moderator variables
• II. Using Experiments to Validate Measurement
  • Example: Racial Resentment
• III. Cross-National Measurement
I. MEASUREMENT ERROR
1. Definitions (Alwin 2007)
• Measurement error: error that occurs when the observed value differs from the true value, either systematically or at random.
• Bias: the measure differs in systematic ways from its true value.
• Reliability: the extent to which the measure is free of random measurement error.
• Validity: the measure captures the right concept. Validity may also need to be assessed to ensure valid measurement:
  • Face validity (looks right on the surface)
  • Discriminant validity (diverges from measures of distinct or opposing concepts)
  • Convergent validity (goes with what it should)
  • Predictive validity (predicts what it is supposed to predict)
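One standard way to formalize reliability is the classical true-score model: observed score X = true score T + random error E, with E independent of T, so that reliability is the share of observed variance that is true-score variance, var(T)/var(X). The simulation below is a minimal sketch of that idea; the normal distributions and the standard deviations are arbitrary assumptions chosen for illustration.

```python
import random
import statistics


def simulate_reliability(n=20_000, true_sd=1.0, error_sd=0.5, seed=1):
    """Simulate observed = true + random error (classical test theory).

    Returns (theoretical, simulated) reliability, where reliability is the
    share of observed-score variance that is true-score variance.
    """
    rng = random.Random(seed)
    true_scores = [rng.gauss(0, true_sd) for _ in range(n)]
    # Random error: mean zero and independent of the true score.
    observed = [t + rng.gauss(0, error_sd) for t in true_scores]
    theoretical = true_sd**2 / (true_sd**2 + error_sd**2)
    simulated = statistics.variance(true_scores) / statistics.variance(observed)
    return theoretical, simulated


theory, sim = simulate_reliability()
print(f"theoretical reliability = {theory:.2f}, simulated = {sim:.2f}")
```

Lowering `error_sd` pushes reliability toward 1. Adding the same constant to every observed score would leave this ratio unchanged, which is why bias and reliability are distinct properties of a measure.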
2. Sources of Measurement Error (Alwin)
Bias | Variance
Interviewer bias | Interviewer error variance
Respondent bias | Respondent error variance
Instrument bias | Instrument error variance
Mode bias | Mode error variance
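The bias/variance distinction can be illustrated with interviewers. In the hypothetical simulation below, every respondent's true answer is drawn from a standard normal distribution; `shared_bias` shifts all interviewers' recorded answers in the same direction (interviewer bias), while `interviewer_sd` gives each interviewer a different, mean-zero shift (interviewer error variance). The parameter names and values are illustrative assumptions, not estimates from any study.

```python
import random
import statistics


def interviewer_effects(n_interviewers=200, per_interviewer=20,
                        shared_bias=0.0, interviewer_sd=0.0, seed=2):
    """Return (mean, variance) of recorded answers in a hypothetical survey.

    True answers are standard normal. shared_bias shifts every interviewer's
    recordings by the same amount (interviewer bias); interviewer_sd gives
    each interviewer a different mean-zero shift (interviewer error variance).
    """
    rng = random.Random(seed)
    recorded = []
    for _ in range(n_interviewers):
        shift = shared_bias + rng.gauss(0, interviewer_sd)
        recorded.extend(rng.gauss(0, 1) + shift for _ in range(per_interviewer))
    return statistics.mean(recorded), statistics.variance(recorded)


biased_mean, biased_var = interviewer_effects(shared_bias=0.5)
noisy_mean, noisy_var = interviewer_effects(interviewer_sd=1.0)
print(f"interviewer bias:     mean {biased_mean:.2f}, variance {biased_var:.2f}")
print(f"interviewer variance: mean {noisy_mean:.2f}, variance {noisy_var:.2f}")
```

Bias moves the mean and leaves the spread roughly unchanged; interviewer error variance leaves the mean near zero but inflates the variance of the recorded answers.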
3. Reliable Measurement of the Dependent Variable (DV)
• The major problem with measurement error in the DV is VARIABILITY: random error inflates the variance of the DV and makes it more difficult to detect statistically significant treatment effects.
• It is important to include multiple measures of the DV to reduce measurement error and increase measurement reliability.
• Many experimental studies focus more on the manipulation than on measuring the DV well.
• A constant bias in the DV (systematic under- or over-estimation that is the same across conditions) does not bias the estimated relationship with an independent variable.
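A small simulation can make both points concrete. In the sketch below (all parameter values are assumptions), each respondent has a true attitude, the treatment shifts it by `effect`, and the observed DV is the mean of `n_items` indicators that each add independent random error; `bias` adds the same constant to every respondent in both arms. Averaging more items shrinks the standard error of the difference in means, while the constant bias leaves the estimated effect unchanged. A bias that differed across conditions would not cancel this way.

```python
import math
import random
import statistics


def diff_in_means(treated, control):
    """Estimated treatment effect and its (unpooled) standard error."""
    effect = statistics.mean(treated) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treated) / len(treated)
                   + statistics.variance(control) / len(control))
    return effect, se


def simulate_dv(n_per_arm=500, effect=0.3, item_error_sd=1.0,
                n_items=1, bias=0.0, seed=3):
    """Simulate an experiment whose DV is the mean of n_items noisy indicators.

    bias is a constant added to every respondent's DV in both arms.
    """
    rng = random.Random(seed)

    def observed_dv(shift):
        true_attitude = rng.gauss(shift, 1.0)
        items = [true_attitude + rng.gauss(0, item_error_sd)
                 for _ in range(n_items)]
        return statistics.mean(items) + bias

    treated = [observed_dv(effect) for _ in range(n_per_arm)]
    control = [observed_dv(0.0) for _ in range(n_per_arm)]
    return diff_in_means(treated, control)


one_item = simulate_dv(n_items=1)
four_items = simulate_dv(n_items=4)
biased = simulate_dv(n_items=4, bias=2.0)
print(f"1 item:  effect {one_item[0]:.3f}, SE {one_item[1]:.3f}")
print(f"4 items: effect {four_items[0]:.3f}, SE {four_items[1]:.3f}")
print(f"4 items + constant bias: effect {biased[0]:.3f}, SE {biased[1]:.3f}")
```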
4. Reliable Measurement of Experimental Moderators
• Experimental effects in political science are frequently heterogeneous. Hypothetical examples include:
  • The effect of elite partisan cues in a framing study depends on partisan identity (its direction and strength).
  • The effect of new information about a government policy may depend on existing levels of political sophistication.
  • The effect of exposure to a more or less generous welfare policy depends on a respondent's left/right political ideology.
• Reliable measurement of moderators (and correct theoretical specification of their role in the model) increases the likelihood of detecting heterogeneous experimental effects.
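The cost of an unreliable moderator can be sketched with a simulation. Below (hypothetical parameters throughout), the treatment effect grows with a moderator M, but the analyst observes M plus random error. With a binary treatment, the difference between the slope of Y on the observed moderator among treated and among control respondents equals the interaction coefficient from a fully interacted regression of Y on T, M, and T×M. When the moderator's reliability is 0.5 (error variance equal to true variance), the estimated interaction shrinks to roughly half its true size.

```python
import random


def slope(x, y):
    """OLS slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def interaction_estimate(n=20_000, true_interaction=0.5,
                         moderator_error_sd=0.0, seed=4):
    """Hypothetical model: Y = 0.2*T + true_interaction*T*M + noise.

    T is 0/1 and M ~ N(0, 1); the analyst only sees M + error. Returns the
    treated-minus-control difference in slopes of Y on the observed moderator.
    """
    rng = random.Random(seed)
    groups = {0: ([], []), 1: ([], [])}  # T -> (observed M, Y)
    for _ in range(n):
        t = rng.randint(0, 1)
        m = rng.gauss(0, 1)
        y = 0.2 * t + true_interaction * t * m + rng.gauss(0, 1)
        m_obs = m + rng.gauss(0, moderator_error_sd)
        groups[t][0].append(m_obs)
        groups[t][1].append(y)
    return slope(*groups[1]) - slope(*groups[0])


clean = interaction_estimate()
noisy = interaction_estimate(moderator_error_sd=1.0)
print(f"reliable moderator: {clean:.2f}; reliability 0.5: {noisy:.2f}")
```

Smaller estimated interactions with the same sampling noise mean lower power to detect heterogeneous effects, which is the argument for multi-item moderator scales.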
A. Moderator Measurement Example: Partisan Identity vs. Traditional PID Strength; Huddy, Mason & Aaroe
• Threat: an experimental blog statement suggesting that Democrats or Republicans will lose the upcoming election; the message comes from either the same party or the other main party.
• Sample statements in the Democratic threat manipulation (from the other party):
  • I love watching Democrats delude themselves! They’re talking a big game, but look closer and they know they’re in trouble.
  • America clearly wants Republican leadership, and the Democrats are running in circles desperately trying to convince themselves that anyone in America trusts them!
  • People don’t trust Democrats and they don’t like their politics.
  • They lost a lot of credibility over their years of flip-flopping, it's going to take more than a couple of years to get it back.
Multi-item Partisan Identity vs. Traditional PID Strength; Huddy, Mason & Aaroe
Partisan Identity Scale
• “How important is being a [Democrat/Republican/Independent] to you?”
• “How well does the term [Democrat/Republican/Independent] describe you?”
• “When talking about [Democrats/Republicans/Independents], how often do you use ‘we’ instead of ‘they’?”
• “To what extent do you think of yourself as being a [Democrat/Republican/Independent]?”
Traditional PID Strength
• “Generally speaking, do you think of yourself as a Democrat, a Republican, or an Independent?”
• “Are you a strong or not so strong Democrat/Republican?”
• IF INDEPENDENT: “Do you think of yourself as closer to the Republican party or closer to the Democratic party?”
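Because the partisan identity scale combines four items, it can be scored as an average and its internal consistency checked with Cronbach's alpha, α = k/(k − 1) · (1 − Σ var(item) / var(total)). The sketch below assumes each item has already been coded numerically with higher values meaning stronger identification; the response scale and the six made-up respondents are hypothetical, not the study's data.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of k lists, one per item, each holding
    every respondent's score on that item (same respondent order)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_var = sum(statistics.variance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / statistics.variance(totals))


def scale_scores(items):
    """Each respondent's mean across the items."""
    return [statistics.mean(scores) for scores in zip(*items)]


# Made-up answers from six respondents to the four identity items
# (assumed coding: higher = stronger identification).
identity_items = [
    [4, 3, 5, 2, 1, 4],  # importance of being a partisan
    [4, 2, 5, 2, 1, 3],  # how well the label describes you
    [5, 3, 4, 1, 2, 4],  # "we" vs. "they"
    [4, 3, 5, 2, 1, 5],  # think of yourself as a partisan
]
print(f"alpha = {cronbach_alpha(identity_items):.2f}")
print(scale_scores(identity_items))
```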
Multi-item Partisan identity vs. PID strength: predicting angry reactions to threat; Huddy, Mason & Aaroe
B. Moderator Measurement Example: Candidate Skin Color, Racial Prejudice, and Social Desirability; Terkildsen 1993
• Subjects were assigned to read about a light-skinned or a dark-skinned black candidate.
• Subjects were drawn from the Louisville, KY jury pool.
• Self-monitoring (the tendency to distort true beliefs in response to social norms) AND racial prejudice were measured as factors that moderate the experimental treatment.
• Both are measured as multi-item scales to reduce measurement error.
Question Wording: Self-Monitoring; Terkildsen (1993)
C. Self-monitoring scale (abridged version): Respondents were asked to indicate whether "each statement is true or false as it applies to you." Scale reliability was .74.
F 1. I can only argue for ideas which I already believe.
T 2. When I am uncertain how to act in social situations I look to the behavior of others.
T 3. I laugh more when I watch a comedy with others than when alone.
F 4. I would not change or modify my opinions in order to please someone else or win favor.
T 5. I am not always the person I appear to be.
F 6. My behavior is usually an expression of my true attitudes and beliefs.
F 7. I am not particularly good at making other people like me.
T 8. I can look anyone in the eye and tell a lie.
The T/F keys mark the responses of high self-monitors. Respondents received a 1 when their answer matched a high self-monitor's response and a 0 when it did not.
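The scoring rule can be written directly: count how many of a respondent's true/false answers match the keyed (high self-monitor) answer. In this sketch, answers are booleans in item order (True = the respondent marked the statement true); that input format and the function name are assumptions.

```python
# Keyed high self-monitor answers to the eight abridged items, in order.
HIGH_SELF_MONITOR_KEY = [False, True, True, False, True, False, False, True]


def self_monitoring_score(answers):
    """Number of answers (0-8) that match the high self-monitor key."""
    if len(answers) != len(HIGH_SELF_MONITOR_KEY):
        raise ValueError("expected 8 true/false answers")
    return sum(a == k for a, k in zip(answers, HIGH_SELF_MONITOR_KEY))


example = [False, True, False, False, True, True, False, True]
print(self_monitoring_score(example))  # prints 6
```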
Question Wording: Racial Prejudice; Terkildsen (1993)
• D. Racial Prejudice (adapted from the General Social Survey): "Please rate black Americans on each scale provided using any number between 1 and 5." A "don't know" option was provided. Scale reliability was .85. The endpoints of the six scales were labeled as follows:
  • Rich-Poor
  • Intelligent-Unintelligent
  • Hard-working-Lazy
  • Prone to Violence-Not Prone to Violence
  • Prefer to be self-supporting-Prefer to live off welfare
  • Patriotic-Unpatriotic
• Item four is reverse coded.
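The prejudice scoring can be sketched as follows. Ratings are assumed to be entered as 1 through 5, with 1 at the first-listed endpoint, and "don't know" stored as `None`. Reverse-coding item four (6 - rating) makes higher values mean a more negative rating on all six items. Averaging over answered items only is one common way to handle "don't know" responses; the source does not say how they were treated, so that rule is an assumption.

```python
REVERSED_ITEMS = {3}  # zero-based index of item four (Prone to Violence-Not Prone to Violence)


def prejudice_score(ratings):
    """Mean of six 1-5 ratings after reverse-coding item four.

    None marks "don't know" and is skipped (assumed handling). Higher scores
    mean more negative ratings. Returns None if every item is "don't know".
    """
    if len(ratings) != 6:
        raise ValueError("expected six ratings")
    coded = []
    for i, r in enumerate(ratings):
        if r is None:
            continue
        if not 1 <= r <= 5:
            raise ValueError(f"rating out of range: {r}")
        coded.append(6 - r if i in REVERSED_ITEMS else r)
    return sum(coded) / len(coded) if coded else None


print(prejudice_score([2, 3, 2, 4, None, 1]))  # prints 2.0
```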
Race and Skin-Tone of Political Candidates, vote for governor on 1-4 scale; Terkildsen, 1993
5. Measurement of the Treatment Effect
• Emotional ads study (Weber): the Campaign Ads Study (2007) examined the emotional impact of experimentally altered campaign ads on political attitudes and participation.
• Four ads were designed to manipulate anger, anxiety, sadness, and enthusiasm.
• After the treatment, respondents completed a battery of emotion questions (three questions per emotion).
• In this study, the ads had heterogeneous effects and did not alter emotions cleanly (an ad did not necessarily move only its target emotion). This raises questions about how to assess the effects of the treatment. At a minimum, the emotional impact of the treatment needs to be measured well.
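A basic manipulation check compares the average of each three-item emotion index across ad conditions. In the sketch below, item names such as `anger_1`, the 1-5 response format, and the made-up respondents are hypothetical. `clean_manipulations` returns the conditions whose targeted emotion has the highest mean, which is only a rough descriptive screen, not a significance test.

```python
import statistics

EMOTIONS = ("anger", "anxiety", "sadness", "enthusiasm")


def emotion_index(responses, emotion):
    """Mean of the three items for one emotion (hypothetical names like anger_1)."""
    return statistics.mean(responses[f"{emotion}_{i}"] for i in (1, 2, 3))


def manipulation_check(data):
    """data: list of (condition, responses); condition names the targeted emotion.
    Returns {condition: {emotion: mean index across respondents}}."""
    table = {}
    for condition in dict.fromkeys(c for c, _ in data):  # keeps first-seen order
        group = [r for c, r in data if c == condition]
        table[condition] = {e: statistics.mean(emotion_index(r, e) for r in group)
                            for e in EMOTIONS}
    return table


def clean_manipulations(table):
    """Conditions in which the targeted emotion has the highest mean index."""
    return [c for c, means in table.items() if max(means, key=means.get) == c]


# Made-up responses on an assumed 1-5 scale, for illustration only.
def make_responses(default=1, **levels):
    return {f"{e}_{i}": levels.get(e, default) for e in EMOTIONS for i in (1, 2, 3)}


data = [("anger", make_responses(anger=5)),
        ("anxiety", make_responses(anxiety=4, anger=5))]
table = manipulation_check(data)
print(clean_manipulations(table))
```

In this toy example the "anxiety" ad raises anger more than anxiety, the kind of spillover that makes a single-emotion interpretation of the treatment questionable.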
Manipulation Checks-Emotional Ads; Top Panel-SMIS adult sample, Bottom Panel-Students (Weber)
II. USING EXPERIMENTS TO VALIDATE KEY VARIABLES
• Racial Resentment (Feldman and Huddy 2005)
• Controversy over the measurement and conception of racial prejudice in political research:
  • A. Overt Prejudice: the belief that blacks are inherently inferior to whites.
  • B. New Racism: resentment at the special treatment of blacks.
    • symbolic racism (Kinder and Sears)
    • modern racism (McConahay)
    • racial resentment (Kinder and Sanders)
New Racism Is Controversial
• It is an excellent predictor of white racial policy attitudes.
• But some argue that the items may be too close to the racial policies they are supposed to predict (e.g., Schuman 2000; Sniderman and Tetlock 1986).
• Its conceptualization makes it difficult to distinguish resentment from individualism (Sniderman et al. 2000).
• Racial Resentment Items:
  • (1) “Irish, Italians, Jewish and many other minorities overcame prejudice and worked their way up. Blacks should do the same without any special favors.”
  • (2) “Over the past few years blacks have gotten less than they deserve.”
  • (3) “It's really a matter of some people not trying hard enough; if blacks would only try harder they could be just as well off as whites.”
  • (4) “Generations of slavery and discrimination have created conditions that make it difficult for blacks to work their way out of the lower class.”
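One common way to score the four items is to reverse the two items worded so that agreement signals less resentment, items (2) and (4), rescale each item to run from 0 to 1, and average. The 1 (strongly disagree) to 5 (strongly agree) coding and the function name are assumptions about how responses were recorded, not the survey's documented procedure.

```python
REVERSED = {2, 4}  # items where agreeing indicates LESS resentment


def racial_resentment(answers):
    """answers: {item number: response}, assumed coded 1 (strongly disagree)
    to 5 (strongly agree). Returns a 0-1 score where 1 = most resentful."""
    scaled = []
    for item in (1, 2, 3, 4):
        x = answers[item]
        if not 1 <= x <= 5:
            raise ValueError(f"item {item}: response out of range: {x}")
        if item in REVERSED:
            x = 6 - x  # flip so that higher always means more resentment
        scaled.append((x - 1) / 4)  # map 1-5 onto 0-1
    return sum(scaled) / len(scaled)


print(racial_resentment({1: 4, 2: 2, 3: 4, 4: 3}))
```

Agreeing with every item yields 0.5: the two reversed items cancel the two others, which is one reason balanced wording helps separate resentment from acquiescence.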
Data: New York State Racial Attitudes Survey
• RDD telephone interviews of New York State residents (late 2000 to 2001)
• 760 white, non-Hispanic, non-Asian respondents
• Survey conducted by the Center for Survey Research at Stony Brook University
• College Scholarship Experiment (similar to programs adopted by some universities to replace race-based affirmative action in college admissions)
  • Respondents were randomly assigned to one of 8 conditions.
  • “To what extent do you favor providing special college scholarships for _____ students who score in the top fifteen percent of their school class, even if their school’s grades are not in the top fifteen percent nationally?”
  • The eight conditions referred to white, black, poor white, poor black, middle class white, middle class black, poor, and middle class students.
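The eight conditions form a class cue (poor, middle class, none) crossed with a race cue (white, black, none), minus the cell with no cue. The sketch below builds the conditions, fills the blank in the question wording, and uses simple randomization (each respondent independently receives one of the eight conditions with equal probability). The survey's actual randomization procedure is not described here, so that mechanism and all function names are assumptions.

```python
import random

CLASS_CUES = ["poor", "middle class", None]
RACE_CUES = ["white", "black", None]
# Crossed design minus the cell with neither a class nor a race cue.
CONDITIONS = [(cls, race) for cls in CLASS_CUES for race in RACE_CUES
              if (cls, race) != (None, None)]

QUESTION = ("To what extent do you favor providing special college scholarships "
            "for {target} students who score in the top fifteen percent of their "
            "school class, even if their school's grades are not in the top "
            "fifteen percent nationally?")


def question_text(condition):
    """Fill the blank with the class and/or race cue, e.g. 'poor black'."""
    target = " ".join(word for word in condition if word)
    return QUESTION.format(target=target)


def assign(respondent_ids, seed=None):
    """Simple randomization: each respondent gets one condition at random."""
    rng = random.Random(seed)
    return {rid: rng.choice(CONDITIONS) for rid in respondent_ids}


print(question_text(("poor", "black")))
```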
Predictions Concerning Racial Resentment: Ideology or Prejudice?
• Racial resentment as prejudice: it should predict opposition only to programs targeted at black students.
• Racial resentment as ideology: it should promote opposition to programs for all students, regardless of race.
• Or does the meaning of racial resentment vary with left-right (liberal-conservative) ideology?
  • Racial for liberals (affects only their opposition to programs for black students)
  • Ideological for conservatives (predicts opposition to programs for all students)
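The two readings imply different coefficient patterns in a logit model of support with a resentment × black-target interaction. Under the prejudice reading, resentment should have no slope for white targets and a negative slope for black targets; under the ideology reading, resentment should have the same negative logit slope whatever the target's race. The sketch below computes predicted probabilities under each pattern using hypothetical coefficients, not estimates from the New York data.

```python
import math


def p_support(resentment, black_target, b0, b_rr, b_black, b_rr_black):
    """Predicted P(support) from a logit with a resentment x black-target term.

    resentment is on a 0-1 scale; black_target is 0 or 1. The coefficients
    are hypothetical inputs chosen for illustration.
    """
    z = (b0 + b_rr * resentment + b_black * black_target
         + b_rr_black * resentment * black_target)
    return 1 / (1 + math.exp(-z))


# Hypothetical patterns implied by each reading of racial resentment.
PREJUDICE = dict(b0=1.0, b_rr=0.0, b_black=0.0, b_rr_black=-3.0)
IDEOLOGY = dict(b0=1.0, b_rr=-3.0, b_black=0.0, b_rr_black=0.0)

for label, coefs in [("prejudice", PREJUDICE), ("ideology", IDEOLOGY)]:
    for target in (0, 1):
        low, high = (p_support(rr, target, **coefs) for rr in (0.0, 1.0))
        print(f"{label:9s} black_target={target}: P(support) {low:.2f} -> {high:.2f}")
```

Fitting the same model separately for liberals and conservatives is one way to ask the third question: whether the prejudice pattern holds for one group and the ideology pattern for the other.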
Figure: Probability of Support for Scholarships by Racial Resentment: POLITICAL LIBERALS. Lines: Poor White, Poor Black, Middle Class White, Middle Class Black. Y-axis: probability of support (0 to 1); x-axis: racial resentment (0 to 1).
Figure: Probability of Support for Scholarships by Racial Resentment, Race-by-Class Conditions: POLITICAL CONSERVATIVES. Lines: Poor White, Poor Black, Middle Class White, Middle Class Black. Y-axis: probability of support (0 to 1); x-axis: racial resentment (0 to 1).
III. CROSS-NATIONAL SURVEYS: METHODS TO DEVELOP COMPARABLE QUESTIONS
1. Sequential:
• Questions are developed in one context and then exported to another; the survey is simply translated without adaptation to the new context.
• Example: Eurobarometer; questions are usually developed in French and English first and then translated for other countries.
• Does not allow for pretesting in other languages: other countries are stuck with what has been developed initially.
• Example of problems: the ISSP could not ask Japanese respondents whether their earnings were “just” or “fair” because such questions are inappropriate in the Japanese context.
• Harkness: all items should be exported with care.
• It may be easier to discard “bad” items in long psychological batteries because other items remain. It may be more difficult in social science questionnaires in which a concept is measured by only one or two items.
2. Parallel Development:
• Combines expertise from many countries to develop the survey in a single language.
• E.g., the ESS is written and developed in English first; the ISSP is also developed by a multicultural group, and all members vote on the final questionnaire.
• The survey is then subject to multicultural testing before it is finalized.
• Advance translation: some questions are translated before the questionnaire is completed in order to identify problems. These translations need not be perfect; they are designed to surface obvious problems.
• Overall, this approach is better than the sequential approach but is time-consuming and involves complex coordination.
3. Simultaneous:
• Decentering: a draft questionnaire is produced in one language, and the final version is produced in two. In the decentering phase, specific cultural references are also removed.
• Typically applied when studying only two cultures; ensures that questions are truly comparable.
• This technique has been used on existing instruments.
• But it may create very bland items.
• An alternative is to have some common core questions and some country-specific ones, but the country-specific questions are then difficult to compare.
References
Alwin, Duane F. 2007. Margins of Error: A Study of Reliability in Survey Measurement.
Groves, Robert M. 1989. Survey Errors and Survey Costs. New York: Wiley.
Weber, Lavine, Federico.
Tourangeau, Roger, Lance Rips, and Kenneth Rasinski. 2000. The Psychology of Survey Response. New York: Cambridge University Press.
Snyder, Mark, and Steven W. Gangestad. 1986. “On the Nature of Self-Monitoring: Matters of Assessment, Matters of Validity.” Journal of Personality and Social Psychology 51: 125-139.
Feldman, Stanley, and Leonie Huddy. 2005. “Racial Resentment and White Opposition to Race-Conscious Programs: Principles or Prejudice?” American Journal of Political Science 49 (1): 168-183.
Huddy, Leonie, and Anna Gunthorsdottir. 2000. “The Persuasive Effects of Emotive Visual Imagery: Superficial Manipulation or A Deepening of Conviction?” Political Psychology 21: 745-778.
Harkness, Janet. 2003. “Questionnaire Translation.” In Janet Harkness, Fons J. R. Van de Vijver, and Peter Ph. Mohler, eds., Cross-Cultural Survey Methods. Hoboken, NJ: John Wiley and Sons, pp. 35-56.
Huddy, Mason & Aaroe.
Schuman.