The response category labeling effect: How the wording of labels affects response distributions in Likert data
Bert Weijters, Maggie Geuens, Hans Baumgartner
Research questions
Do the labels attached to scale categories influence response behavior?
What mechanism(s) can account for this response category labeling effect?
Are there moderators of this effect?
What are the implications of the response category labeling effect for cross-cultural research?
Label intensity refers to the perceived degree of (dis)agreement implied by the label;
More intense labels represent more extreme positions, which are endorsed less often (e.g., agree vs. strongly agree; superior vs. very good);
Even more subtle adverbial modifiers (e.g., strongly vs. completely agree) may influence response behavior;
There is prior evidence that different intensities are associated with different adverbs (e.g., Cliff 1959; Smith et al. 2009), but little evidence that different adverbs lead to differential category endorsement;
Two alternative hypotheses to explain the effect of response category labels
when people are processing more carefully or when people are highly experienced, their actual thoughts, not the ease of generating them, play a more decisive role;
Verbal ability (as a form of language expertise) may moderate the fluency effect;
We posit that for respondents who tend to use words in a precise manner and who make fine-grained distinctions as to the exact meaning and implications of words, fluency will be less important as a cue in selecting a response;
If the intensity or fluency of scale labels is to have a reliable effect on responses to questionnaires, consistent differences in the perceived intensity and fluency of category labels should emerge across respondents.
We need two labels that imply contradictory responses under the intensity and fluency hypotheses.
Sample 1: 83 undergraduates; pairwise comparisons of intensity and fluency of six endpoint labels;
Sample 2: 112 respondents (mean age 32.03, 66% female) from an online panel; direct ratings of intensity and fluency on 11-point scales;
Sample 3: 125 undergraduates (57% female); lexical decision task;
For intensity, the correlation of the means obtained from the paired comparison and direct rating tasks is .92;
The correlations of the means derived from the four fluency methods range from .66 to .97, with an average of r = .84;
Thus, there is considerable consistency in respondents’ judgments of the perceived intensity and fluency of different category labels;
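The consistency check described above compares per-label means obtained with different methods. A minimal sketch of that comparison follows, assuming a plain Pearson correlation over label means; the function name `pearson_r` and all numbers are hypothetical placeholders, not the study's data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical per-label means (illustrative only), one entry per label.
paired_comparison_means = [0.20, 0.35, 0.50, 0.55, 0.70, 0.90]
direct_rating_means = [6.1, 6.8, 7.9, 8.0, 8.8, 9.9]

r = pearson_r(paired_comparison_means, direct_rating_means)
print(round(r, 2))
```

A high value of `r` means the two methods rank and space the labels similarly, which is the sense in which the judgments are consistent across methods.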
‘sterk eens’ (strongly agree) consistently emerged as one of the least intense and least fluent labels, while ‘volledig eens’ (completely agree) surfaced as one of the most intense and most fluent labels;
The endorsement rate for a high intensity and high fluency label should be relatively low if the intensity hypothesis is true, and it should be relatively high if the fluency hypothesis is true.
(p<.001 based on a Poisson regression)
The findings support the fluency hypothesis:
Replication of the fluency effect with a sample drawn from the general population;
Literacy as a potential moderator;
(p<.05 based on a Poisson regression)
(based on Bassetti and Cook 2011)
Approx. 200 English- or French-speaking respondents in five regions (nationality/language combinations) of North America and Europe;
Five endpoint labels in each language;
16 heterogeneous items from Greenleaf (1992), rated on 5-point scales;
Pairwise comparisons of the six labels plus “agree” or “d’accord” in terms of intensity and fluency;
Intensity and fluency ratings by region
Note: Correlation between the fluency ratings and the natural logarithm of the number of Google hits was at least .88.
Multilevel model estimates
Demonstration that fluency is a viable determinant of extreme responding differences between regions in an international survey;
Illustration of how to construct and use relative measures of fluency and extreme responding based on secondary data only;
13,520 respondents from 17 European regions;
16 heterogeneous items based on Greenleaf (1992);
Use of fully labeled 7-point response scales;
Fluency: relative measure of fluency as the natural logarithm of the ratio of the number of Google hits for the 7th category (strongly agree) to the number of Google hits for the 6th category (agree);
Endorsement: relative endorsement of the 7th vs. the 6th response category (natural logarithm).
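The two relative measures can be sketched as log ratios. This is only an illustration: the helper `log_ratio` is a hypothetical name, the counts are invented, and treating endorsement as the log of the ratio of response counts is an assumption about how the measure is built.

```python
from math import log

def log_ratio(numerator, denominator):
    """Natural log of a ratio of two positive counts."""
    if numerator <= 0 or denominator <= 0:
        raise ValueError("both counts must be positive")
    return log(numerator / denominator)

# Hypothetical counts for one region (illustrative only, not the study's data).
hits_7th = 1_200_000   # search hits for the 7th-category label
hits_6th = 4_800_000   # search hits for the 6th-category label
n_7th = 950            # responses in the 7th category (assumed raw counts)
n_6th = 2_100          # responses in the 6th category (assumed raw counts)

fluency = log_ratio(hits_7th, hits_6th)    # negative: 7th label is the rarer one
endorsement = log_ratio(n_7th, n_6th)      # negative: 7th category chosen less often
print(fluency, endorsement)
```

Because both measures are ratios within a region, they are unaffected by how large a language's web presence or a region's sample is, which makes values comparable across regions.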
Note: Standardized regression slope of .67 (p < .01, R² = .45)
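The two reported figures are consistent with each other if the regression has a single predictor (relative fluency), which is an assumption here. In that case the standardized slope equals the Pearson correlation between predictor and outcome, and the explained variance is its square:

```latex
\beta_{\text{std}} = r, \qquad
R^2 = r^2 = (0.67)^2 \approx 0.45
```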
response category labels that are more commonly used (i.e., that are more fluent) lead to higher endorsement of their associated response categories;
respondents do not simply scale response categories along an intensity dimension and then map their latent response to the best-matching category, but they are also influenced by the fluency of the labels;
the effect of fluency is more pronounced for respondents who are lower in literacy and verbal ability;
the problem may be particularly serious in cross-cultural research when different languages are used;
e.g., ‘Strongly agree’ is most commonly used in scales, but may not have valid equivalents in some other languages. ‘Completely agree’ seems to be a viable alternative.
Completely agree 1.24 18.8%
Tout à fait d’accord 1.22 19.2%