The Effect of Familiarity with the Response Category Labels on Item Response to Likert Scales
Bert Weijters, Maggie Geuens, Hans Baumgartner
Motivating Example
a French researcher wants to replicate, with consumer self-report data collected in France, an empirical finding that was established in the U.S.;
in the English questionnaire, a Likert scale with endpoints of ‘strongly disagree’ and ‘strongly agree’ was used;
should the French researcher use ‘fortement d’accord’ or ‘tout à fait d’accord’?
Do the labels attached to the response scale categories influence response behavior (i.e., how many respondents endorse the extreme scale categories)?
What causes this effect?
How can the effect be mitigated?
What are the implications for multilingual and monolingual surveys?
various characteristics of rating scales have been studied, but the problem of choosing appropriate labels for the response categories has been largely ignored;
this is surprising because category labels typically apply to many if not all of the items in a questionnaire;
if differences in responding to survey items as a function of the category labels have been acknowledged, the effect has generally been attributed to the perceived intensity of the labels (intensity hypothesis);
in this research we propose the familiarity hypothesis (i.e., scale categories marked by labels that are used more often in day-to-day language are more likely to be endorsed) and contrast it with the intensity hypothesis;
intensity is defined as the degree or extent of the attribute expressed by the label (e.g., degree of agreement or disagreement, extent of liking);
prior research shows that scale anchors in general (e.g., adjectives for evaluating products, such as “good”, “terrific”, or “superior”, as in Wildt and Mazis 1978) and amplifiers used in Likert scales (e.g., “slightly”, “somewhat” or “very much” agree, as in Spector 1976) differ in perceived intensity;
more intense labels represent more extreme positions, which should be endorsed less often (e.g., agree vs. strongly agree; superior vs. very good);
Wyatt and Meyers (1987) found that when the extremes of the response scale were anchored by narrower or less absolute labels (i.e., “agree” and “disagree”), responses were distributed more evenly across all five scale steps, whereas when the response scale was bordered by wider or more absolute labels (i.e., “strongly agree” and “strongly disagree”), responses were concentrated more on the intermediate scale steps;
even subtler differences in adverbial modifiers (e.g., strongly vs. completely agree) may influence response behavior;
there is prior evidence that different adverbs are associated with different intensities (e.g., Cliff 1959; Smith et al. 2009), but little evidence that different adverbs lead to differential category endorsement;
H_intensity: Endpoint response categories are endorsed less frequently if their labels are more intense.
certain word combinations, called collocations, co-occur more often than would be expected based on the frequencies of the individual words (e.g., strong tea vs. powerful tea);
because collocations have been shown to be processed more quickly, familiar (vs. unfamiliar) labels should enjoy greater processing fluency and should therefore be chosen more confidently as the true and preferred response option;
Arce-Ferrer (2006) showed that respondents who were less familiar with the meaning of the intermediate scale categories were more likely to engage in extreme responding, that is, less likely to endorse the intermediate response options with which they were not familiar;
H_familiarity: Endpoint response categories are endorsed more frequently if their labels are more familiar.
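One common way to quantify how strongly two words form a collocation is pointwise mutual information (PMI), which compares how often a word pair actually occurs with how often it would occur if the two words were independent. The presentation does not say which collocation measure, if any, underlies the familiarity hypothesis, so this is only an illustrative sketch; the function name `pmi` and all counts are hypothetical.

```python
import math


def pmi(count_xy: int, count_x: int, count_y: int, total: int) -> float:
    """Pointwise mutual information (base 2) of the word pair (x, y).

    All probabilities are estimated from one corpus size `total`, a common
    simplification: PMI = log2(p(x, y) / (p(x) * p(y)))
                        = log2(count_xy * total / (count_x * count_y)).
    Positive values: the pair co-occurs more often than independence predicts.
    """
    return math.log2(count_xy * total / (count_x * count_y))


# Hypothetical corpus counts, for illustration only (not real corpus data).
TOTAL = 1_000_000
strong_tea = pmi(count_xy=50, count_x=2_000, count_y=1_000, total=TOTAL)
powerful_tea = pmi(count_xy=1, count_x=4_000, count_y=1_000, total=TOTAL)
print(f"strong tea: {strong_tea:.2f}, powerful tea: {powerful_tea:.2f}")
```

A pair with PMI near zero co-occurs about as often as chance predicts; a clearly positive value marks a conventional, and therefore familiar, combination.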
Two alternative hypotheses to explain the effect of response category labels
If the intensity or familiarity of scale labels is to have a reliable effect on responses to questionnaires, consistent differences in the perceived intensity and fluency of category labels should emerge across respondents.
We need two labels that imply contradictory responses under the intensity and familiarity hypotheses.
Sample 1: 83 undergraduates; pairwise comparisons of intensity and familiarity of six endpoint labels;
Sample 2: 112 respondents (mean age 32.03, 66% female) from an online panel; direct ratings of intensity and familiarity on 11-point scales;
Sample 3: 125 undergraduates (57% female); lexical decision task;
for intensity, the correlation of the means obtained from the paired comparison and direct rating tasks is .92;
the correlations of the means derived from the four familiarity methods range from .94 to .97;
thus, there is considerable consistency in respondents’ judgments of the perceived intensity and familiarity of different category labels;
‘sterk eens’ (strongly agree) consistently emerged as one of the least intense and least familiar labels, while ‘volledig eens’ (completely agree) surfaced as one of the most intense and most familiar labels;
The endorsement rate for a high-intensity, high-familiarity label should be relatively low if the intensity hypothesis is true, and relatively high if the familiarity hypothesis is true.
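The opposing predictions can be made concrete with a toy threshold model, which is an assumption for illustration and not the authors' model: a respondent picks the upper endpoint when a standard-normal latent attitude exceeds a cutpoint, intensity pushes that cutpoint outward, and familiarity pulls it inward. The weights `w_intensity` and `w_familiarity` and all parameter values are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # latent attitude noise, assumed standard normal


def endpoint_share(mean_attitude, base_cut, intensity, familiarity,
                   w_intensity, w_familiarity):
    """Share of respondents choosing the upper endpoint in a toy threshold model.

    The endpoint is chosen when the latent attitude exceeds `cut`; intensity
    raises the cut (harder to endorse), familiarity lowers it (easier to endorse).
    """
    cut = base_cut + w_intensity * intensity - w_familiarity * familiarity
    return 1.0 - STD_NORMAL.cdf(cut - mean_attitude)


# Hypothetical standardized label scores: (intensity, familiarity).
high_high = (1.0, 1.0)  # a label like 'volledig eens'
low_low = (0.0, 0.0)    # a label like 'sterk eens'

for name, w_int, w_fam in [("intensity", 0.5, 0.0), ("familiarity", 0.0, 0.5)]:
    hh = endpoint_share(0.5, 1.0, *high_high, w_int, w_fam)
    ll = endpoint_share(0.5, 1.0, *low_low, w_int, w_fam)
    print(f"{name} hypothesis: high/high {hh:.3f} vs low/low {ll:.3f}")
```

Under the intensity weighting the high-intensity, high-familiarity label draws fewer endpoint responses than the low/low label; under the familiarity weighting it draws more. This is why a label that is both intense and familiar can discriminate between the two hypotheses.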
The manipulation of intensity/familiarity was successful;
The findings support the familiarity hypothesis:
the results of Study 2 presumably arise because more familiar labels are processed more easily and this ease of processing inadvertently influences respondents’ answers to survey questions;
as long as the relevance of meta-cognitive experiences is not called into question, people consider this information as diagnostic and incorporate it into their judgments by relying on naïve theories such as, “If the information comes to my mind easily, it must be true or I must like it”;
however, when the diagnosticity or informational value of meta-cognitive experiences is called into question, people discount this information and either turn to alternative naïve theories such as “The information comes to mind easily because I have often heard it” or use the cognitive content of the stimulus;
this suggests that the previously observed familiarity effect should disappear if respondents are made aware that more familiar response labels may attract more responses and may therefore lead them to select the category label “completely (dis)agree” more readily;
(based on Bassetti and Cook 2011)
approx. 200 English- or French-speaking respondents in five regions (nationality/language combinations) of North America and Europe;
five endpoint labels in each language;
16 heterogeneous items from Greenleaf (1992), rated on 5-point scales;
pairwise comparisons of the six labels plus “agree” or “d’accord” in terms of intensity and familiarity;
Intensity and familiarity ratings by region
Note: Correlation between the familiarity ratings and the natural logarithm of the number of Google hits was at least .88.
demonstration that familiarity is a viable determinant of extreme responding differences between regions in a large-scale international survey;
illustration of how to construct and use relative measures of familiarity and extreme responding based on secondary data only;
relative measure of familiarity as the natural logarithm of the ratio of the number of Google hits for the 1st and 7th categories (strongly disagree or strongly agree) to the number of Google hits for the 2nd and 6th categories (disagree or agree);
relative endorsement of the 1st and 7th vs. the 2nd and 6th response categories (natural logarithm);
Note: Standardized B = .68, p < .05, R² = 46%.
prior research has generally attributed differences in response distributions in cross-cultural comparisons to nationality and national culture;
our findings demonstrate that different labels may vary in terms of familiarity, which can lead to different response patterns across languages;
in particular, if the endpoint label used in a certain language is more familiar than the one used in another language (relative to the adjacent category label), it is likely that the endpoint will be selected more frequently in the former than in the latter language;
response category labels that are more commonly used in day-to-day language (i.e., that are more familiar) lead to higher endorsement of their associated response categories;
respondents do not simply scale response categories along an intensity dimension and then map their latent response to the best-matching category, but they are also influenced by the familiarity of the labels;
the category label familiarity effect can be eliminated by making respondents aware of the potentially biasing effect of label familiarity;
the problem may be particularly serious in cross-cultural research when different languages are used;
however, researchers can control for differences in label familiarity across languages based on secondary data;
imagine a situation in which the strength of a relationship is compared across two groups and labels that differ in familiarity are used to collect data in the two groups;
the DV, an attitudinal variable (ATT), is measured on an agreement rating scale, and the IV (e.g., AGE in years) is measured on an objective scale and hence not affected by differences in label familiarity;
compared to respondents in the unfamiliar label condition, respondents in the familiar label condition who have a moderately positive or negative true attitude will exhibit a more extreme positive or negative observed attitude because they are more likely to endorse the endpoints;
this can result in a steeper observed slope and thus a stronger relationship between the objective antecedent and the observed attitude in the familiar label condition;
e.g., the German and Dutch labels “vollkommen einverstanden” and “volledig eens” are literal translations of each other (similar to “completely agree”), but the German expression is more familiar, resulting in more endpoint responses in German than in Dutch (based on Study 5);
in two languages