Evaluating semantic similarity and sameness in studies of polysemy and synonymy

Jarno Raukko (U. Helsinki)


For a full version of the PPT, see handout distributed Oct 28, 2010.

SKY webpage version


Examples

1. Are thrifty and stingy synonyms?

EXPECTED ANSWER:

”Well, not quite.”

2. Are violin and fiddle synonyms?

EXPECTED ANSWER:

”Well, almost.”

(SYNONYMY)

3. Does back have the same meaning in

My back hurts and

I came back?

EXPECTED ANSWER:

”Not at all. Different.”

4. Does back have the same meaning in

I came back and

I got it back?

EXPECTED ANSWER:

”Well, almost.”

(POLYSEMY)


Relevance of semantic similarity (vs. difference)

  • In synonymy: you expect similarity for a pair/(set) of items to be of interest

  • In polysemy: primarily, you expect difference for a pair/(set) of items to be of interest; secondarily, you group items according to similarity and difference


Yet…


synonymy --- polysemy ?

  • Dirk Geeraerts tomorrow in Helsinki:

    • ”The problem of synonymy and the problem of polysemy are essentially the same”

  • Dylan Glynn & Justyna Robinson (eds, in press)

    • Polysemy and Synonymy. Corpus methods and applications in Cognitive Linguistics. Amsterdam: Benjamins.


synonymy

WORD 1          WORD 2

If their semantic content is similar or the same,

this is a case of synonymy.

If their semantic content is (very) different, a researcher of synonymy ignores this case.


polysemy

MEANING 1 OF WORD 1          MEANING 2 OF WORD 1

The starting point is that Word 1 has at least 2 (different) meanings.

If meanings 1 and 2 are very similar, this might be a case of vagueness.

If meanings 1 and 2 are totally different (and not related semantically), this might be a case of homonymy.

If meanings 1 and 2 are somewhat different but somehow relatable (or a bit similar), this is probably a case of polysemy.

Henceforth: W = word, M = meaning


scale of similarity: synonymy

The meaning of W1 and W2 is…

THE SAME --------------------------------- DIFFERENT

  • perfect synonymy / full synonymy

  • near-synonymy / semi-synonymy

  • weak synonymy / quasi-synonymy

  • NOT WORTH DISCUSSION


scale of similarity: polysemy

The meaning of M1 and M2 (of W1) is…

THE SAME --------------------------------- very DIFFERENT

  • two instances of the same meaning

  • vagueness: two instances of the same meaning type

  • polysemy (ambiguity): instances of different (yet related) meaning types

  • homonymy


Main question

  • Is semantic similarity somehow different when we look at polysemy than when we look at synonymy?


Differences so far

  • Which is the default, similarity or difference?

  • In synonymy, we idealize on the extreme of the scale, but mainly look at the part of the scale which is (fairly) close to the extreme.

  • In polysemy, we operate pretty much on the whole scale, with focus on the middle.


synonymy --- polysemy ?

  • when you study synonymy, the polysemy of the items gets in the way

    • can you ever say “W1 and W2 are synonymous”?

    • should you always say “Mx of W1 and My of W2 are synonymous”?

  • when you study polysemy, you often use synonyms to talk about meanings

    • “Are get ‘receive’ and get ‘arrive’ meanings of the same verb?”


synonymy --- polysemy ?

  • Synonymy occurs when meaning is shared (but form differs)

  • Polysemy occurs when form is shared (but meaning differs)

  • Synonymy is a relational lexical-semantic property that unites (parts of the semantic potential of) “accidentally” coinciding words

    • The forms of words involved in the synonymy relationship are arbitrary (although the relationships might be non-arbitrary, cf. Levin this morning)

    • The semantic value (that is shared) is motivating enough that two or more forms coincide on it

    • It is typical that one meaning can be expressed with two different words.


synonymy --- polysemy ?

  • Polysemy is a semantic property of one word at a time that unites meanings. The relationship between them is motivated, but it is only sometimes predictable.

    • It is not accidental or arbitrary that words acquire polysemy. It is in their nature. :-)

    • It is typical for semantic value to be flexible, extended, and “multiplied”.

    • Polysemy is about categorization, both between words (W1 covers a semantic territory) and within a word (M1 and M2 are categories too).

  • One form : One meaning

    • a principle that cognition may strive for / take as a default

    • synonymy breaks it

    • polysemy breaks it


synonymy --- polysemy ?

  • The role of co(n)text

    • You can evaluate synonymy in identical co(n)texts:

      • I like to play the fiddle in bars. / I like to play the violin in bars.

    • Usually you evaluate polysemy in non-identical co(n)texts

      • I got to Zabriskie Point. / I got to a point in my life where…

    • But you can use identical co(n)texts as well.

    • I got to be the last one. I got to be the last one.


evaluating

  • To study shades of semantic similarity, we need to evaluate it.

  • A corpus cannot tell us if two instances are semantically similar

    • It requires human judgement

  • The main use of evaluating in this paper:

    • How informants / test subjects / speakers

      • evaluate the semantic similarity (or difference)

      • of linguistic items in a more or less experimental setting (e.g., similarity rating test)

    • ≈ Data elicitation ≈ Population test


evaluating

  • quantitative:

    • Estimate the degree of synonymy

      (or semantic distance between two meanings in polysemy)

  • qualitative:

    • Justify / explain / explicate

      the nature of / the reason for

      semantic similarity


evaluating takes place in real life as well

  • synonymy (examples)

    • in linguistic production, you e.g. estimate which of the near-synonyms might suit your needs best

    • in comprehension, you e.g. estimate whether near-synonyms that you have encountered refer to the same semantic value

    • in communication, when you negotiate meaning, you e.g. operate with synonymous alternatives

  • polysemy (examples)

    • in production, you e.g. apply words to new contexts

    • in comprehension, you e.g. approximate meanings according to related meanings of the same word

    • jokes often exploit polysemy

    • polysemy may cause misunderstandings

    • in communication, when you negotiate meaning, you e.g. cross-check with polysemy of other words


(Back to experiments/elicitation.) Expected difference between synonymy and polysemy, 1

  • If an informant is asked to rate the semantic similarity/difference of two words,

    • the very fact that they are different words might cause her/him to presuppose that there is at least some semantic difference.

    • Therefore, rating two words ”semantically identical” requires a marked choice.

    • However, if the informant realizes that the researcher is after synonymy, then evaluating W1 and W2 as semantically similar is more likely.


Expected difference between synonymy and polysemy, 2

  • If an informant is asked to rate the semantic similarity/difference of two meanings of one word,

    • the very fact that they are uses/instances of the same word might cause her/him to presuppose that there is at least some semantic similarity.

    • Therefore, rating two meanings ”semantically totally/very different” requires a marked choice.

    • However, if the informant realizes that the researcher is after polysemy, then evaluating M1 and M2 as semantically different is more likely.


Factors that influence

  • In both cases (synonymy and polysemy)

    • it matters a great deal

      • Which test (type) we use

      • What the instructions (exact phrasings) are

      • Whether there is an example rating given by the researcher

      • What the selection of stimuli is

      • What the linguistic context of each stimulus is

      • Which types of cases have been placed in the beginning of the test (or, the order in general)


Factors that influence

  • Should we expect (total) consensus?

  • No. There will be subjective differences.

  • Why?

    • The nature of semantics:

      • Based on intersubjective convention

      • Based on negotiation and flexibility

      • Must allow for variability and variation


Examples from (more or less) experimental studies on synonymy and polysemy


Whitten & al. 1979 (synonymy)

  • “Indicate the degree to which two words have the same meaning by writing a digit from 1 to 7.”

    • 7 = excellent synonymy

    • 1 = poor synonymy

  • All 464 stimulus noun pairs were listed as synonyms in standard references.

  • The rated degree of synonymy ranged from 6.79 to 2.24. The median was 5.08.

  • If placed within context of nonsynonym pairs, the ratings for the low end might have been higher.


Whitten & al. 1979 (synonymy) cont’d

  • Stimulus pairs at the high end:

    • purchase – buy 6.79

    • lawyer – attorney 6.78

    • autumn – fall 6.72

    • penny – cent 6.71

    • taxi – cab 6.71

  • Stimulus pairs close to the median:

    • college – university 5.12

    • output – yield 5.10

    • expert – authority 5.09

    • effort – attempt 5.08

    • servant – maid 5.08

    • soldier – warrior 5.07


Whitten & al. 1979 (synonymy) cont’d

  • Stimulus pairs at the low end:

    • thunder – clap 2.72

    • patient – invalid 2.55

    • visit – chat 2.52

    • suburb – neighborhood 2.34

    • needle – spike 2.24

  • Although instructions said that all stimuli are nouns, some of these are more common as verbs: buy, purchase, visit, chat

  • The polysemy is obvious in many cases: fall, authority, clap, patient, invalid


Whitten & al. 1979 (synonymy) cont’d

  • The main variable that they paid attention to was the order of the two stimuli: ½ of the informants got “forward order”, ½ got “back order”.

    • In 1979 one of their main aims was to study the structurings of the mental lexicon and lexical access.

    • Example: purchase => buy 6.72

      buy => purchase 6.86

    • On average, perceived synonymy was affected by word order.

    • For 21 word pairs, the effect of the order was significant.


    Whitten & al. 1979 (synonymy) cont’d

    • Some of the 21 word pairs where the order played a significant role in the rating of the degree of synonymy:

      motive => reason 6.28        reason => motive 5.56

      quarter => fourth 6.24       fourth => quarter 5.00

      mission => task 5.66         task => mission 4.84

      era => age 5.80              age => era 4.60

      appetite => hunger 5.18      hunger => appetite 4.24

      nectar => honey 4.94         honey => nectar 3.68

      aborigine => native 4.52     native => aborigine 3.22

    • Generalization: a more specific, more academic, and less polysemous word prompts a positive synonymy judgement more readily than vice versa.
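
    The size of the order effect can be read off by subtracting the two mean ratings for each pair. The following is only an illustrative sketch: the ratings are the ones listed above, while the names `order_ratings`, `order_gap` and `gaps` are hypothetical, and a raw difference of means says nothing about statistical significance, which Whitten & al. tested separately.

```python
# Mean synonymy ratings for the two presentation orders, copied from the
# pairs listed above. The first number belongs to "first => second",
# the second number to "second => first".
order_ratings = {
    ("motive", "reason"): (6.28, 5.56),
    ("quarter", "fourth"): (6.24, 5.00),
    ("mission", "task"): (5.66, 4.84),
    ("era", "age"): (5.80, 4.60),
    ("appetite", "hunger"): (5.18, 4.24),
    ("nectar", "honey"): (4.94, 3.68),
    ("aborigine", "native"): (4.52, 3.22),
}


def order_gap(first_order, second_order):
    """Difference between the two mean ratings, rounded to two decimals."""
    return round(first_order - second_order, 2)


# Hypothetical helper structure: one gap per word pair.
gaps = {pair: order_gap(a, b) for pair, (a, b) in order_ratings.items()}

# Print the pairs from the largest to the smallest gap.
for (first, second), gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{first} => {second} is rated {gap:.2f} higher than {second} => {first}")
```

    In all seven pairs the gap is positive, i.e. the order listed first received the higher mean rating.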


    Whitten & al. 1979 (synonymy) cont’d

    • Variance (between informants)

      • Mostly .50–1.20 among the 50 most synonymous pairs

        • Exceptionally high variance at the high synonymy end:

          • murder => homicide 2.75 (cf. homicide => murder 1.03)

      • Mostly 2.00–3.00 at the median of the scale

        • Exceptionally low variance: province => territory 1.55

        • Exceptionally high variance: congress => legislature 3.79

      • Mostly 2.50–4.00 among the 50 least synonymous pairs

        • That is, there was little consensus at the lower end of the scale.


    Raukko 1994 (polysemy)

    • “Decide whether the word get carries the same meaning or two different meanings in the sentences.”

      • 0 = the same meaning

      • 2 = somewhat different meaning

      • 4 = very different meaning

        (heuristic post hoc: 4 might mean homonymy; 0 would refer to two instances of the same meaning type; typical polysemy would be 1...3)
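
    The post hoc heuristic can be written out as a small lookup. This is a sketch under stated assumptions: it applies only to a single informant's whole-number rating on the 0...4 scale, the function name `interpret_rating` and the label strings are hypothetical, and the source does not say how averaged (non-integer) ratings should be mapped.

```python
def interpret_rating(rating):
    """Heuristic reading of one 0..4 rating (hypothetical helper).

    0    -> two instances of the same meaning type
    1..3 -> typical polysemy
    4    -> might mean homonymy
    """
    # Only whole-number ratings on the 0..4 scale are covered by the heuristic.
    if not isinstance(rating, int) or not 0 <= rating <= 4:
        raise ValueError("rating must be an integer from 0 to 4")
    if rating == 0:
        return "same meaning type"
    if rating == 4:
        return "possible homonymy"
    return "typical polysemy"


for r in range(5):
    print(r, interpret_rating(r))
```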


    Raukko 1994 (polysemy)(cont’d)

    • Data from my 1994 test, see handout.


    Comparisons so far

    • Whitten & al. / synonymy

      • scale 1...7 (1 = very different meaning, 7 = same meaning)

      • synonymy ratings ranged 2.24...6.79

      • median 5.08 (most pairs were viewed as at least somewhat synonymous)

    • Raukko / polysemy

      • scale 0...4 (0 = same meaning, 4 = very different meaning)

      • polysemy ratings ranged 0.45...3.13

      • average rating 1.55, median 1.34 (most pairs were viewed as having fairly similar but not identical meaning)
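
    The two scales run in opposite directions and have different lengths, so the figures above are not directly comparable. One possible way to put them side by side is a linear rescaling onto 0...1 (1 = same meaning). This is an assumption made for illustration only, not a procedure used in either study, and the function names are hypothetical; the tests also differed in instructions and stimuli, so the rescaled values should be read with caution.

```python
def whitten_to_similarity(rating):
    """Map the 1..7 scale (7 = same meaning) linearly onto 0..1."""
    return (rating - 1) / 6


def raukko_to_similarity(rating):
    """Map the 0..4 scale (0 = same meaning) linearly onto 0..1, flipping it."""
    return 1 - rating / 4


# Summary figures quoted in the comparison above, rescaled.
summary = {
    "Whitten & al., lowest pair": whitten_to_similarity(2.24),
    "Whitten & al., median": whitten_to_similarity(5.08),
    "Whitten & al., highest pair": whitten_to_similarity(6.79),
    "Raukko, most different pair": raukko_to_similarity(3.13),
    "Raukko, median": raukko_to_similarity(1.34),
    "Raukko, most similar pair": raukko_to_similarity(0.45),
}

for label, value in summary.items():
    print(f"{label}: {value:.2f}")
```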


    Comparisons so far

    • Whitten & al. / synonymy

      • informants saw synonymy where they were supposed to

    • Raukko / polysemy

      • informants did not see large meaning difference for the most part => get is polysemous, not homonymous

      • they saw some similarities, some differences, as predicted => they saw polysemy

    • both

      • differing degrees of similarity were apparent

      • many ratings make sense, some don’t

      • method is useful but there are skewing effects and unreliability in several details of the setting


    Conclusions so far

    • In both synonymy and polysemy studies, semantic intuitions vary.

    • In both synonymy and polysemy studies, finding a scale of semantic similarity is useful.

      • Cf. Sandra & Rice 1995: 125

        • “[researchers of prepositional polysemy] cannot propose extremely fine-grained distinctions without bothering about empirical data”

        • “language users’ mental representation [...] is [in fact] characterized by a high degree of granularity”


    quantitative => qualitative

    • Whitten & al’s and Raukko’s similarity rating tests did not include informants justifying and explaining their ratings.

    • E.g., Liu (this symposium) reports tests with informants explaining their choices.

    • In Raukko’s study, qualitative results come from other types of tests

      • sorting test: (1) combine stimuli into categories, (2) give names to categories, etc.

      • production test: (1) produce examples of the use of polysemy, (2) explain links you find between them, etc.

    • Vanhatalo 2005


    Vanhatalo 2005 (synonymy)

    • her PhD, The use of questionnaires in exploring synonymy

    • several types of tests

      • choose most likely components

      • rate components

      • choose better alternative (cf. Liu)

      • complete as sentences (only the word given)

      • define typical frames

      • spell out semantic differences


    Vanhatalo 2005 (cont’d)

    • several factors investigated

      • 18 Finnish verbs of “nagging”, 17 Estonian verbs of nagging

        • the gender and age of the portrayed speaker (the subject of “nag”)

        • the degree of irritation of the portrayed speaker and hearer

        • the volume of the vocal act

      • 2-4 Finnish adjectives ‘important, central, crucial, significant’: open questions mainly


    Vanhatalo 2005 (cont’d)

    • main results (Vanhatalo 2005: 40-45): the questionnaire method

      • helped to trace differences in the meaning and use of synonyms

        • many differences not documented before in dictionaries

        • sometimes consensus, sometimes deviation

        • useful especially for large groups of semantically similar words

        • (Vanhatalo did not use the method for placing synonyms on a scale of similarity)

        • both open questions and ratings should be used


    Vanhatalo 2005 (cont’d)

    • main results (Vanhatalo) (cont’d)

      • helped to find differences between related words in Estonian and Finnish

      • sociodemographic variables caused fairly little variation

        • age and education had a slightly greater effect than gender

        • answers critique


    Vanhatalo 2005 (cont’d)

    • main results (Vanhatalo) (cont’d)

      • when both corpus method and questionnaire method were applicable, they yielded similar results

        • however, justification of results was different

        • questionnaire method dug up semantic properties that corpus method could not

        • in addition, the questionnaire method can tackle low-frequency words

      • results of questionnaire method can be utilized in the production of electronic dictionaries


    Other studies of synonymy that employ experimental techniques

    • Arppe & Järvikivi 2002, 2007

    • Divjak & Gries 2008

    • Liu, in this symposium

    • Oversteegen, in this symposium

    • etc.


    polysemy / qualitative

    • In experimental settings (e.g., the sorting test):

      • An informant gives a name to a meaning type, a category within polysemy

      • An informant spells out the semantic link between two meanings

      • An informant draws a hierarchy between macrotypes and microtypes (more general and more specific meaning types)

      • An informant pinpoints cases that are difficult to evaluate


    And…

    • to conclude…


    Evaluating semantic similarity

    • Both synonymy and polysemy operate on the scale of semantic similarity vs. difference.

    • Knowing about the degree of similarity is one useful property of both.

    • The way to find out about it is to use elicitation/experiments.

    • There is deviation in informants’ ratings.

    • A simple explanation: informants use different criteria for evaluation.

    • Solutions:

      • let them explicate the criteria

      • use multiple methods


    Synonymy vs. polysemy

    • Evaluating semantic similarity between the meanings of two separate words (synonymy) is a matter of evaluating the match between two separate ”semantic events”

      • There should be mismatch, but there isn’t.

    • Evaluating semantic similarity/relatedness/difference between the meanings of one word (polysemy) is a matter of comparing the applications of one single category.

      • There should be match between the semantic events.


    Synonymy vs. polysemy

    • When you evaluate near-synonyms, you balance between (i) the ideal of what would constitute a perfect match and (ii) the nuances of the near-synonyms

    • When you evaluate meanings of a polysemous word, you balance between (i) the assumption that some meaning should be shared and (ii) the actual semantic profile of the uses


    Synonymy vs. polysemy

    • In evaluating synonymy, the idealized equivalence can be taken from the semantic description of either of the two words.

    • In evaluating polysemy, the common factor (”core meaning”, ”shared meaning”) may be hard to find, or become too abstract.

      Maybe the first task is easier?


    General relevance

    • ”Insights in the equality or similarity of meaning may shed light on meaning itself” (Oversteegen / SKY 2010, Helsinki)

    • The question of “identical meaning” is a crucial basis for e.g. typology and language comparisons: the problem of tertium comparationis

      • Cf. Haspelmath’s plenary on Saturday


    References

    Arppe, Antti & Juhani Järvikivi 2007. Every method counts – Combining corpus-based and experimental evidence in the study of synonymy. Corpus Linguistics and Linguistic Theory 3: 2: 131-159.

    Colombo, Lucia & Giovanni B. Flores d’Arcais 1984. The meaning of Dutch prepositions: a psycholinguistic study of polysemy. Linguistics 22: 51-98.

    Divjak, Dagmar & Stefan Gries 2008: Clusters in the mind? Converging evidence from near-synonymy in Russian. The Mental Lexicon 3: 2: 188-213.

    Geeraerts, Dirk – in this symposium

    Liu, Dilin – in this symposium

    Oversteegen, Eleonore – in this symposium

    Raukko, Jarno 2003. Polysemy as flexible meaning: experiments with English get and Finnish pitää. In Brigitte Nerlich & al (eds) Polysemy. Flexible patterns of meaning in mind and language. 161-193.

    CONTINUED...


    References cont’d

    Sandra, Dominiek & Sally Rice 1995. Network analyses of prepositional meaning: mirroring whose mind – the linguist’s or the language user’s? Cognitive Linguistics 6: 89-130.

    Vanhatalo, Ulla 2005. Kyselytestit synonymian selvittämisessä (etc.) [The use of questionnaires in exploring synonymy, etc.] PhD thesis, U-Helsinki. http://ethesis.helsinki.fi/julkaisut/hum/suoma/vk/vanhatalo/kyselyte.pdf

    Whitten, William B. II, W. Newton Suter, and Michael L. Frank 1979. Bidirectional Synonym Ratings of 464 Noun Pairs. Journal of Verbal Learning and Verbal Behavior 18: 109-127.


    Author’s contact information

    • e-mail:

      See handout and list of participants.

    • home postal address

      See handout.

    • affiliation

      Department of Modern Languages

      Metsätalo (Unioninkatu 40 B)

      FIN-00014 University of Helsinki

      Finland

