Evaluating semantic similarity and sameness in studies of polysemy and synonymy

Jarno Raukko (U. Helsinki)


For a full version of the PPT, see the handout distributed Oct 28, 2010.

SKY webpage version


Examples

1. Are thrifty and stingy synonyms?

EXPECTED ANSWER:

”Well, not quite.”

2. Are violin and fiddle synonyms?

EXPECTED ANSWER:

”Well, almost.”

(SYNONYMY)

3. Does back have the same meaning in “My back hurts” and “I came back”?

EXPECTED ANSWER:

”Not at all. Different.”

4. Does back have the same meaning in “I came back” and “I got it back”?

EXPECTED ANSWER:

”Well, almost.”

(POLYSEMY)


Relevance of semantic similarity (vs. difference)

  • In synonymy: you expect similarity for a pair/(set) of items to be of interest

  • In polysemy: primarily, you expect difference for a pair/(set) of items to be of interest; secondarily, you group items according to similarity and difference


Yet…


synonymy --- polysemy ?

  • Dirk Geeraerts tomorrow in Helsinki:

    • ”The problem of synonymy and the problem of polysemy are essentially the same”

  • Dylan Glynn & Justyna Robinson (eds, in press)

    • Polysemy and Synonymy. Corpus methods and applications in Cognitive Linguistics. Amsterdam: Benjamins.


synonymy

WORD 1          WORD 2

If their semantic content is similar or the same,

this is a case of synonymy.

If their semantic content is (very) different, a researcher of synonymy ignores this case.


polysemy

MEANING 1 OF WORD 1          MEANING 2 OF WORD 1

The starting point is that Word 1 has at least 2 (different) meanings.

If meanings 1 and 2 are very similar, this might be a case of vagueness.

If meanings 1 and 2 are totally different (and not related semantically), this might be a case of homonymy.

If meanings 1 and 2 are somewhat different but somehow relatable (or a bit similar), this is probably a case of polysemy.

henceforth W = word M = meaning


scale of similarity: synonymy

The meaning of W1 and W2 is…

THE SAME --------------------------------------------------- DIFFERENT

perfect synonymy    near-synonymy    weak synonymy     NOT WORTH
full synonymy       semi-synonymy    quasi-synonymy    DISCUSSION


scale of similarity: polysemy

The meaning of M1 and M2 (of W1) is…

THE SAME ---------------------------------------------------------- very DIFFERENT

two instances of    vagueness:           polysemy:                homonymy
the same meaning    two instances of     instances of different   (ambiguity)
                    the same meaning     (yet related) meaning
                    type                 types


Main question

  • Is semantic similarity somehow different when we look at polysemy than when we look at synonymy?


Differences so far

  • Which is the default, similarity or difference?

  • In synonymy, we idealize on the extreme of the scale, but mainly look at the part of the scale which is (fairly) close to the extreme.

  • In polysemy, we operate pretty much on the whole scale, with focus on the middle.


synonymy --- polysemy ?

  • when you study synonymy, the polysemy of the items gets in the way

    • can you ever say “W1 and W2 are synonymous”?

    • should you always say “Mx of W1 and My of W2 are synonymous”?

  • when you study polysemy, you often use synonyms to talk about meanings

    • “Are get ‘receive’ and get ‘arrive’ meanings of the same verb?”


synonymy --- polysemy ?

  • Synonymy occurs when meaning is shared (but form differs)

  • Polysemy occurs when form is shared (but meaning differs)

  • Synonymy is a relational lexical-semantic property that unites (parts of the semantic potential of) “accidentally” coinciding words

    • The forms of words involved in the synonymy relationship are arbitrary (although the relationships might be non-arbitrary, cf. Levin this morning)

    • The semantic value (that is shared) is motivating enough that two or more forms coincide on it

    • It is typical that one meaning can be expressed with two different words.


synonymy --- polysemy ?

  • Polysemy is a semantic property of one word at a time that unites meanings. The relationship between them is motivated, but it is only sometimes predictable.

    • It is not accidental or arbitrary that words acquire polysemy. It is in their nature. :-)

    • It is typical for semantic value to be flexible, extended, and “multiplied”.

    • Polysemy is about categorization, both between words (W1 covers a semantic territory) and within a word (M1 and M2 are categories too).

  • One form : One meaning

    • a principle that cognition may strive for / take as a default

    • synonymy breaks it

    • polysemy breaks it


synonymy --- polysemy ?

  • The role of co(n)text

    • You can evaluate synonymy in identical co(n)texts:

      • I like to play the fiddle in bars. / I like to play the violin in bars.

    • Usually you evaluate polysemy in non-identical co(n)texts

      • I got to Zabriskie Point. / I got to a point in my life where…

    • But you can use identical co(n)texts as well.

      • I got to be the last one. / I got to be the last one.


evaluating

  • To study shades of semantic similarity, we need to evaluate it.

  • A corpus cannot tell us if two instances are semantically similar

    • It requires human judgement

  • The main use of evaluating in this paper:

    • How informants / test subjects / speakers

      • evaluate the semantic similarity (or difference)

      • of linguistic items in a more or less experimental setting (e.g., similarity rating test)

    • ≈ Data elicitation ≈ Population test


evaluating

  • quantitative:

    • Estimate the degree of synonymy

      (or semantic distance between two meanings in polysemy)

  • qualitative:

    • Justify / explain / explicate

      the nature of / the reason for

      semantic similarity


evaluating takes place in real life as well

  • synonymy (examples)

    • in linguistic production, you e.g. estimate which of the near-synonyms might suit your needs best

    • in comprehension, you e.g. estimate whether near-synonyms that you have encountered refer to the same semantic value

    • in communication, when you negotiate meaning, you e.g. operate with synonymous alternatives

  • polysemy (examples)

    • in production, you e.g. apply words to new contexts

    • in comprehension, you e.g. approximate meanings according to related meanings of the same word

    • jokes often exploit polysemy

    • polysemy may cause misunderstandings

    • in communication, when you negotiate meaning, you e.g. cross-check with polysemy of other words


(Back to experiments/elicitation.) Expected difference between synonymy and polysemy, 1

  • If an informant is asked to rate the semantic similarity/difference of two words,

    • the very fact that they are different words might cause her/him to presuppose that there is at least some semantic difference.

    • Therefore, rating two words ”semantically identical” requires a marked choice.

    • However, if the informant realizes that the researcher is after synonymy, then evaluating W1 and W2 as semantically similar is more likely.


Expected difference between synonymy and polysemy, 2

  • If an informant is asked to rate the semantic similarity/difference of two meanings of one word,

    • the very fact that they are uses/instances of the same word might cause her/him to presuppose that there is at least some semantic similarity.

    • Therefore, rating two meanings ”semantically totally/very different” requires a marked choice.

    • However, if the informant realizes that the researcher is after polysemy, then evaluating M1 and M2 as semantically different is more likely.


Factors that influence

  • In both cases (synonymy and polysemy)

    • it matters a great deal

      • Which test (type) we use

      • What the instructions (exact phrasings) are

      • Whether there is an example rating given by the researcher

      • What the selection of stimuli is

      • What the linguistic context of each stimulus is

      • Which types of cases have been placed in the beginning of the test (or, the order in general)


Factors that influence

  • Should we expect (total) consensus?

  • No. There will be subjective differences.

  • Why?

    • The nature of semantics:

      • Based on intersubjective convention

      • Based on negotiation and flexibility

      • Must allow for variability and variation


Examples from (more or less) experimental studies on synonymy and polysemy


Whitten & al. 1979 (synonymy)

  • “Indicate the degree to which two words have the same meaning by writing a digit from 1 to 7.”

    • 7 = excellent synonymy

    • 1 = poor synonymy

  • All 464 stimulus noun pairs were listed as synonyms in standard references.

  • The rated degree of synonymy ranged from 6.79 to 2.24. The median was 5.08.

  • Had the pairs been presented among nonsynonym pairs, the ratings at the low end might have been higher.


Whitten & al. 1979 (synonymy) cont’d

  • Stimulus pairs at the high end:

    • purchase – buy 6.79

    • lawyer – attorney 6.78

    • autumn – fall 6.72

    • penny – cent 6.71

    • taxi – cab 6.71

  • Stimulus pairs close to the median:

    • college – university 5.12

    • output – yield 5.10

    • expert – authority 5.09

    • effort – attempt 5.08

    • servant – maid 5.08

    • soldier – warrior 5.07


Whitten & al. 1979 (synonymy) cont’d

  • Stimulus pairs at the low end:

    • thunder – clap 2.72

    • patient – invalid 2.55

    • visit – chat 2.52

    • suburb – neighborhood 2.34

    • needle – spike 2.24

  • Although instructions said that all stimuli are nouns, some of these are more common as verbs: buy, purchase, visit, chat

  • The polysemy is obvious in many cases: fall, authority, clap, patient, invalid


Whitten & al. 1979 (synonymy) cont’d

  • The main variable that they paid attention to was the order of the two stimuli: ½ of the informants got “forward order”, ½ got “back order”.

    • In 1979 one of their main aims was to study the structurings of the mental lexicon and lexical access.

    • Example: purchase => buy 6.72; buy => purchase 6.86

    • On average, perceived synonymy was affected by word order.

    • For 21 word pairs, the effect of the order was significant.


Whitten & al. 1979 (synonymy) cont’d

  • Some of the 21 word pairs where the order played a significant role in the rating of the degree of synonymy:

    motive => reason       6.28    reason => motive       5.56
    quarter => fourth      6.24    fourth => quarter      5.00
    mission => task        5.66    task => mission        4.84
    era => age             5.80    age => era             4.60
    appetite => hunger     5.18    hunger => appetite     4.24
    nectar => honey        4.94    honey => nectar        3.68
    aborigine => native    4.52    native => aborigine    3.22

  • Generalization: when the more specific, more academic, and less polysemous word is presented first, it prompts a positive synonymy judgement more readily than when the order is reversed.
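The size of the order effect in the pairs listed above can be made explicit by subtracting the rating for the reverse order from the rating for the listed order. The following is a minimal sketch in Python. The ratings are the ones quoted on this slide; the names `PAIRS` and `order_effects` are illustrative assumptions and not part of Whitten & al.'s study.

```python
# Mean synonymy ratings (scale 1...7) for the seven pairs quoted above:
# (first word, second word, rating in this order, rating in the reverse order)
PAIRS = [
    ("motive", "reason", 6.28, 5.56),
    ("quarter", "fourth", 6.24, 5.00),
    ("mission", "task", 5.66, 4.84),
    ("era", "age", 5.80, 4.60),
    ("appetite", "hunger", 5.18, 4.24),
    ("nectar", "honey", 4.94, 3.68),
    ("aborigine", "native", 4.52, 3.22),
]


def order_effects(pairs):
    """Map each (first, second) pair to: listed-order rating minus reverse-order rating."""
    return {(a, b): round(fwd - rev, 2) for a, b, fwd, rev in pairs}


effects = order_effects(PAIRS)

# Print the pairs from the largest to the smallest order effect.
for (a, b), diff in sorted(effects.items(), key=lambda item: item[1], reverse=True):
    print(f"{a} => {b}: {diff:+.2f}")
```

In this sample every difference is positive: the listed order always received the higher rating.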


Whitten & al. 1979 (synonymy) cont’d

  • Variance (between informants)

    • Mostly .50–1.20 among the 50 most synonymous pairs

      • Exceptionally high variance at the high synonymy end:

        • murder => homicide 2.75 (cf. homicide => murder 1.03)

    • Mostly 2.00–3.00 at the median of the scale

      • Exceptionally low variance: province => territory 1.55

      • Exceptionally high variance: congress => legislature 3.79

    • Mostly 2.50–4.00 among the 50 least synonymous pairs

      • That is, there was little consensus at the lower end of the scale.


Raukko 1994 (polysemy)

  • “Decide whether the word get carries the same meaning or two different meanings in the sentences.”

    • 0 = the same meaning

    • 2 = somewhat different meaning

    • 4 = very different meaning

    (heuristic post hoc: 4 might mean homonymy; 0 would refer to two instances of the same meaning type; typical polysemy would be 1...3)
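The post hoc heuristic can be written out as a small lookup. This is only a sketch: the function name `interpret_rating` and the wording of the labels are assumptions, and the heuristic itself says nothing about values strictly between 0 and 1 or between 3 and 4, so those ranges are labelled as in-between rather than forced into a category.

```python
def interpret_rating(mean_rating):
    """Label a mean difference rating on the 0...4 scale using the post hoc heuristic."""
    if not 0 <= mean_rating <= 4:
        raise ValueError("rating must lie on the 0...4 scale")
    if mean_rating == 0:
        return "two instances of the same meaning type"
    if mean_rating < 1:
        # Not covered by the heuristic: label as in-between (assumption).
        return "between same meaning type and polysemy"
    if mean_rating <= 3:
        return "typical polysemy"
    if mean_rating < 4:
        # Not covered by the heuristic: label as in-between (assumption).
        return "between polysemy and homonymy"
    return "possibly homonymy"


for rating in (0, 0.5, 2, 3.5, 4):
    print(rating, "->", interpret_rating(rating))
```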


Raukko 1994 (polysemy) (cont’d)

  • Data from my 1994 test, see handout.


Comparisons so far

  • Whitten & al. / synonymy

    • scale 1...7 (1 = very different meaning, 7 = same meaning)

    • synonymy ratings ranged 2.24...6.79

    • median 5.08 (most pairs were viewed as at least somewhat synonymous)

  • Raukko / polysemy

    • scale 0...4 (0 = same meaning, 4 = very different meaning)

    • polysemy ratings ranged 0.45...3.13

    • average rating 1.55, median 1.34 (most pairs were viewed as having fairly similar but not identical meaning)
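Because the two scales run in opposite directions and have different lengths, the figures above are easier to compare after rescaling. The sketch below maps both onto a common 0...1 similarity scale (1 = same meaning, 0 = very different meaning). The linear rescaling and the function names are assumptions made for illustration; neither study treats its ratings this way, and the two tests differ in stimuli and instructions, so the rescaled values are only roughly comparable.

```python
def to_similarity(rating, same, different):
    """Linearly rescale a rating to 0...1, where 1 = same meaning and 0 = very different."""
    return (rating - different) / (same - different)


def whitten_similarity(rating):
    # Whitten & al.: scale 1...7, with 7 = same meaning.
    return to_similarity(rating, same=7, different=1)


def raukko_similarity(rating):
    # Raukko: scale 0...4, with 0 = same meaning.
    return to_similarity(rating, same=0, different=4)


for label, value in [("lowest", 2.24), ("median", 5.08), ("highest", 6.79)]:
    print(f"Whitten & al. {label}: {whitten_similarity(value):.3f}")

for label, value in [("most similar", 0.45), ("median", 1.34), ("most different", 3.13)]:
    print(f"Raukko {label}: {raukko_similarity(value):.3f}")
```

Under this assumption both medians fall roughly two thirds of the way toward the "same meaning" end of the common scale.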


Comparisons so far

  • Whitten & al. / synonymy

    • informants saw synonymy where they were supposed to

  • Raukko / polysemy

    • informants did not see large meaning difference for the most part => get is polysemous, not homonymous

    • they saw some similarities, some differences, as predicted => they saw polysemy

  • both

    • differing degrees of similarity were apparent

    • many ratings make sense, some don’t

    • the method is useful, but there are skewing effects and unreliability in several details of the setting


Conclusions so far

  • In both synonymy and polysemy studies, semantic intuitions vary.

  • In both synonymy and polysemy studies, finding a scale of semantic similarity is useful.

    • Cf. Sandra & Rice 1995: 125

      • “[researchers of prepositional polysemy] cannot propose extremely fine-grained distinctions without bothering about empirical data”

      • “language users’ mental representation [...] is [in fact] characterized by a high degree of granularity”


quantitative => qualitative

  • Whitten & al.’s and Raukko’s similarity rating tests did not ask informants to justify or explain their ratings.

  • E.g., Liu (this symposium) reports tests with informants explaining their choices.

  • In Raukko’s study, qualitative results come from other types of tests

    • sorting test: (1) combine stimuli into categories, (2) give names to categories, etc.

    • production test: (1) produce examples of the use of polysemy, (2) explain links you find between them, etc.

  • Vanhatalo 2005


Vanhatalo 2005 (synonymy)

  • her PhD, The use of questionnaires in exploring synonymy

  • several types of tests

    • choose most likely components

    • rate components

    • choose better alternative (cf. Liu)

    • complete as sentences (only the word given)

    • define typical frames

    • spell out semantic differences


Vanhatalo 2005 (cont’d)

  • several factors investigated

    • 18 Finnish verbs of “nagging”, 17 Estonian verbs of “nagging”

      • the gender and age of the portrayed speaker (the subject of “nag”)

      • the degree of irritation of the portrayed speaker and hearer

      • the volume of the vocal act

    • 2-4 Finnish adjectives ‘important, central, crucial, significant’: open questions mainly


Vanhatalo 2005 (cont’d)

  • main results (Vanhatalo 2005: 40-45): the questionnaire method

    • helped to trace differences in the meaning and use of synonyms

      • many differences not documented before in dictionaries

      • sometimes consensus, sometimes deviation

      • useful especially for large groups of semantically similar words

      • (Vanhatalo did not use the method for placing synonyms on a scale of similarity)

      • both open questions and ratings should be used


Vanhatalo 2005 (cont’d)

  • main results (Vanhatalo) (cont’d)

    • helped to find differences between related words in Estonian and Finnish

    • sociodemographic variables caused fairly little variation

      • age and education had slightly more effect than gender

      • answers critique


Vanhatalo 2005 (cont’d)

  • main results (Vanhatalo) (cont’d)

    • when both corpus method and questionnaire method were applicable, they yielded similar results

      • however, justification of results was different

      • questionnaire method dug up semantic properties that corpus method could not

      • in addition, the questionnaire method can tackle low-frequency words

    • results of questionnaire method can be utilized in the production of electronic dictionaries


Other studies of synonymy that employ experimental techniques

  • Arppe & Järvikivi 2002, 2007

  • Divjak & Gries 2008

  • Liu, in this symposium

  • Oversteegen, in this symposium

  • etc.


polysemy / qualitative

  • In experimental settings (e.g., the sorting test):

    • An informant gives a name to a meaning type, a category within polysemy

    • An informant spells out the semantic link between two meanings

    • An informant draws a hierarchy between macrotypes and microtypes (more general and more specific meaning types)

    • An informant pinpoints cases that are difficult to evaluate


And…

  • to conclude…


Evaluating semantic similarity

  • Both synonymy and polysemy operate on the scale of semantic similarity vs. difference.

  • Knowing the degree of similarity is useful in both.

  • The way to find out about it is to use elicitation/experiments.

  • There is deviation in informants’ ratings.

  • A simple explanation: informants use different criteria for evaluation.

  • Solutions:

    • let them explicate the criteria.

    • use multiple methods.


Synonymy vs. polysemy

  • Evaluating semantic similarity between the meanings of two separate words (synonymy) is a matter of evaluating the match between two separate ”semantic events”

    • There should be mismatch, but there isn’t.

  • Evaluating semantic similarity/relatedness/difference between the meanings of one word (polysemy) is a matter of comparing the applications of one single category.

    • There should be match between the semantic events.


Synonymy vs. polysemy

  • When you evaluate near-synonyms, you balance between (i) the ideal of what would constitute a perfect match and (ii) the nuances of the near-synonyms

  • When you evaluate meanings of a polysemous word, you balance between (i) the assumption that some meaning should be shared and (ii) the actual semantic profile of the uses


Synonymy vs. polysemy

  • In evaluating synonymy, the idealized equivalence can be taken from the semantic description of either of the two words.

  • In evaluating polysemy, the common factor (”core meaning”, ”shared meaning”) may be hard to find, or become too abstract.

    Maybe the first task is easier?


General relevance

  • ”Insights in the equality or similarity of meaning may shed light on meaning itself” (Oversteegen / SKY 2010, Helsinki)

  • The question of “identical meaning” is a crucial basis for e.g. typology and language comparisons: the problem of tertium comparationis

    • Cf. Haspelmath’s plenary on Saturday


References

Arppe, Antti & Juhani Järvikivi 2007. Every method counts – Combining corpus-based and experimental evidence in the study of synonymy. Corpus Linguistics and Linguistic Theory 3(2): 131-159.

Colombo, Lucia & Giovanni B. Flores d’Arcais 1984. The meaning of Dutch prepositions: a psycholinguistic study of polysemy. Linguistics 22: 51-98.

Divjak, Dagmar & Stefan Gries 2008. Clusters in the mind? Converging evidence from near-synonymy in Russian. The Mental Lexicon 3(2): 188-213.

Geeraerts, Dirk – in this symposium

Liu, Dilin – in this symposium

Oversteegen, Eleonore – in this symposium

Raukko, Jarno 2003. Polysemy as flexible meaning: experiments with English get and Finnish pitää. In Brigitte Nerlich & al. (eds) Polysemy. Flexible patterns of meaning in mind and language. 161-193.


References (cont’d)

Sandra, Dominiek & Sally Rice 1995. Network analyses of prepositional meaning: mirroring whose mind – the linguist’s or the language user’s? Cognitive Linguistics 6: 89-130.

Vanhatalo, Ulla 2005. Kyselytestit synonymian selvittämisessä (etc.) [The use of questionnaires in exploring synonymy, etc.] PhD thesis, U. Helsinki. http://ethesis.helsinki.fi/julkaisut/hum/suoma/vk/vanhatalo/kyselyte.pdf

Whitten, William B. II, W. Newton Suter, and Michael L. Frank 1979. Bidirectional Synonym Ratings of 464 Noun Pairs. Journal of Verbal Learning and Verbal Behavior 18: 109-127.

Author’s contact information

  • e-mail:

    See handout and list of participants.

  • home postal address

    See handout.

  • affiliation

    Department of Modern Languages
    Metsätalo (Unioninkatu 40 B)
    FIN-00014 University of Helsinki
    Finland

