Semantic annotation of corpora
1 / 46

Semantic Annotation of Corpora - PowerPoint PPT Presentation

  • Uploaded on

Semantic Annotation of Corpora. Christiane Fellbaum Princeton University . Outline. Preliminaries --what is (isn’t) semantic annotation --why we need it Three experiments --SemCor and what we learned from it --WordNet gloss corpus --MASC Conclusions and Outlook. Semantic annotation.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Semantic Annotation of Corpora' - orrick

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Semantic annotation of corpora

Semantic Annotation of Corpora

Christiane Fellbaum

Princeton University



--what is (isn’t) semantic annotation

--why we need it

Three experiments

--SemCor and what we learned from it

--WordNet gloss corpus


Conclusions and Outlook

Semantic annotation
Semantic annotation

Determine the meaning of (polysemous) words in their contexts

He asked the waiter for the check(bill)

She cashed the check (bank check)

I’ll check the door lock (control)


The most frequent words are also the most polysemous

Why semantic annotation
Why semantic annotation?

Synchronic perspective:

  • Understand usage of words/senses

  • Collocational properties (Firth, Church)

  • Interchangeability with other words (synonymy)

  • Distribution in local/broader contexts

  • Frequency

Why semantic annotation1
Why semantic annotation?

Diachronic perspective:

meaning shifts (e.g., gay)

new usages

--nouns used as verbs: “I’ll text you”

--frequency (Pullover => Pulli >1970)

Why semantic annotation2
Why semantic annotation?


  • Manually semantically annotated corpora provide “gold standard” for Machine Learning

  • Determining word senses are key to Natural Language Processing applications (e.g., machine translation)

What is annotated
What is annotated

  • Words from the major POS (nouns, verbs, adjectives, some adverbs)

  • Multi-word units with atomic meaning

    --phrasal verbs (check out, check up)

    --opaque compounds (road rage)

    --idioms (hit the ceiling)

Not annotated
Not annotated

  • Function words

  • Sublexical meaning/semantic components

  • Aspect, tense etc.

  • Anaphora

Method procedure

Annotation against dictionaries:

Which of the senses of a polysemous word in a dictionary corresponds to a token in a corpus?

Task can be seen as word sense disambiguation


Cut-out of WordNet’s entry for check

Semantic annotation of corpora


* S: (n) check, bank check, cheque (a written order directing a bank to pay money) "he paid all his bills by check"

* S: (n) assay, check (an appraisal of the state of affairs) "they made an assay of the contents"; "a check on its dependability under stress"

* S: (n) check, chit, tab (the bill in a restaurant) "he asked the waiter for the check"

* S: (n) arrest, check, halt, hitch, stay, stop, stoppage (the state of inactivity following an interruption) "the negotiations were in arrest"; "held them in check"; "during the halt he got some lunch"; "the momentary stay enabled him to escape the blow"; "he spent the entire stop in his seat"

* S: (n) confirmation, verification, check, substantiation (additional proof that something that was believed (some fact or hypothesis or theory) is correct) "fossils provided further confirmation of the evolutionary theory"

* S: (n) check, checkout, check-out procedure (the act of inspecting or verifying) "they made a check of their equipment"; "the pilot ran through the check-out procedure"

* S: (n) check mark, check, tick (a mark indicating that something has been noted or completed etc.) "as he called the role he put a check mark by each student's name"

* S: (n) hindrance, hinderance, deterrent, impediment, balk, baulk, check, handicap (something immaterial that interferes with or delays action or progress)

* S: (n) check, chip (a mark left after a small piece has been chopped or broken off of something)

* S: (n) check (a textile pattern of squares or crossed lines (resembling a checkerboard)) "she wore a skirt with checks"

* S: (n) bridle, check, curb (the act of restraining power or action or limiting excess) "his common sense is a bridle to his quick temper"

* S: (n) check (obstructing an opponent in ice hockey)

* S: (n) check ((chess) a direct attack on an opponent's king)


* S: (v) check, check up on, look into, check out, suss out, check over, go over, check into (examine so as to determine accuracy, quality, or condition) "check the brakes"; "Check out the engine"

* S: (v) check (make an examination or investigation) "check into the rumor"; "check the time of the class"

* S: (v) see, check, insure, see to it, ensure, control, ascertain, assure (be careful or certain to do something; make certain of something) "He verified that the valves were closed"; "See that the curtains are closed"; "control the quality of the product"

* S: (v) control, hold in, hold, contain, check, curb, moderate (lessen the intensity of; temper; hold in restraint; hold or keep within limits) "moderate your alcohol intake"; "hold your tongue"; "hold your temper"; "control your anger"

* S: (v) check (stop for a moment, as if out of uncertainty or caution) "She checked for an instant and missed a step”


Method procedure1

Manual annotation:

Trained annotators select context-appropriate sense from the dictionary

Record choice(s) via an interface

More than one sense may be selected

Semantic annotation of corpora

Machine Learning systems learn contextual features from manually annotated corpora

Contextual features can be grammatical (e.g., definiteness, number: the works) and lexical (lexical preferences)

Some assumptions
Some assumptions

Annotation against dictionary assumes that the dictionary “is right”

--covers all senses of a word

--senses are distinguished/distinguishable

Lexicographers, annotators rely on their native speaker intuition

Some na ve assumptions
Some (naïve) assumptions

Annotation is inverse of lexicography:

Lexicographer examines corpus data for a target word (KWIC lines)

Distinguishes senses (native intuition)

Crafts corresponding dictionary entries

Some na ve assumptions1
Some (naïve) assumptions

Annotator inspects target word in contexts (kind of KWIC lines)

Matches tokens to dictionary entries (using native intuition)

It s not so simple
It’s not so simple!

  • Comparison of multiple dictionaries shows little agreement

  • Lexicographers (speakers) carve up semantic space occupied by a word in different ways

  • Different assumptions of “related senses”

  • Fine-grained vs. coarse-grained distinctions (splitters vs. lumpers)

  • Missing senses

It s not so simple1
It’s not so simple!

Dictionaries were not made for annotation and word sense disambiguation

Made for look-up of unknown words, senses

User stops look-up when unknown word/sense is explained

User doesn’t examine all senses

Overlap, duplicate senses are unproblematic for lexicography…

…but problematic for annotation

Three experiments with manual semantic annotation
Three experiments with manual semantic annotation

  • SemCor

  • WordNet gloss corpus

  • MASC

Semantic concordance semcor
Semantic Concordance (SemCor)

  • First semantically annotated corpus (mid-1990s)

  • parts of Brown Corpus

  • novel Red Badge of Courage

  • annotation against WordNet

  • sequential annotation

Semcor experiment
SemCor experiment

Analyzed sub-part of annotated corpus:

660-word passage

254 target words:

88 nouns

100 verbs

39 adjectives

27 adverbs

Semcor experiment1
SemCor experiment

Number of senses ranged from 2 to 42

Mean across POS: 6.6

Nouns: 4.7

Verbs: 8.6

Adjectives: 7.9

Adverbs: 3.3

Semcor experiment2
SemCor experiment

Annotation done by

two “experts” (linguists/lexicographers) served as “gold standard”

17 trained student annotators

Analyzed expert-annotator and inter-annotator agreement

Semcor predictions
SemCor predictions

  • Higher disagreements for verbs than nouns and adjectives

  • Many nouns refer to concrete, imageable entities; meanings are stable

  • Verb meanings depend on argument structure and semantics of arguments (event participants)

  • Speakers interpret verbs but not nouns flexibly (Gentner)

  • Adjective meanings are very flexible; depend on modified noun (thus highly polysemous)

Semcor predictions1
SemCor predictions

Disagreement rate increases with number of senses (polysemy, not homonymy)

Semcor predictions2
SemCor predictions

First sense is usually the most salient, broadest

In WordNet: most frequently annotated

Annotators prefer it

May save examining remaining senses

Semcor experiment3
SemCor experiment

WordNet sense inventory was presented in two conditions

  • Frequency order

  • Randomly scrambled order

Results overview
Results (Overview)

  • Overall agreement of annotators with “experts” was 72%

  • Overall inter-annotator agreement was 82%

  • Sharp drop-off in agreement with increasing sense number (polysemy)

  • Significantly higher agreement rate for first sense

Results overview1
Results (Overview)

Annotators were asked to rate confidence with which they chose senses

Overall high (1.8 on a scale 1-5)

Lower for verbs than nouns, higher polysemy, random order

Lessons learned
Lessons learned

Sense annotation (word sense disambiguation) is feasible but hard

Difficulty depends on POS, degree of polysemy

Strong preference for broad sense

“expert”-”naïve” annotator difference

Semantic annotation of corpora

Agreement rates found in SemCor experiment are not good enough for NLP appliations

Modify annotation procedure?

Modify sense inventory? (Note that WordNet’s inventory is smaller than that of standard dictionaries)

Wordnet gloss corpus
WordNet Gloss corpus enough for NLP appliations

  • Glosses: definitions in WordNet’s sense entries (synsets)

  • Annotate nouns, verbs, adjectives in glosses against WN synsets

  • Closed system: annotated glosses are a corpus; for a given sense, glosses provide contexts

Gloss annotation
Gloss Annotation enough for NLP appliations

{debate, “discussion in which reasons are advanced for and against some proposition or proposal”}

{discussion, give_and_take,…}

{discussion, treatment,..}

{advance, move_forward,...}

{advance, progress,..}

{ advance, bring_forward}

Gloss annotation1
Gloss annotation enough for NLP appliations

Each WordNet synset has a gloss with an average of five words to be tagged (N, V, Adj, Adv)

Preliminary steps:

--parse glosses, do POS tagging


--“glob” multi-word units (phrasal verbs etc.)

Gloss tagging
Gloss Tagging enough for NLP appliations

Most words are monosemous and can be tagged automatically (monosemy is relative to WordNet!)

Metalinguistic words in the glosses are not tagged:

“to scowl is to grimace in some manner”

Manner is not meaningfully related to scowl,

but grimace is

So only grimace can provide useful information for ML

Gloss annotation2
Gloss Annotation enough for NLP appliations

Develop tagging interface (“Mantag”)

Recruit, train, and supervise annotators

Targeted, not sequential annotation: one word form (type) at a time, annotate all tokens

--annotators learn sense inventory once, apply it to all tokens

--easier, faster, greater reliability (spot checks only)

Some numbers
Some Numbers enough for NLP appliations

>35,000 fully tagged glosses

approx. 550,000 tags assigned (incl. monosemous words)

Two years and many $$

Gloss corpus has been reproduced in other languages with wordnets

Gloss corpus
Gloss corpus enough for NLP appliations

Glosses were translated into Logical Form (Hobbs)

Variables were indexed with WordNet senses

Axioms from glosses example
Axioms from Glosses: Example enough for NLP appliations

{ bridge, span (any structure that allows people or vehicles to cross an obstacle such as a river or canal...)}


<--> structureN1(x) & allowV1(x,e1) & crossV1(e1,z,y)

& obstacleN2(y) & person/vehicle(z)

personN1(z) --> person/vehicle(z)

vehicleN1(z) --> vehicle/person(z)

riverN2(y) --> obstacleN2(y)

canalN3(y) --> obstacleN2(y)

Where we are
Where we are enough for NLP appliations

Manual tagging is too expensive, too slow

Large semantically annotated corpora are still elusive

Must scale up with (semi-)automatic annotation

Semantic annotation of corpora
MASC enough for NLP appliations

Manually Annotated Sub-Corpus

A part of the American National Corpus (Ide & Suderman)

Annotated for many different phenomena

Semantic annotation against WordNet and FrameNet (Fillmore)

Masc annotation
MASC annotation enough for NLP appliations

Hand-select words for targeted annotation

High- to medium polysemy, frequency

Four annotators do sample of 50 tokens

Sense inventory is revised if necessary

Annotation of 1000 tokens against vetted sense inventory

Plan: annotate remaining tokens automatically

Automatic annotation
Automatic annotation enough for NLP appliations

Won’t be highly reliable and will at best reflect human disagreement

State of the art of automatic sense disambiguation against WordNet is 65% accuracy (against human gold standard)

Not good enough for applications!

Directions to explore
Directions to explore enough for NLP appliations

Coarser sense clustering (OntoNotes)

Crowdsourcing of annotation (Amazon Mechanical Turk)

Harness other resources linked to WordNet (PropBank, VerbNet)

Richer set of features for machine learning algorithms

Better evaluation of annotations (Passonneau): discard outliers, cluster annotators, identify confusable senses

Conclusions enough for NLP appliations

Semantically annotated corpora are highly desirable, even necessary:

Better understanding of lexical semantics, word use, etc.

NLP applications

But goals must be realistic, modest in view of the nature of the task and enumerative lexicon