Chapter 20, Part 2

Computational Lexical Semantics

Acknowledgements: these slides include material from Rada Mihalcea, Ray Mooney, Katrin Erk, and Ani Nenkova



Knowledge-based WSD

  • Task definition

  • Knowledge-based WSD = class of WSD methods relying (mainly) on knowledge drawn from dictionaries and/or raw text

  • Resources

    • Used:

      • Machine Readable Dictionaries

      • Raw corpora

    • Not used:

      • Manually annotated corpora


Machine Readable Dictionaries

  • In recent years, most dictionaries have been made available in machine-readable format (MRD)

    • Oxford English Dictionary

    • Collins

    • Longman Dictionary of Contemporary English (LDOCE)

  • Thesauruses – add synonymy information

    • Roget's Thesaurus

  • Semantic networks – add more semantic relations

    • WordNet

    • EuroWordNet


MRD – A Resource for Knowledge-based WSD

  • For each word in the language vocabulary, an MRD provides:

    • A list of meanings

    • Definitions (for all word meanings)

    • Typical usage examples (for most word meanings)

WordNet definitions/examples for the noun "plant"

  • buildings for carrying on industrial labor; "they built a large plant to manufacture automobiles"

  • a living organism lacking the power of locomotion

  • something planted secretly for discovery by another; "the police used a plant to trick the thieves"; "he claimed that the evidence against him was a plant"

  • an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience


MRD – A Resource for Knowledge-based WSD

  • A thesaurus adds:

    • An explicit synonymy relation between word meanings

  • A semantic network adds:

    • Hypernymy/hyponymy (IS-A), meronymy/holonymy (PART-OF), antonymy, etc.

WordNet synsets for the noun "plant"

1. plant, works, industrial plant

2. plant, flora, plant life

WordNet related concepts for the meaning “plant life”

{plant, flora, plant life}

hypernym: {organism, being}

hyponym: {house plant}, {fungus}, …

meronym: {plant tissue}, {plant part}

member holonym: {Plantae, kingdom Plantae, plant kingdom}
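
These relations can be read off programmatically; a quick sketch with NLTK's WordNet interface (the calls are real NLTK API, but the synset name plant.n.02 and the exact members of each relation depend on the WordNet version):

  from nltk.corpus import wordnet as wn

  flora = wn.synset("plant.n.02")   # {plant, flora, plant life}
  print(flora.hypernyms())          # [Synset('organism.n.01')]
  print(flora.hyponyms()[:3])       # house plants, ... (a long list)
  print(flora.member_holonyms())    # [Synset('plantae.n.01')] -- kingdom Plantae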


Lesk Algorithm

  • (Michael Lesk 1986): Identify senses of words in context using definition overlap; all the ambiguous words in the context are disambiguated jointly.

  • Algorithm:

    • Retrieve from MRD all sense definitions of the words to be disambiguated

    • Determine the definition overlap for all possible sense combinations

    • Choose the senses that lead to the highest overlap

Example: disambiguate PINE CONE

  • PINE

    1. kinds of evergreen tree with needle-shaped leaves

    2. waste away through sorrow or illness

  • CONE

    1. solid body which narrows to a point

    2. something of this shape whether solid or hollow

    3. fruit of certain evergreen trees

Pine#1 ∩ Cone#1 = 0

Pine#2 ∩ Cone#1 = 0

Pine#1 ∩ Cone#2 = 1

Pine#2 ∩ Cone#2 = 0

Pine#1 ∩ Cone#3 = 2

Pine#2 ∩ Cone#3 = 0
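
The overlap computation is tiny; here is a self-contained Python sketch of it, with the glosses above hard-coded (exact counts depend on tokenization and stop-word handling, so they need not match the slide's numbers word for word):

  import re

  PINE = [
      "kinds of evergreen tree with needle-shaped leaves",
      "waste away through sorrow or illness",
  ]
  CONE = [
      "solid body which narrows to a point",
      "something of this shape whether solid or hollow",
      "fruit of certain evergreen trees",
  ]

  def words(definition):
      # Naive tokenization: lowercase alphabetic tokens, no stemming.
      return set(re.findall(r"[a-z]+", definition.lower()))

  def overlap(d1, d2):
      return len(words(d1) & words(d2))

  scores = {(i + 1, j + 1): overlap(p, c)
            for i, p in enumerate(PINE) for j, c in enumerate(CONE)}
  print(max(scores, key=scores.get))  # (1, 3): evergreen-tree pine + fruit cone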


Lesk Algorithm for More than Two Words?

  • I saw a man who is 98 years old and can still walk and tell jokes

    • nine open class words: see(26), man(11), year(4), old(8), can(5), still(4), walk(10), tell(8), joke(3)

  • 43,929,600 sense combinations! How to find the optimal sense combination?

  • Simulated annealing (Cowie, Guthrie, Guthrie 1992)

    • Let’s review (from CS1571)


Search Types

  • Backtracking state-space search

  • Local Search and Optimization

  • Constraint satisfaction search

  • Adversarial search


Local Search

  • Use a single current state and move only to neighbors.

  • Use little space

  • Can find reasonable solutions in large or infinite (continuous) state spaces for which the other algorithms are not suitable


Optimization

  • Local search is often suitable for optimization problems. Search for best state by optimizing an objective function.


Visualization

  • States are laid out in a landscape

  • Height corresponds to the objective function value

  • Move around the landscape to find the highest (or lowest) peak

  • Only keep track of the current states and immediate neighbors


Simulated Annealing

  • Based on a metallurgical metaphor

    • Start with a temperature set very high and slowly reduce it.


Simulated Annealing

  • Annealing: harden metals and glass by heating them to a high temperature and then gradually cooling them

  • At the start, make lots of moves and then gradually slow down


Simulated Annealing

  • More formally…

    • Generate a random new neighbor of the current state.

    • If it's better, take it.

    • If it's worse, take it anyway with some probability that depends on the temperature and on the delta between the new and old states.


Simulated Annealing

  • Probability of a move decreases with the amount ΔE by which the evaluation is worsened

  • A second parameter T is also used to determine the probability: high T allows more worse moves; T close to zero results in few or no bad moves

  • The schedule input determines the value of T as a function of the completed cycles


function Simulated-Annealing(problem, schedule) returns a solution state
  inputs: problem, a problem
          schedule, a mapping from time to "temperature"

  current ← Make-Node(Initial-State[problem])
  for t ← 1 to ∞ do
    T ← schedule[t]
    if T = 0 then return current
    next ← a randomly selected successor of current
    ΔE ← Value[next] − Value[current]
    if ΔE > 0 then current ← next
    else current ← next only with probability e^(ΔE/T)
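
A direct Python rendering of this pseudocode (a sketch: problem is assumed to expose initial_state, value(state), and successors(state); these are illustrative names, not a fixed API):

  import itertools
  import math
  import random

  def simulated_annealing(problem, schedule):
      current = problem.initial_state
      for t in itertools.count(1):
          T = schedule(t)
          if T == 0:
              return current
          nxt = random.choice(problem.successors(current))
          delta_e = problem.value(nxt) - problem.value(current)
          # Always take improving moves; take worsening moves with
          # probability e^(ΔE/T), which shrinks as T cools.
          if delta_e > 0 or random.random() < math.exp(delta_e / T):
              current = nxt

  # e.g. a linear cooling schedule that stops after 1,000 steps:
  # result = simulated_annealing(problem, lambda t: max(0.0, 1.0 - t / 1000))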


Intuitions

  • The algorithm wanders around during the early parts of the search, hopefully toward a good general region of the state space

  • Toward the end, the algorithm does a more focused search, making few bad moves


Lesk Algorithm for More than Two Words?

  • I saw a man who is 98 years old and can still walk and tell jokes

    • nine open class words: see(26), man(11), year(4), old(8), can(5), still(4), walk(10), tell(8), joke(3)

  • 43,929,600 sense combinations! How to find the optimal sense combination?

  • Simulated annealing (Cowie, Guthrie, Guthrie 1992)

  • Given: W, set of words we are disambiguating

  • State: One sense for each word in W

  • Neighbors of state: the result of changing one word sense

  • Objective function: value(state)

    • Let DWs(state) be the bag of words in the union of the definitions of the senses in state

    • value(state) = the total number of occurrences of words that appear more than once in DWs(state)

    • The value is higher the more words the chosen sense definitions share with one another.

  • Start state: the most frequent sense of each word
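
These pieces are small enough to sketch directly. Here the sense definitions come from NLTK's WordNet interface (real API, though Cowie et al. worked with LDOCE rather than WordNet), and value, random_neighbor, and the start state plug straight into the annealing loop sketched earlier:

  from collections import Counter
  import random
  import re

  from nltk.corpus import wordnet as wn

  WORDS = ["see", "man", "year", "old", "can", "still", "walk", "tell", "joke"]

  def tokens(text):
      return re.findall(r"[a-z]+", text.lower())

  def value(state):
      # state[i] = index of the chosen sense of WORDS[i]. Pool the
      # definitions of the chosen senses and count every token whose
      # word type occurs more than once across the pool.
      counts = Counter()
      for word, sense in zip(WORDS, state):
          counts.update(tokens(wn.synsets(word)[sense].definition()))
      return sum(n for n in counts.values() if n > 1)

  def random_neighbor(state):
      # A neighbor differs in the sense of exactly one word.
      i = random.randrange(len(state))
      neighbor = list(state)
      neighbor[i] = random.randrange(len(wn.synsets(WORDS[i])))
      return neighbor

  # Start state: WordNet lists senses in frequency order, so index 0
  # approximates the most frequent sense of each word.
  start = [0] * len(WORDS)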


Lesk Algorithm: A Simplified Version

  • Original Lesk definition: measure overlap between sense definitions for all words in the text

    • Identify simultaneously the correct senses for all words in the text

  • Simplified Lesk (Kilgarriff & Rosenzweig 2000): measure overlap between sense definitions of a word and its context in the text

    • Identify the correct sense for one word at a time

  • Search space significantly reduced (the context in the text is fixed for each word instance)


Lesk Algorithm: A Simplified Version

  • Algorithm for simplified Lesk:

    • Retrieve from MRD all sense definitions of the word to be disambiguated

    • Determine the overlap between each sense definition and the context of the word in the text

    • Choose the sense that leads to the highest overlap

Example: disambiguate PINE in

“Pine cones hanging in a tree”

  • PINE

    1. kinds of evergreen tree with needle-shaped leaves

    2. waste away through sorrow or illness

Pine#1 ∩ Sentence = 1

Pine#2 ∩ Sentence = 0
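
The same idea in Python over WordNet glosses (real NLTK API; WordNet's glosses for "pine" differ from the two-sense dictionary above, so the raw overlap counts will differ, and recent NLTK versions also ship a ready-made nltk.wsd.lesk):

  import re

  from nltk.corpus import wordnet as wn

  def simplified_lesk(word, sentence):
      # Pick the sense whose definition shares the most words with
      # the sentence; ties fall to the earlier (more frequent) sense.
      context = set(re.findall(r"[a-z]+", sentence.lower()))

      def score(sense):
          gloss = set(re.findall(r"[a-z]+", sense.definition().lower()))
          return len(gloss & context)

      return max(wn.synsets(word), key=score)

  print(simplified_lesk("pine", "Pine cones hanging in a tree"))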


Selectional Preferences

  • A way to constrain the possible meanings of words in a given context

  • E.g. “Wash a dish” vs. “Cook a dish”

    • WASH-OBJECT vs. COOK-FOOD

  • Alternative terminology

    • Selectional Restrictions

    • Selectional Preferences

    • Selectional Constraints


Acquiring Selectional Preferences

  • From raw corpora

    • Frequency counts

    • Information theory measures


Preliminaries: Learning Word-to-Word Relations

  • An indication of the semantic fit between two words

  • 1. Frequency counts (in a parsed corpus)

    • Pairs of words connected by a syntactic relation

  • 2. Conditional probabilities

    • Condition on one of the words
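
As a toy illustration, both measures reduce to counting over (predicate, relation, argument) triples from a parsed corpus (the triples below are invented):

  from collections import Counter

  # (predicate, relation, argument) triples from a parsed corpus.
  triples = [
      ("cook", "dobj", "dish"), ("cook", "dobj", "meal"),
      ("cook", "dobj", "dish"), ("wash", "dobj", "dish"),
      ("wash", "dobj", "car"),
  ]

  # 1. Frequency counts of word pairs in a syntactic relation.
  pair_counts = Counter(triples)

  # 2. Conditional probability of the argument given the predicate.
  def p_arg_given_pred(arg, pred, rel="dobj"):
      total = sum(c for (p, r, _), c in pair_counts.items()
                  if p == pred and r == rel)
      return pair_counts[(pred, rel, arg)] / total if total else 0.0

  print(p_arg_given_pred("dish", "cook"))  # 2/3: "dish" fits "cook" well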


Learning Selectional Preferences

  • Word-to-class relations (Resnik 1993)

    • Quantify the contribution of a semantic class using all the senses subsumed by that class (e.g., the class is an ancestor in WordNet)
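
For reference, one common formulation (following Resnik; the notation varies across his papers, so treat this as a sketch) scores a class c for predicate P in relation R by its contribution to the predicate's overall selectional preference strength:

  S(P, R)    = Σc P(c | P, R) log [ P(c | P, R) / P(c) ]          (preference strength)

  A(P, c, R) = P(c | P, R) log [ P(c | P, R) / P(c) ] / S(P, R)   (selectional association)

The probabilities are estimated from a parsed corpus, with each noun's counts distributed over all the WordNet classes that contain one of its senses.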


Using Selectional Preferences for WSD

  • Algorithm:

    • Let N be a noun that stands in relationship R to predicate P. Let s1…sk be its possible senses.

    • For i from 1 to k, compute:

    • Ci = { c | c is an ancestor of si }

    • Ai = max over c in Ci of A(P, c, R)

    • Ai is the score for sense i. Select the sense with the highest score.

  • For example: "letter" has 3 senses in WordNet (written message; varsity letter; alphabetic character) and belongs to 19 classes in all.

  • Suppose the predicate P is "write" and R is the direct-object relation. For each sense of "letter", calculate a score by measuring the association of "write" with each ancestor class of that sense.
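
A sketch of this selection rule via NLTK (wn.synsets, Synset.closure, and Synset.hypernyms are real NLTK API; assoc is a hypothetical precomputed table of A(P, c, R) scores for the predicate and relation at hand):

  from nltk.corpus import wordnet as wn

  def ancestors(sense):
      # The sense itself plus every hypernym reachable above it (Ci).
      return {sense} | set(sense.closure(lambda s: s.hypernyms()))

  def select_sense(noun, assoc):
      # Score each sense by its best-scoring ancestor class (Ai) and
      # return the sense with the highest score. `assoc` maps a
      # WordNet class (synset) to a precomputed A(P, c, R) value.
      def score(sense):
          return max((assoc.get(c, 0.0) for c in ancestors(sense)),
                     default=0.0)
      return max(wn.synsets(noun, pos=wn.NOUN), key=score)

  # e.g. for P = "write", R = direct object: an assoc table that gives
  # the {message} class a high score should pick the "written message"
  # sense of "letter":
  # sense = select_sense("letter", assoc_for_write_dobj)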

