Lecture 24 Distributional based Similarity II




Lecture 24 Distributional based Similarity II

CSCE 771 Natural Language Processing

  • Topics

    • Distributional based word similarity

  • Readings:

    • NLTK book Chapter 2 (wordnet)

    • Text Chapter 20

April 10, 2013



Overview

  • Last Time (Programming)

    • Examples of thesaurus based word similarity

      • path-similarity (e.g., “memory fault”): sim-path(c1,c2) = -log pathlen(c1,c2); cf. the Resnik and Lin measures

    • extended Lesk – glosses of words need to include hypernyms

  • Today

    • Distributional methods

  • Readings:

    • Text Chapters 19, 20

    • NLTK Book: Chapter 10

  • Next Time: Distributional based Similarity II



Figure 20.8 Summary of Thesaurus Similarity Measures

  • Elderly moment IS-A memory fault IS-A mistake

  • sim-path correct in table
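The IS-A chain above can make the sim-path formula from the overview concrete. A minimal Python sketch; the toy taxonomy, the helper functions, and the +1 edge-count convention are illustrative assumptions, not from the lecture:

```python
import math

# Toy IS-A chain from the slide: elderly moment IS-A memory fault IS-A mistake.
# The parent links below are hypothetical, for illustration only.
parents = {
    "elderly_moment": "memory_fault",
    "memory_fault": "mistake",
    "mistake": None,
}

def ancestors(c):
    """Concept c followed by its chain of IS-A ancestors, in order."""
    chain = [c]
    while parents[c] is not None:
        c = parents[c]
        chain.append(c)
    return chain

def pathlen(c1, c2):
    """Edges on the shortest IS-A path between c1 and c2, plus 1
    (so identical concepts get pathlen 1 and similarity 0)."""
    a1, a2 = ancestors(c1), ancestors(c2)
    lcs = next(c for c in a1 if c in a2)  # lowest common subsumer
    return a1.index(lcs) + a2.index(lcs) + 1

def sim_path(c1, c2):
    # Higher (closer to 0) means more similar.
    return -math.log(pathlen(c1, c2))

print(pathlen("elderly_moment", "mistake"))             # 3
print(round(sim_path("elderly_moment", "mistake"), 3))  # -1.099
```

Note that with this convention sim-path is never positive; the nearer to zero, the more similar the concepts.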



Example computing PPMI

  • Need counts, so let's make up some

    • we need to edit this table to have counts
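One way to act on that note: fill in a small word-by-context count table and compute PPMI from it. A sketch with numpy; the words, contexts, and counts below are made up for illustration:

```python
import numpy as np

# Hypothetical word-by-context count table (made-up counts, per the slide's note).
words = ["apricot", "pineapple", "digital", "information"]
contexts = ["computer", "data", "pinch", "result", "sugar"]
C = np.array([
    [0, 0, 1, 0, 1],   # apricot
    [0, 0, 1, 0, 1],   # pineapple
    [2, 1, 0, 1, 0],   # digital
    [1, 6, 0, 4, 0],   # information
], dtype=float)

total = C.sum()
P = C / total                       # joint probabilities P(w, f)
Pw = P.sum(axis=1, keepdims=True)   # marginals P(w)
Pf = P.sum(axis=0, keepdims=True)   # marginals P(f)

# PMI = log2 [ P(w,f) / (P(w) P(f)) ]; PPMI clips negatives (and -inf) to 0.
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log2(P / (Pw * Pf))
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)

print(np.round(ppmi, 2))
```

Clipping to zero is what distinguishes PPMI from raw PMI: zero counts would otherwise produce -inf entries, and small negative PMI values are unreliable at these sample sizes.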



Associations

  • PMI-assoc

  • assocPMI(w, f) = log2 [ P(w,f) / ( P(w) P(f) ) ]

  • Lin-assoc – f is composed of r (a relation) and w′

  • assocLIN(w, f) = log2 [ P(w,f) / ( P(r|w) P(w′|w) ) ]

  • t-test_assoc (20.41)
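The PMI association, plus the t-test association as I read Eq. 20.41 — assoc_t(w,f) = (P(w,f) − P(w)P(f)) / sqrt(P(w)P(f)) — can be sketched directly. The probabilities below are made-up illustrative values:

```python
import math

def assoc_pmi(p_wf, p_w, p_f):
    """assoc_PMI(w, f) = log2 [ P(w,f) / (P(w) P(f)) ]"""
    return math.log2(p_wf / (p_w * p_f))

def assoc_ttest(p_wf, p_w, p_f):
    """t-test association: (P(w,f) - P(w)P(f)) / sqrt(P(w) P(f))"""
    return (p_wf - p_w * p_f) / math.sqrt(p_w * p_f)

# Made-up probabilities for one (word, feature) pair.
p_wf, p_w, p_f = 0.01, 0.05, 0.04

print(round(assoc_pmi(p_wf, p_w, p_f), 3))    # log2(5) ~= 2.322
print(round(assoc_ttest(p_wf, p_w, p_f), 3))  # 0.179
```

Both measures are zero when w and f are independent (P(w,f) = P(w)P(f)); the t-test version additionally discounts pairs whose evidence rests on tiny expected counts.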



Figure 20.10 Co-occurrence vectors

  • Dependency-based parser – special case of shallow parsing

  • relations identified from “I discovered dried tangerines.” (20.32)

    • discover(subject, I)

    • I(subject-of, discover)

    • tangerine(obj-of, discover)

    • tangerine(adj-mod, dried)
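Given triples like these, the distributional representation of a word is the bag of its (relation, word′) features. A minimal sketch with the triples hand-coded; a real dependency parser would produce them:

```python
from collections import defaultdict

# Dependency triples for "I discovered dried tangerines." (slide ex. 20.32),
# hand-coded here for illustration.
triples = [
    ("discover", "subject", "I"),
    ("I", "subject-of", "discover"),
    ("tangerine", "obj-of", "discover"),
    ("tangerine", "adj-mod", "dried"),
]

# Each word is represented by counts of its (relation, other-word) features.
features = defaultdict(dict)
for word, rel, other in triples:
    feat = (rel, other)
    features[word][feat] = features[word].get(feat, 0) + 1

print(features["tangerine"])  # {('obj-of', 'discover'): 1, ('adj-mod', 'dried'): 1}
```

Over a large corpus these feature counts become the co-occurrence vectors of Figure 20.10, to which the association weights above are then applied.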



Figure 20.11 Objects of the verb drink (Hindle, 1990)


Vectors Review

  • dot-product

  • length

  • sim-cosine
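These three pieces fit together as cosine(v, w) = (v · w) / (|v| |w|). A minimal pure-Python sketch:

```python
import math

def dot(v, w):
    """Dot product: sum_i v_i * w_i."""
    return sum(a * b for a, b in zip(v, w))

def length(v):
    """Vector length |v| = sqrt(v . v)."""
    return math.sqrt(dot(v, v))

def sim_cosine(v, w):
    """cosine(v, w) = (v . w) / (|v| |w|)"""
    return dot(v, w) / (length(v) * length(w))

v = [2.0, 1.0, 0.0]
w = [1.0, 2.0, 0.0]
print(round(sim_cosine(v, w), 3))  # 0.8
```

Because cosine divides out the vector lengths, it measures the angle between the vectors rather than their magnitudes, so frequent and rare words are compared on an equal footing.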



Figure 20.12 Similarity of Vectors



Fig 20.13 Vector Similarity Summary
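Summaries of vector similarity measures typically list, alongside cosine, weighted Jaccard and Dice variants; assuming the usual min/max formulations (my assumption, since the figure itself is not reproduced here), a sketch:

```python
def sim_jaccard(v, w):
    """Weighted Jaccard: sum_i min(v_i, w_i) / sum_i max(v_i, w_i)."""
    return sum(min(a, b) for a, b in zip(v, w)) / sum(max(a, b) for a, b in zip(v, w))

def sim_dice(v, w):
    """Weighted Dice: 2 * sum_i min(v_i, w_i) / sum_i (v_i + w_i)."""
    return 2 * sum(min(a, b) for a, b in zip(v, w)) / sum(a + b for a, b in zip(v, w))

v = [1.0, 2.0, 0.0]
w = [1.0, 0.0, 2.0]
print(sim_jaccard(v, w))  # 1.0 / 5.0 = 0.2
print(sim_dice(v, w))     # 2.0 / 6.0 ~= 0.333
```

Both reduce to the familiar set-overlap versions when the vectors are binary, and both reach 1.0 only for identical vectors.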



Figure 20.14 Hand-built patterns for hypernyms (Hearst, 1992)
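The best-known Hearst pattern is "NP_h such as NP_1, ..., and NP_n", which suggests each NP_i is a hyponym of NP_h. A toy regex sketch; it matches bare words rather than real NP chunks, and the function name and test sentence are illustrative:

```python
import re

def hearst_such_as(text):
    """Extract (hyponym, hypernym) pairs from 'H such as X, Y, and Z' spans.
    Toy version: matches single words, not real NP chunks."""
    pairs = []
    for m in re.finditer(r"(\w+) such as ([\w, ]+?)(?:\.|$)", text):
        hypernym = m.group(1)
        # Split the list on commas and a final "and"/"or".
        items = re.split(r",\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+", m.group(2))
        pairs.extend((item.strip(), hypernym) for item in items if item.strip())
    return pairs

print(hearst_such_as("works by authors such as Herrick, Goldsmith, and Shakespeare."))
# [('Herrick', 'authors'), ('Goldsmith', 'authors'), ('Shakespeare', 'authors')]
```

Hearst's original system ran patterns like this over parsed noun-phrase chunks, which is what makes the extracted pairs reliable enough to seed a taxonomy.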



Figure 20.15



Figure 20.16


Lecture 24 Distributional based Similarity II

  • How to do it in NLTK: http://www.cs.ucf.edu/courses/cap5636/fall2011/nltk.pdf

  • NLTK 3.0a1 released: February 2013

  • This version adds support for NLTK’s graphical user interfaces. http://nltk.org/nltk3-alpha/

  • Which similarity function in nltk.corpus.wordnet is appropriate for finding the similarity of two words?

  • I want to use a function for word clustering and the Yarowsky algorithm for finding similar collocations in a large text.

  • http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Linguistics

  • http://en.wikipedia.org/wiki/Portal:Linguistics

  • http://en.wikipedia.org/wiki/Yarowsky_algorithm

  • http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html

