
Lecture 24 Distributional based Similarity II

CSCE 771 Natural Language Processing

  • Topics

    • Distributional based word similarity

  • Readings:

    • NLTK book Chapter 2 (wordnet)

    • Text Chapter 20

April 10, 2013


Overview

  • Last Time (Programming)

    • Examples of thesaurus based word similarity

      • path similarity on the "memory fault" example: sim_path(c1,c2) = −log pathlen(c1,c2); Resnik and Lin measures

    • extended Lesk – glosses of words need to include hypernyms

  • Today

    • Distributional methods

  • Readings:

    • Text 19,20

    • NLTK Book: Chapter 10

  • Next Time: Distributional based Similarity II


Figure 20.8 Summary of Thesaurus Similarity measures

  • Elderly moment IS-A memory fault IS-A mistake

  • sim-path correct in table
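
A minimal sketch of the path-based measure summarized in Figure 20.8, assuming the NLTK WordNet interface. Note that NLTK's built-in path_similarity returns 1 / (pathlen + 1) rather than the textbook's sim_path(c1,c2) = −log pathlen(c1,c2), so the sketch recovers the path length first. The dog/cat synsets are stand-ins, since the figure's "memory fault" chain may not map directly onto WordNet synset names.

    import math
    from nltk.corpus import wordnet as wn   # assumes the 'wordnet' data package is downloaded

    c1 = wn.synset('dog.n.01')   # stand-in synsets; the slide's concepts may be
    c2 = wn.synset('cat.n.01')   # named differently (or be missing) in WordNet

    ps = c1.path_similarity(c2)       # NLTK: 1 / (shortest path length + 1)
    pathlen = (1.0 / ps) - 1.0        # recover the path length the textbook formula uses
    print('pathlen        :', pathlen)
    print('-log pathlen   :', -math.log(pathlen))   # textbook sim_path (log base unspecified)
    print('path_similarity:', ps)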


Example computing PPMI

  • Need counts, so let's make up some

    • we need to edit this table to have counts
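
Since the count table itself is not reproduced in this transcript, here is a minimal sketch with made-up term-context counts (loosely in the style of the textbook's fruit/computer example) showing how PPMI would be computed from such a matrix; numpy and the specific numbers are assumptions.

    import numpy as np

    words    = ['apricot', 'pineapple', 'digital', 'information']
    contexts = ['computer', 'data', 'pinch', 'result', 'sugar']
    counts = np.array([[0, 0, 1, 0, 1],     # made-up co-occurrence counts
                       [0, 0, 1, 0, 1],
                       [2, 1, 0, 1, 0],
                       [1, 6, 0, 4, 0]], dtype=float)

    total = counts.sum()
    p_wc  = counts / total                     # joint P(w, c)
    p_w   = p_wc.sum(axis=1, keepdims=True)    # marginal P(w)
    p_c   = p_wc.sum(axis=0, keepdims=True)    # marginal P(c)

    with np.errstate(divide='ignore'):         # log2(0) gives -inf for zero counts
        pmi = np.log2(p_wc / (p_w * p_c))
    ppmi = np.maximum(pmi, 0)                  # PPMI keeps only positive associations
    print(np.round(ppmi, 2))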


Associations

  • PMI-assoc

  • assoc_PMI(w, f) = log2 [ P(w,f) / ( P(w) P(f) ) ]

  • Lin-assoc – f is composed of r (relation) and w’

  • assoc_Lin(w, f) = log2 [ P(w,f) / ( P(r|w) P(w’|w) ) ]

  • t-test_assoc (20.41)
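
A small numeric sketch of two of these measures on a single (w, f) pair, using illustrative probabilities that are not taken from the slides; the t-test association is written here in the form (P(w,f) − P(w)P(f)) / sqrt(P(w)P(f)), which should be checked against Eq. 20.41 in the text.

    import math

    p_wf = 0.002   # joint P(w, f) - illustrative values only
    p_w  = 0.01    # P(w)
    p_f  = 0.05    # P(f)

    assoc_pmi   = math.log2(p_wf / (p_w * p_f))                # PMI association
    assoc_ttest = (p_wf - p_w * p_f) / math.sqrt(p_w * p_f)    # t-test association (cf. 20.41)
    print(round(assoc_pmi, 3), round(assoc_ttest, 4))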


Figure 20.10 Co-occurrence vectors

  • Dependency-based parser – a special case of shallow parsing

  • identify dependency relations from “I discovered dried tangerines.” (20.32)

    • discover (subject I)

    • I (subject-of discover)

    • tangerine (obj-of discover)

    • tangerine (adj-mod dried)
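
A minimal sketch of how such dependency relations become co-occurrence vectors, assuming (word, relation, word') triples like the ones above are already available; the triples here are hand-written, not parser output.

    from collections import Counter, defaultdict

    triples = [
        ('discover',  'subject',    'I'),
        ('I',         'subject-of', 'discover'),
        ('tangerine', 'obj-of',     'discover'),
        ('tangerine', 'adj-mod',    'dried'),
    ]

    vectors = defaultdict(Counter)
    for w, rel, w2 in triples:
        vectors[w][(rel, w2)] += 1    # each feature f is a (relation, w') pair

    print(dict(vectors['tangerine']))  # {('obj-of', 'discover'): 1, ('adj-mod', 'dried'): 1}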


Figure 20.11 Objects of the verb drink (Hindle 1990)


vectors review

  • dot-product

  • length

  • sim-cosine
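
A minimal sketch of the three operations listed above, applied to two small hand-made vectors.

    import math

    def dot(v, w):
        return sum(vi * wi for vi, wi in zip(v, w))

    def length(v):
        return math.sqrt(dot(v, v))

    def sim_cosine(v, w):
        return dot(v, w) / (length(v) * length(w))

    v = [1.0, 2.0, 0.0]
    w = [0.0, 2.0, 1.0]
    print(sim_cosine(v, w))   # 4 / (sqrt(5) * sqrt(5)) = 0.8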


Figure 20.12 Similarity of Vectors


Fig 20.13 Vector Similarity Summary


Figure 20.14 Hand-built patterns for hypernyms (Hearst 1992)


Figure 20.15


Figure 20.16


  • How to do this in NLTK: http://www.cs.ucf.edu/courses/cap5636/fall2011/nltk.pdf

  • NLTK 3.0a1 released: February 2013

  • This version adds support for NLTK’s graphical user interfaces. http://nltk.org/nltk3-alpha/

  • Which similarity function in nltk.corpus.wordnet is appropriate for finding the similarity of two words?

  • I want to use such a function for word clustering, and the Yarowsky algorithm to find similar collocations in a large text (see the sketch after the links below).

  • http://en.wikipedia.org/wiki/Wikipedia:WikiProject_Linguistics

  • http://en.wikipedia.org/wiki/Portal:Linguistics

  • http://en.wikipedia.org/wiki/Yarowsky_algorithm

  • http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html
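
As a partial answer to the question above, a short sketch of the word-to-word similarity functions exposed on NLTK WordNet synsets: path_similarity and wup_similarity need only WordNet, while res_similarity and lin_similarity also need an information-content file such as ic-brown.dat. Taking the first noun sense of each word, as done here, is a simplification; the synsets chosen are illustrative.

    from nltk.corpus import wordnet as wn
    from nltk.corpus import wordnet_ic   # assumes the 'wordnet' and 'wordnet_ic' data are downloaded

    brown_ic = wordnet_ic.ic('ic-brown.dat')   # information content estimated from the Brown corpus

    c1, c2 = wn.synset('car.n.01'), wn.synset('truck.n.01')

    print('path:', c1.path_similarity(c2))           # shortest-path based
    print('wup :', c1.wup_similarity(c2))            # Wu-Palmer, depth based
    print('res :', c1.res_similarity(c2, brown_ic))  # Resnik, IC of the lowest common subsumer
    print('lin :', c1.lin_similarity(c2, brown_ic))  # Lin, normalized information content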

