Distributional clustering of English words
Presentation Transcript

Distributional clustering of English words

Authors: Fernando Pereira, Naftali Tishby, Lillian Lee

Presenter: Marian Olteanu


Introduction

  • Method for automatic clustering of words

    • Distribution in particular syntactic contexts

    • Deterministic annealing

      • Find lowest distortion sets of clusters

      • Increasing the annealing parameter

        • Clusters subdivide – hierarchical “soft” clustering

    • Clusters

      • Class models

      • Word co-occurrence


Introduction

  • Simple tabulation of frequencies

    • Data sparseness

  • Hindle proposed smoothing based on clustering

    • Estimating likelihood of unseen events from the frequencies of “similar” events that have been seen

      • Example: estimating the likelihood of a particular direct object for a verb from the likelihood of that direct object for similar verbs
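As a toy sketch of this idea (every verb, similarity list, and probability below is hypothetical, not taken from the paper), the likelihood of an unseen direct object can be estimated by averaging over similar verbs:

```python
# Smoothing by similarity: estimate the likelihood of an unseen verb-object
# pair from verbs deemed similar (all names and numbers are illustrative).
p_obj = {"drink": {"wine": 0.6, "beer": 0.4},
         "sip":   {"wine": 0.9, "tea": 0.1}}
similar_verbs = {"quaff": ["drink", "sip"]}  # hypothetical similarity lists

def smoothed(v, n):
    # Average the direct-object likelihood over the verbs similar to v
    sims = similar_verbs[v]
    return sum(p_obj[s].get(n, 0.0) for s in sims) / len(sims)

print(smoothed("quaff", "wine"))  # (0.6 + 0.9) / 2 = 0.75
```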


Introduction

  • Hindle’s proposal

    • Words are similar if there is strong statistical evidence that they tend to participate in the same events

  • This paper

    • Factor word association tendencies into associations of words to certain hidden classes and associations between the classes themselves

    • Derive classes directly from data


Introduction

  • Classes

    • Probabilistic concepts or clusters c

      • p(c|w) for each word w

    • Different from classical “hard” Boolean classes

    • Thus, this method is more robust

      • Is not strongly affected by errors in frequency counts

  • Problem in this paper

      • Two word classes: V and N

      • Relation between a transitive main verb and the head noun of the direct object


Problem

  • Raw knowledge:

    • f_vn – frequency of occurrence of a particular pair (v, n) in the training corpus

  • Unsmoothed probability - conditional density:

    • p_n(v) = f_vn / Σv′ f_v′n

    • This is p(v|n)

  • Problem

    • How to use pn to classify the nN
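A minimal sketch of this estimate on toy counts (the pairs and frequencies are made up for illustration):

```python
from collections import defaultdict

# Toy (verb, noun) counts f_vn, standing in for counts from a parsed corpus
f = {("drink", "wine"): 3, ("pour", "wine"): 1, ("drink", "beer"): 4}

# f_n = sum over verbs of f_vn
f_n = defaultdict(float)
for (v, n), c in f.items():
    f_n[n] += c

# p_n(v) = f_vn / f_n, the conditional density p(v|n)
p = {(v, n): c / f_n[n] for (v, n), c in f.items()}

print(p[("drink", "wine")])  # 3 / 4 = 0.75
```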


Methodology

  • Measure of similarity between distributions

    • Kullback-Leibler distance

  • This problem

    • Unsupervised learning – learn the underlying distribution of the data

    • Objects have no internal structure; the only available information is statistics about their joint appearance (a kind of supervised learning)
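A direct implementation of the Kullback-Leibler divergence D(p ‖ q) for distributions stored as dicts (toy distributions; assumes q is nonzero wherever p is):

```python
import math

def kl_divergence(p, q):
    # D(p || q) = sum_v p(v) * log(p(v) / q(v)), skipping terms with p(v) = 0
    return sum(pv * math.log(pv / q[v]) for v, pv in p.items() if pv > 0)

# Toy verb distributions for two nouns (illustrative numbers)
p_wine = {"drink": 0.75, "pour": 0.25}
p_beer = {"drink": 1.0, "pour": 0.0}

print(kl_divergence(p_wine, p_wine))  # 0.0 -- identical distributions
print(kl_divergence(p_beer, p_wine))  # positive -- distributions differ
```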


Distributional Clustering

  • Goal – find clusters such that p_n(v) is approximated by:

    • p̂_n(v) = Σc p(c|n) p(v|c)

  • Solve by EM
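The approximation p̂_n(v) = Σc p(c|n) p(v|c) can be sketched with hypothetical soft memberships and cluster centroids (the numbers below are illustrative, not fitted by EM):

```python
# Hypothetical soft memberships p(c|n) and cluster centroids p(v|c)
p_c_given_n = {"wine": {0: 0.9, 1: 0.1}}
p_v_given_c = {0: {"drink": 0.8, "pour": 0.2},
               1: {"drink": 0.1, "pour": 0.9}}

def p_hat(n, v):
    # p_hat_n(v) = sum over clusters c of p(c|n) * p(v|c)
    return sum(w * p_v_given_c[c][v] for c, w in p_c_given_n[n].items())

print(round(p_hat("wine", "drink"), 2))  # 0.9*0.8 + 0.1*0.1 = 0.73
```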


Hierarchical clustering

  • Deterministic annealing

    • Sequence of phase transitions

      • Increasing the parameter β

        • Makes the influence of each noun on the definition of the centroids increasingly local
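A sketch of the annealed soft assignments, assuming the Gibbs form p(c|n) ∝ exp(−β · D(p_n ‖ p_c)) used in deterministic annealing; raising β sharpens the memberships toward a hard assignment (the distributions below are toy values):

```python
import math

def kl(p, q):
    # D(p || q) over dict-valued distributions, skipping p(v) = 0 terms
    return sum(pv * math.log(pv / q[v]) for v, pv in p.items() if pv > 0)

def memberships(p_n, centroids, beta):
    # p(c|n) proportional to exp(-beta * D(p_n || p_c)), normalized over c
    w = [math.exp(-beta * kl(p_n, pc)) for pc in centroids]
    z = sum(w)
    return [x / z for x in w]

p_n = {"drink": 0.75, "pour": 0.25}
centroids = [{"drink": 0.7, "pour": 0.3}, {"drink": 0.2, "pour": 0.8}]

soft = memberships(p_n, centroids, beta=0.5)   # low beta: diffuse memberships
hard = memberships(p_n, centroids, beta=20.0)  # high beta: nearly hard
print(soft, hard)
```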



Evaluation

  • Relative entropy D(t_n ‖ p̂_n), averaged over the nouns n in the test set

    • Where t_n is the relative frequency distribution of verbs taking n as direct object in the test set
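With toy test-set frequencies t_n and toy model estimates (hypothetical numbers, not the paper's results), the metric is the average relative entropy; lower is better:

```python
import math

def kl(p, q):
    # D(p || q); assumes q(v) > 0 wherever p(v) > 0
    return sum(pv * math.log(pv / q[v]) for v, pv in p.items() if pv > 0)

# t[n]: relative verb frequencies for noun n in a held-out test set
# p_hat[n]: the model's smoothed estimate for the same noun
t = {"wine": {"drink": 0.8, "pour": 0.2}}
p_hat = {"wine": {"drink": 0.73, "pour": 0.27}}

# Average D(t_n || p_hat_n) over the test nouns
avg = sum(kl(t[n], p_hat[n]) for n in t) / len(t)
print(avg)
```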


Evaluation

  • Check if the model can disambiguate between two verbs, v and v’
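A minimal sketch of the disambiguation test, assuming the model simply prefers whichever of the two candidate verbs it assigns the higher estimated likelihood (toy model values):

```python
# Toy model estimates p_hat_n(v) for a single noun (illustrative numbers)
p_hat = {"wine": {"drink": 0.73, "pour": 0.27}}

def disambiguate(n, v1, v2):
    # Pick the candidate verb the model makes more likely for noun n
    return v1 if p_hat[n][v1] >= p_hat[n][v2] else v2

print(disambiguate("wine", "drink", "pour"))  # drink
```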

