Distributional clustering of English words


Presentation Transcript

Distributional clustering of English words

Authors: Fernando Pereira, Naftali Tishby, Lillian Lee

Presenter: Marian Olteanu

Introduction
  • Method for automatic clustering of words
    • Distribution in particular syntactic contexts
    • Deterministic annealing
      • Find lowest distortion sets of clusters
      • Increasing annealing parameters
        • Clusters subdivide – hierarchical “soft” clustering
    • Clusters
      • Class models
      • Word co-occurrence
Introduction
  • Simple tabulation of frequencies
    • Data sparseness
  • Hindle proposed smoothing based on clustering
    • Estimating likelihood of unseen events from the frequencies of “similar” events that have been seen
      • Example: estimating the likelihood of a particular direct object for a verb from the likelihood of that direct object for similar verbs
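The back-off idea above can be sketched in a few lines of Python. The similarity lists, counts, and function names below are invented for illustration; they are not Hindle's actual similarity measure:

```python
# Hedged sketch of similarity-based smoothing: estimate an unseen
# (verb, noun) count from "similar" verbs seen with that noun.
similar = {"sip": ["drink", "pour"]}        # verbs judged similar to "sip"
counts = {("drink", "wine"): 2, ("pour", "wine"): 1}

def smoothed(v, n):
    """Back off from an unseen pair to the mean count of similar verbs."""
    if (v, n) in counts:
        return counts[(v, n)]
    sims = similar.get(v, [])
    return sum(counts.get((s, n), 0) for s in sims) / len(sims) if sims else 0

print(smoothed("sip", "wine"))  # mean of the drink/pour counts: 1.5
```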
Introduction
  • Hindle’s proposal
    • Words are similar if there is strong statistical evidence that they tend to participate in the same events
  • This paper
    • Factor word association tendencies into associations of words to certain hidden classes and associations between the classes themselves
    • Derive classes directly from data
Introduction
  • Classes
    • Probabilistic concepts or clusters c
      • p(c|w) for each word w
    • Different from classical “hard” Boolean classes
    • Thus, this method is more robust
      • Is not strongly affected by errors in frequency counts
  • Problem in this paper
    • 2 word classes: V and N
      • Relation between a transitive main verb and the head noun of the direct object
Problem
  • Raw knowledge:
    • f_vn – frequency of occurrence of a particular pair (v, n) in the training corpus
  • Unsmoothed probability - conditional density:
    • p_n(v) = f_vn / Σ_v' f_v'n
    • This is p(v|n)
  • Problem
    • How to use p_n to classify the n ∈ N
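The raw estimate can be sketched directly from pair counts; the toy corpus and names below are illustrative, not the paper's data:

```python
from collections import Counter

# Toy (verb, noun) pairs standing in for the training corpus.
pairs = [("drink", "wine"), ("drink", "water"), ("pour", "wine"),
         ("drink", "wine"), ("pour", "water"), ("drink", "beer")]

f = Counter(pairs)  # f[(v, n)] = frequency of the pair (v, n)

def p_n(v, n):
    """Unsmoothed conditional p_n(v) = f_vn / sum over v' of f_v'n."""
    total = sum(c for (_, n2), c in f.items() if n2 == n)
    return f[(v, n)] / total if total else 0.0

print(p_n("drink", "wine"))  # 2 of the 3 "wine" pairs have verb "drink"
```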
Methodology
  • Measure of similarity between distributions
    • Kullback-Leibler distance
  • This problem
    • Unsupervised learning – learn the underlying distribution of the data
    • Objects have no internal structure; the only available information is statistics about their joint appearance (in that respect, akin to supervised learning)
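The Kullback-Leibler distance itself is straightforward to compute; the distributions below are made up for illustration:

```python
import math

def kl(p, q, eps=1e-12):
    """Kullback-Leibler distance D(p || q) = sum_v p(v) log(p(v)/q(v))."""
    return sum(pv * math.log(pv / max(q.get(v, 0.0), eps))
               for v, pv in p.items() if pv > 0.0)

p = {"drink": 0.75, "pour": 0.25}
q = {"drink": 0.5, "pour": 0.5}
print(kl(p, p))        # a distribution is at distance zero from itself
print(kl(p, q) > 0.0)  # distinct distributions are at positive distance
```

Note that D(p‖q) is asymmetric and undefined where q assigns zero mass to an event p allows, which is exactly why the unsmoothed p_n cannot be compared naively; the `eps` floor above is an illustrative guard, not the paper's solution.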
Distributional Clustering
  • Goal – find clusters such that p_n(v) is approximated by:
    • p̂_n(v) = Σ_c p(c|n) p_c(v)
  • Solve by EM
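A plain EM-style sketch of fitting the class model p̂_n(v) = Σ_c p(c|n) p_c(v), on toy data with invented names; the paper embeds these updates inside a deterministic-annealing schedule rather than running them at a single fixed β as done here:

```python
import math, random

random.seed(0)

# Toy conditional distributions p_n(v) over verbs for each noun
# (made-up numbers standing in for corpus estimates).
verbs = ["drink", "pour", "fire"]
p_data = {"wine":  {"drink": 0.7, "pour": 0.3, "fire": 0.0},
          "water": {"drink": 0.6, "pour": 0.4, "fire": 0.0},
          "gun":   {"drink": 0.0, "pour": 0.0, "fire": 1.0}}
nouns = list(p_data)

def kl(p, q, eps=1e-12):
    return sum(pv * math.log(pv / max(q[v], eps))
               for v, pv in p.items() if pv > 0)

K, beta = 2, 5.0
# Random soft memberships p(c|n), normalized per noun.
memb = {n: [random.random() for _ in range(K)] for n in nouns}
for n in nouns:
    s = sum(memb[n]); memb[n] = [m / s for m in memb[n]]

for _ in range(50):
    # Centroid update: p_c(v) is the membership-weighted mean of the p_n.
    cent = [{v: sum(memb[n][c] * p_data[n][v] for n in nouns) /
                sum(memb[n][c] for n in nouns) for v in verbs}
            for c in range(K)]
    # Membership update: p(c|n) proportional to exp(-beta * D(p_n || p_c)).
    for n in nouns:
        w = [math.exp(-beta * kl(p_data[n], cent[c])) for c in range(K)]
        s = sum(w); memb[n] = [x / s for x in w]

def p_hat(v, n):
    """Class-model estimate sum_c p(c|n) p_c(v)."""
    return sum(memb[n][c] * cent[c][v] for c in range(K))

print(round(sum(p_hat(v, "wine") for v in verbs), 6))  # mixtures sum to 1
```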
Hierarchical clustering
  • Deterministic annealing
    • Sequence of phase transitions
      • Increasing the parameter β
        • Makes the influence of each noun on the definition of the centroids more local
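The role of β can be seen in isolation: with centroids held fixed, memberships p(c|n) ∝ exp(−β·D(p_n‖p_c)) are uniform at β = 0 and harden toward the nearest centroid as β grows. The numbers below are invented for illustration:

```python
import math

def kl(p, q):
    return sum(pv * math.log(pv / q[v]) for v, pv in p.items() if pv > 0)

p_n = {"drink": 0.7, "pour": 0.3}          # one noun's verb distribution
centroids = [{"drink": 0.6, "pour": 0.4},  # two fixed cluster centroids
             {"drink": 0.2, "pour": 0.8}]

def memberships(beta):
    """p(c|n) proportional to exp(-beta * D(p_n || p_c))."""
    w = [math.exp(-beta * kl(p_n, c)) for c in centroids]
    s = sum(w)
    return [x / s for x in w]

for beta in (0.0, 1.0, 10.0, 100.0):
    # Memberships move from uniform (soft) toward 0/1 (hard) as beta rises.
    print(beta, [round(m, 3) for m in memberships(beta)])
```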
Evaluation
  • Relative entropy
    • D(t_n ‖ p̂_n), averaged over the nouns n in the test set
    • Where t_n is the relative frequency distribution of verbs taking n as direct object in the test set
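Averaging the relative entropy over held-out nouns is a one-liner once the test distributions t_n and model estimates p̂_n are in hand; both tables below are made up for illustration:

```python
import math

def kl(p, q, eps=1e-12):
    return sum(pv * math.log(pv / max(q.get(v, 0.0), eps))
               for v, pv in p.items() if pv > 0)

# t[n]: relative frequency of verbs taking test noun n as direct object;
# p_hat[n]: the class model's estimate for the same noun (toy values).
t = {"wine": {"drink": 0.8, "pour": 0.2},
     "gun":  {"fire": 1.0}}
p_hat = {"wine": {"drink": 0.7, "pour": 0.3},
         "gun":  {"fire": 0.9, "drink": 0.1}}

# Evaluation score: D(t_n || p_hat_n), averaged over the test nouns.
avg = sum(kl(t[n], p_hat[n]) for n in t) / len(t)
print(round(avg, 4))  # lower is better; zero means a perfect fit
```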
Evaluation
  • Check if the model can disambiguate between two verbs, v and v’
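This disambiguation test reduces to comparing the two model estimates for the same noun; the helper name and numbers below are invented for illustration:

```python
def disambiguate(v1, v2, p_hat_n):
    """Pick whichever of two candidate verbs the class model makes
    more likely under the noun's estimated distribution p_hat_n."""
    return v1 if p_hat_n.get(v1, 0.0) >= p_hat_n.get(v2, 0.0) else v2

# Illustrative model estimate p_hat_n for one noun (made-up values).
p_hat_wine = {"drink": 0.55, "pour": 0.35, "fire": 0.10}
print(disambiguate("drink", "fire", p_hat_wine))  # picks "drink"
```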