Unsupervised Clustering of People, Places & Organizations in U.S. Diplomatic Cables

Xuwen Cao

Beyang Liu

Process Outline
  • Identify entities in 3,891 leaked U.S. diplomatic cables published by WikiLeaks
  • Extract features from a window around each entity
    • Sentiment scores
    • Co-occurring entities
    • Adjectives within a fixed-size window
  • Cluster entities in feature space
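
The feature-extraction step can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `window_features`, the window size, and the toy sentence are assumptions, and the tokens are assumed to arrive as (word, POS tag, NER tag) triples with "O" marking non-entities.

```python
from collections import Counter

def window_features(tokens, entity_index, window=5):
    """Count co-occurring entities and adjectives near one entity mention.

    tokens: list of (word, pos_tag, ner_tag) triples (assumed input format).
    entity_index: position of the entity mention being described.
    window: number of tokens inspected on each side (assumed value).
    """
    lo = max(0, entity_index - window)
    hi = min(len(tokens), entity_index + window + 1)
    cooccurring, adjectives = Counter(), Counter()
    for i in range(lo, hi):
        if i == entity_index:
            continue  # skip the mention itself
        word, pos, ner = tokens[i]
        if ner != "O":
            cooccurring[word.lower()] += 1   # another named entity nearby
        elif pos.startswith("JJ"):
            adjectives[word.lower()] += 1    # adjective inside the window
    return cooccurring, adjectives

# Toy sentence, invented for illustration only.
tokens = [
    ("The", "DT", "O"),
    ("new", "JJ", "O"),
    ("government", "NN", "O"),
    ("in", "IN", "O"),
    ("Cairo", "NNP", "LOCATION"),
    ("met", "VBD", "O"),
    ("friendly", "JJ", "O"),
    ("envoys", "NNS", "O"),
    ("from", "IN", "O"),
    ("Qatar", "NNP", "LOCATION"),
]
cooc, adjs = window_features(tokens, entity_index=4, window=5)
```

Summing these counters over every mention of an entity would give one feature vector per entity; sentiment scores for the collected adjectives could then be looked up separately.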
K-Means Clustering
  • Stanford NLP (NER + POS tagging)
  • Extract locations (LOCATION and NN tags)
    • e.g. London, Africa, China, Caucasus
  • Sentiment analysis on adjectives (JJ) using SentiWordNet
  • Calibrate using sentiment toward the U.S.
  • Frequency counting
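
A minimal k-means sketch over two features per entity, assuming each entity is summarized as a (frequency, sentiment) pair. The slides do not give the exact feature vector or scaling, so the `kmeans` function, the toy points, and the choice of two features are illustrative assumptions only.

```python
import random

def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm on tuples of floats (illustrative sketch)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignments = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_assignments = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2
                                  for p, q in zip(point, centroids[c])))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
        if new_assignments == assignments:
            break  # assignments stopped changing
        assignments = new_assignments
    return assignments, centroids

# Hypothetical (frequency, sentiment) pairs, invented for illustration.
points = [(1.0, -0.8), (1.2, -0.7), (0.9, -0.9),
          (5.0, 0.6), (5.2, 0.7), (4.8, 0.5)]
assignments, centroids = kmeans(points, k=2)
```

Because frequency and sentiment live on different scales, some rescaling before clustering would probably be needed in practice; the sketch leaves that out.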
K-Means Results

[Figure: k-means results; axis labels: entity frequency, sentiment score]

Multinomial Mixture Model
  • Model many features as (probabilistic) function of cluster assignment
  • Naïve Bayes independence assumption
  • Maximize expected log-likelihood objective with EM

[Figure: graphical model in which the cluster label generates the features]
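
The model above can be sketched as EM for a mixture of multinomials. This is a simplified illustration under stated assumptions, not the authors' implementation: each entity is reduced to a single vector of feature counts, and `em_multinomial_mixture`, the smoothing constant, and the toy counts are hypothetical.

```python
import math
import random

def em_multinomial_mixture(counts, k, iterations=50, seed=0, smoothing=1e-2):
    """EM for a mixture of multinomials over count vectors (sketch).

    counts: list of equal-length lists of non-negative feature counts.
    Returns mixing weights pi, per-cluster feature distributions theta,
    and per-entity responsibilities resp.
    """
    rng = random.Random(seed)
    n, v = len(counts), len(counts[0])
    pi = [1.0 / k] * k
    theta = []
    for _ in range(k):
        row = [rng.random() + 0.5 for _ in range(v)]  # random, strictly positive
        total = sum(row)
        theta.append([x / total for x in row])
    resp = []
    for _ in range(iterations):
        # E-step: posterior over clusters for each entity, computed in log
        # space; features are treated as independent given the cluster.
        resp = []
        for x in counts:
            log_post = [
                math.log(pi[c])
                + sum(xj * math.log(theta[c][j]) for j, xj in enumerate(x))
                for c in range(k)
            ]
            m = max(log_post)
            weights = [math.exp(lp - m) for lp in log_post]
            z = sum(weights)
            resp.append([w / z for w in weights])
        # M-step: re-estimate parameters from expected counts; the smoothing
        # term keeps every probability strictly positive.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            pi[c] = (nc + smoothing) / (n + k * smoothing)
            row = [smoothing + sum(r[c] * x[j] for r, x in zip(resp, counts))
                   for j in range(v)]
            total = sum(row)
            theta[c] = [val / total for val in row]
    return pi, theta, resp

# Hypothetical feature counts for six entities over four features.
counts = [[5, 4, 0, 0], [6, 3, 0, 1], [4, 5, 1, 0],
          [0, 0, 5, 6], [1, 0, 4, 5], [0, 1, 6, 4]]
pi, theta, resp = em_multinomial_mixture(counts, k=2)
```

Each row of `resp` is a soft cluster assignment; taking the largest entry per row gives a hard label. The result depends on the random starting point, so different seeds may yield different clusterings.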

EM Initialization Issues

Histograms of cluster sizes (k = 100)
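
One way such a histogram could be tabulated from hard cluster assignments is sketched below. The slides do not describe how the histograms were produced, so the helper names and the toy assignments are assumptions.

```python
from collections import Counter

def cluster_sizes(assignments, k):
    """Number of entities in each of the k clusters (empty clusters count as 0)."""
    counts = Counter(assignments)
    return [counts.get(c, 0) for c in range(k)]

def size_histogram(sizes):
    """Sorted (cluster size, number of clusters with that size) pairs."""
    return sorted(Counter(sizes).items())

# Toy hard assignments for eight entities and k = 5 (illustrative only).
assignments = [0, 0, 0, 0, 2, 2, 3, 0]
sizes = cluster_sizes(assignments, k=5)
hist = size_histogram(sizes)
```

Comparing these histograms across several random initializations would show whether a run collapses most entities into a few large clusters and leaves many clusters empty or tiny.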

Sample Clusters from Multinomial Mixture Model
  • Examples
    • Good
      • cairo, iran, saudi arabia, west bank, palestinian authority, qatar, middle east, karachi, maliki
      • tripoli, dutch, france, abuja, muammar al-qadhafi, icc (international criminal court)
    • Bad
      • atmar, ben ali, saleh, european union, eu, icrc (red cross), wto, ahmadinejad
      • helmand, karzai, seoul, brown, williams, tadic
  • Many other clusters very small or heterogeneous
  • Model seems to rely most heavily on co-occurrence features
Future Direction
  • More advanced features, targeted toward sentiment
    • E.g. n-gram adjective phrases
  • Better model: mixture of CRF clustering, rather than Naïve Bayes
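
As one possible reading of "n-gram adjective phrases", the sketch below collects contiguous runs of adverbs and adjectives that end in an adjective, so that a phrase such as "not very productive" is kept as one feature. The interpretation, the function name `adjective_ngrams`, and the toy sentence are assumptions; tags follow Penn Treebank conventions.

```python
def adjective_ngrams(tagged, max_n=3):
    """Extract adverb/adjective runs ending in an adjective.

    tagged: list of (word, pos_tag) pairs.
    max_n: keep at most the last max_n words of each run.
    """
    phrases = []
    run = []
    # A sentinel token flushes a run that reaches the end of the input.
    for word, pos in tagged + [("", "END")]:
        if pos.startswith("JJ") or pos.startswith("RB"):
            run.append((word.lower(), pos))
            continue
        # Drop trailing adverbs so every phrase ends in an adjective.
        while run and not run[-1][1].startswith("JJ"):
            run.pop()
        if run:
            words = [w for w, _ in run][-max_n:]
            phrases.append(" ".join(words))
        run = []
    return phrases

# Toy tagged sentence, invented for illustration.
tagged = [("The", "DT"), ("talks", "NNS"), ("were", "VBD"),
          ("not", "RB"), ("very", "RB"), ("productive", "JJ"),
          ("and", "CC"), ("the", "DT"), ("new", "JJ"), ("minister", "NN")]
phrases = adjective_ngrams(tagged)
```

Keeping the negation and intensifier attached to the adjective is the point of the n-gram: scoring "productive" alone would give the opposite sentiment here.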