
Graph-based WSD (continued)


Presentation Transcript


  1. Mamoru Komachi. Graph-based WSD (continued). DMLA, 2008-12-10

  2. Word sense disambiguation task of Senseval-3 English Lexical Sample
  • Predict the sense of “bank”
    … the financial benefits of the bank (finance)’s employee package (cheap mortgages and pensions, etc.), bring this up to …
  • Training instances are annotated with their sense
    In that same year I was posted to South Shields on the south bank (bank of the river) of the River Tyne and quickly became aware that I had an enormous burden
  • Predict the sense of the target word in the test set
    Possibly aligned to water: a sort of bank (???) by a rushing river

  3. WSD with adjacency matrix
  • Assumption
    • Similar examples tend to have the same label
    • Can define (dis-)similarity between examples
      • Prior knowledge, kNN
  • Idea
    • Perform clustering on an adjacency matrix
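As a rough sketch of this idea (not code from the presentation), an adjacency matrix can be built by connecting each example to its k nearest neighbors under cosine similarity; the feature vectors and k = 3 here are illustrative assumptions:

```python
import numpy as np

def knn_adjacency(X, k=3):
    """Build a symmetric kNN adjacency matrix from feature vectors X."""
    # Normalize rows so that dot products equal cosine similarities.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)
    S = Xn @ Xn.T
    np.fill_diagonal(S, -np.inf)          # exclude self-edges

    n = len(X)
    A = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(S[i])[-k:]:   # k most similar neighbors of i
            A[i, j] = A[j, i] = S[i, j]   # keep the graph symmetric
    return A
```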

  4. Intuition behind using similarity graph
  • Can propagate known labels to unlabeled data, even to instances that do not directly overlap with any labeled example
  • (Pictures taken from Zhu 2007)

  5. Using unlabeled data by similarity graph
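A minimal sketch of how known labels spread over such a graph, loosely following the iterative label-propagation scheme surveyed in Zhu (2007); the row normalization and iteration count are choices made for this example, not details from the slides:

```python
import numpy as np

def propagate_labels(A, y, labeled, n_senses, iters=100):
    """Spread sense labels from labeled to unlabeled nodes over graph A.

    A       : (n, n) similarity / adjacency matrix
    y       : length-n integer sense ids (only meaningful where labeled)
    labeled : length-n boolean mask of labeled nodes
    """
    # Row-normalize A into a transition matrix.
    P = A / np.clip(A.sum(axis=1, keepdims=True), 1e-12, None)

    # One-hot label distributions; unlabeled rows start at zero.
    F = np.zeros((len(y), n_senses))
    F[labeled, y[labeled]] = 1.0

    for _ in range(iters):
        F = P @ F                        # spread label mass along edges
        F[labeled] = 0.0                 # clamp labeled nodes back ...
        F[labeled, y[labeled]] = 1.0     # ... to their known senses
    return F.argmax(axis=1)              # predicted sense id per node
```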

  6. Pros and cons
  • Pros
    • Mathematically well-founded
    • Can achieve high performance if the graph is well constructed
  • Cons
    • Hard to determine an appropriate graph structure (and its edge weights)
    • Relatively large computational complexity
    • Mostly transductive
      • Transductive learning: (unlabeled) test instances are given when building the classification model
      • Inductive learning: test instances are not known during training

  7. Word sense disambiguation by kNN
  • Seed instance = the instance whose sense is to be predicted
  • System output = the label determined by the k nearest neighbors (k = 3)
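In code, the prediction step might look like the following sketch; the majority vote and the cosine similarity are assumptions about the figure, not confirmed details:

```python
import numpy as np
from collections import Counter

def knn_predict(seed_vec, train_X, train_senses, k=3):
    """Predict the sense of a seed instance by majority vote
    among its k most similar labeled training instances."""
    sims = (train_X @ seed_vec) / (
        np.linalg.norm(train_X, axis=1) * np.linalg.norm(seed_vec) + 1e-12
    )
    nearest = np.argsort(sims)[-k:]                # k nearest neighbors
    votes = Counter(train_senses[i] for i in nearest)
    return votes.most_common(1)[0][0]              # majority sense
```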

  8. Simplified Espresso is HITS
  • Simplified Espresso = HITS in a bipartite graph whose adjacency matrix is A
  • Problem
    • No matter which seed you start with, the same instance is always ranked topmost
    • Semantic drift (also called topic drift in HITS)
  • The ranking vector i tends to the principal eigenvector of A^T A as the iteration proceeds, regardless of the seed instances!
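This convergence claim is easy to check numerically: the simplified update i ← A^T A i (with normalization) is power iteration, so any reasonable seed converges to the principal eigenvector of A^T A. A toy demonstration with an invented matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 4))            # toy pattern-instance matrix

def rank(seed, iters=50):
    """Iterate the Simplified Espresso / HITS update from a seed vector."""
    i = seed / np.linalg.norm(seed)
    for _ in range(iters):
        i = A.T @ (A @ i)         # one update step: i <- A^T A i
        i /= np.linalg.norm(i)    # normalization keeps the vector finite
    return i

# Two completely different seed instances ...
s1 = rank(np.array([1.0, 0.0, 0.0, 0.0]))
s2 = rank(np.array([0.0, 0.0, 0.0, 1.0]))
# ... end up with the same ranking: the principal eigenvector of A^T A.
print(np.allclose(s1, s2, atol=1e-6))   # True
```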

  9. Convergence process of Espresso
  • The heuristics in Espresso help reduce semantic drift (however, early stopping is required for optimal performance)
  • Semantic drift occurs in Simplified Espresso: it outputs the most frequent sense regardless of input
  [Figure: convergence curves for Original Espresso, Simplified Espresso, and the most-frequent-sense baseline]

  10. Learning curve of Original Espresso: per-sense breakdown
  • The number of most-frequent-sense predictions increases
  • Recall for infrequent senses worsens even with Original Espresso
  [Figure: learning curves for the most frequent sense vs. other senses]

  11. Q. What caused drift in Espresso?
  A. Espresso’s resemblance to HITS
  • HITS is an importance computation method (it gives a single ranking list regardless of the seeds)
  • Why not use another type of link analysis measure, one that takes seeds into account?
    • A “relatedness” measure (it gives different rankings for different seeds)

  12. The regularized Laplacian kernel
  • A relatedness measure
    • Takes higher-order relations into account
    • Has only one parameter
  • Definitions
    • A: adjacency matrix of the graph
    • D: (diagonal) degree matrix
    • Graph Laplacian: L = D − A
    • Regularized Laplacian matrix: R_β = Σ_{n=0}^∞ (−β L)^n = (I + β L)^{−1}, where β is the parameter
  • Each column of R_β gives the rankings relative to a node
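A short sketch of computing the kernel under the definition above (β = 0.01 and the toy graph are arbitrary choices for illustration):

```python
import numpy as np

def regularized_laplacian(A, beta=0.01):
    """R_beta = (I + beta * L)^(-1), where L = D - A is the graph Laplacian."""
    D = np.diag(A.sum(axis=1))                    # (diagonal) degree matrix
    L = D - A                                     # graph Laplacian
    return np.linalg.inv(np.eye(len(A)) + beta * L)

# Toy undirected graph; each column of R ranks all nodes
# by relatedness to that column's node.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
R = regularized_laplacian(A)
print(np.argsort(-R[:, 0]))   # nodes ranked by relatedness to node 0
```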

  13. WSD on all nouns in Senseval-3
  • The proposed method outperforms other graph-based methods
  • Espresso needs optimal stopping to achieve equivalent performance

  14. More experiments on WSD dataset
  • Niu et al. “Word Sense Disambiguation using LP-based Semi-Supervised Learning” (ACL-2005)
  • Pham et al. “Word Sense Disambiguation with Semi-Supervised Learning” (AAAI-2005)

  15. Dataset
  • Pedersen (2000) “line” and “interest” data
    • Line: six senses (cord, product, …)
    • Interest: four senses (interest on money, concern, …)
  • Features
    • Bag-of-words features
    • Local collocation features
    • Part-of-speech features
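A toy illustration of these three feature types; the window size and feature templates are assumptions for the example, not the settings used in the experiments:

```python
def extract_features(tokens, pos_tags, target_idx, window=3):
    """Toy feature extractor for one WSD instance."""
    feats = {}
    # Bag-of-words features: surrounding words, position ignored.
    for w in tokens[:target_idx] + tokens[target_idx + 1:]:
        feats[f"bow={w.lower()}"] = 1
    # Local collocation features: words at fixed offsets from the target,
    # plus part-of-speech features at the same offsets.
    for off in range(-window, window + 1):
        j = target_idx + off
        if off != 0 and 0 <= j < len(tokens):
            feats[f"col[{off}]={tokens[j].lower()}"] = 1
            feats[f"pos[{off}]={pos_tags[j]}"] = 1
    return feats

tokens = "He sat on the bank of the river".split()
tags = ["PRP", "VBD", "IN", "DT", "NN", "IN", "DT", "NN"]
print(sorted(extract_features(tokens, tags, target_idx=4))[:5])
```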

  16. Result

  17. Discussion
  • Proposed method (simple kNN) achieved comparable performance to previous semi-supervised WSD systems
  • Does additional data help?

  18. “line” data with 90 labeled instances

  19. “line” data with 150 labeled instances

  20. “interest” data with 60 labeled instances

  21. “interest” data with 300 labeled instances

  22. Discussion (cont.)
  • Additional data doesn’t always help
    • Sometimes it performs worse than using no additional data at all!
  • Haven’t succeeded in using large-scale data on this task (BNC data could be used)
  • All systems suffer from the data sparseness problem
    • Need robust feature selection (smoothing)

  23. Multiple clusters in similarity graphs
  • Generative model of co-occurrence

  24. Construction of similarity matrix
  • Let G_z be a hidden topic graph
    • The edge between instance i_i and pattern p_j has weight P(z|i_i, p_j)
  • The adjacency matrix A_z = A(G_z) is a matrix whose (i,j)-th element holds P(z|i_i, p_j), with all other elements set to 0
  • A similarity matrix is computed as A_z^T A_z
    • Its (i,j)-th element holds the co-occurrence value between instances i_i and i_j with respect to topic z
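A sketch of this construction; in a real system the P(z|i_i, p_j) values would come from the generative co-occurrence model mentioned on the previous slide, whereas here random placeholder probabilities are used:

```python
import numpy as np

rng = np.random.default_rng(0)
n_patterns, n_instances, n_topics = 6, 4, 2

# P[z] stands in for the matrix of P(z | i_i, p_j) values; rows are
# patterns and columns are instances, so that A_z^T A_z is an
# instance-instance similarity matrix, as on the slide.
P = rng.dirichlet(np.ones(n_topics), size=(n_patterns, n_instances))
P = np.moveaxis(P, -1, 0)           # shape: (n_topics, n_patterns, n_instances)

similarities = []
for z in range(n_topics):
    Az = P[z]                       # adjacency matrix of hidden topic graph G_z
    similarities.append(Az.T @ Az)  # co-occurrence of instances w.r.t. topic z
```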

  25. Combination of von Neumann kernels
  • The von Neumann kernel matrix is defined as K_β = Σ_{n=0}^∞ β^n A^{n+1} = A (I − β A)^{−1}
  • The final kernel matrix is computed by summing the kernel matrices of all hidden topics
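Continuing the sketch, the per-topic kernels can be computed in closed form and summed; note the series only converges when β is below the reciprocal of A's largest eigenvalue, so β is chosen accordingly here:

```python
import numpy as np

def von_neumann_kernel(A, beta):
    """K_beta = sum_{n>=0} beta^n A^(n+1) = A (I - beta A)^(-1)."""
    return A @ np.linalg.inv(np.eye(len(A)) - beta * A)

# Toy per-topic similarity matrices standing in for A_z^T A_z above.
rng = np.random.default_rng(0)
similarities = []
for _ in range(2):
    M = rng.random((4, 4))
    similarities.append(M @ M.T)          # symmetric, positive semi-definite

# Pick beta below 1 / lambda_max of every topic matrix so the series converge.
lam_max = max(np.linalg.eigvalsh(S).max() for S in similarities)
beta = 0.5 / lam_max

# Final kernel matrix: sum of the von Neumann kernels of all hidden topics.
K = sum(von_neumann_kernel(S, beta) for S in similarities)
```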

  26. Result

  27. Discussion
  • Poor result with the proposed method
    • Likely caused by a mis-implementation or a bug
  • The number of clusters (hidden variable z) does not seem to strongly affect performance
    • Tested |z| = 5 and 20; increasing |z| to 20 gave a 3-point improvement, but the result is still below the most-frequent-sense baseline
