
Literature Survey: Graph-based Clustering and its Application in Coreference Resolution



Presentation Transcript


  1. Literature Survey: Graph-based Clustering and its Application in Coreference Resolution Zheng Chen, Computer Science Department, The Graduate Center, The City University of New York. November 24, 2009

  2. Motivations and Goals • Motivations • Graph-based clustering has attracted researchers from various fields • Theoreticians are busy studying quality measures and algorithms • Practitioners are busy adapting the algorithms to their own applications • Some algorithms are well known and popular in one field while new to another • Goals • Provide an overview of graph-based clustering methodology • Apply it to coreference resolution

  3. Outline (Part I: Theory) Graph-based Clustering Methodology (a five-part story)

  4. Outline (Part II: Application) Coreference Resolution: A Case Study of Applying Graph-based Clustering • Entity Coreference Resolution • A two-step procedure: classification and clustering • Graph-based (Nicolae and Nicolae, 2006) • Event Coreference Resolution • Agglomerative clustering algorithm (Chen et al., 2009a) • Graph-based (Chen and Ji, 2009b) • Conclusions

  5. Part I Graph-based Clustering Methodology

  6. Clustering in Graph Perspective

  7. Graph Notation

  8. Hypothesis The hypothesis can be stated in different ways: • a graph can be partitioned into densely connected sub-graphs that are sparsely connected to each other • a random walk that visits a dense sub-graph will likely stay in the sub-graph until many of its nodes have been visited • considering all shortest paths between all pairs of nodes, edges between dense sub-graphs are likely to lie on many shortest paths [figure: Manhattan and Queens as an illustration]

  9. Modeling • Determine the meaning of vertices and edges • Compute the edge weights • Graph construction • Which graph should be chosen, and how should its parameters be chosen? (no theoretical justifications)

  10. Measure

  11. Measure: Cheat Sheet Formulas: Objective functions:

  12. Measure: Computation Examples [figure: a weighted graph partitioned into clusters C1 and C2 of three nodes each] T = total edge weight in the graph; intra_density(C1) = (0.8+0.8+0.6)/T; inter_density(C1, C2) = (0.1+0.2)/T; cut(C1, C2) = 0.1+0.2; ratiocut(C1) = cut(C1, C2)/3; vol(C1) = 0.8+0.8+0.6+0.1+0.2; ncut(C1) = cut(C1, C2)/vol(C1); expansion(C1) = min{1.6/1, 1.4/1, 1.4/1} = 1.4; conductance(C1) = min{1.6/1.6, 1.4/1.5, 1.4/1.6} = 1.4/1.6; modularity(C1) = 3/8 − (4/8)², i.e., the fraction of edges inside cluster C1 minus the expected fraction of edges in C1 if edges were located at random in the graph
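These computations can be sketched in a few lines of Python. The edge list and cluster memberships below are assumptions reconstructed from the example (the figure itself did not survive transcription), so treat the numbers as illustrative:

```python
# Cut, volume, and normalized-cut computations on an assumed reconstruction
# of the slide's example graph (C1 = {1, 2, 3}, C2 = {4, 5, 6}).
edges = {
    (1, 2): 0.8, (1, 3): 0.8, (2, 3): 0.6,   # intra-cluster edges of C1
    (4, 5): 0.8, (4, 6): 0.7, (5, 6): 0.8,   # intra-cluster edges of C2 (assumed)
    (3, 4): 0.2, (1, 5): 0.1,                # inter-cluster edges
}
C1, C2 = {1, 2, 3}, {4, 5, 6}

def cut(A, B):
    """Total weight of edges crossing between clusters A and B."""
    return sum(w for (u, v), w in edges.items()
               if (u in A and v in B) or (u in B and v in A))

def vol(A):
    """Total weight of edges with at least one endpoint in A."""
    return sum(w for (u, v), w in edges.items() if u in A or v in A)

ncut = cut(C1, C2) / vol(C1)
print(round(cut(C1, C2), 2))  # 0.3
print(round(vol(C1), 2))      # 2.5
print(round(ncut, 3))         # 0.12
```

Note that `vol` here follows the slide's convention (weight of incident edges) rather than the degree-sum convention some texts use.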

  13. Measure Summary: optimizing each of these measures is NP-hard • intra-cluster density and inter-cluster sparsity (Ausiello et al., 2002; Wagner and Wagner, 1993) • ncut (Shi and Malik, 2000) • expansion and conductance (Ausiello et al., 2002; Šíma and Schaeffer, 2006) • bicriteria (Kannan et al., 2000) • modularity (Brandes, 2006) Any efficient (polynomial-time) algorithm claimed to solve one of these optimization problems is a heuristic and yields sub-optimal clusterings.

  14. Algorithm m: number of edges, n: number of nodes, k: number of clusters

  15. Spectral clustering: Laplacian Matrix

  16. Spectral clustering: Main Algorithm

  17. Spectral clustering: Comments • unnormalized spectral clustering: ratiocut measure; normalized spectral clustering: ncut measure • Which spectral clustering algorithm do we choose? • regular graph: both work equally well • if the degrees in the graph are broadly distributed, prefer normalized over unnormalized • in the normalized case, prefer the random-walk Laplacian L_rw over the symmetric Laplacian L_sym • Why successful? • makes no assumption on the form of the clusters • efficient: the Lanczos algorithm solves the eigenvalue problem (m: the number of edges, n: the number of vertices) • no worry about “local” optimum traps • But: unstable under different choices of the parameters when constructing the graph
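As a concrete illustration, here is a minimal sketch of unnormalized spectral clustering for the two-cluster case, under an assumed toy affinity matrix (two triangles joined by a single edge): build the Laplacian L = D − W, take the eigenvector of the second-smallest eigenvalue (the Fiedler vector), and split vertices on its sign.

```python
# Minimal two-cluster unnormalized spectral clustering sketch.
import numpy as np

# Symmetric affinity matrix: two triangles {0,1,2} and {3,4,5} linked by
# the single edge (2, 3) -- an assumed toy example.
W = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
], dtype=float)

D = np.diag(W.sum(axis=1))
L = D - W                       # unnormalized graph Laplacian
vals, vecs = np.linalg.eigh(L)  # eigenvalues returned in ascending order
fiedler = vecs[:, 1]            # eigenvector of the 2nd-smallest eigenvalue
labels = (fiedler > 0).astype(int)
print(labels)  # one label for nodes 0-2, the other for nodes 3-5
```

The normalized variants differ only in which Laplacian is decomposed, which is why they optimize ncut rather than ratiocut.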

  18. Girvan and Newman Algorithm (Girvan and Newman, 2002) • Edge Betweenness • when a graph is made of tightly bound clusters that are loosely interconnected, all shortest paths between clusters have to go through the few inter-cluster connections • Algorithm • 1. Calculate the betweenness score for each edge • 2. Remove the edge with the highest score • 3. Recalculate betweenness • 4. Repeat from step 2 • Comments • optimizes the modularity measure • good results on real data • complexity remains an issue, even for sparse graphs
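Step 1 of the algorithm, scoring every edge by betweenness, can be sketched as follows (a Brandes-style accumulation over BFS trees for unweighted graphs; the toy graph of two triangles joined by a bridge is an assumption for illustration):

```python
# Edge betweenness for an unweighted undirected graph, Brandes-style.
from collections import defaultdict, deque

def edge_betweenness(adj):
    """Sum shortest-path dependencies over BFS trees rooted at each node."""
    bet = defaultdict(float)
    for s in adj:
        # BFS from s, counting numbers of shortest paths (sigma).
        dist = {s: 0}
        sigma = defaultdict(float); sigma[s] = 1.0
        order = []
        q = deque([s])
        while q:
            v = q.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
        # Back-propagate dependencies from the leaves of the BFS tree.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in adj[w]:
                if dist.get(v, -1) == dist[w] - 1:  # v is a predecessor of w
                    c = sigma[v] / sigma[w] * (1 + delta[w])
                    bet[tuple(sorted((v, w)))] += c
                    delta[v] += c
    # Each undirected pair (s, t) was counted from both endpoints.
    return {e: b / 2 for e, b in bet.items()}

# Two triangles bridged by edge (2, 3) -- the bridge should score highest.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
bet = edge_betweenness(adj)
bridge = max(bet, key=bet.get)
print(bridge)  # (2, 3)
```

Removing the top-scoring edge and recomputing (steps 2-4) would then split the graph into the two triangles.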

  19. Newman fast algorithm (Newman, 2004) • Algorithm • 1. Place each node in its own cluster, giving n clusters. • 2. Calculate the increase of Q for all possible cluster pairs. • 3. Merge the pair which leads to the greatest increase in Q. • 4. Repeat 2 & 3 until the modularity Q reaches its maximal value. • Comments • greedy optimization technique • advantage in complexity on a sparse graph: 50,000 nodes in minutes rather than years
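The four steps above can be sketched as a toy implementation (the example graph, two triangles joined by a bridge, is an assumption, and a real implementation would update Q incrementally rather than recomputing it from scratch at every step):

```python
# Greedy modularity agglomeration on a small unweighted graph.
from itertools import combinations

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
m = len(edges)
deg = {}
for u, v in edges:
    deg[u] = deg.get(u, 0) + 1
    deg[v] = deg.get(v, 0) + 1

def modularity(clusters):
    """Q = sum over clusters of (intra-edge fraction - expected fraction)."""
    q = 0.0
    for c in clusters:
        intra = sum(1 for u, v in edges if u in c and v in c)
        dsum = sum(deg[n] for n in c)
        q += intra / m - (dsum / (2 * m)) ** 2
    return q

clusters = [{n} for n in deg]                 # step 1: all singletons
best = (modularity(clusters), [set(c) for c in clusters])
while len(clusters) > 1:
    # Steps 2-3: merge the pair giving the greatest increase in Q.
    i, j = max(combinations(range(len(clusters)), 2),
               key=lambda p: modularity(
                   [c for k, c in enumerate(clusters) if k not in p]
                   + [clusters[p[0]] | clusters[p[1]]]))
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] \
               + [clusters[i] | clusters[j]]
    q = modularity(clusters)
    if q > best[0]:                           # step 4: keep the best partition
        best = (q, [set(c) for c in clusters])

print(sorted(sorted(c) for c in best[1]))  # [[0, 1, 2], [3, 4, 5]]
```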

  20. Algorithm: Summary • No algorithm is a panacea • A clustering algorithm is usually proposed to optimize some quality measure, so it is unfair to compare two algorithms that favor two different measures • No measure can capture the full characteristics of cluster structures, thus no perfect algorithm exists • There is no definition of the so-called “best clustering”: the “best” depends on the application, data characteristics, granularity, and so on

  21. Evaluation • Internal (intrinsic) measures • External (extrinsic) measures • Are there any formal constraints (properties, criteria) that an ideal extrinsic measure should satisfy? • Do the extrinsic measures proposed so far satisfy the constraints?

  22. Evaluation: Formal Constraints (Amigo et al., 2008) • homogeneity • completeness • rag bag • cluster size vs. quantity (Rosenberg and Hirschberg, 2007)

  23. Evaluation Measures

  24. Measures for Coreference Resolution • MUC: • gives no credit for separating out singleton clusters • all errors are considered to be equal • B-Cubed: • overcomes the two drawbacks of the MUC measure • may give multiple credits to a single item • ECM: • seeks an optimal alignment between the system clustering and the reference clustering
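For concreteness, here is a hedged sketch of the usual mention-level B-Cubed computation; the mentions and cluster ids are made up, and this is the generic formulation rather than the exact variant used in any one surveyed paper:

```python
# B-Cubed precision/recall: per-mention cluster-overlap ratios, averaged.
def bcubed(system, gold):
    """system and gold each map a mention to its cluster id."""
    def avg_overlap(a, b):
        total = 0.0
        for mn in a:
            cluster = [x for x in a if a[x] == a[mn]]        # mn's cluster in a
            correct = [x for x in cluster if b[x] == b[mn]]  # also together in b
            total += len(correct) / len(cluster)
        return total / len(a)
    precision = avg_overlap(system, gold)
    recall = avg_overlap(gold, system)
    return precision, recall

# Assumed example: the system wrongly splits gold cluster {m1, m2, m3}.
system = {"m1": 1, "m2": 1, "m3": 2, "m4": 2}
gold   = {"m1": 1, "m2": 1, "m3": 1, "m4": 2}
p, r = bcubed(system, gold)
print(round(p, 3), round(r, 3))  # 0.75 0.667
```

Because every mention contributes its own precision/recall term, singleton clusters do get credit here, unlike under the link-based MUC measure.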

  25. Satisfaction of Formal Constraints for Various Measures • Extend the work of (Amigo et al., 2008) to more measures: adjusted Rand index, V-measure, MUC measure and ECM measure • Re-compute all the scores • None of the measures except the B-Cubed F-measure satisfies all four constraints • The ECM F-measure fails three constraints: homogeneity, completeness and rag bag

  26. [table not captured in the transcript]

  27. Future Directions • Scalability • graphs in real applications are growing rapidly • graphs are changing dynamically • Stability • robustness to perturbations in the graph • Statistical significance • how significant is a clustering compared with one produced by a null model of the graph?

  28. Part II Coreference Resolution: a Case Study of Applying Graph-based Clustering Methodology

  29. Coreference Resolution • Entity coreference resolution: identifying which noun phrases (NPs, or mentions) refer to the same real-world entity in a text. • An entity is an object in the real world, such as a person, organization, or facility. • A mention is a textual reference to an entity. • Event coreference resolution: identifying which event mentions refer to the same event in a text. • An event is a specific occurrence involving participants. • An event mention includes a distinguished trigger (the word that most clearly expresses the event's occurrence) and its arguments (entities/temporal expressions that play certain roles in the event).

  30. Entity Coreference Resolution: an Example John Perry, of Weston Golf Club, announced his resignation yesterday. He was the President of the Massachusetts Golf Association. During his two years in office, Perry guided the MGA into a closer relationship with the Women's Golf Association of Massachusetts.

  31. Event Coreference Resolution: an Example EM1 An explosion in a cafe at one of the capital's busiest intersections killed one woman and injured another Tuesday. EM2 Police were investigating the cause of the explosion in the restroom of the multistory Crocodile Cafe in the commercial district of Kizilay during the morning rush hour. EM3 The blast shattered walls and windows in the building. EM4 Ankara police chief Ercument Yilmaz visited the site of the morning blast. EM5 The explosion comes a month after EM6 a bomb exploded at a McDonald's restaurant in Istanbul, causing damage but no injuries. EM7 Radical leftist, Kurdish and Islamic groups are active in the country and have carried out bombings in the past.

  32. Event Coreference Resolution: an Example

  33. A Parallel Comparison between Entity Coreference Resolution and Event Coreference Resolution The two problems are similar because: • the problem descriptions are similar • the mathematical interpretations are similar • they can be solved by applying a two-step procedure • they can be solved by applying graph-based clustering methodology They are different because: • entities and events have different attributes and values

  34. Solution: a Two-step Procedure • classification step: compute the likelihood that one entity mention corefers with another • clustering step: group the mentions into clusters such that all mentions in a cluster refer to the same entity

  35. Solution: a Two-step Procedure Classification step • Learning algorithms • decision tree: McCarthy and Lehnert (1995), Soon et al. (2001), Strube et al. (2002), Strube and Muller (2003) and Yang et al. (2003) • maximum entropy: Luo et al. (2004) • SVM: Finley and Joachims (2005) • kernel methods: Yang et al. (2006) • Feature sets • Soon et al. (2001) define 12 surface-level features in four categories: lexical, grammatical, semantic and positional • Ng and Cardie (2002) extend the 12 features to 53 with new features based on common-sense knowledge and linguistic intuitions • Ng (2007) proposes another six semantic features • Yang and Su (2007) extract semantic-relatedness features from Wikipedia

  36. Solution: a Two-step Procedure

  37. Solution: a Two-step Procedure Clustering step • closest-first clustering (Soon et al., 2001) • best-first clustering (Ng and Cardie, 2002) [figure: closest-first vs. best-first linking of mentions EM1–EM4 into entities E1 and E2, with threshold = 0.5]
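The two linking strategies can be sketched as follows; the pairwise scores and mention indices are illustrative assumptions, and the 0.5 threshold mirrors the slide's example:

```python
# Closest-first vs. best-first antecedent selection for one mention.
THRESHOLD = 0.5

# score[(i, j)]: likelihood that mention j corefers with earlier mention i
# (assumed values chosen so the two strategies disagree).
score = {(1, 4): 0.6, (2, 4): 0.8, (3, 4): 0.55}

def closest_first(mention, antecedents):
    """Link to the closest preceding antecedent that clears the threshold."""
    for ant in sorted(antecedents, reverse=True):  # nearest antecedent first
        if score.get((ant, mention), 0.0) > THRESHOLD:
            return ant
    return None

def best_first(mention, antecedents):
    """Link to the highest-scoring antecedent, if it clears the threshold."""
    best = max(antecedents, key=lambda a: score.get((a, mention), 0.0))
    return best if score.get((best, mention), 0.0) > THRESHOLD else None

print(closest_first(4, [1, 2, 3]))  # 3: nearest antecedent above threshold
print(best_first(4, [1, 2, 3]))     # 2: highest-scoring antecedent
```

Entities are then formed by taking the transitive closure of these pairwise links.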

  38. Solution: From Local Clustering to Global Clustering • Problem with the two-step procedure: • it works in a greedy style without searching the space of all possible clusterings • Luo et al. (2004): a link model and a start model, e.g., over “John Perry1, of Weston Golf Club2, announced his3 resignation yesterday.” [figure: search tree over partitions of mentions 1–3, from [1] [2] [3] down to [1,2,3]] • A heuristic search algorithm finds the most probable clustering, i.e., at each step of the search process only the most promising nodes in the tree are expanded • It still works in a greedy style and may miss the optimal clustering

  39. Solution: From Local Clustering to Global Clustering • Ng (2005) • 54 coreference resolution systems (3 classification algorithms, 3 clustering algorithms, 3 instance creation methods and 2 feature sets) • global ranking model • rank the 54 candidate clusterings to get the best clustering • performance depends on the best clustering produced by one of the 54 systems [figure: system1 … system54 → clustering1 … clustering54 → ranking model → best clustering]

  40. Solution: From Supervised to Unsupervised • the classification step is supervised • semi-supervised: • co-training (Muller et al., 2002) • self-training • EM • unsupervised: • non-parametric Bayesian models based on Dirichlet processes (Haghighi and Klein, 2007) • integer linear programming (Denis and Baldridge, 2007) • Markov logic (Poon and Domingos, 2008)

  41. Solution: graph-based clustering methodology

  42. Solution: graph-based clustering methodology • Nicolae and Nicolae (2006)

  43. Solution: graph-based clustering methodology • Minimum cut • the score of a cut is measured as the number of mentions that are correctly placed in their cluster • two correct cases: average and maximum weight [figure: example weighted graph over five mentions with score(cut) = 3]

  44. Solution: graph-based clustering methodology • BESTCUT Algorithm • Example: “Mary1 has a brother2, John3. The boy4 is older than the girl5.” Clustering: {Mary1, the girl5} and {a brother2, John3, The boy4} • Recursive procedure: find the best cut using the algorithm of (Stoer and Wagner, 1997); keep cutting? yes: continue the procedure on the two subgraphs; no: form entities [figure: example weighted graph over the five mentions]

  45. Finding the Best Cut (Stoer and Wagner, 1997) [figure: a sequence of candidate cuts on the example five-mention graph, with scores 3, 3.5, 4 and 5; the best cut is highlighted]
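On a graph this small, a minimum-weight cut can even be found by brute force, which illustrates what the Stoer-Wagner procedure computes efficiently; the edge weights below are assumptions loosely based on the example figure:

```python
# Brute-force minimum cut over all bipartitions of a small weighted graph.
from itertools import combinations

nodes = [1, 2, 3, 4, 5]
edges = {(1, 2): 0.7, (1, 3): 0.5, (2, 3): 0.5,
         (1, 5): 0.1, (1, 4): 0.1, (3, 4): 0.2, (4, 5): 0.6}

def cut_weight(side):
    """Total weight of edges crossing the (side, rest) partition."""
    return sum(w for (u, v), w in edges.items() if (u in side) != (v in side))

best = None
for k in range(1, len(nodes)):
    for subset in combinations(nodes, k):
        side = set(subset)
        if 1 not in side:      # fix node 1 on one side to skip mirror cuts
            continue
        w = cut_weight(side)
        if best is None or w < best[0]:
            best = (w, side)

print(best)  # the minimum cut separates {1, 2, 3} from {4, 5}
```

Brute force is exponential in the number of nodes; Stoer-Wagner finds the same cut in polynomial time, which is what makes the recursive BESTCUT procedure practical.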

  46. Solution: graph-based clustering methodology • Evaluation

  47. Event Coreference Resolution • Pioneering work in the MUC (Message Understanding Conference) evaluations in the 1990s • Humphreys et al., 1997 (ontology) • Bagga and Baldwin, 1998 (vector space model) • events are based on scenarios, e.g., management succession, resignation, election, espionage • ACE evaluations define 8 fine-grained event types • Recent work: • Chen et al., 2009a (agglomerative clustering) • Chen and Ji, 2009b (spectral graph-based clustering)

  48. Event Coreference Resolution: agglomerative clustering (Chen et al., 2009a) • Similar to Luo et al. (2004)'s bell-tree search algorithm, but using different notations • A pairwise event coreference model using event-specific features (triggers/arguments/event attributes) • Event attributes play an important role in distinguishing coreference from non-coreference • The performance bottleneck comes from system-generated event mentions

  49. Event Coreference Resolution: graph-based clustering methodology • Chen and Ji (2009b)

  50. Event Coreference Resolution: graph-based clustering methodology
