
Relation Extraction CSCI-GA.2591


Presentation Transcript


1. NYU Relation Extraction CSCI-GA.2591 Ralph Grishman

2. ACE Relations
• An ACE relation mention connects two entity mentions in the same sentence:
• the CEO of Microsoft → OrgAff:employment(the CEO of Microsoft, Microsoft)
• in the West Bank, a passenger was wounded → Phys:Located(a passenger, the West Bank)
• ACE 2005 had 6 types of relations and 18 subtypes
• most papers report on types only
• Most relations are local …
• in roughly 70% of relations, the arguments are adjacent or separated by one word
• so chunking is important but full parsing is not critical
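A minimal sketch of how a relation mention might be represented in code; the class and field names are illustrative, not the actual ACE schema:

```python
# Illustrative data structures for ACE-style relation mentions; the field
# names here are invented for this sketch, not taken from the ACE schema.
from dataclasses import dataclass

@dataclass
class EntityMention:
    text: str      # e.g. "the CEO of Microsoft"
    etype: str     # semantic type, e.g. "PER" or "ORG"
    start: int     # token span within the sentence
    end: int

@dataclass
class RelationMention:
    rtype: str             # relation type, e.g. "OrgAff"
    subtype: str           # relation subtype, e.g. "employment"
    arg1: EntityMention
    arg2: EntityMention

# OrgAff:employment(the CEO of Microsoft, Microsoft)
rel = RelationMention(
    "OrgAff", "employment",
    EntityMention("the CEO of Microsoft", "PER", 0, 4),
    EntityMention("Microsoft", "ORG", 3, 4),
)
print(rel.rtype, rel.arg1.text, "/", rel.arg2.text)
```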

3. Benchmarks
• ACE 2003 / 2004 / 2005 corpora
• generally assuming perfect entity mentions on input
• some work assumes only position (and not semantic type) is given
• SemEval-2010 Task 8
• carefully selected examples of 10 relations
• a classification task

4. Using MaxEnt
• First description of an ACE relation extractor: the IBM system [Kambhatla ACL 2004]
• Used features: words, entity type, mention level, overlap, dependency tree, parse tree
• Used the 2003 ACE data
• F = 55 (perfect mentions), F = 23 (system mentions)
• good system mentions are important
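A hedged sketch of such a MaxEnt classifier: multinomial logistic regression over a dictionary of hand-coded features. The specific features below are invented for illustration; the IBM system used word, entity-type, mention-level, overlap, dependency-tree, and parse-tree features.

```python
# MaxEnt (multinomial logistic regression) relation classifier sketch.
# The feature set is illustrative only, not Kambhatla's actual features.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(tokens, a1, a2, t1, t2):
    """tokens: sentence tokens; a1/a2: (start, end) spans; t1/t2: entity types."""
    between = tokens[a1[1]:a2[0]]          # words between the two arguments
    return {
        "type_pair": f"{t1}-{t2}",         # entity-type pair
        "head1": tokens[a1[1] - 1],        # last word of each argument
        "head2": tokens[a2[1] - 1],
        "dist": min(len(between), 5),      # bucketed word distance
        "between": "_".join(between),      # the intervening words
    }

# Two toy training examples (real training data has thousands of pairs)
X = [
    features("the CEO of Microsoft".split(), (0, 2), (3, 4), "PER", "ORG"),
    features("a passenger was wounded in the West Bank".split(),
             (0, 2), (5, 8), "PER", "GPE"),
]
y = ["OrgAff:employment", "Phys:Located"]

clf = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X, y)
print(clf.predict(X[:1]))
```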

5. Lots of features
• Singapore system [Zhou et al. ACL 2005] used a very rich feature set, including:
• 11 chunk-based features
• a family-relative feature
• 2 country-name features
• 7 dependency-based features
• . . .
• highly tuned to the ACE task
• F = 68 (relation type), F = 55 (subtype)
• reports a gain of several percent over the IBM system
• used perfect mentions
• further extended at NYU, on ACE 2004: F = 70.1

6. Kernel methods and SVMs
• As an alternative to a feature-based model, one can provide a kernel function: a similarity function between pairs of the objects being classified
• the kernel can be used directly by a k-nearest-neighbor (kNN) classifier
• or can be used in training an SVM (Support Vector Machine)

7. SVM
• The SVM, when trained, creates a separating hyperplane
• if the data are fully separable, all data on one side of the hyperplane are classified +, all data on the other side –
• inherently a binary classifier
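A short sketch of how an arbitrary similarity function drives an SVM in practice: compute the Gram (kernel) matrix over the training objects and hand it to an SVM that accepts precomputed kernels. The toy kernel here, counting shared token positions, is purely illustrative.

```python
# Using a kernel (similarity) function with an SVM via a precomputed Gram
# matrix; this is how structured inputs such as paths or trees, which have
# no fixed-length feature vector, are typically plugged into an SVM.
import numpy as np
from sklearn.svm import SVC

def kernel(x, y):
    """Toy similarity between two token sequences: count matching positions."""
    return float(sum(a == b for a, b in zip(x, y)))

train = [["the", "CEO", "of"], ["in", "the", "West"], ["the", "CEO", "at"]]
labels = [1, 0, 1]                                   # binary: relation or not

gram = np.array([[kernel(a, b) for b in train] for a in train])
svm = SVC(kernel="precomputed").fit(gram, labels)

test = [["the", "chairman", "of"]]
gram_test = np.array([[kernel(a, b) for b in train] for a in test])
print(svm.predict(gram_test))
```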

8. Benefit of kernel methods
• provides a natural way of handling structured input of variable size: sequences and trees
• a feature-based system may require a large number of features for the same effect

9. Shortest-path kernel
• [Bunescu & Mooney EMNLP 2005]
• ACE 2002 corpus
• Based on the dependency path between the two arguments
• Kernel function between two paths x and y of lengths m and n: K(x, y) = 0 if m ≠ n, else K(x, y) = ∏_{i=1..n} c(x_i, y_i)
• c = degree of match (lexical / POS)
• Train an SVM
• F = 52.5
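A minimal sketch of this kernel, simplified so that each path position carries only a small feature set (word and POS); the two paths loosely echo the paper's protesters/troops example.

```python
# Shortest-path dependency kernel sketch (after Bunescu & Mooney 2005):
# paths of different lengths have kernel 0; otherwise the kernel is the
# product over positions of c(x_i, y_i), the number of shared features.
def path_kernel(x, y):
    """x, y: lists of feature sets, one set per node/edge on the path."""
    if len(x) != len(y):
        return 0.0
    k = 1.0
    for fx, fy in zip(x, y):
        c = len(fx & fy)        # degree of match at this position
        if c == 0:
            return 0.0          # one non-matching position zeroes the product
        k *= c
    return k

# each position holds features of a node (word, POS) or an edge direction
p1 = [{"protesters", "NNS"}, {"->"}, {"seized", "VBD"}, {"<-"}, {"stations", "NNS"}]
p2 = [{"troops", "NNS"}, {"->"}, {"raided", "VBD"}, {"<-"}, {"churches", "NNS"}]
print(path_kernel(p1, p2))      # the POS tags match at every position: 1.0
```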

10. Tree kernel
• To take account of more of the tree than the dependency path, use the PET (path-enclosed tree)
• PET = portion of the parse tree enclosed by the shortest path
• using the entire sentence tree introduces too much irrelevant data
• Use a tree kernel which recursively compares the two trees
• for example, one that counts the number of shared subtrees
• The best kernel is a composite kernel: tree kernel + entity kernel
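A sketch of a subtree-counting kernel in the Collins & Duffy style, one standard choice for this setup; trees are simple (label, children) tuples, and a real system would run this over the PETs of the two sentences rather than toy trees.

```python
# Tree kernel sketch: K(T1, T2) sums, over all pairs of nodes, the number
# of common subtrees rooted at those nodes, with a decay factor to keep
# large subtrees from dominating (Collins & Duffy style).
def common_subtrees(n1, n2, decay):
    label1, kids1 = n1
    label2, kids2 = n2
    # the productions must match: same label, same sequence of child labels
    if label1 != label2 or [k[0] for k in kids1] != [k[0] for k in kids2]:
        return 0.0
    if not kids1:                        # matching leaves
        return decay
    prod = decay
    for c1, c2 in zip(kids1, kids2):
        prod *= 1.0 + common_subtrees(c1, c2, decay)
    return prod

def collect(t):
    """All nodes of a (label, children) tree."""
    out, stack = [], [t]
    while stack:
        node = stack.pop()
        out.append(node)
        stack.extend(node[1])
    return out

def tree_kernel(t1, t2, decay=0.5):
    return sum(common_subtrees(a, b, decay)
               for a in collect(t1) for b in collect(t2))

np1 = ("NP", [("DT", []), ("NN", [])])
np2 = ("NP", [("DT", []), ("NN", [])])
print(tree_kernel(np1, np2))             # counts shared (decayed) subtrees
```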

11. Lexical Generalization
• Test data will include words not seen in training
• Remedies:
• use lemmas
• use Brown clusters
• use word embeddings
• Can be used with feature-based or kernel-based methods
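A sketch of the Brown-cluster remedy from the list above: each word maps to a bit-string path in the cluster hierarchy, and prefixes of that path give features at several granularities, so an unseen word that shares a cluster with a training word still fires familiar features. The cluster paths below are made up for illustration.

```python
# Brown-cluster features at multiple prefix lengths; the bit strings are
# invented for this sketch, not taken from a real clustering.
brown = {
    "Microsoft": "0110101",
    "IBM":       "0110100",
    "wounded":   "1011100",
    "injured":   "1011101",
}

def cluster_features(word, prefixes=(4, 6)):
    path = brown.get(word)
    if path is None:
        return {}                        # out-of-vocabulary: no cluster features
    return {f"brown{p}": path[:p] for p in prefixes}

print(cluster_features("IBM"))           # {'brown4': '0110', 'brown6': '011010'}
print(cluster_features("Microsoft"))     # identical prefixes: same features fire
```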

12. FCM: Feature-Rich Compositional Embedding Models
• Combines word embeddings and hand-made discrete features: p(y | x) ∝ exp( Σ_i f_i^T T_y e_i ), where
• e_i is the word embedding vector for word i
• f_i is a vector of hand-coded features for word i
• T_y is a matrix of weights for label y
• If e is fixed during training, this is a feature-rich log-linear model
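A numpy sketch of this scoring rule; the dimensions are arbitrary toy values, and the tensor T (one weight matrix per label) is randomly initialized rather than trained.

```python
# FCM scoring sketch: score(y) = sum_i f_i^T T_y e_i, then softmax over labels.
import numpy as np

n_labels, n_feats, emb_dim = 3, 5, 4
rng = np.random.default_rng(0)
T = rng.normal(size=(n_labels, n_feats, emb_dim))  # one weight matrix per label

def fcm_scores(f, e):
    """f: (n_words, n_feats) hand-coded features; e: (n_words, emb_dim) embeddings."""
    # sum over words i of f_i^T T_y e_i, computed for every label y at once
    return np.einsum("if,yfd,id->y", f, T, e)

f = rng.integers(0, 2, size=(6, n_feats)).astype(float)  # binary features, 6 words
e = rng.normal(size=(6, emb_dim))                        # word embeddings
scores = fcm_scores(f, e)
probs = np.exp(scores - scores.max())
probs /= probs.sum()                                     # softmax over labels
print(probs)
```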

13. Neural Networks
• Neural networks:
• provide a richer model than log-linear models
• reduce the need for feature engineering
• although it may help to add features to the embeddings
• but are slow to train and hard to inspect
• Several types of networks have been used:
• convolutional NNs
• recurrent NNs
• An ensemble of different NN types appears most effective
• may even include a log-linear model in the ensemble
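A hedged PyTorch sketch of a convolutional relation classifier; published systems also add position embeddings relative to the two arguments, and all sizes here are toy values rather than those of any particular system.

```python
# CNN relation classifier sketch: embed tokens, apply 1-D convolutions over
# the sentence, max-pool, and score the relation labels.
import torch
import torch.nn as nn

class CNNRelationClassifier(nn.Module):
    def __init__(self, vocab=1000, emb=50, filters=100, width=3, labels=7):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb)
        self.conv = nn.Conv1d(emb, filters, kernel_size=width, padding=1)
        self.out = nn.Linear(filters, labels)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        x = self.embed(tokens).transpose(1, 2)  # (batch, emb, seq_len)
        x = torch.relu(self.conv(x))            # (batch, filters, seq_len)
        x = x.max(dim=2).values                 # max-pool over the sentence
        return self.out(x)                      # unnormalized label scores

model = CNNRelationClassifier()
scores = model(torch.randint(0, 1000, (2, 20)))  # a batch of 2 toy sentences
print(scores.shape)                              # torch.Size([2, 7])
```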

14. Some comparisons
• ACE 2005, train nw+bn, test bc
• perfect mentions, including entity types
• Log-linear system 57.8
• FCM 61.9
• hybrid FCM 63.5
• CNN 63.0
• NN ensemble 67.0
• The richer model of even a simple NN beats a log-linear (MaxEnt) system
• [Nguyen and Grishman, IJCAI Workshop 2016]

15. Comparing scores
• Using a subset of ACE 2005 (news); feature-based system; perfect mention position but no type info
• Baseline 51.4
• Single Brown cluster 52.3
• Multiple clusters 53.7
• Word embedding (WE) 54.1
• Multiple clusters + WE 55.5
• Multiple clusters + WE + regularization 59.4
• Moral: lexical generalization and regularization are worthwhile (probably for all ACE tasks)
• [Nguyen & Grishman ACL 2014]

16. Distant Supervision
• We have focused on supervised methods, which produce the best performance
• If we have a large database with instances of the relations of interest, we can use distant supervision:
• use the database to tag a corpus
• if the database has the relation R(x, y), tag all sentences in the corpus containing x and y as examples of R
• train a model from the tagged corpus
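A sketch of the tagging step, with naive substring matching standing in for real entity-mention matching; the single database fact is invented for illustration.

```python
# Distant supervision: every sentence containing both arguments of a known
# database fact R(x, y) is tagged as a (noisy) training example of R.
db = {("Bill Gates", "Microsoft"): "OrgAff:employment"}

def tag_corpus(sentences, db):
    examples = []
    for sent in sentences:
        for (x, y), rel in db.items():
            if x in sent and y in sent:
                examples.append((sent, x, y, rel))   # noisy positive example
    return examples

corpus = [
    "Bill Gates founded Microsoft in 1975.",
    "Bill Gates spoke at a conference.",             # only one argument: skip
]
for example in tag_corpus(corpus, db):
    print(example)
```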

17. Distant Supervision
• By itself, distant supervision is too noisy
• if the same pair <x, y> is connected by several relations, which one do we label?
• But it can be combined with selective manual annotation to produce a satisfactory result
