How feasible is the reuse of grammars for Named Entity Recognition?

Katerina Pastra, Diana Maynard, Oana Hamza, Hamish Cunningham and Yorick Wilks. Department of Computer Science, Natural Language Processing Group, University of Sheffield, U.K.

Presentation Transcript


  1. How feasible is the reuse of grammars for Named Entity Recognition?
     Katerina Pastra, Diana Maynard, Oana Hamza, Hamish Cunningham and Yorick Wilks
     Department of Computer Science, Natural Language Processing Group, University of Sheffield, U.K.
     Pastra et al., LREC 2002

  2. The paradox
     NER results: close to human performance
     Reuse of NER resources: minimal
     We will focus on:
     • Traditional rule-based NER systems
     • NER in text
     • Reuse of grammars for NER
     • Manual adaptation of grammars

  3. What is it that hinders grammar reuse?
     1) Grammar Formalism
     2) Application Domain
     3) Natural Language
     The use of Flexible System Architectures guarantees reusability of resources.
     >>> But is this a “sine qua non” solution? Does the lack of such architectures render reusability simply “not feasible”?

  4. Grammar Formalism (1)
     • Translating formalisms: a time-effective solution?
     • Time gained, information lost: is there a trade-off?
     >> Current practice: no standardised formalism
     >> Traditional pattern-matching languages: inappropriate for NER
     >> Norm: use of AV (attribute-value) notations, which allow reference to token attributes from multiple analysis levels
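
To illustrate what an attribute-value (AV) notation offers over a plain string pattern, here is a minimal Python sketch that is not taken from the slides. Tokens are dictionaries carrying attributes from several analysis levels (orthography, part of speech, gazetteer lookup), and a pattern is a sequence of attribute constraints. The attribute names (`orth`, `pos`, `lookup`) and the functions `token_matches`, `match_at` and `find_matches` are hypothetical and do not reproduce the notation of any particular NER system.

```python
# Illustrative sketch only: attribute names and the matcher are assumptions,
# not the notation of any particular NER system.

def token_matches(token, constraints):
    """True if the token has every required attribute value."""
    return all(token.get(attr) == value for attr, value in constraints.items())


def match_at(tokens, pattern, start):
    """True if the pattern (one constraint dict per token) matches at `start`."""
    if start + len(pattern) > len(tokens):
        return False
    return all(
        token_matches(tokens[start + i], constraints)
        for i, constraints in enumerate(pattern)
    )


def find_matches(tokens, pattern):
    """Return (start, end) token spans where the pattern matches."""
    return [
        (i, i + len(pattern))
        for i in range(len(tokens))
        if match_at(tokens, pattern, i)
    ]


# Each token combines information produced by different modules.
tokens = [
    {"string": "Mr", "orth": "upperInitial", "pos": "NNP", "lookup": "title"},
    {"string": "Smith", "orth": "upperInitial", "pos": "NNP", "lookup": None},
    {"string": "arrived", "orth": "lowercase", "pos": "VBD", "lookup": None},
]

# "A gazetteer title followed by a capitalised proper noun" mixes the
# gazetteer level with the orthographic and part-of-speech levels.
person_pattern = [
    {"lookup": "title"},
    {"orth": "upperInitial", "pos": "NNP"},
]

spans = find_matches(tokens, person_pattern)
```

A pattern language that sees only the surface string cannot express the `lookup` or `pos` constraints, which is one way to read the slide's remark that traditional pattern-matching languages are inappropriate for NER.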

  5. Grammar Formalism (2)
     The need: NER for SOCIS (not the main task, limited time)
     The problem: existing grammar in another formalism
     >> NEA – JAPE similarities: declarative, context-sensitive, non-det PM…
     >> NEA – JAPE differences:
     • BU rule invocation – FST cascades
     • Appelt control mechanism – Appelt, First, Brill
     • Rules augmented with PROLOG – JAVA
     • Wildcards, “don’t care sequ”: not common
     • Iterations, (!=): different mechanisms

  6. Grammar Formalism (3)
     The experiment: from the NEA notation to JAPE
     • NEA notation: A => B\C/D
     • JAPE: (B)(C) :label (D) --> :label.EntityType = {attr}
     • One formalism's LHS is the other's RHS
     • The same things are handled in different ways
     • Differences in the modules run before NER affect the rules
     STILL: original set in 2 months – SOCIS set in 1 week
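
The mapping between the two rule shapes on this slide can be sketched as a simple string rewrite. This is an illustration under strong assumptions, not the procedure the authors used: it handles only the toy shape `A => B\C/D` with single-symbol constituents, assumes that A names the entity type, that C is the matched span and that B and D are its left and right context, and uses a hypothetical label name `ne`. The function `nea_to_jape` is likewise hypothetical, and real grammars in either formalism are far richer.

```python
import re

# Simplified shape "A => B\C/D"; real rules in either formalism are richer.
NEA_RULE = re.compile(r"^\s*(\w+)\s*=>\s*(\w+)\s*\\\s*(\w+)\s*/\s*(\w+)\s*$")


def nea_to_jape(rule, label="ne"):
    """Rewrite a toy NEA-style rule into a JAPE-style rule string.

    Assumption: A names the entity type, C is the matched span, and
    B and D are its left and right context.
    """
    m = NEA_RULE.match(rule)
    if m is None:
        raise ValueError(f"unsupported rule shape: {rule!r}")
    entity_type, left, body, right = m.groups()
    lhs = f"({left}) ({body}):{label} ({right})"
    # "{attr}" is kept as the slide's placeholder for the annotation features.
    rhs = f":{label}.{entity_type} = {{attr}}"
    return f"{lhs} --> {rhs}"


jape_rule = nea_to_jape(r"A => B\C/D")
```

The entity type moves from the left-hand side of the input rule to the right-hand side of the output rule, which is the "LHS versus RHS" swap the slide mentions. Anything outside the toy shape raises a `ValueError`, a reminder that constructs handled differently in the two formalisms would need manual treatment.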

  7. Application Domain (1)
     Is there a core set of grammar rules that are always domain-independent?
     • General-purpose NER grammars:
       • Developed to serve grammar reuse, but themselves originated from specific applications
       • They separate specific from general information
     • MUSE: automatic resource switches ~ text features
     • HaSIE: company reports on health and safety issues

  8. Application Domain (2)
     From newswire text on Biotechnology to … Crime Scene Police Reports
     The experiment:
     • The gazetteers were enriched with police- and crime-related information
     • All original domain-specific rules were deleted
     • Original results with no modifications to the grammar: close to 90%
     • Only 1 change to the core set, plus the addition of rules

  9. Natural Language (1)
     NER grammar in language (A) + linguistic knowledge of NEs in (B) = NER grammar for (B)?
     Parameters to consider:
     • The relation of A and B (closely related or not): determines the extent of reuse
     • Nature of NEs (formation, syntagmatic relations): unpredictable behaviour and structure; finite set

  10. Natural Language (2)
      The experiment: run the NER grammar for English on Romanian text
      Romanian NEs (compared to English):
      • Rich inflection
      • Flexible word order
      • Different word order (e.g. modifier follows noun)

  11. Natural Language (3)
      Corpus: 1MB of Romanian newspaper texts
      Manual marking of NEs – Romanian NER (3 weeks)
      • 1st experiment: Romanian Gaz + English grammar
        >> Overall results: P = 0.82, R = 0.67
        Low recall even for entity types recognised with high P (e.g. Org: P = 0.75, R = 0.39)
      • 2nd experiment: Romanian Gaz + adapted grammar
        >> Overall results: P = 0.95, R = 0.94
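
For readers less familiar with the P and R figures, the sketch below shows the standard definitions of precision and recall in Python. The counts are hypothetical and chosen only to exercise the functions; they are not data from the experiment. The balanced F-measure is not reported on the slides and is computed here only as a derived convenience from the overall P and R values quoted above.

```python
def precision(correct, predicted):
    """Fraction of predicted entities that are correct."""
    return correct / predicted if predicted else 0.0


def recall(correct, gold):
    """Fraction of gold-standard entities that were found."""
    return correct / gold if gold else 0.0


def f1(p, r):
    """Balanced harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts, chosen only to exercise the functions.
p = precision(correct=30, predicted=40)   # 0.75
r = recall(correct=30, gold=60)           # 0.5

# Balanced F-measure derived from the overall P and R quoted on the slide.
f1_english_grammar = f1(0.82, 0.67)
f1_adapted_grammar = f1(0.95, 0.94)
```

Because the harmonic mean is pulled towards the lower of the two values, the first experiment's low recall weighs on its combined score more than its precision alone would suggest.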

  12. Natural Language (3)

  13. Conclusions
      Reuse of existing NER grammars is time-effective and should be attempted even when the formalisms, applications and languages involved are different.
      Further issues to be addressed:
      • Reuse of NER grammars for spoken NEs
      • Reuse in statistical/ML NER approaches
      • Automating grammar reuse
