
Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition


Presentation Transcript


  1. Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition Authors: Andrew Borthwick, John Sterling, Eugene Agichtein, Ralph Grishman Speaker: Shasha Liao

  2. Content • Named Entity Recognition (NER) • Maximum Entropy (ME) • System Architecture • Results • Conclusions

  3. Named Entity Recognition (NER) • Given a tokenization of a test corpus and a set of n (n = 7) name categories, NER is the problem of assigning one of 4n + 1 tags to each token • Per-category tag states: x_begin, x_continue, x_end, x_unique (plus a single "other" tag) • MUC-7 categories: • Proper names (people, organizations, locations) • Expressions of time • Quantities • Monetary values • Percentages
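
As a concrete illustration of the 4n + 1 tag space, here is a minimal Python sketch; the category and state names follow the slide, but the exact identifiers are assumptions:

```python
# Build the 4n+1 = 29-tag set: four positional states per name category,
# plus a single catch-all "other" tag. Identifiers are illustrative.
categories = ["person", "organization", "location",
              "time", "quantity", "money", "percent"]  # n = 7
states = ["begin", "continue", "end", "unique"]

tags = [f"{c}_{s}" for c in categories for s in states] + ["other"]
assert len(tags) == 4 * len(categories) + 1  # 29 tags in total
```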

  4. Named Entity Recognition (NER) • Example: "Jim bought 300 shares of Acme Corp. in 2006." • Token-by-token tags: Jim/per_unique bought/other 300/qua_unique shares/other of/other Acme/org_begin Corp./org_end in/other 2006/time_unique ./other

  5. Maximum Entropy (ME) • A statistical modeling technique • Estimates a probability distribution from partial knowledge • Principle: the correct probability distribution is the one that maximizes entropy (uncertainty) subject to what is known
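
The principle can be stated compactly; this standard formulation is a reconstruction, since the slide's own equations did not survive the transcript:

```latex
p^{*} = \arg\max_{p} H(p),
\quad
H(p) = -\sum_{h,f} \tilde{p}(h)\, p(f \mid h) \log p(f \mid h),
\quad
\text{s.t. } E_{p}[g_i] = E_{\tilde{p}}[g_i] \;\; \forall i .
```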

  6. Maximum Entropy (ME) --- Build the ME Model
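
The model equation itself is not in the transcript; in the standard ME framework, the constrained problem above yields a log-linear model over binary features g_i(h, f), with one weight α_i per feature:

```latex
P(f \mid h) \;=\; \frac{\prod_{i} \alpha_i^{\,g_i(h,f)}}{Z_{\alpha}(h)},
\qquad
Z_{\alpha}(h) \;=\; \sum_{f'} \prod_{i} \alpha_i^{\,g_i(h,f')} .
```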

  7. Maximum Entropy (ME) --- Initialize Features

  8. Maximum Entropy (ME) --- ME Estimation
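
Estimation chooses the weights so that each feature is as frequent under the model as in the training data; written out (a standard reconstruction, not copied from the slide):

```latex
E_{p}[g_i]
= \sum_{h,f} \tilde{p}(h)\, p(f \mid h)\, g_i(h,f)
\;\stackrel{!}{=}\;
\sum_{h,f} \tilde{p}(h,f)\, g_i(h,f)
= E_{\tilde{p}}[g_i] .
```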

  9. Maximum Entropy (ME) --- Generalized Iterative Scaling
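
The GIS update, in its standard form: with α_i initialized to 1 and C the maximum total feature count over any (h, f) pair (a correction feature is normally added so the sum is exactly C everywhere), each iteration rescales every weight by the ratio of its empirical to its model expectation:

```latex
\alpha_i^{(t+1)} \;=\; \alpha_i^{(t)}
\left( \frac{E_{\tilde{p}}[g_i]}{E_{p^{(t)}}[g_i]} \right)^{1/C} .
```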

  10. System Architecture --- Features (1) • Feature set • Binary: similar to BBN's Nymble/IdentiFinder system • Lexical: all tokens with a count of 3 or more • Section: date, preamble, text, … • Dictionary: name lists • External system: the futures predicted by other systems become history views • Compound: combinations of the above, e.g., external-system output paired with the section feature (see the sketch below)
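
A minimal Python sketch of how such binary features can be constructed; the helper and the view names (token0, section, proteus_tag) are hypothetical, not the paper's code:

```python
# A MENE-style binary feature g_i(h, f) fires (returns 1) only when a
# specific "history view" takes a specific value AND the candidate future
# tag matches. All identifiers here are illustrative assumptions.

def make_feature(view, value, future):
    """Build g(h, f) = 1 iff history[view] == value and f == future."""
    def g(history, f):
        return 1 if history.get(view) == value and f == future else 0
    return g

# Example instantiations mirroring the slide's feature classes:
lexical  = make_feature("token0", "Mr.", "person_start")        # lexical view
section  = make_feature("section", "preamble", "other")         # section view
external = make_feature("proteus_tag", "person", "per_unique")  # external system

history = {"token0": "Mr.", "section": "text", "proteus_tag": "person"}
print(lexical(history, "person_start"), section(history, "other"))  # 1 0
```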

  11. System Architecture --- Features (2) • Feature selection (features excluded from the model): • Features which activate on the default value of a history view (in ~99% of cases a token is not a name) • Lexical features which predict the future "other" fewer than 6 times (a stricter threshold than the usual 3) • Features which predict "other" at positions token−2 and token+2

  12. System Architecture --- Decoding and Viterbi Search • Viterbi search: dynamic programming • Finds the highest-probability legal path through the lattice of conditional probabilities • Example: "Mike England" • person_start (0.66) followed by gpe_unique (0.6): P(gpe_unique | person_start) = 0, so this path is illegal • person_start (0.66) followed by person_end: P(person_end | person_start) = 0.7, a legal path (see the sketch below)
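
A minimal Python sketch of the search, run on the slide's "Mike England" numbers; the function and tag names are assumptions, and an illegal transition is simply a conditional probability of 0:

```python
import math

def viterbi(n, tags, prob):
    """Highest-probability legal path through the tag lattice.

    prob(i, prev, cur) -> P(tag=cur at position i | previous tag = prev),
    as produced by the ME model; illegal transitions return 0.
    """
    def logp(p):
        return math.log(p) if p > 0 else -math.inf

    # best[t] = (log-probability, path) of the best path ending in tag t
    best = {t: (logp(prob(0, None, t)), [t]) for t in tags}
    for i in range(1, n):
        best = {
            cur: max(((s + logp(prob(i, prev, cur)), path + [cur])
                      for prev, (s, path) in best.items()),
                     key=lambda x: x[0])
            for cur in tags
        }
    return max(best.values(), key=lambda x: x[0])[1]

tags = ["person_start", "person_end", "gpe_unique", "other"]

def prob(i, prev, cur):
    if i == 0:  # "Mike"
        return {"person_start": 0.66, "other": 0.34}.get(cur, 0.0)
    if prev == "person_start":  # "England" after person_start
        return {"person_end": 0.7, "other": 0.3}.get(cur, 0.0)  # gpe_unique illegal
    return {"gpe_unique": 0.6, "other": 0.4}.get(cur, 0.0)

print(viterbi(2, tags, prob))  # ['person_start', 'person_end']
```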

  13. Results (1)

  14. Results (2) • Probable reasons: • Dynamic updating of the vocabulary during decoding (reference resolution; e.g., once "Andrew Borthwick" is recognized as a person, later mentions of the name can be tagged consistently) • Binary model vs. multi-class model

  15. Conclusion • Future work: • Incorporate long-range reference resolution • Use general compound features • Use acronyms • Advantages of MENE: • Can incorporate information from previous tokens • Features can overlap • Highly portable • Easy to combine with other systems
