
Exploiting Domain Structure for Named Entity Recognition





Presentation Transcript


  1. Exploiting Domain Structure for Named Entity Recognition Jing Jiang & ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign

  2. Named Entity Recognition • A fundamental task in IE • An important and challenging task in biomedical text mining • Critical for relation mining • Great variation and different gene naming conventions

  3. Need for domain adaptation • Performance degrades when test domain differs from training domain • Domain overfitting

  4. Existing work • Supervised learning • HMM, MEMM, CRF, SVM, etc. (e.g., [Zhou & Su 02], [Bender et al. 03], [McCallum & Li 03] ) • Semi-supervised learning • Co-training [Collins & Singer 99] • Domain adaptation • External dictionary [Ciaramita & Altun 05] • Not seriously studied

  5. Outline • Observations • Method • Generalizability-based feature ranking • Rank-based prior • Experiments • Conclusions and future work

  6. Observation I • Overemphasis on domain-specific features in the trained model • Example: the feature "suffix -less" is weighted high in a model trained on fly data (wingless, daughterless, eyeless, apexless, …) • Useful for other organisms? In general, no • May cause generalizable features to be downweighted

  7. Observation II • Generalizable features: features that generalize well in all domains • "…decapentaplegic and wingless are expressed in analogous patterns in each primordium of…" (fly) • "…that CD38 is expressed by both neurons and glial cells… that PABPC5 is expressed in fetal brain and in a range of adult tissues." (mouse)

  8. Observation II (cont.) • In both the fly and mouse examples on the previous slide, the context feature "wi+2 = expressed" (the word two positions to the right is "expressed") fires on gene names, so it is generalizable across domains

  9. Generalizability-based feature ranking • Rank the features separately within each training domain (fly, mouse, D3, …, Dm) • Combine the per-domain ranks into a single generalizability score s(f) for each feature f, and re-rank all features by s • Example from the slide: s("expressed") = 1/6 ≈ 0.167 ends up above s("-less") = 1/8 = 0.125, because "expressed" ranks consistently high across domains while "-less" does not

  10. Feature ranking & learning • Rank the full feature set F by generalizability ("expressed" near the top, "-less" near the bottom) • Keep the top k features • Train a supervised learning algorithm on the labeled training data, restricted to the selected features, to obtain the trained classifier
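The hard top-k cutoff in this pipeline can be sketched as follows (helper names and the toy data are hypothetical; any supervised learner would then be trained on the projected examples):

```python
def select_top_k(ranked_features, k):
    """Hard feature selection: keep only the k best-ranked features."""
    return set(ranked_features[:k])

def project(example, keep):
    """Drop the features of one training example that were not selected."""
    return {f: v for f, v in example.items() if f in keep}

ranked = ["expressed", "cells", "in", "-less"]   # best first
keep = select_top_k(ranked, 2)
x = {"expressed": 1, "-less": 1}
projected = project(x, keep)   # {"expressed": 1} -- "-less" is discarded
```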

  11. Feature ranking & learning • With a hard top-k cutoff, domain-specific features such as "-less" fall below the threshold and are discarded entirely, even if they carry some signal

  12. Feature ranking & learning • Alternative to a hard cutoff: convert the ranking of F into rank-based prior variances in a Gaussian prior • Use this prior when training a logistic regression model (MEMM) on the labeled training data to obtain the trained classifier

  13. Prior variances • Logistic regression model: p(y | x) ∝ exp(Σj λj fj(x, y)) • MAP parameter estimation with a Gaussian prior on the parameters: λj ~ N(0, σj²) • The variance σj² is a function of the feature's rank rj
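Written out, the MAP objective is the data log-likelihood minus a per-parameter Gaussian penalty. A hedged sketch of that penalty (the numbers are illustrative, not from the paper):

```python
def map_objective(log_likelihood, lambdas, variances):
    """MAP objective for logistic regression with a per-parameter
    Gaussian prior lambda_j ~ N(0, sigma_j^2):

        log p(data | lambda) - sum_j lambda_j^2 / (2 * sigma_j^2)
    """
    penalty = sum(l * l / (2.0 * s2) for l, s2 in zip(lambdas, variances))
    return log_likelihood - penalty

# Same weight, different variances: the small-variance (low-ranked)
# parameter is penalized far more heavily than the large-variance one.
obj = map_objective(0.0, [2.0, 2.0], [4.0, 0.25])
# penalties: 4/(2*4) = 0.5 and 4/(2*0.25) = 8.0, so obj = -8.5
```

Setting σj² per feature is what lets the rank-based prior shrink domain-specific weights toward zero while leaving generalizable weights nearly unconstrained.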

  14. Rank-based prior variance σ² • σ² decreases with rank r (r = 1, 2, 3, …) • Important (top-ranked) features → large σ², leaving their weights relatively unconstrained • Non-important features → small σ², shrinking their weights toward zero

  15. Rank-based prior variance σ² • The curve σ²(r) has two parameters, a and b, both set empirically • The slide plots the curve for b = 2, b = 4, and b = 6
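The slides give only the shape of σ²(r), not its formula. One illustrative family with that shape (an assumption for illustration, not the paper's actual definition) is an exponential decay where a sets the scale and b the decay length:

```python
import math

def rank_based_variance(r, a=1.0, b=4.0):
    """Illustrative variance schedule: large sigma^2 for top-ranked
    features, decaying toward 0 for low-ranked ones. a sets the scale;
    b controls how slowly the variance falls off with rank r.
    (The exact functional form is an assumption, not the paper's.)
    """
    return a * math.exp(-(r - 1) / b)

top = rank_based_variance(1)     # a = 1.0: top feature barely constrained
low = rank_based_variance(20)    # near 0: weight shrunk hard toward zero
```

Any monotonically decreasing σ²(r) with a tunable scale and decay would play the same role in the prior.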

  16. Summary • Learning: compute an individual-domain feature ranking O1, …, Om for each training domain D1, …, Dm • Combine O1, …, Om into the generalizability-based feature ranking O' • Use O' to build the rank-based prior and train the entity tagger • Setting b: find the optimal bi for each training domain Di, then combine them as b = λ1b1 + … + λmbm, where the weights λ1, …, λm reflect the similarity between the test data E and each training domain • Testing: apply the trained tagger to the test data

  17. Experiments • Data set • BioCreative Challenge Task 1B • Gene/protein recognition • 3 organisms/domains: fly, mouse and yeast • Experimental setup • 2 organisms for training, 1 for testing • Baseline: uniform-variance Gaussian prior • Compared with 3 regular feature ranking methods: frequency, information gain, chi-square

  18. Comparison with baseline

  19. Comparison with regular feature ranking methods [Chart comparing generalizability-based feature ranking against feature frequency, information gain, and chi-square]

  20. Conclusions and future work • We proposed • a generalizability-based feature ranking method • rank-based prior variances • Experiments show • the domain-aware method outperformed the baseline method • generalizability-based feature ranking is better than regular feature ranking • Future work: exploit the unlabeled test data

  21. The end Thank you!

  22. How to set a and b? • a is empirically set to a constant • b is set by cross-validation: let bi be the optimal b for training domain Di • b is then a weighted average of the bi's, where the weights indicate the similarity between the test domain and each training domain
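The weighted average described above can be sketched as follows (the helper name and the proportional-to-similarity weighting are hypothetical; the slide only says the weights reflect domain similarity):

```python
def combine_b(optimal_bs, similarities):
    """b = sum_i lambda_i * b_i, with lambda_i proportional to the
    similarity between the test domain and training domain D_i."""
    total = float(sum(similarities))
    return sum((s / total) * b for s, b in zip(similarities, optimal_bs))

# Two training domains; the test domain is 3x more similar to the first.
b = combine_b([2.0, 6.0], [3.0, 1.0])   # 0.75*2 + 0.25*6 = 3.0
```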
