
Multi-Criteria-based Active Learning for Named Entity Recognition


Presentation Transcript


  1. Multi-Criteria-based Active Learning for Named Entity Recognition Dan Shen†‡, Jie Zhang†‡, Jian Su†, Guodong Zhou†, Chew-Lim Tan‡ †Institute for Infocomm Research ‡National University of Singapore

  2. Outline • Introduction • Active Learning in NER • Multiple Criteria for Active Learning • Informativeness • Representativeness • Diversity • Active Learning Strategies • Experiments and Results • Conclusion

  3. Motivation • Named Entity Recognition (NER) • most current work: supervised learning • requires a large annotated corpus • MUC-6 / MUC-7 corpus (newswire domain) • GENIA corpus (biomedical domain) • Limitations of supervised NER • corpus annotation: tedious and time-consuming • adaptability: limited • Target of our work • explore active learning for NER • minimize the human annotation effort • without degrading performance

  4. Active Learning Framework • Given • a small labeled data set L • a large unlabeled data set U • Repeat • train a model M on L • use M to test U • select the most useful example from U (the research focus of this work) • have a human expert label it • add the labeled example to L • Until M achieves a certain performance level
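A minimal sketch of this pool-based loop, assuming scikit-learn's SVC as a stand-in for the paper's SVM and using distance to the separating hyperplane as the usefulness score (a preview of the informativeness measure defined later); the oracle callback plays the human expert:

    import numpy as np
    from sklearn.svm import SVC

    def active_learning_loop(X_lab, y_lab, X_pool, oracle, rounds=10):
        # L = (X_lab, y_lab), U = X_pool; oracle plays the human expert.
        model = SVC(kernel="linear").fit(X_lab, y_lab)
        for _ in range(rounds):  # stand-in for "until M is good enough"
            if len(X_pool) == 0:
                break
            # Most useful here = closest to the separating hyperplane.
            i = int(np.argmin(np.abs(model.decision_function(X_pool))))
            X_lab = np.vstack([X_lab, X_pool[i]])           # add example to L
            y_lab = np.append(y_lab, oracle(X_pool[i]))     # expert labels it
            X_pool = np.delete(X_pool, i, axis=0)           # remove from U
            model = SVC(kernel="linear").fit(X_lab, y_lab)  # retrain M on L
        return model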

  5. Related Work • Active Learning Criteria • most current work: informativeness only • a few works consider representativeness, e.g. [McCallum and Nigam 1998] and [Tang et al. 2002] • one work considers diversity, e.g. [Brinker 2003] • NO work has explored multiple criteria in active learning • Active Learning in NLP • e.g. POS tagging / text classification / statistical parsing • NO work has explored active learning in NER

  6. Outline • Introduction • Active Learning in NER • Multiple Criteria for Active Learning • Informativeness • Representativeness • Diversity • Active Learning Strategies • Experiments and Results • Conclusion

  7. Active Learning in NER • SVM-based NER system • recognizes one class of NEs at a time • features differ from supervised learning: nothing produced statistically from the training data set, and no gazetteers or dictionaries • Active Learning in the NER system • select the most useful batch of examples • an example is a word sequence, e.g. a named entity and its context • measurements must be lifted from words to NEs, since only a word-based score is available from the SVM

  8. Outline • Introduction • Active Learning in NER • Multiple Criteria for Active Learning • Informativeness • Representativeness • Diversity • Active Learning Strategies • Experiments and Results • Conclusion

  9. 1. Informativeness Criterion • Most informative example: the one the existing model is most uncertain about • Most previous work is based only on this criterion

  10. Informativeness Measurement • Informativeness for a word • how likely it is to change / induce support vectors in the SVM • measured by the distance of the word's feature vector to the separating hyperplane • Informativeness for an NE • an NE is a sequence of words: NE = w1w2…wN, where wi is the ith word of the NE • word scores are combined by heuristic scoring functions (examples below)
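The slide's example formulas were images and did not survive extraction; the sketch below reconstructs the word-level distance from the standard SVM decision function and shows two plausible word-to-NE aggregations, averaging and taking the minimum, which follow the paper's style but are assumptions here:

    import numpy as np

    def word_distance(alphas, ys, support_vecs, b, w, kernel):
        # Dist(w) = | sum_i alpha_i * y_i * k(s_i, w) + b |; a small
        # distance means the SVM is uncertain about w, i.e. informative.
        return abs(sum(a * y * kernel(s, w)
                       for a, y, s in zip(alphas, ys, support_vecs)) + b)

    def info_avg(word_dists):
        # Average informativeness over the NE's words (assumed variant).
        return float(np.mean([1.0 - d for d in word_dists]))

    def info_min(word_dists):
        # Informativeness of the single most uncertain word (assumed variant).
        return 1.0 - min(word_dists)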

  11. 2. Representativeness Criterion • Most representative example: the one that represents the most other examples • Only a few works [McCallum and Nigam 1998; Tang et al. 2002] consider this criterion

  12. Similarity Measurement • Similarity between words • cosine-similarity measurement, adapted to the SVM • Similarity between NEs • alignment of two word sequences via the Dynamic Time Warping (DTW) algorithm • given a point-by-point distance, find an optimal path that minimizes the accumulated distance along it
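A sketch of the NE-to-NE similarity, assuming plain cosine similarity between word vectors (the paper adapts the similarity to the SVM kernel, which is omitted here) and a rough length normalization of the DTW cost:

    import numpy as np

    def cos_sim(u, v):
        # Cosine similarity; assumes nonzero feature vectors.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def dtw_similarity(ne1, ne2):
        # ne1, ne2: lists of word feature vectors.
        n, m = len(ne1), len(ne2)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = 1.0 - cos_sim(ne1[i - 1], ne2[j - 1])  # point distance
                # Optimal warping path: minimize accumulated distance.
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        # Rough normalization by sequence lengths, mapped back to similarity.
        return 1.0 - D[n, m] / (n + m)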

  13. Representativeness Measurement • Representativeness of NEi in NESet = {NE1, …, NEi, …, NEN} • quantified by its density: the average similarity between NEi and all other NEj (j ≠ i) in NESet • Most representative NE: the one with the largest density among all NEs in NESet, i.e. the centroid of NESet
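In code, building on the dtw_similarity sketch above and assuming NESet holds at least two NEs:

    def density(i, ne_set):
        # Average similarity between NE_i and every other NE_j (j != i).
        others = [ne for j, ne in enumerate(ne_set) if j != i]
        return sum(dtw_similarity(ne_set[i], ne) for ne in others) / len(others)

    def most_representative(ne_set):
        # The centroid of NESet: the NE with the largest density.
        return max(range(len(ne_set)), key=lambda i: density(i, ne_set))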

  14. 3. Diversity Criterion • Maximize the training utility of a batch: its members should differ as much as possible from each other • Only one work [Brinker 2003] considered this criterion

  15. Global Consideration • Consider the examples over the whole sample space • K-means clustering • cluster all named entities in NESet • assumption: the examples within one cluster are quite similar to each other • select examples from different clusters in each round • time-consuming: for efficiency, filter out NEs before clustering
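A sketch of the global strategy, with one simplifying assumption: each NE is reduced to the mean of its word vectors so that scikit-learn's K-means can be applied, whereas the paper clusters with the NE-level similarity directly:

    import numpy as np
    from sklearn.cluster import KMeans

    def select_diverse(ne_set, k):
        # Reduce each NE (a list of word vectors) to its mean vector.
        reps = np.array([np.mean(ne, axis=0) for ne in ne_set])
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(reps)
        batch = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            # Take the member closest to its cluster mean as the delegate.
            centroid = reps[members].mean(axis=0)
            batch.append(min(members,
                             key=lambda i: np.linalg.norm(reps[i] - centroid)))
        return batch  # indices into ne_set, one per cluster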

  16. Local Consideration • Consider the examples within a batch • For each example candidate • compare it one by one with all examples previously selected into the batch • add it to the batch only if its similarity to every one of them is below a threshold • More efficient than clustering!
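A sketch of this greedy filter; candidates are assumed to arrive sorted by the selection score, and the threshold and similarity function are parameters:

    def build_batch(candidates, batch_size, threshold, sim=dtw_similarity):
        # Greedy local diversity: accept a candidate only if it is not too
        # similar to any example already selected into the batch.
        batch = []
        for cand in candidates:
            if all(sim(cand, chosen) < threshold for chosen in batch):
                batch.append(cand)
            if len(batch) == batch_size:
                break
        return batch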

  17. Outline • Introduction • Active Learning in NER • Multiple Criteria for Active Learning • Informativeness • Representativeness • Diversity • Active Learning Strategies • Experiments and Results • Conclusion

  18. Strategy 1 • Unlabeled data set → select the M most informative examples (informativeness criterion) → intermediate set • cluster the intermediate set into K clusters (diversity criterion) • select the centroid of each cluster (representativeness criterion) → batch
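Putting the pieces together, a sketch of Strategy 1; informativeness stands for an NE-level scoring function such as the assumed variants under slide 10, and select_diverse is the clustering step sketched under slide 15:

    def strategy1(unlabeled, informativeness, m, k):
        # Step 1: the M most informative examples form the intermediate set.
        intermediate = sorted(unlabeled, key=informativeness, reverse=True)[:m]
        # Steps 2-3: K clusters for diversity, one centroid per cluster.
        return [intermediate[i] for i in select_diverse(intermediate, k)]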

  19. Strategy 2 • Unlabeled data set → repeatedly pick the example candidate with the maximum combined score λ·Info + (1−λ)·Rep (informativeness & representativeness criteria) • compare the candidate with each example already in the batch • IF any of the similarity values > threshold THEN reject it ELSE add it to the batch (diversity criterion)
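A sketch of Strategy 2, reusing the greedy diversity filter from slide 16; info and rep are assumed to be normalized to comparable ranges, and lam is the interpolation weight λ:

    def strategy2(unlabeled, info, rep, lam, batch_size, threshold,
                  sim=dtw_similarity):
        # Rank by the combined score lambda*Info + (1 - lambda)*Rep ...
        ranked = sorted(unlabeled,
                        key=lambda x: lam * info(x) + (1 - lam) * rep(x),
                        reverse=True)
        # ... then enforce diversity with the greedy filter from slide 16.
        return build_batch(ranked, batch_size, threshold, sim)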

  20. Outline • Introduction • Active Learning in NER • Multiple Criteria for Active Learning • Informativeness • Representativeness • Diversity • Active Learning Strategies • Experiments and Results • Conclusion

  21. Experimental Setting 1 • Corpus • MUC-6 corpus: to recognize Person, Location and Organization • GENIA corpus V1.1: to recognize Protein • Corpus split: initial training data set / test data set / unlabeled data set (sizes on the backup slide) • Batch size K: 50 in the biomedical domain, 10 in the newswire domain

  22. Experimental Setting 2 • Supervised learning baseline: trained on the entire annotated corpus • newswire: 408 WSJ articles • biomedical: 590 MEDLINE abstracts • Random selection baseline: a batch of examples is randomly selected in each round • Evaluation metric: F-measure

  23. Experimental Results 1 • Effectiveness of Single-Criterion-based Active Learning • [learning-curve figure] to reach the performance of supervised learning (223K words), informativeness-based selection needs 52K words: 62% of the 83K needed by random selection, 23% of the supervised corpus

  24. Experimental Results 2 • Overall Results of Multi-Criteria-based Active Learning • [results table; figures recoverable from the slide:] • MUC-6, per NE class: Avg(3.5K/11.5K, 2.1K/13.6K, 7.8K/20.2K) = 28% of the words needed by random selection, and Avg(3.5K/157K, 2.1K/157K, 7.8K/157K) = 5% of the supervised corpus • GENIA: 31K/83K = 37% of random selection, 31K/223K = 14% of supervised training • Strategy 1 needs 9K more words than Strategy 2

  25. Experimental Results 3 • Effectiveness of Multi-Criteria-based Active Learning • [learning-curve figure] to reach the supervised performance (223K words), Strategy 2 needs 31K words (60% of the 52K needed by Info_based) and Strategy 1 needs 40K (77%)

  26. Outline • Introduction • Active Learning in NER • Multiple Criteria for Active Learning • Informativeness • Representativeness • Diversity • Active Learning Strategies • Experiments and Results • Conclusion

  27. Conclusion • The first work to combine multiple criteria in active learning • informativeness, representativeness & diversity • two active learning strategies that outperform the single-criterion-based method (only 60% of its annotation cost) • The first work to explore active learning in NER • measurements for the multiple criteria • greatly reduces annotation cost • The measurements and strategies are general and can be easily adapted to other NLP tasks

  28. Future Work • How to automatically decide the optimal values of the parameters? • batch size K • linear interpolation parameter λ • When to stop the active learning process? • e.g. by monitoring the change of support vectors

  29. Acknowledgement • Thanks to the International Post-Graduate College (IGK) at Saarland University for the generous travel support. The first author is now a Ph.D. student in this program.

  30. The End • Thank You!

  31. Corpus Split • [data-set size table not recovered from the transcript]
