
Active Learning in Text Retrieval


Presentation Transcript


  1. Active Learning in Text Retrieval

  2. Introduction • Passive learning vs. active learning • Active learning: intelligently choose good questions so as to reach high performance with as few labeled examples as possible

  3. Learning conjunctions • Protocol I: the teacher proposes questions to the learner • Protocol II: the learner randomly chooses questions • Protocol III: the learner proposes questions to the teacher

  4. Active Learning in the HARD Track • Treat text retrieval as a classification problem • The HARD track permits participants to ask the user several questions (via a Clarification Form) • Research problem: what kind of questions should be asked?

  5. Baseline: Random Sampling • Randomly choose unlabeled samples • Incorporate the newly labeled examples and retrain the classifier • Not efficient! There are already clues for choosing which unlabeled example(s) to ask about
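The baseline above can be sketched in a few lines. This is a minimal, hypothetical illustration (not code from the slides): pool indices stand in for documents, and `random_sample` is an assumed helper name.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pool: documents 0-4 are already labeled, 5-99 are not.
labeled = list(range(5))
unlabeled = list(range(5, 100))

def random_sample(unlabeled, b, rng):
    """Baseline strategy: pick b pool indices uniformly at random."""
    picks = rng.choice(len(unlabeled), size=b, replace=False)
    return [unlabeled[i] for i in picks]

batch = random_sample(unlabeled, b=3, rng=rng)
labeled.extend(batch)                          # send to the teacher, then retrain
unlabeled = [i for i in unlabeled if i not in batch]
```

The point of the slide is that this ignores everything the current classifier already knows; the strategies on the following slides use that information.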

  6. Relevance Feedback • A kind of active learning • Let the user label the top-ranked retrieved results • Is it optimal?

  7. Uncertainty Sampling [SIGIR 1994] • Create an initial classifier • While the teacher is willing to label examples: • Apply the current classifier to each unlabeled example • Find the b examples for which the classifier is least certain of class membership • Have the teacher label the subsample of b examples • Train a new classifier on all labeled examples
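The loop above, combined with the logistic-regression classifier of the next slide, can be sketched as follows. This is a toy numpy version under stated assumptions: 2-D Gaussian blobs stand in for document vectors, `train_logistic` and `least_certain` are hypothetical helper names, and "least certain" means P(C|x) closest to 0.5.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, lr=0.5, steps=200):
    """Fit P(C|x) = sigmoid(w.x) by plain gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        w -= lr * X.T @ (p - y) / len(y)
    return w

def least_certain(w, X_pool, b):
    """Indices of the b pool examples whose P(C|x) is closest to 0.5."""
    p = sigmoid(X_pool @ w)
    return np.argsort(np.abs(p - 0.5))[:b]

# Toy data: two Gaussian blobs as a stand-in for labeled documents.
X_lab = np.vstack([rng.normal(-1, 1, (10, 2)), rng.normal(1, 1, (10, 2))])
y_lab = np.array([0] * 10 + [1] * 10)
X_pool = rng.normal(0, 1.5, (50, 2))         # unlabeled pool

w = train_logistic(X_lab, y_lab)
query = least_certain(w, X_pool, b=5)        # ask the teacher to label these
```

After the teacher labels the queried batch, the classifier is retrained on the enlarged labeled set and the loop repeats.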

  8. A Probabilistic Text Classifier • Logistic regression to estimate P(C|x)

  9. Comment • Choosing the most uncertain unlabeled example (reducing the version space?) vs. choosing examples that minimize expected future error • Several samples at a time vs. one sample at a time • Incremental training: a computational issue • A sequential process vs. a two-pass process (HARD)

  10. Query By Committee (QBC) • QBC [COLT 1992, NIPS 1992] • Generate a committee of classifiers; the next query is chosen by the principle of maximal disagreement among these classifiers [COLT 1992] • The effect of training on a set of examples can be achieved at the cost of drawing the corresponding unlabeled examples and labeling only a logarithmic fraction of them
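"Maximal disagreement" needs a concrete measure; a common choice (one option, not necessarily the one used in the cited papers) is the entropy of the committee's vote distribution. A minimal sketch with a hypothetical committee of five binary classifiers:

```python
import numpy as np

def vote_entropy(votes):
    """Disagreement of a committee: entropy of the label vote distribution.

    votes: (n_members, n_examples) array of 0/1 predictions.
    Returns one entropy value (in bits) per pool example.
    """
    p1 = votes.mean(axis=0)                 # fraction voting for class 1
    p = np.stack([1 - p1, p1])
    with np.errstate(divide="ignore", invalid="ignore"):
        h = -np.where(p > 0, p * np.log2(p), 0.0).sum(axis=0)
    return h

# Hypothetical committee of 5 classifiers voting on 4 pool examples.
votes = np.array([
    [0, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
    [0, 1, 1, 1],
])
h = vote_entropy(votes)
query = int(np.argmax(h))   # the example the committee disagrees on most
```

Examples on which the committee is unanimous (columns 0 and 3) get entropy 0 and are never queried; the 3-vs-2 split on column 2 makes it the most informative query.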

  11. Active Learning with Statistical Models • Cohn et al. [JAIR 1996] • Provides a statistically optimal solution: select the training example that, once labeled and added to the training data, is expected to result in the lowest error on future test examples • This optimal solution cannot be found efficiently

  12. Sampling Estimation • Roy and McCallum [ICML 2001] • Use sampling-based estimation of error reduction to approximate optimal active learning
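The idea can be sketched concretely: for each candidate example and each possible label, retrain with that labeled candidate added and estimate the resulting error over the pool, weighting by the current model's P(y|x); query the candidate with the lowest expected future error. This is a toy numpy sketch under stated assumptions (logistic classifier, expected 0/1 loss self-estimated from the model; `fit` and `select_by_error_reduction` are hypothetical names), not the cited paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X, y, lr=0.5, steps=100):
    """Fit P(C|x) = sigmoid(w.x) by gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def expected_error(w, X_pool):
    """Expected 0/1 loss over the pool, estimated from the model itself."""
    p = sigmoid(X_pool @ w)
    return np.minimum(p, 1 - p).mean()

def select_by_error_reduction(X_lab, y_lab, X_pool):
    """For each candidate x and each label y, retrain with (x, y) added and
    average the resulting pool error, weighted by the current P(y|x)."""
    w = fit(X_lab, y_lab)
    p = sigmoid(X_pool @ w)
    scores = []
    for i, x in enumerate(X_pool):
        err = 0.0
        for y, py in ((0, 1 - p[i]), (1, p[i])):
            w2 = fit(np.vstack([X_lab, x]), np.append(y_lab, y))
            err += py * expected_error(w2, X_pool)
        scores.append(err)
    return int(np.argmin(scores))

X_lab = np.vstack([rng.normal(-1, 0.5, (5, 2)), rng.normal(1, 0.5, (5, 2))])
y_lab = np.array([0] * 5 + [1] * 5)
X_pool = rng.normal(0, 1.5, (20, 2))
best = select_by_error_reduction(X_lab, y_lab, X_pool)  # index to label next
```

The cost is one retraining per (candidate, label) pair, which is why the paper resorts to sampling and incremental updates rather than the exhaustive loop shown here.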

  13. Active Learning Framework for CBIR • Zhang [IEEE Transactions on Multimedia 2002] • CBIR: a weighted sum of semantic distance and low-level feature distance; the semantic features are annotated attributes • The examples to be annotated are those the system is most uncertain about • Biased kernel regression, with entropy as the uncertainty measure
