Accelerating Corpus Annotation through Active Learning
Presentation Transcript

Accelerating Corpus Annotation through Active Learning

Peter McClanahan

Eric Ringger, Robbie Haertel, George Busby, Marc Carmen, James Carroll, Kevin Seppi, Deryle Lonsdale (PROJECT ALFA)

March 15, 2008 – AACL


Our Situation

  • Operating on a fixed budget

  • Need morphologically annotated text

  • For a significant part of the project, we can only afford to annotate a subset of the text

  • Where should we focus manual annotation effort to produce a complete annotation of the highest quality?


Improving on Manual Tagging

  • Can we reduce the amount of human effort while maintaining high levels of accuracy?

  • Utilize probabilistic models for computer-aided tagging

  • Focus: Active Learning for Sequence Labeling


Outline

  • Statistical Part-of-Speech Tagging

  • Active Learning

    • Query by Uncertainty

    • Query by Committee

  • Cost Model

  • Experimental Results

  • Conclusions


Probabilistic POS Tagging

Task: predict a tag for a word in a given context

Actually: predict a probability distribution over possible tags

Example: <s> Perhaps the biggest hurdle

  • Perhaps:  RB .87,   RBS .05,  ...,  FW 6E-7
  • the:      DT .95,   NN .003,  ...,  FW 1E-8
  • biggest:  JJS .67,  JJ .17,   ...,  UH 3E-7
  • hurdle:   NN .85,   VV .13,   ...,  CD 2E-7


Probabilistic POS Tagging

  • Train a model to perform precisely this task:

    • Estimate a distribution over tags for a word in a given context.

  • Probabilistic POS Tagging is a fairly easy computational task

    • On the Penn Treebank (Wall Street Journal text)

      • No context

      • Choosing the most frequent tag for a given word (see the sketch after this list)

      • 92.45% accuracy

    • There are much smarter things to do

      • Maximum Entropy Markov Models
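To make the no-context baseline concrete, here is a minimal sketch (in Python, not the ALFA code) of a most-frequent-tag tagger; trained on the full Penn Treebank, this is the kind of model behind the roughly 92% figure quoted above. The toy training data is invented for illustration.

```python
from collections import Counter, defaultdict

def train_most_frequent_tag(tagged_sentences):
    """Count (word, tag) pairs and keep the most frequent tag per word."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}

def tag_most_frequent(model, words, default_tag="NN"):
    """Tag each word independently, ignoring context; unseen words get a default tag."""
    return [model.get(word, default_tag) for word in words]

# Toy example (invented data, not the Penn Treebank):
train = [[("the", "DT"), ("biggest", "JJS"), ("hurdle", "NN")],
         [("the", "DT"), ("hurdle", "NN"), ("remains", "VBZ")]]
model = train_most_frequent_tag(train)
print(tag_most_frequent(model, ["Perhaps", "the", "biggest", "hurdle"]))
# -> ['NN', 'DT', 'JJS', 'NN']  ("Perhaps" is unseen, so it falls back to NN)
```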




Sequence Labeling

  • Instead of predicting one tag, we must predict an entire sequence

  • The best tag for each word might give an unlikely sequence

  • We use the Viterbi algorithm, a dynamic-programming approach that finds the best sequence.
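A minimal sketch of the Viterbi dynamic program for a first-order tag model. The talk's tagger is an MEMM rather than the simple model assumed here, and the probabilities below are invented, but the search over tag sequences works the same way.

```python
import math

def viterbi(words, tags, log_start, log_trans, log_emit):
    """Return the highest-scoring tag sequence under a first-order model.
    log_start[tag], log_trans[(prev_tag, tag)], log_emit[(tag, word)] are log-probs;
    missing entries are treated as (near) impossible."""
    MISSING = -1e9
    chart = [{t: (log_start.get(t, MISSING) + log_emit.get((t, words[0]), MISSING), None)
              for t in tags}]
    for word in words[1:]:
        prev, layer = chart[-1], {}
        for t in tags:
            best_p = max(tags, key=lambda p: prev[p][0] + log_trans.get((p, t), MISSING))
            score = (prev[best_p][0] + log_trans.get((best_p, t), MISSING)
                     + log_emit.get((t, word), MISSING))
            layer[t] = (score, best_p)
        chart.append(layer)
    # Backtrace from the best final tag.
    best = max(tags, key=lambda t: chart[-1][t][0])
    path = [best]
    for layer in reversed(chart[1:]):
        path.append(layer[path[-1]][1])
    return list(reversed(path))

# Tiny invented example:
print(viterbi(["the", "hurdle"], ["DT", "NN"],
              log_start={"DT": math.log(0.7), "NN": math.log(0.3)},
              log_trans={("DT", "NN"): math.log(0.9), ("NN", "DT"): math.log(0.5)},
              log_emit={("DT", "the"): math.log(0.9), ("NN", "hurdle"): math.log(0.8)}))
# -> ['DT', 'NN']
```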


Features for POS Tagging

  • Combination of lexical, orthographic, contextual, and frequency-based information.

    • Following Toutanova & Manning (2000) approximately

  • For each word:

    • the textual form of the word itself

    • the POS tags of the preceding two words

    • all prefix and suffix substrings up to 10 characters long

    • whether words contain hyphens, digits, or capitalization

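A rough sketch of the kind of feature extractor the list above describes. The exact feature templates in the talk (and in Toutanova & Manning, 2000) differ in detail, so the feature names and output format here are illustrative only.

```python
def word_features(words, prev_tags, i, max_affix=10):
    """Feature strings for the word at position i: its form, the two preceding
    POS tags, prefixes/suffixes up to max_affix characters, and orthographic flags."""
    w = words[i]
    t1 = prev_tags[-1] if len(prev_tags) >= 1 else "<s>"
    t2 = prev_tags[-2] if len(prev_tags) >= 2 else "<s>"
    feats = [f"word={w.lower()}", f"t-1={t1}", f"t-2,t-1={t2},{t1}"]
    for k in range(1, min(max_affix, len(w)) + 1):
        feats.append(f"prefix={w[:k]}")
        feats.append(f"suffix={w[-k:]}")
    if "-" in w:
        feats.append("has-hyphen")
    if any(c.isdigit() for c in w):
        feats.append("has-digit")
    if w[0].isupper():
        feats.append("has-capital")
    return feats

# Features for "biggest" given the tags of the two preceding words:
print(word_features(["Perhaps", "the", "biggest", "hurdle"], ["RB", "DT"], 2))
```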


More Data for Less Work

  • The goal is to produce annotated corpora in less time and with less annotator effort

  • Active Learning

    • A machine learning meta-algorithm

    • A model is trained with the help of an oracle

    • Oracle helps with the annotation process

      • Human annotator(s)

      • Simulated with selective querying of hidden labels on pre-labeled data


Active Learning Data Sets

  • Annotated: data labeled by oracle

  • Unannotated: data not (yet) labeled by oracle

  • Test: labeled data used to determine accuracy


Active Learning Models

  • Sentence Selecting Model

    • Model that chooses the sentences to be annotated

  • MEMM (Maximum Entropy Markov Model)

    • Model that tags the sentences


[Diagram: the active learning loop. Boxes for Unannotated Data, Annotated Data, Test Data, the Sentence Selector Model, the Automatic Tagger, and the Oracle. In each round, both models are trained on the Annotated Data; the Sentence Selector Model picks the most "important" samples from the Unannotated Data; the Automatic Tagger pre-tags them; the Oracle corrects them; and the Corrected Sentences are added to the Annotated Data. Accuracy is measured on the Test Data, and the cycle repeats.]
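In code form, the loop in the diagram looks roughly like the function below. This is a schematic outline, not the ALFA implementation: train_tagger, train_selector, score_sentence, and oracle_correct are placeholders for the MEMM, the sentence selector model, the importance measure (random, QBU, QBUV, or QBC), and the (possibly simulated) human annotator.

```python
def active_learning_loop(unannotated, oracle_correct, train_tagger,
                         train_selector, score_sentence,
                         batch_size=100, rounds=50):
    """Schematic active-learning loop: select, pre-tag, correct, retrain."""
    annotated = []                                    # sentences labeled by the oracle so far
    for _ in range(rounds):
        if not unannotated:
            break
        tagger = train_tagger(annotated)              # automatic tagger (an MEMM in the talk)
        selector = train_selector(annotated)          # sentence selector model
        # Rank the unannotated sentences by "importance" and take the top batch.
        unannotated.sort(key=lambda s: score_sentence(selector, s), reverse=True)
        batch, unannotated = unannotated[:batch_size], unannotated[batch_size:]
        # Pre-tag each selected sentence, let the oracle correct it, and add it to the pool.
        for sentence in batch:
            annotated.append(oracle_correct(sentence, tagger(sentence)))
        # (Accuracy on the held-out Test Data would be measured here each round.)
    return annotated
```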


Determining Sentence Importance

  • Determining which samples will provide the most information:

    • Random Baseline (Random)

    • Query-By-Uncertainty

      • Using entropy (QBU)

      • Using Viterbi score (QBUV)

    • Query-By-Committee (QBC)


Random Baseline

  • Instead of giving informative sentences to the oracle, we choose random sentences.

    • This is very fast

    • Not the smartest thing to do


Query by Uncertainty

  • Importance of a sample corresponds to the uncertainty in the model’s decision

  • “Uncertainty Sampling”

  • Thrun & Moeller, 1992

  • Lewis & Gale, 1994

  • Hwa, 2000

  • Anderson, 2005

  • Ringger et al., 2007


Query-by-Uncertainty using Entropy

  • How is uncertainty determined?

    • Per-word entropy

    • Measure of the uncertainty in the distribution over tags for a single word

    • Easy to calculate

    • Our approximation to sentence entropy = the sum of per-word entropies

    • This performs equivalently to full-sequence entropy.
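A minimal sketch of the QBU score described above: the per-word entropy of the tagger's tag distribution, summed over the sentence. The distributions below are illustrative numbers loosely based on the earlier example.

```python
import math

def word_entropy(tag_dist):
    """Entropy (in bits) of one word's distribution over possible tags."""
    return -sum(p * math.log2(p) for p in tag_dist.values() if p > 0)

def qbu_score(per_word_tag_dists):
    """QBU importance of a sentence: the sum of per-word entropies."""
    return sum(word_entropy(dist) for dist in per_word_tag_dists)

# Illustrative distributions for "Perhaps the biggest hurdle":
sentence = [{"RB": 0.87, "RBS": 0.05, "OTHER": 0.08},
            {"DT": 0.95, "NN": 0.05},
            {"JJS": 0.67, "JJ": 0.17, "OTHER": 0.16},
            {"NN": 0.85, "VV": 0.15}]
print(round(qbu_score(sentence), 3))   # higher score = more uncertain = more "important"
```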


QBUV

  • Alternative approach to QBU

    • Ringger et al. (2007) show that it often outperforms QBU

  • Estimate per-sentence uncertainty as 1 - P(t)

  • where t is the best tag sequence from the Viterbi decoder (hence the "V")
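QBUV needs only the probability of the single best tag sequence. A minimal sketch, assuming the decoder reports the per-step probabilities along its Viterbi path (for an MEMM the sequence probability is their product); the numbers are invented.

```python
import math

def qbuv_score(log_probs_along_viterbi_path):
    """QBUV importance: 1 - P(t), where t is the Viterbi-best tag sequence and
    P(t) is the product of the per-step probabilities along that path."""
    return 1.0 - math.exp(sum(log_probs_along_viterbi_path))

# Invented example: three words whose Viterbi-path tags got probabilities .95, .67, .85.
print(round(qbuv_score([math.log(p) for p in (0.95, 0.67, 0.85)]), 3))  # -> 0.459
```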


Query by Committee

  • Utilize a committee (or ensemble) of N models

    • Each model is trained on a bootstrap sample (drawn with replacement) from the annotated data

  • Each model “votes” on the correct tagging of a sentence

  • The sentences on which the committee disagrees most should yield the most useful information
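The talk does not spell out the disagreement measure, so the sketch below uses per-word vote entropy, one common choice for QBC: each committee member (trained on its own bootstrap sample of the annotated data) tags the sentence, and words whose votes are split contribute more to the score.

```python
import math
from collections import Counter

def vote_entropy(votes):
    """Entropy of the committee's votes on one word's tag."""
    counts = Counter(votes)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def qbc_score(committee_taggings):
    """QBC importance: summed per-word vote entropy.
    committee_taggings holds one tag sequence per committee member."""
    return sum(vote_entropy(votes) for votes in zip(*committee_taggings))

# Three committee members tag the same four-word sentence (invented output):
committee = [["RB", "DT", "JJS", "NN"],
             ["RB", "DT", "JJ",  "NN"],
             ["RB", "DT", "JJS", "VB"]]
print(round(qbc_score(committee), 3))   # disagreement on words 3 and 4 drives the score
```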


Previous Work on QBC

  • Seung, Opper, & Sompolinsky, 1992

  • Freund, Seung, Shamir, & Tishby, 1997

    • Analysis of QBC

  • Argamon-Engelson & Dagan, 1996

    • QBC with HMMs for POS tagging

    • Most relevant to our work


Experimental Setup

  • We use 80% of the data as training data

    • The data is the Penn Treebank

  • Start with one random sentence

  • Treat the rest of the training data as unlabeled (hide the tags)

  • We simulate annotation by the oracle by revealing the hidden labels

  • Results are averaged over 10 randomized runs
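A tiny sketch of how the oracle can be simulated on pre-labeled data: the gold tags are hidden, and a lookup simply reveals them when a sentence is selected. The helper names are hypothetical.

```python
def make_simulated_oracle(gold_sentences):
    """Build an oracle from pre-labeled data: it ignores the tagger's predictions
    and returns the hidden gold tags for the queried sentence."""
    gold = {tuple(word for word, _ in s): [tag for _, tag in s] for s in gold_sentences}
    def oracle_correct(words, predicted_tags):
        return list(zip(words, gold[tuple(words)]))
    return oracle_correct

# Invented gold data; in the experiments this role is played by 80% of the Penn Treebank.
oracle = make_simulated_oracle([[("the", "DT"), ("hurdle", "NN")]])
print(oracle(["the", "hurdle"], ["DT", "VB"]))  # -> [('the', 'DT'), ('hurdle', 'NN')]
```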





Which model?

  • The best active learning model depends on how the annotator is paid

  • We usually pay by the hour

  • We need a good cost model


User Study

  • See Ringger et al., 2008

  • 47 participants (undergraduate linguistics students)

  • Each tagged 36 sentences from the Penn Treebank

  • Tracked the time taken to correct the tags in each sentence


Cost Model

  • From the user study, we performed linear regression to infer a cost model.

  • Cost (in seconds) = 3.795 * L + 5.387 * C + 12.57

    • L = length of the sentence

    • C = number of tags the annotator needs to change
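The regression translates directly into a cost function. As a rough sanity check (assuming, for illustration, an average sentence of about 25 words with every tag needing correction), the model gives roughly the four minutes per PTB sentence mentioned on the next slide; the better the automatic tagger pre-tags a sentence, the smaller C and the cost become.

```python
def annotation_cost_seconds(L, C):
    """Cost model from the user study: seconds to annotate one sentence,
    where L is the sentence length and C is the number of tags to change."""
    return 3.795 * L + 5.387 * C + 12.57

print(annotation_cost_seconds(25, 25))   # ~242 seconds (about 4 minutes)
print(annotation_cost_seconds(25, 3))    # ~124 seconds once most pre-tags are correct
```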



How much do you actually save?

  • Say I want 95% accuracy


Is active learning better, though?

  • The Penn Treebank has approximately one million words.

  • According to the cost model, an average PTB sentence would take about 4 minutes.

  • This means 26,908 hours for the full corpus.

  • With active learning, we achieve 96% accuracy after about 70 hours.


Portability

  • Do the results hold up on other data?

    • We use random selections of English poetry from the BNC

    • We also use the Kiraz Syriac New Testament


Conclusions

  • The best active learning model depends on how cost is determined

  • Demonstrated significant annotation savings for POS tagging

    • On English news prose

    • On English poetry

    • On Syriac


