
Presentation Transcript


  1. Using an Error-Annotated Learner Corpus to Develop an ESL/EFL Error Correction System Na-Rae Han (University of Pittsburgh), Joel Tetreault (ETS), Soo-Hwa Lee (Chungdahm Learning, Inc.), Jin-Young Ha (Kangwon University) May 19 2010, LREC 2010

  2. Objective • A feedback tool for detecting and correcting preposition errors • I wait NULL/for you. (<NULL,p>: omitted preposition) • So I go to/NULL home quickly. (<p,NULL>: extraneous preposition) • Adult give money at/on birthday. (<p1,p2>: selection error) • Why preposition errors? • Preposition usage is one of the most difficult aspects of English for non-native speakers • 18% of sentences from ESL essays contain a preposition error (Dalgish, 1985) • 8–10% of all prepositions in TOEFL essays are used incorrectly (Tetreault and Chodorow, 2008)

  3. Diagnosing L2 Errors • Statistical modeling on large corpora. But what kind? • General corpora composed of well-edited texts by native speakers (“native speaker corpora”) → currently the dominant approach • Error-annotated learner corpora, consisting of texts written by ESL learners → our approach

  4. Our Learner Corpus • Chungdahm English Learner Corpus • A collection of English essays written by Korean-speaking students of the Chungdahm Institute, which operates in South Korea • 130,754,000 words in 861,481 essays, written on 1,545 prompts • Over 6.6 million error annotations in 4 categories: grammar, strategy, style, substance • Non-exhaustive error marking (more on this later)

  5. The Preposition Data Set • Our preposition data set • The 11 “preposition” types: NULL, about, at, by, for, from, in, of, on, to, with → represent 99% of student error tokens in the data • The text set consists of 20.5 million words • 117,665 preposition errors • 1,104,752 preposition non-errors • Preposition error rate as marked in the data: 9.6%

  6. Method • Cast error correction as a classification problem • Train an 11-way Maximum Entropy classifier on preposition events extracted from the Chungdahm corpus • A preposition annotation is represented as <s,c> (s: student’s prep choice, c: correct preposition), where s and c range over { NULL, about, at, by, for, from, in, of, on, to, with } • s≠c for prep errors; s=c for non-errors • A preposition event consists of: • Outcome (prediction target): c • Contextual features extracted from the immediate context surrounding the preposition token, including the student’s original preposition choice (i.e., s) (see the sketch below)
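A minimal sketch of this classification setup, assuming events are available as (feature dict, correct preposition) pairs. scikit-learn's multinomial LogisticRegression stands in for the Maximum Entropy learner; the slides do not name a toolkit, so the library choice and helper names here are assumptions:

```python
# Sketch: 11-way preposition classifier; LogisticRegression used as a MaxEnt stand-in.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

PREPOSITIONS = ["NULL", "about", "at", "by", "for", "from",
                "in", "of", "on", "to", "with"]

def train_preposition_model(events):
    """events: list of (feature_dict, correct_prep) pairs.
    Each feature_dict carries the student's original choice s plus
    contextual features; the outcome is the correct preposition c."""
    feature_dicts, outcomes = zip(*events)
    vectorizer = DictVectorizer()
    X = vectorizer.fit_transform(feature_dicts)
    model = LogisticRegression(max_iter=1000)  # multinomial softmax ~ MaxEnt
    model.fit(X, outcomes)
    return vectorizer, model

# Toy events for illustration: an <at,on> selection error and a <for,for> non-error.
toy_events = [({"student_prep": "at", "MOD": "give", "ARG": "birthday"}, "on"),
              ({"student_prep": "for", "MOD": "wait", "ARG": "you"}, "for")]
vectorizer, model = train_preposition_model(toy_events)
```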

  7. Preposition Context • Student prep choice + 3 words to left and right • MOD: head of the phrase modified by the prepositional phrase • ARG: noun argument of the preposition → identified using the Stanford Parser • Example text and annotation (see the sketch below)
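A rough sketch of the windowed feature extraction described above, assuming the sentence is already tokenized and that the MOD and ARG heads have been identified from a parse (the slide names the Stanford Parser; the feature keys and helper below are illustrative assumptions, not the authors' exact feature set):

```python
def extract_features(tokens, prep_index, student_prep, mod=None, arg=None):
    """Build a feature dict for one preposition slot.
    tokens: tokenized sentence; prep_index: position of the (possibly NULL) prep."""
    feats = {"student_prep": student_prep}
    # Three words of context on each side of the preposition slot.
    for offset in range(1, 4):
        left, right = prep_index - offset, prep_index + offset
        feats[f"L{offset}"] = tokens[left] if left >= 0 else "<BOS>"
        feats[f"R{offset}"] = tokens[right] if right < len(tokens) else "<EOS>"
    if mod is not None:
        feats["MOD"] = mod  # head modified by the prepositional phrase (from the parse)
    if arg is not None:
        feats["ARG"] = arg  # noun argument of the preposition (from the parse)
    return feats

# "Adult give money at birthday": student prep "at", MOD "give", ARG "birthday".
tokens = ["Adult", "give", "money", "at", "birthday"]
print(extract_features(tokens, 3, "at", mod="give", arg="birthday"))
```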

  8. Event Representation • Represented as an event: • Outcome: in • Features: 24 in total (listed in a table on the slide)
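Continuing the sketch above, one such event might look like the following; the sentence and feature values are invented for illustration, since the slide's 24-feature table is not reproduced in this transcript:

```python
# Hypothetical event for "... the best player in the game ...",
# where the student wrote "on" and the correct preposition is "in".
event = {
    "outcome": "in",                # prediction target c
    "features": {
        "student_prep": "on",       # student's original choice s
        "L1": "player", "L2": "best", "L3": "the",
        "R1": "the", "R2": "game", "R3": ".",
        "MOD": "player",            # head modified by the prepositional phrase
        "ARG": "game",              # noun argument of the preposition
    },
}
```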

  9. Training and Testing • Training set: 978,000 events • The rest is set aside for evaluation and development • Creating an evaluation set for testing • Error annotation in the Chungdahm corpus is not exhaustive → many student errors are left unmarked by tutors • This necessitates creating a re-annotated evaluation set • 1,000 preposition contexts annotated by 3 trained annotators • Inter-annotator agreement: 0.860–0.910; kappa: 0.662–0.804
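For reference, pairwise agreement and Cohen's kappa of the kind reported above can be computed as in this small sketch (the annotator labels are invented; scikit-learn is an assumed dependency):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels from two annotators over the same preposition contexts.
annotator_a = ["on", "in", "NULL", "for", "at", "in"]
annotator_b = ["on", "in", "to",   "for", "at", "of"]

agreement = sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"raw agreement = {agreement:.3f}, kappa = {kappa:.3f}")
```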

  10. Evaluation Results • 11-way classification: works as an error correction (multi-outcome decision) model, and can be backed off to an error detection (binary decision) model • Omission errors (I wait NULL/for you.) *Error detection is trivial for this type • Extraneous prep errors (So I go to/NULL home quickly.) • Selection errors (Adult give money at/on birthday.)
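A minimal sketch of the back-off from 11-way correction to binary detection, reusing the classifier from the earlier sketch: an error is flagged whenever the predicted preposition differs from the student's choice. The confidence threshold is an illustrative assumption, not a value from the paper:

```python
def detect_and_correct(vectorizer, model, feature_dict, student_prep, min_conf=0.5):
    """Return (is_error, suggested_prep) for one preposition slot."""
    X = vectorizer.transform([feature_dict])
    probs = model.predict_proba(X)[0]
    best = probs.argmax()
    predicted = model.classes_[best]
    # Binary detection: flag an error iff the model's choice differs from the
    # student's and the model is sufficiently confident in that choice.
    if predicted != student_prep and probs[best] >= min_conf:
        return True, predicted       # correction = the predicted preposition
    return False, student_prep       # no error flagged
```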

  11. Related Work • Chodorow et al. (2007) • Error detection model targeting 34 prepositions • Trained on San Jose Mercury News + Lexile data • 0.88 precision, 0.16 recall for detecting selection errors • Gamon et al. (2008) • Error detection and correction model of 13 prepositions • One classifier to determine whether a preposition/article should be present; another for the correct choice; an additional filter • Trained on MS Encarta data, tested on Chinese learner writing • 80% precision; recall not reported • Izumi et al. (2003, 2004) • Trained on the Standard Speaking Test Corpus (Japanese) • 56 speakers, 6,216 sentences • 25% precision and 7% recall on 13 grammatical error types

  12. Comparison: Native-Corpus-Trained Models • Question: Will models trained on native-speaker-produced texts outperform our model? • The advantage of native corpora: they are plentiful → we allowed these models a larger training size • Experimental setup: • Build models on native corpora, using varying training set sizes (1 million to 5 million) • Data: the Lexile Corpus, 7th and 8th grade reading levels • A comparable feature set was employed

  13. Learner Model vs. Native Models • Testing results on learner data (replacement errors only): • The learner model outperforms all native models • Native models: the performance gain from larger training size is insignificant beyond the 2–3 million mark

  14. What Does This Prove? • Are the native models flawed? A bad feature set? • No. In-set testing (against held-out native text) shows performance levels comparable to those in published studies • Could some of the performance gap be due to genre differences? • Highly likely. However, 7th–8th grade reading materials were the closest match we could find to student essays • In sum: the native models’ advantage of larger training size does not outweigh the learner model’s advantages: genre/text similarity and error annotation

  15. Discussion: Learner Language vs. Native Corpora • Modeling on native corpora: • Produces a one-size-fits-all model of “native” English • More generic and universally applicable? • Modeling on a learner corpus: • Produces a model specific to that particular learner language • Can it be applied to the language of other learner groups? • e.g., French-speaking or Japanese-speaking English learners? • Combining the two approaches: • A system with specific models for different L1 backgrounds • Plus a back-off “generic” model built on native corpora

  16. Discussion: The Problem of Partial Error Annotation • Partial error annotation problem: • 57% of replacement errors and 85% of extraneous prepositions are left unmarked by Chungdahm tutors • The training data therefore includes conflicting evidence • Our model’s low recall and high precision are consequences of this: • The model assumes a lower-than-true error rate • The model has to reconcile conflicting sets of evidence • When the model does flag an error, it does so with high confidence and accuracy • Solution? Bootstrapping and relabeling of unannotated errors (see the sketch below)
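One way to realize the bootstrapping/relabeling idea is a simple self-training loop: let the current model relabel unmarked tokens with which it confidently disagrees, then retrain. This is a sketch of that general idea under assumed helpers and thresholds, not the authors' implementation:

```python
def bootstrap_relabel(vectorizer, model, events, n_rounds=3, conf_threshold=0.9):
    """events: list of (feature_dict, label) pairs, where the label defaults to
    the student's own choice whenever tutors marked nothing."""
    for _ in range(n_rounds):
        feature_dicts, labels = zip(*events)
        X = vectorizer.transform(feature_dicts)
        probs = model.predict_proba(X)
        relabeled = []
        for fdict, label, p in zip(feature_dicts, labels, probs):
            predicted = model.classes_[p.argmax()]
            # Relabel an unmarked token only if the model confidently disagrees.
            if predicted != label and p.max() >= conf_threshold:
                label = predicted
            relabeled.append((fdict, label))
        events = relabeled
        model.fit(X, [lbl for _, lbl in events])  # retrain on relabeled data
    return model, events
```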

  17. Conclusions • As language instruction turns digital, more and more (partially) error-annotated learner corpora like the Chungdahm corpus will become available • Building a direct model of L2 errors, whenever such data are available, offers an advantage over models based on native corpora, despite any partial-annotation problem • Exhaustive annotation is not necessary for learner-corpus-trained models to outperform standard native-text-trained models built on much larger training data sets
