
This research is supported by the U.S. Department of Education and DARPA.




The UI System in HOO 2012 Shared Task on Error Correction
Alla Rozovskaya, Mark Sammons, and Dan Roth
{rozovska, mssammon, danr}@illinois.edu

System Overview
• Pre-processing: spelling correction, POS tagging, shallow parsing.
• Determiner module:
• Missing, extraneous, and incorrect articles.
• Learning: Averaged Perceptron (AP) discriminative model, using the inflation method.
• Data: the FCE training data with naturally occurring learner errors.
• Preposition module:
• Missing, extraneous, and incorrect prepositions.
• Learning: a hybrid model: AP (same as in the determiner module) and Naïve Bayes (NB) adapted with the priors method (Rozovskaya and Roth, 2011).
• Data: AP is trained on the FCE data with natural learner errors; NB is trained on the Google Web 1T corpus.

The HOO 2012 Shared Task
• Focuses on mistakes in determiner and preposition usage made by non-native speakers of English.
• "Nowadays Ø*/the Internet makes us closer and closer."
• "I can see at*/on the list a lot of interesting sports."
• Systems were evaluated with three metrics: Detection, Recognition, and Correction. Among the 14 teams, the UI team scored first or second in each metric.

The Determiner Module
• Trained on the FCE corpus with natural errors; artificial errors are added, using the inflation method.
• Features: based on words and POS tags in the 4-word window (see paper for details).
• Learning: AP in Learning Based Java (Rizzolo and Roth, 2007).

Our Contributions
• Using the methods proposed in earlier work (Rozovskaya and Roth, 2010, 2011), we build highly competitive systems for determiner and preposition error correction.
• We propose an improvement to the earlier method, error inflation, which results in significant gains in performance.

Table 2: Article F-score results on dev.
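The determiner features above (words and POS tags in the 4-word window around a candidate article site) can be sketched as follows. This is a minimal illustration assuming tokens arrive pre-tagged as (word, POS) pairs; the feature templates and padding symbol are assumptions, not the authors' exact templates.

```python
# Sketch of windowed feature extraction for article classification.
# Assumes tokens are (word, POS) pairs from the pre-processing step;
# feature names below are illustrative, not the paper's exact templates.
def window_features(tagged, i, k=2):
    """Words and POS tags of the 4 tokens around position i (a candidate
    article site), padded at sentence boundaries."""
    feats = []
    for off in range(-k, k + 1):
        if off == 0:  # skip the article slot itself; it is the label
            continue
        j = i + off
        word, pos = tagged[j] if 0 <= j < len(tagged) else ("<PAD>", "<PAD>")
        feats.append(f"w[{off}]={word}")
        feats.append(f"t[{off}]={pos}")
    return feats

# Candidate article site at index 2 ("a" before "dog"):
sent = [("I", "PRP"), ("saw", "VBD"), ("a", "DT"), ("dog", "NN"), ("yesterday", "RB")]
feats = window_features(sent, 2)
```

Each such feature vector, paired with the article the writer used as the source and the correct article as the label, is what the Averaged Perceptron is trained on.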
The Preposition Module
• A hybrid model: AP + NB.
• AP is trained on the FCE corpus with natural errors; artificial errors are added, using the inflation method (same as the determiner system).
• NB is trained on the Google Web 1T corpus and adapted with the priors method.
• Features: words in the 4-word window and the complement of the preposition (see paper for details).

The Error Inflation Method
• Addresses the low-recall problem of error correction tasks.
• Based on the artificial-errors approach from our earlier work (Rozovskaya and Roth, 2010).
• The inflation method increases the error rate in the training data, thereby boosting the recall of the correction models.
• See paper for the error inflation algorithm.

Test Performance
Table 1: Performance on test before revisions (F-score). Results are shown before revisions were made to the data. The rank of the system among the 14 participating teams is shown as a superscript.
• In the overall performance, and in each individual subtask, our system ranked first or second according to all three evaluation metrics (Detection, Recognition, and Correction).
• Please see Dale et al. (2012) and our paper for the results after revisions to the test data were made.

Table 3: Preposition F-score results on dev.
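The priors method used to adapt NB can be sketched as follows: the likelihoods learned from native text (Google Web 1T) are kept, but the class priors are replaced with P(correction | preposition the writer used), estimated from annotated learner data. The counts and add-one smoothing below are illustrative assumptions, not the paper's statistics.

```python
from collections import Counter

# Rough sketch of priors adaptation for NB: estimate smoothed
# P(correction | source preposition) from learner confusion counts.
# The toy counts and smoothing constant are assumptions.
def adapted_priors(confusion_counts, smoothing=1.0):
    """confusion_counts: dict mapping a source preposition to a Counter
    of corrections observed in learner data. Returns smoothed
    P(correction | source) to use as NB class priors."""
    priors = {}
    for src, counts in confusion_counts.items():
        total = sum(counts.values()) + smoothing * len(counts)
        priors[src] = {c: (n + smoothing) / total
                       for c, n in counts.items()}
    return priors

# Toy learner statistics: when the writer used "at", it was usually
# correct, but sometimes should have been "on" or "in".
stats = {"at": Counter({"at": 90, "on": 6, "in": 4})}
p = adapted_priors(stats)
```

Because learners most often choose correctly, the adapted prior concentrates mass on keeping the writer's preposition, which a model trained only on native text cannot capture.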
