
Annotating ESL Errors: Challenges and Rewards

Alla Rozovskaya and Dan Roth

University of Illinois at Urbana-Champaign

NAACL-HLT BEA-5 2010

Los Angeles, CA


Annotating a corpus of English as a Second Language (ESL) writing: Motivation
  • Many non-native English speakers
  • ESL learners make a variety of mistakes in grammar and usage
  • Conventional proofing tools target native English speakers and do not detect many mistakes characteristic of ESL writers
  • We do not, however, restrict the annotation to ESL-specific mistakes


Goals
  • Developing automated techniques for detecting and correcting context-sensitive mistakes
    • Paving the way for better proofing tools for ESL writers
      • E.g., providing instructional feedback
    • Developing automated scoring techniques
      • E.g., automated evaluation of student essays

Annotation is an important part of that process

Annotating ESL errors: a hard problem
  • A sentence usually contains multiple errors
    • In Western countries prisson conditions are more better than in Russia , and this fact helps to change criminals in better way of life .
  • Not always clear how to mark the type of a mistake
    • “…which reflect a traditional female role and a traditional attitude to a woman…”

“…which reflect a traditional female role and a traditional attitude towards women…”

Possible markings: a woman*/women (a single correction), or a*/<NONE> and woman*/women (two separate corrections)

Annotating ESL errors: a hard problem
  • Distinction between acceptable/unacceptable usage is fuzzy
    • Women were indignant at inequality from men.

Women were indignant at the inequality from men.

Common ESL mistakes
  • English as a Second Language (ESL) mistakes
    • Mistakes involving prepositions
      • We even do good to*/for other people <NONE>*/by spending money on this and asking <NONE>*/for nothing in return.
    • Mistakes involving articles
      • The main idea of their speeches is that a*/the romantic period of music was too short.
      • Laziness is the engine of the*/<NONE> progress.
      • Do you think anyone will help you? There are not many people who are willing to give their*/a hands*/hand.
Purpose of the annotation
  • To have a gold standard set for the development and evaluation of an automated system that corrects ESL mistakes
  • There is currently no gold standard data set available for researchers
    • Systems are evaluated on different data sets, which makes performance comparison across systems hard
      • Results depend on the source language of the speakers and proficiency level
    • The annotation of this corpus is available and can be used by researchers who gain access to the ICLE and the CLEC corpora.
  • This corpus is used in the experiments described in [Rozovskaya and Roth, NAACL, ’10]
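A gold standard like this supports evaluation by comparing a system's corrections against the annotators'. The sketch below is illustrative only: the data format (token index, replacement string) and the precision/recall scoring scheme are assumptions, not the paper's actual setup.

```python
# Hypothetical sketch: scoring a correction system against gold annotations.
# Each correction is a (token_index, replacement) pair; data is invented.

def score(gold, predicted):
    """Precision, recall, and F1 of predicted corrections against gold."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)          # corrections both agree on
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = [(3, "the"), (7, "towards"), (12, "<NONE>")]  # annotators' corrections
pred = [(3, "the"), (7, "to"), (12, "<NONE>")]       # hypothetical system output
p, r, f = score(gold, pred)
```

Where the annotators supplied multiple alternative corrections, a match against any of the alternatives could count as a true positive.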
Outline
  • Annotating ESL mistakes: Motivation
  • Annotation
    • Data selection
    • Annotation procedure
    • Error classification
  • Annotation tool
  • Annotation statistics
  • Statistics on article corrections
  • Statistics on preposition corrections
  • Inter-annotator agreement
Annotation: Overview
  • Annotated a corpus of ESL sentences (63K words)
  • Extracted from two corpora of ESL essays:
    • International Corpus of Learner English (ICLE) [Granger et al.,’02]
    • Chinese Learner English Corpus (CLEC) [Gui and Yang,’03]
  • Sentences written by ESL students of 9 first language backgrounds
  • Each sentence is fully corrected and error tagged
  • Annotated by native English speakers
Annotation: focus of the annotation
  • Focus of the annotation: Mistakes in article and preposition usage
    • These mistakes have been shown to be very common for learners of different first language backgrounds [Dagneaux et al., ’98; Gamon et al., ’08; Tetreault et al., ’08; others]
Annotation: data selection
  • Sentences for annotation extracted from two corpora of ESL essays
    • International Corpus of Learner English (ICLE)
      • Essays by advanced learners of English
      • First language backgrounds: Bulgarian, Czech, French, German, Italian, Polish, Russian, Spanish
    • Chinese Learner English Corpus (CLEC)
      • Essays by Chinese learners of different proficiency levels
  • Garbled sentences and sentences with near-native fluency excluded with a 4-gram language model
  • 50% of sentences for annotation randomly sampled from the two corpora
  • 50% of sentences selected manually to collect more preposition errors
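The slides do not spell out the 4-gram filtering step; the following is a minimal sketch under stated assumptions: score each sentence by its smoothed per-word 4-gram log-probability and keep only sentences in a middle band, discarding both garbled (very low score) and near-native (very high score) sentences. The smoothing method and thresholds here are invented for illustration.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def train_4gram(corpus):
    """Count 4-grams and their 3-gram prefixes over padded sentences."""
    counts4, counts3 = Counter(), Counter()
    for sent in corpus:
        toks = ["<s>"] * 3 + sent.split() + ["</s>"]
        counts4.update(ngrams(toks, 4))
        counts3.update(ngrams(toks, 3))
    return counts4, counts3

def logprob_per_word(sent, counts4, counts3, vocab_size, alpha=1.0):
    """Average add-alpha smoothed 4-gram log-probability per word."""
    toks = ["<s>"] * 3 + sent.split() + ["</s>"]
    lp = 0.0
    for g in ngrams(toks, 4):
        lp += math.log((counts4[g] + alpha) /
                       (counts3[g[:3]] + alpha * vocab_size))
    return lp / (len(toks) - 3)

def keep_for_annotation(sent, model, vocab_size, lo=-9.0, hi=-2.0):
    # thresholds are invented: below lo = likely garbled, above hi = near-native
    lp = logprob_per_word(sent, *model, vocab_size)
    return lo <= lp <= hi

corpus = ["women were indignant at the inequality", "the inequality from men"]
model = train_4gram(corpus)
```

In practice the model would be trained on a large native-English corpus rather than the learner data itself.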
Annotation: procedure
  • Annotation performed by three native English speakers
    • Graduate and undergraduate students in Linguistics/foreign languages
    • With previous experience in natural language annotation
  • Annotation performed at the sentence level – all errors in the sentence are corrected and tagged
  • The annotators were encouraged to propose multiple alternative corrections
    • Useful for the evaluation of an automated error correction system
      • “They contribute money to the building of hospitals” – both to and towards are accepted as corrections

Annotation: error classification
  • Focus of the annotation: mistakes in article and preposition usage
  • Error classification (inspired by [Tetreault and Chodorow,’08])
    • developed with the focus on article and preposition errors
      • “…which reflect a traditional female role and a traditional attitude to a woman…” → “…which reflect a traditional female role and a traditional attitude towards a*/<NONE> woman*/women…”
    • was intended to give a general idea about the types of mistakes ESL students make
The annotated ESL corpus

[Screenshot: annotating an ESL sentence with the annotation tool]

Flexible infrastructure allows for easy adaptation to a different domain

Example of an annotated sentence

Before annotation

“This time asks for looking at things with our eyes opened.”

With annotation comments

“This time @period, age, time@ asks $us$ for <to> looking *look* at things with our eyes opened .”

After annotation

“This period asks us to look at things with our eyes opened.”

Annotation rate: 30-40 sentences per hour
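The markup above can be interpreted mechanically. The interpreter below is based only on this one example, so its semantics are an assumption, not the tool's documented format: each delimited span replaces the token immediately before it, `@…@` lists alternatives (the first is taken), and `$…$` marks a pure insertion.

```python
import re

def apply_annotation(s):
    """Produce the corrected sentence from the slide's annotation markup."""
    # @alt1, alt2, ...@ replaces the preceding token with the first alternative
    s = re.sub(r'\S+ @([^@]+)@', lambda m: m.group(1).split(',')[0].strip(), s)
    # $word$ is an insertion: keep the word, drop the markers
    s = re.sub(r'\$([^$]+)\$', r'\1', s)
    # <word> and *word* replace the preceding token
    s = re.sub(r'\S+ <([^>]+)>', r'\1', s)
    s = re.sub(r'\S+ \*([^*]+)\*', r'\1', s)
    # reattach tokenized punctuation
    return re.sub(r'\s+([.,])', r'\1', s).strip()

marked = ("This time @period, age, time@ asks $us$ for <to> "
          "looking *look* at things with our eyes opened .")
print(apply_annotation(marked))
# -> This period asks us to look at things with our eyes opened.
```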


Common article and preposition mistakes
  • Article mistakes
    • Missing articles
      • But this , as such , is already <NONE>*/a new subject for discussion .
    • Extraneous articles
      • Laziness is the engine of the*/<NONE> progress.
  • Preposition mistakes
    • Confusing different prepositions
      • Education gives a person a better appreciation of*/for such fields as art, literature, history, human relations, and science.
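The slides' `source*/correction` notation can be tallied mechanically. A small sketch, assuming the notation is exactly as shown here (`<NONE>` on the left marks a missing word, `<NONE>` on the right an extraneous one):

```python
import re
from collections import Counter

# Matches "source*/correction" pairs in the slides' notation (assumed).
PAIR = re.compile(r'(<NONE>|\S+?)\*/(<NONE>|\S+)')

def error_types(sentence):
    """Tally missing / extraneous / confusion errors in one marked sentence."""
    types = Counter()
    for src, cor in PAIR.findall(sentence):
        if src == "<NONE>":
            types["missing"] += 1       # word absent in the learner text
        elif cor == "<NONE>":
            types["extraneous"] += 1    # word should be deleted
        else:
            types["confusion"] += 1     # one word used in place of another
    return types

s1 = "Laziness is the engine of the*/<NONE> progress ."
s2 = "But this is already <NONE>*/a new subject ."
s3 = "We even do good to*/for other people ."
```

Run over the whole annotated corpus, tallies like these yield the error-type distributions reported below.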
Distribution of article errors by error type

  • Errors are dependent on the first language of the writer
  • Not all confusions are equally likely


Statistics on preposition corrections
  • Many contexts license multiple prepositions [Tetreault and Chodorow, ’08]
  • Unlike with articles, confusions between prepositions account for over 50% of all preposition errors

Conclusions
  • We presented the annotation of a corpus of ESL sentences
  • Annotating ESL mistakes is an important but challenging task
    • Interacting mistakes in a sentence
    • Fuzzy distinction between acceptable/unacceptable usage
  • We have described an annotation tool that facilitates the error-tagging of a corpus of text
  • The inter-annotator agreement on the task is low and shows that this is a difficult problem
  • The annotated data can be used by other researchers for the evaluation of their systems
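Agreement between two annotators on a task like this is conventionally measured with Cohen's kappa, which corrects raw agreement for chance. A minimal sketch with invented labels (the paper's actual agreement measure and label set are not reproduced here):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # chance agreement: both independently pick the same label
    expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# invented per-token judgments: does this token need a correction?
a = ["ok", "error", "ok", "error", "ok", "ok"]
b = ["ok", "error", "error", "error", "ok", "error"]
k = cohens_kappa(a, b)   # -> 0.4
```

Values well below 1 on such judgments are one concrete way "low agreement" shows up.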
The annotation tool and the ESL annotation are available at:
http://L2R.cs.uiuc.edu/~cogcomp/software.php

Contact: rozovska@illinois.edu

Thank you! Questions?