An English Writing Assistant for Non Native Speakers
Projet CorrecTools (CAPRA: Compagnon d’Apprentissage et de Perfectionnement à la Rédaction en Anglais)
M. Garnier, A. Rykner, Université Toulouse 2
P. Saint-Dizier, CNRS
France
Introduction
• English: main language for international communication
  → Necessity for Non Native Speakers of English (NNS) to produce satisfactory English texts (personal/professional spheres)
• Learning and practice: necessary requirements for long-term acquisition of writing skills
• Each language and linguistic community encounters specific problems in writing English (‘language transfer’)
Need for an automatic English writing assistant
• Presentation of our project:
  • Aims and challenges
  • Corpus constitution and error analysis
  • Annotation of errors
  • Some results of the analysis of a Thai to English corpus
1. Aims and challenges
• Presence of grammatical, lexical and stylistic errors in the productions of NNS:
  • They make comprehension difficult and damage credibility
  • Many errors are not handled by text editors such as MS Word
• Didactic perspective: explanation of errors and grammar rules given alongside the corrections
• Focus on pairs of languages (French to English, Thai to English):
  • Prototypicality of errors: easier correction process
  • Knowledge of the L1: more efficient analysis and correction of errors
2. Corpus constitution and error analysis
• Exploratory corpus: emails, reports, scientific publications, web pages, blogs
Parameters:
  • Variety of authors (professionals, researchers, students)
  • Different domains of production (business, research, personal sphere)
  • Different levels of control, i.e. amount of care devoted to the production of a document
• First stage: manual detection, annotation, and correction of errors
• Classification of errors: creation of a system of categories
• Characteristics of the system:
  • Categories created according to linguistic criteria, i.e. NP, PP, VP, Clause and Sentence
  • Inclusion of two levels of subtypes of errors inside main categories
  • Inclusion of indications concerning broad linguistic parameters: Lexicon, Morpho-Syntax, Syntax, Semantics, Style
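The classification system described on this slide (five main categories, two levels of subtypes, one broad linguistic parameter) could be represented roughly as follows. This is only a sketch: the class and field names are hypothetical, and the placement of the example error under NP and Morpho-Syntax is an assumption made for illustration, not something the slides state.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """Main categories named on the slide (linguistic criteria)."""
    NP = "NP"
    PP = "PP"
    VP = "VP"
    CLAUSE = "Clause"
    SENTENCE = "Sentence"


class Parameter(Enum):
    """Broad linguistic parameters named on the slide."""
    LEXICON = "Lexicon"
    MORPHO_SYNTAX = "Morpho-Syntax"
    SYNTAX = "Syntax"
    SEMANTICS = "Semantics"
    STYLE = "Style"


@dataclass(frozen=True)
class ErrorType:
    """One cell of the classification (field names are hypothetical)."""
    category: Category
    subtype: str       # first level of subtype
    sub_subtype: str   # second level of subtype
    parameter: Parameter


# Illustrative placement only: the slides do not say where this error sits.
example = ErrorType(Category.NP, "determiner", "omission", Parameter.MORPHO_SYNTAX)
print(example.category.value, "/", example.subtype, "/", example.sub_subtype)
```

Using enumerations for the closed sets (categories, parameters) and free strings for the subtypes reflects the fact that the slides fix the former but leave the subtype inventory open.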
3. The annotation of errors
• Errors are annotated using a standard XML formalism enriched with attributes
• Schema designed so as to reflect cognitive strategies used by human correctors when detecting and correcting errors
• Delimitation and characterization of errors:
3. The annotation of errors (2)
• Delimitation and characterization of corrections:
3. The annotation of errors (3)
• Example of an annotated error with multiple corrections:
  *The second stage has therefore two goals: [...] and the construction of the meaning utterance.
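The actual tags and attributes of the annotation schema are not preserved in this transcript. As a rough illustration of what an XML annotation with several candidate corrections might look like, and how it could be read back, here is a sketch. Every element name, attribute name and attribute value below is a placeholder, and the two corrections are illustrative guesses rather than the ones proposed by the authors.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: tag names, attribute names and values are placeholders,
# NOT the project's real schema.
ANNOTATED = (
    "<s>The second stage "
    '<error category="VP" parameter="Syntax">'
    "<text>has therefore two goals</text>"
    '<correction rank="1">therefore has two goals</correction>'
    '<correction rank="2">thus has two goals</correction>'
    "</error>: [...]</s>"
)


def read_errors(xml_string):
    """Return a list of (erroneous span, [candidate corrections]) pairs."""
    root = ET.fromstring(xml_string)
    result = []
    for err in root.iter("error"):
        original = err.findtext("text")
        fixes = [c.text for c in err.findall("correction")]
        result.append((original, fixes))
    return result


for original, fixes in read_errors(ANNOTATED):
    print(original, "->", fixes)
```

Keeping the erroneous span and its corrections inside one element means a single error can carry any number of ranked alternatives, which is the situation the slide's example describes.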
4. Some results on a Thai-English corpus
• Preliminary study conducted on a limited corpus of English texts written by Thai native speakers
• Description of corpus:
  • 10 scientific abstracts
  • 1755 words
  • Various research domains and writers
• Steps completed so far:
  • Detection of errors
  • Classification of errors
  • Highlighting several aspects of error distribution
• Future steps:
  • Annotation of errors
  • Collaboration with Thai native speakers in order to study the extent of transfer effects
• Towards a correction system?
4. Some results on a Thai-English corpus (2)
• Distribution of errors according to broad linguistic parameters (number of subtypes of errors vs. number of errors in total for each axis)
[Chart; recovered axis labels: Lexicon, MorphoSyntax]
4. Some results on a Thai-English corpus (3)
• Distribution of errors according to main categories of our system (number of subtypes of errors vs. number of errors in total for each category)
4. Some results on a Thai-English corpus (4)
• Distribution of errors according to subtypes of errors
• Main types of errors: omission of determiner, omission of plural, erroneous subject/verb agreement, abusive NØN construction
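The two figures used in these distributions (number of distinct subtypes and total number of errors per group) can be computed along the following lines. This is a minimal sketch: the function name is hypothetical, and the records are toy values built from subtype names on this slide, with group assignments assumed for illustration. They are not the corpus counts.

```python
from collections import defaultdict


def distribution(records):
    """Map each group to (number of distinct subtypes, total number of errors).

    `records` is an iterable of (group, subtype) pairs, one pair per error.
    """
    subtypes = defaultdict(set)
    totals = defaultdict(int)
    for group, subtype in records:
        subtypes[group].add(subtype)
        totals[group] += 1
    return {group: (len(subtypes[group]), totals[group]) for group in totals}


# Toy records, NOT corpus data; the group assigned to each subtype is assumed.
toy = [
    ("NP", "omission of determiner"),
    ("NP", "omission of determiner"),
    ("NP", "omission of plural"),
    ("Clause", "erroneous subject/verb agreement"),
]
print(distribution(toy))  # {'NP': (2, 3), 'Clause': (1, 1)}
```

Comparing the two numbers shows whether a group's errors are spread over many subtypes or concentrated in a few recurring ones.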
4. Some results on a Thai-English corpus (5)
• Omission of determiner:
  • *World of information technologies can be classifiied into 2 main groups. → The world of information technologies can be classified into 2 main groups.
• Omission of plural:
  • *Reading from book and website is a way to diagnose diseases. → Reading from books and websites is a way to diagnose diseases.
• Erroneous subject/verb agreement:
  • *Precision depend on noise in each website. → Precision depends on noise in each website.
• Abusive NØN construction:
  • *It will decrease the plant quality. → It will decrease the quality of the plant / the plant’s quality.
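To make the difference between an erroneous sentence and its correction explicit, a word-level comparison can be run on each pair above. The sketch below uses Python's standard `difflib`; it is only an illustration of how such pairs could be inspected, not the detection method used in the project, and the function name is hypothetical.

```python
import difflib


def word_changes(wrong, right):
    """List the word-level edits that turn `wrong` into `right`."""
    a, b = wrong.split(), right.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the spans that differ
    ]


print(word_changes(
    "Precision depend on noise in each website.",
    "Precision depends on noise in each website.",
))  # [('replace', 'depend', 'depends')]
```

A diff of this kind only locates the changed span; deciding which subtype of error it belongs to still requires the linguistic classification described earlier.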
Perspectives
• French to English:
  • Extend the initial corpus
  • Investigate the relevance of learner corpora
  • Stabilize the classification system and the annotation schema
  • Focus on certain errors and start drafting rules for correction
  • Evaluate the needs of a population of users and the demand for such a tool
• Thai to English:
  • Extend the initial corpus
  • Work with Thai researchers to evaluate the needs of potential users and assess the quality of the analyses proposed
  • Draft a roadmap for the continuation of the project in Thailand
Kop khun khà! (Thank you!)
CorrecTools website: http://www.irit.fr/recherches/ILPL/webct/ct.html