David Farwell, Stephen Helmreich Computing Research Laboratory/New Mexico State University

Presentation Transcript


  1. Columbia, CRL/NMSU, ISI/USC, LTI/CMU, MITRE, UMIACS/UMD
     • David Farwell, Stephen Helmreich (Computing Research Laboratory/New Mexico State University)
     • Lori Levin, Teruko Mitamura (Language Technologies Institute/Carnegie Mellon University)
     • Bonnie Dorr, Rebecca Green (Institute for Advanced Computer Studies/University of Maryland)
     • Eduard Hovy (Information Sciences Institute/University of Southern California)
     • Keith Miller, Florence Reeder (MITRE Corporation)
     • Owen Rambow, Nizar Habash (Columbia University)

  2. What we annotate
     • multiple comparable bilingual text corpora
     • parallel text corpora
     • multiple translations of texts
     • genre: newspaper texts / DARPA corpus
     Goals
     • a common representation (interlingua)
     • a common methodology and tools
     • to observe and catalogue different surface realizations of the same meaning, across and within languages

  5. Annotation Process
     • Text is syntactically parsed (Connexor / IL0)
     • Parses are reviewed and corrected (TrEd)
     • Annotation to IL1 (Tiamat)
     • Content words are annotated for sense (Omega)
     • Arguments are annotated for thematic role (LCS)
     • 2 English translations of 6 articles
     • Source languages: Arabic, French, Hindi, Japanese, Korean, Spanish
     • 12 annotators, 2 at each site
     • Total: 144 texts annotated to the IL1 level
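
The slide's figures can be checked against its total with a quick calculation. The breakdown used here (6 source languages × 6 articles × 2 English translations × 2 annotators per text) is an assumption; the slide states only these inputs and the total of 144, not how they combine. The constant names are hypothetical.

```python
# Hedged reconstruction of the slide's total. The decomposition
# (languages x articles x translations x annotators) is an assumption,
# not something the slide spells out.
SOURCE_LANGUAGES = ["Arabic", "French", "Hindi", "Japanese", "Korean", "Spanish"]
ARTICLES_PER_LANGUAGE = 6
ENGLISH_TRANSLATIONS_PER_ARTICLE = 2
ANNOTATORS_PER_TEXT = 2  # assumed: both annotators at a site annotate every text


def annotated_text_count() -> int:
    """Count annotated texts under the assumed decomposition."""
    english_texts = (
        len(SOURCE_LANGUAGES)
        * ARTICLES_PER_LANGUAGE
        * ENGLISH_TRANSLATIONS_PER_ARTICLE
    )  # 6 * 6 * 2 = 72 English translations
    return english_texts * ANNOTATORS_PER_TEXT  # 72 * 2 = 144 annotations


if __name__ == "__main__":
    print(annotated_text_count())  # 144
```

Under this reading, every English translation is annotated twice, which is what makes pairwise inter-annotator agreement measurable.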

  6. Results: Agreement & Time
     • Tools (Tiamat)
     • Manuals (IL0 for 7 languages, IL1)
     • Inter-annotator agreement: kappa = .83 (mK), .66 (wn), .59 (theta roles)
     • Annotation time: 4 hours per annotator per text, 250 words per text, 2 annotators per text = approx. 2 person-years for 100K words at IL1
     • Next step: merge the IL1 representations and develop transformation algorithms to produce IL2
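
The agreement and effort figures can be illustrated with a short sketch. The slide does not say which kappa statistic was used, so `cohen_kappa` below is a hypothetical helper implementing Cohen's two-annotator kappa as a stand-in. `person_years` is also hypothetical; it reproduces the effort estimate assuming 2,000 working hours per person-year, a figure not given on the slide. Under that assumption, 100K words comes to 400 texts, 3,200 annotator-hours, and about 1.6 person-years, consistent with the slide's "approx. 2".

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labelled the same items.

    Hypothetical helper: a stand-in for whichever kappa variant the
    project actually used.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: share of items given the same label by both annotators.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: probability that independent draws from each
    # annotator's label distribution coincide.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is 0/0,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def person_years(
    total_words: int,
    words_per_text: int,
    hours_per_text: float,
    annotators_per_text: int,
    hours_per_person_year: float = 2000.0,  # assumption, not from the slide
) -> float:
    """Estimate annotation effort in person-years."""
    texts = total_words / words_per_text  # 100_000 / 250 = 400 texts
    hours = texts * hours_per_text * annotators_per_text  # 400 * 4 * 2 = 3200 h
    return hours / hours_per_person_year


if __name__ == "__main__":
    print(cohen_kappa(["x", "x", "y", "y"], ["x", "y", "y", "y"]))  # 0.5
    print(person_years(100_000, 250, 4, 2))  # 1.6
```

Kappa discounts the agreement two annotators would reach by chance, which is why the slide's values (.59 to .83) are lower than raw percentage agreement would be. Changing `hours_per_person_year` to 1,800 gives roughly 1.8 person-years, so the "approx. 2" estimate is not very sensitive to that assumption.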
