
The AMITIÉS Corpus



  1. The AMITIÉS Corpus: an up-to-the-minute report

  2. The GE English corpus • Around 716 English dialogues have been received so far from GE Leeds, of which 642 are “good ones”. • The GE transcribers use the Transcriber tool, version 1.4.2, to deliver (*.TRS) documents based on an XML syntax.

  3. Good things • Being XML-based, the TRS documents are very suitable for automatic processing and for conversion into the format we are interested in (DAMSL-like, for example). • The transcribers successfully applied the AMITIÉS transcription guidelines.
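As a sketch of what such automatic processing could look like, here is a minimal Python example that pulls speaker turns out of a TRS-like document. The element names (`Trans`, `Speakers`, `Speaker`, `Episode`, `Section`, `Turn`) follow the Transcriber DTD, but the sample dialogue itself is invented for illustration:

```python
# Sketch: extract (speaker, text) turns from a minimal TRS-like document.
# The sample XML is invented; real .TRS files follow the Transcriber DTD.
import xml.etree.ElementTree as ET

TRS_SAMPLE = """<Trans>
  <Speakers>
    <Speaker id="spk1" name="A"/>
    <Speaker id="spk2" name="C"/>
  </Speakers>
  <Episode>
    <Section type="report" startTime="0" endTime="10">
      <Turn speaker="spk1" startTime="0" endTime="4">Hello, how can I help?</Turn>
      <Turn speaker="spk2" startTime="4" endTime="10">I want to change my address.</Turn>
    </Section>
  </Episode>
</Trans>"""

def extract_turns(trs_text):
    root = ET.fromstring(trs_text)
    # Map speaker ids to display names declared in the <Speakers> header.
    names = {s.get("id"): s.get("name") for s in root.iter("Speaker")}
    turns = []
    for turn in root.iter("Turn"):
        speaker = names.get(turn.get("speaker"), turn.get("speaker"))
        text = "".join(turn.itertext()).strip()
        turns.append((speaker, text))
    return turns

print(extract_turns(TRS_SAMPLE))
```

Because the structure is plain XML, the same traversal can be redirected to emit any target format, such as a DAMSL-like annotation file.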

  4. Issues • They started transcribing the audio files using the Turn and Utterance annotation levels provided by the Transcriber tool. • We noticed that some situations, such as overlapping, acknowledging, and completion, failed to be represented correctly in the received TRS documents.

  5. Solution and examples • Make use of the third logical annotation level provided by Transcriber, called Section. • The transcribers were required to create a new Section level called “exception” and to use it to encapsulate all Turns containing one of the situations described previously.
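A downstream tool can then treat the two kinds of Sections differently. The sketch below separates Turns by whether their enclosing Section is the “exception” one; how the label is stored is an assumption here (a `topic` attribute is used), and the fragment is invented:

```python
# Sketch: split Turns into "normal" vs "exception" buckets depending on
# the enclosing Section. The "exception" label living in the topic
# attribute is an assumption for illustration, as is the sample XML.
import xml.etree.ElementTree as ET

TRS_SAMPLE = """<Episode>
  <Section type="report" topic="default">
    <Turn speaker="spk1">And your telephone number please?</Turn>
  </Section>
  <Section type="report" topic="exception">
    <Turn speaker="spk2">11111</Turn>
    <Turn speaker="spk1">Uh hmmm</Turn>
  </Section>
</Episode>"""

def split_by_section(trs_text):
    root = ET.fromstring(trs_text)
    normal, exceptional = [], []
    for section in root.iter("Section"):
        bucket = exceptional if section.get("topic") == "exception" else normal
        for turn in section.iter("Turn"):
            bucket.append((turn.get("speaker"), turn.text.strip()))
    return normal, exceptional

normal, exceptional = split_by_section(TRS_SAMPLE)
```

Only the Turns in the “exception” bucket need the extra overlap/acknowledgement handling shown in the next slides.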

  6. Example of overlapping

  BEFORE using the “exception” section (DAMSL-like annotation):
  A: That’s
  A: [lovely](1)
  C: [Hello](1)
  A: my name’s Louise Mr Smith and you want to change address?

  AFTER using the “exception” section (DAMSL-like annotation):
  A: That’s [lovely](1) my name’s Louise Mr Smith and you want to change address?
  C: [Hello](1)

  7. Example of acknowledging similar to completion

  BEFORE using the “exception” section (DAMSL-like annotation):
  A: And your telephone number please?
  C: 11111
  A: Uh hmmm
  C: 111
  A: Uh hmmm
  C: 111111

  AFTER using the “exception” section (DAMSL-like annotation):
  A: And your telephone number please? C: 11111 [](1) 111 [](2) 111111
  A: [Uh hmmm](1) [Uh hmmm](2)
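The BEFORE-to-AFTER rewrite on slide 7 can be sketched in a few lines: merge the client’s fragments into one utterance, dropping an indexed empty bracket at each point where an agent acknowledgement overlaps, and collect the acknowledgements into one indexed turn. The turn list is the invented example from the slide; the function name is hypothetical:

```python
# Sketch of the slide-7 transformation: merge client fragments and index
# overlapping agent acknowledgements in a DAMSL-like bracket notation.
before = [
    ("A", "And your telephone number please?"),
    ("C", "11111"),
    ("A", "Uh hmmm"),
    ("C", "111"),
    ("A", "Uh hmmm"),
    ("C", "111111"),
]

def merge_acknowledgements(turns):
    client_parts, acks, idx = [], [], 0
    for speaker, text in turns[1:]:  # skip the opening question
        if speaker == "C":
            client_parts.append(text)
        else:
            idx += 1
            client_parts.append(f"[]({idx})")   # overlap point in C's turn
            acks.append(f"[{text}]({idx})")     # matching acknowledgement
    return [
        turns[0],
        ("C", " ".join(client_parts)),
        ("A", " ".join(acks)),
    ]

for speaker, text in merge_acknowledgements(before):
    print(f"{speaker}: {text}")
```

This reproduces the AFTER view: `C: 11111 [](1) 111 [](2) 111111` and `A: [Uh hmmm](1) [Uh hmmm](2)`.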

  8. Additional facts • Turns that were not considered exceptions were encapsulated by the default Section. • We trained the transcribers to use this logical level, and the last 100 dialogues received are annotated with the “exception” level. • 542 dialogues are not annotated with this level.
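The counts reported across the slides fit together, which a quick sanity check makes explicit (the figures are taken directly from the slides):

```python
# Sanity check of the corpus counts reported in the slides:
# 716 dialogues received, 642 "good", of which 100 carry the
# "exception" annotation level and 542 do not.
received, good = 716, 642
with_exception, without_exception = 100, 542

assert with_exception + without_exception == good
print(f"dialogues excluded as not 'good': {received - good}")  # 74
```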

  9. A rough classification of the corpus

  10. Task distribution inside the 100 exception annotated dialogues

  11. Thank you.
