Corpus Work


  1. Corpus Work. Pragmatics, VT04 (spring term 2004). Staffan Larsson

  2. Why use a corpus? • Find phenomena and patterns • Try to explain these with theory • Test and develop theories • E.g. speech acts: Is the taxonomy of dialogue moves exhaustive? Can it be coded reliably? • Does the coding agree with what the theory predicts? • Find correlations between phenomena (e.g. speech act and intonation) • Dialogue system development • Given a domain, investigate what type of dialogue occurs • Arrive at a reasonable goal for the system based on real data
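
Whether a taxonomy of dialogue moves can be coded reliably is commonly checked by letting two coders label the same utterances and measuring their agreement. The sketch below uses Cohen's kappa, which is not named in the slides and is brought in here as one standard choice; the labels and the two coder lists are invented for illustration.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same utterances."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty list of items")
    n = len(labels_a)
    # Proportion of items on which the coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each coder's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical codings of six utterances (illustrative labels only).
coder_1 = ["question", "answer", "question", "feedback", "answer", "feedback"]
coder_2 = ["question", "answer", "answer", "feedback", "answer", "feedback"]
kappa = cohen_kappa(coder_1, coder_2)
print(round(kappa, 3))  # 0.75
```

Kappa assumes that each utterance receives exactly one label per coder, so it fits schemes with mutually exclusive categories; multi-label schemes need a different measure or one kappa per dimension.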

  3. Purpose of dialogue annotation (Erbach) • Linguistic description and analysis on different levels • Resources for conversation analysis (sociological, socio-linguistic research) • Resources for system engineering (acoustic models, language models) • Resources for application development (Prompts, recognition grammars, dialogue design) • Resources for system evaluation

  4. The use of corpora in dialogue systems development (Jönsson) • Initial design • System development • Fine tuning • Sub-task evaluation • Theoretical development • Evaluation

  5. The sound of dialogue • A820101 Travel Agency Dialogue I (Huppdialogen) • A travel agency customer wants to book a flight to Paris.

  6. The look of dialogue (GTS standard)
     $P: hup
     $J: [1 a:0 ]1
     $P: [1 ö:m ]1 // flyg ti <1 paris >1
     @ <1 name >1
     $J: mm <2 >2 <3 / ska [2 du ha:0 ]2 en0 returbiljett >3
     @ <2 event: P opens her bag >2
     @ <3 event: people are talking in the background >3
     $P: [2 ö:1 ]2
     $P: va1 sa0 du
     $J: ska du ha0 en0 tur å0 retur
     $P: ja0 <4 / >4 ö1
     @ <4 inhalation sound (burping): J >4
     $J: // vicken månad ska du åka
     $P: / <5 <6 >5 >6 ja:0 typ den: ä:1 tredje fjärde <7 <8 april >7 / [3 nån ]3 gång där >8 <9 / >9 så0 billit [4 som möjlit ]4
     @ <5 sigh >5
     @ <6 event: P is looking through some papers >6
     @ <7 name >7
     @ <8 puffing >8
     @ <9 inhalation sound: J >9
     $J: [3 mm ]3
     $J: <10 [4 ja0 just ]4 de0 jo / de0 ha1 ja1 aldri hört förr / de0 billiaste >10 vi0 har <11 e:0 >11 <12 air france >12 ettusenåttahundratie / [5 plus ]5 flygplatsskatter så0 du hamnar på: <13 >13 a0 du kan få0 exakt <14 vänta0 ska du se0 här vi0 gö1 såhär / ö:1 // >14
     @ <10 giggling: P >10
     @ <11 inhalation sound: P >11
     @ <12 name >12
     @ <13 inhalation sound >13
     @ <14 event: J is typing on a computer keyboard >14
     $P: [5 a:0 ]5

  7. The look of dialogue (CLAN standard)
     P:   hu:p
          (0.3)
     ?:   ((br)a[:(
     P:         [ö:m
          (1.4)
     P:   flyg ti pari:s
     J:   mm:
          (0.7) ((P opens her bag))
     P:   °(ö[:)°
     J:      [ö:: >en returbiljett<
          (0.8)
     P:   va sa du?
     J:   ska du ha en tur å retur.
     P:   ja,
     J:   ·h[h
     P:     [ö:h
          (2.3)
     J:   viken månad ska du åka i
          (3.0) ((P is looking through some papers))
     P:   ja: typ den: (0.7) tredje fjärde april h[h °nångång (där°
     J:                                           [ m:m
     J:   ·hh
     P:   så billit som mö[jlit *hhh*
     J:                   [ja just de jo (.) de ha ja aldri hört förr
          (.)
     P:   (m)[(nä)
     J:      [de billiaste vi har [e: >air fra:nce< ettusenåttahundratie.
     P:                           [hh
     P:   [ a:
     J:   [ plus flygplatsskatter så ru hamnar på: ·h (.) a du kan få exakt
          ((vänta ska ru °se här vi gö såhär°
          ((J is typing on a computer keyboard)) (0.5)

  8. no comments
     $P: hup
     $J: [1 a:0 ]1
     $P: [1 ö:m ]1 // flyg ti paris
     $J: mm / ska [2 du ha:0 ]2 en0 returbiljett
     $P: [2 ö:1 ]2
     $P: va1 sa0 du
     $J: ska du ha0 en0 tur å0 retur
     $P: ja0 / ö1
     $J: // vicken månad ska du åka
     $P: / ja:0 typ den: ä:1 tredje fjärde april / [3 nån ]3 gång där / så0 billit [4 som möjlit ]4
     $J: [3 mm ]3
     $J: [4 ja0 just ]4 de0 jo / de0 ha1 ja1 aldri hört förr / de0 billiaste vi0 har e:0 air france ettusenåttahundratie / [5 plus ]5 flygplatsskatter så0 du hamnar på: a0 du kan få0 exakt vänta0 ska du se0 här vi0 gö1 såhär / ö:1 //
     $P: [5 a:0 ]5

  9. no pauses and indices
     $P: hup
     $J: [ a ]
     $P: [ öm... ] flyg till paris
     $J: mm... ska [ du ha ] en returbiljett
     $P: [ ö ]
     $P: vad sa du
     $J: ska du ha en tur och retur
     $P: ja... ö
     $J: vilken månad ska du åka
     $P: ja typ den ä tredje fjärde april... [ nån gång ] där... så billigt som [ möjligt ]
     $J: [ mm ]
     $J: [ ja just ] det jo... det har jag aldrig hört förr... de billigaste vi har är air france ettusenåttahundratie... [ plus ] flygplatsskatter så du hamnar på ja du kan få exakt vänta ska du se här vi gör såhär... ö...
     $P: [ a ]
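
The step-by-step simplification shown in slides 6, 8 and 9 can be approximated mechanically. The following is a minimal sketch, assuming one contribution or comment per line and only the markup visible in the excerpt (`@` comment lines, `<n ... >n` comment spans, `[n ... ]n` overlap indices, `/` pauses, and `:` or digits attached to words). The function names are hypothetical, and the spelling normalisation seen in slide 9 (for example "ti" to "till") is not attempted.

```python
import re

def strip_comments(lines):
    """Drop '@' comment lines and the '<n' / '>n' markers that point to them."""
    result = []
    for line in lines:
        if line.lstrip().startswith("@"):
            continue  # comment line, e.g. "@ <1 name >1"
        line = re.sub(r"[<>]\d+", "", line)   # remove comment span markers
        result.append(" ".join(line.split()))  # collapse leftover whitespace
    return result

def strip_pauses_and_indices(lines):
    """Turn pauses into '...', drop overlap indices and word-level ':' and digits."""
    result = []
    for line in lines:
        speaker, _, content = line.partition(": ")
        content = re.sub(r"\s*/+", "...", content)  # '/' and '//' are pauses
        content = re.sub(r"[:\d]", "", content)      # lengthening marks and indices
        result.append(f"{speaker}: {' '.join(content.split())}")
    return result

sample = [
    "$P: [1 ö:m ]1 // flyg ti <1 paris >1",
    "@ <1 name >1",
    "$J: mm <2 >2 <3 / ska [2 du ha:0 ]2 en0 returbiljett >3",
    "@ <2 event: P opens her bag >2",
]
without_comments = strip_comments(sample)
simplified = strip_pauses_and_indices(without_comments)
for line in simplified:
    print(line)
```

Removing every digit from the content is only safe because the excerpt spells numbers out in words; a corpus with numerals in the running text would need a narrower pattern.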

  10. Types of corpus work • Data collection & transcription • Natural dialogue • Wizard of Oz • Processing • Distillation • Coding • Speech acts • Dialogue games • Information states • NP reference, presupposition, implicature...

  11. Data collection • Natural H-H dialogue (human-human) • Simulated H-M dialogue (human-machine) • "Wizard of Oz" • H-M dialogue with a dialogue system • For further development and debugging

  12. Types of dialogue corpora • Human-Human • Call Home (spontaneous telephone speech) • Map Task (direction giving on a map) • Switchboard (task-oriented human-human dialogues) • Childes (child language dialogues) • Verbmobil (appointment scheduling dialogues) • TRAINS (task-oriented dialogues in railroad freight domain) • Göteborg Spoken Language Corpus (multiple activities) • ATIS (flight reservation dialogues) • Human-Machine • Danish Dialogue System (57 dialogues, domestic flight reservation) • Philips (13500 dialogues, train timetable information) • Sundial (100 Wizard of Oz dialogues, British flight information)

  13. Collecting corpora (Slide borrowed from Arne Jönsson) • Natural dialogues + Natural user tasks and needs + Easy to set up - Not human-computer dialogues • Wizard of Oz-dialogues

  14. Wizard of Oz-simulations (Slide borrowed from Arne Jönsson) [Figure: Subject and Wizard]

  15. Collecting corpora (Slide borrowed from Arne Jönsson) • Natural dialogues + Natural user tasks and needs + Easy to set up - Not human-computer dialogues • Wizard of Oz-dialogues - Artificial task - Resource consuming + Computer-Human interaction

  16. Wizard problems (Slide borrowed from Arne Jönsson) • Consistency • Within dialogues • Between dialogues • Computer vs human • Humans are flexible, computers are rigid • Humans type slowly, computers are fast • Computers never make small mistakes, humans always do

  17. Distilled dialogues (Slide borrowed from Arne Jönsson) • Post-processed human dialogues • Provide insights into natural interaction • Contain fewer human-interaction phenomena • Require an outline of the dialogue system's overall behaviour, capabilities and modalities • Require knowledge of Computer-Human interaction

  18. Distilling guidelines (Slide borrowed from Arne Jönsson) • When to change • How to change • Three types of dialogue contributors • ‘System’ utterances • User utterances • Other

  19. Modifying ‘system’ utterances (Slide borrowed from Arne Jönsson) Depends on the dialogue system • The ‘system’ provides as much relevant information as possible • ‘System’ utterances are made more computer-like • The ‘system’ never repeats information unless explicitly asked to • The ‘system’ does not ask for information it has already obtained

  20. Removing ‘system’ utterances (Slide borrowed from Arne Jönsson) • ‘System’ utterances that are no longer valid are removed • Sequences of non-computer utterances are removed

  21. Modifying user utterances (Slide borrowed from Arne Jönsson) • Change user utterances as little as possible

  22. Removing user utterances (Slide borrowed from Arne Jönsson) • Utterances that are no longer valid are removed • Utterances discussing issues outside the scope of the application are removed

  23. Adding utterances (Slide borrowed from Arne Jönsson) • User and ‘system’ utterances can be added in order to have the dialogue continue
     U: Yees hi Anna Nilsson is my name and I would like to take the bus from Ryd center to Resecentrum in Linköping
     S: mm When do you want to leave?

  24. Natural dialogue (Slide borrowed from Arne Jönsson)
     U4: yes I wonder if you have any mm buses or (.) like express buses leaving from Linköping to Vadstena (.) on Sunday
     S5: no the bus does not run on sundays
     U6: how can you (.) can you take the train and then change some way (.) because (.) to Mjölby 'n' so
     S7: that you can do too yes
     U8: how (.) do you have any such suggestions
     S9: yes let's see (4s) a moment (15s) now let us see here (.) was it on the sunday you should travel
     U10: yes right afternoon preferably
     S11: afternoon preferable (.) you have train from Linköping fourteen twenty nine
     U12: mm
     S13: and then you will change from Mjölby station six hundred sixty
     U14: sixhundred sixty
     S15: fifteen and ten

  25. Distilling (Slide borrowed from Arne Jönsson)
     U4: yes I wonder if you have any mm buses or (.) like express buses leaving from Linköping to Vadstena (.) on Sunday
     S5: no the bus does not run on sundays
     U6: how can you (.) can you take the train and then change some way (.) because (.) to Mjölby 'n' so
     S7: that you can do too yes
     U8: how (.) do you have any such suggestions
     S9: yes let's see (4s) a moment (15s) now let us see here (.) was it on the sunday you should travel
     U10: yes right afternoon preferably
     S11: afternoon preferable (.) you have train from Linköping fourteen twenty nine
     U12: mm
     S13: and then you will change from Mjölby station six hundred sixty
     U14: sixhundred sixty
     S15: fifteen and ten

  26. Distilled dialogue (Slide borrowed from Arne Jönsson)
     U4: yes I wonder if you have any buses or (.) like express buses going from Linköping to Vadstena (.) on Sunday
     S5: no the bus does not run on sundays
     U6: how can you (.) can you take the train and then change some way (.) because (.) to Mjölby and so
     S7: when do you want to leave?
     U8: (..) afternoon preferably
     S9: you can take the train from Linköping fourteen and twenty nine and then you will change at Mjölby station to bus six hundred sixty at fifteen and ten

  27. V8201011 again
     $P: hup
     $J: a
     $P: öm...flyg till paris
     $J: mm... ska [ du ha ] en returbiljett
     $P: [ ö ]
     $P: vad sa du
     $J: ska du ha en tur och retur
     $P: ja... ö...
     $J: vilken månad ska du åka
     $P: ja typ den ä tredje fjärde april... [ nån gång ] där... så billigt som [ möjligt ]
     $J: [ mm ]
     $J: [ ja just ] det jo... det har jag aldrig hört förr... de billigaste vi har är air france ettusenåttahundratie... [ plus ] flygplatsskatter så du hamnar på ja du kan få exakt vänta ska du se här vi gör såhär... ö...
     $P: [ a ]

  28. Slightly distilled A8201011
     $U: hup
     $S: välkommen till resebyrån / vad kan jag stå till tjänst med
     $U: öm...flyg till paris
     $S: mm... ska [ du ha ] en returbiljett
     $U: [ ö ]
     $U: vad sa du
     $S: ska du ha en tur och retur
     $U: ja... ö...
     $S: vilken månad ska du åka
     $U: ja typ den ä tredje fjärde april...[ nån gång ] där så billit [ som möjlit ]
     $S: [ mm ]
     $S: [ ja just ] de.. det billigaste vi har är air france ettusenåttahundratie plus flygplatsskatter... för denna biljett krävs internationellt studentkort / har du det

  29. Very distilled version of A821011
     $S Välkommen till resebyrån
     $U flyg till paris
     $S varifrån vill du åka?
     $U köpenhamn
     $S vill du ha en returbiljett?
     $U va sa du?
     $S vill du ha en returbiljett?
     $U ja
     $S vilken månad vill du resa?
     $U tredje fjärde april, så billigt som möjligt
     $S har du internationellt studentkort?
     $U nä
     $S då blir det 1810 kronor.

  30. What is changed? (Slide borrowed from Arne Jönsson) • Removed • Utterances containing already provided information • Added • Utterances explicitly asking for information • Modified • Hesitations, pauses
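
The three kinds of change (removed, added, modified) can be recorded as explicit edit operations on a list of turns, which keeps the distillation traceable back to the natural dialogue. This is only a sketch under assumptions: the function `distill` and its parameters are hypothetical, and the example reuses turns U6 to U10 of the bus dialogue in slides 24 to 26.

```python
def distill(turns, removed=(), modified=None, added=None):
    """Apply distillation edits to a dialogue.

    turns:    list of (turn_id, speaker, text) from the natural dialogue
    removed:  turn ids to drop
    modified: mapping turn_id -> replacement text
    added:    mapping turn_id -> list of (speaker, text) inserted after that turn
    """
    modified = modified or {}
    added = added or {}
    result = []
    for turn_id, speaker, text in turns:
        if turn_id not in removed:
            result.append((speaker, modified.get(turn_id, text)))
        # Added utterances are placed after their anchor turn,
        # whether or not the anchor itself was kept.
        result.extend(added.get(turn_id, []))
    return result

natural = [
    ("U6", "U", "how can you (.) can you take the train and then change some way (.) because (.) to Mjölby and so"),
    ("S7", "S", "that you can do too yes"),
    ("U8", "U", "how (.) do you have any such suggestions"),
    ("S9", "S", "yes let's see (4s) a moment (15s) now let us see here (.) was it on the sunday you should travel"),
    ("U10", "U", "yes right afternoon preferably"),
]
distilled = distill(
    natural,
    removed={"S7", "U8", "S9"},
    modified={"U10": "(..) afternoon preferably"},
    added={"U6": [("S", "when do you want to leave?")]},
)
for speaker, text in distilled:
    print(f"{speaker}: {text}")
```

The decisions themselves (what to remove, add or modify) remain manual judgments guided by the distilling guidelines; the code only applies them.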

  31. Using distilled dialogues (Slide borrowed from Arne Jönsson) • System development • Fine tuning • Task analysis • Analysis of sub-dialogues • Evaluation • Not an accurate model of the global dialogue structure • Education

  32. Development of dialogue systems requires valid corpus data • Natural dialogues do not capture human-computer interaction • Wizard of Oz-dialogues have artificial tasks • Distilled dialogues fill the gap between natural dialogues and Wizard of Oz-dialogues

  33. Levels of Annotation (slide borrowed from Gregor Erbach) • phonetic / phonological / orthographic • prosody • morphology / syntax / semantics • co-reference • dialogue acts • turn-taking • cross-level • acoustic (noise, phone line characteristics) • communication problems • speech recognition results (human-machine dialogues)

  34. Some coding schemes for speech acts/dialogue moves • LINLIN: Linköping, Ahrenberg et al. 1995 • HCRC: Developed for the Map Task Corpus, Anderson et al. 1991 • DAMSL: By the Discourse Resource Initiative as a standardized coding scheme, 1997 • SWBD-DAMSL: Modified DAMSL by Stolcke et al. 2000 • GBG: Communicative Acts by Allwood 2000

  35. Properties of dialogue act coding schemes (slide borrowed from Leif Grönqvist) • How general is it? • Is it powerful enough for natural dialogue? • Does the scheme handle different modalities? • Are the definitions precise enough to make the scheme useful in dialogue systems? • Multifunctional codings • Mutually exclusive categories • Discontinuous codings • Relational codings • Hierarchical coding values • Multi-layer scheme

  36. Map Task Corpus (slide borrowed from Gregor Erbach) • Map Task is a cooperative task involving two participants who sit opposite one another and each has a map which the other cannot see • One speaker (Instruction Giver) has a route marked on her map; the other speaker (Instruction Follower) has no route • Speakers are told that the goal is to reproduce the Instruction Giver's route on the Instruction Follower's map • Speakers know that the maps are not identical • 128 digitally recorded unscripted dialogues and 64 citation form readings of lists of landmark names • Transcriptions and a wide range of annotations are available as XML documents • Separation of corpus and annotation

  37. Dialogue Moves (MapTask) (slide borrowed from Gregor Erbach) • Six initiating moves • instruct - commands the partner to carry out an action • explain - states information which has not been elicited by the partner • check - requests the partner to confirm information • align - checks the attention or agreement of the partner • query-yn - asks a question which takes a "yes" or "no" answer • query-w - any query which is not covered by the other categories • One pre-initiating move • ready - a move which occurs after the close of a dialogue game and prepares the conversation for a new game to be initiated

  38. (slide borrowed from Gregor Erbach) • Five response moves: • acknowledge - a verbal response which minimally shows that the speaker has heard the move to which it responds • reply-y - any reply to any query with a yes/no surface form which means "yes", however that is expressed • reply-n - a reply to a query with a yes/no surface form which means "no" • reply-w - any reply to any type of query which doesn't simply mean "yes" or "no" • clarify - a repetition of information which the speaker has already stated, often in response to a check move
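
The twelve MapTask moves listed in slides 37 and 38 fall into three groups, which can be written down as a small lookup table. This is an illustrative sketch; the names `MoveType`, `MAP_TASK_MOVES` and `moves_of_type` are invented here and are not part of the MapTask distribution.

```python
from enum import Enum

class MoveType(Enum):
    INITIATING = "initiating"
    PRE_INITIATING = "pre-initiating"
    RESPONSE = "response"

# Move names as given on the slides, mapped to their group.
MAP_TASK_MOVES = {
    "instruct": MoveType.INITIATING,
    "explain": MoveType.INITIATING,
    "check": MoveType.INITIATING,
    "align": MoveType.INITIATING,
    "query-yn": MoveType.INITIATING,
    "query-w": MoveType.INITIATING,
    "ready": MoveType.PRE_INITIATING,
    "acknowledge": MoveType.RESPONSE,
    "reply-y": MoveType.RESPONSE,
    "reply-n": MoveType.RESPONSE,
    "reply-w": MoveType.RESPONSE,
    "clarify": MoveType.RESPONSE,
}

def moves_of_type(move_type):
    """Return the move names belonging to one group, alphabetically."""
    return sorted(name for name, kind in MAP_TASK_MOVES.items() if kind is move_type)

print(moves_of_type(MoveType.RESPONSE))
```

A table like this can be used to validate annotations, for instance to reject a tag that is not one of the twelve moves.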

  39. Sample MapTask annotation
     *g Right, em, go to your right towards the carpenter’s house [INSTRUCT]
     *f Alright || well I’ll need to go below. I’ve got a blacksmith marked [ACKNOWLEDGE, EXPLAIN]
     *g Right, well you do that [ACKNOWLEDGE]
     *f Do you want it to go below the carpenter? [QUERY-YN]
     *g No, I want you to go up the left hand side of it towards... [REPLY-N]
     ...
     *f Right [ACKNOWLEDGE]
     Games marked on the slide: Explain game, Query game, Instruct game
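
Annotated lines in the style of the sample (speaker marker, utterance text, move tags in square brackets) are easy to read back into a program, for example to count how often each move occurs. The sketch below assumes exactly that one-line layout with a leading `*g` or `*f`; the actual MapTask annotations are distributed as XML, so this format and the function name are assumptions made for illustration.

```python
import re
from collections import Counter

# "*g <text> [TAG, TAG]": speaker marker, utterance text, bracketed move tags.
LINE_PATTERN = re.compile(r"^\*(?P<speaker>[gf])\s+(?P<text>.*?)\s*\[(?P<moves>[^\]]*)\]$")

def parse_annotated_line(line):
    """Return (speaker, text, [moves]) for one annotated line."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not an annotated line: {line!r}")
    moves = [m.strip().lower() for m in match.group("moves").split(",") if m.strip()]
    return match.group("speaker"), match.group("text"), moves

sample = [
    "*g Right, em, go to your right towards the carpenter's house [INSTRUCT]",
    "*f Alright || well I'll need to go below. I've got a blacksmith marked [ACKNOWLEDGE, EXPLAIN]",
    "*g Right, well you do that [ACKNOWLEDGE]",
    "*f Do you want it to go below the carpenter? [QUERY-YN]",
]
parsed = [parse_annotated_line(line) for line in sample]
move_counts = Counter(move for _, _, moves in parsed for move in moves)
print(move_counts.most_common())
```

Note that the second line carries two moves, so counts are per move rather than per utterance.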

  40. Speech act coding: DAMSL • Dialogue Act Markup in Several Layers • draft, by DRI (Discourse Resource Initiative) • Task oriented dialogue, two participants • agents collaborate to solve some problem • Concepts: • turn: units in which a single speaker has temporary control of the dialogue and speaks/writes for some period of time • utterance: unit whose definition is based on analysis of speaker intention (speech act) • segment: a continuous group of utterances

  41. Examples from TRAINS corpus • Dialogue participants (DPs) collaborate in planning how to ship oranges by train

  42. More complex example • a multi-utterance segment with speech act tag

  43. Multiple layers: • each utterance (or segment) is annotated along several independent (”orthogonal”) dimensions • Uncertainty modifier (?) • If coder is unsure • Utterance tags • Communicative Status • Information Level • Forward Looking Function • Backward Looking Function

  44. Communicative-status • Uninterpretable • The utterance unit is not comprehensible. • Abandoned • the import of the dialog would not change if these utterance units were removed • Self-talk • The utterance unit consists of one speaker talking to him or herself.

  45. Information-Level • Task • ”Doing the task” • Task-management • ”Talking about the task” • Communication-management • ”Maintaining the communication” • Other-level
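
Because the DAMSL layers are independent, one annotation can be modelled as a record with one field per layer plus the uncertainty flag. The sketch below is an assumption-laden illustration: the class and field names are invented, the value sets are copied from slides 44 and 45, and treating Communicative Status as optional (set only when one of the three listed statuses applies) is an assumption not stated on the slides.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

COMMUNICATIVE_STATUS = {"uninterpretable", "abandoned", "self-talk"}
INFORMATION_LEVEL = {"task", "task-management", "communication-management", "other-level"}

@dataclass(frozen=True)
class DamslAnnotation:
    utterance_id: str
    information_level: str
    communicative_status: Optional[str] = None       # assumption: None if no status applies
    forward_functions: FrozenSet[str] = frozenset()   # several may hold at once
    backward_functions: FrozenSet[str] = frozenset()
    uncertain: bool = False                           # the "?" modifier

    def __post_init__(self):
        if self.information_level not in INFORMATION_LEVEL:
            raise ValueError(f"unknown information level: {self.information_level}")
        if (self.communicative_status is not None
                and self.communicative_status not in COMMUNICATIVE_STATUS):
            raise ValueError(f"unknown communicative status: {self.communicative_status}")

annotation = DamslAnnotation(
    utterance_id="utt1",
    information_level="task",
    forward_functions=frozenset({"action-directive"}),
)
print(annotation)
```

Using sets for the forward and backward layers reflects that an utterance can achieve several effects simultaneously.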

  46. Forward Looking Function (FLF) • This dimension characterizes what effect an utterance has on the subsequent dialogue and interaction. • For instance, as the result of an utterance, is the speaker now committed to certain beliefs, or to performing certain future actions? • Annotators are allowed to look ahead in the dialog to determine the effect an utterance has on the dialog • Often, there are many different effects simultaneously achieved by an utterance. • To allow for this, the coding in this dimension allows eight different aspects of every utterance to be coded

  47. FLF tree (slide borrowed from Gregor Erbach)

  48. Intuitive test: whether the utterance could be followed by "That's not true". • "Let's take the train from Dansville" • presupposes that there is a train at Dansville, • but this utterance is not considered a statement. • You couldn't coherently reply to this suggestion with "That's not true". • Statement • Assert • Reassert • Other-statement

  49. Influencing-addressee-future-action • Open-option (offer) • "how about going through Corning" • Action-directive (request) • "Move the train to Dansville" • "Please speak more slowly" • Rough test: whether the hearer could coherently respond with "I can't do that"

  50. Not responding to...
     • Action-directive would be considered to be rude
     • Open-option need not have any negative effect since no obligation (beyond normal conversational constraints) is placed on the listener
     • For example, the first utterance below is an Open-option (abbreviated here as OO) because B does not need to address it and can coherently answer with utt2.
       utt1  OO          A: There is an engine in Elmira
       utt2  Action-dir  B: Let's take the engine from Bath.
     • On the other hand, in the following example utt1 is an Action-directive and B should explicitly refuse the suggestion if it is not adopted.
       utt1  Action-dir    A: Let's use the engine in Elmira.
       utt2  Reject(utt1)  B: No
       utt3  Action-dir    B: Let's take the engine from Bath.
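
The difference in obligation between the two tags can be turned into a simple consistency check on a coded dialogue: every Action-directive should be addressed by the other speaker, while an Open-option may go unanswered. The following is a minimal sketch with hypothetical names (`Utterance`, `unaddressed_action_directives`); how responses are linked to their antecedents is an assumption modelled on the `Reject(utt1)` notation in the example.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Utterance:
    utt_id: str
    speaker: str
    forward: Optional[str] = None      # e.g. "open-option", "action-directive"
    backward: Optional[str] = None     # e.g. "reject", "accept"
    responds_to: Optional[str] = None  # id of the utterance being addressed

def unaddressed_action_directives(dialogue: List[Utterance]) -> List[str]:
    """Ids of Action-directives that no later utterance by another speaker addresses."""
    pending = []
    for index, utt in enumerate(dialogue):
        if utt.forward != "action-directive":
            continue  # Open-options and other tags create no obligation
        addressed = any(
            later.responds_to == utt.utt_id and later.speaker != utt.speaker
            for later in dialogue[index + 1:]
        )
        if not addressed:
            pending.append(utt.utt_id)
    return pending

# The second example from the slide: A's directive is rejected, B's is still open.
second_example = [
    Utterance("utt1", "A", forward="action-directive"),
    Utterance("utt2", "B", backward="reject", responds_to="utt1"),
    Utterance("utt3", "B", forward="action-directive"),
]
print(unaddressed_action_directives(second_example))  # ['utt3']
```

In the excerpt utt3 is simply the last utterance shown, so its appearance in the result means "not yet addressed", not that a norm has been violated.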
