
Assessing results?

Measuring transaction success in spoken dialogue information systems. Hans Dybkjær, SpeechLogic™, Prolog Development Center A/S, and Laila Dybkjær, NISLab, University of Southern Denmark.


Presentation Transcript


  1. Measuring transaction success in spoken dialogue information systems • Hans Dybkjær, SpeechLogic™, Prolog Development Center A/S • Laila Dybkjær, NISLab, University of Southern Denmark

  2. Assessing results? • Subjective listening • Fine and important • Not suitable for contracts • Not suited for tracing progress • Very dependent on mood of caller • Transcript walkthroughs • Fine, provides many observations • Not suitable for contracts • Not suited for tracing progress • Transaction coding • Suitable for contracts • Suitable for tracing progress? • Huge work...

  3. Project and partners • Holiday Account (“FerieKonto”) spoken dialogue service via the telephone • September 2001 – December 2002 • Supported by the Danish government • Three Danish partners: • NISLab, SDU • Prolog Development Center A/S (PDC) • ATP-huset (hosts FerieKonto and other funds) • Employers pay 700 million kr. to FerieKonto per year • “General information” was selected about 12,000 times per year in the old touch-tone system • Philips Speech Processing is subcontractor to PDC

  4. Facts on FAQ • Phase 1, called “Vejled”, in operation since September • Phase 2, FAQ, in operation mid-December 2002 • Dialogue model • About 40 A4 pages • 80 semantic concepts in input • 100+ different information stories in output • About 800 (full) words in vocabulary • About 2500 grammar lines • Context-free with synthesized attributes • 450 pre-recorded phrases, many long

  5. Characteristics • System takes initiative and guides user • User may take initiative and control system • Barge-in, i.e. the user may interrupt the system • But we don’t know where: for long outputs we don’t know how much of the logged output the user has heard • Whatever the user says is recognised as something within the system vocabulary and grammar • No sound output logged, only user input

  6. Transactions • No clear definition of transaction • One dialogue may be one transaction (e.g. ticket reservation or train information) • One dialogue may contain several different transactions (e.g. frequently asked questions) • A simple way of looking at transactions: • Start • End (success, failure) • Relate these to dialogue acts
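The start/end view of a transaction can be sketched in code. This is a minimal illustration, not the coding tool used in the project: it assumes each turn carries at most one transaction tag ("Start", "Success" or "Failure"), and the `Transaction` class, the `extract_transactions` function and the "Open" outcome for a transaction that never reaches an end tag are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transaction:
    start_turn: int           # index of the turn tagged "Start"
    end_turn: Optional[int]   # index of the closing turn, None if never closed
    outcome: str              # "Success", "Failure", or "Open" (assumed label)


def extract_transactions(tags: List[Optional[str]]) -> List[Transaction]:
    """Pair each "Start" tag with the next "Success" or "Failure" tag."""
    transactions: List[Transaction] = []
    open_start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "Start":
            if open_start is not None:
                # Assumption: a new Start before an end tag leaves the
                # previous transaction unresolved.
                transactions.append(Transaction(open_start, None, "Open"))
            open_start = i
        elif tag in ("Success", "Failure") and open_start is not None:
            transactions.append(Transaction(open_start, i, tag))
            open_start = None
    if open_start is not None:
        # The dialogue ended (e.g. hangup) before the transaction closed.
        transactions.append(Transaction(open_start, None, "Open"))
    return transactions


# Invented tag sequence: one tag (or None) per turn of a single dialogue.
example_tags = [None, "Start", None, "Success", "Start", "Failure", "Start"]
for t in extract_transactions(example_tags):
    print(t)
```

Keeping the turn indices makes it possible to relate each transaction back to the dialogue acts of the turns it spans, and one dialogue can yield several transactions, as in a FAQ call.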

  7. Examples • Success: U: What is your fax number? S: Fax number ... • Failure: U: What is your fax number? S: E-mail address ... • Wrong = unwanted reply: S: Do you want our address? U: No. S: Our address is ... (user gets unwanted information – not a transaction) • Wrong = erroneous information: S: Fax number 36 36 00 00 (actually PDC’s fax is 36 36 00 01) • (‘Wrong’ is outside the transaction scheme)

  8. Dialogue acts

  9. Resulting tag set

  10. An example dialogue
     • S 0.1: - Welcome to FerieKonto's spoken guidance system. - Here you can get standard information about the payment of holiday allowance. - You can always ask for help or repetition.
     • S 0.2: - Say whether you - are an employee or unemployed in Denmark - are deregistered from the national register and live abroad - or have left the labour market because of age or health. - PAUSE5000
     • U 0.1: I am an employee
     • S 1.1: - If you are an employee in Denmark or unemployed and have a FerieKonto certificate
     • S 1.2: - you must fill it in and send it to us. - Besides yourself, your employer, your unemployment insurance fund or the social services department must sign. - PAUSE200
     • S 1.3: - If the FerieKonto certificate is filled in correctly when we receive it, we transfer the holiday allowance to your account. - The processing time is about five working days. - However, you can get your holiday allowance no earlier than one month before your first day of holiday. - PAUSE1000
     Annotation tags on the slide: (Opening) (Offer) Start (Select) (Feedback) (Information) (Information) Success

  11. [An example dialogue, continued]
     • S 1.4: - Do you want to hear the information again? - PAUSE5000
     • U 1.1: no thanks
     • S 2.1: - Do you want to hear FerieKonto's address? - PAUSE5000
     • U 2.1: no thanks
     • S 3.1: - Do you want to go back to the start of the general guidance? - PAUSE5000
     • Event 1: nothing recorded
     • S 4.1: - Sorry. - I did not hear what you said. - Say 'yes' to start over, 'goodbye' to leave the system. - PAUSE5000
     • U 4.1: no thanks
     • S 5.1: - Sorry. - The system can give information about payment if you are an employee or unemployed, are deregistered from the national register, or have left the labour market because of age or health. You can also ask for FerieKonto's address. - Say one of these options if you want to continue, or say 'goodbye' to end. - PAUSE5000
     • Event 2: nothing recorded
     • S 6.1: - Goodbye and thank you.
     • Event 3: hangup
     Annotation tags on the slide: Offer (End) Reject Offer Reject Offer (Other) Repair Reject Repair (Other) (Closing) (Other)

  12. Another example dialogue
     • S 0.1: - PAUSE1000 - Welcome to FerieKonto's automatic guidance. - You can let the system suggest topics or you can try using your own words. - You choose topics by speaking to the system. - Choose between the topics: - PAUSE200 - 'employee'. - PAUSE200 - 'On leave'. - PAUSE200 - 'living abroad'. - PAUSE200 - Or 'left the labour market' because of 'age' or 'health'. - PAUSE500 - Say a topic, say 'repeat' or ask for 'more options'?
     • U 0.1: I am a student
     • S 1.1: - Did you ask about - Main menu
     • U 1.1: student
     • S 2.1: - If you are a student and receive SU (the Danish state education grant), you can have your holiday allowance paid out even if you do not take time off from your studies. - But if you have a job alongside your studies, you must take time off from your job to have the holiday allowance paid out.
     • S 2.2: - If you have a job, it is your employer who must sign your FerieKonto certificate; otherwise it is the social services department in your municipality.
     • S 2.3: - PAUSE500
     Annotation tags on the slide: (Opening) Offer (Request) Start Check Repair (Information) Success (Information)

  13. Another, continued
     • S 2.4: - Is there anything else you would like to ask about?
     • U 2.1: should I write Saturdays on it
     • S 3.1: - From the holiday year 2002, which starts on 1 May, the holiday week has been changed to 5 holiday days, so Saturdays should no longer be counted as part of the holiday.
     • S 3.2: - Do you want to know something about 'the holiday year'?
     • U 3.1: no goodbye
     • S 4.1: - Goodbye and thank you for your call.
     • Event 1: nothing recorded
     • Event 2: disconnect
     Annotation tags on the slide: Offer Start (Request) (Information) Success Offer (Reject + request) Start (Feedback) Success (Other) (Other)

  14. Transaction annotated data • Dataset: • Vejled: A few thousand calls • About 500 FAQ test calls • Test: 225 calls, three batches, March-May 2002 • Batch 1 primarily developers • Batch 2, 3 “invited” test persons • Operation: 217 calls, one week, September 2002 • real customers with real problems

  15. Annotation • Transcribed using Philips Transcription Station • Then transformed to XML and web • Markup was done using an annotation tool developed by PDC • Interface is a browser window • Annotation files stored in XML • All dialogues annotated by the same experienced coder, using the same coding scheme throughout
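To make the idea of XML-stored annotations concrete, here is a small sketch of reading tagged turns back out of an annotation file. The slides do not show the file format of the PDC tool, so the element and attribute names (`dialogue`, `turn`, `id`, `speaker`, `tag`), the `read_turns` helper and the placement of tags on particular turns are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation file content; the real schema is not shown in the slides.
SAMPLE = """<dialogue id="example">
  <turn id="U 0.1" speaker="U" tag="Start">What is your fax number</turn>
  <turn id="S 1.1" speaker="S" tag="Success">Fax number 36 36 00 01</turn>
  <turn id="S 1.2" speaker="S">Is there anything else?</turn>
</dialogue>"""


def read_turns(xml_text: str) -> list:
    """Return one dict per <turn> element, in document order."""
    root = ET.fromstring(xml_text)
    turns = []
    for el in root.iter("turn"):
        turns.append({
            "id": el.get("id"),
            "speaker": el.get("speaker"),
            "tag": el.get("tag"),            # None when the turn is untagged
            "text": (el.text or "").strip(),
        })
    return turns


for turn in read_turns(SAMPLE):
    print(turn["id"], turn["tag"], turn["text"])
```

Storing the tag as an attribute on each turn keeps the transcription text untouched, so the same file can be rendered in a browser and processed for statistics.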

  16. Results table

  17. Results comments • Higher transaction success in test dialogues • Primary causes of failure in test sets are: • Dialogue model • Language model • Causes corrected before operation • Difference in user groups • Test users follow the dialogue; they only have artificial problems • Primary causes of failure in operational calls are: • Real customers ask for information not covered • Typical questions to be covered by FAQ • Problem with callers hanging up without saying anything in the dialogue.
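Comparing transaction success between test and operational calls comes down to a simple ratio per dataset. The sketch below assumes success rate is defined as successes divided by completed transactions (successes plus failures); the slides do not state the exact formula. The `success_rate` function is a hypothetical helper, and the outcome lists are invented toy data, not the figures from the results table.

```python
from typing import Iterable


def success_rate(outcomes: Iterable[str]) -> float:
    """Share of completed transactions that ended in "Success".

    Outcomes other than "Success" and "Failure" are ignored (assumption).
    """
    closed = [o for o in outcomes if o in ("Success", "Failure")]
    if not closed:
        raise ValueError("no completed transactions")
    return sum(o == "Success" for o in closed) / len(closed)


# Invented toy outcomes, for illustration only.
toy_datasets = {
    "test": ["Success", "Success", "Success", "Failure"],
    "operation": ["Success", "Failure"],
}

for name, outcomes in toy_datasets.items():
    print(f"{name}: {success_rate(outcomes):.2f}")
```

Because the rate is computed per dataset from the same tag set, it can be tracked across batches, which is what makes transaction coding usable for tracing progress.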

  18. Smooth dialogues • More precise overview of problems and their causes and seriousness • The same topic may both fail and succeed in the same call • Few or many repairs • Distinction between unwanted and erroneous information • Erroneous information is unacceptable (tomorrow is Friday, phone 36 36 00 01) • Other information than asked for may be more or less serious (fax instead of phone, fax instead of email) • Misunderstanding a yes for a no is usually not so serious (repairable) but can be a nuisance • Misrecognitions • Information blocks may contain more than asked for
