
Extracting Information from Spoken User Input: A Machine Learning Approach






Presentation Transcript


  1. Extracting Information from Spoken User Input: A Machine Learning Approach Piroska Lendvai, Tilburg University, Netherlands

  2. Outline • The community: ML as a tool to detect DAs, semantic content, communication problems in dialogue • Tilburg University, Induction of Linguistic Knowledge group: • memory-based learning software package TiMBL • joint work with Antal van den Bosch & Emiel Krahmer (2001-2004) • TiMBL applied to Dutch human-machine dialogue data • Extract pragmatic and semantic components from spoken user input • Drawing on simple, potentially erroneous info from the SDS • Are the components better classified individually or combined?

  3. • Spoken dialogue system pipeline: speech recogniser → language understanding → dialogue manager → database → answer generation → speech synthesis • User: “I would like to travel from Amsterdam to Tilburg next Tuesday” • System: “From where to where would you like to travel on Tuesday twelve December?”

  4. OVIS corpus example • S1: Good evening. From which station to which station do you want to travel? • U1: I need to go from Amsterdam to Tilburg on Tuesday next week. • S2: From where to where do you want to travel on Tuesday twelve December? • U2: From Amsterdam to Tilburg. • S3: At what time do you want to travel from Amsterdam to Tilburg? • U3: Around quarter past eleven in the evening. • (…) • S5: I have found the following connections: (…). Do you want me to repeat the connection? • U5: Please do. • …

  5. The OVIS system • Developed in 1995-2000, Dutch national project, train travel information • Slots to fill: DepartStation, ArriveStation, Dep/ArrivDay, Dep/ArrivTime • Dialogue structure: possibility to provide unsolicited info • System-initiative • System always verifies received info • Explicitly (“So you want to leave on Thursday.”) or • Implicitly (“At what time do you want to leave on Thursday?”) • 80 test users, noisy real-data corpus from different system versions • 441 full dialogues: 3738 turn pairs of system prompt & user reply • 43% of user turns inaccurately grounded by system • 8-26% WER

  6. Goal: Partial interpretation of user input • Drawing on attributes available from the system’s modules, extract information from the user’s input turn • task-related dialogue act (TRA) supercategories for info-seeking dialogues (8 application-motivated classes) • query slot types being filled (30) • does the current user turn originate future communication problems? (binary) • in the current turn, is the user already aware of communication problems? (binary) • Facilitate full understanding • ASR to have more confidence in accepting/rejecting recognition hypotheses; recognition of ‘yes’ and ‘no’ is highly erroneous; TRAs: Affirm, Negat • DM to launch error recovery

  7. Partial interpretation components • “From Amsterdam to Tilburg on Tuesday next week.” • “Not this Tuesday but next Tuesday.” • Task-related dialogue act: Slot-filling / Negation • (others: Affirmative, AcceptSysError, NonStd) • Slot(s) being filled by user: DepartStat, ArriveStat, ArriveDay / ArriveDay • (always co-occur with Slot-filling TRA) • Problem origin turn? Yes / No • predicting miscommunication • Problem awareness turn? No / Yes • detecting miscommunication

  8. Annotated user turns • S1: Good evening. From which station to which station do you want to travel? • U1: I need to go from Amsterdam to Tilburg on Tuesday next week. • SlotFill, DepSt_ArrSt_ArrDay, Prob, Ok • S2: From where to where do you want to travel on Tuesday twelve December? • U2: From Amsterdam to Tilburg. • SlotFill, DepSt_ArrSt, Ok, Prob • S3: At what time do you want to travel from Amsterdam to Tilburg? • U3: Around quarter past eleven in the evening. • SlotFill, ArrTimeofDay_ArrHour, Ok, Ok • S5: I have found the following connections: (…). Do you want me to repeat the connection? • U5: Please do. • Affirm, void, Ok, Ok

  9. Speech recognition • utterance → speech recogniser (ASR) → word graph → lattice paths • ASR strategy: pick best path (confidence, language model) • There can be distortions and deletions in a path • ‘from amsterdam to tilburg tuesday’ (deletion) • ‘from amsterdam to tilburg next tuesday’ (correct) • ‘to amsterdam to tilburg next tuesday’ (distortion) • Wrong pick = possibly severe information loss • [word-graph edges with confidences: from amsterdam /2473, next /120.5, tuesday /297, to tilburg /158, to amsterdam /549, tuesday /3481.5]
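The “pick best path” strategy can be sketched as a simple maximum over scored paths. This is a toy illustration, not the OVIS decoder: the scores and word sequences below are made up for the example.

```python
# Toy sketch: the ASR picks the lattice path with the highest confidence.
# Scores and word sequences are illustrative, not real ASR output.
def best_path(paths):
    """paths: list of (confidence, word_list); return the highest-scoring words."""
    return max(paths, key=lambda p: p[0])[1]

paths = [
    (2473.0, ["from", "amsterdam", "to", "tilburg", "tuesday"]),
    (549.0,  ["to", "amsterdam", "to", "tilburg", "next", "tuesday"]),
]
print(best_path(paths))  # -> ['from', 'amsterdam', 'to', 'tilburg', 'tuesday']
```

Note how a wrong pick (here, the winning path is missing “next”) silently propagates information loss to the understanding components downstream.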

  10. Informativity of ASR output • ASR hypothesis lattice paths with associated confidences • User input: ‘rond vier uur’; around four o’clock • 856.109985 [1] <n> [2] #PAUSE# [4] op [7] drie [12] uur [13] #PAUSE# [14] on three o’clock 855.930054 [1] <n> [2] #PAUSE# [4] om [5] tien [12] uur [13] #PAUSE# [14] 855.430054 [1] <n> [2] #PAUSE# [4] om [5] drie [12] uur [13] #PAUSE# [14] 855.330017 [1] <n> [2] #PAUSE# [3] half [7] drie [12] uur [13] #PAUSE# [14] 855.140015 [1] <n> [2] #PAUSE# [3] half [8] #PAUSE# [9] vier [10] uur [13] #PAUSE# half four o’clock 855.109985 [1] <n> [2] #PAUSE# [3] half [8] tien [10] uur [13] #PAUSE# [14] 854.890015 [1] <n> [2] #PAUSE# [3] half [8] #PAUSE# [9] vier [11] uur [13] #PAUSE# half four o’clock 854.880005 [1] <n> [2] #PAUSE# [4] om [5] dertien [10] uur [13] #PAUSE# [14] 854.470032 [1] <n> [2] #PAUSE# [4] om [6] drie [10] uur [13] #PAUSE# [14] 853.880005 [1] <n> [2] #PAUSE# [4] om [5] drie [10] uur [13] #PAUSE# [14] at three o’clock ...... 853.869995 [1] <n> [2] #PAUSE# [4] rond [9] vier [11] uur [13] #PAUSE# [14] around four o’clock

  11. Utilise full ASR hypotheses: Bag-of-Words • BoW representation successful in information retrieval, may work for SLU • ignores information on order, frequency, probability of recognised words • robustly characterises the utterance • Isolated paths may partially or incorrectly contain the uttered words; the full lattice may contain them all (plus incorrect ones) • The entire ASR hypothesis lattice is encoded as a binary vector that indicates, for every word in the lexicon, whether it was hypothesised by the ASR • 759 bits, on average 7.5 active • vector bits: …around, at, four, o’clock, on, ten, three, tonight, travel…. • binary BoW: …1, 1, 1, 1, 1, 1, 0, 0…
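The binary lattice encoding described above can be sketched in a few lines; the lexicon and hypothesis words here are illustrative, not the 759-word OVIS lexicon.

```python
# Sketch of the binary Bag-of-Words encoding of an ASR hypothesis lattice:
# bit i is 1 iff lexicon word i was hypothesised anywhere in the lattice,
# ignoring word order, frequency, and path probabilities.
def bag_of_words(lattice_words, lexicon):
    hypothesised = set(lattice_words)
    return [1 if word in hypothesised else 0 for word in lexicon]

lexicon = ["around", "at", "four", "half", "o'clock", "ten", "three", "tonight", "travel"]
lattice = ["at", "three", "o'clock", "half", "ten", "four", "around"]
print(bag_of_words(lattice, lexicon))  # -> [1, 1, 1, 1, 1, 1, 1, 0, 0]
```

Because every hypothesised word in the whole lattice sets a bit, the encoding is robust: the uttered words usually appear somewhere in the lattice even when the single best path misses them.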

  12. Machine learning experiments • Task Given a user turn in its preceding dialogue context, assign one partial interpretation tag to it • Method Interpretation components alone/composed? • Experimentally search for optimal subtask combinations • Understanding levels possibly interact with each other • [TaskRelAct + Slots + ProbOrig + ProbAwar], up to 148 concatenated classes from combined components • Tool Memory-based learner (MBL) • lazy learning: examples are stored in memory • classification: class is extrapolated from the k nearest neighbours • distance from neighbours is the sum of (weighted) feature differences • parameter settings optimised with search heuristics (Van den Bosch, 2004)
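A minimal sketch of the memory-based (lazy) classification idea follows. This is not TiMBL itself: the overlap distance, the feature weights, and k are simplified assumptions, and the symbolic features are hypothetical.

```python
from collections import Counter

# Lazy learning: training examples are simply stored in memory; at
# classification time the class is extrapolated from the k nearest
# neighbours under a weighted feature-difference (overlap) distance.
def overlap_distance(x, y, weights):
    # Sum of feature weights where the symbolic feature values mismatch.
    return sum(w for xi, yi, w in zip(x, y, weights) if xi != yi)

def mbl_classify(memory, labels, query, weights, k=3):
    ranked = sorted(range(len(memory)),
                    key=lambda i: overlap_distance(memory[i], query, weights))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical symbolic features: (prompt type, turn-length class)
memory = [["Q-DepArr", "short"], ["Q-DepArr", "long"], ["Q-Time", "short"]]
labels = ["SlotFill", "SlotFill", "Affirm"]
print(mbl_classify(memory, labels, ["Q-DepArr", "short"], weights=[2.0, 1.0]))
# -> SlotFill
```

Combining subtasks simply means concatenating their labels (e.g. "SlotFill_DepSt_Ok_Ok") and letting the same classifier predict the composite class.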

  13. Knowledge sources for MBL • From User: Speech signal measurements • Prosody: turn duration, pitch F0 min/max/avg/stdev, energy RMS max/avg/stdev, tempo syll/sec, turn-initial pause • ASR output: confidence scores, best string, word graph branching, Bag-of-Words • From System: Dialogue context • Prompt wording as BoW • Prompt type history: 10 system prompts represented as structured symbols • “From where to where do you want to travel on Tuesday twelve December?” >> • _, _, _, _, _, _, Q-DepArr, RepQ-DepArr, ImplVerDep;Q-Arr, ImplVerArr;Q-Day
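The fixed-width prompt-type history can be built by left-padding the sequence of prompt symbols with placeholders, roughly as below (the symbols are taken from the example above, the helper itself is an illustrative assumption).

```python
# Sketch of the prompt-type history feature: the last `window` system
# prompts as structured symbols, left-padded with "_" placeholders so
# every turn yields a fixed-length feature vector.
def prompt_history(prompts, window=10, pad="_"):
    recent = prompts[-window:]
    return [pad] * (window - len(recent)) + recent

history = prompt_history(["Q-DepArr", "RepQ-DepArr", "ImplVerDep;Q-Arr",
                          "ImplVerArr;Q-Day"])
print(history)
# -> ['_', '_', '_', '_', '_', '_', 'Q-DepArr', 'RepQ-DepArr',
#     'ImplVerDep;Q-Arr', 'ImplVerArr;Q-Day']
```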

  14. Informed baseline • Always predict class that most frequently occurs after the current prompt type • Users are highly cooperative with system prompts • Predicting ProbOrig is tough, depends on other factors
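The informed baseline amounts to a per-prompt-type majority vote over the training data. A sketch (prompt labels and counts below are made up for illustration):

```python
from collections import Counter, defaultdict

# Informed baseline: for each system prompt type, always predict the
# class that most frequently followed that prompt in the training data.
def train_baseline(pairs):
    """pairs: (prompt_type, class_label) tuples from the training set."""
    counts = defaultdict(Counter)
    for prompt, label in pairs:
        counts[prompt][label] += 1
    return {p: c.most_common(1)[0][0] for p, c in counts.items()}

baseline = train_baseline([("Q-DepArr", "SlotFill"), ("Q-DepArr", "SlotFill"),
                           ("ImplVer", "Negat"), ("ImplVer", "SlotFill"),
                           ("ImplVer", "Negat")])
print(baseline["Q-DepArr"], baseline["ImplVer"])  # -> SlotFill Negat
```

This baseline is strong precisely because users are highly cooperative with the system prompt, while ProbOrig stays hard because it does not follow from the prompt type alone.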

  15. Results, F1 scores • Optimised subtask combinations • Improvement over baseline and fully combined subtasks • Class label design has significant impact on learner performance



  18. Implications of optimal task combinations • TRAs, a subclass of DAs, are equally well learnt in isolation or combined with any other subtask • DAs coordinate distribution of other dialogue components • Correct detection of TRA facilitates identifying semantic phenomena • Instead of pipeline architecture, combined prag-sem processing beneficial • Best to simultaneously detect ProbAwareness and TRAs • Signalling ProbAwareness has properties similar to DA cues • It is a feedback act, might be suboptimal to treat it in an isolated problem detection module • ProbOrigin task is an outlier, no correlations with prag-sem phenomena • Predicting ProbOrigin requires other sources of info, roots in system-internal technical factors

  19. Other aspects investigated: algorithm • Applied rule learner (RIPPER): eager learning strategy • different outcome of task compositionality: optimally learns isolated subtasks • same magnitude of learning performance on all 4 components

  20. Other aspects investigated: features • Investigated contribution of feature type groups per subtask • dialogue history informative • prosody provides suboptimal cues • using the full word graph creates robustness: minor effect on partial interpretation when • Automatically filtering the word graph from disfluencies, infrequent words, less informative words • Simulating perfect ASR by encoding the transcribed user utterances as BoW

  21. Upper bound performance F • Encoding the full hypothesis lattice is cheap and produces classification scores close to those on perfectly recognised words • BoW treats noise well: incomplete, ungrammatical, redundant, erroneous info

  22. Extensions • Robust Language Understanding for Question Answering in Dialogues project in Tilburg aims to validate the approach in a Dutch QA system for the medical domain • adapt extended DA tagset • use word-level prosody instead of turn-level • exploit syntactic features • incorporate attributes of windowed left context • Advanced methods for efficient context treatment: sequence learning • find begin/end word boundaries of DA tag • find boundaries of slot values • identify and expect adjacent turn pair DA sequences
