Extracting Information from Spoken User InputA Machine Learning Approach

Piroska Lendvai

Tilburg University, Netherlands

Outline
  • The community: ML as a tool to detect DAs, semantic content, communication problems in dialogue
  • Tilburg University, Induction of Linguistic Knowledge group:
    • memory-based learning software package TiMBL
    • joint work with Antal van den Bosch & Emiel Krahmer 2001-2004
  • TiMBL applied to Dutch human-machine dialogue data
    • Extract pragmatic and semantic components from spoken user input
      • Drawing on simple, potentially erroneous info from SDS
      • Components better classified individually or combined?
[System architecture diagram: user utterance "I would like to travel from Amsterdam to Tilburg next Tuesday" → speech recogniser → language understanding → dialogue manager ↔ database → answer generation → speech synthesis → system prompt "From where to where would you like to travel on Tuesday twelve December?"]

OVIS corpus example
  • S1: Good evening. From which station to which station do you want to travel?
  • U1: I need to go from Amsterdam to Tilburg on Tuesday next week.
  • S2: From where to where do you want to travel on Tuesday twelve December?
  • U2: From Amsterdam to Tilburg.
  • S3: At what time do you want to travel from Amsterdam to Tilburg?
  • U3: Around quarter past eleven in the evening.
  • (…)
  • S5: I have found the following connections: (…). Do you want me to repeat the connection?
  • U5: Please do.
The OVIS system
  • Developed in 1995-2000, Dutch national project, train travel information
  • Slots to fill: DepartStation, ArriveStation, Dep/ArrivDay, Dep/ArrivTime
  • Dialogue structure: possibility to provide unsolicited info
  • System-initiative
  • System always verifies received info
    • Explicitly (“So you want to leave on Thursday.”) or
    • Implicitly (“At what time do you want to leave on Thursday?”)
  • 80 test users, noisy real-data corpus from different system versions
  • 441 full dialogues: 3738 turn pairs of system prompt & user reply
  • 43% of user turns inaccurately grounded by system
  • 8-26% WER
Goal: Partial interpretation of user input
  • Drawing on attributes available from the system’s modules, extract information from the user’s input turn
    • task-related dialogue act (TRA) supercategories for info-seeking dialogues (8 application-motivated classes)

    • query slot types being filled (30)
    • current user turn originates future communication problems? (binary)
    • in current turn, user is already aware of communication problems? (binary)
  • Facilitate full understanding
  • ASR to have more confidence in accepting/rejecting recognition hypotheses
    • recognition of ‘yes’ and ‘no’ is highly erroneous; relevant TRAs: Affirm, Negation
  • DM to launch error recovery
Partial interpretation components
  • “From Amsterdam to Tilburg on Tuesday next week.”
  • “Not this Tuesday but next Tuesday.”
  • Task-related dialogue act: Slot-filling / Negation
    • (others: Affirmative, AcceptSysError, NonStd)
  • Slot(s) being filled by user: DepartStat, ArriveStat, ArriveDay / ArriveDay
    • (always co-occur with Slot-filling TRA)
  • Problem origin turn? Yes / No
    • predicting miscommunication
  • Problem awareness turn? No / Yes
    • detecting miscommunication
Annotated user turns
  • S1: Good evening. From which station to which station do you want to travel?
  • U1: I need to go from Amsterdam to Tilburg on Tuesday next week.
  • SlotFill, DepSt_ArrSt_ArrDay, Prob, Ok
  • S2: From where to where do you want to travel on Tuesday twelve December?
  • U2: From Amsterdam to Tilburg.
  • SlotFill, DepSt_ArrSt, Ok, Prob
  • S3: At what time do you want to travel from Amsterdam to Tilburg?
  • U3: Around quarter past eleven in the evening.
  • SlotFill, ArrTimeofDay_ArrHour, Ok, Ok
  • S5: I have found the following connections: (…). Do you want me to repeat the connection?
  • U5: Please do.
  • Affirm, void, Ok, Ok
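The four annotation components of a turn can also be composed into a single class label, as in the combined-subtask experiments discussed later. A minimal sketch, with tag values taken from the annotated example turns (the helper function itself is illustrative):

```python
# Combine the four partial-interpretation components of a user turn
# (TRA, slots, problem origin, problem awareness) into one concatenated
# class label, mirroring the comma-separated annotation format above.

def combine_label(tra, slots, prob_origin, prob_aware):
    """Concatenate component tags into one composite class label."""
    return ",".join([tra, slots, prob_origin, prob_aware])

# U1: slot-filling turn that originates a later communication problem
label = combine_label("SlotFill", "DepSt_ArrSt_ArrDay", "Prob", "Ok")
print(label)  # SlotFill,DepSt_ArrSt_ArrDay,Prob,Ok
```

Learning such composite labels directly is what produces the large combined class inventories (up to 148 classes) mentioned in the experiments.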
Speech recognition
  • utterancespeech recogniser (ASR) word graph  lattice paths
  • ASR strategy: pick best path (confidence, language model)
  • There can be distortions and deletions in a path
    • ‘from amsterdam to tilburg tuesday’
    • ‘from amsterdam to tilburg next tuesday’
    • ‘to amsterdam to tilburg next tuesday’
  • Wrong pick = possibly severe information loss

[Word graph figure: competing arcs with scores — “from amsterdam”/2473, “to amsterdam”/549, “to tilburg”/158, “next”/120.5, “tuesday”/297, “tuesday”/3481.5]

Informativity of ASR output
  • ASR hypothesis lattice paths with associated confidences
  • User input: ‘rond vier uur’; around four o’clock
  • 856.109985 [1] <n> [2] #PAUSE# [4] op [7] drie [12] uur [13] #PAUSE# [14] on three o’clock

855.930054 [1] <n> [2] #PAUSE# [4] om [5] tien [12] uur [13] #PAUSE# [14]

855.430054 [1] <n> [2] #PAUSE# [4] om [5] drie [12] uur [13] #PAUSE# [14]

855.330017 [1] <n> [2] #PAUSE# [3] half [7] drie [12] uur [13] #PAUSE# [14]

855.140015 [1] <n> [2] #PAUSE# [3] half [8] #PAUSE# [9] vier [10] uur [13] #PAUSE# half four o’clock

855.109985 [1] <n> [2] #PAUSE# [3] half [8] tien [10] uur [13] #PAUSE# [14]

854.890015 [1] <n> [2] #PAUSE# [3] half [8] #PAUSE# [9] vier [11] uur [13] #PAUSE# half four o’clock

854.880005 [1] <n> [2] #PAUSE# [4] om [5] dertien [10] uur [13] #PAUSE# [14]

854.470032 [1] <n> [2] #PAUSE# [4] om [6] drie [10] uur [13] #PAUSE# [14]

853.880005 [1] <n> [2] #PAUSE# [4] om [5] drie [10] uur [13] #PAUSE# [14] at three o’clock

......

853.869995 [1] <n> [2] #PAUSE# [4] rond [9] vier [11] uur [13] #PAUSE# [14] around four o’clock

Utilise full ASR hypotheses: Bag-of-Words
  • BoW representation successful in information retrieval, may work for SLU
  • ignores information on order, frequency, probability of recognised words
  • robustly characterises utterance
  • Isolated paths may partially or incorrectly contain the uttered words, full lattice may contain them all (plus incorrect ones)
  • entire ASR hypothesis lattice is encoded as a binary vector indicating, for each word in the lexicon, whether it was hypothesised by the ASR
  • 759 bits, 7.5 active on average
  • vector bits: …around, at, four, o’clock, on, ten, three, tonight, travel…
  • binary BoW: …1, 1, 1, 1, 1, 1, 1, 0, 0…
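The encoding above can be sketched in a few lines. The mini-lexicon is the illustrative fragment from the slide; the real lexicon has 759 words:

```python
# Encode an entire ASR hypothesis lattice as a binary bag-of-words vector
# over the lexicon: bit i is 1 iff lexicon word i occurs anywhere in the
# lattice. Order, frequency and path probabilities are deliberately ignored.
lexicon = ["around", "at", "four", "o'clock", "on", "ten", "three",
           "tonight", "travel"]

def lattice_to_bow(lattice_words, lexicon):
    """One bit per lexicon word: present anywhere in the lattice or not."""
    present = set(lattice_words)
    return [1 if w in present else 0 for w in lexicon]

# Words hypothesised somewhere in the 'around four o'clock' lattice
# (English glosses of rond/om/op/vier/uur/tien/drie etc.)
hypothesised = {"on", "three", "o'clock", "at", "ten", "half", "four",
                "thirteen", "around"}
print(lattice_to_bow(hypothesised, lexicon))  # [1, 1, 1, 1, 1, 1, 1, 0, 0]
```

Because the whole lattice is pooled, the uttered words survive even when no single path contains them all.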
Machine learning experiments
  • Task: given a user turn in its preceding dialogue context, assign one partial interpretation tag to it
  • Method: interpretation components alone or composed?
    • Experimentally search for optimal subtask combinations
    • Understanding levels possibly interact with each other
    • [TaskRelAct + Slots + ProbOrig + ProbAwar], up to 148 concatenated classes from combined components
  • Tool: memory-based learner (MBL)
    • lazy learning: examples are stored in memory
    • classification: class is extrapolated from the k nearest neighbours
    • distance from neighbours is the sum of (weighted) feature differences
    • parameter settings optimised with search heuristics (Van den Bosch, 2004)
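A minimal sketch of the memory-based classification just described: store all training examples and vote over the k nearest neighbours under a weighted overlap distance. The feature names, weights and examples are invented for illustration, and TiMBL itself offers many more distance metrics and weighting schemes:

```python
# IB1-style memory-based learner: lazy storage plus weighted-overlap k-NN.
from collections import Counter

class MemoryBasedLearner:
    def __init__(self, weights, k=1):
        self.weights = weights      # one weight per feature (e.g. gain ratio)
        self.k = k
        self.memory = []            # stored (feature_vector, class) pairs

    def train(self, examples):
        self.memory.extend(examples)   # lazy learning: just store

    def _distance(self, a, b):
        # weighted overlap: sum the weights of mismatching features
        return sum(w for w, x, y in zip(self.weights, a, b) if x != y)

    def classify(self, features):
        ranked = sorted(self.memory,
                        key=lambda ex: self._distance(features, ex[0]))
        votes = Counter(cls for _, cls in ranked[:self.k])
        return votes.most_common(1)[0][0]

# Toy features: (prompt type, best-string contains 'yes', turn length)
mbl = MemoryBasedLearner(weights=[2.0, 1.0, 0.5], k=3)
mbl.train([(("Q-Dep", "yes", "short"), "Affirm"),
           (("Q-Dep", "no", "short"), "Negation"),
           (("Q-Arr", "yes", "long"), "Affirm"),
           (("Q-Arr", "no", "long"), "Negation")])
print(mbl.classify(("Q-Dep", "yes", "long")))  # Affirm
```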
Knowledge sources for MBL
  • From User: Speech signal measurements
  • Prosody: turn duration, pitch F0 min/max/avg/stdev, energy RMS max/avg/stdev, tempo (syllables/sec), turn-initial pause
  • ASR output: confidence scores, best string, word graph branching, Bag-of-Words
  • From System: Dialogue context
  • Prompt wording as BoW
  • Prompt type history: 10 system prompts represented as structured symbols
    • “From where to where do you want to travel on Tuesday twelve December?” >>
    • _, _, _, _, _, _,Q-DepArr, RepQ-DepArr, ImplVerDep;Q-Arr, ImplVerArr;Q-Day
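The fixed-length prompt-type history above can be built with a small helper; the function name and padding convention are assumptions for illustration, with the prompt-type symbols taken from the slide:

```python
# Represent dialogue context as a fixed window over the last 10 system
# prompt types, left-padded with "_" early in the dialogue.

def prompt_history(prompt_types, window=10, pad="_"):
    """Last `window` prompt-type symbols, left-padded to a fixed length."""
    recent = prompt_types[-window:]
    return [pad] * (window - len(recent)) + recent

history = prompt_history(
    ["Q-DepArr", "RepQ-DepArr", "ImplVerDep;Q-Arr", "ImplVerArr;Q-Day"])
print(history)
# ['_', '_', '_', '_', '_', '_', 'Q-DepArr', 'RepQ-DepArr',
#  'ImplVerDep;Q-Arr', 'ImplVerArr;Q-Day']
```

The fixed length keeps the feature vector the same size for every turn, which the memory-based learner requires.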
Informed baseline
  • Always predict class that most frequently occurs after the current prompt type
  • Users are highly cooperative with system prompts
  • Predicting ProbOrig is tough, depends on other factors
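This baseline can be sketched as a per-prompt-type majority vote; the (prompt type, user class) training pairs below are invented for illustration:

```python
# Informed baseline: for each system prompt type, always predict the class
# that most frequently follows it in the training data.
from collections import Counter, defaultdict

def train_baseline(pairs):
    """Map each prompt type to its most frequent following user class."""
    by_prompt = defaultdict(Counter)
    for prompt_type, user_class in pairs:
        by_prompt[prompt_type][user_class] += 1
    return {p: c.most_common(1)[0][0] for p, c in by_prompt.items()}

baseline = train_baseline([
    ("Q-DepArr", "SlotFill"), ("Q-DepArr", "SlotFill"),
    ("Q-DepArr", "NonStd"),
    ("YN-Repeat", "Affirm"), ("YN-Repeat", "Negation"),
    ("YN-Repeat", "Affirm"),
])
print(baseline["Q-DepArr"], baseline["YN-Repeat"])  # SlotFill Affirm
```

Because users mostly answer the question they were just asked, this simple rule is a strong baseline for the TRA and slot subtasks, but not for ProbOrig.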
Results, Fβ=1 scores
  • Optimised subtask combinations
  • Improvement over baseline and fully combined subtasks
  • Class label design has significant impact on learner performance
Implications of optimal task combinations
  • TRAs, a subclass of DAs, are equally well learnt in isolation or combined with any other subtask
    • DAs coordinate distribution of other dialogue components
    • Correct detection of TRA facilitates identifying semantic phenomena
    • Instead of pipeline architecture, combined prag-sem processing beneficial
  • Best to simultaneously detect ProbAwareness and TRAs
    • Signalling ProbAwareness has properties similar to DA cues
    • It is a feedback act; might be suboptimal to treat it in an isolated problem-detection module
  • ProbOrigin task is an outlier, no correlations with prag-sem phenomena
    • Predicting ProbOrigin requires other sources of info; roots in system-internal technical factors
Other aspects investigated: algorithm
  • Applied a rule learner (RIPPER): eager learning strategy
    • different outcome of task compositionality: optimally learns isolated subtasks
    • same magnitude of learning performance on all 4 components
Other aspects investigated: features
  • Investigated contribution of feature type groups per subtask
    • dialogue history informative
    • prosody provides suboptimal cues
    • using the full word graph creates robustness: minor effect on partial interpretation when
      • automatically filtering the word graph for disfluencies, infrequent words, less informative words
      • simulating perfect ASR by encoding the transcribed user utterances as BoW
Upper bound performance, Fβ=1
  • Encoding the full hypothesis lattice is cheap and produces classification scores close to those on perfectly recognised words
  • BoW treats noise well: incomplete, ungrammatical, redundant, erroneous info
Extensions
  • Robust Language Understanding for Question Answering in Dialogues project in Tilburg aims to validate the approach in a Dutch QA system for the medical domain
  • adapt extended DA tagset
  • use word-level prosody instead of turn-level
  • exploit syntactic features
  • incorporate attributes of windowed left context
  • Advanced methods for efficient context treatment: sequence learning
    • find begin/end word boundaries of DA tag
    • find boundaries of slot values
    • identify and expect adjacent turn pair DA sequences