
End-to-End Joint Learning of Natural Language Understanding and Dialogue Manager

Xuesong Yang, Yun-Nung (Vivian) Chen, Dilek Hakkani-Tür, Paul Crook, Xiujun Li, Jianfeng Gao, Li Deng. ICASSP 2017, March 9th. xyang45@illinois.edu, y.v.chen@ieee.org.


Presentation Transcript


  1. End-to-End Joint Learning of Natural Language Understanding and Dialogue Manager
     Xuesong Yang, Yun-Nung (Vivian) Chen, Dilek Hakkani-Tür, Paul Crook, Xiujun Li, Jianfeng Gao, Li Deng
     ICASSP 2017, March 9th. Session HLT-L2: Spoken Language Understanding I
     xyang45@illinois.edu, y.v.chen@ieee.org

  2. What is a Dialogue System?
     • A dialogue system is a computer agent that interacts with humans via natural language. [1]
     • Categories: Chit Chat; Task-Oriented
     • Examples: Apple Siri (2011), Microsoft Cortana (2014), Amazon Alexa/Echo (2014), Facebook M & Bot (2015), Google Home (2016)
     [1] Zue, Victor W., and James R. Glass. “Conversational interfaces: Advances and challenges.” Proceedings of the IEEE 88.8 (2000): 1166-1180.

  3. Pipelined Task-Oriented Dialogue System
     • Speech signal → Automatic Speech Recognition → text input (hypothesis): "Any action movies recommended this weekend?"
     • Natural Language Understanding (NLU): user intent detection and slot filling → semantic frame: request_movie(genre=action, date=this weekend)
     • Dialogue Management (DM): dialogue state tracking (DST) and dialogue policy decision → system action/policy: request_location
     • Natural Language Generation (NLG) → text response: "Which theater do you prefer?" → Text-To-Speech
     • Motivation: the pipelined system (NLU → DM) results in error propagation issues.
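The semantic frame on this slide, an intent plus slot-value pairs, can be held in a very small data structure. The sketch below is only an illustration: the `SemanticFrame` class and its `render` method are hypothetical names, not part of the authors' code, and the frame is written out by hand rather than produced by a model.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticFrame:
    """NLU output for one user turn: an intent plus slot-value pairs."""
    intent: str
    slots: dict = field(default_factory=dict)

    def render(self) -> str:
        # Render in the intent(slot=value, ...) notation used on the slide.
        args = ", ".join(f"{name}={value}" for name, value in self.slots.items())
        return f"{self.intent}({args})"


# The slide's example, typed in by hand (hypothetical usage, no model involved).
frame = SemanticFrame(
    intent="request_movie",
    slots={"genre": "action", "date": "this weekend"},
)
print(frame.render())
```

In a pipelined system the DM only ever sees this frame, so any slot or intent that the NLU gets wrong is passed on unchanged, which is the error propagation issue the slide names.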

  4. Proposed Approach
     • End-to-end model
       • Mitigate the effects of noisy output from NLU
       • Refine NLU by supervised signals from DM
     • Multi-task joint learning
       • NLU - User intent classification
       • NLU - User slot tagging
       • DM - System action prediction
     • Contextual understanding
       • Access to the user history
       • Monitor user behavior states over turns

  5. Human-Human Dialogue Interaction
     Guide (agent): Hi, how may I help you?
     Tourist (user): Are there any cheap rate hotels to put my bags?
     Guide (agent): Do you want to have a backpack type of hotel?
     Tourist (user): Yes. Just gonna leave our things there and stay out the whole day.
     Guide (agent): So you don’t mind if it is not roomy, right?
     Tourist (user): Yes.
     Guide (agent): Okay. These hotels are available for you: …
     Tourist (user): Ok, thank you, bye!
     Guide (agent): Thanks, goodbye.
     • Idea: predict the next system action given the current user utterance together with the aggregated observations

  6. Natural Language Understanding (NLU)
     Utterance: BOS % um how much is a taxi cab there ? EOS
     Slot Tags: O O O B-det_PRICE I-det_PRICE O O B-trsp_TYPE I-trsp_TYPE B-area_CITY O O
     User Intents: QST_HOW_MUCH; QST_INFO
     System Actions: RES_EXPLAIN; RES_INFO; FOL_EXPLAIN; FOL_HOW_MUCH; FOL_INFO
     • Task 1: Slot Tagging
     • Task 2: Multi-Label User Intent Prediction
     • Shared weights
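The slot tags on this slide follow the usual B-/I-/O scheme: one tag per token, with B- opening a slot and I- continuing it. A minimal sketch of turning the tag sequence back into slot phrases is shown below; `decode_iob` is a hypothetical helper written for this illustration, not a function from the authors' repository.

```python
def decode_iob(tokens, tags):
    """Group IOB-tagged tokens into (slot_name, phrase) pairs."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    spans = []
    current_name, current_words = None, []
    for token, tag in zip(tokens, tags):
        prefix, _, name = tag.partition("-")
        if prefix == "I" and name == current_name:
            # Continuation of the slot that is currently open.
            current_words.append(token)
            continue
        if current_name is not None:
            # Any other tag closes the open slot.
            spans.append((current_name, " ".join(current_words)))
            current_name, current_words = None, []
        if prefix in ("B", "I"):
            # B- opens a slot; a stray I- is treated as an opening tag too.
            current_name, current_words = name, [token]
    if current_name is not None:
        spans.append((current_name, " ".join(current_words)))
    return spans


# Utterance and tags copied from the slide.
tokens = "BOS % um how much is a taxi cab there ? EOS".split()
tags = ("O O O B-det_PRICE I-det_PRICE O O "
        "B-trsp_TYPE I-trsp_TYPE B-area_CITY O O").split()
slots = decode_iob(tokens, tags)
print(slots)
```

For the slide's utterance this groups "how much" under det_PRICE, "taxi cab" under trsp_TYPE, and "there" under area_CITY.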

  7. NLU+DM 1: Pipelined BLSTMs
     Utterance: BOS % um how much is a taxi cab there ? EOS
     Slot Tags: O O O B-det_PRICE I-det_PRICE O O B-trsp_TYPE I-trsp_TYPE B-area_CITY O O
     User Intents: QST_HOW_MUCH; QST_INFO
     System Actions: RES_EXPLAIN; RES_INFO; FOL_EXPLAIN; FOL_HOW_MUCH; FOL_INFO
     • Task 1+2: Natural Language Understanding (single user turn)
     • Task 3: Multi-Label System Action Prediction (current user turn w/ contextual history)
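System action prediction here consumes the current user turn together with a bounded history of earlier turns. One simple way to assemble that input is a sliding window, sketched below. The function name `context_windows` is hypothetical, and the window size of five is taken from the configuration slide under the assumption that the count includes the current turn; the slides do not say either way.

```python
from collections import deque


def context_windows(user_turns, history=5):
    """Yield, for each turn, the current turn plus up to history-1 earlier ones."""
    window = deque(maxlen=history)  # oldest turn is dropped automatically
    for turn in user_turns:
        window.append(turn)
        yield list(window)


# Placeholder turn identifiers; real inputs would be per-turn NLU outputs.
turns = ["u1", "u2", "u3", "u4", "u5", "u6", "u7"]
windows = list(context_windows(turns))
print(windows[0])   # only the first turn is available
print(windows[-1])  # the five most recent turns
```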

  8. NLU+DM 2: End-to-End Model (JointModel)
     Utterance: BOS % um how much is a taxi cab there ? EOS
     Slot Tags: O O O B-det_PRICE I-det_PRICE O O B-trsp_TYPE I-trsp_TYPE B-area_CITY O O
     User Intents: QST_HOW_MUCH; QST_INFO
     System Actions: RES_EXPLAIN; RES_INFO; FOL_EXPLAIN; FOL_HOW_MUCH; FOL_INFO
     • Task 1+2+3: End-to-End Joint NLU+DM
     • DM supervised learning with three tasks
     • DM output signal refines NLU for better robustness
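Training one model on three supervised tasks usually means summing one loss term per task, so that the gradient of the system action error also flows back into the shared NLU layers. The sketch below shows that idea in plain Python. It is an assumption, not the authors' stated objective: the slides do not give the loss, so the choice of cross-entropy terms, the equal weights, and all function names here are illustrative, and the probabilities are toy numbers.

```python
import math

EPS = 1e-12  # guards against log(0)


def slot_loss(token_probs, gold_tags):
    """Mean categorical cross-entropy over tokens (one distribution per token)."""
    total = sum(-math.log(max(probs[gold], EPS))
                for probs, gold in zip(token_probs, gold_tags))
    return total / len(gold_tags)


def multilabel_loss(probs, gold):
    """Mean binary cross-entropy over labels (multi-label intents or actions)."""
    total = 0.0
    for p, g in zip(probs, gold):
        p = min(max(p, EPS), 1.0 - EPS)
        total -= g * math.log(p) + (1 - g) * math.log(1.0 - p)
    return total / len(gold)


def joint_loss(slot_probs, slot_gold, intent_probs, intent_gold,
               action_probs, action_gold, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three task losses (equal weights are an assumption)."""
    w_slot, w_intent, w_action = weights
    return (w_slot * slot_loss(slot_probs, slot_gold)
            + w_intent * multilabel_loss(intent_probs, intent_gold)
            + w_action * multilabel_loss(action_probs, action_gold))


# Toy predictions: 2 tokens over 3 slot tags, 2 intent labels, 3 action labels.
slot_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
slot_gold = [0, 1]
intent_probs, intent_gold = [0.9, 0.2], [1, 0]
action_probs, action_gold = [0.6, 0.3, 0.8], [1, 0, 1]

loss = joint_loss(slot_probs, slot_gold, intent_probs, intent_gold,
                  action_probs, action_gold)
print(round(loss, 4))
```

Setting the third weight to zero removes the DM signal and leaves an NLU-only objective, which is one way to see what the extra supervision contributes.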

  9. Data
     • Dialogue State Tracking Challenge 4
     • Human-human dialogues: 21-hour dialogue sessions on touristic information collected via Skype between tour guides and tourists
     • User Intent = Speech Act + Attributes
     • System Action = Speech Act + Attributes

  10. DM Result – System Action Prediction (SAP)
     • Metric: frame-level accuracy (FrmAcc)
     • Human-human conversations are complicated, so predicting system actions for DM is difficult
     • Pipeline-BLSTM and JointModel outperform the baseline
     • JointModel improves on Pipeline-BLSTM by about 10% in accuracy, indicating the importance of mitigating the downsides of the pipeline
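Frame-level accuracy is a strict metric: a turn counts as correct only when the whole predicted label set matches the reference, so one missing or extra action makes the turn wrong. A minimal sketch follows. The function name `frame_accuracy` is hypothetical, and the three turns are made-up examples that merely reuse action labels from the earlier slides.

```python
def frame_accuracy(predicted, reference):
    """Fraction of turns whose predicted label set equals the reference set."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        return 0.0
    correct = sum(set(p) == set(r) for p, r in zip(predicted, reference))
    return correct / len(reference)


# Made-up turns for illustration only.
reference = [
    {"RES_EXPLAIN", "RES_INFO"},
    {"FOL_INFO"},
    {"FOL_HOW_MUCH", "FOL_INFO"},
]
predicted = [
    {"RES_INFO", "RES_EXPLAIN"},   # exact match, order does not matter
    {"FOL_INFO", "FOL_EXPLAIN"},   # one extra label, so the turn is wrong
    {"FOL_INFO"},                  # one missing label, so the turn is wrong
]
accuracy = frame_accuracy(predicted, reference)
print(accuracy)
```

Only the first of the three toy turns matches exactly, which shows why scores under this metric tend to be much lower than per-label accuracy.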

  11. DM Result – System Action Prediction (SAP)
     • Metric: frame-level accuracy (FrmAcc)
     • Oracle models, which are given fully correct NLU output, show the upper bound of SAP performance, since the pipeline otherwise transfers errors from NLU to SAP
     • Contextual user turns make a significant contribution to DM performance
     • JointModel achieves the best DM performance (FrmAcc) with richer latent representations

  12. NLU Result – Slot Filling & Intent Prediction
     • Metrics: frame-level accuracy (FrmAcc)
     • CRF+SVMs baseline maintains strong NLU performance with 33.1%
     • Pipeline-BLSTM and JointModel outperformed the baseline
     • Extra supervised DM signal helps refine the NLU by back-propagating the associated errors
     • DM signal (system action prediction) helps more on user intent prediction than slot filling, and NLU is significantly improved

  13. Conclusion
     • First to propose an end-to-end deep hierarchical model for joint NLU and DM with limited contextual dialogue memory
     • Leverage multi-task learning using three supervised signals
       • NLU: User intent classification
       • NLU: Slot tagging
       • DM: System action prediction
     • Outperform the state-of-the-art pipelined NLU and DM models
       • Better DM due to the contextual dialogue memory
       • Robust NLU fine-tuned by supervised signal from DM

  14. Thanks for Your Attention! Code available at https://github.com/XuesongYang/end2end_dialog

  15. Appendix - Baseline Model
     • Predicting system actions at the next turn as responses to the current user behaviors by pipelining NLU and SAP together
     • NLU: CRF for slot tagging, and One-Vs-All SVMs for intent classification
     • SAP: One-Vs-All SVMs
     [1] Raymond, Christian, and Giuseppe Riccardi. “Generative and discriminative algorithms for spoken language understanding.” In INTERSPEECH, pp. 1605-1608. 2007.

  16. Appendix: Configuration
     • Optimizer: Adam, a mini-batch stochastic gradient descent method
     • Contextual history: five user turns
     • Dimension of word embedding: 512
     • Dropout ratio: 0.5
     • No early stopping; 300 training epochs are used instead
     • Best models for the three tasks are selected individually under different metrics
       • Token-level micro-average F1 score is used for slot filling
       • Frame-level accuracy (a frame counts only when the whole frame parse is correct) is used for both user intent prediction and system action prediction
       • The decision thresholds are tuned on the dev set
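Two of the selection criteria above can be sketched briefly. Both functions below are hypothetical illustrations rather than the authors' code. The F1 function uses one common token-level definition that ignores the O tag, and the threshold search assumes a single global threshold chosen from a small grid by frame-level accuracy; the slides do not specify either detail, and the dev scores are toy numbers.

```python
def token_micro_f1(pred_tags, gold_tags, outside="O"):
    """Token-level micro-averaged F1 over all non-O slot tags."""
    tp = sum(p == g and g != outside for p, g in zip(pred_tags, gold_tags))
    fp = sum(p != g and p != outside for p, g in zip(pred_tags, gold_tags))
    fn = sum(p != g and g != outside for p, g in zip(pred_tags, gold_tags))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(dev_scores, dev_gold, grid=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Pick the grid threshold with the best frame-level accuracy on dev data."""
    best_t, best_acc = None, -1.0
    for t in grid:
        # A turn is correct only if every thresholded label matches the gold.
        hits = sum([int(s >= t) for s in scores] == gold
                   for scores, gold in zip(dev_scores, dev_gold))
        acc = hits / len(dev_gold)
        if acc > best_acc:  # ties keep the lowest threshold
            best_t, best_acc = t, acc
    return best_t, best_acc


# Gold tags from the earlier slide; the prediction misses the last slot.
gold = ("O O O B-det_PRICE I-det_PRICE O O "
        "B-trsp_TYPE I-trsp_TYPE B-area_CITY O O").split()
pred = ("O O O B-det_PRICE I-det_PRICE O O "
        "B-trsp_TYPE I-trsp_TYPE O O O").split()
f1 = token_micro_f1(pred, gold)

# Toy dev set: per-turn label scores and multi-hot gold labels.
dev_scores = [[0.9, 0.4], [0.6, 0.2], [0.3, 0.8]]
dev_gold = [[1, 0], [1, 0], [0, 1]]
best_t, best_acc = tune_threshold(dev_scores, dev_gold)
print(round(f1, 3), best_t, best_acc)
```

In the toy example four of the five slot tokens are tagged correctly with no false positives, and a threshold of 0.5 is the only grid value that reproduces every dev frame exactly.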
