
  1. Machine Learning and the Information State Update Approach to Dialog Management Oliver Lemon and James Henderson University of Edinburgh

  2. TALK project Overall Coordination: Scientific Coordination:

  3. a TALK research theme: Adaptivity and learning • Reinforcement Learning for dialog management • Complex dialog states from the Information State Update (ISU) approach

  4. Initial objectives • Which aspects of dialog management are amenable to learning and what reward functions are needed for these aspects? • What representation of the dialog state best serves this learning? • What Reinforcement Learning methods are tractable with large scale dialog systems?

  5. Information State Update approach with policy learning

  6. Example information state
  output: < hello, welcome to the edinburgh informatics flight booking system. what is your departure city? >
  lastspeaker: user
  recogninput: < edinburgh >
  input: < edinburgh >
  lastmoves: < [edinburgh],u, [([greet],s),([ask_user_start_city],s)] >
  filledslotsvalues: < [([ask_user_start_city],s)],[[edinburgh]] >
  turn: system
  oplansteps: ( [ask_user_destination_city] , [release_turn] )
  nextmoves: < [ask_user_destination_city],s >
  int: < [release_turn] >
  . . .
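The record above can be sketched as a simple Python mapping. This is purely illustrative (the field names follow the slide, but the values are simplified stand-ins, and the `slot_filled` helper is hypothetical, not part of DIPPER):

```python
# Illustrative sketch of an ISU information state as a nested dict.
# Field names follow the slide; values are simplified stand-ins.
info_state = {
    "output": "hello, welcome to the edinburgh informatics flight "
              "booking system. what is your departure city?",
    "lastspeaker": "user",
    "recogninput": ["edinburgh"],
    "input": ["edinburgh"],
    "filledslotsvalues": {"ask_user_start_city": ["edinburgh"]},
    "turn": "system",
    "oplansteps": ["ask_user_destination_city", "release_turn"],
    "nextmoves": [("ask_user_destination_city", "s")],
    "int": ["release_turn"],
}

def slot_filled(state, slot):
    """Check whether a slot has been filled in the information state."""
    return slot in state["filledslotsvalues"]

print(slot_filled(info_state, "ask_user_start_city"))  # True
```

Update rules (next slide) read and modify such a state; the rich structure is exactly what makes the state space large from a learning perspective.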

  7. Baseline system • DIPPER [Bos, Klein, Lemon and Oka 2003] for state specification and dialog management • ATK [Young 2004] for speech recognition • Festival [Taylor, Black and Caley 1998] for speech synthesis • O-Plan [Currie and Tate 1991] for dialog planning and content planning and structure

  8. Example DIPPER update rule
  urule(deliberation,
    [ ;;; CONDITIONS:
      is^lastspeaker=user,
      empty(is^int)
    ],
    [ ;;; EFFECTS:
      ;;; get the next step in the current plan
      solve(next_step(Step)),
      ;;; push it on the intentions stack of the IS
      push(is^int, Step)
    ]
  ).
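The condition/effect pattern of such rules can be mimicked in a few lines of Python. This is a minimal sketch of the general mechanism, not DIPPER's implementation (the real rules are written in DIPPER's own update language, shown above):

```python
# Minimal sketch of an ISU update rule: a list of condition tests and
# a list of effects, applied to a dict-based information state.
def urule(conditions, effects):
    """Build a rule that fires its effects when all conditions hold."""
    def apply(state):
        if all(cond(state) for cond in conditions):
            for effect in effects:
                effect(state)
            return True
        return False
    return apply

# A 'deliberation'-style rule: when the user spoke last and the
# intentions stack is empty, push the next plan step onto it.
deliberation = urule(
    conditions=[
        lambda s: s["lastspeaker"] == "user",
        lambda s: not s["int"],
    ],
    effects=[
        lambda s: s["int"].append(s["plan"].pop(0)),
    ],
)

state = {"lastspeaker": "user", "int": [], "plan": ["ask_user_destination_city"]}
deliberation(state)
print(state["int"])  # ['ask_user_destination_city']
```

A dialog manager of this kind repeatedly tries its rules against the current state; learning a policy amounts to choosing which applicable action to take.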

  9. Initial data collection(from Cambridge) • Tourist information, with route descriptions • Human-human data • On-line transcription of user input with speech recognition error simulation [Williams and Young 2004]

  10. Example dialog data
  hello how can i help
  AH I'M LOOKING FOR A GOOD RESTAURANT IN THE TOWN
  right there's a number of restaurants in town %um what sort of food are you looking to -- to eat
  I'M LOOKING FOR A RESTAURANT NEAR THE CINEMA
  okay there's a restaurant very near the cinema it's a -- a relaxed chinese restaurant called noble nest
  NOBLE NEST AND AH WHERE IS IT EXACTLY
  it's on the corner of north road and fountain road
  AND AH WHAT IS THE PRICE OF FOOD THERE
  %er the food there is %er fourteen pounds per person
  AH IT'S A CHINESE RESTAURANT RIGHT
  that's right yes
  AND HOW TO REACH NOBLE NEST FROM HOTEL ROYAL
  right okay from the hotel royal it would probably be best to catch the bus outside the hotel royal %um which will take you -- probably catch the bus to art square and then walk %um from art square that being the closest bus stop
  . . .

  11. Reinforcement Learning • Dialog is modeled as a Markov Decision Process (MDP) • Choose the action a which maximizes the expected reward Q(s,a) given the state s: Q(s,a) = R(s,a) + Σ_s′ P(s′|s,a) max_a′ Q(s′,a′) • Estimate P(s′|s,a) from users' behavior • Estimate Q(s,a) iteratively from sample dialogs
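The iterative estimation of Q can be sketched as plain value iteration on a toy MDP. The two-state "dialog" below (states, actions, transition probabilities, and rewards) is invented purely for illustration; real dialog MDPs have far larger state spaces, which is exactly the tractability challenge the following slides raise:

```python
# Toy value iteration for Q(s,a) = R(s,a) + sum_{s'} P(s'|s,a) max_{a'} Q(s',a').
# The two-state dialog MDP here is invented purely for illustration.
states = ["need_info", "done"]
actions = ["ask", "confirm"]

# P[(s, a)] -> {s': prob};  R[(s, a)] -> immediate reward
P = {
    ("need_info", "ask"):     {"need_info": 0.3, "done": 0.7},
    ("need_info", "confirm"): {"need_info": 0.9, "done": 0.1},
    ("done", "ask"):          {"done": 1.0},
    ("done", "confirm"):      {"done": 1.0},
}
R = {("need_info", "ask"): -1.0, ("need_info", "confirm"): -1.0,
     ("done", "ask"): 0.0, ("done", "confirm"): 0.0}

# Iterate the Bellman backup to a fixed point.
Q = {(s, a): 0.0 for s in states for a in actions}
for _ in range(50):
    Q = {(s, a): R[(s, a)] + sum(p * max(Q[(s2, a2)] for a2 in actions)
                                 for s2, p in P[(s, a)].items())
         for s in states for a in actions}

# The learned policy picks the Q-maximizing action in each state.
policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
print(policy["need_info"])  # prints: ask
```

With per-turn cost -1, "ask" (which reaches "done" faster) beats "confirm", as the Bellman fixed point shows. In practice P(s′|s,a) would be estimated from user data rather than specified by hand, as the slide notes.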

  12. Previous applications of RL to dialog management • [Levin, Pieraccini and Eckert 2000], [Singh, Litman, Kearns and Walker 2002] • Choose between a small number of actions • Initiative: system / user / mixed • Confirmation: explicit / none • Have a small number of possible states • Use RL methods which would not scale up to large action sets and large state spaces

  13. Challenges • Tractable Reinforcement Learning with complex actions and large numbers of state features • Learning generic strategies which can be applied to many domains • Discovering useful features of the dialog history • Including partially observable features of the state (using POMDP models)

  14. Example of Combining ML and ISU systems • Utterance classification for Spoken Dialogue Systems [Gabsdil and Lemon, ACL 2004] • Classifying n-best recognition hypotheses: reject, accept, ignore • Based on context features, e.g. dialogue move history • Improves accuracy by 25%
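The classification task on this slide can be sketched as a simple decision function over an n-best list. This is a hypothetical illustration of the task's shape only: the confidence threshold, feature set, and "repeat of the last prompt" heuristic are invented here, not the learned classifier of Gabsdil and Lemon (2004):

```python
# Hypothetical sketch of n-best hypothesis classification: label each
# recognition hypothesis accept / reject / ignore using its confidence
# plus a context feature (the dialogue move history). The threshold and
# heuristic are invented for illustration.
def classify(hypothesis, confidence, dialogue_move_history):
    # Ignore hypotheses that merely echo the system's last prompt.
    if dialogue_move_history and hypothesis == dialogue_move_history[-1]:
        return "ignore"
    # Accept confident hypotheses, reject the rest.
    return "accept" if confidence >= 0.6 else "reject"

nbest = [("edinburgh", 0.8), ("edinboro", 0.4)]
history = ["what is your departure city?"]
labels = [classify(h, c, history) for h, c in nbest]
print(labels)  # ['accept', 'reject']
```

The reported gain comes from replacing such fixed thresholds with a classifier trained on ISU context features like the dialogue move history.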

  15. Summary • Adaptation and learning is an important theme in the TALK project • TALK uses the Information State Update approach, with large state representations • Reinforcement Learning will be used to learn dialog management • Challenges in tractable learning with these large complex representations

  16. Machine Learning and the Information State Update Approach to Dialog Management Oliver Lemon and James Henderson University of Edinburgh

  17. a TALK research theme: Adaptivity and learning • Representations most suited to adaptivity and learning • Parts of information state most amenable to optimisation and reward functions needed • Use of Partially Observable Markov Decision Processes and Bayesian Networks providing tractable learning for ISU-based systems
