
A Meeting Browser that Learns


Presentation Transcript


  1. A Meeting Browser that Learns Patrick Ehlen * Matthew Purver * John Niekrasz Computational Semantics Laboratory Center for the Study of Language and Information Stanford University

  2. The CALO Meeting Assistant • Observe human-human meetings • Audio recording & speech recognition • Video recording & processing • Process written and typed notes, & whiteboard sketches • Produce a useful record of the interaction

  3. The CALO Meeting Assistant (cont’d) • Offload some cognitive effort from participants during meetings • Learn to do this better over time • For now, focus on identifying: • Action items people commit to during meeting • Topics discussed during meeting

  4. Human Interpretation Problems Compare to a new temp taking notes during meetings of Spacely Sprockets

  5. Human Interpretation Problems (cont’d) • Problems in recognizing and interpreting content: • Lack of lexical knowledge, specialized idioms, etc • Overhearer understanding problem (Schober & Clark, 1989)

  6. Machine Interpretation Problems • Machine interpretation: • Similar problems with lexicon • Trying to do interpretation from messy, overlapping multi-party speech transcribed by ASR, with multiple word-level hypotheses

  7. Human Interpretation Solution • Human temp can still do a good job (while understanding little) at identifying things like action items • When people commit to do things with others, they adhere to a rough dialogue pattern • Temp performs shallow discourse understanding • Then gets implicit or explicit “corrections” from meeting participants

  8. How Do We Do Shallow Understanding of Action Items? • Four types of dialogue moves:

  9. How Do We Do Shallow Understanding of Action Items? • Four types of dialogue moves: • Description of task: "Somebody needs to do this TZ-3146!"

  10. How Do We Do Shallow Understanding of Action Items? • Four types of dialogue moves: • Description of task: "Somebody needs to do this TZ-3146!" • Owner: "I guess I could do it."

  11. How Do We Do Shallow Understanding of Action Items? • Four types of dialogue moves: • Description of task • Owner • Timeframe: "Can you do it by tomorrow?"

  12. How Do We Do Shallow Understanding of Action Items? • Four types of dialogue moves: • Description of task • Owner • Timeframe • Agreement: "Sure."

  13. How Do We Do Shallow Understanding of Action Items? • Four types of dialogue moves: • Description of task • Owner • Timeframe • Agreement: "Sounds good to me!" "Sure." "Sweet!" "Excellent!"
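The four dialogue moves above lend themselves to a small, explicit representation. Below is a minimal sketch in Python; the names (MoveType, MoveHypothesis) and fields are illustrative assumptions, not the CALO system's actual data model.

```python
from dataclasses import dataclass
from enum import Enum

class MoveType(Enum):
    """The four dialogue-move classes used to spot action items."""
    TASK_DESCRIPTION = "task"   # "Somebody needs to do this TZ-3146!"
    OWNER = "owner"             # "I guess I could do it."
    TIMEFRAME = "timeframe"     # "Can you do it by tomorrow?"
    AGREEMENT = "agreement"     # "Sure." / "Sounds good to me!"

@dataclass
class MoveHypothesis:
    """One classifier hypothesis about a single utterance."""
    utterance_id: str
    speaker: str
    text: str
    move: MoveType
    confidence: float

# Example: a hypothesized agreement move
print(MoveHypothesis("u42", "B", "Sure.", MoveType.AGREEMENT, 0.81))
```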

  14. Machine Interpretation • Shallow understanding for action item detection: • Use our knowledge of this exemplary pattern • Skip over deep semantic processing • Create classifiers that identify those individual moves • Posit action items • Get feedback from participants after meeting
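One way to picture that shallow-understanding step is as a set of per-move classifiers run over the utterance stream, plus a heuristic that posits an action item when all four moves co-occur nearby. The sketch below assumes placeholder classifiers and a fixed utterance window; the real detectors, and the merging and ranking of overlapping hypotheses, are not shown.

```python
from typing import Callable, Dict, List

# Placeholder classifiers: map an utterance's text to a score for one move type.
MoveClassifier = Callable[[str], float]

def detect_action_items(utterances: List[Dict],
                        classifiers: Dict[str, MoveClassifier],
                        window: int = 5,
                        threshold: float = 0.5) -> List[Dict]:
    """Posit an action item wherever task, owner, timeframe, and agreement
    moves all appear within a short window of utterances."""
    tagged = []
    for u in utterances:
        moves = {m for m, clf in classifiers.items() if clf(u["text"]) >= threshold}
        tagged.append({**u, "moves": moves})

    action_items = []
    for i in range(len(tagged)):
        span = tagged[i:i + window]
        covered = set().union(*(u["moves"] for u in span))
        if {"task", "owner", "timeframe", "agreement"} <= covered:
            action_items.append({
                "utterances": [u["id"] for u in span],
                "what": next(u["text"] for u in span if "task" in u["moves"]),
                "who": next(u["speaker"] for u in span if "owner" in u["moves"]),
                "when": next((u["text"] for u in span if "timeframe" in u["moves"]), None),
            })
    return action_items
```

Overlapping windows will yield duplicate hypotheses; in practice these would be merged before being shown to participants for feedback.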

  15. Challenge to Machine Learning and UI Design • Detection challenges: • classify 4 different types of dialogue moves • want classifiers to improve over time • thus, need differential feedback on interpretations of these different types of dialogue moves • Participants should see and evaluate our results while doing something that’s valuable to them • And, from those user actions, give us the feedback we need for learning

  16. Feedback Proliferation Problem • To improve action item detection, need feedback on performance of five classifiers (4 utterance classes, plus overall “this is an action item” class) • All on noisy, human-human, multi-party ASR results • So, we could use a lot of feedback

  17. Feedback Proliferation Problem (cont’d) • Need a system to obtain feedback from users that is: • lightweight and usable • valuable to users (so they will use it) • able to solicit different types of feedback in a non-intrusive, almost invisible way

  18. Feedback Proliferation Solution • Meeting Rapporteur • a type of meeting browser used after the meeting

  19. Feedback Proliferation Solution (cont’d) • Many “meeting browser” tools are developed for research, and focus on signal replay • Ours: • tool to commit action items from meeting to user’s to-do list • relies on implicit user supervision to gather feedback to retrain classification models

  20. Meeting Rapporteur

  21. Action Items

  22. Action Items • Subclass hypotheses • Top hypothesis is highlighted • Mouse over hypotheses to change them • Click to edit them (confirm, reject, replace, create)

  23. Action Items • Superclass hypothesis • delete = negative feedback • commit = positive feedback • merge, ignore
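Those browser actions double as training signal. A minimal sketch, assuming hypothetical event and field names, of how superclass and subclass edits might be turned into labels for the five classifiers:

```python
def feedback_to_labels(event: dict) -> list:
    """Translate one implicit-feedback event from the meeting browser into
    (classifier, example, label) triples for retraining."""
    labels = []
    ai = event["action_item"]          # the hypothesized action item being edited
    if event["action"] == "commit":    # added to the to-do list: positive example
        labels.append(("action_item", ai["utterances"], 1))
    elif event["action"] == "delete":  # rejected outright: negative example
        labels.append(("action_item", ai["utterances"], 0))
    # Subclass edits cover the task description, owner, and timeframe
    # hypotheses; confirming one is positive evidence, replacing or
    # rejecting it counts against the classifier that proposed it.
    for field in ("task", "owner", "timeframe"):
        edit = event.get("edits", {}).get(field)
        if edit == "confirm":
            labels.append((field, ai.get(field), 1))
        elif edit in ("reject", "replace"):
            labels.append((field, ai.get(field), 0))
    return labels
```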

  24. Feedback Loop • Each participant’s implicit feedback for a meeting is stored as an “overlay” to the original meeting data • Overlay is reapplied when participant views meeting data again • Same implicit feedback also retrains models • Creates a personalized representation of meeting for each participant, and personalized classification models
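The overlay itself can be as simple as a per-participant file of edits keyed by action-item id, kept separate from the shared meeting record. A rough sketch under that assumption (storage format and field names are illustrative):

```python
import json
from pathlib import Path

def save_overlay(meeting_id: str, user: str, feedback: dict,
                 root: Path = Path("overlays")) -> Path:
    """Persist one participant's edits without touching the shared meeting record."""
    path = root / meeting_id / f"{user}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(feedback, indent=2))
    return path

def apply_overlay(meeting_record: dict, overlay: dict) -> dict:
    """Re-apply a participant's stored edits the next time they view the meeting."""
    view = json.loads(json.dumps(meeting_record))  # personal deep copy
    for ai_id, edits in overlay.items():
        if edits.get("deleted"):
            view["action_items"].pop(ai_id, None)   # action items keyed by id
        else:
            view["action_items"].setdefault(ai_id, {}).update(edits)
    return view
```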

  25. Problem • In practice (e.g., CALO Y3 CLP data): • seem to get a lot of feedback at the superclass level (i.e., people are willing to accept or delete an action item) • but not as much (other than implicit confirmation) at subclass level (i.e., people are not as willing to change descriptions, etc)

  26. Questions • User feedback provides information along different dimensions: • Information about the time an event (like discussion of an action item) happened • Information about the text that describes aspects of the event (like the task description, owner, and timeframe)

  27. Questions (cont’d) • Which of these dimensions contribute most to improving models during retraining? • Which dimensions require more cognitive effort for the user when giving feedback? • What is the best balance between getting feedback information and not bugging the user too much? • What is the best use of initiative in such a system (user- vs. system-initiative)? • During meeting? • After meeting?

  28. Experiments • 2 evaluation experiments: • “Ideal feedback” experiment • Wizard-of-Oz experiment

  29. Ideal Feedback Experiment • Turn gold-standard human annotations of meeting data into posited “ideal” human feedback • Using that ideal feedback to retrain, determine which dimensions (time, text, initiative) contribute most to improving classifiers

  30. Ideal Feedback Experiment (cont’d) • Results: • both time and text dimensions alone improve accuracy over raw classifier • using both time and text together performs best • textual information is more useful than temporal • user initiative provides extra information not gained by system-initiative
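The "ideal" feedback can be simulated directly from the gold annotations: confirm hypotheses that match a gold action item, delete the rest, and add the items the system missed (the user-initiative case). The sketch below makes those assumptions explicit; the matching rule and event schema are placeholders, not the experiment's exact procedure.

```python
def simulate_ideal_feedback(gold_items: list, hypotheses: list,
                            tolerance: float = 10.0) -> list:
    """Generate the feedback a perfectly attentive user would give."""
    events, matched = [], set()
    for hyp in hypotheses:
        gold = next((g for g in gold_items
                     if abs(g["time"] - hyp["time"]) <= tolerance), None)
        if gold is not None:
            matched.add(id(gold))
            # Temporal match confirmed; gold text supplies the textual corrections.
            events.append({"action": "commit", "hypothesis": hyp,
                           "text_corrections": gold["text"]})
        else:
            events.append({"action": "delete", "hypothesis": hyp})
    # User-initiative feedback: gold action items the system never proposed.
    for gold in gold_items:
        if id(gold) not in matched:
            events.append({"action": "create", "text": gold["text"],
                           "time": gold["time"]})
    return events
```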

  31. Wizard-of-Oz Experiment • Create different Meeting Assistant interfaces and feedback devices (including our Meeting Rapporteur) • See how real-world feedback data compares to the ideal feedback described above • Assess how the tools affect and change behavior during meetings

  32. Action Item Identification [Diagram: linearized utterances u1 u2 … uN, with classifiers for task, owner, timeframe, and agreement moves feeding an action item (AI) hypothesis] • Use four classifiers to identify dialogue moves associated with action items in utterances of meeting participants • Then posit the existence of an action item, along with its semantic properties (what, who, when), using those utterances
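The "linearized utterances" in that diagram simply interleave each speaker's utterances by start time so the classifiers see one sequence u1 … uN. A small sketch (field names are assumptions):

```python
def linearize(channels: dict) -> list:
    """Merge per-speaker utterance streams into one time-ordered sequence."""
    merged = [{"speaker": spk, **u}
              for spk, utts in channels.items()
              for u in utts]
    return sorted(merged, key=lambda u: u["start"])

# Example with two speakers whose turns interleave
channels = {
    "A": [{"start": 0.0, "text": "Somebody needs to do this TZ-3146!"}],
    "B": [{"start": 2.1, "text": "I guess I could do it."},
          {"start": 6.0, "text": "Sure."}],
}
for u in linearize(channels):
    print(u["speaker"], u["text"])
```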

  33. Like hiring a new temp to take notes during meetings of Spacely Sprockets • Even if we say, “Just write down the action items people agree to, and the topics,” that temp will run up against a couple of problems in recognizing and interpreting content (rooted in the collaborative underpinnings of semantics): • Overhearer understanding problem (Schober & Clark, 1989) • Lack of vocabulary knowledge, etc. • A machine overhearer that uses noisy multi-party transcripts is even worse • We do what a human overhearer might do: shallow discourse understanding • If you were to go into a meeting, you might not understand what was being talked about, but you could understand when somebody agreed to do something. • Why? Because when people make commitments to do things with others, they typically adhere to a certain kind of dialogue pattern

  34. Superclass Feedback Actions • Superclass hypothesis • delete = negative feedback • commit = positive feedback (add to “to-do” list)

  35. A Great Example

  36. Some Bad Examples

  37. Feedback Loop Diagram
