
Speaking while monitoring addressees for understanding

Seminar “Gaze as function of instructions - and vice versa”. Herbert H. Clark and Meredyth A. Krych. Torsten Jachmann, 16.12.2013.



  1. Speaking while monitoring addressees for understanding. Seminar “Gaze as function of instructions - and vice versa”. Herbert H. Clark and Meredyth A. Krych. Torsten Jachmann, 16.12.2013

  2. Research Question • Speaking and listening in dialog • Unilateral account: speakers and listeners act autonomously; no interaction • Bilateral account: speakers and listeners monitor their respective partners; joint activity • What do speakers monitor? • How do they use that information?

  3. Grounding • Level 1: Attend to vocalization • Level 2: Identify words, phrases and sentences • Level 3: Understand the meaning • Level 4: Consider answering

  4. Grounding A: Were you there when they erected the new signs? B: Th… which new signs? (Level 3) A: Little notice boards, indicating where you had to go for everything B: No. → Bilateral account

  5. Monitoring • Voices: attending to the partner's utterances • Faces: gaze and facial expressions as indicators of understanding • Workspaces: region in front of the body; manual gestures (but also games, etc.)

  6. Monitoring • Bodies: head and torso movements as indicators • Shared scenes: scenery beyond the workspace • Signals vs. symptoms: signals are constructed to get meaning across; symptoms are not intentionally created

  7. Least joint effort • Opportunistic: selection of the available methods that take the least effort to produce • “Tailored” to the addressee: overhearers (not monitored by the speaker) may misunderstand utterances

  8. Method • Pairs of directors and builders • 76 students (34 male / 42 female) • Instructions to build 10 simple Lego models • 2 × 2 design (interactive): 28 pairs • Additional non-interactive condition: 10 pairs • Video and audio analyses

  9. Interactive • Mixed design • Workspace (between subjects): visible vs. invisible • Faces (within subjects): visible vs. invisible • No restrictions on time or talk

  10. Non-interactive • Only one condition • Director records instructions • No time or talk constraints • Prototype can be examined as long as wanted before recording • Builders listen to the recorded instructions • No constraints on actions: start, stop, rewind

  11. Results • Efficiency • Turns • Gestures and grounding • Deictic expressions • Gestures by addressees • Cross-timing of actions • Timing strategies • Visual monitoring

  12. Efficiency • Visibility of workspace improves efficiency

  13. Efficiency: non-interactive condition • Building took much longer (245 s non-interactive vs. 183 s interactive) • Strong drop in accuracy • Inadequate instructions

  14. Turns • Builders take fewer spoken turns when the workspace is visible

  15. Deictic expressions • Mostly unusable when the workspace is hidden • Joint attention is needed • Otherwise they can only refer to previously mentioned situations

  16. Gestures by addressees • Mostly accompanied by deictic utterances (if there is any speech at all) • Explicit verdicts are usually given only on such utterances (otherwise the speaker simply continues)

  17. Cross-timing • Gestural signals • Reflect understanding at that moment

  18. Cross-timing • Overlapping signals • Usually not in spoken dialog • Start with “sufficient information”

  19. Cross-timing • Projecting • Prediction of following actions/instructions

  20. Cross-timing • Initiation time • Waiting until the partner is able to attend to the next utterance

  21. Cross-timing • Time uptake • Responses have to be timed exactly to the action and situation

  22. Timing strategies • Self-interruption • Dealing with evidence from the addressee • Usually not continued

  23. Timing strategies • Collaborative references • Deictic references rely on the addressees' actions

  24. Visual monitoring • Mainly used when director reaches a problem • Eye gaze as support

  25. Conclusion • Grounding is fundamental • A visible workspace speeds up grounding • In task-oriented dialogs faces are not important • Compensation is possible (but only if some form of monitoring is available)

  26. Conclusion • Updating common ground • Increments are determined jointly • Much evidence for the bilateral account • Addressees provide statements about their current understanding • Speakers monitor addressees to update and change their utterances

  27. Conclusion • Opportunistic process • Offering options • Self-interruptions • Waiting • Instant revision • Multi-modal process • Speech and gestures are combined if possible • Speech alone takes more time

  28. Remarks • Gaze only important for certain types of tasks • Measurement of time may be outdated (“old” study) • No contradicting studies (to some extent common sense)

  29. Gaze and Turn-Taking Behavior in Casual Conversation Interactions. Kristiina Jokinen, Hirohisa Furukawa, Masafumi Nishida and Seiichi Yamamoto

  30. Differences • Three-party dialogue • No instructional task • Stronger focus on eye gaze

  31. Research Question • How well can eye gaze help in predicting turn taking? • What is the role of eye gaze when the speaker holds the turn? • Is eye gaze as important in three-party dialogues as in two-party dialogues?

  32. Hypothesis • In group discussions, eye gaze is important for turn management (especially in turn-holding cases) • The speaker is more influential than the other partners in coordinating interactions (selects the next speaker)

  33. Method • Three-person conversational eye gaze corpus • Natural conversations • Balanced familiarity (50% familiar; 50% unfamiliar) • Balanced gender (male-only; female-only; mixed)

  34. Method • 28 conversations among Japanese students in their early 20s, with three participants each • Each conversation about 10 minutes • Eye gaze recorded for one participant

  35. Method • Eye tracker fixed on the table to preserve naturalness

  36. Method

  37. Data used • Features estimated over the last 300 ms of an utterance if the utterance is followed by a 500 ms pause
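The selection rule on slide 37 can be sketched in a few lines. This is a minimal Python sketch under assumptions the slides do not state: utterances are given as speaker plus start and end times in seconds, and "followed by a 500 ms pause" is read as no speech from any participant during the 500 ms after the utterance ends. The names `Utterance`, `followed_by_pause` and `feature_windows` are hypothetical and not taken from the study.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical representation of one utterance (times in seconds).
@dataclass
class Utterance:
    speaker: str
    start: float
    end: float


WINDOW = 0.300     # analysis window at the end of the utterance (300 ms)
MIN_PAUSE = 0.500  # silence required after the utterance (500 ms); assumed to be a minimum


def followed_by_pause(index: int, utterances: List[Utterance]) -> bool:
    """True if no other utterance overlaps the MIN_PAUSE interval after utterances[index] ends."""
    gap_start = utterances[index].end
    gap_end = gap_start + MIN_PAUSE
    return all(
        other.end <= gap_start or other.start >= gap_end
        for i, other in enumerate(utterances)
        if i != index
    )


def feature_windows(utterances: List[Utterance]) -> List[Tuple[str, float, float]]:
    """Return (speaker, window_start, window_end) for every utterance followed by a pause."""
    ordered = sorted(utterances, key=lambda u: u.start)
    windows = []
    for index, utt in enumerate(ordered):
        if followed_by_pause(index, ordered):
            # Clip the window so it never starts before the utterance itself.
            window_start = max(utt.start, utt.end - WINDOW)
            windows.append((utt.speaker, window_start, utt.end))
    return windows
```

For example, with A speaking from 0.0 to 2.0 s, B from 2.1 to 4.0 s and C from 5.0 to 6.0 s, A's utterance is skipped because B starts within 500 ms, while B and C each yield a 300 ms window at their end. In this sketch the final utterance of a recording counts as followed by a pause, since no later speech is observed; a real pipeline might exclude it.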

  38. Data used • Dialog acts • Speech features: F0 values, etc. • Eye gaze

  39. Results

  40. Conclusion • The speaker uses eye gaze to signal whether they intend to give up the turn or hold it • Fixating on a listener vs. focusing attention elsewhere • Eye gaze is as important in multi-participant conversations as in two-participant conversations

  41. Conclusion • Eye gaze is used to select the next speaker (seems to be correct) • The use of Japanese data may affect the value of the speech data • Comparison study? • Listeners focus on the speaker, not vice versa

  42. Remarks • Vague information and data presentation • Although various data exist, the interaction of factors is not presented • Some conclusions rely on the previously mentioned point • Setup only takes one participant into consideration • Much of the data was unused • Lacking in data quality and in how the data was created

  43. Remarks • Study is based on data collected for another study • Setup is not optimal • Realistic design • Yet it contains biasing flaws (situation of the participants, only one eye tracker)

  44. Comparison • Clark and Krych present interesting ideas, but eye gaze is only rarely addressed • How could this be changed? • Jokinen et al. focus on eye gaze in a (more or less) natural situation but are lacking in scientific results and setup • Which points and ideas of this setup could be beneficial?
