
Audiovisual Attentive User Interfaces

Explore the concept of Attentive User Interfaces, which use specific techniques to determine what tasks, devices, or people the user is attending to. Discover the benefits of visual attention, turn-taking techniques, and modeling techniques for attention, as well as the limitations and current issues in implementing these interfaces. See how technologies like gaze tracking, Chinese input systems, speech recognition, and multimodal interfaces are being used to create more natural and efficient interactions.


Presentation Transcript


  1. Audiovisual Attentive User Interfaces Attending to the needs and actions of the user Paulina Modlitba T-121.900 Seminar on User Interfaces and Usability

  2. What is an Attentive User Interface? (1/2) • Negotiate the timing and volume of communication with the user • Use specific input, output and turn-taking techniques to determine what task, device or person a user is attending to • Detect the user’s presence, orientation, speech activity and gaze, and statistically model attention and interaction

  3. What is an Attentive User Interface? (2/2) • Four characteristic components • visual attention • turn-taking techniques • modeling techniques for attention • focus and context displays and visualisation • Dürsteler (2003)

  4. Why are they needed? • Roel Vertegaal (2003) • Multiple ubiquitous computing devices place growing demands on users’ attention • Metaphor: the modern traffic light system • Sensors • Statistical models of traffic volume • Peripheral displays (traffic lights) • The disruptive effect of interruptions can be avoided

  5. Evolution of human-machine interaction (users : computers) • 1960s-1980s: many-to-one • 1980s-1990s: one-to-one • 1990s-2000s: one-to-many • 2000s-2010s: many-to-many

  6. Visual attention • Eye-gaze tracking: detecting the user’s visual focus of attention • Trackers operate by directing an infrared light source toward the user’s eye • Provides information about the user’s context • A central I/O channel in communication • Limitations in existing hardware/software • Biological limitations

  7. Reasons for implementing gaze tracking • Kaur et al. (2003) • Gaze location is the only reliable predictor of the locus of visual attention • Gaze can be used as a “natural” mode of input that avoids the need for learned hand-eye coordination • Gaze selection of screen objects is expected to be significantly faster than traditional hand-eye coordinated selection • Gaze allows for hands-free interaction

  8. Current issues • Limited size of the fovea (1-3°) • Subconscious eye movements • Eyes are not control organs (Zhai et al., 2003) • No natural analogy to current input devices, e.g. the mouse • Gaze is always active (Kaur et al., 2003)

  9. Current state • Eye-gaze control is used as an additional input channel • Provides context for the action • Combined with manual input, gaze tracking can improve the robustness and reliability of a system
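  A common way to cope with the jitter and fovea-size limitations above is to filter raw gaze samples into fixations before treating gaze as input. Below is a minimal sketch of a dispersion-threshold fixation detector in Python; the sample format, threshold values and function name are illustrative assumptions, not taken from any of the systems cited in these slides.

```python
# Minimal dispersion-threshold fixation detector (I-DT style).
# Assumption: gaze samples arrive as (x, y, timestamp_ms) tuples in pixels.

def detect_fixations(samples, max_dispersion=30.0, min_duration_ms=100):
    """Group raw gaze samples into fixations.

    A window of samples counts as a fixation when its spatial spread
    (x-range + y-range) stays below max_dispersion pixels for at least
    min_duration_ms. Returns (centroid_x, centroid_y, t_start, t_end).
    """
    fixations = []
    start = 0
    while start < len(samples):
        end = start
        window = [samples[start]]
        while end + 1 < len(samples):
            candidate = window + [samples[end + 1]]
            xs = [s[0] for s in candidate]
            ys = [s[1] for s in candidate]
            if (max(xs) - min(xs)) + (max(ys) - min(ys)) > max_dispersion:
                break
            window = candidate
            end += 1
        duration = window[-1][2] - window[0][2]
        if duration >= min_duration_ms:
            cx = sum(s[0] for s in window) / len(window)
            cy = sum(s[1] for s in window) / len(window)
            fixations.append((cx, cy, window[0][2], window[-1][2]))
            start = end + 1
        else:
            start += 1
    return fixations

# Example: noisy samples around (400, 300), then a saccade to (800, 500).
samples = [(400 + i % 3, 300 - i % 2, i * 10) for i in range(20)]
samples += [(800 + i % 3, 500 + i % 2, 200 + i * 10) for i in range(20)]
print(detect_fixations(samples))  # two fixations, one per cluster
```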

  10. EASE Chinese Input (1/2) • Zhai et al. (2002) • Supports pinyin typing • pinyin is the official Chinese phonetic alphabet, based on Roman characters • Chinese is highly homophonic - each syllable corresponds to several Chinese characters • When the user types the pinyin of a character, a number of possible characters with the same pronunciation are displayed

  11. EASE Chinese Input (2/2) • Normally, the user chooses a character by pressing a number on the keyboard • With EASE, the user only has to press the spacebar as soon as he or she sees the desired character in the list • The system selects the character closest to the user’s current gaze location
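  A minimal sketch of the selection step just described: when the spacebar is pressed, the candidate whose on-screen position lies closest to the current gaze point is chosen. The candidate layout, coordinates and function name below are illustrative assumptions, not the actual EASE implementation.

```python
# Sketch of gaze-assisted homophone selection in the spirit of EASE:
# on spacebar press, pick the candidate closest to the gaze point.
import math

def select_candidate(candidates, gaze_x, gaze_y):
    """candidates: list of (character, (x, y)) screen positions."""
    return min(candidates,
               key=lambda c: math.hypot(c[1][0] - gaze_x,
                                        c[1][1] - gaze_y))[0]

# Homophones of the pinyin syllable "ma", laid out left to right
# in a candidate bar (positions are made up for the example).
candidates = [("妈", (100, 650)), ("麻", (160, 650)),
              ("马", (220, 650)), ("骂", (280, 650))]

# The user looks at the third candidate and presses the spacebar.
print(select_candidate(candidates, gaze_x=225, gaze_y=648))  # -> 马
```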

  12. Speech recognition (1/2) • Still a limited technology, despite extensive research and progress • Crucial issues • the error rate of speech recognition engines and how these errors can be reduced • the effort required to port speech technology applications between different application domains or languages (Deng & Huang, 2004)

  13. Speech recognition (2/2) • Three directions for enhancing the technique • improving microphone ergonomics to enhance the signal-to-noise ratio • equipping speech recognizers with the ability to learn and correct errors • adding semantic (meaning) and pragmatic (application context) knowledge (Deng & Huang, 2004)

  14. Multimodal interfaces • Can provide more natural human-machine interaction • Improve the robustness of the interaction by using redundant or complementary information • Today: usually gaze/speech + manual control (e.g. mouse) • Future: gaze + speech, gaze only, or speech only

  15. Main issue • Shumin Zhai (2003) • “We need to design unobtrusive, transparent and subtle turn-taking processes that coordinate attentive input with the user’s explicit input in order to contribute to the user’s goal without the burden of explicit dialogues.”

  16. Manual and Gaze Input Cascaded (MAGIC) Pointing • An interaction technique that utilizes eye movement to assist the manual control task • Zhai et al. have constructed two MAGIC pointing techniques, one liberal and one conservative (Zhai et al., 1999)

  17. Liberal approach (1/2) • The cursor is warped to every new object the user looks at • The user can then manually take control of the cursor near (or on) the target, or ignore it and search for the next target • A new target is defined by its distance (e.g. 120 pixels) from the current cursor position • Issues: pro-active (the cursor waits readily at every object looked at); overactive (gaze alone is enough to move the cursor)

  18. Liberal approach (2/2)
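  A minimal sketch of the liberal warping rule from the two slides above: the cursor jumps to each fixation that lands far enough from its current position, while fine positioning stays manual. The 120-pixel threshold comes from the slide; the class and method names are illustrative assumptions, not Zhai et al.'s implementation.

```python
# Sketch of liberal MAGIC pointing: warp the cursor toward every new
# gaze target that is sufficiently far from the current cursor position.
import math

WARP_THRESHOLD_PX = 120  # "new target" distance mentioned on the slide

class LiberalMagicPointer:
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

    def on_fixation(self, gaze_x, gaze_y):
        """Warp the cursor if the fixation defines a new target."""
        if math.hypot(gaze_x - self.x, gaze_y - self.y) > WARP_THRESHOLD_PX:
            self.x, self.y = gaze_x, gaze_y  # cursor now waits near the target

    def on_manual_move(self, dx, dy):
        """Fine positioning is still done with the manual device."""
        self.x += dx
        self.y += dy

pointer = LiberalMagicPointer()
pointer.on_fixation(500, 400)  # far away -> cursor warps to (500, 400)
pointer.on_fixation(510, 405)  # small shift -> no warp
pointer.on_manual_move(-4, 2)  # user nudges the cursor onto the target
print(pointer.x, pointer.y)    # -> 496 402
```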

  19. Conservative approach (1/2) • Warps the cursor to a target only when the manual input device has been actuated • Once moved, the cursor appears in motion towards the target • Hence, the cursor never jumps directly to a target that the user does not intend to acquire • May be slower than the liberal approach

  20. Conservative approach (2/2)
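  A companion sketch of the conservative rule from the two slides above: the cursor stays put until the manual device is actuated, and then moves toward the most recent gaze target instead of jumping straight onto it. The names and the single animation step are illustrative assumptions.

```python
# Sketch of conservative MAGIC pointing: gaze alone never moves the cursor;
# the cursor heads toward the last gaze target only on manual actuation.

class ConservativeMagicPointer:
    def __init__(self, x=0.0, y=0.0):
        self.x, self.y = x, y
        self.gaze_target = None

    def on_fixation(self, gaze_x, gaze_y):
        # Only remember the target; never warp on gaze alone.
        self.gaze_target = (gaze_x, gaze_y)

    def on_manual_actuation(self, step_fraction=0.5):
        """Move part of the way toward the gaze target per animation step,
        so the cursor appears in motion rather than teleporting."""
        if self.gaze_target is None:
            return
        tx, ty = self.gaze_target
        self.x += (tx - self.x) * step_fraction
        self.y += (ty - self.y) * step_fraction

pointer = ConservativeMagicPointer()
pointer.on_fixation(500, 400)  # target noted, cursor does not move
print(pointer.x, pointer.y)    # -> 0.0 0.0
pointer.on_manual_actuation()  # mouse actuated: cursor starts moving
print(pointer.x, pointer.y)    # -> 250.0 200.0
```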

  21. EyeCOOK • Bradbury et al. (2003) • A multimodal attentive cookbook that helps inexperienced computer users cook a meal • The user interacts with the eyeCOOK system using eye gaze and speech commands • The system responds visually and verbally • The word “this” in a spoken command is resolved to the object of the user’s gaze • If the user’s gaze cannot be tracked by the eyeCOOK system, the user has to specify the target verbally

  22. EyeCOOK in Page Display Mode
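  A minimal sketch of the deictic behaviour described on slide 21: the word “this” in a spoken command is resolved against the object the user is looking at, with a verbal fallback when no gaze is available. The command format and object names are illustrative assumptions, not the eyeCOOK API.

```python
# Sketch of eyeCOOK-style deictic resolution: "this" in a spoken command
# is bound to the object of the user's gaze; if gaze tracking fails,
# the user must name the target verbally instead.

def resolve_command(spoken_command, gazed_object):
    """Return the resolved command, or a prompt if it cannot be resolved."""
    if "this" not in spoken_command.split():
        return spoken_command                 # nothing deictic to resolve
    if gazed_object is not None:
        return spoken_command.replace("this", gazed_object)
    return "Please say the name of the item you mean."

# Gaze tracked: "this" is bound to the looked-at recipe item.
print(resolve_command("explain this", gazed_object="step 3"))  # explain step 3
# Gaze lost: the system asks for an explicit target instead.
print(resolve_command("explain this", gazed_object=None))
```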

  23. GAZE-2 • Vertegaal et al. (2003) • A group video conferencing system that uses gaze-directed cameras to convey eye contact • Consists of a video tunnel that makes it possible to place cameras behind the participant images on the screen • The system automatically directs the video in this tunnel by using a gaze tracker to select the camera closest to the user’s current focus of attention (gaze location)

  24. GAZE-2 system structure

  25. 3D rendering • The 2D video images of the participants are displayed in a 3D virtual meeting room and are automatically rotated to face the participant each user is looking at. • In the picture below, everyone is looking at the person on the left, whose image is broadcast at a higher resolution.
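  A minimal sketch of the attentive camera and bandwidth logic described on slides 23-25: select the tunnel camera closest to the user’s gaze, and stream the looked-at participant at a higher resolution than the others. Camera positions, participant names and resolution values are illustrative assumptions, not the GAZE-2 implementation.

```python
# Sketch of GAZE-2-style logic: (1) pick the video-tunnel camera closest to
# the current gaze location, (2) broadcast the attended participant at a
# higher resolution than the rest.
import math

def closest_camera(cameras, gaze_x, gaze_y):
    """cameras: dict of participant name -> (x, y) position of the camera
    behind that participant's image on screen."""
    return min(cameras, key=lambda name: math.hypot(cameras[name][0] - gaze_x,
                                                    cameras[name][1] - gaze_y))

def stream_resolutions(participants, attended, high=720, low=240):
    """Give the attended participant the higher-resolution stream."""
    return {p: (high if p == attended else low) for p in participants}

# Cameras sit behind each participant's on-screen image (made-up layout).
cameras = {"Alice": (200, 300), "Bob": (600, 300), "Carol": (1000, 300)}

attended = closest_camera(cameras, gaze_x=580, gaze_y=310)
print(attended)                               # -> Bob
print(stream_resolutions(cameras, attended))  # Bob at 720, others at 240
```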

  26. Turn-taking in video conferencing • Misunderstandings about turn-taking cause interruptions • Eye contact plays an important role in turn-taking (Vertegaal et al., 2003)

  27. References • Bradbury et al. (2003) • Deng & Huang (2004) • Dürsteler (2003) • Kaur et al. (2003) • Vertegaal (2003) • Vertegaal et al. (2003) • Zhai (2003) • Zhai et al. (1999) • Zhai et al. (2002)

  28. Things missing • Are attentive user interfaces better at following the user in order to "capture his/her context" and act proactively on his/her behalf, or are they better used as input devices (the approach you take)? • The distinction between explicit and implicit input, as presented by Horvitz (you can find a link on the seminar homepage), is thus important here and could give you benefit. • Please take some real-world examples of prototypes and real situations into your presentation. This makes the idea easier to grasp and the arguments more concrete. You might consider presenting other application ideas as well as the ones already in the paper. • I think you would benefit from considering in more detail, for each particular application, why attention and preferences are tracked and how they might be combined, effectively, to minimize disruption and make interaction more fluent. Binding the presentation more tightly to the "let's make interruptions go away" theme of the seminar is important here. • Consequently, it would be nice to see your analysis in the presentation of "how things were" and "how things are" (now with AUIs).

  29. Oulasvirta • Attention • Working memory • Long-term memory • Task resumption • Control • Trust • Stress • Social interaction
