Dialog design
This presentation is the property of its rightful owner.
Sponsored Links
1 / 58

Dialog Design PowerPoint PPT Presentation


  • 90 Views
  • Uploaded on
  • Presentation posted in: General

Dialog Design. Speech/Natural Language Pen & Gesture. Dialog Styles. 1. Command languages 2. WIMP - Window, Icon, Menu, Pointer 3. Direct manipulation 4. Speech/Natural language 5. Gesture, pen. Natural input. Universal design Take advantage of familiarity, existing knowledge

Download Presentation

Dialog Design

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Dialog design

Dialog Design

Speech/Natural Language

Pen & Gesture


Dialog styles

Dialog Styles

1. Command languages

2. WIMP - Window, Icon, Menu, Pointer

3. Direct manipulation

4. Speech/Natural language

5. Gesture, pen


Natural input

Natural input

  • Universal design

  • Take advantage of familiarity, existing knowledge

  • Alternative input & output

  • Multi-modal interfaces

  • Getting “off the desktop”


Agenda

Agenda

  • Speech

  • Natural Language

  • Other audio

  • PDAs & Pen input styles

  • Gesture


When to use speech

When to Use Speech

  • Hands busy

  • Mobility required

  • Eyes occupied

  • Conditions preclude use of keyboard

  • Visual impairment

  • Physical limitation


Waveform spectrogram

Waveform & Spectrogram

  • Speech does not equal written language


Parsing sentences

Parsing Sentences

"I told him to go back where he came from, but he wouldn't listen."


Speech input

Speech Input

  • Speaker recognition

  • Speech recognition

  • Natural language understanding


Speaker recognition

Speaker Recognition

  • Tell which person it is (voice print)

  • Could also be important for monitoring meetings, determining speaker


Speech recognition

Speech Recognition

  • Primarily identifying words

  • Improving all the time

  • Commercial systems:

    • IBM ViaVoice, Dragon Dictate, ...


Recognition dimensions

Recognition Dimensions

  • Speaker dependent/independent

    • Parametric patterns are sensitive to speaker

    • With training (dependent) can get better

  • Vocabulary

    • Some have 50,000+ words

  • Isolated word vs. continuous speech

    • Continuous: where words stop & begin

    • Typically a pattern match, no context used

Did youvs.

Didja


Recognition systems

Recognition Systems

  • Typical system has 5 components:

    • Speech capture device - Has analog -> digital converter

    • Digital Signal Processor - Gets word boundaries, scales, filters, cuts out extra stuff

    • Preprocessed signal storage - Processed speech buffered for recognition algorithm

    • Reference speech patterns - Stored templates or generative speech models for comparisons

    • Pattern matching algorithm - Goodness of fit from templates/model to user’s speech


Errors

Errors

  • Systems make four types of errors:

    • Substitution - one for another

    • Rejection - detected, but not recognized

    • Insertion - added

    • Deletion - not detected

  • Problems with recovery


Recognition example

My flight experience uh,

I started off as a private

pilot and then I spent 20

years flying as a navigator

on C130s in the air force

and I currently fly as a

first officer on the SAAB

340 turboprop.

my flight experience a and

I started off as a private

pilot and I spent 20

years flying is a navigator

on C-130s in the Air Force

and I currently fly as a

first officer on the sound

340 turboprop.

Recognition Example:


And another example

Here here's one of the

thoughts I'd throw in on

that. I'd agree with

Howard that the

combination of checklists

and flows is the way way

to manage things best.

hair and here's one of the

outside troodon and

an eye to agree with

power that the

domination checklists

and flows is the way

that managed things best

And another example


Natural language understanding

Natural Language Understanding

  • Putting meaning to the words

  • Input might be spoken or typed

  • Holy grail of Artificial Intelligence problems


Eliza a faked understanding

Eliza - a “Faked” Understanding

  • Person : Men are all alike.ELIZA : In what way?Person : They're always bugging us about something or other.ELIZA : Can you think of a specific example?Person : Well, my boyfriend made me come here.ELIZA : Your boyfriend made you come here?Person : He says I'm depressed much of the time.ELIZA : I am sorry to hear that you are depressed.Person : It's true. I am unhappy.ELIZA : Do you think coming here will help you not to be unhappy?Person : I need some help, that much seems certain.ELIZA : What would it mean to you if you got some help?Person : Perhaps I could learn to get along with my mother.ELIZA : Tell me more about your family

  • http://www-ai.ijs.si/eliza/eliza.html

  • Weizenbaum, J., "ELIZA -- A computer program for the study of natural language communication between man and machine", Communications of the ACM 9(1):36-45, 1966


Natural language domains

Natural Language Domains

  • Conceptual: total set of objects/actions

    • Knows about managers and salaries

  • Functional: what can be expressed

    • What is the salary of Joe’s manager? Who is Mary’s manager?

  • Syntactic: variety of forms

    • What is the salary of the manager of Joe?

  • Lexical: word meanings, synonyms, vocabulary

    • What were the earnings of Joe’s boss


  • Nl factors terms

    NL Factors/Terms

    • Syntactic

      • Grammar or structure

    • Prosodic

      • Inflection, stress, pitch, timing

    • Pragmatic

      • Situated context of utterance, location, time

    • Semantic

      • Meaning of words


    Sr nlu advantages

    SR/NLU Advantages

    • Easy to learn and remember

    • Powerful

    • Fast, efficient (not always)

    • Little screen real estate


    Sr nlu disadvantages

    SR/NLU Disadvantages

    • Assumes domain knowledge

    • Doesn’t work well enough yet

      • Requires confirmation

      • And recognition will always be error-prone

    • Expensive to implement

    • Unrealistic expectations

    • Generate mistrust/anger


    Speech output

    Speech Output

    • Tradeoffs in speed, naturalness and understandability

    • Male or female voice?

      • Technical issues (freq. response of phone)

      • User preference (depends on the application)

    • Rate of speech

      • Technically up to 550 wpm!

      • Depends on listener

    • Synthesized or Pre-recorded?

      • Synthesized: Better coverage, flexibility

      • Recorded: Better quality, acceptance


    Speech output1

    Speech Output

    • Synthesis

      • Quality depends on software ($$)

      • Influence of vocabulary and phrase choices

      • http://www.research.att.com/projects/tts/demo.html

    • Recorded segments

      • Store tones, then put them together

      • The transitions are difficult (e.g., numbers)


    Designing the interaction

    Designing the Interaction

    • Constrain vocabulary

      • Limit valid commands

      • Structure questions wisely (Yes/No)

      • Manage the interaction

      • Examples?

    • Slow speech rate, but concise phrases

    • Design for failsafe error recovery

    • Visual record of input/output

    • Design for the user – Wizard of Oz


    Speech tools toolkits

    Speech Tools/Toolkits

    • Java Speech SDK

      • FreeTTS 1.1.1http://freetts.sourceforge.net/docs/index.php

    • IBM JavaBeans for speech

    • Microsoft speech SDK (Visual Basic, etc.)

    • OS capabilities (speech recognition and synthesis built in to OS) (TextEdit)

    • VoiceXML


    General issues in choosing dialogue style

    General Issues in Choosing Dialogue Style

    • Who is in control - user or computer

    • Initial training required

    • Learning time to become proficient

    • Speed of use

    • Generality/flexibility/power

    • Special skills - typing

    • Gulf of evaluation / gulf of execution

    • Screen space required

    • Computational resources required


    Non speech audio

    Non-speech audio

    • Traditionally used for warnings, alarms or status information

    • Sounds provide information that help reduce error. Eg: typing, video games

    • Multi-modal interfaces


    Additional benefits of non speech audio

    Additional benefits of non-speech audio

    • Good for indicating changes, since we ignore continuous sounds

    • Provides secondary representation

      • Supports visual interface

    • Tradeoff in using natural (real) sounds vs. synthesized noises.


    Non speech audio examples

    Non-speech audio examples

    • Error ding

    • Info beep

    • Email arriving ding

    • Recycle

    • Battery critical

    • Logoff

    • Logon

      Others?


    Pen and gesture

    Pen and Gesture


    Dialog design

    PDAs

    • Becoming more common and widely used

    • Smaller display (160x160), (320x240)

    • Few buttons, interact through pen

    • Estimate: 14 million shipped by 2004

    • Improvements

      • Wireless, color, more memory, better CPU, better OS

    • Palmtop versus Handheld


    No shredder

    No Shredder…


    Dialog design

    http://www.vaio.sony.co.jp/Products/VGN-U50/

    http://sonyelectronics.sonystyle.com/micros/clie/

    http://www.oqo.com/

    http://www.blackberry.com/


    Input

    Input

    • Pen is dominant form for PDA and Tablet

    • Main techniques

      • Free-form ink

      • Soft keyboard

      • Numeric keyboard => text

      • Stroke recognition - strokes not in the shape of characters

      • Hand printing / writing recognition

    • Sometimes can connect keyboard or has modified keyboard (e.g. cell phones)


    Soft keyboards

    Soft Keyboards

    • Common on PDAs and mobile devices

      • Tap on buttons on screen


    Soft keyboard

    Soft Keyboard

    • Presents a small diagram of keyboard

    • You click on buttons/keys with pen

    • QWERTY vs. alphabetical

      • Tradeoffs?

      • Alternatives?


    Numeric keypad t9

    Numeric Keypad -T9

    • Tegic Communications developed

    • You press out letters of your word, it matches the most likely word, then gives optional choices

    • Faster than multiple presses per key

    • Used in mobile phones

    • http://www.t9.com/


    Cirrin stroke recognition

    Cirrin - Stroke Recognition

    • Developed by Jen Mankoff (GT -> Berkeley CS Faculty -> CMU CS Faculty)

    • Word-level unistroke technique

    • UIST ‘98 paper

    • Use stylus to go from one letterto the next ->


    Quikwriting stroke recogntion

    Quikwriting - Stroke Recogntion

    • Developed by Ken Perlin


    Quikwriting example

    Quikwriting Example

    p

    l

    e

    Said to be as fast as graffiti, but have to learn more

    http://mrl.nyu.edu/~perlin/demos/Quikwrite2_0.html


    Hand printing writing recognition

    Hand Printing / Writing Recognition

    • Recognizing letters and numbers and special symbols

    • Lots of systems (commercial too)

    • English, kanji, etc.

    • Not perfect, but people aren’t either!

      • People - 96% handprinted single characters

      • Computer - >97% is really good

    • OCR (Optical Character Recognition)


    Recognition issues

    Recognition Issues

    • Off-line vs. On-line

      • Off-line: After all writing is done, speed not an issue, only quality.

        • Work with either a bit map or vector sequence

      • On-line: Must respond in real-time - but have richer set of features - acceleration, velocity, pressure


    More issues

    More Issues

    • Boxed vs. Free-Form input

      • Sometimes encounter boxes on forms

    • Printed vs. Cursive

      • Cursive is much more difficult to impossible

    • Letters vs. Words

      • Cursive is easier to do in words vs individual letters, as words create more context


    More issues1

    More Issues

    • Using context & words can help

      • Usually requires existence of a dictionary

      • Check to see if word exists

      • Consider 1 vs. I vs. l

    • Training - Many systems improve a lot with training data


    Special alphabets

    Special Alphabets

    • Graffiti - Unistroke alphabet on Palm PDA

      • What are yourexperienceswith Graffiti?

    • Other alphabets or purposes

      • Gestures for commands


    Pen gesture commands

    Pen Gesture Commands

    • Might mean delete

    • Insert

    • Paragraph

    Define a series of (hopefully) simple drawing gesturesthat mean different commands in a system


    Pen use modes

    Pen Use Modes

    • Often, want a mix of free-form drawing and special commands

    • How does user switch modes?

      • Mode icon on screen

      • Button on pen

      • Button on device


    Error correction

    Error Correction

    • Having to correct errors can slow input tremendously

    • Strategies

      • Erase and try again (repetition)

      • When uncertain, system shows list of best guesses (n-best list)

      • Others??


    More ink applications

    More ink applications

    • Signature verification

    • Notetaking

    • Electronic whiteboards and large-scale displays

    • Sketching


    Free form ink

    Free-form Ink

    • Ink is the data, take as is

    • Human is responsible forunderstanding andinterpretation

    • Like a sketch pad

    • Often time-stamped


    Audio notebook

    Audio Notebook

    Stifelman, MIT

    affordances of paper

    notetaking activity

    indexing

    scanning


    Meeting support

    Meeting Support

    • Natural Input

    • Indexing

    • Tivoli

    • Domainobjects


    Eclass

    eClass

    • Ubiquitous Access


    Flatland

    Flatland

    • Supportindividualwhiteboarduse

    • Use study

      • persistence

      • segments

      • Informal

    • Mynatt, E.D., Igarashi, T., Edward, W.K., and LaMarca, A. (1999). "Flatland: New dimensions in office whiteboards." In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 1999; Pittsburgh, Pennsylvania). New York: ACM Press, pp. 346-353.


    Example

    Example

    • DENIM – Landay, Berkeley


    General issues in choosing dialogue style1

    General Issues in Choosing Dialogue Style

    • Who is in control - user or computer

    • Initial training required

    • Learning time to become proficient

    • Speed of use

    • Generality/flexibility/power

    • Special skills - typing

    • Gulf of evaluation / gulf of execution

    • Screen space required

    • Computational resources required


    Gesture recognition

    Gesture Recognition

    Tracking 3D hand-arm gestures

    fiber optic, e.g. dataglove

    magnetic tracker, e.g. Polhemus

    Perceptual user interfaces

    emerging area

    mainly computer vision researchers


    Other interesting interactions

    Other interesting interactions

    • 3D interaction

      • Stereoscopic displays

    • Virtual reality

      • Immersive displays such as glasses, caves

    • Augmented reality

      • Head trackers and vision based tracking


  • Login