
Interaction – Speech and Pen


Presentation Transcript


  1. Interaction – Speech and Pen

  2. Natural input • Universal design • Take advantage of familiarity, existing knowledge • Alternative input & output • Multi-modal interfaces • Getting “off the desktop”

  3. Speech dialogue • Why use it? • Hands busy • Mobility required • Eyes occupied • Conditions preclude use of keyboard • Visual impairment • Physical limitation

  4. Speech Input • Speaker recognition • Tell which person it is (voice print) • Monitoring, recording • Speech recognition • Identify words • IBM ViaVoice, Dragon Dictate, ... • Natural language understanding • (does not necessarily involve audio)

  5. Recognition Dimensions • Speaker dependent/independent • Parametric patterns are sensitive to speaker • With training (dependent) can get better • Vocabulary • Some have 50,000+ words • Isolated word vs. continuous speech • Continuous: where words stop & begin • Typically a pattern match, no context used • “Did you” vs. “Didja”

  6. Recognition Example • Spoken: My flight experience uh, I started off as a private pilot and then I spent 20 years flying as a navigator on C130s in the air force and I currently fly as a first officer on the SAAB 340 turboprop. • Recognized: my flight experience a and I started off as a private pilot and I spent 20 years flying is a navigator on C-130s in the Air Force and I currently fly as a first officer on the sound 340 turboprop.

  7. And another example • Spoken: Here here's one of the thoughts I'd throw in on that. I'd agree with Howard that the combination of checklists and flows is the way way to manage things best. • Recognized: hair and here's one of the outside troodon and an eye to agree with power that the domination checklists and flows is the way that managed things best

  8. Errors • Systems make four types of errors: • Substitution - one for another • Rejection - detected, but not recognized • Insertion - added • Deletion - not detected • So how do you recover from an error?
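
One way to make the error types on this slide concrete is to align what was said against what was recognized and count the mismatches. The sketch below is a minimal illustration using a standard word-level edit-distance alignment; the function names `align_errors` and `word_error_rate` are made up for this example, and rejection is not counted because it is something the recognizer reports itself rather than something visible in the output text.

```python
def align_errors(reference, hypothesis):
    """Count substitution, insertion and deletion errors between two word strings."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # cost[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    cost = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(1, len(ref) + 1):
        cost[i][0] = i
    for j in range(1, len(hyp) + 1):
        cost[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            diagonal = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diagonal, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back through the table to label each edit.
    counts = {"substitution": 0, "insertion": 0, "deletion": 0}
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] != hyp[j - 1]:
                counts["substitution"] += 1  # one word replaced by another
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["deletion"] += 1  # a spoken word is missing from the output
            i -= 1
        else:
            counts["insertion"] += 1  # an extra word appears in the output
            j -= 1
    return counts


def word_error_rate(reference, hypothesis):
    """Total edits divided by the number of reference words."""
    counts = align_errors(reference, hypothesis)
    return sum(counts.values()) / len(reference.split())


if __name__ == "__main__":
    spoken = "first officer on the saab 340 turboprop"
    heard = "first officer on the sound 340 turboprop"
    print(align_errors(spoken, heard))
    print(word_error_rate(spoken, heard))
```

When several alignments have the same total cost, the split between error types can differ depending on which one the walk-back picks, so the per-type counts are best read as one plausible labeling.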

  9. Natural Language Understanding • Putting meaning to the words • Input might be spoken or typed • Holy grail of Artificial Intelligence problems

  10. Eliza - a “Faked” Understanding • Person: Men are all alike. • ELIZA: In what way? • Person: They're always bugging us about something or other. • ELIZA: Can you think of a specific example? • Person: Well, my boyfriend made me come here. • ELIZA: Your boyfriend made you come here? • Person: He says I'm depressed much of the time. • ELIZA: I am sorry to hear that you are depressed. • Person: It's true. I am unhappy. • ELIZA: Do you think coming here will help you not to be unhappy? • Person: I need some help, that much seems certain. • ELIZA: What would it mean to you if you got some help? • Person: Perhaps I could learn to get along with my mother. • ELIZA: Tell me more about your family. • http://www-ai.ijs.si/eliza/eliza.html • Weizenbaum, J., "ELIZA -- A computer program for the study of natural language communication between man and machine", Communications of the ACM 9(1):36-45, 1966
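
The "faked" part can be illustrated with a few keyword patterns and canned reply templates. The sketch below is not Weizenbaum's script: the rules in `RULES` and the fallback reply are invented for this example, chosen only so that a few lines of the dialogue above come out in a similar way. It shows that replies can be produced by surface pattern matching with no model of meaning.

```python
import re

# Hypothetical rules: (pattern, reply template). Captured groups fill the template.
RULES = [
    (re.compile(r"\balike\b", re.I), "In what way?"),
    (re.compile(r"\bmy (\w+) made me (.+)", re.I), "Your {0} made you {1}?"),
    (re.compile(r"\bi am (depressed|unhappy|sad)\b", re.I),
     "I am sorry to hear that you are {0}."),
    (re.compile(r"\b(mother|father|family)\b", re.I), "Tell me more about your family."),
]

FALLBACK = "Please go on."  # used when no keyword matches


def respond(utterance):
    """Return a canned reply chosen by the first matching pattern."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(*match.groups())
    return FALLBACK


if __name__ == "__main__":
    for line in [
        "Men are all alike.",
        "Well, my boyfriend made me come here.",
        "It's true. I am unhappy.",
        "Perhaps I could learn to get along with my mother.",
        "The weather is nice.",
    ]:
        print("Person:", line)
        print("ELIZA :", respond(line))
```

Rule order matters here: the first pattern that matches wins, so more specific patterns should come before broad keyword rules.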

  11. NL Factors/Terms • Syntactic • Grammar or structure • Prosodic • Inflection, stress, pitch, timing • Pragmatic • Situated context of utterance, location, time • Semantic • Meaning of words

  12. SR/NLU Issues • Advantages • Easy to learn and remember • Powerful • Fast, efficient (not always) • Little screen real estate • Disadvantages • Assumes domain knowledge • Doesn’t work well enough yet • Requires confirmation • And recognition will always be error-prone • Expensive to implement • Unrealistic expectations can generate mistrust

  13. Speech Output • Tradeoffs in speed, naturalness and understandability • Male or female voice? • Technical issues (freq. response of phone) • User preference (depends on the application) • Rate of speech • Technically up to 550 wpm! • Depends on listener • Synthesized or Pre-recorded? • Synthesized: Better coverage, flexibility • Recorded: Better quality, acceptance

  14. Speech Output • Synthesis • Quality depends on software ($$) • Influence of vocabulary and phrase choices • http://www.research.att.com/~ttsweb/tts/demo.php#top • Recorded segments • Store tones, then put them together • The transitions are difficult (e.g., numbers)

  15. Designing Speech Interaction • Constrain vocabulary • Limit valid commands • Structure questions wisely (Yes/No) • Manage the interaction • Examples? • Slow speech rate, but concise phrases • Design for failsafe error recovery • Visual record of input/output • Test it out first
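
A constrained vocabulary with yes/no confirmation might look roughly like the sketch below. It assumes recognition has already produced a text string; the command set, the accepted yes/no words, the prompts, and the function name `next_prompt` are all hypothetical and stand in for whatever a real application would define.

```python
# Hypothetical small vocabulary: only these words are valid commands.
VALID_COMMANDS = {"play", "stop", "next", "previous"}
YES_WORDS = {"yes", "yeah", "yep", "correct"}
NO_WORDS = {"no", "nope", "cancel"}


def next_prompt(heard, pending=None):
    """Handle one dialogue turn.

    heard:   text the recognizer produced for this turn
    pending: command awaiting yes/no confirmation, or None
    Returns (prompt_to_speak, new_pending_command).
    """
    word = heard.strip().lower()

    if pending is not None:
        # Structured yes/no question keeps the expected answers tiny.
        if word in YES_WORDS:
            return (f"OK, doing '{pending}'.", None)
        if word in NO_WORDS:
            return ("Cancelled. Say a command.", None)
        return (f"Please answer yes or no: did you say '{pending}'?", pending)

    if word in VALID_COMMANDS:
        return (f"Did you say '{word}'?", word)

    # Error recovery: remind the user what can be said instead of failing silently.
    options = ", ".join(sorted(VALID_COMMANDS))
    return (f"Sorry. You can say: {options}.", None)


if __name__ == "__main__":
    prompt, pending = next_prompt("play")
    print(prompt)
    prompt, pending = next_prompt("yes", pending)
    print(prompt)
```

Keeping the prompts short and listing the valid words on failure follows the slide's advice on concise phrases and error recovery; whether every command needs confirmation is a design choice that depends on how costly a wrong action is.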

  16. Speech Tools/Toolkits • Java Speech SDK • FreeTTS 1.1.1 http://freetts.sourceforge.net/docs/index.php • IBM JavaBeans for speech • Microsoft speech SDK (Visual Basic, etc.) • VoiceXML

  17. General Issues – Speech/NL • Initial training required • Learning time to become proficient • Speed of use • Generality/flexibility/power • Special skills - typing • Screen space required • Computational resources required

  18. Non-speech audio • Good for indicating changes, since we ignore continuous sounds • Traditionally used for warnings, alarms or status information • Provides secondary representation • Supports visual interface • Provides information that helps reduce error • Tradeoff in using natural (real) sounds vs. synthesized noises.

  19. Non-speech audio examples • Error ding • Info beep • Email arriving ding • Recycle • Battery critical • Logoff • Logon • Others?

  20. Pen, Touch, & Mobile interaction • We’ve got it all but the shredder now…

  21. Pen, Touch, & Mobile dialog • Stylus or finger • Tradeoffs of each? • Pen as a standard mouse (doubleclick?) • Variety of platforms • Desktop touch screens or input pads (Wacom) • Tablet PCs • Handheld and Mobile devices • Electronic whiteboards • Platforms often involve variety of size and other constraints

  22. Mobile devices • More common as more platforms available • PDA • Cell phone • Ultra mobile tablets • GPS • Smaller display (160x160), (320x240) • Few buttons, different interactions • Free-form ink • Soft keyboard • Numeric keyboard => text • Stroke recognition • Hand printing / writing recognition

  23. http://www.blackberry.com/ http://www.oqo.com/

  24. Soft Keyboard • Presents a small diagram of keyboard • You click on buttons/keys with pen • QWERTY vs. alphabetical • Tradeoffs? • Alternatives?

  25. Numeric Keypad • You press out letters of your word, it matches the most likely word, then gives optional choices • Faster than multiple presses per key • Used in mobile phones • http://www.t9.com/
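
The one-press-per-letter idea can be sketched with a dictionary indexed by digit sequence. This is an illustration of the general technique, not the actual T9 implementation: the word list and its counts in `WORD_COUNTS` are invented, and the helper names are made up for this example. The keypad letter layout is the standard phone one.

```python
from collections import defaultdict

# Standard phone keypad letter groups.
KEYS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_DIGIT = {ch: digit for digit, letters in KEYS.items() for ch in letters}


def to_digits(word):
    """Digit sequence typed with one press per letter."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower())


def multitap_presses(word):
    """Key presses needed when each letter is selected by repeated presses."""
    return sum(KEYS[LETTER_TO_DIGIT[ch]].index(ch) + 1 for ch in word.lower())


def build_index(word_counts):
    """Map each digit sequence to its words, most frequent first."""
    index = defaultdict(list)
    for word, count in word_counts.items():
        index[to_digits(word)].append((count, word))
    return {
        digits: [w for _, w in sorted(entries, key=lambda e: (-e[0], e[1]))]
        for digits, entries in index.items()
    }


# Invented toy dictionary with made-up usage counts.
WORD_COUNTS = {"good": 50, "home": 40, "gone": 20, "hood": 5, "hello": 30}
INDEX = build_index(WORD_COUNTS)


def candidates(digits):
    """Most likely word first, followed by the optional alternatives."""
    return INDEX.get(digits, [])


if __name__ == "__main__":
    print(candidates("4663"))
    print(len(to_digits("good")), "presses vs", multitap_presses("good"), "with multi-tap")
```

In this toy dictionary four words share the sequence 4663, which is why the interface has to offer alternatives after its first guess.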

  26. Other pen text input • Graffiti – Palm mobile devices • Unistroke recognition • Experimental • Cirrin • Word-level unistroke • Quickwriting • Harder to learn than Graffiti

  27. Hand Printing / Writing Recognition • Recognizing letters and numbers and special symbols • Lots of systems (commercial too) • English, kanji, etc. • Not perfect, but people aren’t either! • People - 96% handprinted single characters • Computer - >97% is really good

  28. Recognition Issues • Boxed vs. Free-Form input • Sometimes encounter boxes on forms • Printed vs. Cursive • Cursive is much more difficult • Letters vs. Words • Cursive is easier to do in words vs individual letters, as words create more context • Usually requires existence of a dictionary • Real-time vs. off-line

  29. Pen Gesture Commands • Define a series of (hopefully) simple drawing gestures that mean different commands in a system • Might mean delete • Insert • Paragraph

  30. Pen Use Modes • Often, want a mix of free-form drawing and special commands • How does user switch modes? • Mode icon on screen • Button on pen • Button on device

  31. Error Correction • Having to correct errors can slow input tremendously • Strategies • Erase and try again (repetition) • When uncertain, system shows list of best guesses (n-best list) • Others??
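
The n-best strategy can be sketched as a small decision function. It assumes the recognizer returns several hypotheses with confidence scores between 0 and 1; the threshold value, the scores in the example, and the name `resolve` are assumptions made for illustration.

```python
def resolve(hypotheses, accept_threshold=0.8, n=3):
    """Decide what to do with a recognizer's scored guesses.

    hypotheses: list of (text, confidence) pairs
    Returns (action, options) where action is "accept", "choose" or "retry".
    """
    ranked = sorted(hypotheses, key=lambda h: h[1], reverse=True)
    if not ranked:
        return ("retry", [])  # nothing recognized: erase and try again

    best_text, best_score = ranked[0]
    if best_score >= accept_threshold:
        return ("accept", [best_text])  # confident enough, no interruption

    # Uncertain: show the n best guesses and let the user pick one.
    return ("choose", [text for text, _ in ranked[:n]])


if __name__ == "__main__":
    # Made-up scores for a handwritten word.
    guesses = [("clear", 0.42), ("dear", 0.38), ("clean", 0.11), ("deer", 0.05)]
    print(resolve(guesses))
    print(resolve([("delete", 0.93), ("deleted", 0.04)]))
```

The threshold trades interruptions against silent mistakes: a higher value shows the list more often, a lower value accepts more wrong guesses.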

  32. Free-form Ink • Ink is the data, take as is • Human is responsible for understanding and interpretation • Often time-stamped • Applications • Signature verification • Notetaking • Electronic whiteboards • Sketching

  33. Electronic whiteboards • Smartboard and Mimio • Can integrate with projection • Large surface to interact with • Issues? http://www.mimio.com/ http://www.smarttech.com/

  34. Touch tables • Which techniques might be similar to smaller touchscreens? • Which would differ? • How similar and different from interactive white boards? Microsoft Surface

  35. Real paper • Anoto digital paper and pen technology (http://www.anoto.com/) • Other pens available: http://www.logitech.com/ http://www.epos-ps.com/ • Issues?

  36. General Issues – Pen input • Initial training required • Learning time to become proficient • Speed of use • Generality/flexibility/power • Special skills - typing • Screen space required • Computational resources required

  37. Other interesting interactions • Gesture input • Wii • Lots of other specialized hardware for tracking • 3D interaction • Stereoscopic displays • Virtual reality • Immersive displays such as glasses, caves • Augmented reality • Head trackers and vision based tracking • Tangible interaction • Use physical objects to express input
