
6.870 Final Project Webnnel: A channel-based Web navigation system



  1. 6.870 Final Project Webnnel: A channel-based Web navigation system Chen-Hsiang Yu and Oshani Seneviratne {chyu,oshani}@mit.edu 6.870 Multimodal User Interface

  2. Outline • Introduction • Motivations • Related Work • Our Approach • Demonstration • User Study • Challenges & Future Work • Discussion • References

  3. Introduction • The Web has become an important medium for delivering information. • Checking e-mail • Reading news • Watching videos • Listening to music • Shopping on the Web • ... • People are familiar with using the Web and have started to apply that experience in other settings. • Mobile browsing • Browsing on different Wi-Fi enabled devices • ...

  4. Motivations • However, we spend more than 40% of our time at home. • In this project, we envision an application of Web browsing for the home environment. • In the near future, you will watch TV programs while also browsing the Web and using Web applications. Access and enjoy your digital entertainment easily on your TV and HDTV

  5. Motivations (cont.) • We propose to use multiple modalities to assist Web browsing in the home environment. Figure 1: The concept of the Web channel (webnnel) system.

  6. Related Work • Web automation and customization • End-user programming for automation and customization on the Web • Chickenfoot [4] • Greasemonkey [6] Figure 2: Chickenfoot User Interface

  7. Related Work (Cont.) • Pre-defined tool-based customization • Web Developer (Chris Pederick [X]) • Platypus (Scott R. Turner [X])

  8. Related Work (Cont.) • Speech Recognition • Microsoft Vista Speech Recognition Engine • Apple Mac Speech Recognition Engine • But none of the above provides the level of customization offered by Webnnel! Figure 3: Microsoft Vista Speech Recognition

  9. Our Approach • Design an integrated system with multiple modalities to help users access the Web on a bigger screen. • The Webnnel system is composed of three sub-systems: • Webnnel Command System • Speech Command Extraction (SCE) System • (Mouse) Gesture Recognition System • Because Web content is easier to access and control from a browser extension, we designed the Webnnel Command System as a Firefox extension. • All modalities, such as speech and (mouse, hand, head) gestures, can use it to control Web content.

  10. Our Approach - Webnnel Command System Figure 4: The system architecture of the Webnnel system.

  11. Our Approach - Webnnel Command System • Webnnel Command Interface (WCI) • Provides a command interface between the different modality engines and the Webnnel Command System. • Command Abstraction Interface (CAI) • Defines high-level APIs that the WCI calls to satisfy users’ commands. • Content Manipulation Module (CMM) • Defines special-purpose functions for the CAI, such as image removal and content transformation, for example to serve the “my email” command. • Defines functions that the CAP uses to render content, such as Web site snapshots or display in different formats. • Channel Aggregation and Presentation (CAP) • Provides Web site snapshot rendering and UI support.
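
The layering on this slide can be sketched roughly as follows. This is a minimal illustration in plain Python, not the Firefox extension itself: the `Page` model, the class names, and the idea that “my email” strips images are all assumptions made for the example. Only the WCI/CAI/CMM split and the “my email” command name come from the slide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Page:
    """Hypothetical stand-in for a loaded Web page (not a real browser object)."""
    url: str
    elements: List[dict] = field(default_factory=list)


# CMM layer: one small function per content manipulation.
def remove_images(page: Page) -> Page:
    """Return a copy of the page without <img> elements."""
    kept = [e for e in page.elements if e.get("tag") != "img"]
    return Page(page.url, kept)


# CAI layer: high-level operations composed from CMM functions.
class CommandAbstraction:
    def __init__(self, channels: Dict[str, Page]) -> None:
        self.channels = channels  # assumed mapping: channel name -> page

    def my_email(self) -> Page:
        # Assumption: "my email" shows the e-mail channel with images removed.
        return remove_images(self.channels["email"])


# WCI layer: the single entry point that every modality engine talks to.
class CommandInterface:
    def __init__(self, cai: CommandAbstraction) -> None:
        self._handlers: Dict[str, Callable[[], Page]] = {"my email": cai.my_email}

    def execute(self, command: str) -> Page:
        key = " ".join(command.lower().split())  # normalize case and spacing
        if key not in self._handlers:
            raise ValueError(f"unknown command: {command!r}")
        return self._handlers[key]()
```

A speech or gesture engine would only ever call `execute` with a command string, which is the separation the slide describes: modality engines never touch page content directly.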

  12. Our Approach - Webnnel Command System WCI Figure 5: The system architecture of the Webnnel system.

  13. Our Approach - Webnnel Command System Table 1: Commands of Webnnel Command System

  14. Our Approach - Speech Command Extraction Figure 6: The process of Speech Command Extraction (SCE)

  15. Our Approach - Speech Command Extraction • Used the Mac OS Speech Recognition Engine • To add new commands you have to: • Allocate and initialize an instance of NSSpeechRecognizer. • Set the commands that the object should listen for using the setCommands: method. • Set a delegate for the NSSpeechRecognizer object that implements the speechRecognizer:didRecognizeCommand: method. Figure 7: An example of AppleScript
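
The three steps above follow a delegate pattern, which can be mimicked in plain Python. The sketch below is a stand-in only: `StandInRecognizer`, `PromptTyper`, and `simulate_heard` are hypothetical names and do not call the Cocoa API. Representing "typing at the command prompt" as the command text followed by a newline is also an assumption.

```python
from typing import List, Optional


class PromptTyper:
    """Hypothetical delegate: records what would be typed at the command prompt."""

    def __init__(self) -> None:
        self.typed: List[str] = []

    def did_recognize_command(self, recognizer: "StandInRecognizer", command: str) -> None:
        # Assumption: a command is entered as its text followed by Enter.
        self.typed.append(command + "\n")


class StandInRecognizer:
    """Plain-Python stand-in for a command recognizer; NOT NSSpeechRecognizer."""

    def __init__(self) -> None:                               # step 1: create it
        self.commands: List[str] = []
        self.delegate: Optional[PromptTyper] = None

    def set_commands(self, commands: List[str]) -> None:      # step 2: phrases to listen for
        self.commands = list(commands)

    def set_delegate(self, delegate: PromptTyper) -> None:    # step 3: who gets notified
        self.delegate = delegate

    def simulate_heard(self, phrase: str) -> bool:
        """Pretend the engine heard `phrase`; notify the delegate if it is a listed command."""
        if phrase in self.commands and self.delegate is not None:
            self.delegate.did_recognize_command(self, phrase)
            return True
        return False


recognizer = StandInRecognizer()
recognizer.set_commands(["my email", "web channel"])
typer = PromptTyper()
recognizer.set_delegate(typer)
recognizer.simulate_heard("my email")
```

Phrases that are not in the command list are ignored, so adding a new speech command amounts to extending the list passed in step 2.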

  16. Our Approach - (Mouse) Gesture Recognition System • Uses the Mouse Gestures recognition engine [10] • The (mouse) gesture commands are designed around the commands of the Webnnel Command System. Gesture commands (action: gesture code): right: R7 • left: L9 • web channel: UD • frame mode: RURDR
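
One way a mouse path could be turned into such a gesture code is sketched below. It is an illustration under stated assumptions, not the engine from [10]: the letters U, D, L, R are taken to mean the four straight directions, the digits 7, 9, 1, 3 are assumed to be numeric-keypad diagonals (up-left, up-right, down-left, down-right), and the `min_move` jitter threshold is invented for the example. The code-to-action pairs are copied from the slide as they appear there.

```python
import math
from typing import Iterable, List, Optional, Tuple

# Symbols for the eight 45-degree sectors, counter-clockwise starting at "right".
# Assumption: digits are numeric-keypad diagonals (9 = up-right, 7 = up-left, ...).
SYMBOLS = ["R", "9", "U", "7", "L", "1", "D", "3"]

# Code -> action pairs as listed on the slide.
GESTURE_COMMANDS = {"R7": "right", "L9": "left", "UD": "web channel", "RURDR": "frame mode"}


def direction(dx: float, dy: float) -> str:
    """Map one movement to the nearest of eight direction symbols."""
    # Screen y grows downward, so negate dy to get a conventional angle.
    angle = math.atan2(-dy, dx)
    sector = round(angle / (math.pi / 4)) % 8
    return SYMBOLS[sector]


def encode(points: Iterable[Tuple[float, float]], min_move: float = 10.0) -> str:
    """Turn a sequence of (x, y) mouse positions into a gesture code."""
    code: List[str] = []
    anchor: Optional[Tuple[float, float]] = None
    for x, y in points:
        if anchor is None:
            anchor = (x, y)
            continue
        dx, dy = x - anchor[0], y - anchor[1]
        if math.hypot(dx, dy) < min_move:
            continue  # too small to count: treat as jitter
        symbol = direction(dx, dy)
        if not code or code[-1] != symbol:
            code.append(symbol)  # collapse repeated directions into one symbol
        anchor = (x, y)
    return "".join(code)


def command_for(points: Iterable[Tuple[float, float]]) -> Optional[str]:
    """Look up the command for a mouse path, or None if the code is unknown."""
    return GESTURE_COMMANDS.get(encode(points))
```

An exact dictionary lookup like `command_for` only works when the recorded code matches a listed code character for character.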

  17. Our Approach - (Mouse) Gesture Recognition System Table 2: Gesture and Gesture Code of (Mouse) Gesture Recognition System

  18. Our Approach - Integration • Firefox Web Browser • Webnnel Command System • External Interface: WCI • Internal Modules: CAI, CAP and CMM • AppleScripts: • Act as the “glue” between the speech recognition and the Webnnel Command System • Custom scripts for each speech command • Perform keystrokes at the Webnnel command prompt upon recognition • (Mouse) Gesture: • All (mouse) gestures are recognized by the gesture recognition engine • The corresponding commands are directed to the WCI of the Webnnel Command System.

  19. Our Approach - Integration (Cont.) Figure 6: The system integration of the Webnnel system.

  20. Demonstration • Speech Command Extraction System • Webnnel Command System • (Mouse) Gesture Recognition System

  21. User Study • Conducted the study with 4 users • Asked the users to perform 2 tasks using the Webnnel speech recognition system • Task 1: Go to a certain website • Task 2: Go to their web-based email system Figure 7: The result of user study May 14, 2008

  22. User Study (cont) • Recognition Accuracy (from the 16 commands we asked them to test the system with): Figure 8: The recognition accuracy of WCI + SCE

  23. User Study (cont) General comments from the users: • Commands are natural and easy to remember • Liked the tag system • The shorter the command, the better • There should also be a way to enter a URL directly into the address bar

  24. Challenges • The techniques and design for managing Web content are still limited. • Ambiguity detection and resolution • Popup window handling • User interface to store and access personal information • Hacking a third-party engine is not easy. • The gesture code design is not yet good enough to handle gestures with the same meaning: the gesture LDRUL can be recorded as the gesture code L1D3R9U7L or 1D397L.
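
The gesture code problem on this slide can be made concrete with a small sketch. Exact string matching treats the two recorded codes as unrelated to LDRUL, while a similarity measure such as edit distance at least ranks LDRUL as the nearest template. Edit distance is a technique swapped in here purely for illustration; the slides do not say how (or whether) Webnnel resolves this, and the template list and `max_distance` threshold are assumptions.

```python
from typing import List, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def closest_template(code: str, templates: List[str], max_distance: int) -> Optional[str]:
    """Return the template nearest to `code`, or None if nothing is close enough."""
    best = min(templates, key=lambda t: edit_distance(code, t))
    return best if edit_distance(code, best) <= max_distance else None


# Templates: LDRUL from this slide, plus two codes from the gesture command slide.
TEMPLATES = ["LDRUL", "UD", "RURDR"]
RECORDED = ["L1D3R9U7L", "1D397L"]  # the two recorded codes shown on this slide

exact_hits = [code in TEMPLATES for code in RECORDED]
fuzzy_hits = [closest_template(code, TEMPLATES, max_distance=4) for code in RECORDED]
```

Both recorded codes are four edits away from LDRUL, which is nearly as long as the template itself, so a fixed threshold like this would be fragile in practice. That fragility is one way to read the challenge the slide describes.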

  25. Challenges (Cont.) • Early experiments with the Java-based CMU Sphinx-4 speech recognizer failed • Too many configuration parameters to consider • Our custom language model and grammar had very poor recognition accuracy • Achieving cross-platform compatibility: • Compared to Mac OS, Windows (XP, Vista) and Linux (Ubuntu 7.10) did not have good support for speech recognition. • The quality of the microphones varies across different computers • Introducing many speech commands generally lowers the accuracy of the entire system • Having a stress ball around was very handy while testing the speech recognition!

  26. Future Work • Enhance the UI of the Webnnel Command System • Add / Delete Web channels • Add / Delete / Modify / Retain / Transfer account information • Provide more dynamic templates to show the Web channels • Integrate other input modalities, such as hand gestures and head gestures. • Conduct further user studies to gather user feedback. • Port the speech recognition system (Mac Speech Recognizer) to other platforms.

  27. Discussion • How does the Webnnel system work? • It separates Web content access from the input modalities • It provides a simple interface for input modalities to manage the content • The Webnnel system demonstrates how application-level engines could use and manage Web content via the browser. • The difficulty of system integration could be reduced. • However, the difficulty of designing each input modality remains.

  28. Collaboration • Chen-Hsiang Yu: • Webnnel Command System • Development of Firefox Extension of Webnnel Command System • (Mouse) Gesture Extraction (MGE) System • Oshani Seneviratne: • Speech Command Extraction (SCE) System • User Study

  29. References • [1] Apple Speech Recognition Engine, http://developer.apple.com/documentation/Cocoa/Conceptual/Speech/Articles/RecognizeSpeech.html • [2] Avot mV, http://www.avotmedia.com/ • [3] Bigham, J. P. and Ladner, R. E. Accessmonkey: a collaborative scripting framework for web users and developers. In W4A '07, ACM Press, pp. 25-34, 2007. • [4] Bolin, M., Webber, M., Rha, P., Wilson, T. and Miller, R. C. Automation and customization of rendered web pages. Proceedings of the 18th annual ACM symposium on User interface software and technology, October 23-26, 2005. • [5] CMU-Sphinx Speech Recognition Engine, http://cmusphinx.sourceforge.net/html/cmusphinx.php • [6] Greasemonkey, https://addons.mozilla.org/en-US/firefox/addon/748 • [7] Joost, http://www.joost.com/ • [8] Microsoft Windows Vista Speech Recognition system, http://www.microsoft.com/enable/products/windowsvista/speech.aspx • [9] Mogulus, http://www.mogulus.com/ • [10] Mouse Gestures, https://addons.mozilla.org/en-US/firefox/addon/39

  30. References (cont.) • [11] Petrie, H., Hamilton, F. and King, N. Tension, what tension? Website accessibility and visual design. Proceedings of the 2004 international cross-disciplinary workshop on Web accessibility (W4A), pp. 13-18, 2004. • [12] Richards, J. and Hanson, V. Web accessibility: a broader view. Proceedings of the 13th international conference on World Wide Web, pp. 72-79, 2004.

  31. Any Questions? {chyu,oshani}@mit.edu
