
Communications, Collaboration, and Community

Anoop Gupta, Microsoft Research. Collaborators: Michael Cohen, Ross Cutler, Zicheng Liu, Yong Rui, Kentaro Toyama, Zhengyou Zhang, and others. Deployment-Driven Multidisciplinary Research: Challenges and Opportunities.


Presentation Transcript


  1. Communications, Collaboration, and Community. Anoop Gupta, Microsoft Research. Collaborators: Michael Cohen, Ross Cutler, Zicheng Liu, Yong Rui, Kentaro Toyama, Zhengyou Zhang, and others

  2. Deployment-Driven Multidisciplinary Research: Challenges and Opportunities. Anoop Gupta, Microsoft Research. Collaborators: Michael Cohen, Ross Cutler, Zicheng Liu, Yong Rui, Kentaro Toyama, Zhengyou Zhang, and others

  3. Collaboration and Multimedia Group
     • 16 people (9 Researchers, 5 R-SDEs, 1 Designer, 1 Usability)
     • Diverse: Systems, Cognitive Psychology, Sociology, Vision, Graphics
     • Focus:
       • Peripheral awareness and people-centric interfaces
       • Tele-presentation and tele-meeting technologies
       • Make audio-video information a first-class citizen
       • Enhanced online communities
     • => Technologies, Applications, and Social Factors

  4. Peripheral awareness and people-centric interfaces
     • How do we stay aware of relevant information without annoying notifications?
     • How do we stay aware of people, communicate with them, and bring them to the front of the user interface?
     • How can we leverage technology to provide a better idea of the state of people and their environment?

  5. Tele-presentations and tele-meetings
     • Leverage the combination of:
       • cheap sensors (cameras, microphones, …)
       • cheap computing power, bandwidth, and storage
       • advances in vision, graphics, and signal-processing (SP) technologies
     • Convincing remote presence and interactivity
     • Whiteboard, note-taking, local interaction tools
     • High-quality recording and archiving
     • Rich indices and browsing support

  6. Make audio-video information a first-class citizen • Low-cost and high-quality capture • Automatic index creation and highlights • Rich support for annotation and collaboration • Browsing tools and interfaces

  7. Enhanced online communities
     • Tracking interaction / social history
     • Incentive structures
       • Encourage high-quality content creation
       • Encourage interaction
       • Discourage inappropriate behavior
     • Filtering and synopsis
     • Community portals

  8. Outline
     • Our group
     • Research approach
     • Project samplings
       • Office activity modeling
       • Distributed meetings
       • Tele-presentations
       • Face modeling
     • Concluding remarks / challenges

  9. Research Approach
     [Diagram labels: Build Prototype; Evaluation / Publication; Refine Prototype; Product Impact]
     • Deployment-driven research
       • End-users vs. other researchers as main customer
       • Robustness vs. functionality
       • Multiple sensor technologies with graceful degradation
       • Value existing infrastructure
       • Simplicity of set-up and operation
       • Design with end-user in the loop
       • Field evaluations
     • Multi-disciplinary tool-set

  10. 1. Office Activity Modeling (joint with the ASI group at MSR)
     • Uses of office awareness
       • Intelligent messaging: send messages on the appropriate channel (instant message, office phone, e-mail, mobile, etc.)
       • Intelligent instant messaging: stopped typing = not there
       • Peripheral awareness for “buddies”: is now a good time to drop by Jack’s office?
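To make the "send messages on the appropriate channel" idea concrete, here is a minimal rule-based sketch. It assumes a hypothetical set of inferred states (at desk, on phone, in meeting, away), a hypothetical 5-minute idle threshold standing in for "stopped typing = not there", and a hand-written routing table; none of these names or values come from the talk, and the deployed system used learned Bayesian models rather than fixed rules.

```python
from enum import Enum


class State(Enum):
    AT_DESK = "at_desk"
    ON_PHONE = "on_phone"
    IN_MEETING = "in_meeting"
    AWAY = "away"


IDLE_SECONDS = 5 * 60  # hypothetical "stopped typing = not there" threshold


def infer_state(seconds_since_input: float, on_phone: bool, in_meeting: bool) -> State:
    """Rule-based guess at the recipient's state from calendar, phone, and input idle time."""
    if in_meeting:
        return State.IN_MEETING
    if on_phone:
        return State.ON_PHONE
    if seconds_since_input < IDLE_SECONDS:
        return State.AT_DESK
    return State.AWAY


def choose_channel(state: State, urgent: bool) -> str:
    """Pick a delivery channel (instant message, office phone, e-mail, mobile)."""
    if state is State.AT_DESK:
        return "office_phone" if urgent else "instant_message"
    if state is State.ON_PHONE:
        return "instant_message" if urgent else "email"  # don't ring a busy line
    # In a meeting or away from the office: only interrupt via mobile when urgent
    return "mobile" if urgent else "email"
```

For example, `choose_channel(infer_state(30, on_phone=False, in_meeting=False), urgent=False)` returns `"instant_message"`. Keeping the state inference separate from the routing policy lets a better inference model replace the rules without changing the channel logic.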

  11. So how does the deployment-driven approach impact our decisions?

  12. Environment and Outputs
     • Environment: office with door (w/ window); cubicle; open plan; …
     • Number of people: (0 / 1+) | (0 / 1 / 1+) | (0 / 1 / 2 / 3 / …)
     • Gross activity: at desk; on PC; on phone; in meeting; …
     • Fine activity: who are the people present; reading; answering mail; …
     • Activity trends
       • Usually comes in at 7am, leaves at 5pm
       • Never comes in on weekends
       • …
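The "activity trends" output can be approximated from nothing more than timestamps of observed activity. The sketch below assumes a hypothetical log of keyboard/mouse event times and takes the first and last event of each day as arrival and departure; it is an illustration, not the method used in the Eve/Priorities work, and it expects at least one weekday in the log.

```python
from datetime import datetime
from statistics import median


def typical_hours(events):
    """Median weekday arrival/departure as (hour, minute), plus the number of weekend days seen.

    events: datetimes of observed activity (e.g. keyboard/mouse input).
    """
    first, last = {}, {}
    for ts in events:
        day = ts.date()
        first[day] = min(first.get(day, ts), ts)
        last[day] = max(last.get(day, ts), ts)

    def minutes(dt):
        return dt.hour * 60 + dt.minute

    weekdays = [d for d in first if d.weekday() < 5]  # Monday=0 ... Friday=4
    arrive = median(minutes(first[d]) for d in weekdays)
    leave = median(minutes(last[d]) for d in weekdays)
    weekend_days = sum(1 for d in first if d.weekday() >= 5)
    return divmod(round(arrive), 60), divmod(round(leave), 60), weekend_days


# Hypothetical three-day log (Monday 1 Jan to Wednesday 3 Jan 2024)
EXAMPLE_LOG = [
    datetime(2024, 1, 1, 7, 0), datetime(2024, 1, 1, 17, 0),
    datetime(2024, 1, 2, 7, 10), datetime(2024, 1, 2, 16, 50),
    datetime(2024, 1, 3, 6, 50), datetime(2024, 1, 3, 12, 0), datetime(2024, 1, 3, 17, 10),
]
```

Here `typical_hours(EXAMPLE_LOG)` returns `((7, 0), (17, 0), 0)`: arrival around 7am, departure around 5pm, and no weekend activity. The median is used instead of the mean so that one unusually late night does not shift the trend.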

  13. Sensors
     • Keyboard / mouse
     • Calendar (appointment schedule)
     • Desktop microphone
     • TAPI-enabled phone (VoIP)
     • Desktop camera
     • Other: motion detector; high-quality microphone / headset; bird’s-eye camera; laser/IR gates; thermal cameras; etc.

  14. Making the Inferences (approaches listed in roughly increasing order of expected research interest)
     • Use reliable sensors as much as possible
     • Use reliable sensors to label data for other sensors
     • For vision, stick to reliably extractable, robust cues (e.g., presence of motion, optic flow)
     • “Quasi-supervised” learning, using data labeled as above
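The "use reliable sensors to label data for other sensors" step can be sketched as follows. Windows with keyboard/mouse input are auto-labeled "present", long-idle windows are auto-labeled "absent" (a noisy assumption), and a single threshold on a vision motion cue is then fit to those labels. The window fields (`keys`, `idle_min`, `motion`), the 30-minute cutoff, and the one-feature threshold classifier are all hypothetical simplifications, not the group's actual quasi-supervised learner.

```python
def label_windows(windows):
    """Label time windows using only the reliable sensor (keyboard/mouse).

    Each window is a dict with hypothetical keys: 'keys' (input events in the window),
    'idle_min' (minutes since last input), and 'motion' (a vision motion-energy cue).
    Windows the reliable sensor cannot decide are left unlabeled.
    """
    labeled = []
    for w in windows:
        if w["keys"] > 0:
            labeled.append((w["motion"], 1))  # typing implies someone is present
        elif w["idle_min"] >= 30:
            labeled.append((w["motion"], 0))  # long idle: assume absent (noisy label)
    return labeled


def fit_threshold(labeled):
    """Pick the motion threshold with the fewest errors on the auto-labeled windows."""
    best_t, best_err = None, float("inf")
    for t in sorted({m for m, _ in labeled}):
        err = sum((m >= t) != bool(y) for m, y in labeled)
        if err < best_err:
            best_t, best_err = t, err
    return best_t
```

Once fit, the threshold classifies the ambiguous windows the keyboard cannot decide (for example, someone reading without typing) from the vision cue alone.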

  15. Results
     • Eve/Priorities project at MSR (ASI)
       • Integrates capture of features (keyboard/mouse use, app use, vision, audio events, …)
       • Language for combining low-level features
       • Bayesian fusion
     • Vision component can determine whether the person is facing front or not, but is still not as robust as desired
     • Current work: quasi-supervised learning of low-level features; we hope to deploy base versions in summer
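A minimal sketch of Bayesian fusion for a single "someone is present" variable, assuming the sensor readings are conditionally independent given presence (naive Bayes). The sensor names and every likelihood value are made-up placeholders; the Eve/Priorities system's actual model structure and parameters are not described in the slides.

```python
# Hypothetical likelihoods: (P(sensor fires | present), P(sensor fires | absent))
LIKELIHOODS = {
    "keyboard": (0.6, 0.01),
    "speech": (0.3, 0.05),
    "motion": (0.8, 0.1),
}


def posterior_present(obs: dict, prior: float = 0.5) -> float:
    """Naive-Bayes fusion: multiply per-sensor likelihoods, then normalize.

    obs maps sensor name -> bool; sensors missing from obs are simply ignored.
    """
    like_present, like_absent = prior, 1.0 - prior
    for sensor, fired in obs.items():
        p_present, p_absent = LIKELIHOODS[sensor]
        like_present *= p_present if fired else 1.0 - p_present
        like_absent *= p_absent if fired else 1.0 - p_absent
    return like_present / (like_present + like_absent)
```

With keyboard and motion firing but no speech, the posterior is above 0.99; with nothing firing it falls to about 0.06. Because an unavailable sensor is just left out of `obs`, this form degrades gracefully when a sensor is missing, in the spirit of the research-approach slide.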

  16. Results (preliminary)
     [Figure] Concatenation of 3 sections of low-level vision data only, sampled from an 8-hour log. Unsupervised clustering segments the sections cleanly.
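The slide does not name the clustering algorithm, so the sketch below uses plain k-means (Lloyd's algorithm) as a stand-in to show how unsupervised clustering can separate segments of low-level feature vectors. The 2-D points stand for hypothetical per-frame vision features (for example, motion energy and optic-flow magnitude), and farthest-point initialization is chosen here only to keep the result deterministic.

```python
def _dist2(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means on tuples of floats; returns one cluster label per point."""
    # Farthest-point initialization: deterministic and spreads centers across segments
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(_dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center
        labels = [min(range(k), key=lambda c: _dist2(p, centers[c])) for p in points]
        # Update step: move each center to the mean of its members (keep it if empty)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels
```

On three well-separated sections concatenated in time, every point in a section receives the same label and the three sections receive different labels, which is the "segments sections cleanly" behavior the slide reports.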

  17. Results (preliminary)
     [Figure annotations] Correlates with high keyboard/mouse activity, no speech. Ground truth: 1 person at monitor.

  18. Benefits and Challenges
     • Benefits
       • Prioritizing problems and context
       • How far we need to push the solution
       • Earlier benefits for end-users; enables social science research
     • Drawbacks
       • Need substantial engineering (plus algorithmic) skills
       • Need multidisciplinary team

  19. 2. Distributed Small Group Meetings
     • Scenario:
       • Imagine 8-10 people
       • In a conference room, from desktops, mobile
       • Rich back-and-forth interaction
       • Archival and browsing support

  20. Contextualized Research Challenges
     • Novel camera, microphone, display systems
     • Speaker tracking; multi-person tracking
     • Gaze and pose correction
     • Activity tracking and gesture recognition
     • Graphical avatars and virtual environments
     • Real and virtual camera management
     • Automated indexing and browsing support
     • Integration of handheld devices
     • User interface / user experience

  21. First Prototype
     [Figure panels: omni-directional camera; meeting environment; 360-degree panorama view; an example omni image]
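Turning the ring-shaped omni image into a 360-degree panorama is, at its simplest, a polar-to-rectangular resampling. The sketch below assumes a hypothetical donut-shaped image centered at (`cx`, `cy`) between radii `r_in` and `r_out`, a linear radius-to-row mapping, and nearest-neighbour sampling; a real mirror or lens needs its own calibrated mapping, and the prototype's actual unwarping procedure is not described in the slides.

```python
import math


def unwarp(omni, cx, cy, r_in, r_out, width, height):
    """Resample a donut-shaped omni image (omni[y][x]) into a width x height panorama.

    Each panorama column is one viewing direction; each row is one radius.
    Assumes a linear radial mapping and nearest-neighbour sampling.
    """
    pano = [[0] * width for _ in range(height)]
    for v in range(height):
        # Top panorama row samples the outer ring, bottom row the inner ring (assumption)
        r = r_out - (r_out - r_in) * v / max(height - 1, 1)
        for u in range(width):
            theta = 2.0 * math.pi * u / width  # angle around the camera for this column
            x = int(round(cx + r * math.cos(theta)))
            y = int(round(cy + r * math.sin(theta)))
            if 0 <= y < len(omni) and 0 <= x < len(omni[0]):
                pano[v][u] = omni[y][x]
    return pano
```

Precomputing the (x, y) lookup for each panorama pixel once, instead of inside the loop, is the usual way to make such a mapping cheap enough for real-time video.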

  22. Second Prototype
     • Cost: $300 vs. $10K
     • Much better quality: ~3000 x 500 pixels
     • All processing done on the PC

  23. Remote Interfaces
     [Figure panels: all-up; computer controlled; user controlled; user + computer + overview]

  24. Short/Medium Term Plan
     • Cameras, calibration, stitching
       • Camera design to minimize parallax
       • Automatic camera calibration
       • Real-time on today’s processors
     • Speaker detection and multiple-person detection
       • Microphone array sound source localization
       • Computer vision tracking of multiple people
       • Fusing A/V for better speaker detection
     • Simple remote participation interface
     • Automatic camera management
     • Video compression, storage, and transmission
     • Automatic index creation and meeting browsing
     • Expect to deploy in a few conference rooms during summer
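Microphone-array sound source localization commonly starts from the time difference of arrival (TDOA) between a pair of microphones. The sketch below estimates that delay with plain cross-correlation and converts it to a far-field angle via sin(θ) = c·τ / d. The two-microphone geometry, the 343 m/s speed of sound, and the choice of plain (unweighted) cross-correlation are assumptions; practical systems often use weighted variants such as GCC-PHAT to cope with reverberation, and the slides do not say which method the group used.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly at room temperature


def estimate_delay(x, y, max_lag):
    """Lag (in samples) maximizing sum_n x[n] * y[n + lag]; positive means y lags x."""
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(x[n] * y[n + lag] for n in range(len(x)) if 0 <= n + lag < len(y))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def direction_of_arrival(x, y, mic_spacing, fs):
    """Far-field angle in degrees from broadside; the sign says which mic heard the sound first."""
    # Physically possible delays are bounded by the spacing between the microphones
    max_lag = int(math.ceil(mic_spacing / SPEED_OF_SOUND * fs))
    tau = estimate_delay(x, y, max_lag) / fs
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_spacing))  # clamp rounding overshoot
    return math.degrees(math.asin(s))
```

For a click that reaches the second microphone 3 samples later at 16 kHz with 0.2 m spacing, the estimated angle is about 18.8 degrees. Restricting the search to physically possible lags both saves time and rejects spurious correlation peaks.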

  25. 3. Tele-Presentations
     • Enable people to:
       • Easily broadcast/capture lectures (speaker and audience) in an esthetically pleasing way
       • Participate from remote locations
     • Solution components
       • Tracking cameras, microphone arrays, …
       • Video production rules from professionals
       • Mapping of rules to cameras and a software video director
       • Remote presence and interactivity system (TELEP)
     • First prototype in use in the small lecture room at MSR

  26. Key Modules
     • Speaker tracking and audience tracking
       • Computer-vision-based tracking
       • Microphone-array-based tracking

  27. Key Modules (cont.)
     • Virtual video director (FSM)
       • Maintain minimum shot duration
       • Dynamic maximum shot duration, a function of shot quality; triggers TIME_EXPIRE event
       • Monitoring status change; triggers STATUS event
       • Encode editing knowledge into transition probabilities
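A minimal sketch of such a virtual director as a finite state machine whose states are camera shots. It enforces a minimum shot duration, computes a maximum duration from a shot-quality score (firing a TIME_EXPIRE transition when exceeded), honors a STATUS event from the monitors, and picks the next shot from a transition-probability table. The shot names, durations, quality formula, and probabilities are all hypothetical placeholders for the professional editing rules the slide refers to.

```python
import random

MIN_SHOT = 3.0  # seconds; hypothetical minimum shot duration

# Hypothetical editing knowledge encoded as transition probabilities between shots
TRANSITIONS = {
    "lecturer": {"audience": 0.3, "overview": 0.7},
    "audience": {"lecturer": 0.8, "overview": 0.2},
    "overview": {"lecturer": 0.9, "audience": 0.1},
}


def max_shot(quality: float) -> float:
    """Dynamic maximum duration: better tracking quality (near 1) allows longer shots."""
    return 5.0 + 15.0 * max(0.0, min(1.0, quality))


class VirtualDirector:
    def __init__(self, rng=None):
        self.shot = "overview"
        self.elapsed = 0.0
        self.rng = rng if rng is not None else random.Random(0)

    def _cut(self, target=None):
        if target is None:  # sample the next shot from the transition table
            options = TRANSITIONS[self.shot]
            target = self.rng.choices(list(options), weights=list(options.values()))[0]
        self.shot, self.elapsed = target, 0.0

    def step(self, dt, quality, status=None):
        """Advance dt seconds; status names the shot the monitors request (STATUS event)."""
        self.elapsed += dt
        if self.elapsed < MIN_SHOT:
            return self.shot  # never cut before the minimum shot duration
        if status is not None and status != self.shot:
            self._cut(status)  # STATUS event
        elif self.elapsed >= max_shot(quality):
            self._cut()  # TIME_EXPIRE event
        return self.shot
```

The minimum duration check comes first so that even an urgent status change cannot produce jarringly short shots, mirroring the "maintain min shot duration" rule.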

  28. Initial Deployment Results
     • Tested a human operator and our system concurrently
       • Field study
       • Lab study
     • Results:
       • Human operator better, but the difference is not statistically significant
       • People could not distinguish which operator was human and which was computer

  29. Technical Challenges
     • Design and configuration of camera/microphone systems
     • More robust lecturer tracking
       • Smooth tracking in close-up shots
       • Multiple lecturers
       • Lecturers moving into the audience area
     • More robust audience tracking
       • Background noise and room reverberation
     • More sophisticated rules and knowledge
       • Human operators are much better at dealing with exceptions
       • A flexible/learning automated camera management system

  30. 4. Face Modeling
     • Technical goals:
       • Build a realistic-looking face model from video images
       • The face model can be animated right away
       • Painless data acquisition and efficient model building
       • Commodity equipment (computer + camera)
       • No special requirements on acquisition conditions (background, lighting, …)
     • Uses:
       • Enhanced chat / gaming environments
       • Conferencing over low-bandwidth links

  31. System Overview

  32. Examples

  33. Example Application: Virtual Poker • Designed as a social interface • Each player controls an avatar • Some behaviors automatically generated

  34. Virtual Poker
     [Figure: an avatar’s speech bubble reads “I guess it’s my turn”]
     • Players automatically turn to follow the action/voice
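One of the automatically generated behaviors, avatars turning toward whoever is acting or speaking, reduces to simple geometry. The sketch below assumes hypothetical 2-D seat positions around the table and returns the yaw each non-speaking avatar should face; how the actual Virtual Poker application detects the speaker and animates the turn is not described in the slides.

```python
import math


def facing_angles(seats, speaker):
    """Yaw in degrees each avatar should turn to face the speaking player.

    seats: dict mapping player name -> (x, y) position at the table (hypothetical layout).
    The speaker's own entry is None, since it keeps its current orientation.
    """
    sx, sy = seats[speaker]
    angles = {}
    for name, (x, y) in seats.items():
        if name == speaker:
            angles[name] = None
        else:
            # atan2 gives the direction from this seat toward the speaker
            angles[name] = math.degrees(math.atan2(sy - y, sx - x))
    return angles
```

In practice the avatar would interpolate toward the target yaw over a fraction of a second rather than snapping, so the turn reads as a natural glance.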

  35. Research Challenges • Teeth, tongue, eyes and hair • Personalized facial expressions • Real-time animation driven from video • Yet more robust and easy to use

  36. Outline
     • Our group
     • Research approach
     • Project samplings
       • Office activity modeling
       • Distributed meetings
       • Tele-presentations
       • Face modeling
     • Concluding remarks / challenges

  37. Concluding Remarks
     [Chart axes: % Complete vs. Effort Spent]
     • Focus on deployment-driven research
       • Tremendous leverage in:
         • Prioritizing problems we explore
         • Context we assume while solving
         • How far we push the solution
         • Earlier benefits for end-users
         • Enabling social science research
         • Keeping management support

  38. Challenges:
     • Need more resources (or pursue fewer things)
     • Need substantial engineering (plus algorithmic) skills
     • Premier conferences do not appreciate engineering aspects
     • Not all important research yields to the above constraints
     • Some solution options:
       • Community shared infrastructure (environments) into which things can be plugged (e.g., SUIF for compilers)
       • Attitudes of premier conferences / senior researchers
       • Funding agency attitudes

  39. Focus on multidisciplinary research
     • Tremendous leverage in providing:
       • More robust solutions (or solutions at all)
       • More cost-effective solutions
       • Deployment of research ideas to end-users, and the knowledge gained from the resulting feedback
     • Challenges:
       • Vision, Video, Graphics, Hardware, Speech, SP, …
       • Need diversity within the group plus close ties externally
       • Need supportive management and funding structure
       • Academic departments, lab research groups, conferences, and tenure are organized around traditional disciplinary boundaries
       • Discourages pushing one discipline as hard as possible when another provides an easier answer

  40. Some solution components:
     • Strong leaders (e.g., Hennessy brought architecture, compilers, programming languages, and OS people together)
     • Attitudes of premier conferences / senior researchers
     • Funding agency attitudes

  41. Questions / Discussion
     • Graphics: What is the killer application in the workplace?
     • Vision: How can we identify the state of the art for a non-expert?
     • Are you satisfied with the degree of connection with the end-user/reality in your sub-field?
     • What do you think of the role of multi-disciplinary research? Who should do it?
     • Do we have balance?

  42. Graphics: What is the killer application in the workplace?
     • We have tried:
       • 3D shell
       • 3D avatars in tele-meetings
       • 3D in visualizations, …
       • …
     • The killer application still eludes us

  43. Vision: Identifying the state of the art
     • E.g., speech:
       • Speaker dependent or independent
       • Size of vocabulary
       • Language model / grammar / domain
       • Microphone quality
     • What’s the equivalent for vision?
     • How can we characterize / partition / … the space so that the non-expert knows when/where vision technology can be relied upon?

  44. Questions / Discussion
