
Recording Meetings with the CMU Meeting Recorder Architecture

Satanjeev Banerjee, et al. School of Computer Science, Carnegie Mellon University.


Presentation Transcript


  1. Recording Meetings with the CMU Meeting Recorder Architecture
     Satanjeev Banerjee, et al.
     School of Computer Science, Carnegie Mellon University

  2. Goals
     • End goal: Build conversational agents
       • That “understand” meetings
         • E.g.: Identify action items
       • Make contributions to meetings
         • E.g.: Confirm details of action items
       • Part of Project CALO: Cognitive Agent that Learns and Organizes
     • First goal: Create a corpus of human meetings
       • Capture data that we expect agents to use
         • E.g.: Speech, video, whiteboard markings, etc.

  3. Desirable Properties of the Recorder
     • Need to record meetings anywhere
       • Emphasis on instrumenting the user, not the room
     • Assume low network bandwidth
       • Should still be able to record in the extreme situation where there is no network access!
     • Should be easy to add new data streams
       • “Easy” = low time to incorporate a new stream
     • Should support the major operating systems

  4. The Recorder Architecture
     • Each information stream is discretized into events
       • Either a sequence of events, e.g. utterances
       • Or one long event, e.g. video data
     • Each event is given start/end time stamps
       • These coincide for instantaneous events, e.g. a keystroke
     • Events are stored on local disks
       • Laptops, shuttle PCs, etc.
     • Events are (slowly) uploaded to a central server when there is network access
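
Sketch (not from the slides): one way to model the events described above in Python. The class name, field names, and the use of seconds for time stamps are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One discretized piece of an information stream (hypothetical layout)."""
    stream: str     # e.g. "speech" or "video"
    start: float    # start time stamp, in seconds (assumed unit)
    end: float      # end time stamp, in seconds (assumed unit)

    def __post_init__(self) -> None:
        if self.end < self.start:
            raise ValueError("event ends before it starts")

    @property
    def is_instantaneous(self) -> bool:
        # Start and end coincide for events such as a keystroke.
        return self.start == self.end


def instantaneous(stream: str, at: float) -> Event:
    """Build an event whose start and end time stamps coincide."""
    return Event(stream, at, at)


utterance = Event("speech", 12.0, 15.5)        # one event in a sequence
keystroke = instantaneous("keystroke", 20.25)  # instantaneous event
```

A long stream such as video would simply be a single `Event` whose start and end span the whole recording.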

  5. Event Identification and Logging
     • Each recorded event has the following identifying information associated with it:
       • Start and stop time stamps
       • Name of the meeting and the user
       • Modality (speech, video, handwriting, etc.)
     • After an event is recorded, its identification information is sent to a logging server
       • The server creates a list of all the events in a meeting
       • Good for bookkeeping (but not essential)
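
Sketch (not from the slides): an in-memory stand-in for the logging server, showing how identification records could be grouped into a per-meeting list. `EventLog`, its methods, and the user names are hypothetical; the real server's interface is not described in the presentation.

```python
from collections import defaultdict


class EventLog:
    """In-memory stand-in for the logging server (hypothetical)."""

    def __init__(self) -> None:
        self._by_meeting = defaultdict(list)

    def record(self, meeting, user, modality, start, stop):
        """Store the identifying information of one recorded event."""
        self._by_meeting[meeting].append(
            {"user": user, "modality": modality, "start": start, "stop": stop}
        )

    def events(self, meeting):
        """All logged events of a meeting, ordered by start time."""
        return sorted(self._by_meeting.get(meeting, []),
                      key=lambda e: e["start"])


log = EventLog()
log.record("OTTER", "user_b", "notes", 30.0, 30.0)
log.record("OTTER", "user_a", "speech", 12.0, 15.5)
```

Because the log only holds identification data, losing it does not lose any recording, which matches the "good for bookkeeping (but not essential)" remark.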

  6. Architecture of Meeting Recorder
     [Diagram labels: Participant 1, Participant 2, Participant 3 (P1, P2, P3); Time server; Browse Meeting]
     Sample event record:
     {
       DATA_BLOCK
       session: OTTER
       user: arudnicky
       datatype: SPEECH
       file: \\spot\data\u1.raw
       Start: 20030917::18:27.600
       End: 20030917::18:35.357
     } [master]
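
Sketch (not from the slides): reading the `key: value` fields of the record shown on this slide. The exact on-disk layout of a DATA_BLOCK is not specified in the presentation, so the whitespace-separated layout below is an assumption, and the time stamps are kept as opaque strings rather than interpreted.

```python
import re

# Record copied from the slide; one field per line is an assumed layout.
SAMPLE = r"""{ DATA_BLOCK
  session: OTTER
  user: arudnicky
  datatype: SPEECH
  file: \\spot\data\u1.raw
  Start: 20030917::18:27.600
  End: 20030917::18:35.357
}"""

# A field is a word, a colon, whitespace, then a run of non-space characters.
FIELD = re.compile(r"(\w+):\s+(\S+)")


def parse_data_block(text: str) -> dict:
    """Return the 'key: value' fields of one record as strings."""
    return dict(FIELD.findall(text))


record = parse_data_block(SAMPLE)
print(record["datatype"], record["file"])
```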

  7. Synchronizing the Time Stamps
     • All event time stamps must be synchronized
     • We use the Simple Network Time Protocol (SNTP)
       • Query a central NTP server for the time
       • Use the reply and the round-trip time to estimate the time difference between the local machine and the server
       • Use this difference to create server-time time stamps
     • Rough experiments reveal 10 ms variance
       • Caveat: Experiments were done on a high-speed network
     • What if there is *no* network access?
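
Sketch (not from the slides): the standard SNTP arithmetic for estimating the clock offset from one request/reply exchange. The four time stamps follow the usual NTP convention (client send, server receive, server send, client receive); whether the recorder uses exactly this formula is an assumption, and the function names are hypothetical.

```python
def clock_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """Estimated (server clock - local clock), in seconds.

    t0: local time when the request was sent
    t1: server time when the request arrived
    t2: server time when the reply was sent
    t3: local time when the reply arrived
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0


def round_trip_delay(t0: float, t1: float, t2: float, t3: float) -> float:
    """Time spent on the network, excluding server processing time."""
    return (t3 - t0) - (t2 - t1)


def to_server_time(local_time: float, offset: float) -> float:
    """Convert a local time stamp into a server-time time stamp."""
    return local_time + offset


# Example exchange: the local clock runs 5.375 s behind the server.
offset = clock_offset(10.0, 15.5, 15.75, 10.5)
delay = round_trip_delay(10.0, 15.5, 15.75, 10.5)
```

The estimate assumes the outbound and return network delays are equal, which is one reason results obtained on a high-speed network may not carry over to slower links.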

  8. Aggregating the Data
     • When network access becomes available, data is transferred from all sites to a central location
       • Current recording sites: CMU and Stanford
     • Implemented a cross-platform version of the MS Background Intelligent Transfer Service
       • Uploads files in a transparent background process
       • Throttles bandwidth use as the user’s activity goes up
       • Pauses if the network connection is lost
       • Resumes once network access is restored
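
Sketch (not from the slides): the pause/resume and throttling behaviour listed above, reduced to a chunked transfer loop. This is not the BITS API or the authors' implementation; `upload`, `chunk_size_for`, the `send` callback, and the 0.0 to 1.0 activity scale are all hypothetical.

```python
from typing import Callable


def chunk_size_for(base_size: int, user_activity: float) -> int:
    """Shrink the chunk size as user activity (0.0 idle .. 1.0 busy) rises."""
    activity = min(max(user_activity, 0.0), 1.0)
    return max(1, int(base_size * (1.0 - activity)))


def upload(data: bytes, send: Callable[[int, bytes], None],
           offset: int = 0, chunk_size: int = 4) -> int:
    """Send data[offset:] chunk by chunk and return the new offset.

    A ConnectionError from send() pauses the transfer; calling upload()
    again with the returned offset resumes it where it stopped.
    """
    while offset < len(data):
        chunk = data[offset:offset + chunk_size]
        try:
            send(offset, chunk)
        except ConnectionError:
            break
        offset += len(chunk)
    return offset


received = bytearray()
calls = {"n": 0}


def flaky_send(offset: int, chunk: bytes) -> None:
    """Fake server connection that drops exactly one chunk."""
    calls["n"] += 1
    if calls["n"] == 2:
        raise ConnectionError("network down")
    received.extend(chunk)


payload = b"0123456789"
paused_at = upload(payload, flaky_send)                  # stops after 4 bytes
done_at = upload(payload, flaky_send, offset=paused_at)  # resumes and finishes
```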

  9. Data Collection Process (proposed)
     [Diagram labels: Independent cross-site collection; Background data transmission; Meeting database; Transcription, annotation; CALO learning; Analysis; preparation; integration; research]

  10. Capturing Close-Talking Speech
      • Implemented Meeting Recorder Cross Platform (MRCP) to record speech and notes
      • Speech is recorded using head-mounted microphones
        • 11.025 kHz sampling rate used for portability
      • Endpointing is done using the CMU Sphinx 3 ASR
      • Each endpointed utterance is an event
        • The utterance is recorded to local disk (WAV format)
        • Time stamps are generated using Simple NTP
        • The utterance’s identifying information is sent to the logging server, and the utterance is queued for upload
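
Sketch (not from the slides): writing one endpointed utterance as a WAV file at the 11.025 kHz rate mentioned above, using Python's standard `wave` module. Mono 16-bit PCM is an assumption (the slides give only the sampling rate), endpointing itself is not shown, and `write_utterance` is a hypothetical helper.

```python
import io
import struct
import wave

SAMPLE_RATE_HZ = 11025   # 11.025 kHz, as stated on the slide


def write_utterance(target, samples) -> None:
    """Write 16-bit mono PCM samples (assumed format) as a WAV stream.

    target may be a file name or a seekable binary file object.
    """
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)               # 2 bytes = 16 bits per sample
        wav.setframerate(SAMPLE_RATE_HZ)
        wav.writeframes(struct.pack("<%dh" % len(samples), *samples))


# Round trip through an in-memory buffer instead of the local disk.
buffer = io.BytesIO()
write_utterance(buffer, [0, 1000, -1000, 0])
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    rate, frames, channels = (reader.getframerate(),
                              reader.getnframes(),
                              reader.getnchannels())
```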

  11. Capturing Typed Notes
      • Users type notes in the client’s note-taking area
      • “Snapshots” of the notes are taken at each carriage return
        • Each snapshot is an event
        • Each snapshot is saved to disk, time-stamped, logged, and queued for upload
      • [Demonstration of MRCP]
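
Sketch (not from the slides): taking a snapshot of the full note text at each carriage return. The `(time stamp, character)` input format and the function name are hypothetical; editing keys such as backspace are ignored here.

```python
def note_snapshots(keystrokes):
    """Return (time stamp, full note text) at every carriage return.

    keystrokes: iterable of (time_stamp, character) pairs in typing order.
    """
    typed = []
    snapshots = []
    for time_stamp, char in keystrokes:
        typed.append(char)
        if char == "\n":
            # The snapshot holds everything typed so far, not just one line.
            snapshots.append((time_stamp, "".join(typed)))
    return snapshots


keys = [(1.0, "h"), (1.1, "i"), (1.2, "\n"),
        (2.0, "o"), (2.1, "k"), (2.2, "\n")]
snaps = note_snapshots(keys)
```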

  12. More Details about MRCP
      • Implemented using cross-platform libraries:
        • wxWidgets for GUI, file access, networking
        • PortAudio for audio
      • Currently compiles on Windows, Mac OS X and Linux
        • Windows version distributed to other Project CALO sites
        • Macintosh and Linux versions in beta testing
        • WinCE version in development

  13. Capturing Whiteboard Pen Strokes
      • We use Mimio to capture whiteboard pen strokes
      • “Strokes” consist of all the x-y coordinates between pen-down and pen-up
      • Each stroke is an event: it is recorded, time-stamped, logged, and queued for upload
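
Sketch (not from the slides): grouping pen samples into strokes, where a stroke is every x-y coordinate between pen-down and pen-up. The `(time, kind, x, y)` sample format is hypothetical and does not reflect the Mimio interface.

```python
def group_strokes(samples):
    """Group pen samples into strokes with start/end time stamps.

    samples: iterable of (time_stamp, kind, x, y), where kind is
    "down", "move" or "up" (assumed encoding).
    """
    strokes = []
    current = None
    for time_stamp, kind, x, y in samples:
        if kind == "down":
            current = {"start": time_stamp, "end": time_stamp,
                       "points": [(x, y)]}
        elif current is not None:
            current["points"].append((x, y))
            current["end"] = time_stamp
            if kind == "up":
                strokes.append(current)   # pen-up closes the stroke
                current = None
    return strokes


pen = [(0.0, "down", 1, 1), (0.1, "move", 2, 2), (0.2, "up", 3, 3),
       (1.0, "down", 5, 5), (1.1, "up", 6, 6)]
strokes = group_strokes(pen)
```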

  14. Capturing PowerPoint Slide Information
      • We use MS’s PowerPoint API to capture slide-change timing information and slide contents
      • Events = slide changes
        • Event data = content of the new slide
        • Content is in the form of all the text and all the “shapes” on the slide
      • Events are instantaneous
        • Start and stop time stamps coincide
      • Events are processed as before

  15. Capturing Panoramic Video
      • We capture panoramic video using a 4-camera CAMEO device
        • Developed by the Physical Awareness group at CMU
      • Video is recorded in MPEG-4 format
      • One long event is produced and uploaded

  16. Current Status of Data Collection
      • Recorded meetings vary widely in size…
        • From 2- to 10-person meetings
      • …in meeting type
        • Scheduling meetings, presentations, brainstorms
      • …in content
        • Speech group meetings, dialog group meetings, physical awareness group meetings
      • Currently have a total of more than 11,000 utterances (including cross talk)

  17. Using the Data: Some Initial Research
      • Question: Can we detect the state of a meeting, and the roles of participants, from simple speech data?
      • Introduced a taxonomy of meeting states and participant roles

  18. Detection Methods and Initial Results
      • Used Anvil to hand-annotate 45 minutes of meeting video with states and roles
      • Trained a decision tree classifier on 30 minutes of data
      • Input features:
        • Number of speakers, lengths of utterances, pauses and interruptions within a short history of the meeting
      • Initial results: About 50% detection accuracy on a separate 15 minutes of test data
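
Sketch (not from the slides): computing the kinds of input features listed above over a short history window. The precise feature definitions are not given in the presentation, so the ones below (a pause is a gap between consecutive utterances, an interruption is an utterance by a different speaker that starts before the previous one ends) are assumptions, as are the function name and the 60-second default window. Training the decision tree is not shown.

```python
def window_features(utterances, now, history_s=60.0):
    """Features over utterances that started in the last history_s seconds.

    utterances: iterable of (speaker, start, end) tuples, times in seconds.
    """
    recent = sorted((u for u in utterances if now - history_s <= u[1] <= now),
                    key=lambda u: u[1])
    lengths = [end - start for _, start, end in recent]
    total_pause = 0.0
    interruptions = 0
    for previous, current in zip(recent, recent[1:]):
        gap = current[1] - previous[2]
        if gap > 0:
            total_pause += gap              # silence between utterances
        elif gap < 0 and current[0] != previous[0]:
            interruptions += 1              # overlap by another speaker
    return {
        "num_speakers": len({speaker for speaker, _, _ in recent}),
        "mean_utterance_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "total_pause": total_pause,
        "interruptions": interruptions,
    }


talk = [("A", 0.0, 2.0), ("B", 1.5, 3.0), ("A", 4.0, 5.0)]
features = window_features(talk, now=5.0)
```

Each such feature dictionary, paired with a hand-annotated state or role label, would form one training example for a classifier.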

  19. Questions?
      Thanks to DARPA grant NBCH-D-02-0010
