
Media Manager Mail Access Unified Messaging

Barbara Hohlt, UC Berkeley. Ericsson Presentation, August 22, 2000.


Presentation Transcript


  1. Media Manager Mail Access: Unified Messaging. Barbara Hohlt, UC Berkeley. Ericsson Presentation, August 22, 2000

  2. Messages from many sources [diagram labels: Desktop, Pager, Cell-Phone, PSTN Phone, MediaManager Mail Access, ???]

  3. Project Overview • Make messages more accessible • Get all types of messages • Access from different devices with different capabilities • Enable faster browsing of many voicemails • Media Mail services • A unified messaging infrastructure • Voicemail is email encoded in MIME • Transcoding services • Enhance voicemail interaction • Includes: skimmed audio, transcript, text/audio summary, and outline

  4. Related Work • Universal Inboxes/Unified Messaging • onebox.com • CoolMail.net • Lucent/Octel Unified Messenger • Stanford Mobile People Architecture • Audio Content Extraction Techniques • SpeechSkimmer, MIT’s MultiMedia Lab [Arons95] • Auto-Summarization, Microsoft Research • CueVideo, IBM

  5. Architecture • Transcoder Service • Voicemail -> Text Transcript • Voicemail -> Text Summary • Voicemail -> Text Outline • Email -> Plain Audio • Email -> GSM Audio • Voicemail -> GSM Summary • Voicemail -> Audio Summary • Voicemail -> Skimmed Audio [diagram labels: Client (x3), Folder Store, Media Manager Interface, Media Manager Service, Mail Access Interface (x3), NinjaMail, POP, IMAP]

  6. Applications • Conventional GUIs • Context-Aware Applications • Iceberg Universal Inbox Component • A conventional desktop GUI can contact the Media Manager directly and request messages as text. The Media Manager will return emails and voicemails as text. [diagram labels: Desktop, MediaManager Mail Access]

  7. Context-Aware Application • Step 1: the palm device asks for a list of messages as text and selects a voicemail • Step 2: the palm device requests a redirection from the proxy, which forwards the redirection request to the desktop • Step 3: the desktop asks for the voicemail and plays it [diagram labels: Palm Device, Redirection Proxy, Desktop, MediaManager Mail Access]

  8. Iceberg Universal Inbox [diagram labels: Naming Service (800-MEDIA-MGR, UID: mediamgr@cs.berkeley.edu), Preference Registry, mediamgr: Cluster locn., Automatic Path Creation Service, Bhaskar’s Cell-Phone, Barbara’s PSTN Phone, Universal Inbox, MediaManager Mail Access; steps numbered 1 to 3]

  9. Architecture • Transcoder Service • Voicemail -> Text Transcript • Voicemail -> Text Summary • Voicemail -> Text Outline • Email -> Plain Audio • Email -> GSM Audio • Voicemail -> GSM Summary • Voicemail -> Audio Summary • Voicemail -> Skimmed Audio [diagram labels: Client (x3), Folder Store, Media Manager Interface, Media Manager Service, Mail Access Interface (x3), NinjaMail, POP, IMAP]

  10. MediaManagerServiceIF • getFolders( ) and getFoldersAs( ) • Given a username, returns a list of folder names • Returns the list as audio or gsm • getList( ) and getListAs( ) • Given a username, foldername, and count • Returns a list of messages (sendername, title, date) • Returns the list as audio or gsm • getMessage( ) • Given a Message Ref, returns the entire message • getMessageContent( ) • Given a Content ID and return type • Returns one part of the message as the return type

  11. Messages and Content Objects • Media Message: Media Reference id, array of Content Objects • Content Object: Content ID, Data • Content ID: Media Reference id, Content Part index, Content Type

  12. Interface Example • User asks for list of messages as GSM • Media Manager returns a list of message headers • Cell Phone sends a Content ID back • Media Manager sends a voicemail Content Object [diagram labels: Cell-Phone, MediaManager Mail Access, Media Message Header, Content ID, Content Object]
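The objects on slide 11 and the exchange on slide 12 can be sketched in Java as follows. This is a minimal illustration, not the project's code: the field names and types, the `InterfaceExample` helper, and the sample content type string are all assumptions, since the slides name only the members of each object.

```java
import java.util.ArrayList;
import java.util.List;

// Field names and types are assumptions; slide 11 lists only the members.
final class ContentID {
    final String mediaRef;    // Media Reference id of the owning message
    final int partIndex;      // Content Part index within that message
    final String contentType; // illustrative value such as "audio/x-gsm"

    ContentID(String mediaRef, int partIndex, String contentType) {
        this.mediaRef = mediaRef;
        this.partIndex = partIndex;
        this.contentType = contentType;
    }
}

final class ContentObject {
    final ContentID id;
    final byte[] data;

    ContentObject(ContentID id, byte[] data) {
        this.id = id;
        this.data = data;
    }
}

final class MediaMessage {
    final String mediaRef;
    final ContentObject[] parts;

    MediaMessage(String mediaRef, ContentObject[] parts) {
        this.mediaRef = mediaRef;
        this.parts = parts;
    }
}

// Hypothetical helper that mimics the header/Content ID exchange.
final class InterfaceExample {
    // A header listing exposes Content IDs only, not the payloads.
    static List<ContentID> headers(MediaMessage[] inbox) {
        List<ContentID> ids = new ArrayList<>();
        for (MediaMessage message : inbox) {
            for (ContentObject part : message.parts) {
                ids.add(part.id);
            }
        }
        return ids;
    }

    // Maps a Content ID sent back by the client to its Content Object.
    static ContentObject resolve(MediaMessage[] inbox, ContentID wanted) {
        for (MediaMessage message : inbox) {
            boolean sameMessage = message.mediaRef.equals(wanted.mediaRef);
            boolean validPart = wanted.partIndex >= 0 && wanted.partIndex < message.parts.length;
            if (sameMessage && validPart) {
                return message.parts[wanted.partIndex];
            }
        }
        return null; // unknown reference
    }

    public static void main(String[] args) {
        ContentID id = new ContentID("msg-1", 0, "audio/x-gsm");
        MediaMessage[] inbox = {
            new MediaMessage("msg-1", new ContentObject[] { new ContentObject(id, new byte[] {1, 2, 3}) })
        };
        ContentID chosen = headers(inbox).get(0);
        System.out.println(resolve(inbox, chosen).data.length + " bytes returned");
    }
}
```

Because the Content ID carries both the message reference and the part index, the client can hold a list of lightweight headers and fetch a single part only when the user selects it.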

  13. Audio Tools • Speech Recognition/Synthesis • Transcribe voicemail to text • IBM ViaVoice SDK and custom audio libs • Natural Language Processing • Directed word spotting by “understanding” content • ViaVoice SRCL • Pitch • Detecting important words by emphasized pitch • Pause • Compression through pause removal • Spurts • Retrieve sentence structure of voicemail

  14. Transcoding Techniques

  15. Examples • Original Voicemail: “Hello, This is Barbara. How are you and the cats doing? I was wondering if you would feed them a little more the first time in case they eat too much. My number is (713) 465-5155. You can call me anytime. Have a very good holiday. Bye bye” • Processed Voicemail: (Skimmed) (Just pitch) • Translated talk spurts: “Phyllis Barbara” / “Area in the cat staring” / “And then if you run but feed them” / “A little more the first time in case they eat too much” / “On my number is (713) 465-5155” / “You can call me anytime.” / “Have every holiday” / “Of light” • Translated using NLP: “Hello this is Barbara” / “My number is (713) 465-5155” • (Pitch emphasized words in green)

  16. Examples continued... • Original Voicemail: “Faced with a seemingly inevitable engineering task authors tend to adopt one of two strategies for adding new services to the Internet landscape: inflexible, highly tuned, hand-constructed services….” • Processed Voicemail: (Skimmed) (Just pitch) • Translated talk spurts: “Faced with a seemingly inevitable engineering task authors tend to adopt what it to strategies for adding new services to the internet landscape.” / “Inflexible, highly Tate, had constructed services….” • Translated using NLP: <Nothing> • (Pitch emphasized words in green)

  17. Results • Pause detection • Worked well for given applications • Playback speedup by 50-70% • Pitch detection • Problems due to high pitch sounds and transitions • Speech recognition • Performance decrease in conversational settings • Natural Language Processing • Performed well with small grammar

  18. Example: Adding GSM Access • Define specific types, i.e. GSMAudio, GSMSummary • Optionally create new Content Objects • Add Content Object definition to MediaManager • Add GSM transcoder to TranscoderService

  19. Detail: Adding GSM Access • Add Content Object definition to MediaManager: define GSMAUDIO and GSMSUMMARY; add cases to createObject() in Content Object; add cases to Media Manager • Add GSM to Transcoder: add method toGSM() to Transcoder; edit .config file (External.transcoder.gsm rungsm); edit related transcoders (speechSynthesizer and audioSummary())
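A rough Java sketch of what those GSM additions might look like. The names GSMAUDIO, GSMSUMMARY, createObject(), toGSM(), and the `External.transcoder.gsm rungsm` config entry come from the slide; everything else is assumed for illustration, including the numeric constant values, the method signatures, the reading of `rungsm` as an external command, and its argument order. The sketch only builds the command line and never runs it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

final class GsmAccessSketch {
    // Type names come from the slide; the numeric values are assumptions.
    static final int GSMAUDIO = 100;
    static final int GSMSUMMARY = 101;

    // Hypothetical stand-in for the new cases in createObject():
    // returns a label for the content type instead of a real Content Object.
    static String createObject(int type) {
        switch (type) {
            case GSMAUDIO:
                return "GSMAudio";
            case GSMSUMMARY:
                return "GSMSummary";
            default:
                throw new IllegalArgumentException("unknown content type: " + type);
        }
    }

    // Reads the encoder command from .config text such as
    // "External.transcoder.gsm rungsm" (whitespace separates key and value).
    static String gsmCommand(String configText) {
        Properties config = new Properties();
        try {
            config.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return config.getProperty("External.transcoder.gsm");
    }

    // Hypothetical toGSM(): builds the command line that would be handed to
    // the external encoder. The argument order is an assumption.
    static List<String> toGSM(String configText, String inputFile, String outputFile) {
        return Arrays.asList(gsmCommand(configText), inputFile, outputFile);
    }

    public static void main(String[] args) {
        String config = "External.transcoder.gsm rungsm\n";
        System.out.println(createObject(GSMAUDIO) + " via " + toGSM(config, "in.wav", "out.gsm"));
    }
}
```

Keeping the encoder name in the config file rather than in code means a different GSM encoder could be swapped in without recompiling the Transcoder.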

  20. Implementing Other Mail Stores • Examples: IMAP, POP, Microsoft Exchange Server • Implement MailAccessIF: String[] getMAFolders( userName ); MediaMessage[] getMAList( userName, folderName, count ); MediaMessage getMAMessage( MediaRef ); ContentObject getMAMessageContent( ContentID ) • Add new protocol to Media Manager protocol table • Optionally add protocol for users into FolderStore
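To make the MailAccessIF contract concrete, here is a minimal sketch of a new mail store that keeps messages in memory. The four method names and return types follow the slide; the parameter types, the stand-in `MediaRef`, `ContentID`, `ContentObject`, and `MediaMessage` classes, and the `InMemoryMailAccess` class with its `deliver` helper are hypothetical. A real store would speak IMAP, POP, or another protocol behind the same methods.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Minimal stand-ins for the Media Manager types; their fields are assumptions.
final class MediaRef {
    final String id;
    MediaRef(String id) { this.id = id; }
}

final class ContentID {
    final MediaRef ref;
    final int part;
    ContentID(MediaRef ref, int part) { this.ref = ref; this.part = part; }
}

final class ContentObject {
    final ContentID id;
    final byte[] data;
    ContentObject(ContentID id, byte[] data) { this.id = id; this.data = data; }
}

final class MediaMessage {
    final MediaRef ref;
    final ContentObject[] parts;
    MediaMessage(MediaRef ref, ContentObject[] parts) { this.ref = ref; this.parts = parts; }
}

// The four methods listed on the slide; parameter types are assumed.
interface MailAccessIF {
    String[] getMAFolders(String userName);
    MediaMessage[] getMAList(String userName, String folderName, int count);
    MediaMessage getMAMessage(MediaRef ref);
    ContentObject getMAMessageContent(ContentID id);
}

// Hypothetical in-memory mail store used only to exercise the interface.
final class InMemoryMailAccess implements MailAccessIF {
    private final Map<String, Map<String, List<MediaMessage>>> folders = new LinkedHashMap<>();
    private final Map<String, MediaMessage> byRef = new LinkedHashMap<>();

    void deliver(String user, String folder, MediaMessage message) {
        folders.computeIfAbsent(user, k -> new LinkedHashMap<>())
               .computeIfAbsent(folder, k -> new ArrayList<>())
               .add(message);
        byRef.put(message.ref.id, message);
    }

    public String[] getMAFolders(String userName) {
        Map<String, List<MediaMessage>> userFolders = folders.get(userName);
        return userFolders == null ? new String[0] : userFolders.keySet().toArray(new String[0]);
    }

    public MediaMessage[] getMAList(String userName, String folderName, int count) {
        Map<String, List<MediaMessage>> userFolders = folders.get(userName);
        List<MediaMessage> all = userFolders == null ? null : userFolders.get(folderName);
        if (all == null) {
            return new MediaMessage[0];
        }
        int n = Math.max(0, Math.min(count, all.size()));
        return all.subList(0, n).toArray(new MediaMessage[0]);
    }

    public MediaMessage getMAMessage(MediaRef ref) {
        return byRef.get(ref.id);
    }

    public ContentObject getMAMessageContent(ContentID id) {
        MediaMessage message = byRef.get(id.ref.id);
        if (message == null || id.part < 0 || id.part >= message.parts.length) {
            return null;
        }
        return message.parts[id.part];
    }
}
```

In this sketch unknown users, folders, and references yield empty arrays or null rather than exceptions; the slides do not say how the real interface reports such cases.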

  21. Conclusion • Overall • System useful as navigational hints • To achieve total comprehension, need better voice recognition • What works well • Skimming using pause removal • Detecting spurts for structure • What needs work • Speech detection in conversational settings • Pitch emphasis needs refining • Future Directions • Implementing more mail stores • Enhancing interfaces • Pause detection/word boundaries using speech detection • Developing voicemail grammars • Using NLP feedback with pitch emphasis detection • Improved speech detection in noisy environments

  22. MediaManagerServiceIF • String[] getFolders( userName ) • byte[][] getFoldersAs( userName, returnType ) • MediaMessage [] getList( userName, folderName, count ) • byte[][] getListAs( userName, folderName, count, returnType ) • MediaMessage getMessage( MediaRef ) • ContentObject getMessageContent( ContentID, returnType )

  23. Pitch Detection • The Idea • A speaker’s pitch naturally changes when introducing topics or emphasizing words [Hirshberg92] • Use pitch increases as hints for “important” words • Algorithm [Arons95] • Determine pitch for each 20 ms frame (FFT with SHS) • Set emphasis threshold to be top 1% of pitch values (by histogram) • Mark 1 sec interval as emphasized if it contains >= 3 emphasized frames
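The thresholding and marking steps of the algorithm can be sketched as below, assuming the per-frame pitch values have already been computed (the FFT/SHS pitch tracker is not shown). Several choices here are assumptions rather than details from the slides: unvoiced frames are encoded as 0 and ignored, the top 1% cutoff is taken from a sorted copy instead of a histogram, and the one-second intervals are fixed, non-overlapping windows of 50 frames.

```java
import java.util.Arrays;

final class PitchEmphasis {
    static final int FRAMES_PER_SECOND = 50; // 20 ms frames

    // Lowest pitch that still falls in the top 1% of voiced frames.
    // Uses a sorted copy as a stand-in for the histogram on the slide.
    static double threshold(double[] pitchHz) {
        double[] voiced = Arrays.stream(pitchHz).filter(p -> p > 0).sorted().toArray();
        if (voiced.length == 0) {
            return Double.POSITIVE_INFINITY; // nothing can be emphasized
        }
        int topCount = Math.max(1, voiced.length / 100);
        return voiced[voiced.length - topCount];
    }

    // One flag per second of audio: true if at least 3 frames in that
    // second reach the emphasis threshold.
    static boolean[] emphasizedSeconds(double[] pitchHz) {
        double cutoff = threshold(pitchHz);
        int seconds = (pitchHz.length + FRAMES_PER_SECOND - 1) / FRAMES_PER_SECOND;
        int[] hits = new int[seconds];
        for (int i = 0; i < pitchHz.length; i++) {
            if (pitchHz[i] > 0 && pitchHz[i] >= cutoff) {
                hits[i / FRAMES_PER_SECOND]++;
            }
        }
        boolean[] emphasized = new boolean[seconds];
        for (int s = 0; s < seconds; s++) {
            emphasized[s] = hits[s] >= 3;
        }
        return emphasized;
    }

    public static void main(String[] args) {
        double[] pitch = new double[500]; // 10 seconds of frames
        Arrays.fill(pitch, 100.0);
        for (int i = 120; i < 125; i++) {
            pitch[i] = 300.0; // a short burst of raised pitch
        }
        System.out.println(Arrays.toString(emphasizedSeconds(pitch)));
    }
}
```

Requiring three emphasized frames per second filters out isolated pitch spikes, which matters given the reported problems with high-pitch sounds and transitions.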

  24. Pause Detection • Why is pause detection useful? • Removing pauses speeds up playback • Typically, 50-70% of original time [Foulke71] • Long pauses signify groups (talk spurts) • Noise and soft sounds create difficulties • Algorithm: Smoothed Histogram [Lamel81] • Calculate energy per 10 ms frame • Threshold based on smoothed histogram (5 dB after first peak) • Use heuristics to remove artifacts [figure axes: Percent of Frames, Average energy (dB)]
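A compact sketch of the smoothed-histogram idea, under stated assumptions: 16-bit PCM at an assumed 8 kHz (80 samples per 10 ms frame), 1 dB histogram bins, a three-bin triangular smoother, and the "first peak" read as the lowest-energy local maximum (taken here to be the background noise level). The single artifact heuristic shown, keeping pauses shorter than a minimum length, is an illustrative choice and not necessarily one the project used.

```java
final class PauseDetector {
    static final int FRAME = 80; // samples per 10 ms frame at an assumed 8 kHz

    // Average energy of each 10 ms frame, in dB.
    static double[] frameEnergyDb(short[] pcm) {
        int frames = pcm.length / FRAME;
        double[] db = new double[frames];
        for (int f = 0; f < frames; f++) {
            double sum = 0;
            for (int i = 0; i < FRAME; i++) {
                double s = pcm[f * FRAME + i];
                sum += s * s;
            }
            db[f] = 10 * Math.log10(sum / FRAME + 1.0); // +1 avoids log(0) on digital silence
        }
        return db;
    }

    // Threshold = 5 dB above the first peak of the smoothed energy histogram.
    static double threshold(double[] db) {
        int bins = 100; // 1 dB bins; 16-bit audio stays below 91 dB here
        double[] hist = new double[bins];
        for (double e : db) {
            hist[(int) Math.max(0, Math.min(bins - 1, e))]++;
        }
        double[] smooth = new double[bins];
        for (int b = 0; b < bins; b++) {
            double left = b > 0 ? hist[b - 1] : 0;
            double right = b < bins - 1 ? hist[b + 1] : 0;
            smooth[b] = (left + 2 * hist[b] + right) / 4.0;
        }
        for (int b = 0; b < bins; b++) {
            double prev = b > 0 ? smooth[b - 1] : 0;
            double next = b < bins - 1 ? smooth[b + 1] : 0;
            if (smooth[b] > 0 && smooth[b] >= prev && smooth[b] > next) {
                return b + 5.0;
            }
        }
        return Double.NEGATIVE_INFINITY; // no frames, so nothing counts as a pause
    }

    // Marks frames to keep: everything at or above the threshold, plus quiet
    // runs shorter than minPauseFrames (too short to be worth cutting).
    static boolean[] keepMask(double[] db, double thr, int minPauseFrames) {
        boolean[] keep = new boolean[db.length];
        int runStart = -1;
        for (int f = 0; f <= db.length; f++) {
            boolean quiet = f < db.length && db[f] < thr;
            if (quiet) {
                if (runStart < 0) {
                    runStart = f;
                }
                continue;
            }
            if (runStart >= 0 && f - runStart < minPauseFrames) {
                for (int k = runStart; k < f; k++) {
                    keep[k] = true;
                }
            }
            runStart = -1;
            if (f < db.length) {
                keep[f] = true;
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        short[] pcm = new short[200 * FRAME]; // 2 s: speech, silence, speech
        for (int i = 0; i < pcm.length; i++) {
            int frame = i / FRAME;
            pcm[i] = (short) (frame >= 50 && frame < 150 ? 10 : 10000);
        }
        double[] db = frameEnergyDb(pcm);
        boolean[] keep = keepMask(db, threshold(db), 10);
        int kept = 0;
        for (boolean k : keep) {
            if (k) kept++;
        }
        System.out.println("kept " + kept + " of " + keep.length + " frames");
    }
}
```

Playing back only the kept frames yields the skimmed audio, and the long quiet runs that were dropped mark the boundaries between talk spurts.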

  27. Works Cited • [Arons95] B. Arons. Interactively Skimming Recorded Speech. Ph.D. dissertation, MIT, 1985. • [Foulke71] E. Foulke. The Perception of Time Compressed Speech. Ch. 4 in Perception of Language, edited by P.M. Kjeldergaard, D.L. Horton, and J.J. Jenkins. Charles E. Merrill Publishing Company, 1971, pp. 79-107. • [Hirshberg92] J. Hirschberg and B. Grosz. Intonational Features of Local and Global Discourse. In Proceedings of the Speech and Natural Language Workshop (Harriman, NY, Feb. 23-26). Morgan Kaufmann Publishers, 1992, pp. 441-446. • [Lamel81] L.F. Lamel, L.R. Rabiner, A.E. Rosenberg, and J.G. Wilpon. An Improved Endpoint Detector for Isolated Word Recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-29, 4 (Aug. 1981), 771-785.

  28. Architecture • Transcoder Service • Voicemail -> Text Transcript • Voicemail -> Text Summary • Voicemail -> Text Outline • Email -> Plain Audio • Email -> GSM Audio • Voicemail -> GSM Summary • Voicemail -> Audio Summary • Voicemail -> Skimmed Audio [diagram labels: Client (x3), Folder Store, Media Manager Interface, Media Manager Service, Mail Access Interface (x3), NinjaMail, POP, IMAP]
