
Information Extraction from Spoken Language

Dr Pierre Dumouchel, Scientific Vice-President, CRIM; Full Professor, ÉTS.


Presentation Transcript


  1. Information Extraction from Spoken Language • Dr Pierre Dumouchel • Scientific Vice-President, CRIM • Full Professor, ÉTS

  2. PUT RAW DATA NOW and then LINK DATA • http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html

  3. PUT RAW DATA NOW • Text • Data (numbers, statistics) • Data (audio, video)

  4. LINKED DATA • Information is in the relationships between data • Find the relationships between them

  5. IBM’s Watson and Jeopardy

  6. Proposal • Information extraction in radio and television documents • Industrial partners: CEDROM-SNi, Irosoft • Universities and research centers: CRIM, ÉTS, INRS-EMT, McGill • NSERC Strategic Project Proposal

  7. Process Raw Audio Data • Automatic Speech Recognition (ASR) • Parsing • Indexation • Pipeline: ASR → Parsing → Indexation
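The indexation step at the end of this pipeline can be illustrated with a minimal sketch. It assumes the ASR stage has already produced time-stamped text segments (the `segments` list below is hypothetical sample data, not output from the project), it skips the parsing stage entirely, and it builds a plain inverted index from words to segment start times. It is one simple way to make a transcript searchable, not the indexing method used by the authors.

```python
from collections import defaultdict


def build_index(segments):
    """Map each lowercased word to the sorted start times of the segments containing it."""
    index = defaultdict(set)
    for start_time, text in segments:
        for word in text.lower().split():
            # Strip surrounding punctuation so "news." and "news" share one entry.
            index[word.strip(".,;:!?")].add(start_time)
    # Drop the empty key that pure-punctuation tokens would produce.
    return {word: sorted(times) for word, times in index.items() if word}


# Hypothetical ASR output: (start time in seconds, recognized text).
segments = [
    (0.0, "Good evening, here is the news."),
    (12.5, "The storm reached the coast this morning."),
    (47.0, "More news on the storm after the break."),
]

index = build_index(segments)
print(index["storm"])  # [12.5, 47.0]
```

Looking up a word returns the points in the recording where it was recognized, which is the basic service an index over spoken documents has to provide.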

  8. Closed-captioning / Subtitling VOICEWRITER

  9. Closed-captioning / Subtitling • Done with the help of a VoiceWriter who: • Respeaks • Adds punctuation • Selects the proper dictionary • Does not speak during advertising • Summarizes information when more than one speaker speaks at the same time or when the speech rate is too fast • Translates

  10. How to process raw audio data? • Audio Diarization • Speaker Diarization • Speaker Recognition • Speaker Role • Punctuation • Structural Segmentation • Topic Recognition • ASR • Parsing • Indexation

  11. Audio Diarization • Aims to segment an audio recording into acoustically homogeneous parts • Distinguish between speech and music • Distinguish between advertising and news

  12. Speaker Diarization • Aims to segment a speech signal into its speaker turns
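The last step of speaker diarization, turning per-frame speaker decisions into turns, can be sketched as follows. The sketch assumes an earlier clustering step (not shown, and the hard part of the problem) has already assigned an anonymous speaker label to every frame; the labels, the 10 ms frame length, and the function name are illustrative assumptions.

```python
from itertools import groupby


def frames_to_turns(labels, frame_seconds=0.01):
    """Collapse consecutive identical frame labels into (speaker, start, end) turns."""
    turns = []
    position = 0  # index of the first frame of the current run
    for speaker, run in groupby(labels):
        length = len(list(run))
        start = position * frame_seconds
        end = (position + length) * frame_seconds
        turns.append((speaker, start, end))
        position += length
    return turns


# Hypothetical frame labels: 3 s of speaker A, 1.5 s of B, then 0.5 s of A.
labels = ["A"] * 300 + ["B"] * 150 + ["A"] * 50
turns = frames_to_turns(labels)
for speaker, start, end in turns:
    print(f"{speaker}: {start:.2f}s to {end:.2f}s")
```

Real systems usually smooth the frame labels first so that a few misclassified frames do not create spurious very short turns.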

  13. Speaker Recognition

  14. Speaker Role • In broadcast news, most speech comes from anchors and reporters. The remainder comes from quotations or interview excerpts, referred to as sound bites. • Detecting speaker role is important to improve: • the acoustic speech recognizer • information extraction

  15. Punctuation • Some language analysis tasks, such as parsing and entity extraction, need punctuation (periods and commas) in order to work properly.

  16. Structural Segmentation • Sentence, paragraph, and story segmentation are important for speech understanding applications, starting with basic tasks such as parsing and information extraction. • This problem is absent in text processing but has to be solved in speech processing.

  17. Topic Spotting • Aims to identify the topic of a speech signal. It is useful for adapting the different components of the system as well as for adding metatags to a speech signal. • Example: La belle ferme le voile • La: the, her • Belle: beautiful, beauty • Ferme: farm, closes • Le: the, his • Voile: veil, blocks the view • Two hypothetical translations: • The veil is closed by the beauty • The beautiful farm blocks his view
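As a rough illustration of topic spotting on an ASR transcript, the sketch below scores a transcript against small keyword lists and picks the highest-scoring topic. The topic names, the keyword sets, and the sample sentence are all hypothetical, and keyword counting is a deliberately simple stand-in; the slides do not say which topic identification method the project uses.

```python
from collections import Counter

# Hypothetical topic keyword lists, for illustration only.
TOPIC_KEYWORDS = {
    "weather": {"storm", "rain", "snow", "forecast", "temperature"},
    "sports": {"game", "team", "score", "season", "coach"},
    "politics": {"election", "minister", "vote", "parliament", "bill"},
}


def spot_topic(transcript):
    """Return (best topic or None, per-topic keyword counts) for a transcript."""
    words = Counter(w.strip(".,;:!?").lower() for w in transcript.split())
    scores = {
        topic: sum(words[keyword] for keyword in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    # With no keyword hit at all, report no topic rather than an arbitrary one.
    return (best if scores[best] > 0 else None), scores


topic, scores = spot_topic(
    "The minister said the vote on the bill comes before the election."
)
print(topic, scores)
```

Once a topic is known, it can be attached to the segment as a metatag or used to select a topic-specific language model, which is how a recognizer could prefer one reading of an ambiguous phrase over another.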

  18. How to improve Information Extraction from speech? • By improving ASR components

  19. Automatic Speech Recognizer • Performance drops with: • Out-of-vocabulary words (lexical models) • Multiple users (acoustic models) • Multiple microphones (acoustic models) • Multiple topics (language models) • Overlapping speech, or cross-talk (all models)
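The out-of-vocabulary problem can be quantified with a short sketch that measures how many words of a reference transcript are missing from the recognizer's lexicon. The lexicon and sentence below are hypothetical; a real lexicon would hold tens of thousands of entries.

```python
def oov_rate(transcript_words, lexicon):
    """Return (fraction of words missing from the lexicon, list of missing words)."""
    if not transcript_words:
        return 0.0, []
    missing = [w for w in transcript_words if w.lower() not in lexicon]
    return len(missing) / len(transcript_words), missing


# Hypothetical recognizer lexicon and reference transcript.
lexicon = {"the", "earthquake", "triggered", "a"}
words = "The earthquake triggered a tsunami".split()

rate, missing = oov_rate(words, lexicon)
print(rate, missing)  # 0.2 ['tsunami']
```

A word outside the lexicon can never be recognized correctly, so tracking this rate on recent news text is one way to decide when the lexical models need updating.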

  20. How to improve Information Extraction from speech? • More data are better data. • More similar data are better data. Similar in terms of: • Topic • Time period (specifically, more recent data) • Example: Japan • Prediction of what will happen and who will speak.

  21. More data are better data • Use the huge amount of information available on the web • Use supercomputer infrastructure in order to model it in a reasonable time: • Compute Canada infrastructure: CLUMEQ • Cluster of university computers

  22. More similar data are better data • Exploiting redundancies in different media information: • Anchor speech is predominant. • Reporters often appear at specific times, day after day. • Advertisements appear (and repeat) near specific time slots, day after day. • The same news is often reused from one medium to another.

  23. Exploiting redundancies in different media information

  24. Exploiting redundancies in different media information

  25. And then … • Audio Diarization • Speaker Diarization • Speaker Recognition • Speaker Role • Punctuation • Structural Segmentation • Topic Recognition • ASR • Parsing • Indexation
