
Informedia 03/12/97



Presentation Transcript


  1. Informedia 03/12/97

  2. Multilingual Informedia: Innovations • Robust Indexing and Retrieval • Spanish Speech Recognition • Searchable User Annotations • Data Extraction for Further Analysis • Multilingual Document Access • English or Spanish Queries • English or Spanish Broadcast Video

  3. Extending the Informedia Digital Video Library Original Informedia Goal • Full content search and retrieval from digital video, audio and text libraries Technology • Integrated speech, image and language processing for automated library creation (indexing, segmentation, abstraction, summarization)

  4. Building on the Informedia Infrastructure • Video and Audio Segmentation • improved segmentation algorithms • extend to multiple languages • Presentation, Reuse and Interoperability • abstractions and video summarization (skims) • “cut and paste” for presentations and reports • Annotations • Initially typed, later spoken • Incrementally indexed for immediate retrieval

  5. Multilingual Integration • Spanish News Broadcast • Digitized from PAL to MPEG-1 • Speech Recognition/Alignment by Sphinx-III • Simple Phrase-based Translation • Processed Automatically into the Informedia Digital Video Library

  6. Multilingual Demo • Running prototype demo • Demonstration of current technologies

  7. Title Generation for Informedia News Stories • Informedia, a multimedia digital library, stores television broadcast news stories. • An extractive summary feature currently locates snippets in news-story transcripts to use as story titles. • GOAL: An improved, non-extractive title-generation feature for Informedia.

  8. KNN-based Topic Detection • Build training index with pre-labeled topics • 45000 Broadcast News stories • With a new document: • Search for top 10 related stories in training index • Look up topics for related stories • Re-weight topics by story relevance (select top 5)
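
The lookup steps on this slide can be sketched as follows. This is a minimal illustration, not the Informedia implementation: the slide does not say how relatedness or "story relevance" is computed, so cosine similarity over raw word counts is an assumption, and the names `knn_topics`, `tf_vector`, and `cosine` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_vector(text):
    """Bag-of-words term counts (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_topics(new_doc, training, k=10, top_n=5):
    """training: list of (story_text, [topic labels]) pairs.

    Finds the k most similar training stories, sums each topic's
    weight by the similarity of the stories carrying it, and
    returns the top_n topics.
    """
    query = tf_vector(new_doc)
    scored = [(cosine(query, tf_vector(text)), topics)
              for text, topics in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)

    weights = defaultdict(float)
    for similarity, topics in scored[:k]:
        for topic in topics:
            weights[topic] += similarity  # re-weight by story relevance

    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [topic for topic, weight in ranked[:top_n] if weight > 0]
```

The defaults `k=10` and `top_n=5` mirror the "top 10 related stories" and "select top 5" figures on the slide.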

  9. Basic Idea for Better Titles • Train a statistical model on a corpus of documents with human-assigned titles. • Compare title generation methods: • Extractive Titles • Naïve Bayes, EM • KNN

  10. Extractive Summarization • MS Word 2000 AutoSummarize • Extracts sentences/fragments as summaries • Similar performance to the TF-IDF implementation at CMU • Does not use our training corpus
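
For orientation, a generic TF-IDF sentence extractor might look like the sketch below. Neither AutoSummarize nor the CMU implementation is described on the slide, so everything here is an assumption for illustration: the tokenizer, the smoothed IDF formula, the length-normalized sentence score, and the names `idf_table` and `extractive_title`.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens (assumed tokenization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def idf_table(corpus):
    """Smoothed inverse document frequency from a background corpus."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(tokenize(doc)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def extractive_title(document, idf):
    """Return the sentence with the highest mean TF-IDF weight."""
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    if not sentences:
        return ""
    tf = Counter(tokenize(document))
    # Words unseen in the background corpus get the largest known IDF.
    default_idf = max(idf.values(), default=1.0)

    def score(sentence):
        words = tokenize(sentence)
        if not words:
            return 0.0
        total = sum(tf[w] * idf.get(w, default_idf) for w in words)
        return total / len(words)

    return max(sentences, key=score)
```

As on the slide, the selected title is always a snippet of the document itself, and no corpus of human-assigned titles is used.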

  11. Naïve Bayes • Train a statistical model on a corpus of documents with human-assigned titles. • Title need not be a snippet from the document (contrasts with extractive-summarization techniques). • Suggested by Witbrock & Mittal, 1999. • $P(w_{\text{title}} \mid w_{\text{doc}})$ • Works better if $w_{\text{title}} = w_{\text{doc}}$
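
One way to read the restricted case, where the title word equals the document word, is to estimate for each word the probability that it appears in the title given that it appears in the document. The sketch below follows that reading; it is an assumption rather than the model actually used, and the names `train_title_model` and `generate_title_words`, along with the frequency-weighted scoring, are hypothetical.

```python
from collections import Counter


def train_title_model(pairs):
    """pairs: iterable of (document_text, title_text).

    Returns an estimate of P(w in title | w in document) for every
    word seen in a training document.
    """
    in_doc = Counter()
    in_both = Counter()
    for doc, title in pairs:
        doc_words = set(doc.lower().split())
        title_words = set(title.lower().split())
        in_doc.update(doc_words)
        in_both.update(doc_words & title_words)
    return {w: in_both[w] / in_doc[w] for w in in_doc}


def generate_title_words(document, model, length=3):
    """Pick the document words most likely to appear in a title.

    Each word is scored by its count in the document times its
    learned title probability; word order is not modeled.
    """
    tf = Counter(document.lower().split())
    scores = {w: count * model.get(w, 0.0) for w, count in tf.items()}
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return [w for w in ranked[:length] if scores[w] > 0]
```

Because only word selection is modeled, the output is an unordered bag of title words. The unrestricted model would additionally allow title words that never occur in the document.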

  12. (K) Nearest Neighbor • Index a corpus of documents with human-assigned titles. • Find the document in the training corpus closest to the current document • Use that title (k=1)
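
The k=1 case can be sketched in a few lines. The slide does not specify how "closest" is measured, so cosine similarity over word counts is an assumption, and `nearest_neighbor_title` is a hypothetical name.

```python
import math
from collections import Counter


def bag(text):
    """Bag-of-words term counts (assumed representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbor_title(document, training):
    """training: list of (document_text, human_title) pairs.

    Returns the title of the single most similar training document,
    or None if the training set is empty.
    """
    query = bag(document)
    best_title, best_similarity = None, -1.0
    for text, title in training:
        similarity = cosine(query, bag(text))
        if similarity > best_similarity:
            best_title, best_similarity = title, similarity
    return best_title
```

With this approach the generated title is always a fluent human-written title, but it may describe a different story than the one being titled.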

  13. Evaluation of Title Accuracy • Apply to unseen documents • $F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$ • Precision = Correct/Retrieved • Recall = Correct/All Possible Correct • Only measured word selection, not order • Should try String Edit Distance (DTW), or Maximal Substring
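
A word-level F1 that ignores order, as described on this slide, can be computed as in the sketch below. Treating repeated words as a multiset and lowercasing before comparison are assumptions; `title_f1` is a hypothetical name.

```python
from collections import Counter


def title_f1(generated, reference):
    """Word-selection F1 between a generated and a reference title.

    Precision = correct / words generated
    Recall    = correct / words in the reference
    Word order is ignored.
    """
    gen = Counter(generated.lower().split())
    ref = Counter(reference.lower().split())
    correct = sum((gen & ref).values())  # multiset intersection
    if correct == 0:
        return 0.0
    precision = correct / sum(gen.values())
    recall = correct / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For example, comparing "simpson trial photos" against "simpson civil trial" gives two correct words out of three on each side, so precision, recall, and F1 are all 2/3.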

  14. Multi-Lingual Experiment • 40000 TV news stories with titles from the 1998 Broadcast News CD-ROM • Tested on 1000 held-out stories, evaluated on titles • Using SYSTRAN (Babelfish.altavista.com), translated English-French-English • Vocabulary overlap was about 70% (need) ???

  15. Title: CONTINUING COVERAGE OF O. J. SIMPSON CIVIL TRIAL: MORE PHOTOS OF SIMPSON IN MAGLI SHOES MAY SURFACE Example English-French-English Translation: AGAIN THE SHOES OF SIMPSON OF O J BECOME A FOCAL POINT IN SA CIVIL TEST. AND AGAIN A PHOTOGRAPH EAST A CRUCIAL PART OF THE IMAGE. FELDMAN OF CHARLES OF C N N EXPLAINS. THE SOURCES INDICATE C N N THAT THE LAWYERS FOR THE FAMILIES CONTINUING SIMPSON OF O J HAVE NOW ACCESS TO SEVERAL PHOTOGRAPHS ALLEGEDLY LATELY CLEARLY DISCOVERED TO SHOW SIMPSON CARRYING A PAIR OF SHOES OF BRUNO MAGLI OF SWEDEN. AN EXPERT AS REGARDS F B I A TESTIFIED WITH THE CIVIL TEST TO SIMPSON TO THAT SUCH A PAIR A LEFT TO THE COPIES TO SHOE BEHIND TO THE SCENE TO MURDER THE FORMER WIFE TO SIMPSON AND HIS GOLDMAN TO RON TO FRIEND. THE AGENT FOR THE FAMILIES OF VICTIMS A PRESENTED IN THE OBVIOUSNESS A PHOTOGRAPH TAKEN BY THE OAR OF HARRY OF PHOTOGRAPHER BY AND PUBLISHED INSIDE QUOTE THE QUOTATION MARK NATIONALS OF INVESTIGATOR. A TESTIFIED EXPERT A THAT PHOTO A SHOWN SIMPSON CARRYING THE SHOES...

  16. Multilingual Results

  17. Effect of Word Order
