The IBM 2006 Spoken Term Detection System
Jonathan Mamou (IBM Haifa Research Labs), Olivier Siohan, Bhuvana Ramabhadran (IBM T. J. Watson Research Center)

Outline
- System description
- Indexing
D. Povey, B. Kingsbury, L. Mangu, G. Saon, H. Soltau, G. Zweig, "fMPE: Discriminatively Trained Features for Speech Recognition", in Proceedings of the International Conference on Acoustics, Speech and Signal Processing, Philadelphia, PA, 2005.
J. Huang et al., "The IBM RT06S Speech-To-Text Evaluation System", NIST RT06S Workshop, May 3-4, 2006.
O. Siohan, M. Bacchiani, "Fast vocabulary independent audio search using path based graph indexing", Proceedings of Interspeech 2005, Lisbon, pp. 53-56.
Main issue: designing a fragment inventory
A. Stolcke, "Entropy-based pruning of backoff language models", in Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, pp. 270-274, Lansdowne, VA, Feb. 1998.
Indices are stored using Juru storage
Input: a corpus of word/sub-word transcripts
1. Extract units of indexing from the transcript
2. For each unit of indexing (word or sub-word), store in the index its posting
- transcript/speaker identifier (tid)
- begin time (bt)
- For WCN
- posterior probability
- rank relative to the other hypotheses
Output: an index on the corpus of transcripts
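The indexing steps above can be sketched as follows. This is an illustrative sketch only: the posting layout and function names are assumptions, not the actual Juru storage API.

```python
from collections import defaultdict

def build_index(transcripts):
    """Build an inverted index over word/sub-word transcripts.

    transcripts: list of (tid, units), where tid identifies the
    transcript/speaker and units is a list of
    (unit, begin_time, posterior, rank) tuples taken from the WCN.
    """
    index = defaultdict(list)
    for tid, units in transcripts:
        for unit, bt, posterior, rank in units:
            # store the posting for this unit of indexing
            index[unit].append({"tid": tid, "bt": bt,
                                "posterior": posterior, "rank": rank})
    # keep each posting list sorted by (tid, begin time) so that
    # later merging by timestamp is straightforward
    for postings in index.values():
        postings.sort(key=lambda p: (p["tid"], p["bt"]))
    return index
```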
J. Mamou, D. Carmel and R. Hoory, "Spoken Document Retrieval from Call-center conversations", Proceedings of SIGIR, 2006
Our scoring model is based on two pieces of information provided by the WCN: the posterior probability of each hypothesis and its rank relative to the other hypotheses.
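One simple way to combine these two WCN signals into an occurrence score is sketched below. The weighting scheme and the assumption of a 1-based rank are illustrative only, not the exact formula used by the system (see the Mamou et al. SIGIR 2006 reference for the actual model).

```python
def occurrence_score(posterior, rank, rank_weight=0.1):
    """Combine posterior probability and hypothesis rank (1 = best).

    A higher posterior and a better (lower) rank both increase the
    score; rank_weight controls how much rank matters.
    """
    return posterior * (1.0 + rank_weight / rank)
```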
Input: a query term, a word-based index, and a sub-word-based index
1. Extract the query keywords
2. For in-vocabulary query keywords, extract the posting lists from the word-based index
3. For OOV query keywords, convert the keywords to sub-words and extract the posting list of each sub-word from the sub-word index
4. Merge the different posting lists according to the timestamp of the occurrences in order to create results matching the query
- check that the words and sub-words appear in the right order according to their begin times
- check that the words/sub-words are adjacent (gap less than 0.5 sec for word-word and word-phoneme pairs, less than 0.2 sec for phoneme-phoneme pairs)
Output: the set of all the matches of the given term
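The merge in step 4 can be sketched as follows: occurrences of consecutive query units must come from the same transcript, appear in order of begin time, and fall within the adjacency thresholds (< 0.5 s when a word is involved, < 0.2 s for phoneme-phoneme pairs). The posting layout is an assumption for illustration.

```python
def max_gap(prev_type, cur_type):
    # phoneme-phoneme pairs must be tighter (< 0.2 s) than pairs
    # involving a word (< 0.5 s)
    return 0.2 if prev_type == cur_type == "phone" else 0.5

def merge_postings(posting_lists):
    """Merge posting lists into query matches.

    posting_lists: one list per query unit, in query order; each
    posting is a dict with 'tid', 'bt' (begin time in seconds) and
    'type' ('word' or 'phone').
    Returns the list of matching chains of postings.
    """
    matches = []
    for start in posting_lists[0]:
        chain = [start]
        for postings in posting_lists[1:]:
            prev = chain[-1]
            # next unit: same transcript, later begin time, within gap
            nxt = next((p for p in postings
                        if p["tid"] == prev["tid"]
                        and prev["bt"] < p["bt"] <= prev["bt"]
                            + max_gap(prev["type"], p["type"])),
                       None)
            if nxt is None:
                break
            chain.append(nxt)
        if len(chain) == len(posting_lists):
            matches.append(chain)
    return matches
```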
[Diagram: the posting list from the word index and the posting list from the phone index for the terms in the query are merged based on begin time and adjacency (word-word and word-phone: < 0.5 s; phone-phone: < 0.2 s), yielding the set of matches for all terms in the query.]
In general, the system performed better on longer terms.