
Unsupervised Language Model Adaptation for Automatic Speech Recognition of Broadcast News Using Web 2.0



  1. Unsupervised Language Model Adaptation for Automatic Speech Recognition of Broadcast News Using Web 2.0
Tim Schlippe, Lukasz Gren, Ngoc Thang Vu, Tanja Schultz

  2. Outline
• Motivation and Introduction
• Text Collection and Decoding Strategy
• Corpora and Baseline Language Models
• Experiments
  • Time- and Topic-Relevant Text Data from RSS Feeds
  • Time- and Topic-Relevant Text Data from Twitter
  • Vocabulary Adaptation
• Conclusion and Future Work

  3. Motivation (1)
• Broadcast news mostly contains the latest developments
  • New words emerge frequently and different topics come into the focus of attention
• To adapt automatic speech recognition (ASR) for broadcast news:
  • Update the language model (LM) and pronunciation dictionary

  4. Motivation (2)
• Using paradigms from Web 2.0 (O'Reilly, 2007) to obtain time- and topic-relevant data
  • The Internet community provides texts that fit the latest news faster than static web pages do
  • Texts from older news that do not fit the topic of the show in question can be left out
• Examples:
  • Social networking sites
  • Blogs
  • Web applications

  5. Introduction (1)
• RSS Feeds
  • Small, automatically generated XML files containing time-stamped URLs of the published updates
  • Can easily be found on almost all online news websites
  • Possibility to get data fitting a certain time interval
• Twitter
  • Enables its users to send and read text-based messages of up to 140 characters (Tweets)
  • Tweets are more real-time than traditional websites, and a large amount of text data is available
  • Restriction: not possible to get Tweets older than 6-8 days with the Twitter REST API

  6. Introduction (2)
• Researchers have used the WWW as an additional source of training data for language modeling
• Initial works use Tweets and RSS Feed services (Feng and Renger, 2012) (Martins, 2008)
• Our contribution:
  • A strategy to enrich the pronunciation dictionary and improve the LM with time- and topic-relevant text, using state-of-the-art techniques
  • Modules for this strategy are provided in our Rapid Language Adaptation Toolkit (RLAT) (Vu et al., 2010)

  7. Text Collection and Decoding Strategy (1)
1. Based on URLs in RSS Feeds, text in near temporal proximity to the current news show is collected and normalized (rss-text) → rss-text-LM

  8. Text Collection and Decoding Strategy (2)
2. Based on the highest TF-IDF scores, topic bigrams are extracted from rss-text (see the sketch below)
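
A minimal sketch of this extraction step, assuming scikit-learn and the crawled rss-text as a list of plain-text documents; the slide does not specify the paper's exact TF-IDF variant or cut-off, so the ranking below is illustrative:

    from sklearn.feature_extraction.text import TfidfVectorizer

    def extract_topic_bigrams(rss_documents, top_n=10):
        """Rank bigrams in rss-text by TF-IDF and return the top ones."""
        vectorizer = TfidfVectorizer(ngram_range=(2, 2), lowercase=True)
        tfidf = vectorizer.fit_transform(rss_documents)  # documents x bigrams
        # Score each bigram by its maximum TF-IDF value over all documents.
        scores = tfidf.max(axis=0).toarray().ravel()
        bigrams = vectorizer.get_feature_names_out()
        ranked = sorted(zip(bigrams, scores), key=lambda x: x[1], reverse=True)
        return [bigram for bigram, _ in ranked[:top_n]]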

  9. Text Collection and Decoding Strategy (3)
3. With the resulting topic bigrams, appropriate Tweets are searched via the Twitter API and normalized (twitter-text) → twitter-text-LM (see the sketch below)
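
A hedged sketch of the Tweet retrieval step, using the search endpoint of the (since retired) Twitter REST API v1.1, the API generation the slides refer to; authentication setup is omitted, and the bearer-token usage is an assumption about the caller's credentials:

    import requests

    SEARCH_URL = "https://api.twitter.com/1.1/search/tweets.json"

    def search_tweets(topic_bigram, bearer_token, lang="fr", count=100):
        """Fetch recent Tweets matching one topic bigram from rss-text."""
        params = {"q": f'"{topic_bigram}"', "lang": lang, "count": count}
        headers = {"Authorization": f"Bearer {bearer_token}"}
        response = requests.get(SEARCH_URL, params=params, headers=headers)
        response.raise_for_status()
        # v1.1 returned the matching Tweets under the "statuses" key.
        return [status["text"] for status in response.json()["statuses"]]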

  10. Text Collection and Decoding Strategy (4)
4. Based on the most frequent words in rss-text and twitter-text, the vocabulary of the final LM is adapted

  11. Text Collection and Decoding Strategy (5)
• rss-text-LM and twitter-text-LM are interpolated with our generic baseline LM (base-LM)
• To determine optimal interpolation weights, the current show is decoded in a 1st pass with base-LM
• The combination of weights that most reduces the perplexity (PPL) on the 1st-pass hypothesis is then used (see the sketch below)
• Then: 2nd-pass decoding with the combined LM
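
A minimal sketch of the weight search, assuming each LM can return a per-word probability for every word of the 1st-pass hypothesis; the tuple layout is an assumption, and toolkits such as SRILM estimate the same weights with EM rather than a grid:

    import itertools
    import math

    def perplexity(weights, word_probs):
        """word_probs: one (p_base, p_rss, p_twitter) tuple per hypothesis word."""
        log_sum = sum(math.log(sum(w * p for w, p in zip(weights, probs)))
                      for probs in word_probs)
        return math.exp(-log_sum / len(word_probs))

    def best_weights(word_probs, step=0.05):
        """Grid-search interpolation weights (summing to 1) that minimize
        the perplexity on the 1st-pass hypothesis."""
        grid = [i * step for i in range(int(round(1 / step)) + 1)]
        best_ppl, best_w = float("inf"), None
        for w1, w2 in itertools.product(grid, grid):
            w3 = 1.0 - w1 - w2
            if w3 < -1e-9:          # weights must form a valid distribution
                continue
            w3 = max(w3, 0.0)
            ppl = perplexity((w1, w2, w3), word_probs)
            if ppl < best_ppl:
                best_ppl, best_w = ppl, (w1, w2, w3)
        return best_w, best_ppl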

  12. Corpora and Baseline LMs (1)
• Radio broadcasts of the 7 a.m. news from Europe 1
  • Each show 10-15 minutes (French)
• rss-text-LM experiments evaluated on 10 shows
  • 691 sentences with 22.5k running words spoken
• twitter-text-LM experiments evaluated on 5 shows
  • 328 sentences with 10.8k running words spoken
• Subscribed to the RSS Feed services of Le Parisien, Le Monde, France24, Le Point

  13. Corpora and Baseline LMs (2)
• Strategy analyzed with 2 baseline 3-gram LMs of different quality (base-LM):
  • GP-LM: French LM from the GlobalPhone corpus
  • Q-LM: French LM that we used in the Quaero project
[Table: Quality of our baseline language models on the reference transcriptions of all 10 news shows]

  14. Experiments
• ASR system
  • Acoustic model of our KIT Quaero 2010 French Speech-to-Text System (Lamel et al., 2011)
  • Before vocabulary adaptation: KIT Quaero pronunciation dictionary (247k dictionary entries for 170k words)
• Word error rates (WERs) (%) of our baseline systems:
  • Q-LM: 27.45%
  • GP-LM: 54.48%

  15. Experiments – Data from RSS Feeds (1)
• From which period is rss-text optimal?
  • Analyze rss-text from different periods
[Table: WERs (%) of all shows with LMs containing RSS Feeds-based text data from different periods]

  16. Experiments – Data from RSS Feeds (2)
• Is rss-text really more relevant than other text…
  • … of the same amount (Ø 385k lines for each show)?
  • … of a larger amount (e.g. 20M lines)?
[Table: WER (%) with LMs containing RSS Feeds-related text compared to random text data]

  17. Experiments – Data from Twitter
• From rss-text, extract topic words based on TF-IDF
• With the topic words, search relevant French Tweets with the Twitter API (in the period from 5 days before to the date of the show)
  • Ø 38k lines for each show
[Chart: Relative WER improvement for the last 5 shows]

  18. Experiments – Vocabulary Adaptation
• Best strategy with GP-LM:
  • Include daily on average the 19k most frequent words from rss-text and twitter-text
  • OOV rate: 13.5% → 3%; WER: 44.22% → 36.08% (18.41% relative)
• Best strategy with Q-LM:
  • Remove the words with the lowest probability → 120k words
  • Include daily on average the 1k most frequent words from rss-text and twitter-text
  • OOV rate: 1.2% → 0.3%; WER: 24.40% → 24.38% (0.08% relative)

  19. Overview
• GP-LM: 52.68 → 35.94
• Q-LM: 25.18 → 24.22
[Chart: Relative WER improvement]

  20. Conclusion and Future Work
• We proposed an automatic strategy to adapt generic LMs and the search vocabulary to several topics for ASR
• Showed the relevance of RSS Feeds and Tweets
• Embedded modules for the strategy into RLAT
• Future work may include further paradigms from Web 2.0, such as social networks, or Web 3.0 (Semantic Web) to obtain time- and topic-relevant text data

  21. Merci!

  22. References (1)

  23. References (2)

  24. Backup

  25. Introduction (1)
• Web 2.0
  • Coined in 1999 (O'Reilly, 2007)
  • Describes websites that use technology beyond the static pages of earlier websites
  • May allow users to interact and collaborate with each other in a social media dialogue
  • … in contrast to websites where people are limited to the passive viewing of content
• Examples:
  • Social networking sites
  • Blogs
  • Web applications

  26. Introduction (2)
• RSS Feeds
  • Small, automatically generated XML files containing time-stamped URLs of the published updates
  • Can easily be found on almost all online news websites
  • Space-efficient (e.g. only crawl recent news → daily crawled text in our experiments (from Le Parisien, Le Monde, France24, Le Point) < 10 MB)
  • Possibility to get data fitting a certain time interval

  27. Introduction (3)
• Our Rapid Language Adaptation Toolkit (RLAT) (Vu, 2010) aims to reduce the amount of time and effort involved in building speech processing systems for new languages and domains
• We advanced the modules in RLAT for:
  • text normalization,
  • the collection of RSS Feeds together with the text on the related websites,
  • TF-IDF-based topic word extraction, as well as
  • LM interpolation.

  28. Previous Work (1)
• Although adding in-domain data is an effective means of improving LMs (Rosenfeld, 1995), adding out-of-domain data is not always successful (Iyer and Ostendorf, 1999)
• To retrieve relevant texts from the Web, search queries are used (Bulyko et al., 2003) (Sarikaya et al., 2005)
• Extracting characteristic words of every document or web page by calculating a Term-Frequency Inverse-Document-Frequency (TF-IDF) score (Sethy et al., 2005) (Misu and Kawahara, 2005) (Lecorve et al., 2008)

  29. Previous Work (2)
• 2-pass decoding (Kemp, 1999) (Yu et al., 2000) (Lecorve et al., 2012)
  • Extract topic words from the 1st-pass hypothesis
  • Pass them as a query to a Web search engine
  • From the retrieved documents, extract a list of words to adapt the pronunciation dictionary and LM for a 2nd-pass decoding
• Data from different domains are combined with linear interpolation of N-grams (Vu et al., 2010) (Bulyko, 2003)
• Different strategies for vocabulary adaptation based on word frequency and time relevance (Auzanne et al., 2000) (Ohtsuki and Nguyen, 2007)

  30. Previous Work (3)
• (Feng and Renger, 2012) adapt a general LM by interpolating it with an LM trained on normalized TV Tweets and improve ASR accuracy for a voice-enabled social TV application
• (Martins, 2008) subscribes to the RSS news Feed services of six Portuguese news channels for daily vocabulary and LM adaptation in a broadcast news ASR system

  31. Our Contribution (1)
• Strategy to improve LM and ASR quality with time- and topic-relevant text, using state-of-the-art techniques
• Depending on the quality of the underlying generic baseline LM on our test data, we optimize the vocabulary adaptation technique

  32. Motivation (2)
• Traditional recursive crawling
  • Specialized in crawling huge amounts of text data
  • Choosing a high link depth leads to the collection of older texts
  • Possibly unnecessary content like duplicates, error pages, blank pages, advertisements (topic relevance)
  • Language models using data from this crawler might not be up-to-date for breaking news (time relevance)

  33. Traditional Text Crawling

  34. RSS Feeds on Le Parisien

  35. RSS Feeds – Example
[Figure annotation: Time relevance]

  36. Corpora and Baseline LMs (1)
• Radio broadcasts of the 7 a.m. news from Europe 1
  • January 2011 – February 2012
  • Each show 10-15 minutes
• rss-text-LM experiments evaluated on 10 shows
  • 691 sentences with 22.5k running words spoken
• twitter-text-LM experiments evaluated on 5 shows
  • 328 sentences with 10.8k running words spoken
• RSS Feeds from Le Parisien, Le Monde, France24, Le Point (~385k lines for each show (T5-T0))
• Tweets (~38k lines for each show (T5-T0))

  37. Experiments – Data from RSS Feeds (1)
• RSS parser (new tool in our RLAT; see the sketch below)
  • Takes RSS Feeds, extracts the URLs with the publishing date, and collects them, preserving the time information
  • Only pages corresponding to the listed URLs are crawled
  • After crawling, HTML tags are removed and the text data is normalized
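
A sketch of what the parser does, with the third-party feedparser library standing in for the actual RLAT module; the entry fields follow common RSS conventions:

    import time
    import feedparser

    def urls_in_period(feed_url, start, end):
        """Collect (publishing date, article URL) pairs whose time stamp
        falls inside the interval [start, end] (Unix timestamps)."""
        feed = feedparser.parse(feed_url)
        urls = []
        for entry in feed.entries:
            # Some feeds omit the parsed date; skip those entries.
            if getattr(entry, "published_parsed", None) is None:
                continue
            published = time.mktime(entry.published_parsed)
            if start <= published <= end:
                urls.append((entry.published, entry.link))
        return urls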

  38. Text Normalization from Tweets
• Similar to (Feng and Renger, 2012); see the sketch below
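
Since the slide's rule table is not part of the transcript, here is a hedged sketch of typical Tweet normalization in the spirit of (Feng and Renger, 2012); the concrete patterns are illustrative, not the paper's exact rule set:

    import re

    def normalize_tweet(tweet):
        """Strip Twitter-specific markup so the text can feed an LM."""
        text = re.sub(r"\bRT\b", " ", tweet)        # retweet markers
        text = re.sub(r"https?://\S+", " ", text)   # URLs
        text = re.sub(r"[@#](\w+)", r"\1", text)    # keep word, drop @/# sign
        text = re.sub(r"[^\w' -]", " ", text)       # remaining punctuation
        text = re.sub(r"\s+", " ", text)            # collapse whitespace
        return text.strip().lower()

    # normalize_tweet("RT @france24: Grève à #Paris http://t.co/xyz !")
    # -> "france24 grève à paris"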

  39. Experiments – Data from Twitter
• From rss-text, extract topic words based on TF-IDF
• With the topic words, search relevant French Tweets with the Twitter API (in the period from 5 days before to the date of the show)
[Chart: Relative WER improvement for the last 5 shows]

  40. Experiments – Vocabulary Adaptation (1)
• For GP-LM, which has a high OOV rate (13.5%), the following strategy performs best (see the sketch after this list):
  • Generate a list of words present in the concatenation of rss-text and twitter-text with the corresponding number of occurrences
  • From this list, remove all words that are present only once in our text data
  • Add the remaining words that are still not present in the search vocabulary
• On average, 19k words are added to the vocabulary for each show → OOV rate reduced to 3%
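
A minimal sketch of this strategy, assuming tokenized rss-text and twitter-text and the current search vocabulary as a set (names are illustrative):

    from collections import Counter

    def adapt_vocabulary(vocabulary, rss_tokens, twitter_tokens):
        """Add every word that occurs more than once in the adaptation
        texts and is not yet in the search vocabulary."""
        counts = Counter(rss_tokens) + Counter(twitter_tokens)
        new_words = {w for w, c in counts.items() if c > 1 and w not in vocabulary}
        return vocabulary | new_words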

  41. Experiments – Vocabulary Adaptation (2)
• Due to its considerably lower OOV rate (1.2%), we worked out another strategy for Q-LM:
  • Reduce the words in the LM to improve the PPL by removing the words with the lowest probability (see the sketch below)
  • Remove those words from the decoding dictionary as well
  • Add the most frequent new words present in the concatenation of rss-text and twitter-text
• Optimal: new baseline vocabulary with 120k words plus the 1k most frequent words from the concatenation of rss-text and twitter-text → OOV rate reduced to 0.3%
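
A hedged sketch of the vocabulary reduction, assuming the LM is available in ARPA format so the unigram section carries one log10 probability per word; the file handling below is an assumption, not RLAT's actual code:

    def top_unigrams(arpa_path, keep=120_000):
        """Keep only the `keep` unigrams with the highest probability."""
        unigrams = []
        with open(arpa_path, encoding="utf-8") as f:
            in_unigrams = False
            for line in f:
                line = line.strip()
                if line == "\\1-grams:":
                    in_unigrams = True
                    continue
                if in_unigrams:
                    if line.startswith("\\"):    # next n-gram section reached
                        break
                    fields = line.split("\t")    # log10 prob, word[, backoff]
                    if len(fields) >= 2:
                        unigrams.append((float(fields[0]), fields[1]))
        unigrams.sort(reverse=True)              # highest log-probability first
        return [word for _, word in unigrams[:keep]]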

  42. Results – Overview

  43. Interpolation weights

  44. Conclusion and Future Work (2)
• Advanced the modules in RLAT for:
  • text normalization,
  • the collection of RSS Feeds together with the text on the related websites,
  • TF-IDF-based topic word extraction, as well as
  • LM interpolation
• Future work may include further paradigms from Web 2.0, such as social networks, to obtain time- and topic-relevant text data
