
INTERSPEECH 2013 A Survey of Language Modeling





Presentation Transcript


  1. INTERSPEECH 2013: A Survey of Language Modeling. Department of Computer Science & Information Engineering, National Taiwan Normal University. Presenter: 郝柏翰

  2. Unsupervised Discriminative Language Modeling Using Error Rate Estimator. ¹Takanobu Oba, ²Atsunori Ogawa, ²Takaaki Hori, ¹Hirokazu Masataki, ²Atsushi Nakamura. ¹NTT Media Intelligence Laboratories, NTT Corporation, Yokosuka, Japan; ²NTT Communication Science Laboratories, NTT Corporation, Kyoto, Japan

  3. Introduction • Discriminative language modeling requires a large amount of spoken data and manually transcribed reference text for training. • This paper proposes an unsupervised training method to overcome this handicap. • The key idea is to use an error rate estimator instead of calculating the true error rate from the reference. • In standard supervised approaches, the true error rate is used only for finding the oracle (the minimum-error-rate hypothesis) and for prioritizing the competing hypotheses during weight learning.
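
The selection step described above can be sketched as follows. The confidence table and `estimated_error_rate` here are hypothetical stand-ins for the paper's CRF-based estimator, used only to illustrate how an estimated rate can replace the true rate for oracle selection:

```python
def select_oracle(nbest, error_rate_fn):
    """Pick the hypothesis with the lowest (estimated) error rate."""
    return min(nbest, key=error_rate_fn)

# Toy per-word confidence table (assumption; the paper uses a trained
# CRF-based estimator, not a lookup table).
WORD_CONF = {"the": 0.9, "cat": 0.8, "hat": 0.3, "bat": 0.7}

def estimated_error_rate(hyp):
    # Estimated errors / words: count a low-confidence word as a
    # likely error -- a crude proxy for the paper's estimator.
    errors = sum(1 for w in hyp if WORD_CONF.get(w, 0.0) < 0.5)
    return errors / max(len(hyp), 1)

nbest = [("the", "cat"), ("the", "hat"), ("the", "bat")]
oracle = select_oracle(nbest, estimated_error_rate)
```

The same estimated rates can then rank the competing hypotheses for weight learning, exactly where the supervised recipe would use true error rates.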

  4. Conditional Random Field • A conditional random field (CRF) is a type of discriminative undirected probabilistic graphical model. It is most often used for labeling or parsing of sequential data, such as natural language text or biological sequences; CRFs are also widely used in computer vision.
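
A minimal linear-chain CRF can be sketched with hand-set weights and brute-force decoding (real CRFs learn the weights and decode with Viterbi; the toy weights below are assumptions for illustration):

```python
from itertools import product

LABELS = ["C", "E"]  # correct / error, as in word-level error tagging

def score(words, labels, w_emit, w_trans):
    """Unnormalized log-score of a label sequence under the CRF."""
    s = sum(w_emit.get((x, y), 0.0) for x, y in zip(words, labels))
    s += sum(w_trans.get((a, b), 0.0) for a, b in zip(labels, labels[1:]))
    return s

def decode(words, w_emit, w_trans):
    """Brute-force argmax over label sequences (fine for short inputs)."""
    return max(product(LABELS, repeat=len(words)),
               key=lambda ls: score(words, ls, w_emit, w_trans))

# Toy weights (assumption): a known word looks correct, a garbled one not.
w_emit = {("cat", "C"): 1.0, ("xzq", "E"): 1.0}
w_trans = {("C", "E"): 0.5}
```

Because the model scores whole label sequences, the transition weights let the context of neighboring labels influence each word's decision, which is what distinguishes a CRF from an independent per-word classifier.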

  5. Error Rate Estimator • While a confidence measure classifies each word as correct or incorrect, the error rate estimator performs error-type classification: each word is given one of four labels, (C), (S), (D), or (I), denoting correct recognition, substitution error, deletion error, and insertion error, respectively.
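
With a reference available, these C/S/D/I labels fall out of the standard Levenshtein alignment used for WER bookkeeping; the point of the paper is to *estimate* them without a reference, but the supervised labeling they approximate looks like this sketch:

```python
def align_labels(ref, hyp):
    """Label each aligned position C/S/I, plus D for reference words
    the hypothesis missed, via Levenshtein alignment."""
    m, n = len(ref), len(hyp)
    # DP table of edit costs
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Backtrace to recover the label sequence.
    labels, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and \
           d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            labels.append("C" if ref[i - 1] == hyp[j - 1] else "S")
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            labels.append("D")
            i -= 1
        else:
            labels.append("I")
            j += -1
    return labels[::-1]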
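
With a reference available, these C/S/D/I labels fall out of the standard Levenshtein alignment used for WER bookkeeping; the point of the paper is to *estimate* them without a reference, but the supervised labeling they approximate looks like this sketch:

```python
def align_labels(ref, hyp):
    """Label each aligned position C/S/I, plus D for reference words
    the hypothesis missed, via Levenshtein alignment."""
    m, n = len(ref), len(hyp)
    # DP table of edit costs
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    # Backtrace to recover the label sequence.
    labels, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and \
           d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            labels.append("C" if ref[i - 1] == hyp[j - 1] else "S")
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            labels.append("D")
            i -= 1
        else:
            labels.append("I")
            j -= 1
    return labels[::-1]
```

For example, `align_labels(["a", "b", "c"], ["a", "x", "c", "d"])` yields a correct word, a substitution, a correct word, and an insertion.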

  6. Experiments • The baseline WER of 27.8% was reduced by applying the DLM trained with the error rate estimator under all learning algorithms. With PA1 and MERT, the gains matched those of the supervised approach. With the other algorithms, the error reduction rate was significantly degraded, so the gain was slight.

  7. Experiments • The relationship between training data size and WER is depicted in Fig. 2. The data size is given as the number of utterances. The learning algorithm employed here was R2D2. • The WER decreased as the training data size increased.

  8. Conclusions • This paper proposed an unsupervised DLM training method, which is basically the same as the conventional supervised approach except that it estimates the error rate instead of calculating the true error rate. • Our experiments showed that our method could basically match the performance of the conventional supervised approach and any slight degradation in accuracy can be offset by increasing the unlabeled utterance data used for training.

  9. Unsupervised Topic Adaptation for Morph-Based Speech Recognition. ¹André Mansikkaniemi, ²Mikko Kurimo. ¹Aalto University, School of Science, Department of Information and Computer Science; ²Aalto University, School of Electrical Engineering, Department of Signal Processing and Acoustics

  10. Introduction • Statistical morph-based automatic speech recognition (ASR) enables the recognition of an unlimited number of words. For a morphologically rich language such as Finnish, morph-based ASR has improved recognition accuracy significantly. • A large error source that still remains unsolved for Finnish ASR is foreign entity names (FENs). • Over-segmentation of foreign words (specific to statistical morph-based models). • Examples: mcafee → m + cafe + e, reading → re + a + ding. • This makes pronunciation modeling difficult and unreliable.

  11. Introduction • Topic adaptation in automatic speech recognition (ASR) refers to the adaptation of the language model and vocabulary for improved recognition of in-domain speech data. • In this work we implement unsupervised topic adaptation for morph-based ASR to improve recognition of foreign entity names. • Based on the first-pass ASR hypothesis, similar texts are selected from a collection of articles and used to adapt the background language model. Latent semantic indexing is used to index the adaptation corpus and the ASR output.
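
One common way to fold the selected in-domain texts into the background model is linear interpolation of the two distributions; this is a generic adaptation sketch, not necessarily the paper's exact scheme:

```python
def interpolate(p_background, p_adapt, lam=0.1):
    """P_adapted(w) = lam * P_indomain(w) + (1 - lam) * P_background(w).

    Unigram sketch; real systems interpolate full n-gram models.
    """
    vocab = set(p_background) | set(p_adapt)
    return {w: lam * p_adapt.get(w, 0.0)
               + (1 - lam) * p_background.get(w, 0.0)
            for w in vocab}
```

A small interpolation weight keeps the broad-coverage background model dominant while still boosting topic words that the first-pass hypothesis pulled in.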

  12. Architecture

  13. Unsupervised LM adaptation • Latent semantic indexing (LSI) transforms the document-word matrix into a lower-dimensional space using singular value decomposition. The benefits of LSI are dimensionality reduction and the tying together of words with similar meanings. • The vector components are word weights, indicating the importance of an individual word to a document. Word weights are typically calculated using the term frequency-inverse document frequency (tf-idf) measure.
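
The tf-idf weighting and SVD projection can be sketched end to end with a toy corpus (assumed documents; production systems use sparse solvers and far larger vocabularies):

```python
import math
import numpy as np

# Toy corpus (assumption): two on-topic documents and one off-topic one.
docs = [["speech", "recognition", "model"],
        ["speech", "model", "adaptation"],
        ["stock", "market", "news"]]
vocab = sorted({w for d in docs for w in d})
idx = {w: i for i, w in enumerate(vocab)}

# Build the tf-idf weighted document-word matrix.
A = np.zeros((len(docs), len(vocab)))
for di, d in enumerate(docs):
    for w in d:
        A[di, idx[w]] += 1.0 / len(d)          # term frequency
for w in vocab:
    df = sum(w in d for d in docs)
    A[:, idx[w]] *= math.log(len(docs) / df)   # inverse document frequency

# Rank-k approximation via SVD: A ~ U_k S_k V_k^T.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
doc_vecs = U[:, :k] * S[:k]                    # documents in latent space

def cos(u, v):
    """Cosine similarity between two latent document vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))
```

In the reduced space, the two speech-related documents end up far closer to each other than to the off-topic one, which is the property the article-selection step relies on.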

  14. Foreign word detection • Vocabulary adaptation is performed by first detecting topic-specific foreign word candidates in in-domain texts. Thereafter, adapted pronunciation rules are generated for the words. Finally, foreign words that were over-segmented are restored to their base forms. • The first selection score is based on how foreign the word appears to be. For this we calculate the letter perplexity (ppl) for each FEN candidate using a letter n-gram model trained on Finnish words. • An alternative measure of topic relatedness is the cosine similarity score (sim) between the ASR output and the document in which the word appears.
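
The letter-perplexity score can be sketched with a smoothed letter bigram model; the tiny "Finnish-like" training list is an assumption for illustration, not the paper's training data:

```python
import math
from collections import Counter

def train_bigrams(words):
    """Count letter bigrams (with '#' as a word-boundary marker)."""
    counts, ctx = Counter(), Counter()
    for w in words:
        s = f"#{w}#"
        for a, b in zip(s, s[1:]):
            counts[(a, b)] += 1
            ctx[a] += 1
    return counts, ctx

def letter_perplexity(word, counts, ctx, alpha=1.0, vsize=30):
    """Per-letter perplexity under an add-alpha smoothed bigram model."""
    s = f"#{word}#"
    logp = 0.0
    for a, b in zip(s, s[1:]):
        p = (counts[(a, b)] + alpha) / (ctx[a] + alpha * vsize)
        logp += math.log(p)
    return math.exp(-logp / (len(s) - 1))

# Toy native-word list (assumption).
finnish_like = ["talo", "kala", "pala", "sana", "tila"]
counts, ctx = train_bigrams(finnish_like)
```

A foreign-looking spelling such as "mcafee" gets a much higher letter perplexity than a native-looking word, which is what makes ppl usable as a foreignness score for FEN candidates.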

  15. Experiments • The background language model was trained on the Kielipankki corpus (140 million words). • Language model adaptation data was collected from the Web: articles were retrieved automatically from a set of pre-defined Finnish online news sources. In total, over 44,000 Finnish news articles were retrieved (over 7 million words). • The first experiments were conducted on the 1-hour development set. We tested different parameter values and indexing terms for unsupervised LM adaptation, and different selection scores for vocabulary adaptation. The word error rate (WER) and foreign entity name error rate (FENER) are reported in the results.

  16. Experiments • Indexing terms

  17. Experiments • Vocabulary adaptation

  18. Conclusions • The main aim of this work was to improve the recognition of foreign entity names, which make up a large error source in Finnish ASR. • Vocabulary adaptation efficiently lowers error rates for foreign words, but even in the best case the average WER is only lowered slightly. • The implemented unsupervised topic adaptation framework was successful in significantly lowering error rates for foreign words.
