
Unsupervised Training Using Large Amounts of Arabic Broadcast News Audio Data



  1. Unsupervised Training Using Large Amounts of Arabic Broadcast News Audio Data
     Jeff Ma, Spyros Matsoukas, Richard Schwartz (BBN Technologies)

  2. Outline
     • Brief overview of previous work
     • Efforts to improve data selection
       • An explicit method for hypothesis confidence estimation
       • Use of more features
       • Use of neural networks
       • Alleviation of the over-fitting problem
     • Improvements after MMI training

  3. Unsupervised training framework
     • [Diagram: an iterative loop. Seed data (with transcripts) → model training → models → decoding of untranscribed data → decoded data → data selection → selected data fed back into model training]
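The loop on this slide can be sketched in Python. The `train`, `decode`, and `confidence` callables below are generic placeholders for illustration, not BBN's actual recognizer components:

```python
def unsupervised_training(seed_data, audio_pool, n_passes, threshold,
                          train, decode, confidence):
    """Iterate: train on the selected data, decode the untranscribed
    pool, and keep hypotheses whose confidence clears the threshold.

    seed_data: manually transcribed utterances (always kept in training).
    audio_pool: untranscribed audio to be decoded each pass.
    """
    selected = list(seed_data)  # start from the seed data only
    model = None
    for _ in range(n_passes):
        model = train(selected)                 # model training
        hyps = decode(model, audio_pool)        # decode untranscribed data
        # data selection: seed data plus confident automatic transcripts
        selected = list(seed_data) + [
            h for h in hyps if confidence(h) >= threshold
        ]
    return model
```

Any real system would plug in acoustic-model training and lattice decoding here; the structure (train → decode → select → retrain) is the point.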

  4. Previous work
     • Results
       • Relative gains: 10.8% on 1,858 hours of Arabic BN data; 21.7% on 1,900 hours of English BN data
     • Diagnoses
       • A high percentage of non-news data is the major factor behind the relatively poor performance on Arabic
       • Manual inspection of 6 Arabic episodes revealed that more than half of the data is non-news speech
     • Efforts to improve data selection
       • Methods: incremental training (multiple passes); LM perplexity-based episode removal
       • Additional gains, but still only about half of the gain obtained on the English data

  5. Data selection
     • The data-selection procedure
       • First, estimate a confidence score for each hypothesis
       • Then, select a hypothesis if its confidence is above a threshold (the confidence threshold)
     • The old (implicit) confidence estimation method
       • Hypothesis confidence is a weighted average of its words' confidences
     • A drawback
       • No direct link to hypothesis accuracy: two hypotheses can have the same average word confidence but significantly different accuracies
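The implicit scheme, and its drawback, can be shown in a few lines (the uniform weights are an illustrative simplification; the actual weighting is not specified here):

```python
def implicit_hyp_confidence(word_confs, weights=None):
    """Old implicit score: a weighted average of the per-word
    confidences in one hypothesis (uniform weights by default)."""
    if weights is None:
        weights = [1.0] * len(word_confs)
    total = sum(weights)
    return sum(c * w for c, w in zip(word_confs, weights)) / total

def select_hypotheses(hyps, threshold):
    """Keep hypotheses whose implicit confidence clears the threshold."""
    return [h for h in hyps if implicit_hyp_confidence(h) >= threshold]
```

The drawback is visible immediately: word confidences `[0.9, 0.9, 0.0]` (one word almost certainly wrong) and `[0.6, 0.6, 0.6]` (all words doubtful) both average 0.6, yet the two hypotheses are likely to have very different accuracies.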

  6. An explicit estimation method
     • It is not crucial that selected data contain no errors
       • A hypothesis is considered safe to add to training if its accuracy is above an accuracy threshold
       • So, estimate an explicit confidence that a hypothesis' accuracy is above that threshold
     • The explicit estimation procedure
       • Define a hypothesis as correct (target value = 1) if its accuracy is above the accuracy threshold, and as wrong (target value = 0) otherwise
       • Extract hypothesis-related features
       • Train confidence models on the features and target values
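A minimal sketch of the procedure, with a tiny logistic-regression classifier standing in for the confidence model (the slides do not name the model family; a single average-word-confidence feature is used here for illustration):

```python
import math

def make_targets(accuracies, acc_threshold=0.7):
    """Label each hypothesis: 1 if its accuracy clears the accuracy
    threshold (safe to add to training), else 0."""
    return [1 if a >= acc_threshold else 0 for a in accuracies]

def train_confidence_model(features, targets, lr=0.5, epochs=500):
    """Fit a tiny logistic regression by stochastic gradient descent.
    Any binary classifier could fill this role."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(features, targets):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - t  # gradient of the log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def explicit_confidence(model, x):
    """Estimated probability that the hypothesis' accuracy is above
    the accuracy threshold."""
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

Unlike the implicit score, the output is directly calibrated against a property of interest: the probability that the hypothesis is accurate enough to train on.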

  7. An initial comparison: explicit vs. implicit
     • All experiments used the 1,858-hour Arabic data (1,570 hours remain after our automatic audio segmentation)
     • For "implicit", the confidence threshold had been tuned previously
     • For "explicit", the accuracy threshold was set to 0.7; neither threshold was tuned
     • The new explicit method performs better (by 0.3%)

  8. Addition of new features
     • New features
       • Two lattice features: num_of_nodes_per_word and num_of_arcs_per_word
       • LM perplexities (PPL) of episodes
     • Improvements in normalized cross entropy (NCE)
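These features are cheap to compute once the decoder emits lattice statistics. A sketch, assuming a total LM log-probability and word count are available per episode (the function and argument names are ours, not the paper's):

```python
import math

def hypothesis_features(n_nodes, n_arcs, n_words,
                        episode_logprob, episode_words):
    """Slide 8 feature vector: lattice density per word (denser
    lattices signal more decoder uncertainty) plus the episode-level
    LM perplexity computed from the total natural-log probability."""
    ppl = math.exp(-episode_logprob / episode_words)
    return {
        "num_of_nodes_per_word": n_nodes / n_words,
        "num_of_arcs_per_word": n_arcs / n_words,
        "episode_ppl": ppl,
    }
```

The episode PPL is a whole-episode feature shared by every hypothesis in that episode, which is what lets it flag non-news episodes that match the BN language model poorly.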

  9. Addition of new features (cont'd)
     • Improvements in word error rate (WER)
       • WERs were measured on the "h4ad04" test set
       • Thresholds were tuned separately for each scenario
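For reference, WER is the edit distance between hypothesis and reference word sequences, normalized by the reference length. A standard dynamic-programming implementation (evaluation systems such as NIST's sclite compute the same quantity with detailed alignments):

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + insertions + deletions)
    divided by the number of reference words."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
    return d[len(r)][len(h)] / len(r)
```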

  10. Use of neural networks
      • Neural network (NN) training
        • Used the Matlab NN toolbox (resilient back-propagation algorithm)
        • Dev set: h4ad05 (2,059 segments); validation set: h4av05
      • Suffered from over-fitting due to a lack of training data

  11. Efforts to alleviate over-fitting
      • Notes
        • Changed to a new 6-hour dev set, bnat05, set up for the GALE evaluation
        • Changed to a morpheme-based Arabic system, which substantially reduced the out-of-vocabulary (OOV) rate
      • First try: use the top-n (n > 1) hypotheses
        • Reduced over-fitting, but yielded no gain in WER

  12. Efforts to alleviate over-fitting (cont'd)
      • Second try: use more data
        • A larger dev set, "bncat05", built by adding 2 hours of broadcast conversational data to the "bnat05" set
        • Also reduced over-fitting, but produced no gain in WER

  13. Improvements after MMI training
      • Trained MMI-SI and MMI-SAT models with the best unsupervised training setting
      • After adaptation, unsupervised training yields a 7.2% relative gain on MMI models (16.7 → 15.5, the last two rows), slightly smaller than the 8.0% relative gain on ML models (17.6 → 16.2, the first two rows)
      • Most of the gain from unsupervised training remains after MMI training

  14. Conclusion
      • The new explicit data-selection method outperforms the old method, yielding an extra 0.4% WER reduction from unsupervised training
      • The lattice and perplexity features improve confidence estimation and produce a further 0.2-0.3% WER reduction
      • Neural networks brought no significant benefit to confidence estimation
      • Both the use of top-n (n > 1) hypotheses and the use of more data alleviate over-fitting slightly (3-5% NCE improvements), but produce no WER reduction
      • Most of the gain from our unsupervised training remains after MMI training
