
Discriminative Speech Recognition Rescoring with Pre-trained Language Models

This study explores discriminative fine-tuning schemes for pre-trained language models (BERT and GPT2) in the context of speech recognition rescoring. The approach directly optimizes the minimum word error rate (MWER) criterion, using discriminative training to improve second-pass rescoring for ASR. The experimental setup uses LibriSpeech data, the Whisper tiny ASR model, and BERT/GPT2 rescoring models, with results demonstrating the effectiveness of the proposed methods.


Presentation Transcript


  1. Discriminative Speech Recognition Rescoring with Pre-trained Language Models. Prashanth Gurunath Shivakumar, Jari Kolehmainen, Yile Gu, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko. Amazon Alexa AI, USA. ASRU 2023

  2. Introduction • Discriminative training, which directly optimizes the minimum word error rate (MWER) criterion, typically improves rescoring • They propose and explore several discriminative fine-tuning schemes for pre-trained LMs (BERT and GPT2)

  3. Non-Discriminative ASR Rescoring • New score of the $i^{th}$ ASR hypothesis: $s_i = \log P_{LM}(x_i) + \beta \log P_A(x_i \mid a)$, where $\log P_{LM}(x_i)$ is the likelihood score from the rescoring LM and $\log P_A(x_i \mid a)$ is the sequence probability from the 1st-pass acoustic model
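A minimal sketch of this score combination, assuming `lm_log_probs` and `am_log_probs` hold per-hypothesis log-scores and that β is a tunable interpolation weight (the function and variable names are illustrative, not from the paper):

```python
import numpy as np

def rescore(lm_log_probs, am_log_probs, beta):
    """Combined second-pass score: s_i = log P_LM(x_i) + beta * log P_A(x_i | a)."""
    return np.asarray(lm_log_probs) + beta * np.asarray(am_log_probs)

# Illustrative values: pick the hypothesis with the highest combined score.
scores = rescore(lm_log_probs=[-12.3, -11.8, -14.0],
                 am_log_probs=[-30.1, -33.5, -29.7],
                 beta=0.5)
best_hypothesis = int(np.argmax(scores))
```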

  4. Non-Discriminative ASR Rescoring: GPT2 • New score of the $i^{th}$ ASR hypothesis: $s_i = \log P_{LM}(x_i) + \beta \log P_A(x_i \mid a)$ (likelihood score from the rescoring LM plus the sequence probability from the 1st-pass acoustic model) • GPT2: $P_{LM}(x_i) = \prod_{t=1}^{T} P(x_{i,t} \mid x_{i,1}, \dots, x_{i,t-1})$
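One way to compute this GPT2 term with the Hugging Face `transformers` library; a sketch only, since the paper's exact tokenization and implementation details are not given on the slide:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def gpt2_log_likelihood(text: str) -> float:
    """log P_LM(x) = sum_t log P(x_t | x_1, ..., x_{t-1})."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    # With labels=input_ids the model returns the mean next-token NLL
    # over the T-1 predicted positions; undo the mean to get the sum.
    mean_nll = model(ids, labels=ids).loss
    return -mean_nll.item() * (ids.shape[1] - 1)
```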

  5. Non-Discriminative ASR Rescoring: BERT • New score of the $i^{th}$ ASR hypothesis: $s_i = \log P_{LM}(x_i) + \beta \log P_A(x_i \mid a)$ (likelihood score from the rescoring LM plus the sequence probability from the 1st-pass acoustic model) • BERT (pseudo-log-likelihood): $\log P_{LM}(x_i) = \sum_{t=1}^{T} \log P(x_{i,t} \mid x_{i,\setminus t})$
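The pseudo-log-likelihood masks each position in turn and sums the log-probability the model assigns to the original token, so it costs one forward pass per token. A sketch with `transformers` (the checkpoint name is an assumption):

```python
import torch
from transformers import BertForMaskedLM, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

@torch.no_grad()
def bert_pll(text: str) -> float:
    """PLL(x) = sum_t log P(x_t | x_{\\t}), masking one token at a time."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    pll = 0.0
    for t in range(1, ids.shape[0] - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[t] = tokenizer.mask_token_id
        logits = model(masked.unsqueeze(0)).logits[0, t]
        pll += torch.log_softmax(logits, dim=-1)[ids[t]].item()
    return pll
```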

  6. Discriminative Training for ASR • Minimum word error rate (MWER) loss: minimize the ASR's expected word error rate, $\mathcal{L}_{MWER}(x, y^*) = \mathbb{E}\left[\mathcal{E}(x, y^*)\right] = \sum_{x} P(x \mid a)\, \mathcal{E}(x, y^*)$, where $\mathcal{E}(x, y^*)$ is the edit distance between the ASR hypothesis $x$ and the ground-truth transcript $y^*$, and $P(x \mid a)$ is the probability of ASR hypothesis $x$ given speech signal $a$ • Approximated by restricting the sequence probability to the n-best hypotheses: $\mathcal{L}_{MWER}(x, y^*) \approx \sum_{i=1}^{n} \hat{P}(x_i \mid a)\, \mathcal{E}(x_i, y^*)$
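Since renormalizing the hypothesis probabilities over an n-best list amounts to a softmax over their log-scores, the approximated loss reduces to a few lines. A sketch (names and values are illustrative):

```python
import torch

def mwer_loss(log_probs, edit_distances):
    """L_MWER ≈ sum_i P_hat(x_i | a) * E(x_i, y*), with P_hat(x_i | a)
    renormalized over the n-best list via a softmax."""
    p_hat = torch.softmax(log_probs, dim=-1)
    return (p_hat * edit_distances).sum()

# Illustrative: three hypotheses with 1, 0, and 2 word errors.
loss = mwer_loss(torch.tensor([-3.2, -3.5, -4.1]),
                 torch.tensor([1.0, 0.0, 2.0]))
```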

  7. Discriminative ASR Rescoring: GPT2 + MWER • The MWER loss above can be applied to discriminatively train the 2nd-pass rescoring model • Minimize the rescoring model's expected word error rate: $\mathcal{L}_{MWER}(x, y^*) = \sum_{i=1}^{n} \hat{P}(x_i \mid a)\, \mathcal{E}(x_i, y^*) = \sum_{i=1}^{n} \frac{e^{s_i}}{\sum_{j=1}^{n} e^{s_j}}\, \mathcal{E}(x_i, y^*)$, where $\hat{P}(x_i \mid a)$ is the probability of choosing ASR hypothesis $x_i$ after rescoring • New score of the $i^{th}$ ASR hypothesis after rescoring: $s_i = \sum_{t=1}^{T} \log P(x_{i,t} \mid x_{i,1}, \dots, x_{i,t-1}) + \beta \log P_A(x_i \mid a)$
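A sketch of one discriminative update under this objective, assuming the LM scores are produced differentiably (GPT2 token log-probabilities here; the BERT variant on the next slide only swaps in the pseudo-log-likelihood). Whether β is fixed or learned is not stated on the slide, so it is treated as a fixed hyperparameter:

```python
import torch

def mwer_step(lm_scores, am_log_probs, edit_distances, beta, optimizer):
    """One MWER update over an n-best list.

    lm_scores:      (n,) differentiable log P_LM(x_i) from the rescoring LM
    am_log_probs:   (n,) fixed first-pass log P_A(x_i | a)
    edit_distances: (n,) word edit distances E(x_i, y*)
    """
    s = lm_scores + beta * am_log_probs        # combined scores s_i
    p_hat = torch.softmax(s, dim=-1)           # P_hat(x_i | a) after rescoring
    loss = (p_hat * edit_distances).sum()      # expected edit distance
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```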

  8. Discriminative ASR Rescoring: BERT + MWER • The same MWER loss can be applied to discriminatively train the 2nd-pass rescoring model • Minimize the rescoring model's expected word error rate: $\mathcal{L}_{MWER}(x, y^*) = \sum_{i=1}^{n} \hat{P}(x_i \mid a)\, \mathcal{E}(x_i, y^*) = \sum_{i=1}^{n} \frac{e^{s_i}}{\sum_{j=1}^{n} e^{s_j}}\, \mathcal{E}(x_i, y^*)$, where $\hat{P}(x_i \mid a)$ is the probability of choosing ASR hypothesis $x_i$ after rescoring • New score of the $i^{th}$ ASR hypothesis after rescoring: $s_i = \sum_{t=1}^{T} \log P(x_{i,t} \mid x_{i,\setminus t}) + \beta \log P_A(x_i \mid a)$

  9. Discriminative ASR Rescoring: RescoreBERT • $\mathcal{L}_{MWER}(x, y^*) = \sum_{i=1}^{n} \hat{P}(x_i \mid a)\, \mathcal{E}(x_i, y^*) = \sum_{i=1}^{n} \frac{e^{s_i}}{\sum_{j=1}^{n} e^{s_j}}\, \mathcal{E}(x_i, y^*)$ • New score of the $i^{th}$ ASR hypothesis after rescoring: $s_i = \log P_{LM}(x_i) + \beta \log P_A(x_i \mid a)$
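The slide shows only the generic score; in the RescoreBERT paper the LM score comes from a feed-forward head on the [CLS] embedding rather than from a pseudo-log-likelihood. A minimal sketch of such a scoring head (the single linear layer is a simplifying assumption):

```python
import torch.nn as nn
from transformers import BertModel

class RescoreHead(nn.Module):
    """BERT encoder with a regression head on the [CLS] embedding,
    emitting one scalar second-pass score per hypothesis."""
    def __init__(self, name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(name)
        self.ffn = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask=None):
        h_cls = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state[:, 0]
        return self.ffn(h_cls).squeeze(-1)  # (batch,) scores
```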

  10. Discriminative ASR Rescoring: RescoreGPT • $\mathcal{L}_{MWER}(x, y^*) = \sum_{i=1}^{n} \hat{P}(x_i \mid a)\, \mathcal{E}(x_i, y^*) = \sum_{i=1}^{n} \frac{e^{s_i}}{\sum_{j=1}^{n} e^{s_j}}\, \mathcal{E}(x_i, y^*)$ • New score of the $i^{th}$ ASR hypothesis after rescoring: $s_i = \log P_{LM}(x_i) + \beta \log P_A(x_i \mid a)$

  11. Discriminative ASR Rescoring: Attention Pooling • $\log P_{LM}(x_i) = w^T \left( \mathrm{softmax}\!\left( \frac{Q K^T}{\sqrt{d}} \right) V \right)$ • $Q = W_q H$, $K = W_k H$, $V = W_v H$, where $H = [h_1, \dots, h_T]$: hidden output embeddings; $w, W_q, W_k, W_v$: learnable weights
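A sketch of this pooling module. The slide does not specify how the pooled $(T \times d)$ matrix is reduced to a scalar after applying $w$, so averaging over positions here is an assumption:

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """log P_LM(x_i) = w^T ( softmax(Q K^T / sqrt(d)) V ) over hidden states H."""
    def __init__(self, d):
        super().__init__()
        self.Wq = nn.Linear(d, d, bias=False)
        self.Wk = nn.Linear(d, d, bias=False)
        self.Wv = nn.Linear(d, d, bias=False)
        self.w = nn.Linear(d, 1, bias=False)

    def forward(self, H):  # H: (T, d) hidden output embeddings [h_1, ..., h_T]
        Q, K, V = self.Wq(H), self.Wk(H), self.Wv(H)
        attn = torch.softmax(Q @ K.T / H.shape[-1] ** 0.5, dim=-1)  # (T, T)
        pooled = attn @ V                                           # (T, d)
        return self.w(pooled).mean()  # reduce to a scalar score (assumption)
```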

  12. Dataset and Experimental Setup • Data: LibriSpeech (1000 hours) • ASR: Whisper tiny (Transformer, 39M parameters), generating 10-best hypothesis lists • Rescoring models: BERT (110M) and GPT2 (117M)
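For reference, a hypothetical way to produce such a 10-best list with the Hugging Face `openai/whisper-tiny` checkpoint; the paper's actual decoding configuration is not given on the slide:

```python
import torch
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny").eval()

@torch.no_grad()
def ten_best(audio, sampling_rate=16000):
    """Beam search that returns a 10-best list of (text, log-score) pairs."""
    feats = processor(audio, sampling_rate=sampling_rate,
                      return_tensors="pt").input_features
    out = model.generate(feats, num_beams=10, num_return_sequences=10,
                         return_dict_in_generate=True, output_scores=True)
    texts = processor.batch_decode(out.sequences, skip_special_tokens=True)
    return list(zip(texts, out.sequences_scores.tolist()))
```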

  13. Results
