
Speaker Identification and Verification

Speaker Identification and Verification. Dan Burnett, Nuance. 58th IETF. Terminology. Speaker identification -- using utterances from a speaker, determine who the caller is out of a set of known speakers


Presentation Transcript


  1. Speaker Identification and Verification Dan Burnett, Nuance 58th IETF

  2. Terminology
     • Speaker identification -- using utterances from a speaker, determine who the caller is out of a set of known speakers
     • Speaker verification -- using utterances from a speaker, determine whether the caller is who he/she claims to be (requires an identity claim)
     • Training -- using utterances from a speaker to train a unique voiceprint that can later be used to identify/verify a speaker. Applies to both SI and SV.
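The difference between the three terms can be sketched in a few lines. This is only an illustration, not anything from the draft: it assumes an utterance has already been reduced to a small feature vector, uses cosine similarity as a stand-in for a real scoring engine, and the names `train`, `identify`, `verify` and the `0.8` threshold are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train(utterances):
    """Training (toy version): average the utterance vectors into a voiceprint."""
    count = len(utterances)
    return [sum(column) / count for column in zip(*utterances)]


def identify(utterance, voiceprints):
    """Identification: pick the best match out of a set of known speakers."""
    return max(voiceprints, key=lambda name: cosine(utterance, voiceprints[name]))


def verify(utterance, claimed, voiceprints, threshold=0.8):
    """Verification: accept or reject an identity claim (threshold is arbitrary)."""
    return cosine(utterance, voiceprints[claimed]) >= threshold


voiceprints = {
    "alice": train([[1.0, 0.0], [0.8, 0.2]]),
    "bob": train([[0.0, 1.0], [0.2, 0.8]]),
}
caller = [0.9, 0.1]

best_match = identify(caller, voiceprints)         # no identity claim needed
claim_alice = verify(caller, "alice", voiceprints)  # caller claims to be alice
claim_bob = verify(caller, "bob", voiceprints)      # caller claims to be bob
```

Note that `identify` takes no claim and always returns one of the known speakers, while `verify` needs the claimed identity and returns only a yes/no decision.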

  3. draft-burnett-mrcpext-00.txt
     • Created by Nuance and Intervoice
     • Proposes extensions to MRCP v1 (draft-shanmugham-mrcp-04.txt)
     • Based originally on Nuance functionality, modified to be more general
     • Starting point for MRCP v2 functionality discussions
     • Also extensions for speaker-enrolled grammars, hotword recognition, and to the recognition resource

  4. Proposed SI/SV process (simplified, see section 6.7)
     VER-START-SESSION, VER-BUFFERING-START, VER-SET-VOICEPRINT, VER-BUFFERING-CONTROL, VER-FROM-BUFFER*, VER-BUFFERING-STOP, VER-END-SESSION
     VER-DELETE-VOICEPRINT, VER-ROLLBACK, GET-PARAMS, SET-PARAMS, VERIFY, VER-FROM-BUFFER*
     * Requires active buffering and ver/id sessions.
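The footnote on this slide (VER-FROM-BUFFER requires both an active buffering session and an active ver/id session) can be pictured as a small precondition check. The sketch below is one reading of that footnote only: the method names come from the slide, but the class, the `"ok"`/`"rejected"` return strings, and the way each method toggles state are assumptions, not the behaviour defined in the draft.

```python
class SketchVerifierResource:
    """Toy model of the two preconditions for VER-FROM-BUFFER (hypothetical)."""

    def __init__(self):
        self.session_active = False    # set by VER-START-SESSION
        self.buffering_active = False  # set by VER-BUFFERING-START

    def handle(self, method):
        if method == "VER-START-SESSION":
            self.session_active = True
        elif method == "VER-END-SESSION":
            self.session_active = False
        elif method == "VER-BUFFERING-START":
            self.buffering_active = True
        elif method == "VER-BUFFERING-STOP":
            self.buffering_active = False
        elif method == "VER-FROM-BUFFER":
            # Assumed reading of the footnote: both states must be active.
            if not (self.session_active and self.buffering_active):
                return "rejected"
        else:
            raise ValueError(f"method not modelled in this sketch: {method}")
        return "ok"


resource = SketchVerifierResource()
too_early = resource.handle("VER-FROM-BUFFER")      # nothing started yet
resource.handle("VER-START-SESSION")
session_only = resource.handle("VER-FROM-BUFFER")   # session, but no buffering
resource.handle("VER-BUFFERING-START")
both_active = resource.handle("VER-FROM-BUFFER")    # both preconditions hold
```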

  5. Discussion points
     • Why buffering?
     • Registry for return info
     • Anything else before I convert to MRCPv2?

  6. Voice/Text Grammar Enrollment (simplified, see section 5.5)
     • Extension to existing recognition resource
     • Creates speaker-produced grammar entries
     • E.g., voice-enrolled entries for voice dialing
     • Both speech and text can be used to create grammar entries
     START-ENROLLMENT-SESSION, PAUSE/RESUME-ENROLLMENT-SESSION, ENROLLMENT-ROLLBACK, RECOGNIZE/STOP*, ADD/DELETE/MODIFY-PHRASE, END/ABORT-ENROLLMENT-SESSION
     * These methods already exist in the recognizer resource

  7. Hotword (see section 7)
     • New recognition resource
     • Instead of listening for a set time period, listens continuously until it matches a grammar
     • Non-matching speech is ignored and does not affect the state of the recognizer
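The hotword behaviour described on this slide amounts to a loop that keeps consuming speech and only reacts to a grammar match. A minimal sketch follows, assuming utterances arrive as already-transcribed strings and the grammar is just a set of phrases; `hotword_listen` is a hypothetical helper, not a method from the draft.

```python
def hotword_listen(utterances, grammar):
    """Consume utterances until one matches the grammar, then return it.

    Non-matching utterances are skipped without changing any state.
    Returns None only if the input stream ends without a match.
    """
    for text in utterances:
        if text in grammar:
            return text
    return None


stream = iter(["um", "hello there", "operator", "thanks"])
matched = hotword_listen(stream, {"operator", "main menu"})
remaining = list(stream)  # whatever was spoken after the match
```

Because the loop has no timer, it stops only on a match or when the stream ends, which is the contrast with listening for a set time period.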

  8. Other Extensions
     • Record method (sec. 4.4)
       • Allows end-pointed recording of an audio stream
     • Interpret method (sec. 4.5)
       • Behaves as a recognition except that text input is given instead of an audio stream. It returns a standard recognition result minus any audio-specific values.
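The relationship between recognition and the Interpret method can be sketched as two entry points into the same grammar-matching step. Everything named here is an assumption made for illustration: the result fields (`duration_ms`, `waveform_uri`), the dictionary-based grammar, and the `transcribe` callback are hypothetical and do not reflect the result format in the draft.

```python
AUDIO_SPECIFIC = {"duration_ms", "waveform_uri"}  # hypothetical field names


def match_grammar(text, grammar):
    """Shared step: map input text to an interpretation via the grammar."""
    return {"input": text, "interpretation": grammar.get(text)}


def recognize(audio, grammar, transcribe):
    """Audio in: grammar result plus audio-specific values."""
    result = match_grammar(transcribe(audio), grammar)
    result["duration_ms"] = audio["duration_ms"]
    result["waveform_uri"] = audio["uri"]
    return result


def interpret(text, grammar):
    """Text in: the same result, without the audio-specific values."""
    return match_grammar(text, grammar)


grammar = {"call home": {"action": "dial", "target": "home"}}
audio = {"uri": "file:utt1.wav", "duration_ms": 900, "spoken": "call home"}

from_audio = recognize(audio, grammar, transcribe=lambda a: a["spoken"])
from_text = interpret("call home", grammar)
```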
