SPEECH APIs

  1. SPEECH APIs
     Provide access to a vendor’s:
     • speech synthesis
     • speech recognition
       • command-and-control speech recognition: the programmer defines a restricted grammar/vocabulary
       • dictation speech recognition: the programmer uses the general (statistical) built-in grammar of the recogniser (optimised for a topic/domain)

  2. Available Speech APIs: 1. SAPI
     • SAPI by Microsoft + vendors (IBM etc.)
     • cross-vendor API
     • Platform: Windows 95/98 or Windows NT 4.0 (or later)
     • Microsoft Visual C++ 4.0 or later
     • NB: Try it out with MS Whisper (free!)

  3. Available Speech APIs: 2. JSAPI
     • JSAPI by Sun Microsystems + vendors (Apple Computer, Inc., AT&T, Dragon Systems, IBM, Novell, Inc., Philips, Texas Instruments Incorporated)
     • cross-vendor API
     • cross-platform API
     • programming via Java
     • NB: Try it out with ViaVoice for Linux (free!)

  4. Available Speech APIs: 3. VOCAPI
     • VOCAPI by Philips, Bosch, Siemens, Opel, Sony, Volkswagen ...
     • cross-vendor, small-sized API
     • cross-platform API intended for PDAs, hands-free operation in cars etc.
     • programming via C

  5. JSAPI & JSGF
     • JSGF: Java Speech Grammar Format
     • central for controlling speech recognition in JSAPI
     • platform-independent, vendor-independent
     • language-independent (… largely!)
     • corresponds to/enhances the “CFG format” defined in SAPI
     • enhancements: Java-style notations (see below)

  6. JSGF Programming Issues
     • Loading, creating and deleting grammars in a speech recognizer, activating grammars for recognition etc.
     • Loading grammars via URLs on a web site.
     • Mechanisms for receiving the results of recognition for a grammar and processing those results.
     • Vocabulary management, including handling of token pronunciations.

  7. JSGF for speech recognition
     • To be used for “rule grammars” (or “command and control grammars” or “regular grammars”)
       • non-statistical*, small vocabulary, low-perplexity, domain/application dependent, for spoken dialogues
     • Not to be used for “dictation grammars”
       • statistical, large vocabulary, high-perplexity, domain/application independent (may be optimised for a “topic”), for dictation, presupposes adaptation
     * (However: see the slides about weights, 11 and 12)

  8. JSGF notation 1 (4.1 ff.)
     • BNF-equivalent, traditional style:
       • Non-terminals (“rule names”) enclosed in <>
       • Terminals (“tokens”, “words”) in Unicode characters
       • Operators for ‘or’, ‘iteration’, ‘optional’ etc.
     E.g.
         <firstname> = John | Peter | Mary+;
         <firstnames> = (John | Peter | Mary)+;
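
The two example rules differ only in operator precedence: the unary `+` binds more tightly than `|`, so in `<firstname>` only `Mary` may repeat, while in `<firstnames>` the whole group may. A minimal sketch of that reading, assuming a hand translation of the two rules into Python regular expressions (the names `FIRSTNAME`, `FIRSTNAMES` and `accepts` are made up for illustration; a JSAPI recogniser does not process grammars this way):

```python
import re

# Hand-translated equivalents of the two JSGF rules (illustrative only).
# <firstname>  = John | Peter | Mary+;     '+' applies to Mary alone
# <firstnames> = (John | Peter | Mary)+;   '+' applies to the whole group
FIRSTNAME = re.compile(r"John|Peter|Mary(?: Mary)*")
FIRSTNAMES = re.compile(r"(?:John|Peter|Mary)(?: (?:John|Peter|Mary))*")


def accepts(pattern, utterance):
    """Return True if the pattern covers the whole utterance."""
    return pattern.fullmatch(utterance) is not None


print(accepts(FIRSTNAME, "Mary Mary"))         # only Mary may repeat
print(accepts(FIRSTNAME, "John Peter"))        # rejected by <firstname>
print(accepts(FIRSTNAMES, "John Peter Mary"))  # any mix of names may repeat
```

Words are separated by single spaces here, which stands in for the token sequence a recogniser would deliver.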

  9. JSGF notation 2 (3.1 ff. + 4.9)
     • Java-adapted style:
       • JSGF header: grammar name/import
           grammar dk.mydomain.emailapplication.mailBrowser;
           import <dk.mydomain.ReusableGrammars.date>;
         or
           import <dk.mydomain.ReusableGrammars.Danish.*>;
       • documentation comments, pp. 9 and 22 (4.9): /** ... */

  10. JSGF notation 3 (4.1 ff.)
      • Java-adapted style (cont.):
        • public rules vs. non-public (“private”) rules
        • the rule name of a public rule is (one of) the start symbol(s) of the grammar and can be activated:
            public <s> = <np> <vp>;
            <np> = <det> <n>;
            <n> = man | woman | bird;
        • public rules can be imported into other grammars

  11. JSGF Weights (4.2.3)
      • Weights enable the representation of probabilistic grammars (e.g. bigrams, trigrams) in JSGF
          <size> = /10/ small | /2/ medium | /1/ large;
        is equivalent to the probabilities 10/13 (small), 2/13 (medium) and 1/13 (large), since the weights sum to 13.
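
The step from weights to probabilities is a plain normalisation: each weight is divided by the sum of the weights in the same set of alternatives. A small sketch of that arithmetic, assuming nothing beyond the `<size>` example (the function name `normalise` is illustrative, and how a given recogniser actually uses the weights is left open here):

```python
from fractions import Fraction


def normalise(weights):
    """Turn relative weights into probabilities that sum to 1."""
    total = sum(weights.values())
    return {token: Fraction(w, total) for token, w in weights.items()}


# Weights taken from the <size> rule.
size_weights = {"small": 10, "medium": 2, "large": 1}
probabilities = normalise(size_weights)

for token, p in probabilities.items():
    print(f"{token}: {p} (about {float(p):.2f})")
```

Only the ratios matter: weights of 20, 4 and 2 would give the same probabilities.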

  12. JSGF Weights (4.2.3)
      • Example: A bigram implemented in JSGF
        • One rule per word (including a pseudo-word BOS, “beginning of sentence”)
        • A rule expansion defines the successors of the word associated with the rule, e.g.
            <successors_of_a> = /5/ man <successors_of_man> | /4/ woman <successors_of_woman> | …etc;
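
The one-rule-per-word scheme can be generated mechanically from bigram counts. The following is a rough sketch under stated assumptions: the counts are invented for illustration, the helper name `bigram_to_jsgf` is hypothetical, and a word with no recorded successors is simply treated as ending the sentence.

```python
def bigram_to_jsgf(counts):
    """Emit one JSGF rule per word; counts[w][v] is how often v followed w."""
    rules = []
    for word, successors in counts.items():
        alternatives = []
        for nxt, count in successors.items():
            # A successor that has its own rule continues the sentence;
            # a successor without one ends it.
            tail = f" <successors_of_{nxt}>" if nxt in counts else ""
            alternatives.append(f"/{count}/ {nxt}{tail}")
        rules.append(f"<successors_of_{word}> = " + " | ".join(alternatives) + ";")
    return "\n".join(rules)


# Made-up counts, for illustration only.
counts = {
    "BOS": {"a": 9},
    "a": {"man": 5, "woman": 4},
    "man": {"sleeps": 3},
    "woman": {"sleeps": 2},
}
print(bigram_to_jsgf(counts))
```

Every reference to another rule sits at the end of an alternative, so the generated grammar is right-recursive only.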

  13. JSGF Tags (4.5)
      • Enable primitive “parsing” along with recognition:
        • handling synonymy:
            <country> = Australia {Oz} | (United States) {USA} | America {USA} | (U S of A) {USA};
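
The practical effect of the tags is that application code branches on two values rather than on four phrasings. A minimal sketch of that idea, assuming a plain dictionary in place of the recogniser (the names `COUNTRY_TAGS`, `country_tag` and `destination` are hypothetical and not part of JSAPI):

```python
# Phrase -> tag, mirroring the <country> rule (illustrative lookup only).
COUNTRY_TAGS = {
    "Australia": "Oz",
    "United States": "USA",
    "America": "USA",
    "U S of A": "USA",
}

# What the application does with each tag (made-up behaviour).
DESTINATIONS = {"Oz": "Australia", "USA": "United States of America"}


def country_tag(phrase):
    """Return the tag attached to a recognised phrase, or None."""
    return COUNTRY_TAGS.get(phrase)


def destination(phrase):
    """Resolve a spoken phrase to a canonical destination via its tag."""
    return DESTINATIONS.get(country_tag(phrase), "unknown")


print(destination("America"))
print(destination("U S of A"))
```

Adding a further synonym then means one more alternative in the grammar and no change to the code that handles the tag.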

  14. JSGF Tags (4.5.1)
      • separating language-specific issues (the actual phrase) from “universal meanings” (“hi”):
          <greeting> = (howdy | good morning) {hi};
          <greeting> = (ohayo | ohayogozaimasu) {hi};
          <greeting> = (guten tag) {hi};
          <greeting> = (bon jour) {hi};

  15. JSGF Recursions (4.7)
      • Left recursion: not allowed
        • (could be rewritten as iteration in a regular grammar)
      • Embedded recursion: not allowed
        • for Chomsky a very serious restriction!
      • Right recursion: allowed
        • (can be rewritten as iteration in a regular grammar)
      • Likely explanation: Speech recognition presupposes regular (finite state) grammars
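
The claim that right recursion can be rewritten as iteration can be checked on a toy rule. The sketch below assumes an invented rule `<list> = item | item and <list>;` and its iterative counterpart `<list> = item (and item)*;` and compares a recursive recogniser with a regular expression (all names are illustrative, and each token is assumed to be a single word):

```python
import re


def accepts_recursive(tokens):
    """<list> = item | item and <list>;  (right recursion)"""
    if tokens == ["item"]:
        return True
    return tokens[:2] == ["item", "and"] and accepts_recursive(tokens[2:])


# <list> = item (and item)*;  (same language, written with iteration)
ITERATIVE = re.compile(r"item(?: and item)*")


def accepts_iterative(tokens):
    """Check the token sequence against the iterative form."""
    return ITERATIVE.fullmatch(" ".join(tokens)) is not None


samples = [
    ["item"],
    ["item", "and", "item"],
    ["item", "and"],
    ["and", "item"],
    [],
]
for tokens in samples:
    print(tokens, accepts_recursive(tokens), accepts_iterative(tokens))
```

Both recognisers agree on every sample, which is what one would expect if the two forms describe the same regular language.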

  16. JSAPI Recognition Results
      Interface javax.speech.recognition.FinalRuleResult
      Interface javax.speech.recognition.Result
      • 1-best list/n-best list, for each item in the list:
        • list of tokens (“words”)
        • list of tags
        • name of grammar accepting input
        • name of public rule accepting input

  17. Hello World (IBM's JSAPI) [cf. grammar on next slide]
      U: My name is Bruce Adams (rule: nameis, tags: Bruce Adams)
      S: Hello Bruce Adams
      U: Repeat after me (rule: begin, tags: begin)
      S: I am listening (activates dictation grammar + stop rule)
      U/S: [S repeats/synthesises sentences dictated by U]
      U: That’s all (rule: stop, tags: stop)
      S: OK (deactivates dictation grammar)
      U: Bye (rule: bye, tags: bye)

  18. Hello World (IBM's JSAPI)
          <first> = Bruce {Bruce} | Andrew {Andrew} | Stuart {Stuart};
          <last> = Lucas {Lucas} | Hunt {Hunt} | Adams {Adams};
          <name> = <first> <last>;
          public <nameis> = My name is {name} <name>;
          public <begin> = Repeat after me {begin};
          public <stop> = That's all {stop};
          public <bye> = Good bye {bye} | So long {bye};
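
To see what an application gets back from this grammar, it can help to trace it by hand. The sketch below is a toy matcher, not IBM's sample code and not JSAPI: `recognise` is a hypothetical function that returns the name of the matching public rule and the tags in the order the grammar attaches them. Going by the grammar as written, `<nameis>` carries the `{name}` tag in addition to the two name tags.

```python
FIRST = {"Bruce", "Andrew", "Stuart"}
LAST = {"Lucas", "Hunt", "Adams"}

# Public rules made of fixed phrases: phrase -> (rule name, tags).
FIXED = {
    "Repeat after me": ("begin", ["begin"]),
    "That's all": ("stop", ["stop"]),
    "Good bye": ("bye", ["bye"]),
    "So long": ("bye", ["bye"]),
}


def recognise(utterance):
    """Return (rule name, tags) for an utterance, or None if no rule matches."""
    words = utterance.split()
    # public <nameis> = My name is {name} <name>;  <name> = <first> <last>;
    if (len(words) == 5 and words[:3] == ["My", "name", "is"]
            and words[3] in FIRST and words[4] in LAST):
        return ("nameis", ["name", words[3], words[4]])
    return FIXED.get(utterance)


print(recognise("My name is Bruce Adams"))
print(recognise("So long"))
print(recognise("Bye"))
```

In this toy version a bare "Bye" matches nothing, because the `<bye>` rule as given only lists "Good bye" and "So long".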

  19. Exercise
      • Try to “review” JSGF
        • weak points/strong points
        • to what extent can it be used for “parsing” (retrieving useful semantics)
          • resolving lexical ambiguity
          • resolving structural ambiguity
