IRCS/CCN Summer Workshop June 2003 Speech Recognition


Presentation Transcript


  1. IRCS/CCN Summer Workshop, June 2003: Speech Recognition

  2. Why is perception hard?
  • Task: available signals → model of the world around
    • signals are mostly accidental and inadequate
    • sometimes disguised or falsified
    • always mixed up and ambiguous
  • Reasoning about the source of signals:
    • integration of context: what do you expect?
    • “sensor fusion”: integration of vision, sound, smell, etc.
    • source (and noise) separation: there’s more than one thing out there
    • variable perspective, source variation, etc.
      • depends on the type of signal
      • depends on the type of object
  • Much harder than chess or calculus!

  3. Bayesian probability estimation
  • Thomas Bayes (1702-1761)
    • Minister of the Presbyterian Chapel at Tunbridge Wells
    • amateur mathematician
    • “An Essay towards solving a Problem in the Doctrine of Chances”, published (posthumously) in 1764
  • Crucial idea: background (prior) knowledge about the plausibility of different theories can be combined with knowledge about the relation of theories to evidence
    • in a mathematically well-defined way
    • even if all knowledge is uncertain
    • to reason about the most likely explanation of the available evidence
  • Bayes’ theorem:
    • “the most important equation in the history of mathematics” (?)
    • a simple consequence of basic definitions, or
    • a still-controversial recipe for the probability of alternative causes for a given event, or
    • the implicit foundation of human reasoning, or
    • a general framework for solving the problems of perception
  • Tutorial on Bayes’ Theorem
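
  For reference, here is the theorem itself in modern notation; this statement is standard and is added here rather than taken from the slide:

```latex
% Bayes' theorem: posterior plausibility of hypothesis H given evidence E,
% with the evidence term expanded as a sum over alternative hypotheses H'.
\[
P(H \mid E) \;=\; \frac{P(E \mid H)\,P(H)}{P(E)}
           \;=\; \frac{P(E \mid H)\,P(H)}{\sum_{H'} P(E \mid H')\,P(H')}
\]
```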

  4. Fundamental theorem of speech recognition
  P(W|S) ∝ P(S|W) P(W)
  where W is “Word(s)” (i.e. the message text) and S is “Sound(s)” (i.e. the speech signal)
  • “Noisy channel model” of communications engineering, due to Shannon 1949
  • New algorithms, especially relevant to speech recognition, due to L.E. Baum et al., ~1965-1970
  • Applied to speech recognition by Jim Baker (CMU PhD 1975) and Fred Jelinek (IBM speech group, from 1975 onward)
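
  A one-step derivation of the slide’s proportionality, added for completeness (standard, not on the original slide): by Bayes’ rule the denominator P(S) does not depend on W, so it can be dropped when searching for the best word sequence:

```latex
\[
\hat{W} \;=\; \arg\max_{W} P(W \mid S)
        \;=\; \arg\max_{W} \frac{P(S \mid W)\,P(W)}{P(S)}
        \;=\; \arg\max_{W} P(S \mid W)\,P(W)
\]
```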

  5. Motivations for a Bayesian approach
  • a consistent framework for integrating previous experience and current evidence
  • a quantitative model for “abduction” = reasoning about the best explanation
  • a general method for turning a generative model into an analytic one = “analysis by synthesis”, helpful where |categories| << |signals|
  These motivations apply both in engineering practice and in the evolution of biological systems

  6. Basic architecture of standard speech recognition technology
  1. Bayes’ Rule: P(W|S) ∝ P(S|W) P(W)
  2. Approximate P(S|W) P(W) as a Hidden Markov Model: a probabilistic function [to get P(S|W)] of a Markov chain [to get P(W)]
  3. Use the Baum-Welch (= EM) algorithm to “learn” the HMM parameters
  4. Use Viterbi decoding to find the most probable W given S under the estimated HMM
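
  To make step 2 concrete, here is a minimal illustrative sketch (not from the slides; the toy numbers are assumptions) of an HMM as a probabilistic function of a Markov chain: the chain generates a hidden state sequence, and each state emits an observation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy HMM: 2 hidden states, 3 discrete observation symbols.
pi = np.array([0.6, 0.4])            # initial state probabilities
A  = np.array([[0.7, 0.3],           # A[i, j] = P(next state j | state i)
               [0.4, 0.6]])
B  = np.array([[0.5, 0.4, 0.1],      # B[i, k] = P(emit symbol k | state i)
               [0.1, 0.3, 0.6]])

def sample(T):
    """Generate T observations: a Markov chain over states,
    plus a probabilistic emission from each visited state."""
    states, obs = [], []
    s = rng.choice(2, p=pi)
    for _ in range(T):
        states.append(int(s))
        obs.append(int(rng.choice(3, p=B[s])))
        s = rng.choice(2, p=A[s])
    return states, obs

print(sample(10))
```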

  7. HMM parameter estimation given labelled/aligned training data...
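
  When the training data are labelled and aligned, parameter estimation reduces to relative-frequency counting. A minimal sketch along those lines; the toy state/symbol pairs below are hypothetical, not workshop data:

```python
import numpy as np

# Toy aligned training data: each frame is (hidden state, observed symbol).
# 2 states, 3 symbols; the pairs are made up for illustration.
data = [(0, 0), (0, 1), (1, 2), (1, 2), (0, 0), (1, 1), (1, 2), (0, 0)]

n_states, n_symbols = 2, 3
A_counts = np.zeros((n_states, n_states))   # state -> next-state transitions
B_counts = np.zeros((n_states, n_symbols))  # state -> emitted symbol

for (s, o), (s_next, _) in zip(data, data[1:]):
    A_counts[s, s_next] += 1
    B_counts[s, o] += 1
B_counts[data[-1][0], data[-1][1]] += 1      # emission of the final frame

# Normalize counts into conditional probabilities (real systems would
# also smooth the counts; omitted here for clarity).
A = A_counts / A_counts.sum(axis=1, keepdims=True)
B = B_counts / B_counts.sum(axis=1, keepdims=True)
print(A, B, sep="\n")
```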

  8. Viterbi decoding given HMM & observed signal...
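
  A compact sketch of Viterbi decoding for a discrete-emission HMM, computed in log space to avoid underflow; the toy parameters are assumptions for illustration:

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most probable hidden-state path for a discrete-emission HMM."""
    T, N = len(obs), len(pi)
    logA, logB = np.log(A), np.log(B)
    delta = np.log(pi) + logB[:, obs[0]]   # best log-prob ending in each state
    back = np.zeros((T, N), dtype=int)     # backpointers
    for t in range(1, T):
        scores = delta[:, None] + logA     # scores[i, j]: come from i, go to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    # Trace the best path backwards from the best final state.
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
print(viterbi([0, 1, 2, 2, 0], pi, A, B))
```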

  9. Sketch of Baum-Welch (EM) algorithm for estimating HMM parameters given unaligned (or even unlabelled) training data
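
  The core of Baum-Welch is the forward-backward recursion: compute posterior state occupancies and transition posteriors under the current model (E step), then re-estimate the parameters from those expected counts (M step). A one-iteration sketch with assumed toy values (unscaled alpha/beta, so suitable only for short sequences):

```python
import numpy as np

def forward_backward(obs, pi, A, B):
    """E step: state occupancies gamma[t, i] and transition
    posteriors xi[t, i, j] for one observation sequence."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N)); beta = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]
    for t in range(1, T):                       # forward pass
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):              # backward pass
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    evidence = alpha[-1].sum()                  # P(obs | model)
    gamma = alpha * beta / evidence
    # xi[t, i, j] = alpha[t, i] * A[i, j] * B[j, obs[t+1]] * beta[t+1, j] / evidence
    xi = (alpha[:-1, :, None] * A[None] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / evidence
    return gamma, xi

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
obs = [0, 1, 2, 2, 0, 1]

# M step: re-estimate parameters from the expected counts.
gamma, xi = forward_backward(obs, pi, A, B)
pi_new = gamma[0]
A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
B_new = np.zeros_like(B)
for k in range(B.shape[1]):
    B_new[:, k] = gamma[np.array(obs) == k].sum(axis=0)
B_new /= gamma.sum(axis=0)[:, None]
print(pi_new, A_new, B_new, sep="\n")
```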

  10. Other typical details: complex elaborations of the basic ideas
  • HMM states ← triphones ← words
    • each triphone → 3-5 states + a connection pattern
    • phone sequence from a pronouncing dictionary
    • clustering for estimation
  • Acoustic features
    • RASTA-PLP etc.
    • vocal tract length normalization, speaker clustering
  • Output pdf for each state as a mixture of Gaussians
  • Language model as an N-gram model over words
    • recency/topic effects
  • Empirical weighting of language vs. acoustic models
  • etc., etc.
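
  As a taste of the language-model item above, a minimal bigram (N = 2) model with add-one smoothing; the toy corpus and the smoothing choice are illustrative assumptions:

```python
from collections import Counter

# Toy corpus; real systems train on far more text. <s>/</s> mark boundaries.
corpus = [["<s>", "speech", "recognition", "is", "hard", "</s>"],
          ["<s>", "speech", "perception", "is", "hard", "</s>"]]

bigrams = Counter((a, b) for sent in corpus for a, b in zip(sent, sent[1:]))
contexts = Counter(a for sent in corpus for a in sent[:-1])
V = len({w for sent in corpus for w in sent})   # vocabulary size

def p_bigram(b, a):
    """P(b | a) with add-one smoothing over the vocabulary."""
    return (bigrams[(a, b)] + 1) / (contexts[a] + V)

def p_sequence(words):
    """P(W) for a word sequence as a product of bigram probabilities."""
    p = 1.0
    for a, b in zip(words, words[1:]):
        p *= p_bigram(b, a)
    return p

print(p_sequence(["<s>", "speech", "recognition", "is", "hard", "</s>"]))
```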

  11. Some limitations of the standard architecture
  • Problems with Markovian assumptions
    • modeling trajectory effects
    • variable coordination of articulatory dimensions
    • ...
