
Speech Processing


Presentation Transcript


  1. Speech Processing Speaker Recognition

  2. Speaker Recognition • Definitions: • Speaker Identification: • Given a set of models obtained for a number of known speakers, decide which voice model best characterizes a given speaker. • Speaker Verification: • Decide whether a speaker corresponds to a particular known voice or to some other unknown voice. • Claimant – an individual who is correctly posing as one of the known speakers • Impostor – an unknown speaker who is posing as a known speaker. • Error types: False Acceptance and False Rejection Veton Këpuska

  3. Speaker Recognition • Steps in Speaker Recognition: • Model Building • For each target speaker (claimant) • A number of background (impostor) speakers • Speaker-Dependent Features • Oral and nasal tract length and cross-section during different sounds • Vocal fold mass and shape • Location and size of the false vocal folds • Ideally these would be measured accurately from the speech waveform. • Training Data + Model Building Procedure ⇒ Generate Models Veton Këpuska

  4. Speaker Recognition • In practice it is difficult to derive speech-anatomy features from the speech waveform. • Instead, use conventional methods to extract features: • Constant-Q filter bank • Spectral-based features Veton Këpuska

  5. Speaker Recognition System • Block diagram: Training – speech data from target and background speakers (Linda, Kay, Joe) → Feature Extraction → Target & Background Speaker Models; Testing – speech data from a claimant (Tom) → Feature Extraction → Recognition → Decision: Tom / Not Tom Veton Këpuska

  6. Spectral Features for Speaker Recognition • Attributes of Human Voice: • High-level – difficult to extract from the speech waveform: • Clarity • Roughness • Magnitude • Animation • Prosody – pitch intonation, articulation rate, and dialect • Low-level – easy to extract from the speech waveform: • Vocal tract spectrum • Instantaneous pitch • Glottal flow excitation • Source event onset times • Modulations in formant trajectories Veton Këpuska

  7. Spectral Features for Speaker Recognition • Want the feature set to reflect the unique characteristics of a speaker. • The short-time Fourier transform (STFT): • STFT Magnitude: • Vocal tract resonances • Vocal tract anti-resonances – important for speaker identifiability. • General trend of the envelope of the STFT Magnitude is influenced by the coarse component of the glottal flow derivative. • Fine structure of STFT characterized by speaker-dependent features: • Pitch • Glottal-flow • Distributed Acoustic Effects Veton Këpuska

  8. Spectral Features for Speaker Recognition • Speaker recognition systems use a smooth representation of the STFT magnitude: • Vocal tract resonances • Spectral tilt • Auditory-based features are superior to conventional features such as: • All-pole LPC spectrum • Homomorphic-filtered spectrum • Homomorphic prediction, etc. Veton Këpuska

  9. Mel-Cepstrum • Davis & Mermelstein: Veton Këpuska
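
The computation this slide illustrates, in its standard Davis & Mermelstein form (mel warping followed by a cosine transform of log filter-bank energies; Ek denotes the output energy of the k-th mel-spaced filter and L the number of coefficients kept):

mel(f) = 2595 · log10(1 + f / 700)
c[n] = Σ_{k=1..K} log(Ek) · cos( n · (k − ½) · π / K ),  n = 0, 1, …, L − 1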

  10. Short-Time Fourier Analysis (Time-Dependent Fourier Transform) Veton Këpuska
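
A minimal statement of the time-dependent Fourier transform this slide refers to (standard definition, with analysis window w[n]):

X(n, ω) = Σ_{m=−∞..∞} x[m] · w[n − m] · e^{−jωm}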

  11. Rectangular Window Veton Këpuska

  12. Hamming Window Veton Këpuska
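
For reference, the standard N-point Hamming window:

w[n] = 0.54 − 0.46 · cos( 2πn / (N − 1) ) for 0 ≤ n ≤ N − 1, and w[n] = 0 otherwise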

  13. Comparison of Windows Veton Këpuska

  14. Comparison of Windows (cont’d) Veton Këpuska

  15. A Wideband Spectrogram Veton Këpuska

  16. A Narrowband Spectrogram Veton Këpuska

  17. Discrete Fourier Transform • In general, the number of input points, N, and the number of frequency samples, M, need not be the same. • If M>N , we must zero-pad the signal • If M<N , we must time-alias the signal Veton Këpuska
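
A small numpy sketch of both cases (illustrative only; the signal and lengths are arbitrary):

import numpy as np

x = np.arange(8.0)                         # N = 8 input points
N = len(x)

# M > N: zero-pad the signal to M points (np.fft.fft pads with zeros when n > len(x))
M = 16
X_padded = np.fft.fft(x, n=M)

# M < N: time-alias (fold) the signal onto M points, then take an M-point DFT
M = 4
x_folded = x.reshape(-1, M).sum(axis=0)    # assumes N is a multiple of M
X_aliased = np.fft.fft(x_folded)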

  18. Examples of Various Spectral Representations Veton Këpuska

  19. Cepstral Analysis of Speech • The speech signal is often assumed to be the output of an LTI system; i.e., it is the convolution of the input and the impulse response. • If we are interested in characterizing the signal in terms of the parameters of such a model, we must go through the process of de-convolution. • Cepstral analysis is a common procedure used for such de-convolution. Veton Këpuska

  20. Cepstral Analysis • Cepstral analysis for convolution is based on the observation that: x[n] = x1[n] * x2[n] ⇒ X(z) = X1(z)X2(z) • Taking the complex logarithm of X(z): log{X(z)} = log{X1(z)} + log{X2(z)} = X̂1(z) + X̂2(z) • If the complex logarithm is unique, and if X̂(z) = log{X(z)} is a valid z-transform, then x̂[n] = x̂1[n] + x̂2[n]: the two convolved signals will be additive in this new, cepstral domain. • If we restrict ourselves to the unit circle, z = e^{jω}, then: X̂(e^{jω}) = log|X(e^{jω})| + j·arg{X(e^{jω})} • It can be shown that one approach to dealing with the problem of uniqueness is to require that arg{X(e^{jω})} be a continuous, odd, periodic function of ω. Veton Këpuska
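
A short numpy sketch of the real cepstrum implied by this definition (illustrative; the frame and FFT length are placeholders):

import numpy as np

def real_cepstrum(frame, n_fft=512):
    """Inverse DFT of the log magnitude spectrum of one analysis frame."""
    spectrum = np.fft.fft(frame, n=n_fft)
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)   # small floor avoids log(0)
    return np.fft.ifft(log_magnitude).real

# Example: cepstrum of one Hamming-windowed frame of a synthetic signal
frame = np.hamming(400) * np.sin(2 * np.pi * 0.01 * np.arange(400))
c = real_cepstrum(frame)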

  21. Cepstral Analysis (cont’d) • To the extent that X̂(z) = log{X(z)} is valid, x̂[n] = x̂1[n] + x̂2[n]. • It can easily be shown that the real cepstrum c[n] is the even part of x̂[n]. • If x̂[n] is real and causal, then x̂[n] can be recovered from c[n]. This is known as the minimum-phase condition. Veton Këpuska

  22. Mel-Frequency Cepstral Representation (Mermelstein & Davis, 1980) • Some recognition systems use mel-scale cepstral coefficients to mimic auditory processing. (The mel frequency scale is linear up to 1000 Hz and logarithmic thereafter.) This is done by multiplying the magnitude (or log magnitude) of S(e^{jω}) with a set of mel-spaced filter weights, as illustrated below: Veton Këpuska
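
A minimal sketch of the weighting step described above, assuming a precomputed matrix of triangular mel filter weights (the names S and W below are illustrative):

import numpy as np

def mel_filter_energies(magnitude_spectrum, mel_weights):
    # magnitude_spectrum: |S(e^{jω})| sampled at the DFT bins, shape (n_bins,)
    # mel_weights: one row of triangular weights per filter, shape (n_filters, n_bins)
    return mel_weights @ magnitude_spectrum

# Log filter-bank energies, which a cosine transform then turns into mel-cepstral coefficients:
# log_energies = np.log(mel_filter_energies(np.abs(S), W) + 1e-12)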

  23. References • Tohkura, Y., “A Weighted Cepstral Distance Measure for Speech Recognition,” IEEE Trans. ASSP, Vol. ASSP-35, No. 10, 1414-1422, 1987. • Mermelstein, P. and Davis, S., “Comparison of Parametric Representations for Monosyllabic Word Recognition in Continuously Spoken Sentences,” IEEE Trans. ASSP, Vol. ASSP-28, No. 4, 357-366, 1980. • Meng, H., The Use of Distinctive Features for Automatic Speech Recognition, SM Thesis, MIT EECS, 1991. • Leung, H., Chigier, B., and Glass, J., “A Comparative Study of Signal Representation and Classification Techniques for Speech Recognition,” Proc. ICASSP, Vol. II, 680-683, 1993. Veton Këpuska

  24. Pattern Classification • Block diagram: Observations → Feature Extraction → Feature Vectors x → Classifier • Goal: To classify objects (or patterns) into categories (or classes) • Types of Problems: • Supervised: Classes are known beforehand, and data samples of each class are available • Unsupervised: Classes (and/or number of classes) are not known beforehand, and must be inferred from data Veton Këpuska

  25. Probability Basics • Discrete probability mass function (PMF): P(ωi) • Continuous probability density function (PDF): p(x) • Expected value: E(x) Veton Këpuska
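
The standard relations behind these bullets:

PMF: Σi P(ωi) = 1;  PDF: ∫ p(x) dx = 1
Expected value: E(x) = Σi xi P(xi) (discrete) or E(x) = ∫ x p(x) dx (continuous)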

  26. Kullback-Leibler Distance • Can be used to compute a distance between two probability mass distributions, P(zi) and Q(zi) • Makes use of the inequality log x ≤ x − 1 • Known as relative entropy in information theory • The divergence of P(zi) and Q(zi) is the symmetric sum given below Veton Këpuska
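
In standard form, the quantities this slide refers to are:

D(P‖Q) = Σi P(zi) · log( P(zi) / Q(zi) ) ≥ 0
Divergence (symmetric sum): D(P, Q) = D(P‖Q) + D(Q‖P)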

  27. Bayes Theorem • Define: Veton Këpuska
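
Presumably the definitions are the standard Bayes setup, in the notation used on the following slides:

ωi, i = 1, …, M: the set of classes;  P(ωi): a priori probability of class ωi
p(x|ωi): class-conditional PDF of observation x;  P(ωi|x): a posteriori probability of class ωi given x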

  28. Bayes Theorem • From Bayes Rule: P(ωi|x) = p(x|ωi)P(ωi) / p(x) • Where: p(x) = Σj p(x|ωj)P(ωj) Veton Këpuska

  29. Bayes Decision Theory • The probability of making an error given x is: P(error|x) = 1 − P(ωi|x) if we decide class ωi • To minimize P(error|x) (and P(error)): Choose ωi if P(ωi|x) > P(ωj|x) ∀j≠i • For a two-class problem this decision rule means: Choose ω1 if P(ω1|x) > P(ω2|x); otherwise choose ω2 • This rule can be expressed as a likelihood ratio: choose ω1 if p(x|ω1)/p(x|ω2) > P(ω2)/P(ω1) Veton Këpuska

  30. Bayes Risk • Define cost function λij and conditional risk R(ωi|x): • λij is the cost of classifying x as ωi when it is really ωj • R(ωi|x) is the risk for classifying x as class ωi • Bayes risk is the minimum risk which can be achieved: • Choose ωi if R(ωi|x) < R(ωj|x) ∀j≠i • Bayes risk corresponds to minimum P(error|x) when • All errors have equal cost (λij = 1, i≠j) • There is no cost for being correct (λii = 0) Veton Këpuska
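
The conditional risk referred to above, in its standard form:

R(ωi|x) = Σj λij · P(ωj|x)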

  31. Discriminant Functions • Alternative formulation of Bayes decision rule • Define a discriminant function, gi(x), for each class ωi: Choose ωi if gi(x) > gj(x) ∀j≠i • Functions yielding identical classification results: gi(x) = P(ωi|x), or p(x|ωi)P(ωi), or log p(x|ωi) + log P(ωi) • Choice of function impacts computation costs • Discriminant functions partition feature space into decision regions, separated by decision boundaries. Veton Këpuska

  32. Density Estimation • Used to estimate the underlying PDF p(x|ωi) • Parametric methods: • Assume a specific functional form for the PDF • Optimize PDF parameters to fit data • Non-parametric methods: • Determine the form of the PDF from the data • Grow parameter set size with the amount of data • Semi-parametric methods: • Use a general class of functional forms for the PDF • Can vary parameter set independently from data • Use unsupervised methods to estimate parameters Veton Këpuska

  33. Parametric Classifiers • Gaussian distributions • Maximum likelihood (ML) parameter estimation • Multivariate Gaussians • Gaussian classifiers Veton Këpuska

  34. Gaussian Distributions • Gaussian PDFs are reasonable when a feature vector can be viewed as a perturbation around a reference • Simple estimation procedures exist for the model parameters • Classification often reduces to simple distance metrics • Gaussian distributions are also called Normal distributions Veton Këpuska

  35. Gaussian Distributions: One Dimension • One-dimensional Gaussian PDFs can be expressed as shown below • The PDF is centered around the mean • The spread of the PDF is determined by the variance Veton Këpuska
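
The standard form referred to above:

p(x) = ( 1 / √(2πσ²) ) · exp( −(x − μ)² / (2σ²) )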

  36. Maximum Likelihood Parameter Estimation • Maximum likelihood parameter estimation determines an estimate θ̂ for parameter θ by maximizing the likelihood L(θ) of observing data X = {x1,...,xn} • Assuming independent, identically distributed data • ML solutions can often be obtained via the derivative, as shown below Veton Këpuska
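
In equation form (standard i.i.d. maximum-likelihood setup):

L(θ) = p(X|θ) = Π_{i=1..n} p(xi|θ)
θ̂ = argmaxθ L(θ), obtained from ∂L(θ)/∂θ = 0 or, equivalently, ∂ log L(θ)/∂θ = 0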

  37. Maximum Likelihood Parameter Estimation • For Gaussian distributions it is easier to work with the log-likelihood log L(θ) Veton Këpuska

  38. Gaussian ML Estimation: One Dimension • The maximum likelihood estimate for μ is given by: Veton Këpuska
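
The standard result (the sample mean):

μ̂ = (1/n) · Σ_{i=1..n} xi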

  39. Gaussian ML Estimation: One Dimension • The maximum likelihood estimate for σ is given by: Veton Këpuska
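
The standard result, stated for the variance σ²:

σ̂² = (1/n) · Σ_{i=1..n} (xi − μ̂)²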

  40. Gaussian ML Estimation: One Dimension Veton Këpuska

  41. ML Estimation: Alternative Distributions Veton Këpuska

  42. ML Estimation: Alternative Distributions Veton Këpuska

  43. Gaussian Distributions: Multiple Dimensions (Multivariate) • A multi-dimensional Gaussian PDF can be expressed as shown below • d is the number of dimensions • x = {x1,…,xd} is the input vector • μ = E(x) = {μ1,...,μd} is the mean vector • Σ = E((x − μ)(x − μ)ᵀ) is the covariance matrix with elements σij, inverse Σ⁻¹, and determinant |Σ| • σij = σji = E((xi − μi)(xj − μj)) = E(xixj) − μiμj Veton Këpuska
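
The standard multivariate form referred to above:

p(x) = 1 / ( (2π)^{d/2} |Σ|^{1/2} ) · exp( −½ (x − μ)ᵀ Σ⁻¹ (x − μ) )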

  44. Gaussian Distributions: Multi-Dimensional Properties • If the i-th and j-th dimensions are statistically or linearly independent, then E(xixj) = E(xi)E(xj) and σij = 0 • If all dimensions are statistically or linearly independent, then σij = 0 ∀i≠j and Σ has non-zero elements only on the diagonal • If the underlying density is Gaussian and Σ is a diagonal matrix, then the dimensions are statistically independent and the joint PDF factors into a product of one-dimensional Gaussians Veton Këpuska

  45. Diagonal Covariance Matrix: Σ = σ²I Veton Këpuska

  46. Diagonal Covariance Matrix: σij = 0 ∀i≠j Veton Këpuska

  47. General Covariance Matrix: σij≠0 Veton Këpuska

  48. Multivariate ML Estimation • The ML estimates for parameters θ = {θ1,...,θl} are determined by maximizing the joint likelihood L(θ) of a set of i.i.d. data X = {x1,...,xn} • To find θ̂ we solve ∇θL(θ) = 0, or ∇θ log L(θ) = 0 • The ML estimates of μ and Σ are given below Veton Këpuska
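
The standard estimates (sample mean and sample covariance):

μ̂ = (1/n) · Σ_{i=1..n} xi
Σ̂ = (1/n) · Σ_{i=1..n} (xi − μ̂)(xi − μ̂)ᵀ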

  49. Multivariate Gaussian Classifier • Requires a mean vector μi and a covariance matrix Σi for each of M classes {ω1, ··· , ωM} • The minimum-error discriminant functions are of the form given below • Classification can be reduced to simple distance metrics for many situations. Veton Këpuska
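
The minimum-error discriminant functions in their standard form (terms common to all classes may be dropped):

gi(x) = −½ (x − μi)ᵀ Σi⁻¹ (x − μi) − ½ log|Σi| + log P(ωi)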

  50. Gaussian Classifier: Σi = σ²I • Each class has the same covariance structure: statistically independent dimensions with variance σ² • The equivalent discriminant functions are given in the sketch below • If each class is equally likely, this is a minimum distance classifier, a form of template matching • The discriminant functions can also be replaced by a linear expression in x, shown below Veton Këpuska
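
A sketch of the two forms referred to above, under the assumption Σi = σ²I:

gi(x) = −‖x − μi‖² / (2σ²) + log P(ωi)
Linear form: gi(x) = wiᵀ x + bi,  where wi = μi / σ² and bi = −μiᵀ μi / (2σ²) + log P(ωi)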
