## Audio Features & Machine Learning


**Audio Features & Machine Learning**
E.M. Bakker, API2010

**Features for Speech Recognition and Audio Indexing**
- Parametric Representations
- Short Time Energy
- Zero Crossing Rates
- Level Crossing Rates
- Short Time Spectral Envelope
- Spectral Analysis
- Filter Design
- Filter Bank Spectral Analysis Model
- Linear Predictive Coding (LPC)

**Methods**
- Vector Quantization
  - A finite code book of spectral shapes
  - The code book codes for 'typical' spectral shapes
  - A method for all spectral representations (e.g. filter banks, LPC, ZCR, etc.)
- Ensemble Interval Histogram (EIH) Model
  - An auditory-based spectral analysis model
  - More robust to noise and reverberation
  - Expected to be an inherently better representation of the relevant spectral information, because it models the mechanics of the human cochlea

**Pattern Recognition**
Block diagram: Speech, Audio, … → Parameter Measurements → Test Pattern / Query Pattern → Pattern Comparison (against Reference Patterns) → Decision Rules → Recognized Speech, Audio, …
A feature-based variant: Speech, Audio, … → Feature Detector 1, …, Feature Detector n → Feature Combiner and Decision Logic → Hypothesis Tester (against Reference Vocabulary Features) → Recognized Speech, Audio, …

**Spectral Analysis Models**
- Pattern Recognition Approach
  - Parameter measurement => pattern
  - Pattern comparison
  - Decision making
- Parameter Measurements
  - Bank of Filters Model
  - Linear Predictive Coding Model

**Band Pass Filter**
Audio signal s(n) → Bandpass filter F(·) → Result F(s(n)).
Note that the bandpass filter can be defined as:
- a convolution with a filter response function in the time domain, or
- a multiplication with a filter response function in the frequency domain.

**Bank of Filters Analysis Model**
- Speech signal: s(n), n = 0, 1, …
  - Digital, with Fs the sampling frequency of s(n)
- Bank of q band pass filters: BPF1, …, BPFq
  - Spanning a frequency range of, e.g., 100–3000 Hz or 100 Hz–16 kHz
- BPFi(s(n)) = xn(e^jωi), where ωi = 2πfi/Fs is
equal to the normalized frequency fi, where i = 1, …, q.
- xn(e^jωi) is the short time spectral representation of s(n) at time n, as seen through BPFi with centre frequency ωi, where i = 1, …, q.
- Note: each BPF independently processes s to produce the spectral representation x.

**Typical Speech Wave Forms**
(figure)

**MFCCs**
Processing chain: Speech, Audio, … → Preemphasis → Windowing → Fast Fourier Transform → Mel-Scale Filter Bank → Log(·) → Discrete Cosine Transform → the first 12 most significant coefficients (the MFCCs).
The MFCCs are calculated using the formula:
Ci = Σ_{k=1..N} Xk · cos( i · (k − 1/2) · π / N ), for i = 1, …, P
where
- Ci is the i-th cepstral coefficient,
- P is the order (12 in our case),
- K is the number of discrete Fourier transform magnitude coefficients,
- Xk is the k-th order log-energy output from the Mel-Scale filter bank,
- N is the number of filters.

**Linear Predictive Coding Model**
(figure)

**Filter Response Functions**
(figure)

**Short Time Fourier Transform**
Xn(e^jω) = Σ_m s(m) · w(n − m) · e^{−jωm}
- s(m): the signal
- w(n − m): a fixed low pass window

**Short Time Fourier Transform** (window-length examples; figures)
- Long Hamming window, 500 samples (= 50 msec): voiced speech
- Short Hamming window, 50 samples (= 5 msec): voiced speech
- Long Hamming window, 500 samples (= 50 msec): unvoiced speech
- Short Hamming window, 50 samples (= 5 msec): unvoiced speech

**Short Time Fourier Transform: Linear Filter Interpretation**
(figure)

**Linear Predictive Coding (LPC) Model**
- Speech signal: s(n), n = 0, 1, …
  - Digital, with Fs the sampling frequency of s(n)
- Spectral analysis on blocks of speech, with an all-pole modeling constraint
  - LPC analysis order p
  - s(n) is blocked into frames [n,m]
- Again consider xn(e^jω), the short time spectral representation of s(n) at time n (where ω = 2πf/Fs is equal to the normalized frequency f).
- Now the spectral representation xn(e^jω) is constrained to be of the form σ/A(e^jω), where A(e^jω) is the p-th order polynomial with z-transform A(z) = 1 + a1·z^−1 + a2·z^−2 + … + ap·z^−p.
- The output of the LPC parametric conversion on block [n,m] is the vector [a1, …, ap].
- It specifies parametrically the spectrum of an all-pole model that best matches the signal spectrum over the period of time in which the frame of speech samples was accumulated (a p-th order polynomial approximation of the signal).

**Vector Quantization**
- Data are represented as feature vectors.
- A VQ training set is used to determine a set of code words that constitute a code book.
- Code words are centroids under a similarity or distance measure d.
- The code words together with d divide the space into Voronoi regions.
- A query vector falls into a Voronoi region and is represented by the respective code word.

**Vector Quantization**
Distance measures d(x, y):
- Euclidean distance
- Taxicab distance
- Hamming distance
- etc.

**Vector Quantization**
Clustering the training vectors:
1. Initialize: choose M arbitrary vectors out of the L vectors of the training set. This is the initial code book.
2. Nearest neighbor search: for each training vector, find the code word in the current code book that is closest, and assign that vector to the corresponding cell.
3. Centroid update: update the code word in each cell to the centroid of the training vectors assigned to that cell.
4. Iteration: repeat steps 2–3 until the average distance falls below a preset threshold.

**Vector Classification**
For an M-vector code book CB with codes CB = {yi | 1 ≤ i ≤ M}, the index m* of the best code book entry for a given vector v is:
m* = arg min_{1 ≤ i ≤ M} d(v, yi)

**VQ for Classification**
A code book CBk = {yki | 1 ≤ i ≤ M} can be used to define a class Ck.
Example audio classification:
- Classes 'crowd', 'car', 'silence', 'scream', 'explosion', etc.
- Determine the class by using the VQ code books CBk for each of the classes.
- VQ is very often used as a baseline method for classification problems.

**Sound, DNA: Sequences!**
- DNA: a helix-shaped molecule whose constituents are two complementary strands of nucleotides.
- Nucleotides (bases):
  - Adenine (A)
  - Cytosine (C)
  - Guanine (G)
  - Thymine (T)
- DNA is usually represented by sequences of these four nucleotides.
- This assumes only one strand is considered; the second strand is always derivable from the first by pairing A's with T's and C's with G's, and vice versa.

**Biological Information: From Genes to Proteins**
Gene (DNA) → [transcription: genomics, molecular biology] → RNA → [translation: structural biology, biophysics] → Protein → [protein folding].

**From Amino Acids to Protein Functions**
DNA / amino acid sequence → 3D structure → protein functions:
CGCCAGCTGGACGGGCACACCATGAGGCTGCTGACCCTCCTGGGCCTTCTG…
TDQAAFDTNIVTLTRFVMEQGRKARGTGEMTQLLNSLCTAVKAISTAVRKAGIAHLYGIAGSTNVTGDQVKKLDVLSNDLVINVLKSSFATCVLVTEEDKNAIIVEPEKRGKYVVCFDPLDGSSNIDCLVSIGTIFGIYRKNSTDEPSEKDALQPGRNLVAAGYALYGSATML
DNA (gene) → pre-RNA (RNA-polymerase) → RNA (spliceosome) → Protein (ribosome).

**Motivation for Markov Models**
- There are many cases in which we would like to represent the statistical regularities of some class of sequences:
  - genes,
  - proteins in a given family,
  - sequences of audio features.
- Markov models are well suited to this type of task.

**A Markov Chain Model**
Transition probabilities (example):
- Pr(xi = a | xi−1 = g) = 0.16
- Pr(xi = c | xi−1 = g) = 0.34
- Pr(xi = g | xi−1 = g) = 0.38
- Pr(xi = t | xi−1 = g) = 0.12

**Definition of Markov Chain Model**
A Markov chain[1] model is defined by:
- a set of states
  - some states emit symbols,
  - other states (e.g., the begin state) are silent;
- a set of transitions with associated probabilities
  - the transitions emanating from a given state define a distribution over the possible next states.

[1] Markov, A. A., "Extension of the law of large numbers to quantities depending on each other,"
Izvestiya of the Physico-Mathematical Society at Kazan University, 2nd series, Vol. 15 (1906), pp. 135–156.

**Markov Chain Models: Properties**
- Given some sequence x of length L, we can ask how probable the sequence is given our model.
- For any probabilistic model of sequences, we can write this probability as
  Pr(x) = Pr(xL | xL−1, …, x1) · Pr(xL−1 | xL−2, …, x1) · … · Pr(x1).
- Key property of a (1st order) Markov chain: the probability of each xi depends only on the value of xi−1, so
  Pr(x) = Pr(x1) · Pr(x2 | x1) · … · Pr(xL | xL−1).

**The Probability of a Sequence for a Markov Chain Model**
Pr(cggt) = Pr(c) · Pr(g|c) · Pr(g|g) · Pr(t|g)

**Example Application: CpG Islands**
- CG di-nucleotides are rarer in eukaryotic genomes than expected given the marginal probabilities of C and G.
- But the regions upstream of genes are richer in CG di-nucleotides than elsewhere; these are the CpG islands.
- CpG islands are useful evidence for finding genes.
Application: predict CpG islands with Markov chains:
- one Markov chain to represent CpG islands,
- another Markov chain to represent the rest of the genome.

**Markov Chains for Discrimination**
- Suppose we want to distinguish CpG islands from other sequence regions.
- Given sequences from CpG islands, and sequences from other regions, we can construct:
  - a model to represent CpG islands,
  - a null model to represent the other regions.
- We can then score a test sequence by the log-odds ratio:
  score(x) = log( Pr(x | CpG) / Pr(x | null) )

**Markov Chains for Discrimination**
Why can we use this score? According to Bayes' rule:
Pr(CpG | x) = Pr(x | CpG) · Pr(CpG) / Pr(x)
If we are not taking into account the prior probabilities Pr(CpG) and Pr(null) of the two classes, then from Bayes' rule it is clear that we just need to compare Pr(x | CpG) and Pr(x | null), as is done in our scoring function score().

**Higher Order Markov Chains**
- The Markov property specifies that the probability of a state depends only on the previous state.
- But we can build more "memory" into our states by using a higher order Markov model.
- In an n-th order Markov model, the probability of the current state depends on the previous n states.
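The chain rule and the scoring function above can be sketched in a few lines. This is a minimal illustration: the g-row transition probabilities are the example slide's values, while the initial distribution, the other rows, and all function names are made-up placeholders, not trained parameters.

```python
import math

INIT = {'a': 0.25, 'c': 0.25, 'g': 0.25, 't': 0.25}   # illustrative Pr(x1)
TRANS = {
    'a': {'a': 0.30, 'c': 0.20, 'g': 0.30, 't': 0.20},  # placeholder rows
    'c': {'a': 0.20, 'c': 0.30, 'g': 0.30, 't': 0.20},
    'g': {'a': 0.16, 'c': 0.34, 'g': 0.38, 't': 0.12},  # from the example slide
    't': {'a': 0.25, 'c': 0.25, 'g': 0.25, 't': 0.25},
}

def sequence_log_prob(x, init=INIT, trans=TRANS):
    """log Pr(x) = log Pr(x1) + sum over i >= 2 of log Pr(xi | xi-1)."""
    logp = math.log(init[x[0]])
    for prev, cur in zip(x, x[1:]):
        logp += math.log(trans[prev][cur])
    return logp

def log_odds(x, model, null_model):
    """score(x) = log Pr(x | model) - log Pr(x | null), as in the slides."""
    return sequence_log_prob(x, *model) - sequence_log_prob(x, *null_model)
```

With these numbers, Pr(cggt) unfolds exactly as on the slide: Pr(c) · Pr(g|c) · Pr(g|g) · Pr(t|g). Logs are used so that long sequences do not underflow.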
**Selecting the Order of a Markov Chain Model**
- But the number of parameters we need to estimate grows exponentially with the order: for modeling DNA we need 4^(n+1) parameters for an n-th order model.
- The higher the order, the less reliable we can expect our parameter estimates to be:
  - estimating the parameters of a 2nd order Markov chain from the complete genome of E. coli (5.44 × 10^6 bases), we'd see each word ~85,000 times on average (divide by 4^3);
  - estimating the parameters of a 9th order chain, we'd see each word ~5 times on average (divide by 4^10 ≈ 10^6).

**Higher Order Markov Chains**
- An n-th order Markov chain over some alphabet A is equivalent to a first order Markov chain over the alphabet A^n of n-tuples.
- Example: a 2nd order Markov model for DNA can be treated as a 1st order Markov model over the alphabet
  AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, TT.

**A Fifth Order Markov Chain**
Pr(gctaca) = Pr(gctac) · Pr(a | gctac)

**Hidden Markov Model: A Simple HMM**
(figure: Model 1 and Model 2)
Given the observed sequence AGGCT, which state emits each item?

**Tutorial on HMM**
L.R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proceedings of the IEEE, Vol. 77, No. 2, February 1989.
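The tuple equivalence and the parameter-count arithmetic above can be sketched directly. The helper names below are ours, not the slides'; the numbers reproduce the E. coli example.

```python
def to_tuple_sequence(x, n=2):
    """Re-encode x as overlapping n-tuples, turning an n-th order chain
    over {a,c,g,t} into a 1st order chain over the tuple alphabet."""
    return [x[i:i + n] for i in range(len(x) - n + 1)]

def avg_word_count(genome_len, order):
    """Expected occurrences of each (order+1)-mer in a DNA genome:
    genome_len / 4**(order+1) -- the 'divide by 4^3 / 4^10' on the slide."""
    return genome_len / 4 ** (order + 1)
```

For example, `to_tuple_sequence("gctaca")` yields the dinucleotides gc, ct, ta, ac, ca, and `avg_word_count(5_440_000, 2)` gives the slide's ~85,000 expected occurrences per 3-mer, while order 9 drops to ~5 per 10-mer.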
**HMM for Hidden Coin Tossing**
Observed sequence, e.g.: T H H … H H T T H T H H T T H T T T T T

**Hidden State**
- We'll distinguish between the observed parts of a problem and the hidden parts.
- In the Markov models we've considered previously, it is clear which state accounts for each part of the observed sequence.
- In the model above, there are multiple states that could account for each part of the observed sequence; this is the hidden part of the problem.

**Learning and Prediction Tasks** (in general, i.e., applies to both Markov models and HMMs)
- Learning
  - Given: a model and a set of training sequences.
  - Do: find model parameters that explain the training sequences with relatively high probability (the goal is to find a model that generalizes well to sequences we haven't seen before).
- Classification
  - Given: a set of models representing different sequence classes, and a test sequence.
  - Do: determine which model/class best explains the sequence.
- Segmentation
  - Given: a model representing different sequence classes, and a test sequence.
  - Do: segment the sequence into subsequences, predicting the class of each subsequence.

**Algorithms for Learning & Prediction**
- Learning
  - Correct path known for each training sequence -> simple maximum likelihood or Bayesian estimation.
  - Correct path not known -> Forward-Backward algorithm plus ML or Bayesian estimation.
- Classification
  - Simple Markov model -> calculate the probability of the sequence along the single path for each model.
  - Hidden Markov model -> Forward algorithm to calculate the probability of the sequence along all paths for each model.
- Segmentation
  - Hidden Markov model -> Viterbi algorithm to find the most probable path for the sequence.

**The Parameters of an HMM**
- Transition probabilities: the probability of a transition from state k to state l.
- Emission probabilities: the probability of emitting character b in state k.
Note: HMMs can also be formulated using an emission probability associated with a transition from state k to state l.
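The Forward and Viterbi recursions named above can be sketched for a small discrete HMM. The two-state model and all probabilities below are made-up illustrations for the AGGCT question, not the slides' coin-tossing model; in practice one would also work in log space to avoid underflow.

```python
STATES = ('s1', 's2')
INIT = {'s1': 0.6, 's2': 0.4}                        # Pr(first state)
TRANS = {'s1': {'s1': 0.7, 's2': 0.3},               # transition prob. k -> l
         's2': {'s1': 0.4, 's2': 0.6}}
EMIT = {'s1': {'A': 0.5, 'C': 0.1, 'G': 0.3, 'T': 0.1},   # emission prob. of b in k
        's2': {'A': 0.1, 'C': 0.3, 'G': 0.2, 'T': 0.4}}

def forward(obs):
    """Pr(obs) summed over all state paths (the classification score)."""
    f = {k: INIT[k] * EMIT[k][obs[0]] for k in STATES}
    for b in obs[1:]:
        f = {l: EMIT[l][b] * sum(f[k] * TRANS[k][l] for k in STATES)
             for l in STATES}
    return sum(f.values())

def viterbi(obs):
    """Most probable state path (the segmentation answer)."""
    v = {k: (INIT[k] * EMIT[k][obs[0]], [k]) for k in STATES}
    for b in obs[1:]:
        v = {l: max(((p * TRANS[k][l] * EMIT[l][b], path + [l])
                     for k, (p, path) in v.items()), key=lambda t: t[0])
             for l in STATES}
    return max(v.values(), key=lambda t: t[0])[1]
```

`viterbi("AGGCT")` answers the "which state emits every item?" question for this toy model, while `forward("AGGCT")` gives the total probability used when comparing competing models.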