
Kernel-Based Detectors and Fusion of Phonological Attributes



  1. Kernel-Based Detectors and Fusion of Phonological Attributes Brett Matthews Mark Clements

  2. Outline
  • Frame-Based Detection
    • One-vs-all detectors
    • Context-dependent framewise detection
    • Probabilistic outputs
  • Kernel-Based Attribute Detection
    • SVM
    • Least-Squares SVM
  • Evaluating Probabilistic Estimates
    • Naïve Bayes combinations
    • Hierarchical manner classification
  • Detector Fusion
    • Genetic programming

  3. Frame-Based Detection
  • One-vs-all classifiers
    • Manner of articulation: vowel, fricative, stop, nasal, glide/semivowel, silence
    • Place of articulation: dental, labial, coronal, palatal, velar, glottal, back, front
    • Vowel manners: high, mid, low, back, round
  • Framewise detection
    • 10 ms frame rate
    • 12 MFCCs + energy (En)
    • 8 context-dependent frames
  • Classifier types & posterior probabilities
    • Artificial neural nets: probabilistic outputs
    • Kernel-based classifiers: SVM (empirically determined posterior probabilities); LS-SVM (probabilistic outputs)
  • Detector outputs feed event fusion
  (figure: framewise detector outputs for vowel, silence, dental, velar, and voicing)

  4. Kernel-Based Classifiers
  • Support Vector Machines (SVM)
  • LS-SVM classifier
    • Kernel-based classifier like the SVM, but with a least-squares formulation
    • Probabilistic output scores
    • LS-SVMlab package (Katholieke Universiteit Leuven)
    • Same decision function as the SVM, but subject to equality constraints instead of inequality constraints
    • No margin optimization; the solution comes from a linear system (see the sketch after the next slide)
  (figure: SVM separating hyperplane with weight vector w)

  5. Least-Squares SVMs
  • "Support vectors" α are found by solving a linear system (a sketch follows below)
  • Kernel functions: linear, polynomial, RBF
  • Probabilistic outputs
    • Bayesian inference for posterior probabilities
    • Moderated outputs can be directly interpreted as posterior probabilities
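For concreteness, the linear system in question is the standard Suykens & Vandewalle dual formulation that LS-SVMlab implements: with $\Omega_{kl} = y_k y_l\,K(x_k, x_l)$, solve

$$\begin{bmatrix}0 & y^{\top}\\ y & \Omega+\gamma^{-1}I\end{bmatrix}\begin{bmatrix}b\\ \alpha\end{bmatrix}=\begin{bmatrix}0\\ 1_N\end{bmatrix}.$$

Below is a minimal NumPy sketch of this system with an RBF kernel. The function names, the γ/σ defaults, and the plain `np.linalg.solve` call are illustrative choices, not the LS-SVMlab implementation.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma=1.0):
    """RBF kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_lssvm(X, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM dual linear system:
        [ 0   y^T            ] [b]     [0]
        [ y   Omega + I/gamma] [alpha] [1]
    with Omega_kl = y_k y_l K(x_k, x_l) and labels y in {-1, +1}."""
    N = len(y)
    Omega = np.outer(y, y) * rbf_kernel(X, X, sigma)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = Omega + np.eye(N) / gamma
    rhs = np.concatenate(([0.0], np.ones(N)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]          # alpha, b

def lssvm_score(X_test, X, y, alpha, b, sigma=1.0):
    """Raw decision value sum_k alpha_k y_k K(x, x_k) + b."""
    return rbf_kernel(X_test, X, sigma) @ (alpha * y) + b
```

Consistent with the deck, every training sample typically ends up with a nonzero α (no sparsity, unlike the SVM; cf. slide 18), and a separate Bayesian moderation step is needed to turn the raw score into a posterior probability.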

  6. Evaluating Probabilistic Estimates
  • Reliability and accuracy of probabilistic scores (a reliability-curve sketch follows below)
  • Initial fusion experiments
    • Hierarchical manner classification: LS-SVM, SVM
    • Naïve Bayes combination for phone detection: LS-SVM, SVM, ANN
  (figures: probabilistic output distributions for LS-SVM and SVM)
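Slide 8 describes the reliability check as plotting probabilistic estimates against observed frequencies. A minimal sketch of that reliability curve, assuming frame-level posteriors `p_pred` and binary labels `y_true` (both names hypothetical):

```python
import numpy as np

def reliability_curve(p_pred, y_true, n_bins=10):
    """Bin predicted posteriors and compare each bin's mean prediction
    with the empirically observed positive frequency; a perfectly
    calibrated detector lies on the diagonal."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    mean_pred, obs_freq = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # include 1.0 in the last bin
        mask = (p_pred >= lo) & ((p_pred < hi) | (hi == 1.0))
        if mask.any():
            mean_pred.append(p_pred[mask].mean())
            obs_freq.append(y_true[mask].mean())
    return np.array(mean_pred), np.array(obs_freq)
```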

  7. Hierarchical Combinations
  • Probabilistic phonetic-feature hierarchy for classifying frames into 6 manner classes
  • Train binary detectors on each split in the hierarchy: 5 detectors, 6 classes
    • silence vs. speech
    • sonorant | speech
    • vowel | sonorant
    • stop | non-sonorant
    • semivowel | sonorant consonant
  • Example: P(fricative | x) = (1 − P(stop | non-sonorant)) · (1 − P(sonorant | speech)) · P(speech | x); the sketch below chains all six classes this way
  (figure: fricative detection scores vs. ground truth)
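A minimal sketch of the full chaining, assuming the five detector posteriors named on the slide; the fricative line reproduces the formula above, and the other classes follow the same pattern down the hierarchy:

```python
def manner_posteriors(p_spch, p_son, p_vow, p_stop, p_semi):
    """Chain the five binary detector outputs down the hierarchy to
    posteriors over the six manner classes.  Inputs are
    P(speech|x), P(sonorant|speech), P(vowel|sonorant),
    P(stop|non-sonorant), P(semivowel|sonorant consonant)."""
    return {
        "silence":   1.0 - p_spch,
        "vowel":     p_vow * p_son * p_spch,
        "semivowel": p_semi * (1.0 - p_vow) * p_son * p_spch,
        "nasal":     (1.0 - p_semi) * (1.0 - p_vow) * p_son * p_spch,
        "stop":      p_stop * (1.0 - p_son) * p_spch,
        "fricative": (1.0 - p_stop) * (1.0 - p_son) * p_spch,
    }
```

By construction the six posteriors sum to one, so the combination behaves like a proper distribution over manner classes.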

  8. Hierarchical Combinations
  • Reliability of posterior probabilities: plot the probabilistic estimates of the combinations vs. observed frequencies; hierarchical combinations are much more reliable for the SVM than for the LS-SVM
  • Classification accuracy: higher for SVMs, especially for fricatives
  • Upper-bound comparison: one-vs-all classifiers trained directly for each class; the combinations are nearly as accurate as one-vs-all classification
  • LS-SVM combinations perform poorly for semivowel and nasal
  (figures: reliability plots for LS-SVM (combined) and SVM (combined); classification accuracy (%) for vowel, stop, fricative, semivowel/glide, nasal, silence)

  9. Naïve Bayes Combinations
  • One-vs-all frameworks are desirable; phonetic hierarchies are cumbersome
  • Phone detection: combine phonological attribute scores with a Naïve Bayes product
  • Initial experiments in evaluating probabilities
    • Compare accuracy and reliability of probabilistic outputs for ANN, SVM, and LS-SVM
    • Limited training data (the LS-SVM is capped at 3000 samples by memory restrictions)
  • Detect phones with combinations of relevant phonetic attributes, e.g. P(/f/ | x) = P(labial | x) · P(fricative | x) · (1 − P(voicing | x)); a sketch follows below
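A minimal sketch of the Naïve Bayes product, assuming per-attribute posteriors stored in a dict keyed by attribute name (the helper and its argument names are illustrative):

```python
def phone_posterior(attr_probs, present, absent):
    """Naive Bayes product over phonological attribute detectors:
    multiply P(attr|x) for attributes the phone has and 1 - P(attr|x)
    for attributes it lacks, assuming conditional independence."""
    p = 1.0
    for a in present:
        p *= attr_probs[a]
    for a in absent:
        p *= 1.0 - attr_probs[a]
    return p

# /f/ is an unvoiced labial fricative, so the call below reduces
# exactly to the product on the slide:
# phone_posterior({"labial": 0.9, "fricative": 0.8, "voicing": 0.2},
#                 present=["labial", "fricative"], absent=["voicing"])
```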

  10. Naïve Bayes Combinations
  • Phone detection: compare combined attributes with direct training on phones as an upper bound
  • ROC statistics (a sketch for computing AUC and EER follows below)
    • SVMs are best for attribute detection
    • Mixed results for NB combinations; no clear winner between LS-SVM and SVM
    • Direct training outperforms the combinations
  • Reliability: Naïve Bayes combinations give poor reliability for all detector types
  • Rare phones & vowels: for /v/, /ng/, and /oy/, improvements in EER and AUC across detector types; most vowels saw improvements as well
  (figures: ROC statistics, direct vs. combined)
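For reference, a sketch of the two ROC statistics used here: AUC via the Mann–Whitney rank statistic, and EER as the operating point where false-accept and false-reject rates are equal. Ties and interpolation are handled naively; this is not the evaluation code from the experiments.

```python
import numpy as np

def roc_stats(scores, labels):
    """AUC and EER for one detector.
    scores: detector outputs; labels: 1 = target phone, 0 = other."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    # AUC = P(score_pos > score_neg), from the rank-sum statistic
    order = np.argsort(np.concatenate([pos, neg]))
    ranks = np.empty(len(order))
    ranks[order] = np.arange(1, len(order) + 1)
    u = ranks[:len(pos)].sum() - len(pos) * (len(pos) + 1) / 2.0
    auc = u / (len(pos) * len(neg))
    # EER: sweep thresholds until false-accept ~= false-reject
    thresholds = np.sort(scores)
    far = np.array([(neg >= t).mean() for t in thresholds])
    frr = np.array([(pos < t).mean() for t in thresholds])
    i = int(np.argmin(np.abs(far - frr)))
    return auc, (far[i] + frr[i]) / 2.0
```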

  11. Naïve Bayes Combinations
  (figures: ROC curves for combined attributes (SVM) vs. direct training (SVM); the bullet text repeats slide 10)

  12. Genetic Programming
  • Evolutionary algorithm for tree-structured feature "creation" (extraction)
  • Maximizes a fitness function across a number of generations (iterations)
  • Operations like crossover & mutation control the evolution of the algorithm
  • Trees are algebraic networks
    • Inputs are multi-dimensional features
    • Tree nodes are unary or binary mathematical operators (+, −, *, (·)², log)
    • Algebraic networks are simpler and more transparent than neural nets
  • GPLab package from the Universidade de Coimbra, Portugal: http://gplab.sourceforge.net

  13. Genetic Programming
  • Trained GP trees on SVM outputs to develop algebraic networks for combining detector outputs
  • Produce a 1-D feature from a nonlinear combination of detector outputs
  • Choose the fitness function, set of node operators, tree depth, etc. to maximize class separation (a toy sketch follows below)
  (figure: 1-D GP features separating /aa/, /ae/, and /zh/, built from vowel, silence, dental, velar, and voicing detector outputs)
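A toy sketch of what such an algebraic network looks like, assuming trees are nested tuples over the node operators listed on slide 12 and leaves are named SVM detector outputs. The Fisher-ratio fitness and the example tree are illustrative assumptions, not the trained trees from the experiments.

```python
import numpy as np

# Node operators from slide 12: +, -, *, (.)^2, log (protected).
OPS = {
    "+":   lambda a, b: a + b,
    "-":   lambda a, b: a - b,
    "*":   lambda a, b: a * b,
    "sq":  lambda a: a ** 2,
    "log": lambda a: np.log(np.abs(a) + 1e-8),
}

def eval_tree(tree, feats):
    """Recursively evaluate an algebraic network on a dict of
    detector-output arrays, producing one 1-D feature per frame."""
    if isinstance(tree, str):          # leaf: named detector output
        return feats[tree]
    op, *kids = tree
    return OPS[op](*(eval_tree(k, feats) for k in kids))

def fisher_fitness(f, labels):
    """Fitness = separation of the 1-D feature between target and
    non-target frames (Fisher discriminant ratio)."""
    a, b = f[labels == 1], f[labels == 0]
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var() + 1e-8)

# Hypothetical candidate for a /th/ feature:
# log(P(dental)) * (P(fricative) - P(voicing))^2
tree = ("*", ("log", "dental"), ("sq", ("-", "fricative", "voicing")))
```

The GP search then applies crossover and mutation to a population of such trees and keeps the fittest, which is why the surviving trees make the selected attributes easy to read off.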

  14. Genetic Programming
  • The system is complex for speech recognition (a tree + classifier for each phone), but the GP trees themselves provide insights for combination: the fitness function, the tree node operators, and which features matter
  • Initial results are mixed: good separation for some phones, but not for most
  • GP trees select attributes of interest and discard the others
  • Still in progress
  (figures: GP feature separation for /oy/ and /th/)

  15. Summary
  • Evaluating posterior probabilities: ANNs, SVMs, LS-SVMs; SVMs are best for reliability and accuracy
  • With limited training data, rare phones may benefit from overlapping phonetic classes
  • Genetic programming for detector fusion: small, transparent algebraic networks for combining attribute detectors
  • GP trees select relevant attributes, but there is much room for improvement
  • Limiting the tree node operators and selecting fitness functions should provide insights into detector fusion

  16. Extras
  (figures: feature-space correlation matrices (1)–(3) for the training data, annotated with the kernel function K and the range of kernel parameters)

  17. Extras: LS-SVM primal formulation
  Determine $w$ and $b$ by solving the optimization problem
  $$\min_{w,b,e}\ \mathcal{J}(w,e)=\frac{\mu}{2}\,w^{\top}w+\frac{\zeta}{2}\sum_{k=1}^{N}e_k^2$$
  subject to
  $$y_k\left[w^{\top}\varphi(x_k)+b\right]=1-e_k,\qquad k=1,\dots,N.$$
  The first term is the generalization/regularization term and $e_k$ is the regression error for training sample $k$; $\mu$ and $\zeta$ are positive scale parameters, and their ratio $\gamma=\zeta/\mu$ expresses the trade-off between generalization and training-set error.
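For completeness, the linear system quoted after slide 5 follows from this primal by the textbook Suykens derivation (not spelled out in the deck). Form the Lagrangian
$$\mathcal{L}(w,b,e;\alpha)=\mathcal{J}(w,e)-\sum_{k=1}^{N}\alpha_k\left\{y_k\left[w^{\top}\varphi(x_k)+b\right]-1+e_k\right\},$$
set the derivatives with respect to $w$, $b$, $e_k$, and $\alpha_k$ to zero, which gives $w=\sum_k\alpha_k y_k\varphi(x_k)$, $\sum_k\alpha_k y_k=0$, and $\alpha_k=\gamma e_k$, and eliminate $w$ and $e$ to obtain
$$\begin{bmatrix}0 & y^{\top}\\ y & \Omega+\gamma^{-1}I\end{bmatrix}\begin{bmatrix}b\\ \alpha\end{bmatrix}=\begin{bmatrix}0\\ 1_N\end{bmatrix},\qquad \Omega_{kl}=y_k y_l\,K(x_k,x_l).$$
Because $\alpha_k=\gamma e_k$ replaces the SVM's inequality constraints, every training sample contributes a nonzero $\alpha_k$, which is the loss of sparsity noted on slide 18.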

  18. Extras: Support Vector Machines
  • Good performance, but the majority of training points became support vectors
  • Posterior probabilities
  (figure: SVM decision boundary with weight vector w)
