
Presentation Transcript


  1. My Slides • Support vector machines (brief intro) • WS04: What we accomplished • WS04: Organizational lessons • AVICAR video corpus: current status

  2. Support Vector Machines as (sort of) compared to Neural Networks* • *Difficult to do because they have never been compared head-to-head on any speech task!

  3. SVM = Regularized Nonlinear Discriminant • Kernel: transform to infinite-dimensional Hilbert space • SVM extracts a discriminant dimension • The only way in which SVM differs from RBF-NN: THE TRAINING CRITERION: c = argmin( training_error(c) + λ/width(margin(c)) ) • (Bourlard/Morgan hybrid) Niyogi & Burges, 2002: posterior PDF = sigmoid model in the discriminant dimension • OR (BDFK tandem) Borys & Hasegawa-Johnson, 2005: likelihood = mixture Gaussian in the discriminant dimension
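The training criterion on the slide can be written out concretely. A minimal NumPy sketch (my addition, not from the slides) of the usual linear-SVM form: since margin width = 2/||w||, penalizing λ·||w||² is the same as penalizing λ/width(margin)², and the hinge loss stands in for training_error(c):

```python
import numpy as np

def svm_objective(w, b, X, y, lam):
    """SVM objective: mean hinge loss + lam * ||w||^2.

    y must be in {-1, +1}. Because margin width = 2/||w||, the
    lam * ||w||^2 term plays the role of lam / width(margin)^2.
    """
    margins = y * (X @ w + b)                       # functional margins
    hinge = np.maximum(0.0, 1.0 - margins).mean()   # surrogate for training error
    return hinge + lam * np.dot(w, w)               # error + margin penalty

# Tiny separable example: w=[1], b=0 classifies both points with margin 2,
# so the hinge term vanishes and only the margin penalty remains.
X = np.array([[2.0], [-2.0]])
y = np.array([1.0, -1.0])
obj = svm_objective(np.array([1.0]), 0.0, X, y, lam=0.5)  # -> 0.5
```

Training searches over (w, b) to minimize this; the kernel trick replaces the inner products with kernel evaluations so the same criterion applies in the (possibly infinite-dimensional) Hilbert space.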

  4. Binary Classifier = sign( Nonlinear Discriminant )
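Slides 3-4 together describe the full pipeline: an RBF-kernel SVM yields a discriminant dimension g(x), the binary classifier is sign(g(x)), and a sigmoid in that dimension gives a posterior (the Niyogi & Burges construction). A minimal scikit-learn sketch, not from the slides; the sigmoid slope/bias values below are illustrative, not fitted:

```python
import numpy as np
from sklearn.svm import SVC

# Toy data with a nonlinear (circular) class boundary.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # C trades training error vs. margin
clf.fit(X, y)

g = clf.decision_function(X)     # the discriminant dimension g(x)
pred = (g > 0).astype(int)       # binary classifier = sign( nonlinear discriminant )

# Posterior via a sigmoid model in the discriminant dimension.
# Slope a and bias b would normally be fit on held-out data (Platt scaling);
# the values here are placeholders.
a, b = -1.0, 0.0
posterior = 1.0 / (1.0 + np.exp(a * g + b))
```

In scikit-learn the fitted sigmoid is available directly via `SVC(probability=True)`, which runs Platt scaling internally.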

  5. Advantages of SVMs w.r.t. NNs • Accuracy: • SVM generalizes much better from small training data sets ( training tokens > 6X observation vector size ) • As training data size increases, accuracy of NN and SVM converge • Theoretically, and in some practical experiments too • Like 3-layer-MLP, RBF-SVM is a universal approximator • Fast training: nearly quadratic optimality criterion

  6. Disadvantages of SVMs w.r.t. NNs • No way to train with a very large training set • Complexity = O(N^2): either fast or impossible • Computational complexity during test • Solution: Burges’ reduced set method (extra training step; currently only available in Matlab) • Accuracy: unless you optimize the hyper-parameters, accuracy is good but not great • Exhaustive hyper-parameter training is very slow • You get good but not great accuracy with the “theoretically correct” hyper-parameters
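The "exhaustive hyper-training" cost is easy to see in code: each cell of a (C, γ) grid requires a full SVM fit per cross-validation fold, and each fit is itself roughly O(N²) in the number of training tokens. A small scikit-learn sketch (my addition; grid values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for a small classification task.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# 3 C values x 3 gamma values x 3 CV folds = 27 separate SVM trainings,
# each roughly quadratic in the training-set size.
grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=3)
search.fit(X, y)

best = search.best_params_
```

This is why the grid is usually kept coarse (often logarithmic in C and γ) and why the "theoretically correct" defaults are tempting despite giving good-but-not-great accuracy.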

  7. Disadvantages of SVMs w.r.t. NNs • The real problem: we need phonetically labeled training data • “Embedded re-estimation experiment:” • Pre-trained SVMs used as HMM input (tandem system) • RBF weights re-estimated, together with HMM parameters, to maximize likelihood of the training data • Result: training-data likelihood and WRA (word recognition accuracy) increased • Result: test-data WRA decreased

  8. WS04

  9. WS04: SVM/DBN hybrid recognizer • Word: LIKE A • Canonical form: … tongue closed, tongue mid, tongue front, tongue open … • Surface form: … tongue front, semi-closed, tongue front, tongue open … • Manner: glide, front vowel; Place: palatal … • SVM outputs: p( gPGR(x) | palatal glide release ), p( gGR(x) | glide release ) • x: multi-frame observation including spectrum, formants, & auditory model

  10. WS04 Organizational Lessons: What Worked • Innovative experiments, made possible by people who really wanted to be doing what they were doing • Result: published ideas were interesting to many people • Parallel SVM classification experiments allowed us to test many different SVM definitions • Result: classification errors mostly below 20% by end of WS • Parallel recognizer test experiments (DBN/SVM was one, MaxEnt-based lattice rescoring was another) • Result: both achieved small (nonsignificant) WER reductions over baseline

  11. WS04 Organizational Lessons: What Didn’t Work • Software bottleneck between the SVMs and the recognizers: only one tool was available to apply an SVM to every frame in a speech file, and only one person knew how to use it. • Too many experimental variables: should SVMs be trained using (1) all frames, or (2) only landmark frames? DBN expects #1. HMM works best if manner features use #1, place features use #2. DBN? Impossible to test in six weeks. • Apples & oranges: SVM-only classifier outputs in cases #1 and #2 were incomparable => no test short of full DBN integration is meaningful.

  12. WS04 Organizational Lessons: What Didn’t Work • Unbeatable baseline: the goal was to rescore the output of the SRI recognizer in order to reduce WER => to find acoustic information not already used by the baseline recognizer. • What information is “not already used”? For a phone-based ANN/HMM hybrid system: hard to say. • When an experiment fails: why? • Better: use an open-source baseline (= not state of the art, but that’s OK), and construct test systems in a continuum between baseline and target.

  13. AVICAR

  14. AVICAR: Recording Hardware • 8 mics, pre-amps, wooden baffle; best place = sunvisor • 4 cameras, glare shields, adjustable mounting; best place = dashboard • System is not permanently installed; mounting requires 10 minutes

  15. AVICAR: Data Summary • 100 Talkers • 5 noise conditions: • Engine idling, • 35mph, windows closed / windows open • 55mph, windows closed / windows open • 4 types of utterances: • Isolated Digits • Phone numbers • Isolated Letters (e-set = articulation test) • TIMIT sentences • Public release: 16 schools & companies (but I don’t know how many are using it)

  16. AVICAR: Labeling & Recognition • Manual lip segmentation: 36 images • Automatic face tracking: nearly perfect • Automatic lip tracking: not so good • Manual audio segmentation: sentence boundaries • Audio enhancement: • Audio digit WRA = 97, 89, 87, 84, 78% (across the five noise conditions)

  17. AVICAR: Data Problems • DIVX encoding => database < 300 GB, but… • DIVX => poor edge quality in some images • Amelioration plan: re-transfer from tapes at high quality (huge data size) for those who want it.
