Stress Detection J.-S. Roger Jang (張智星) MIR Lab, CSIE Dept., National Taiwan Univ. http://mirlab.org/jang
Intro to Stress Detection • Stress detection (SD) for English • Given an English word and its pronunciation • Detect the stress position of the pronunciation • Applications • Computer-assisted pronunciation training (CAPT) • Similar to… • Tone recognition in Mandarin Chinese • Intonation scoring
Examples of Stress in English Words • Every multisyllabic English word has a stressed syllable • Examples • Dictionary: stressed at syllable 1 • Tomorrow: stressed at syllable 2 • International: stressed at syllable 3
Steps in Stress Detection • Preprocessing • Use forced alignment to find vowel locations • Feature extraction • Extract features for each vowel • Model construction • Build a classifier for vowel-based stress detection • Postprocessing • Combine the vowel-level results into a word-level stress decision
Forced Alignment (1/2) • A process that aligns an utterance with its corresponding canonical phonetic symbols • Example: International
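The preprocessing step uses the alignment to locate vowels. A minimal sketch of that lookup follows, assuming the aligner returns a list of (phone, start, end) tuples with ARPAbet labels. The output format, the phone set, and the timestamps are all hypothetical and are not taken from ASRA.

```python
# Hypothetical forced-alignment output for "international":
# (phone label, start time in s, end time in s). The times are made up.
ALIGNMENT = [
    ("IH", 0.00, 0.06), ("N", 0.06, 0.11), ("T", 0.11, 0.15),
    ("ER", 0.15, 0.24), ("N", 0.24, 0.29), ("AE", 0.29, 0.43),
    ("SH", 0.43, 0.52), ("AH", 0.52, 0.58), ("N", 0.58, 0.63),
    ("AH", 0.63, 0.70), ("L", 0.70, 0.78),
]

# ARPAbet vowel labels (an assumed phone set; the slides do not name one).
ARPABET_VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def vowel_segments(alignment):
    """Return (syllable index, phone, start, end) for each vowel, in order."""
    segments = []
    for phone, start, end in alignment:
        if phone in ARPABET_VOWELS:
            # One vowel per syllable, so the running count is the syllable index.
            segments.append((len(segments) + 1, phone, start, end))
    return segments


for index, phone, start, end in vowel_segments(ALIGNMENT):
    print(f"syllable {index}: {phone} from {start:.2f}s to {end:.2f}s")
```

Each returned segment marks the time span from which per-vowel features would later be extracted.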
Forced Alignment (2/2) • Applications of forced alignment • Speech scoring (based on timbre only) • Utterance verification • Our forced alignment engine • ASRA (Automatic Speech Recognition & Assessment): For voice command recognition and speech assessment (scoring)
Corpora for Stress Detection • Merriam-Webster dictionary • Website • Some statistics • # pronunciations: 21950 • Usable files: 14994, each meeting all of the following • No. of syllables > 1 • Available in our dictionary • Valid output from ASRA • In-house recordings • Recordings from MSAR for several years • Available upon request
Speech Corpus for Lexical Stress Detection • Merriam-Webster Online Dictionary’s word pronunciations • http://www.merriam-webster.com • All utterances are pronounced by native speakers
Stress Detection based on Vowel Classification • SD is based on vowel classification because of the following observations • Each word has a stressed syllable • Each syllable is usually composed of a consonant and a vowel • Vowels are always voiced (have pitch) • Therefore • Each vowel is classified as “stressed” or “unstressed” • To determine the stressed syllable in an utterance, pick the vowel according to one of • Max likelihood of the class “Stressed” • Min likelihood of the class “Unstressed” • Max difference of the above two
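The three word-level decision rules can be sketched as follows. This assumes the vowel classifier yields a pair of log-likelihoods (stressed, unstressed) per vowel; the function name, the rule names, and the scores are hypothetical, and reading "difference" as stressed minus unstressed is an interpretation of the slide.

```python
def pick_stressed_vowel(vowel_scores, rule="difference"):
    """Return the 1-based index of the vowel chosen as stressed.

    vowel_scores: list of (loglik_stressed, loglik_unstressed), one per vowel.
    rule: "max_stressed", "min_unstressed", or "difference".
    """
    if rule == "max_stressed":
        keys = [stressed for stressed, _ in vowel_scores]
    elif rule == "min_unstressed":
        # Smallest "unstressed" likelihood wins, so negate before taking the max.
        keys = [-unstressed for _, unstressed in vowel_scores]
    elif rule == "difference":
        keys = [stressed - unstressed for stressed, unstressed in vowel_scores]
    else:
        raise ValueError(f"unknown rule: {rule}")
    best = max(range(len(keys)), key=keys.__getitem__)
    return best + 1


# Made-up scores for a three-syllable word.
scores = [(-4.0, -1.0), (-1.5, -3.5), (-3.0, -2.0)]
for rule in ("max_stressed", "min_unstressed", "difference"):
    print(rule, pick_stressed_vowel(scores, rule))
```

Forcing exactly one winner per word uses the observation that each word has a stressed syllable, which independent per-vowel decisions would not guarantee.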
Features for Vowels • Vowel-based features • Pitch: min, mean, max, range, std, slope, etc. • Volume: min, mean, max, range, std, slope, etc. • Duration (normalized by speech rate) • Legendre polynomial fitting for pitch & volume • Spectrally emphasized versions of the above • …
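A minimal sketch of the contour statistics and the Legendre fit, using only the Python standard library. It assumes a per-frame pitch (or volume) contour for one vowel is already available. The function names, the choice of degree 2, the use of population standard deviation, and the sample values are assumptions, not details from the slides.

```python
import statistics


def slope(values):
    """Least-squares slope of the contour against the frame index."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def contour_stats(values):
    """Basic statistics of a pitch or volume contour."""
    return {
        "min": min(values),
        "mean": statistics.fmean(values),
        "max": max(values),
        "range": max(values) - min(values),
        "std": statistics.pstdev(values),
        "slope": slope(values),
    }


def legendre_coeffs(values):
    """Least-squares fit with Legendre polynomials P0, P1, P2.

    Frames are mapped uniformly onto [-1, 1]; needs at least 3 frames.
    """
    n = len(values)
    xs = [-1 + 2 * i / (n - 1) for i in range(n)]
    basis = [[1.0, x, 0.5 * (3 * x * x - 1)] for x in xs]
    # Normal equations A c = b for the three basis functions.
    a = [[sum(r[j] * r[k] for r in basis) for k in range(3)] for j in range(3)]
    b = [sum(r[j] * v for r, v in zip(basis, values)) for j in range(3)]
    # Gaussian elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, 3):
            factor = a[row][col] / a[col][col]
            for k in range(col, 3):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0, 0.0, 0.0]
    for row in range(2, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, 3))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs


pitch = [100.0, 110.0, 120.0, 130.0, 140.0]  # made-up pitch contour
print(contour_stats(pitch))
print(legendre_coeffs(pitch))
```

The Legendre coefficients summarize the contour's level, tilt, and curvature in a fixed-length vector, regardless of how many frames the vowel spans.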
Lexical Stress Detection – Experiment 1 • 10-fold cross-validation • Classifier: SVM • Feature sets • E: Root mean square energy • D: Duration • P: Pitch • S: Root mean square spectral emphasis energy • PS: Pitch slope • CE: Legendre coefficients of the root mean square energy contour • CP: Legendre coefficients of the pitch contour • CS: Legendre coefficients of the spectral emphasis energy contour
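The 10-fold cross-validation protocol can be sketched as below. The experiments use an SVM; to keep the sketch free of dependencies, a toy one-dimensional threshold classifier stands in for it. All names and data here are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_validate(samples, labels, train_fn, predict_fn, k=10):
    """Return the accuracy over all held-out samples."""
    correct = 0
    for train, test in k_fold_indices(len(samples), k):
        model = train_fn([samples[i] for i in train], [labels[i] for i in train])
        correct += sum(predict_fn(model, samples[i]) == labels[i] for i in test)
    return correct / len(samples)


# Stand-in classifier (NOT an SVM): threshold halfway between class means.
def train_threshold(xs, ys):
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2


def predict_threshold(threshold, x):
    return int(x >= threshold)


# Made-up, well-separated data: label 0 near 0, label 1 near 10.
samples = [(i % 2) * 10 + (i // 2) * 0.01 for i in range(20)]
labels = [i % 2 for i in range(20)]
print(cross_validate(samples, labels, train_threshold, predict_threshold))
```

With real data, one reasonable precaution (an assumption, not stated in the slides) is to keep all vowels of a word in the same fold so that a word never contributes to both training and testing.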
Lexical Stress Detection – Experiment 2 • 10-fold cross-validation • Classifier: SVM • Syllable-number-independent classifier vs. syllable-number-dependent classifier
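One way to read "syllable-number-dependent" is one classifier per syllable count, instead of a single classifier shared by all words. A minimal sketch of that grouping follows, assuming each word is stored as a (vowel feature list, stressed index) pair; the data layout and `train_fn` are hypothetical.

```python
from collections import defaultdict


def train_per_syllable_count(words, train_fn):
    """Train one model per syllable count.

    words: list of (vowel_features, stressed_index), one entry per word,
           where vowel_features holds one feature vector per vowel.
    train_fn: callable that takes the words of one group and returns a model.
    """
    grouped = defaultdict(list)
    for vowel_features, stressed_index in words:
        grouped[len(vowel_features)].append((vowel_features, stressed_index))
    return {count: train_fn(group) for count, group in grouped.items()}


# Made-up words with 2, 2, and 3 vowels; train_fn=len just counts each group.
toy_words = [
    ([[0.1], [0.9]], 2),
    ([[0.8], [0.2]], 1),
    ([[0.2], [0.7], [0.1]], 2),
]
print(train_per_syllable_count(toy_words, train_fn=len))
```

At test time the word's syllable count, known from the forced alignment, selects which model to apply.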
Lexical Stress Detection – Experiment 3 • 10-fold cross-validation • Classifiers • GMMC: Gaussian mixture model classifier • NBC: Naïve Bayes classifier • QC: Quadratic classifier • SVMC: Support vector machine classifier
Lexical Stress Detection – Error Analysis • Error types: • Wrong ground truth / more than one pronunciation of the word • conduct (2 syllables): [kənˋdʌkt] / [ˋkɑndʌkt] • Complex word with two primary-stressed syllables • worldwide (2 syllables): [ˋwɝldˋwaɪd] • histochemistry (5 syllables): [ˋhɪstəˋkɛmɪstrɪ] • Word with a primary-stressed and a secondary-stressed syllable • deposition (4 syllables): [͵dɛpəˋzɪʃən] • cafeteria (5 syllables): [͵kæfəˋtɪrɪə]
Lexical Stress Detection – Error Analysis • Error types: • Wrong result from pitch tracking • elegant (3 syllables): [ˋɛləgənt] • Wrong result from forced alignment • peremptory (4 syllables): [pəˋrɛmptərɪ]
More on Stress Detection • ASRA • Chapter 20 of online tutorial on Audio Signal Processing • Demo • Recognition • goDemoVc.m in ASR • Web • Assessment • goDemoSa.m in ASR • Web • Stress detection • Application note • Demo