8- Speech Recognition

    1. 8-Speech Recognition • Speech Recognition Concepts • Speech Recognition Approaches • Recognition Theories • Bayes Rule • Simple Language Model • P(A|W) Network Types

    2. 7-Speech Recognition (Cont’d) • HMM Calculating Approaches • Neural Components • Three Basic HMM Problems • Viterbi Algorithm • State Duration Modeling • Training In HMM

    3. Recognition Tasks • Isolated Word Recognition (IWR), Connected Word (CW), and Continuous Speech Recognition (CSR) • Speaker Dependent, Multiple Speaker, and Speaker Independent • Vocabulary Size • Small: <20 words • Medium: >100, <1000 words • Large: >1000, <10000 words • Very Large: >10000 words

    4. Speech Recognition Concepts • Speech recognition is the inverse of speech synthesis. [Figure: speech synthesis goes from text through NLP and speech processing to speech; speech recognition goes from speech through speech processing to a phone sequence, and then through NLP/understanding to text.]

    5. Speech Recognition Approaches • Bottom-Up Approach • Top-Down Approach • Blackboard Approach

    6. Bottom-Up Approach [Figure: signal processing, feature extraction, and successive segmentation stages turn the speech signal into the recognized utterance, drawing on knowledge sources: voiced/unvoiced/silence detection, sound classification rules, phonotactic rules, lexical access, and the language model.]

    7. Top-Down Approach [Figure: feature analysis feeds a unit matching system built on an inventory of speech recognition units; lexical hypotheses (word dictionary), syntactic hypotheses (grammar), and semantic hypotheses (task model) are checked by an utterance verifier/matcher that outputs the recognized utterance.]

    8. Blackboard Approach [Figure: acoustic, lexical, syntactic, semantic, and environmental processes communicating through a shared blackboard.]

    9. Recognition Theories • Articulatory-Based Recognition • Uses the articulatory system for recognition • This theory has been the most successful so far • Auditory-Based Recognition • Uses the auditory system for recognition • Hybrid-Based Recognition • A hybrid of the above theories • Motor Theory • Models the intended gestures of the speaker

    10. Recognition Problem • Given a sequence of acoustic symbols, we want to find the words the speaker expressed • Solution: find the most probable word sequence given the acoustic symbols

    11. Recognition Problem • A: acoustic symbols • W: word sequence • We should find $\hat{W}$ such that $\hat{W} = \arg\max_{W} P(W \mid A)$

    12. Bayes Rule $P(W \mid A) = \frac{P(A \mid W)\, P(W)}{P(A)}$

    13. Bayes Rule (Cont’d) Since P(A) does not depend on W, $\hat{W} = \arg\max_{W} P(A \mid W)\, P(W)$
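The decision rule above can be shown with a toy decoder. This is a minimal sketch and not the lecture's method. The candidate sentences and their P(A|W) and P(W) values are invented for illustration, and a real recognizer searches a very large hypothesis space rather than a fixed dictionary.

```python
import math

# Hypothetical scores for three candidate word sequences W of one utterance A:
# "acoustic" stands for P(A|W) and "lm" for P(W). The numbers are invented.
CANDIDATES = {
    "recognize speech": {"acoustic": 1e-5, "lm": 1e-3},
    "wreck a nice beach": {"acoustic": 2e-5, "lm": 1e-6},
    "recognize peach": {"acoustic": 5e-6, "lm": 1e-5},
}


def log_score(scores):
    """log P(A|W) + log P(W); logs avoid underflow for long utterances."""
    return math.log(scores["acoustic"]) + math.log(scores["lm"])


def decode(candidates):
    """Return argmax_W P(A|W) P(W); P(A) is dropped because it does not depend on W."""
    return max(candidates, key=lambda w: log_score(candidates[w]))


print(decode(CANDIDATES))  # recognize speech
```

"wreck a nice beach" has the best acoustic score, but the language model term P(W) overrules it.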

    14. Simple Language Model $P(W) = P(w_1 w_2 \ldots w_n) = \prod_{i=1}^{n} P(w_i \mid w_1 \ldots w_{i-1})$ Computing this probability is very difficult and requires a very large database, so trigram and bigram models are used instead.

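    15. Simple Language Model (Cont’d) Trigram: $P(W) \approx \prod_{i} P(w_i \mid w_{i-2} w_{i-1})$ Bigram: $P(W) \approx \prod_{i} P(w_i \mid w_{i-1})$ Monogram (unigram): $P(W) \approx \prod_{i} P(w_i)$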

    16. Simple Language Model (Cont’d) Computing method: $P(w_3 \mid w_1 w_2) = \frac{C(w_1 w_2 w_3)}{C(w_1 w_2)}$, i.e., the number of times w3 occurs after w1w2 divided by the total number of occurrences of w1w2. Ad hoc method:
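The counting method can be sketched as follows. This assumes sentences are already tokenized. The `<s>`/`</s>` padding symbols and the toy corpus are illustrative assumptions. No smoothing (such as the ad hoc method) is applied, so unseen trigrams get probability 0.

```python
from collections import Counter


def trigram_model(sentences):
    """Maximum-likelihood estimate P(w3 | w1 w2) = C(w1 w2 w3) / C(w1 w2)."""
    tri, hist = Counter(), Counter()
    for words in sentences:
        # Pad so the first words also have a two-word history (assumed convention).
        padded = ["<s>", "<s>"] + list(words) + ["</s>"]
        for i in range(len(padded) - 2):
            hist[tuple(padded[i:i + 2])] += 1  # w1 w2 occurring as a history
            tri[tuple(padded[i:i + 3])] += 1   # w1 w2 w3

    def prob(w1, w2, w3):
        h = hist[(w1, w2)]
        return tri[(w1, w2, w3)] / h if h else 0.0

    return prob


corpus = [["I", "want", "to", "eat"], ["I", "want", "to", "sleep"], ["I", "want", "a", "book"]]
p = trigram_model(corpus)
print(p("I", "want", "to"))  # 2 of the 3 "I want" histories are followed by "to"
```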

    17. Error-Producing Factors • Prosody (recognition should be prosody independent) • Noise (noise should be prevented) • Spontaneous speech

    18. P(A|W) Computing Approaches • Dynamic Time Warping (DTW) • Hidden Markov Model (HMM) • Artificial Neural Network (ANN) • Hybrid Systems

    19. Dynamic Time Warping

    20. Dynamic Time Warping

    21. Dynamic Time Warping

    22. Dynamic Time Warping

    23. Dynamic Time Warping Search Limitations: - Start and end points - Global limitation - Local limitation

    24. Dynamic Time Warping Global Limitation :

    25. Dynamic Time Warping Local Limitation :
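The slides' global and local limitation figures could not be recovered, so the sketch below assumes a common choice for each. The global limitation is a Sakoe-Chiba band and the local limitation is the three-step rule (horizontal, vertical, diagonal). The path is forced to start at the first frames and end at the last, which enforces the start/end limitation. Frame distance is Euclidean, and that is also an assumption.

```python
import math


def dtw(x, y, band=None):
    """DTW distance between feature sequences x and y (lists of equal-length vectors).

    Start/end limitation: the path runs from (0, 0) to (n-1, m-1).
    Global limitation (assumed Sakoe-Chiba band): only cells with |i - j| <= band.
    Local limitation (assumed): predecessors (i-1, j), (i, j-1), (i-1, j-1).
    """
    n, m = len(x), len(y)
    if band is not None:
        band = max(band, abs(n - m))  # widen the band so the end point stays reachable
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if band is not None and abs(i - j) > band:
                continue  # outside the global limitation
            cost = math.dist(x[i - 1], y[j - 1])  # Euclidean frame distance
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


template = [[0.0], [1.0], [2.0], [3.0]]
spoken = [[0.0], [0.0], [1.0], [2.0], [3.0], [3.0]]  # same shape, spoken more slowly
print(dtw(template, spoken, band=1))
```

Warping absorbs the slower speaking rate, so the two sequences above align at zero cost.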

    26. Artificial Neural Network [Figure: simple computation element of a neural network]
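Here is a minimal sketch of one computation element. It forms a weighted sum of its inputs plus a bias and passes the result through a nonlinearity. The sigmoid used here is an assumption, since the original figure may use a hard threshold instead.

```python
import math


def neuron(inputs, weights, bias):
    """One computation element: y = f(sum_i w_i * x_i + b), with a sigmoid f (assumed)."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-activation))


print(neuron([1.0, 0.0], [2.0, -1.0], -2.0))  # activation is 0, so the output is 0.5
```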

    27. Artificial Neural Network (Cont’d) • Neural Network Types • Perceptron • Time Delay Neural Network (TDNN) • TDNN Computational Element

    28. Artificial Neural Network (Cont’d) [Figure: single-layer perceptron]

    29. Artificial Neural Network (Cont’d) [Figure: three-layer perceptron]
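A forward pass through a three-layer perceptron can be sketched as below. The sketch assumes that "three layers" counts layers of computation elements (two hidden layers plus an output layer) and that every unit is a sigmoid. The weights are arbitrary illustrative values, not trained parameters.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def layer(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def mlp_forward(x, params):
    """Apply each (weights, biases) layer in turn: input -> hidden -> hidden -> output."""
    for weights, biases in params:
        x = layer(x, weights, biases)
    return x


params = [
    ([[0.5, -0.5], [0.3, 0.8], [-0.2, 0.1]], [0.0, 0.1, -0.1]),  # 2 inputs -> 3 units
    ([[0.4, -0.6, 0.2], [0.7, 0.1, -0.3]], [0.0, 0.0]),          # 3 -> 2 units
    ([[1.0, -1.0]], [0.0]),                                      # 2 -> 1 output unit
]
print(mlp_forward([1.0, 0.5], params))
```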

    30. Neural Network Topologies

    31. TDNN
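A TDNN unit sees the current frame together with several delayed frames. It applies the same weights at every time step, so its response shifts along with the input. The following is a minimal sketch assuming a single tanh unit; the function name and argument layout are hypothetical.

```python
import math


def tdnn_unit(frames, weights, bias):
    """One time-delay unit applied at every step where its full delay window fits.

    frames:  list of T feature vectors (each of length J)
    weights: list of D+1 weight vectors; weights[d] multiplies the frame delayed by d
    The output at time t uses frames t, t-1, ..., t-D with the same weights for every t.
    """
    D = len(weights) - 1
    outputs = []
    for t in range(D, len(frames)):
        a = bias
        for d, w in enumerate(weights):
            a += sum(wj * xj for wj, xj in zip(w, frames[t - d]))
        outputs.append(math.tanh(a))  # tanh nonlinearity (assumed)
    return outputs


W = [[1.0], [0.5]]  # weights for delay 0 and delay 1 on 1-dimensional frames
print(tdnn_unit([[0.0], [1.0], [0.0], [0.0], [0.0]], W, 0.0))
```

Shifting the input pattern by one frame shifts the outputs by one step, which is the shift invariance that makes TDNNs suited to speech.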

    32. Neural Network Structures for Speech Recognition

    33. Neural Network Structures for Speech Recognition

    34. Hybrid Methods • Hybrid neural network and matched filter for recognition [Figure: speech passes through delays to form acoustic features, which feed a pattern classifier with output units.]

    35. Neural Network Properties • The system is simple, but training requires many iterations • No specific structure is prescribed • Despite its simplicity, it gives good results • The training set is large, so training should be done offline • Accuracy is relatively good

    36. Pre-processing • Different preprocessing techniques are employed as the front end for speech recognition systems • The choice of preprocessing method is based on the task, the noise level, the modeling tool, etc.

    37. MFCC Method • The MFCC method is based on how the human ear perceives sounds. • MFCC performs better than other features in noisy environments. • MFCC was originally proposed for speech recognition applications, but it also performs well in speaker recognition. • The human ear's unit of hearing is the mel, which is obtained from the following relation:
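The slide's mel equation did not survive extraction. The sketch below uses one commonly cited form, Mel(f) = 2595 log10(1 + f/700), together with its inverse. Treat this formula as an assumption here, since other variants exist.

```python
import math


def hz_to_mel(f_hz):
    """Commonly used mel-scale formula (assumed; the slide's own equation is lost)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


print(hz_to_mel(1000.0))  # close to 1000 mel by construction of the scale
```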

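    38. MFCC Steps Step 1: Map the signal from the time domain to the frequency domain using a short-time FFT: $X(m) = \sum_{n=0}^{F-1} Z(n)\, W(n)\, W_F^{mn}$ • Z(n): speech signal • W(n): window function, such as the Hamming window • $W_F = e^{-j2\pi/F}$ • m = 0, …, F - 1 • F: length of the speech frame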
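Step 1 for a single frame can be sketched as a direct DFT of the windowed frame, using the same factor W_F = e^{-j2π/F}. The Hamming coefficients (0.54, 0.46) are the standard ones. The function names are illustrative, and a practical front end would use an FFT to compute the same values.

```python
import cmath
import math


def hamming(F):
    """Hamming window W(n) = 0.54 - 0.46 cos(2*pi*n / (F - 1)), n = 0..F-1 (F > 1)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (F - 1)) for n in range(F)]


def frame_spectrum(z):
    """X(m) = sum_n Z(n) W(n) W_F^(m n), with W_F = exp(-j 2 pi / F), m = 0..F-1.

    Direct O(F^2) DFT for clarity; an FFT gives the same result faster.
    """
    F = len(z)
    w = hamming(F)
    W_F = cmath.exp(-2j * math.pi / F)
    return [sum(z[n] * w[n] * W_F ** (m * n) for n in range(F)) for m in range(F)]


# A cosine with 4 cycles per 32-sample frame should peak at bin m = 4.
frame = [math.cos(2 * math.pi * 4 * n / 32) for n in range(32)]
mags = [abs(v) for v in frame_spectrum(frame)]
print(max(range(17), key=lambda m: mags[m]))
```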

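    39. MFCC Steps Step 2: Find the energy of each filter-bank channel: $E_k = \sum_{m=0}^{F-1} |X(m)|^2\, H_k(m)$, k = 1, …, M, where X(m) is the short-time spectrum from Step 1. M is the number of filter banks based on the mel scale, and $H_k(m)$ is the function of the k-th filter in the bank.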

    40. Distribution of the Filters Based on the Mel Scale
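Step 2 can be sketched with triangular filters whose centers are equally spaced on the mel scale between 0 Hz and the Nyquist frequency. The slide's figure is not recoverable, so the triangle shape, the spacing, and the mel formula are all assumptions. Following the |FFT|² stage of the mel-cepstrum pipeline, the energies use the squared magnitude spectrum over bins 0..F/2.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)  # assumed mel formula


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(M, F, sample_rate):
    """M triangular filters H_k over DFT bins 0..F//2, centers equally spaced in mel."""
    n_bins = F // 2 + 1
    top = hz_to_mel(sample_rate / 2)
    # M + 2 equally spaced mel points give the edges and centers of M triangles (in bins).
    edges = [mel_to_hz(top * i / (M + 1)) * F / sample_rate for i in range(M + 2)]
    bank = []
    for k in range(1, M + 1):
        lo, mid, hi = edges[k - 1], edges[k], edges[k + 1]
        row = []
        for b in range(n_bins):
            if lo < b <= mid:
                row.append((b - lo) / (mid - lo))   # rising edge
            elif mid < b < hi:
                row.append((hi - b) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def filterbank_energies(power_spectrum, bank):
    """E_k = sum_m |X(m)|^2 H_k(m) for each of the M channels."""
    return [sum(p * h for p, h in zip(power_spectrum, row)) for row in bank]


bank = mel_filterbank(10, 512, 16000)
print(filterbank_energies([1.0] * 257, bank))
```

Higher filters are wider in Hz, so a flat spectrum gives larger energies in the upper channels.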

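    41. MFCC Steps • Step 4: Compress the spectrum (logarithm) and apply the DCT to obtain the MFCC coefficients: $c(n) = \sum_{k=1}^{M} \log(E_k) \cos\left(\frac{\pi n (k - 0.5)}{M}\right)$, where $E_k$ is the energy of the k-th of the M mel filter-bank channels • In the relation above, n = 0, …, L is the order of the MFCC coefficients.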
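Step 4 can be sketched by taking the logarithm of each channel energy and then applying the cosine transform written above. The sketch assumes all energies are positive; a real implementation would add a small floor before the logarithm.

```python
import math


def mfcc_from_energies(energies, L):
    """c(n) = sum_{k=1}^{M} log(E_k) cos(pi * n * (k - 0.5) / M), for n = 0..L."""
    M = len(energies)
    logs = [math.log(e) for e in energies]  # spectrum compression (energies must be > 0)
    return [sum(logs[k - 1] * math.cos(math.pi * n * (k - 0.5) / M) for k in range(1, M + 1))
            for n in range(L + 1)]


print(mfcc_from_energies([math.e] * 8, 4))
```

A flat log spectrum puts all its weight in c(0), and the higher coefficients vanish. This illustrates how the DCT separates the overall level from the spectral shape.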

    42. Mel-Cepstrum Method [Figure: time signal → framing → |FFT|² → mel scaling → logarithm → IDCT → cepstra (low-order coefficients); a differentiator then produces delta and delta-delta cepstra.]
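The differentiator stage is commonly implemented as a linear regression over neighboring frames. The sketch below assumes that regression formula with a window of N = 2 frames and pads by repeating the edge frames. Applying it twice gives the delta-delta coefficients.

```python
def deltas(frames, N=2):
    """Delta features d_t = sum_{n=1..N} n (c_{t+n} - c_{t-n}) / (2 sum_{n=1..N} n^2).

    frames: list of cepstral vectors; edge frames are repeated as padding (assumed).
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        d = [0.0] * len(frames[t])
        for n in range(1, N + 1):
            ahead = frames[min(t + n, T - 1)]
            behind = frames[max(t - n, 0)]
            for i in range(len(d)):
                d[i] += n * (ahead[i] - behind[i])
        out.append([v / denom for v in d])
    return out


ramp = [[float(t)] for t in range(10)]  # coefficient rising by 1 per frame
print(deltas(ramp)[2:8])
delta_delta = deltas(deltas(ramp))
```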

    43. Mel-Cepstrum Coefficients (MFCC)

    44. Properties of the Mel-Cepstrum (MFCC) • Maps the mel filter-bank energies onto directions in which their variance is maximized (using the DCT) • Makes the speech features partially (not completely) independent of each other (effect of the DCT) • Good response in clean environments • Reduced performance in noisy environments

    45. Time-Frequency analysis • Short-term Fourier Transform • Standard way of frequency analysis: decompose the incoming signal into the constituent frequency components. • W(n): windowing function • N: frame length • p: step size
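The sketch below combines the three quantities listed on the slide. It slides a Hamming-windowed frame of length N over the signal in steps of p samples and takes an N-point DFT of each frame. The Hamming window is an assumed choice of W(n), and the direct DFT is used only for clarity.

```python
import cmath
import math


def stft(x, N, p):
    """Short-term Fourier transform: frames of length N every p samples,
    each multiplied by a Hamming window W(n) and transformed with an N-point DFT.

    Returns one list of N complex coefficients per frame.
    """
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
    frames = []
    for start in range(0, len(x) - N + 1, p):
        seg = [x[start + n] * window[n] for n in range(N)]
        frames.append([sum(seg[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
                       for k in range(N)])
    return frames


signal = [math.cos(2 * math.pi * 4 * n / 32) for n in range(100)]
spec = stft(signal, N=32, p=16)
print(len(spec))  # 1 + (100 - 32) // 16 frames
```

Each frame is one column of a spectrogram. A smaller step p gives finer time resolution at the cost of more frames, while the frame length N sets the frequency resolution.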

    46. Critical band integration • Related to the masking phenomenon: the threshold of a sinusoid is elevated when its frequency is close to the center frequency of a narrow-band noise • Frequency components within a critical band are not resolved; the auditory system interprets the signals within a critical band as a whole