8- Speech Recognition - PowerPoint PPT Presentation

Presentation Transcript
8-Speech Recognition
  • Speech Recognition Concepts
  • Speech Recognition Approaches
  • Recognition Theories
  • Bayes' Rule
  • Simple Language Model
  • P(A|W) Network Types
7-Speech Recognition (Cont’d)
  • HMM Calculating Approaches
  • Neural Components
  • Three Basic HMM Problems
  • Viterbi Algorithm
  • State Duration Modeling
  • Training In HMM
Recognition Tasks
  • Isolated Word Recognition (IWR)
  • Connected Word (CW) and Continuous Speech Recognition (CSR)
  • Speaker Dependent, Multiple Speaker, and Speaker Independent
  • Vocabulary Size
    • Small <20
    • Medium >100 , <1000
    • Large >1000, <10000
    • Very Large >10000
Speech Recognition Concepts

Speech recognition is the inverse of speech synthesis:

Speech Synthesis: Text → NLP → Speech Processing → Speech
Speech Recognition (Understanding): Speech → Speech Processing → Phone Sequence → NLP → Text

Speech Recognition Approaches
  • Bottom-Up Approach
  • Top-Down Approach
  • Blackboard Approach
Bottom-Up Approach

Processing pipeline: Signal Processing → Feature Extraction → Segmentation → Lexical Access → Recognized Utterance

Knowledge Sources applied along the way: Voiced/Unvoiced/Silence decisions, Sound Classification Rules, Phonotactic Rules, Language Model

Top-Down Approach

Knowledge sources: Inventory of speech recognition units, Word Dictionary, Task Model, Grammar, Semantics

Processing: Feature Analysis → Unit Matching System → Lexical Hypothesis → Syntactic Hypothesis → Semantic Hypothesis → Utterance Verifier/Matcher → Recognized Utterance

Blackboard Approach

A shared Blackboard is read and written by independent knowledge processes: Acoustic Processes, Lexical Processes, Environmental Processes, Semantic Processes, and Syntactic Processes.

Recognition Theories
  • Articulatory Based Recognition
    • Uses the articulatory system for recognition
    • This theory has been the most successful so far
  • Auditory Based Recognition
    • Uses the auditory system for recognition
  • Hybrid Based Recognition
    • A hybrid of the above theories
  • Motor Theory
    • Models the intended gestures of the speaker
Recognition Problem
  • We have a sequence of acoustic symbols and want to find the words expressed by the speaker
  • Solution: find the most probable word sequence given the acoustic symbols
Recognition Problem
  • A : Acoustic Symbols
  • W : Word Sequence
  • We should find Ŵ so that Ŵ = argmax_W P(W|A)
  • By Bayes' rule, P(W|A) = P(A|W)·P(W) / P(A); since P(A) does not depend on W, Ŵ = argmax_W P(A|W)·P(W)
Simple Language Model

P(W) = P(w1)·P(w2|w1)·P(w3|w1,w2)·…·P(wn|w1,…,wn−1)

Computing this probability is very difficult and requires a very large database, so trigram and bigram models are used instead.

Simple Language Model (Cont'd)

Trigram: P(W) ≈ ∏ P(wi | wi−2, wi−1)

Bigram: P(W) ≈ ∏ P(wi | wi−1)

Monogram: P(W) ≈ ∏ P(wi)

Simple Language Model (Cont'd)

Computing Method:

P(w3 | w1, w2) = (number of occurrences of W1W2W3) / (total number of occurrences of W1W2)

Ad Hoc Method (interpolating the trigram, bigram, and monogram relative frequencies):

P(w3 | w1, w2) ≈ λ1·f(w3 | w1, w2) + λ2·f(w3 | w2) + λ3·f(w3)
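The counting method above can be sketched in Python; the function name and the toy corpus are illustrative only:

```python
from collections import Counter

def train_trigram(tokens):
    """MLE trigram estimates: P(w3 | w1, w2) = C(w1 w2 w3) / C(w1 w2)."""
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))  # trigram counts
    bi = Counter(zip(tokens, tokens[1:]))               # bigram counts
    return {(w1, w2, w3): c / bi[(w1, w2)] for (w1, w2, w3), c in tri.items()}

corpus = "the cat sat on the mat the cat ran".split()
p = train_trigram(corpus)
# "the cat" occurs twice, once followed by "sat", so P(sat | the, cat) = 0.5
```

Unseen trigrams get no entry at all here, which is exactly why the interpolated ad hoc method above is needed in practice.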

Error Production Factors
  • Prosody (recognition should be prosody independent)
  • Noise (noise should be suppressed)
  • Spontaneous Speech
P(A|W) Computing Approaches
  • Dynamic Time Warping (DTW)
  • Hidden Markov Model (HMM)
  • Artificial Neural Network (ANN)
  • Hybrid Systems
Dynamic Time Warping

Search Limitations:
  - Start & End Interval
  - Global Limitation
  - Local Limitation

Dynamic Time Warping

Global Limitation: (figure of the allowed global search region omitted)

Local Limitation: (figure of the allowed local path transitions omitted)
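The DTW recursion itself can be sketched as follows (a minimal version under the basic symmetric local constraint; the global band limitation is omitted for brevity, and the function name and distance choice are illustrative):

```python
def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Accumulated-cost DTW: each cell extends the cheapest of its
    horizontal, vertical, and diagonal predecessors (local constraint)."""
    INF = float("inf")
    n, m = len(x), len(y)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0  # start-point constraint: paths begin at (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]  # end-point constraint: paths end at (n, m)
```

A global limitation would simply skip cells (i, j) outside the allowed band around the diagonal.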

Artificial Neural Network

Simple Computational Element of a Neural Network (figure omitted)

Artificial Neural Network (Cont'd)
  • Neural Network Types
    • Perceptron
    • Time Delay Neural Network (TDNN)
Artificial Neural Network (Cont'd)

Single Layer Perceptron (figure omitted)

Artificial Neural Network (Cont'd)

Three Layer Perceptron (figure omitted)
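The layered perceptron above can be sketched as a forward pass (the weights are toy values and the sigmoid activation is an assumption, chosen for illustration):

```python
import math

def neuron(inputs, weights, bias):
    """One computational element: weighted sum plus bias, through a sigmoid."""
    s = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-s))

def layer(inputs, weight_rows, biases):
    """A fully connected layer: one neuron per row of weights."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# Forward pass through a tiny two-input network (toy weights, illustrative only)
x = [0.5, -1.0]
h = layer(x, [[1.0, 0.5], [-0.5, 1.0]], [0.0, 0.1])  # hidden layer, 2 neurons
y = layer(h, [[1.0, -1.0]], [0.0])                   # output layer, 1 neuron
```

Stacking two such layers between input and output gives the three-layer perceptron of the slide.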

Hybrid Methods
  • Hybrid Neural Network and Matched Filter for Recognition

Speech → Delays → PATTERN CLASSIFIER (acoustic features) → Output Units

Neural Network Properties
  • The system is simple, but training requires many iterations
  • Does not assume a specific model structure
  • Despite its simplicity, the results are good
  • The training set is large, so training should be offline
  • Accuracy is relatively good
Pre-processing
  • Different preprocessing techniques are employed as the front end for speech recognition systems
  • The choice of preprocessing method is based on the task, the noise level, the modeling tool, etc.
The MFCC Method
  • MFCC is based on how the human ear perceives sounds.
  • MFCC performs better than other features in noisy environments.
  • MFCC was originally proposed for speech recognition applications, but it also performs well for speaker recognition.
  • The auditory unit of the human ear is the Mel, obtained from the relation Mel(f) = 2595 · log10(1 + f/700).
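The mel relation, in its commonly used form, can be written directly in Python (function names are illustrative):

```python
import math

def hz_to_mel(f_hz):
    """Mel(f) = 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mapping, back from mels to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

Note the scale is roughly linear below 1 kHz (hz_to_mel(1000) ≈ 1000) and logarithmic above, matching the ear's resolution.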
Steps of the MFCC Method

Step 1: Map the signal from the time domain to the frequency domain using the short-time FFT.

Z(n): the speech signal
W(n): a window function, such as the Hamming window
W_F = e^(−j2π/F)
m = 0, …, F − 1
F: the length of the speech frame

Steps of the MFCC Method

Step 2: Find the energy of each filter-bank channel.

M is the number of mel-scale filter banks.
The filter functions of the filter bank.

Steps of the MFCC Method
  • Step 4: Compress the spectrum and apply the DCT to obtain the MFCC coefficients.
  • In the relation above, n = 0, …, L is the order of the MFCC coefficients.
The Mel-Cepstrum Method

Time-domain signal → Framing → |FFT|² → Mel-scaling → Logarithm → IDCT → Cepstra

The low-order coefficients are kept; a Differentiator then produces the Delta & Delta-Delta Cepstra.
Properties of the Mel-Cepstrum (MFCC)
  • Maps the mel filter-bank energies in the direction of maximum variance (using the DCT)
  • Makes the speech features partially independent of one another (an effect of the DCT)
  • Good performance in clean environments
  • Reduced performance in noisy environments
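The per-frame pipeline can be sketched end to end; this is a minimal illustration, not a production implementation (a naive DFT is used for clarity where a real system would use an FFT, and the tiny filterbank in the example is hypothetical):

```python
import math

def dft_power(frame):
    """|FFT|^2 of one frame via a naive DFT (illustrative; O(N^2))."""
    N = len(frame)
    powers = []
    for k in range(N // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * n / N) for n, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * n / N) for n, x in enumerate(frame))
        powers.append(re * re + im * im)
    return powers

def dct(values, n_coeffs):
    """DCT-II, used to decorrelate the log filter-bank energies."""
    M = len(values)
    return [sum(v * math.cos(math.pi * n * (m + 0.5) / M) for m, v in enumerate(values))
            for n in range(n_coeffs)]

def mfcc_frame(frame, fbank, n_coeffs=13):
    """One frame through the pipeline: |FFT|^2 -> filterbank -> log -> DCT.
    fbank is a list of filters, each a list of (fft_bin, weight) pairs."""
    p = dft_power(frame)
    energies = [max(sum(w * p[i] for i, w in f), 1e-10) for f in fbank]
    return dct([math.log(e) for e in energies], n_coeffs)

# Toy usage: a sine at FFT bin 5 and a hypothetical 3-filter "bank"
frame = [math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
fbank = [[(3, 1.0)], [(5, 1.0)], [(7, 1.0)]]
coeffs = mfcc_frame(frame, fbank, n_coeffs=3)
```

In a real front end the filters would be triangular and spaced on the mel scale of the previous slide.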
Time-Frequency analysis
  • Short-term Fourier Transform
    • Standard way of frequency analysis: decompose the incoming signal into its constituent frequency components.
    • W(n): windowing function
    • N: frame length
    • p: step size
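The short-term Fourier transform with the symbols above (window W(n), frame length N, step size p) can be sketched as follows; the Hamming window choice is an assumption for illustration:

```python
import cmath
import math

def stft(signal, frame_len, step):
    """Short-term Fourier transform: slide a window of length frame_len by
    `step` samples and take the DFT of each windowed frame."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]  # Hamming window W(n)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, step):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = [sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame))
                    for k in range(frame_len // 2 + 1)]
        frames.append(spectrum)
    return frames

# A sine at DFT bin 4 shows up as a spectral peak at index 4 in every frame
signal = [math.sin(2 * math.pi * 4 * n / 32) for n in range(64)]
frames = stft(signal, frame_len=32, step=16)
```

Each output row is one time slice; |spectrum[k]|² gives the power used by the critical-band integration of the next slide.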
Critical band integration
  • Related to the masking phenomenon: the threshold of a sinusoid is elevated when its frequency is close to the center frequency of a narrow-band noise
  • Frequency components within a critical band are not resolved; the auditory system interprets the signals within a critical band as a whole
Feature orthogonalization
  • Spectral values in adjacent frequency channels are highly correlated
  • The correlation results in a Gaussian model with many parameters: all the elements of the covariance matrix have to be estimated
  • Decorrelation is useful to improve the parameter estimation

Language Models for LVCSR

Word Pair Model: Specify which word pairs are valid
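A word pair model can be represented as a table of allowed successors; the vocabulary and pairs below are made up for illustration:

```python
# Word Pair grammar sketch: for each word, the set of words allowed to follow it
valid_pairs = {
    "show": {"me", "all"},
    "me": {"flights"},
    "all": {"flights"},
}

def sentence_valid(words):
    """A word sequence is valid iff every adjacent pair is a listed valid pair."""
    return all(b in valid_pairs.get(a, set()) for a, b in zip(words, words[1:]))
```

During decoding, the recognizer would simply refuse to extend a hypothesis with any word not in the successor set of the previous word.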

Perplexity of the Language Model

Entropy of the Source:

H = lim (Q→∞) −(1/Q) · Σ over all sequences (w1, …, wQ) of P(w1, …, wQ) · log2 P(w1, …, wQ)

First-order entropy of the source:

H1 = −Σ_w P(w) · log2 P(w)

If the source is ergodic, meaning its statistical properties can be completely characterized in a sufficiently long sequence that the source puts out, then

H = lim (Q→∞) −(1/Q) · log2 P(w1, w2, …, wQ)

We often compute H based on a finite but sufficiently large Q:

H ≈ −(1/Q) · log2 P(w1, w2, …, wQ)

H is the degree of difficulty that the recognizer encounters, on average, when it is to determine a word from the same source.

If an N-gram language model PN(W) is used, an estimate of H is:

Ĥ = −(1/Q) · log2 PN(w1, w2, …, wQ)

In general: Ĥ ≥ H

Perplexity is defined as: PP = 2^H
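Given the per-word probabilities the language model assigns to a test sequence, perplexity follows directly from the definition above; a minimal sketch (function name illustrative):

```python
import math

def perplexity(word_probs):
    """Perplexity = 2^H, where H is the average negative log2
    probability per word over the test sequence."""
    H = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2.0 ** H

# A model that assigns every word probability 1/64 has perplexity 64,
# i.e. the recognizer faces a 64-way choice per word on average
pp_uniform = perplexity([1 / 64] * 10)
```

Lower perplexity means the language model constrains the recognizer's word choices more tightly.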