Speaker Recognition

Sharat S. Chikkerur, S. Anand Mantravadi, Rajeev K. Srinivasan

Speaker Recognition

[Diagram: Speaker Recognition divides into Speaker Identification, Speaker Detection, and Speaker Verification; the diagram also shows two Text Dependent / Text Independent pairs beneath these branches.]

  • Definition
    • Speaker recognition is the method of recognizing a person based on their voice
    • It is one of the forms of biometric identification
  • It depends on speaker-dependent characteristics

EE 516 Term Project, Fall 2003

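Speech production

[Figure: speech production mechanism]

[Block diagram: speech production model. A pitch-controlled impulse train generator with gain Av drives the glottal pulse model G(z); a noise source with gain AN provides the alternative excitation; the excitation passes through the vocal tract model V(z) and the radiation model R(z).]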

Generic Speaker Recognition System

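[Block diagram: Verification path: Speech signal, Preprocessing (analysis frames), Feature Extraction (feature vector), Pattern Matching, Score. Enrollment path: Preprocessing, Feature Extraction, Speaker Model.]

  • Preprocessing
    • A/D conversion
    • End point detection
    • Pre-emphasis filter
    • Segmentation
  • Feature extraction
    • LAR (log area ratios)
    • Cepstrum
    • LPCC (linear prediction cepstral coefficients)
    • MFCC (mel-frequency cepstral coefficients)
  • Pattern matching
    • Stochastic models
      • GMM (Gaussian mixture model)
      • HMM (hidden Markov model)
    • Template models
      • DTW (dynamic time warping)
      • Distance measures
  • Choice of features
    • Differentiating factors between speakers include vocal tract shape and behavioral traits
    • Features should have high inter-speaker and low intra-speaker variation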

Our Approach

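[Block diagram, blocks in the order they appear: Silence Removal, Cepstrum Coefficients, Cepstral Normalization, Long time average, Polynomial Function Expansion, Reference Template, Dynamic Time Warping, Distance Computation. The blocks are organized into the following stages:]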

  • Preprocessing
  • Feature Extraction
  • Speaker model
  • Matching

Silence Removal

Pre-emphasis
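
The slide names this step but does not state which filter was used. A common choice is the first-order high-pass filter y[n] = x[n] - a*x[n-1]. The sketch below assumes that form; the function name `pre_emphasize` and the coefficient 0.95 are illustrative assumptions, not values taken from the project.

```python
def pre_emphasize(samples, alpha=0.95):
    """First-order pre-emphasis: y[n] = x[n] - alpha * x[n-1].

    alpha=0.95 is an assumed, commonly used value; the slides do not give one.
    The first sample is passed through unchanged because it has no predecessor.
    """
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


if __name__ == "__main__":
    # A constant (DC) signal is strongly attenuated after the first sample.
    print(pre_emphasize([1.0, 1.0, 1.0, 1.0]))
```

The filter boosts high frequencies relative to low ones, which is why a constant input collapses to small values after the first sample.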

Segmentation
  • Short time analysis
    • The speech signal is segmented into overlapping ‘Analysis Frames’
    • The speech signal is assumed to be stationary within each frame
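
The segmentation into overlapping analysis frames can be sketched as follows. This is a minimal illustration: the function name `frame_signal` and the parameters `frame_len` and `hop` are assumptions, and the slides do not state the frame length or overlap that was used.

```python
def frame_signal(samples, frame_len, hop):
    """Split a sequence into overlapping frames.

    frame_len: number of samples per frame (assumed parameter).
    hop: shift between consecutive frame starts; hop < frame_len gives overlap.
    Trailing samples that do not fill a whole frame are dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += hop
    return frames


if __name__ == "__main__":
    # Frames of 4 samples with a hop of 2, i.e. 50% overlap.
    for frame in frame_signal(list(range(10)), frame_len=4, hop=2):
        print(frame)
```

Each frame is then treated as a short, approximately stationary segment for feature extraction.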

Feature Representation

Speech signal and spectrum of two users uttering ‘ONE’

Vocal Tract modeling

[Figure panels: Signal Spectrum; Smoothed Signal Spectrum]

  • The smoothed spectrum indicates the locations of the formants of each user
  • The smoothed spectrum is obtained from the cepstral coefficients

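Cepstral coefficients

[Block diagram: speech production model, with blocks labelled Pitch, Av, P[n], G(z), V(z), R(z), u[n], and AN]

[Block diagram: homomorphic system for convolution, where * denotes convolution]
x1[n]*x2[n] → D[] → x1'[n]+x2'[n] → L[] → y1'[n]+y2'[n] → D-1[] → y1[n]*y2[n]

[Block diagram: the system D[]]
x1[n]*x2[n] → DFT[] → X1(z)X2(z) → LOG[] → log(X1(z)) + log(X2(z)) → IDFT[] → x1'[n]+x2'[n]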
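
A minimal sketch of the DFT, LOG, IDFT chain named on this slide. It computes the real cepstrum, that is, it takes the logarithm of the DFT magnitude and discards the phase, which is a simplifying assumption rather than something the slides specify. The helper names (`dft`, `idft`, `real_cepstrum`, `circular_convolve`) are illustrative, and the naive O(N^2) transform is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(N^2)), for illustration only."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def real_cepstrum(frame, floor=1e-12):
    """Real cepstrum: IDFT of the log magnitude of the DFT.

    `floor` guards against log(0) and is an assumed safeguard.
    """
    log_mag = [math.log(max(abs(v), floor)) for v in dft(frame)]
    return [c.real for c in idft(log_mag)]


def circular_convolve(a, b):
    """Circular convolution of two equal-length sequences."""
    n = len(a)
    return [sum(a[k] * b[(t - k) % n] for k in range(n)) for t in range(n)]


if __name__ == "__main__":
    x1 = [1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    x2 = [1.0, -0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    combined = real_cepstrum(circular_convolve(x1, x2))
    summed = [p + q for p, q in zip(real_cepstrum(x1), real_cepstrum(x2))]
    # Convolution in time becomes addition in the cepstral domain.
    print(max(abs(p - q) for p, q in zip(combined, summed)))
```

The usage example checks the property the diagram illustrates: the cepstrum of a convolution of two signals is the sum of their individual cepstra, which is what allows the excitation and vocal tract contributions to be separated.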

Speaker Model

F1 = [a1…a10, b1…b10]
F2 = [a1…a10, b1…b10]
…
FN = [a1…a10, b1…b10]

Dynamic Time Warping
  • The DTW warping path through the n-by-m matrix is the path with the minimum average cumulative cost. The unmarked area is the constraint region within which the path is allowed to go.
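
A minimal sketch of DTW between two sequences of feature vectors, under stated assumptions: the local cost is the Euclidean distance, the allowed steps are horizontal, vertical, and diagonal, the constraint region is modelled as a band of width `band` around the diagonal, and the cumulative cost is divided by n + m. None of these choices are confirmed by the slides, and the names `euclidean`, `dtw_distance`, and `band` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, band=None):
    """Length-normalized DTW distance between two feature-vector sequences.

    band: maximum allowed |i - j| along the path (assumed form of the
    constraint region); None means the path is unconstrained.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    if band is None:
        band = max(n, m)
    band = max(band, abs(n - m))  # keep the end point (n, m) reachable
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - band), min(m, i + band) + 1):
            local = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # vertical step
                                     cost[i][j - 1],      # horizontal step
                                     cost[i - 1][j - 1])  # diagonal step
    return cost[n][m] / (n + m)


if __name__ == "__main__":
    template = [[0.0], [1.0], [2.0]]
    slower = [[0.0], [1.0], [1.0], [2.0]]  # same shape, stretched in time
    print(dtw_distance(template, slower))
```

The usage example shows why warping is useful for template matching: a time-stretched copy of the template aligns with zero cost even though the two sequences have different lengths.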

Results
  • Distances are normalized with respect to the length of the speech signal
  • The intra-speaker distance is less than the inter-speaker distance
  • The distance matrix is symmetric
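
One way to check these properties on a set of utterances is sketched below. The helper names `pairwise_distances` and `mean_intra_inter` are hypothetical, the distance function is passed in as a parameter (any symmetric measure would do), and the toy numbers in the usage example are made up for illustration and are not the project's results.

```python
def pairwise_distances(items, dist):
    """Build a symmetric distance matrix, evaluating each pair once."""
    n = len(items)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = dist(items[i], items[j])
    return matrix


def mean_intra_inter(matrix, labels):
    """Mean distance between same-speaker pairs and different-speaker pairs."""
    intra, inter = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            bucket = intra if labels[i] == labels[j] else inter
            bucket.append(matrix[i][j])

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(intra), mean(inter)


if __name__ == "__main__":
    # Toy scalar "utterances" from two hypothetical speakers, A and B.
    utterances = [0.0, 0.1, 1.0, 1.2]
    speakers = ["A", "A", "B", "B"]
    matrix = pairwise_distances(utterances, lambda a, b: abs(a - b))
    intra, inter = mean_intra_inter(matrix, speakers)
    print(intra < inter)
```

Because each pair is evaluated once and mirrored, the matrix is symmetric by construction, which halves the number of distance computations.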

Matlab Implementation