speaker verification system part b final presentation l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Speaker Verification System Part B Final Presentation PowerPoint Presentation
Download Presentation
Speaker Verification System Part B Final Presentation

Loading in 2 Seconds...

play fullscreen
1 / 40

Speaker Verification System Part B Final Presentation - PowerPoint PPT Presentation


  • 275 Views
  • Uploaded on

Speaker Verification System Part B Final Presentation. Performed by: Barak Benita & Daniel Adler Instructor: Erez Sabbag . Implementation of a speaker verification algorithm on a DSP The verification module will perform a real time authentication of the user based on sampled voice data.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Speaker Verification System Part B Final Presentation' - emily


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
speaker verification system part b final presentation

Speaker Verification SystemPart B Final Presentation

Performed by: Barak Benita & Daniel Adler

Instructor: Erez Sabbag

the project goal
Implementation of a speaker verification algorithm on a DSP

The verification module will perform a real time authentication of the user based on sampled voice data.

The idea is to integrate the speaker verification model with other security and management models allowing them to grant access to resources based on the speakers voice verification.

The Project Goal
introduction
Introduction

Speaker verification is the process of automatically authenticating the speaker on the basis of individual information included in speech waves.

Speaker’s Identity (Reference)

Speaker

Verification

System

Speaker’s

Voice

Segment

Result [0:1]

system overview
System Overview

BT

Base Station

Access

Denied

My name is Bob!

LAN

Speaker

Verification

Unit

Server

BT

Base Station

LAN

system description
System Description

The system is compound from TI’s C6701floating point DSP with the speaker verification algorithm on it. A user with a hand device (e.g. bluetooth on a PDA), will receive access to different resources ( door opening, file access, etc) based on a voice verification process.

The project implements only the speaker verification algorithm on the DSP and has input and output interfaces to interact with other devices (e.g. Bluetooth).

The DSP is encoded with the users voice signature. Each time user verification is needed, the algorithm compares the speakers voice with the signature.

3

system block diagram
System Block Diagram

DSP

Signature

parameters

Enrollment

Server

(training phase – building

A signature)

Codec

Verification Channel

Voice Channel

(optional)

Bluetooth

Radio

Interface

  • Bluetooth
  • unit

Codec

Bluetooth

Base station

Authorization

Server

LAN

Voice Channel

(optional)

Voice Channel

“My name is Bob”

5

project description
Project Description:
  • Part One:
  • Literature review
  • Algorithms selection
  • MATLAB implementation
  • Result analysis
  • Part Two:
  • Implementation of the chosen algorithm on a DSP
speaker verification process
Speaker Verification Process

Analog Speech

Pre-Processing

Feature

Extraction

Pattern

Matching

Reference

Model

Decision

Result [0:1]

implemented algorithms feature extraction module mfcc
Implemented Algorithms: Feature Extraction Module – MFCC

MFCC (Mel Frequency Cepstral Coefficients) is the most common technique for feature extraction. MFCC tries to mimic the way our ears work by analyzing the speech waves linearly at low frequencies and logarithmically at high frequencies.

The idea acts as follows:

Spectrum

Mel Spectrum

Cepstrum

Mel Cepstrum

FFT

Mel-frequency

Wrapping

Windowed

PDS Frame

implemented algorithms pattern matching modeling module vector quantization vq

Feature Vector

Reference

Model = Codebook

Pattern Matching

=

Distortion measure

Distortion Rate

Implemented Algorithms:Pattern Matching Modeling Module – Vector Quantization (VQ)

In the enrolment part we build a codebook of the speaker according to the LBG (Linde, Buzo, Gray) algorithm, which creates an N size codebook from set of L feature vectors.

In the verification stage, we are measuring the distortion of the given sequence of the feature vectors to the reference codebook.

implemented algorithms decision
Implemented Algorithms: Decision

In VQ the decision is based on checking if the distortion rate is higher than a preset threshold: acceptance if distortion rate > t, else rejection.

In this project no decision model will be build, the output of the system will be based on the following score rate (values between 0 to 1), which indicates the suitability of the person to the reference model:

Score = exp (-mean distance)

implementation environment
Implementation Environment
  • Hardware tools:
  • TI DSP 6701 EVM board
  • PC host station
  • Software development tools:
  • TI Code Composer
  • Matlab 6.1
  • Programming Languages:
  • C
  • Assembler
  • Matlab
ti dsp 6701 evm
TI DSP 6701 EVM
  • Why?
  • Floating Point
  • Designed Especially for Voice Applications
  • Large Bank of On Chip Memory
  • High level development (C)
  • PCI Interface
  • Why Not?
  • Price
  • Size
  • Consumption
program workflow
Program Workflow

Analog Speech (input)

DSP

Program

Pre-Processing

Feature

Extraction

MATLAB

Program

Reference

Model

Pattern

Matching

Decision

Result [0:1] (output)

step by step implementation
Step By Step Implementation
  • Pre-processing a ‘ones’ vector on the DSP and comparing it to the Matlab results
  • Pre-processing an audio file and comparing to the Matlab results
  • Feature extracting of the audio file (after pre-processing) and comparing to the Matlab results
  • Pattern matching the feature vectors to a ‘ones’ codebook matrix and comparing to the Matlab results (running with the same codebook)
  • Creating a real codebook from a reference speaker importing it to the DSP and comparing the running results of the DSP and the Matlab
  • Verifying that the distances of the speakers from the codebook in the DSP program and in the Matlab program are the same
creating the assembler lookup files
Creating the Assembler Lookup Files
  • Creating the output data through Matlab functions (e.g. hamming(n))
  • Saving the output in an assembler lookup table format
  • Referencing the lookup table with a name that will be called from the C source code in the DSP project (as a function)

hamming = fopen('hamming.asm', 'wt', 'l');

fprintf(hamming, '; hamming.asm - single precision floating point table generated from MATLAB\n');

fprintf(hamming, '\t.def\t_hamming\n');

fprintf(hamming, '\t.sym\t_hamming, _hamming, 54, 2, %d,, %d\n', size, n);

fprintf(hamming, '\t.data\n');

fprintf(hamming, '_hamming:\n');

fprintf(hamming, '\t.word\t%tXh, %tXh, %tXh, %tXh\n', h);

fprintf(hamming, '\n');

fclose(hamming);

  • Importing the file as an asm file (adding a file to the project) to the DSP project

; hamming.asm - single precision floating point table generated from MATLAB

.def _hamming

.sym _hamming, _hamming, 54, 2, 8192,, 256

.data

_hamming:

.word 3DA3D70Ah, 3DA4203Fh, 3DA4FBD3h, 3DA669A4h

.word 3DA86978h, 3DAAFB01h, 3DAE1DD8h, 3DB1D180h

.word 3DB61567h, 3DBAE8E1h, 3DC04B30h, 3DC63B7Dh

.word 3DCCB8DCh, 3DD3C24Bh, 3DDB56B1h, 3DE374E1h

.word 3DEC1B99h, 3DF5497Fh, 3DFEFD27h, 3E049A87h

.word 3E09F7D0h, 3E0F9597h, 3E1572FFh, 3E1B8F1Ch

h = hamming(n);

  • Using the lookup table in the C source code

// ----- Windowing the filtered frame with Hamming ----

for (k=0 ; k < N ; k++){

for (j=0 ; j < N ; j++){

if (k - j < 0) break;

frame[k] += hamming[j]*filtered_frame[k-j];

}

}

binding all the pieces

Generation of assembly functions through Matlab

Generation of voice data file from a *.wav format file through Matlab waveread function

Hamming.asm

Generation of assembly functions through Matlab

Sari5fix.asm

Melbank.asm

Rdct.asm

Generation of assembly functions through Matlab

Codebook.asm

Binding All The Pieces

Analog Speech (input)

DSP

Program

C Code

Pre-Processing

Feature

Extraction

Pattern

Matching

Decision

Result [0:1] (output)

software modules
Software Modules

main

init

O(1)

extract_frame

O(n^2)

digitrev_index

O(n)

bitrev

O(n)

calc_dist

O(1)

hamming

O(1)

bitrev

O(n)

melbank

O(n)

cfftr2_dit

O(nlog(n))

project structure

speakerverification.pjt

Project Structure

Include

Files

board.h

codec.h

dma.h

intr.h

mcbsp.h

link.cmd

pci.h

regs.h

Libraries

verification.h

rts6700.lib

Source

bitrevf.asm

cfftr2.asm

codebook.asm

digitrev_index.c

hamming.asm

melbank.asm

rdct.asm

verification.c

tested system
Tested System
  • The Tested System parameters:
  • The tested algorithms and methods were the MFCC and VQ with the following parameters:
    • Sampling Frequency: 11025Hz
    • Feature Vector Size: 18
    • Window Size: 256
    • Offset Size: 128
    • Codebook Size: 128
    • Number of iterations for codebook creation: 25
  • We compared between the Matlab and DSP results based on a codebook created from Daniel’s 60 seconds of random speech and random selection of different five seconds speakers.
verifications
Verifications
  • The DSP results were compared to the Matlab simulation.
  • We chose random speakers from the speakers DB with one reference codebook.
  • For Example:
  • PersonMATLABDSP
  • Daniel 66.95% (0.4011) 66.95% (0.4011)
  • Barak 44.01% (0.8206) 44.01% (0.8206)
  • Ayelet 43.61% (0.8299) 43.61% (0.8299)
  • Diego 53.97% (0.6166) 53.97% (0.6166)
  • Adi 42.07% (0.8656) 42.07% (0.8656)
conclusions
Conclusions
  • The TI DSP 6701 EVM is capable of preforming speaker
  • verification analysis and achieve high resolution results (as
  • achieved in the Matlab)
  • Speaker Verification algorithms are not mature enough to
  • become a good biometric detection solution
  • Code Composer is not stable and good enough to become an “easy
  • to use” development environment
  • A second phase project, which will implement a complete verification system should be build
time table first semester
Time Table – First Semester

14.11.01 – Project description presentation

15.12.01 – completion of phase A: literature review and algorithm selection

25.12.01 – Handing out the mid-term report

25.12.01 – Beginning of phase B: algorithm implementation in MATLAB

10.04.02 – Publishing the MATLAB results and selecting the algorithm that will be implemented on the DSP

time table second semester
Time Table – Second Semester

10.04.02 – Presenting the progress and planning of the project to the supervisor

17.04.02 – Finishing MATLAB Testing

17.04.02 – The beginning of the implementation on the DSP

07.11.02 – Project presentation and handing the project final report

slide28

Pre-Processing (step 1)

Analog Speech

Windowed PDS Frames

[1, 2, … , N]

Pre-Processing

slide29

Pre-Processing module

Analog Speech

Anti aliasing filter to avoid aliasing during sampling. LPF [0, Fs/2]

LPF

Band Limited

Analog Speech

Analog to digital converter with frequency sampling (Fs) of [10,16]KHz

A/D

Digital Speech

Low order digital system to spectrally flatten the signal (in favor of vocal tract parameters), and make it less susceptible to later finite precision effects

First Order FIR

Pre-emphasized

Digital Speech

(PDS)

Frame Blocking

Frame blocking of the sampled signal. Each frame is of N samples overlapped with N-M samples of the previous frame. Frame rate ~ 100 Frames/Sec

N values: [200,300], M values: [100,200]

PDS Frames

Frame

Windowing

Using Hamming (or Hanning or Blackman) windowing in order to minimize the signal discontinuities at the beginning and end of each frame.

Windowed PDS Frames

slide30

Feature Extraction (step 2)

Windowed PDS Frames

Set of Feature Vectors

[1, 2, … , N]

[1, 2, … , K]

Feature

Extraction

Extracting the features of speech from each frame and representing it in a vector (feature vector).

slide31

Pattern Matching Modeling (step 3)

  • The pattern matching modeling techniques is divided into two sections:
    • The enrolment part, in which we build the reference model of
    • the speaker.
    • The verifications (matching) part, where the users will be
    • compared to this model.
slide32

Enrollment part – Modeling

Set of Feature Vectors

[1, 2, … , K]

Modeling

Speaker Model

This part is done outside the DSP and the DSP receives only the speaker model (calculated offline in a host).

slide33

Pattern Matching

Speaker Model

Set of Feature Vectors

[1, 2, … , K]

Pattern

Matching

Matching Rate

slide34

Decision Module (Optional)

In VQ the decision is based on checking if the distortion rate is higher than a preset threshold:

if distortion rate > t, Output = Yes,

else Output = No.

In HMM the decision is based on checking if the probability score is higher than a preset threshold:

if probability scores > t, Output = Yes,

else Output = No.

slide35

The Voice Database

  • Two reference models were generated (one male and one female),
  • each model was trained in 3 different ways:
      • repeating the same sentence for 15 seconds
      • repeating the same sentence for 40 seconds
      • reading random text for one minute
  • The voice database is compound from 10 different speakers (5
  • males and 5 females), each speaker was recorded in 3 ways:
      • repeating the reference sentence once (5 seconds)
      • repeating the reference sentence 3 times (15 seconds)
      • speaking a random sentence for 5 seconds
slide36

Experiment Description Cont.

Conclusions:

Window size of 330 and offset of 110 samples performs better than window size of 256 and offset of 128 samples

slide37

Experiment Description Cont.

Conclusions:

Feature vector of 18 coeffs is better than feature vector of 12 coeffs

slide38

Experiment Description Cont.

  • Conclusions:
  • Worst combinations:
        • 5 seconds of fixed sentence for testing with an
        • enrolment of 15 seconds of the same sentence.
        • 5 seconds of fixed sentence for testing with an
        • enrolment of 40 seconds of the same sentence.
  • Best combinations:
        • 15 seconds of fixed sentence for testing with an
        • enrolment of 40 seconds of the same sentence.
        • 15 seconds of fixed sentence for testing with an
        • enrolment of 60 seconds of random sentences.
        • 5 seconds of a random sentence with an
        • enrolment of 60 seconds of random sentences.
slide40

Additional verification results

  • The DSP results were compared to the Matlab simulation.
  • We chose random speakers from the speakers DB with one reference codebook.
  • For Example:
  • PersonMATLABDSP
  • Alex 69.58% (0.3627) 69.58% (0.3627)
  • Sari 61.66% (0.4835) 61.66% (0.4835)
  • Roee 49.97% (0.6938) 49.97% (0.6938)
  • Eran 54.75% (0.6023) 54.75% (0.6023)
  • Hila 55.72% (0.5849) 55.72% (0.5849)