### Model Formation and Classification Techniques For Conversation-based Speaker Discrimination

Uchechukwu O. Ofoegbu

Advisor: Robert Yantorno, Ph.D.

Committee Members: Brian Butz, Ph.D.; Dennis Silage, Ph.D.; Iyad Obeid, Ph.D.

### Acknowledgement

- My committee members, for your time and commitment to my research
- The Air Force Research Labs, for financially supporting most of this research work
- My family, for being there
- Dr Y, the best advisor one could hope for
- Members and Friends of the Speech Lab, for your valuable contributions
- ECE faculty and staff, for your great support
- The audience, for being a part of this

Presentation Outline

- Introduction
  - Challenges of Conversational Data
  - General Applications of Research
  - Novelty of Research
- Evaluation Databases
  - HTIMIT
  - SWITCHBOARD
  - New Conversations Database
- Modeling Speakers
  - Traditional Speaker Modeling
  - Proposed Method
  - Features Used
  - Distance Used
- Application Systems
  - Unsupervised Speaker Indexing
  - Speaker Count
  - Generalized Speaker Indexing
- Fusion of Distance Measures
  - "Optimized T" Distance
  - Decision-Based Combination
  - Weighted Decision-Based Combination
- Summary
- Further Research

Introduction

Challenges of Conversational Data

- No a priori information available from participating speakers
- Training is impossible
- No a priori knowledge of change points
- Speakers alternate very rapidly
- Limited amounts of data for single speaker representations
- Distortion
- Channel noise, co-channel data

Proposed Solutions

- Selective creation of data models
- Distance-Based Model Comparison
- Development of application-specific system

Novelty of this Research

- Selective creation of data models
- Distance-Based Model Comparison
- Development of application-specific system

Applications

- Monitoring criminal conversations
- Forensics
- Automated Customer Services
- Storage/Search/Retrieval of Audio Data
- Military Activities
- Conference calls

Databases

- Standard Speaker Discrimination Databases
- HTIMIT
- Switchboard
- Temple Conversations Database (TCD)

Modeling Speakers

Traditional Speaker Modeling

- Examples
- Gaussian Mixture Models
- Hidden Markov Models
- Neural Networks
- Prosody-Based Models
- Disadvantages
- Require large amounts of data
- Sometimes require training procedure
- Relatively complex

Conversational Data Modeling

- Current Method
- Equal segmentation of data
- Indiscriminate use of data
- Problems
- Change points unknown
- Not all speech is useful
- Poor performance

Proposed Speaker Modeling

[Diagram: the speech stream is labeled with V, U and S (presumably voiced, unvoiced and silence); segments 1 through M each pass through feature computation, then mean and covariance matrix computation, producing models 1 through M.]

Proposed Speaker Modeling

- Why voiced only?
- Same speech class compared
- Contains the most information
- What’s the appropriate number of phonemes?
- Large enough to sufficiently represent speakers
- Small enough to avoid speaker overlap
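
The model formation step can be sketched as follows. This is a minimal illustration, assuming each segment's features are available as an array with one row per voiced frame; the function name `form_model` is hypothetical and does not come from the presentation.

```python
import numpy as np

def form_model(features):
    """Return (mean vector, covariance matrix) for one segment.

    features: array of shape (num_frames, num_coefficients), for example
    cepstral coefficients computed from the voiced frames of one segment.
    """
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    # rowvar=False: each column is one coefficient, each row is one frame.
    cov = np.cov(features, rowvar=False)
    return mean, cov
```

A segment with too few frames gives a poorly conditioned covariance matrix, which is one reason the number of phonemes per model has to be large enough.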

Features Considered

- Linear Predictive Cepstral Coefficients
- Model the vocal tract
- Mel-Scale Frequency Cepstral Coefficients
- Model the human auditory system

Distance Measurements

[Figure: same-speaker distances and different-speaker distances]

Distances Used

- Mahalanobis Distance
- Hotelling’s T-Square Statistics
- Kullback-Leibler Distance
- Bhattacharyya Distance
- Levene’s Test

Analysis of Cepstral Features

- Mahalanobis Distance

Best Number of Phonemes?

Number of Phonemes

Features Used - LPCC

Application Systems

Unsupervised Speaker Indexing

- The Restrained-Relative Minimum Distance (RRMD) Approach

REFERENCE MODELS

$$
\begin{bmatrix}
0 & D_{1,2} & D_{1,3} & \cdots \\
D_{2,1} & 0 & D_{2,3} & \cdots \\
D_{3,1} & D_{3,2} & 0 & \cdots \\
\vdots & \vdots & \vdots & \ddots
\end{bmatrix}
$$

Unsupervised Speaker Indexing

- The Restrained-Relative Minimum Distance (RRMD) Approach

[Flowchart: observe the distances to Reference 1 and Reference 2; take the minimum distance; apply the restraining condition ("Same Speaker?"); if it passes, same speaker; if it fails, apply the relative distance condition; if that passes, same speaker; if it fails, unusable data.]

RRMD Approach

- Restraining Condition
- Distance Likelihood Ratio

- DLR > 1: same speaker
- DLR < 1: check the relative distance condition

RRMD Approach: Relative Distance Condition

- Relative distance: $D_{rel} = d_{max} - d_{min}$
- $D_{rel} >$ threshold: same speaker
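
The two RRMD conditions can be combined into a single decision function. The sketch below is illustrative only: it assumes the distance likelihood ratio is the likelihood of the minimum distance under a same-speaker distance distribution divided by its likelihood under a different-speaker distribution, and it assumes both distributions are Gaussian. Neither detail is stated on the slides, and the names `gaussian_pdf` and `rrmd_decision` are hypothetical.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution (assumed model for distances)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def rrmd_decision(d_ref1, d_ref2, same_params, diff_params, threshold):
    """Assign a test model to reference 1 or 2, or return None if unusable.

    same_params / diff_params: (mean, std) of same-speaker and
    different-speaker distances, assumed Gaussian in this sketch.
    """
    d_min, d_max = min(d_ref1, d_ref2), max(d_ref1, d_ref2)
    closest = 1 if d_ref1 <= d_ref2 else 2

    # Restraining condition. DLR > 1 is equivalent to p_same > p_diff,
    # which avoids dividing by a very small likelihood.
    p_same = gaussian_pdf(d_min, *same_params)
    p_diff = gaussian_pdf(d_min, *diff_params)
    if p_same > p_diff:
        return closest

    # Relative distance condition: D_rel = d_max - d_min.
    if d_max - d_min > threshold:
        return closest

    return None  # unusable data
```

For example, `rrmd_decision(1.0, 5.0, (1.0, 0.5), (4.0, 1.0), 2.0)` passes the restraining condition and assigns the test model to reference 1.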

Experiments and Results

- Experiments
- HTIMIT used for obtaining likelihood ratio parameters
- 1000 same speaker and 1000 different speaker utterances computed
- 100 conversations from Switchboard database used for evaluation

Indexing Results - Mahalanobis

[Figure panels: LPCC, MFCC]

Indexing Results - T-Square

[Figure panels: LPCC, MFCC]

Indexing Results - Bhattacharyya

[Figure panels: LPCC, MFCC]

Indexing Results - Summary

- Mahalanobis distance yielded best results
- LPCCs outperformed MFCCs

Speaker Count System: The Residual Ratio Algorithm (RRA)

- Process is repeated K-1 times for counting up to K speakers

[Flowchart: a reference model is selected randomly; DLR-based model comparison is applied; if too little data is removed, another model is selected.]

Speaker Count

- Added Residual Ratio:
- Is the sum of the residual ratios in all elimination stages
- Should be higher for greater number of speakers
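
The elimination stages can be sketched as below. The slides do not define the residual ratio, so this sketch assumes it is the number of models left after a stage divided by the number of models before that stage. The `same_speaker` callback stands in for the DLR-based model comparison, the "too little data removed, select another model" check is omitted, and all names are hypothetical.

```python
import random

def added_residual_ratio(models, same_speaker, max_speakers, rng=None):
    """Sum the residual ratios over max_speakers - 1 elimination stages.

    models: list of model objects.
    same_speaker(a, b) -> bool: stand-in for the DLR-based comparison.
    Residual ratio of a stage (assumption): remaining / before.
    """
    rng = rng or random.Random(0)
    remaining = list(models)
    total = 0.0
    for _ in range(max_speakers - 1):
        if not remaining:
            break
        reference = rng.choice(remaining)
        before = len(remaining)
        # Eliminate the reference and every model judged to match it.
        remaining = [m for m in remaining
                     if m is not reference and not same_speaker(reference, m)]
        total += len(remaining) / before
    return total
```

With speaker labels standing in for models, one speaker gives 0, two balanced speakers give 0.5, and three balanced speakers give about 1.17 for K = 3, which matches the expectation that the added residual ratio grows with the number of speakers.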

Experiments and Results

- Experiments
- 4000 conversations generated from HTIMIT
- All 40 conversations from new database used

Speaker Count Results - HTIMIT

[Figure panels: LPCC, MFCC]

Speaker Count Results - TCD

[Figure panels: LPCC, MFCC]

Cross Evaluation

HTIMIT – LPCCs with the WDBC

TCD – MFCCs with the T-Square

Speaker Counting-Indexing

- The Residual Ratio speaker count algorithm is applied
- Test models are associated with their matching reference models
- Each unmatched model is assigned to the reference from which it has the minimum distance.

Speaker Counting/Indexing Results

[Figure legend: solid - HTIMIT; patterned - TCD]

Fusion of Distance Measures

Correlation Analysis

Draftsman’s Display - LPCC

“Best Distance”

- Optimal Criteria for Fusion of Distances
- Maximize inter-speaker variation
- Minimize intra-speaker variation
- Maximize T-test value between inter-class distance distributions
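
The last criterion can be made concrete with a two-sample t statistic. The sketch below assumes the two distributions being compared are same-speaker distances and different-speaker distances, and it uses the Welch (unequal variance) form. The slides do not say which form of the t-test was used, so both choices are assumptions, as is the name `t_value`.

```python
import math

def t_value(same_dists, diff_dists):
    """Two-sample t statistic (Welch form) between two sets of distances."""
    n1, n2 = len(same_dists), len(diff_dists)
    m1 = sum(same_dists) / n1
    m2 = sum(diff_dists) / n2
    # Unbiased sample variances.
    v1 = sum((d - m1) ** 2 for d in same_dists) / (n1 - 1)
    v2 = sum((d - m2) ** 2 for d in diff_dists) / (n2 - 1)
    return (m2 - m1) / math.sqrt(v1 / n1 + v2 / n2)
```

A larger value means the different-speaker distances sit further from the same-speaker distances relative to their spread, which is the property a fused distance is meant to maximize.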

Decision Level Fusion

- D1 => match
- D2 => no match
- D3 => match
- D4 => match

Match = ¾

No Match = ¼

Final Decision = Match
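
The four-decision example above amounts to a majority vote over the per-distance decisions. A minimal sketch follows; the function name is hypothetical, and treating a tie as "no match" is an assumption, since the slide does not show a tie.

```python
def majority_vote(decisions):
    """Return True (match) if more than half of the decisions are True."""
    matches = sum(1 for d in decisions if d)
    return matches > len(decisions) / 2

# D1, D3 and D4 vote "match", D2 votes "no match".
final_decision = majority_vote([True, False, True, True])
```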

Weighted Decision Level Fusion

Ti = T-value corresponding to each distance
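
The weighting formula itself did not survive in this transcript. One plausible reading, shown here purely as an assumption, is that each distance's match or no-match decision is weighted by its T-value and the weights are normalized to sum to one. The name `weighted_vote` is hypothetical.

```python
def weighted_vote(decisions, t_values):
    """Weight each distance's match/no-match decision by its T-value.

    Assumption: weights are T_i / sum(T), and the final decision is
    "match" when the matching weight exceeds one half.
    """
    total = sum(t_values)
    match_weight = sum(t for d, t in zip(decisions, t_values) if d)
    return match_weight / total > 0.5
```

Under this reading, a single distance with a high T-value can outvote several weaker distances, which is the difference from the unweighted vote.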

Summary

Research Goal

- To differentiate between speakers in a conversation
- To determine the number of speakers present
- To determine who is speaking when
- To overcome the following challenges
- No a priori information
- Limited data size
- No knowledge of change points
- Co-channel speech

Summary of Accomplishments

- Novel model formation technique
- Three novel approaches for conversation-based speaker differentiation
- Distance combination techniques to enhance performance

Observations

- Mahalanobis Distance, LPCCs optimal for standard databases
- T-Square Distance, MFCCs optimal for new database
- Best fusion technique: weighted voting combination

Conclusion

- The developed system yields about 6% EER, whereas state-of-the-art speaker indexing systems yield about a 10% error rate.
- Methods for discrimination between speakers (speaker count or indexing) in CONVERSATIONS with more than two speakers have been introduced.

Further Research

- Investigation of prosodic speaker discrimination features
- Improving model formation technique by determining speaker change-points a priori
- Exploring the use of individual phonemes to form models

Further Research, cont’d

- Investigating the use of unvoiced speech, cautiously, in the formation of models
- Speech enhancement techniques to handle distorted data
- Implementation of other fusion techniques such as KL measure of divergence

Publications

- U. Ofoegbu, A. Iyer, R. Yantorno and S. Wenndt, “Unsupervised Indexing of Noisy conversations with Short Speaker Utterances”, IEEE Aerospace Conference. March, 2007
- U. Ofoegbu, A. Iyer, R. Yantorno, “Detection of a Third Speaker in Telephone Conversations”, ICSLP, INTERSPEECH 2006
- U. Ofoegbu, A. Iyer, R. Yantorno, “A Simple Approach to Unsupervised Speaker Indexing”, IEEE ISPACS. 2006.
- U. Ofoegbu, A. Iyer, R. Yantorno, “A Speaker Count System for Telephone Conversations”, IEEE ISPACS. 2006.

ACKNOWLEDGEMENT

To the greatest teacher in the world, and the one who has made the most impact in my life, Dr. Robert E. Yantorno.

To my best friend and the love of my life, Dr. Jude C. Abanulo

To Dr. Brian Butz, Dr. John Helferty, Dr. Saroj Biswas and Dr. Henry Sendaula

To my dissertation committee members, Dr. Iyad Obeid and Dr. Dennis Silage, and to Dr. Rena Krakow.

To my friend, Ananth Iyer

To Abdoul Fall, Joe Fitschgrund, Angela Linse and Ralph Oyini; and to the members of the Speech Processing Lab and the faculty of the electrical engineering department

To engineering administrators, Tamika Butler, Carol Dahlberg, Yvette Gibson and Cheryl Sharp, and to Louise, day time janitress for the engineering building

To the Temple students who volunteered as participants in the New Conversations Database

To Temple

To the Air Force Research Labs at Rome – Financial supporters of most of the research

To my parents, Ugo & Joseph Ofoegbu; my siblings, Amaka & Humphrey Onyendi, Nene, Obinna and Chibuzor Ofoegbu; and my grandmother, Cordelia Osuji

To God

Thank you.

Advisor:

Robert Yantorno, Ph.D

Committee Members:

Brian Butz, Ph.D.

Dennis Silage, Ph.D.

Iyad Obeid, Ph.D.

Brett Smolenski, Ph.D.

Extra Slides

Cepstral Analysis

Frequency Analysis of Speech

- STFT of Speech = Excitation Component (fast varying harmonics) × Vocal Tract Component (slowly varying formants)
- Log of STFT = Log of Excitation + Log of Vocal Tract Component
- IDFT of Log of STFT = Excitation + Vocal Tract
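
The three steps above (spectrum, logarithm, inverse DFT) give the real cepstrum of a frame. The sketch below is a minimal version for a single frame; the small `eps` added before the logarithm is a numerical safeguard introduced here, not something from the slides.

```python
import numpy as np

def real_cepstrum(frame, eps=1e-12):
    """Real cepstrum: inverse DFT of the log magnitude spectrum of a frame."""
    spectrum = np.fft.fft(frame)
    # eps guards against log(0) for spectral bins with zero magnitude.
    log_magnitude = np.log(np.abs(spectrum) + eps)
    return np.fft.ifft(log_magnitude).real
```

Because the logarithm turns the product of excitation and vocal tract spectra into a sum, the cepstrum of a (circularly) convolved signal is the sum of the cepstra of its two components.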

Cepstral Features

- Linear Predictive Cepstral Coefficients
- Obtained Recursively from LPC Coefficients

Let LPC vector = $[a_0\ a_1\ a_2\ \dots\ a_p]$ and

LPCC vector = $[c_0\ c_1\ c_2\ \dots\ c_p\ \dots\ c_{n-1}]$
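
The recursion itself is not preserved in this transcript. The sketch below uses the standard textbook recursion under an assumed sign convention, the all-pole model $H(z) = 1 / (1 - \sum_k a_k z^{-k})$, which may differ from the convention used in the presentation. It takes only the predictor coefficients $a_1 \dots a_p$ and omits the gain term $c_0$; the function name is hypothetical.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps from predictor coefficients.

    Assumes H(z) = 1 / (1 - sum_k a_k z^-k), with a[0] holding a_1.
    Recursion: c_m = a_m + sum_{k=1}^{m-1} (k/m) * c_k * a_{m-k},
    where a_j is taken as 0 for j > p. The gain term c_0 is omitted.
    """
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused so that c[m] holds c_m
    for m in range(1, n_ceps + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(1, m):
            if m - k <= p:
                acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]
```

For a single pole with $a_1 = 0.5$ the recursion gives $c_n = 0.5^n / n$, which agrees with the series expansion of $-\ln(1 - 0.5 z^{-1})$.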

“Best Distance”

- Intra-speaker and inter-speaker distance lengths are always equal, therefore:

P = sum of the covariance matrices of the two classes.

λ1 = maximum eigenvalue obtained by solving the generalized eigenvalue problem:

Q = the square of the distance between the mean vectors of the two classes

Distance Measures

- Mahalanobis Distance
  - Measures the separation between the means of both classes
- Hotelling’s T-Square Statistics
  - Measures the separation between the means of both classes and takes into consideration the data lengths
- Kullback-Leibler Distance
  - Measures the separation between the distribution of both classes
- Bhattacharyya Distance
  - Derived from measuring the classification error between both classes
- Levene’s Test
  - Measures absolute deviation from the center of the class distribution
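
Three of these measures can be written directly in terms of the mean vectors and covariance matrices that make up the speaker models. The sketch below uses common textbook forms for Gaussian models; the exact variants used in the presentation are not shown in this transcript, so choices such as averaging the two covariances in the Mahalanobis distance are assumptions, and the function names are hypothetical.

```python
import numpy as np

def mahalanobis(m1, c1, m2, c2):
    """Mahalanobis distance between two means, using the averaged covariance."""
    diff = np.asarray(m1, float) - np.asarray(m2, float)
    cov = (np.asarray(c1, float) + np.asarray(c2, float)) / 2.0
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

def hotelling_t2(m1, c1, n1, m2, c2, n2):
    """Two-sample Hotelling T-square; n1 and n2 are the data lengths."""
    diff = np.asarray(m1, float) - np.asarray(m2, float)
    pooled = ((n1 - 1) * np.asarray(c1, float)
              + (n2 - 1) * np.asarray(c2, float)) / (n1 + n2 - 2)
    return float(n1 * n2 / (n1 + n2) * (diff @ np.linalg.solve(pooled, diff)))

def bhattacharyya(m1, c1, m2, c2):
    """Bhattacharyya distance between two Gaussian models."""
    c1, c2 = np.asarray(c1, float), np.asarray(c2, float)
    diff = np.asarray(m1, float) - np.asarray(m2, float)
    cov = (c1 + c2) / 2.0
    mean_term = diff @ np.linalg.solve(cov, diff) / 8.0
    # slogdet returns (sign, log|det|); only the log-determinant is needed.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(c1)
    _, logdet2 = np.linalg.slogdet(c2)
    return float(mean_term + 0.5 * (logdet - 0.5 * (logdet1 + logdet2)))
```

The T-square statistic differs from the Mahalanobis form mainly through the factor built from `n1` and `n2`, which is how it takes the data lengths into account.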

Speaker Recognition

- Speaker Identification
  - Who is this speaker?
- Speaker Verification
  - Is he who he claims to be?

[Diagram: feature extraction and model building; test speech, feature extraction, comparison, recognition decision, system output.]

Speaker Segmentation

- Broadcast News/Conference Data
- Conversational Data

Procedural Set-up

- 384-Speaker database used
- Average Utterance Length = 5 seconds

[Diagram, intra-speaker distance computations: randomly select 2 utterances from Speaker A; window data; compute feature; compute distance.]

[Diagram, inter-speaker distance computations: randomly select an utterance from Speaker A and an utterance from Speaker B; window data; compute feature; compute distance.]

Best ‘N’ Estimation

- 245 conversations from SWITCHBOARD used
- Results shown for T-Square distance

N = 5

RRA Examples – 2 Speakers

RRA Examples – 3 Speakers

Comparison

[Figure panels: two-speaker residual; three-speaker residual; residual ratio after 2nd round of RRA; Speaker 2]

Effects of Fusion

[Figure panels: LPCCs, MFCCs]

Best Feature Size

Correlation Analysis

Draftsman’s Display - MFCC
