Model Formation and Classification Techniques for Conversation-based Speaker Discrimination

Uchechukwu O. Ofoegbu

Advisor: Robert Yantorno, Ph.D.

Committee Members: Brian Butz, Ph.D.; Dennis Silage, Ph.D.; Iyad Obeid, Ph.D.


Acknowledgement

My committee members, for your time and commitment to my research

The Air Force Research Labs, for financially supporting most of this research work

My family, for being there

Dr Y, the best advisor one could hope for

Members and Friends of the Speech Lab, for your valuable contributions

ECE faculty and staff, for your great support

The audience, for being a part of this


Presentation Outline

  • Introduction

    • Challenges of Conversational Data

    • General Applications of Research

    • Novelty of Research

  • Evaluation Databases

    • HTIMIT

    • SWITCHBOARD

    • New Conversations Database

  • Modeling Speakers

    • Traditional Speaker Modeling

    • Proposed Method

    • Features Used

    • Distance Used

  • Application Systems

    • Unsupervised Speaker Indexing

    • Speaker Count

    • Generalized Speaker Indexing

  • Fusion of Distance Measures

    • “Optimized T” Distance

    • Decision-Based Combination

    • Weighted Decision-Based Combination

  • Summary

  • Further Research


Introduction


Challenges of Conversational Data

  • No a priori information available from participating speakers

    • Training is impossible

  • No a priori knowledge of change points

  • Speakers alternate very rapidly

    • Limited amounts of data for single speaker representations

  • Distortion

    • Channel noise, co-channel data



Proposed Solutions

  • Selective creation of data models

  • Distance-Based Model Comparison

  • Development of application-specific system



Novelty of this Research

  • Selective creation of data models

  • Distance-Based Model Comparison

  • Development of application-specific system



Applications

  • Monitoring criminal conversations

  • Forensics

  • Automated Customer Services

  • Storage/Search/Retrieval of Audio Data

  • Military Activities

  • Conference calls



Databases

  • Standard Speaker Discrimination Databases

    • HTIMIT

    • Switchboard

  • Temple Conversations Database (TCD)



Modeling Speakers


Traditional Speaker Modeling

  • Examples

    • Gaussian Mixture Models

    • Hidden Markov Models

    • Neural Networks

    • Prosody-Based Models

  • Disadvantages

    • Require large amounts of data

    • Sometimes require training procedure

    • Relatively complex



Conversational Data Modeling

  • Current Method

    • Equal segmentation of data

    • Indiscriminate use of data

  • Problems

    • Change points unknown

    • Not all speech is useful

    • Poor performance



Proposed Speaker Modeling

[Diagram: the speech stream is labeled segment by segment as S, V or U (S V U V U V U V U V S V . . .); only the V (voiced) portions are retained and grouped into SEGMENT 1 . . . SEGMENT M. Each segment passes through FEATURE COMPUTATION and then MEAN AND COVARIANCE MATRIX COMPUTATION, producing MODEL 1 . . . MODEL M.]


Proposed Speaker Modeling, cont’d

  • Why voiced only?

    • Same speech class compared

    • Contains the most information

  • What’s the appropriate number of phonemes?

    • Large enough to sufficiently represent speakers

    • Small enough to avoid speaker overlap
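The model formation step described on these slides (voiced data grouped into segments, each summarized by a mean vector and a covariance matrix) can be sketched roughly as follows. This is only an illustration under assumptions: the function name `form_models`, the use of a fixed number of feature frames per model (`frames_per_model`) in place of a phoneme count, and the random feature data are all hypothetical and not taken from the presentation.

```python
import numpy as np

def form_models(voiced_features, frames_per_model):
    """Group consecutive voiced-frame feature vectors and summarize each
    group by (mean vector, covariance matrix). Leftover frames that do not
    fill a whole group are dropped (an assumption of this sketch)."""
    models = []
    last_start = len(voiced_features) - frames_per_model
    for start in range(0, last_start + 1, frames_per_model):
        block = voiced_features[start:start + frames_per_model]
        mean = block.mean(axis=0)
        cov = np.cov(block, rowvar=False)  # rows are frames, columns are features
        models.append((mean, cov))
    return models

# Made-up data: 500 voiced frames with 12 cepstral coefficients each.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 12))
models = form_models(features, frames_per_model=100)
```

The trade-off named above shows up in `frames_per_model`: larger groups give better-conditioned covariance estimates, while smaller groups are less likely to straddle a speaker change.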



Features Considered

  • Linear Predictive Cepstral Coefficients

    • Model the vocal tract

  • Mel-Scale Frequency Cepstral Coefficients

    • Model the human auditory system



Distance Measurements

[Figure: same speaker distances versus different speaker distances]


Distances Used

  • Mahalanobis Distance

  • Hotelling’s T-Square Statistics

  • Kullback-Leibler Distance

  • Bhattacharyya Distance

  • Levene’s Test
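As a rough illustration of the first two measures in this list, the sketch below compares two blocks of feature vectors using textbook two-sample forms: a squared Mahalanobis distance between the block means under a pooled covariance, and Hotelling's T-Square, which scales that distance by the block lengths. The slides do not give the exact formulas used in the research, so the pooled-covariance choice and the function names are assumptions.

```python
import numpy as np

def pooled_cov(x, y):
    """Pooled sample covariance of two blocks (rows are frames)."""
    n1, n2 = len(x), len(y)
    s1 = np.cov(x, rowvar=False)
    s2 = np.cov(y, rowvar=False)
    return ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)

def mahalanobis_sq(x, y):
    """Squared Mahalanobis distance between the two block means."""
    diff = x.mean(axis=0) - y.mean(axis=0)
    return float(diff @ np.linalg.solve(pooled_cov(x, y), diff))

def hotelling_t2(x, y):
    """Two-sample Hotelling T-Square: the same separation, weighted by data lengths."""
    n1, n2 = len(x), len(y)
    return n1 * n2 / (n1 + n2) * mahalanobis_sq(x, y)

rng = np.random.default_rng(1)
block_a = rng.normal(size=(200, 4))
block_b = rng.normal(size=(200, 4))        # same distribution as block_a
block_c = rng.normal(size=(200, 4)) + 2.0  # shifted mean
```

With these made-up blocks, `mahalanobis_sq(block_a, block_c)` comes out much larger than `mahalanobis_sq(block_a, block_b)`, which is the behavior a speaker-discriminating distance needs.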



Analysis of Cepstral Features

  • Mahalanobis Distance



Best Number of Phonemes?

[Figure: axis label: Number of Phonemes; features used: LPCC]


Application Systems


Unsupervised Speaker Indexing

  • The Restrained-Relative Minimum Distance (RRMD) Approach

REFERENCE MODELS

0 D1,2 D1,3 …

D2,1 0 D2,3 …

D3,1 D3,2 0 …


Unsupervised Speaker Indexing, cont’d

  • The Restrained-Relative Minimum Distance (RRMD) Approach

[Flowchart: observe the distances to Reference 1 and Reference 2, then take the minimum distance and apply the restraining condition. Passed: same speaker. Failed: apply the relative distance condition. Passed: same speaker. Failed: unusable data.]


RRMD Approach

  • Restraining Condition

    • Distance Likelihood Ratio (DLR)

      DLR > 1 → Same Speaker

      DLR < 1 → Check Relative Distance Condition
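One way to picture the distance likelihood ratio is as the ratio of two likelihoods of an observed distance: one under a same-speaker distance distribution and one under a different-speaker distribution. The slides do not say which distribution family was used, so the sketch below assumes Gaussian fits; the function names and the training distances are hypothetical.

```python
import math
from statistics import mean, stdev

def gaussian_pdf(d, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((d - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def fit(distances):
    """Fit (mean, standard deviation) to a list of training distances."""
    return mean(distances), stdev(distances)

def dlr(d, same_params, diff_params):
    """Likelihood of distance d under the same-speaker fit divided by its
    likelihood under the different-speaker fit (Gaussian fits are an assumption)."""
    return gaussian_pdf(d, *same_params) / gaussian_pdf(d, *diff_params)

# Made-up training distances standing in for same-speaker and different-speaker pairs.
same_params = fit([1.0, 1.2, 0.8, 1.1, 0.9])
diff_params = fit([3.0, 3.4, 2.6, 3.1, 2.9])
```

Under these made-up parameters, a small distance such as 1.0 gives a DLR above 1 and a large distance such as 3.0 gives a DLR below 1.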



RRMD Approach, cont’d

  • Relative Distance Condition

    • Relative Distance: Drel = dmax – dmin

    • Drel > threshold → Same Speaker

[Figure labels: Reference 1, Reference 2, dmin, dmax]
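Putting the two conditions together, the RRMD decision for one test model against two references might look like the sketch below. The function name, the argument layout, and the convention of returning `None` for unusable data are hypothetical; the DLR of the minimum distance is passed in as a number so that the sketch does not depend on how it is computed.

```python
def rrmd_decision(d_ref1, d_ref2, dlr_of_min, rel_threshold):
    """Return 1 or 2 (the reference judged to be the same speaker),
    or None when the data is considered unusable."""
    d_min, d_max = min(d_ref1, d_ref2), max(d_ref1, d_ref2)
    nearest = 1 if d_ref1 <= d_ref2 else 2

    # Restraining condition: DLR > 1 means same speaker as the nearest reference.
    if dlr_of_min > 1:
        return nearest

    # Relative distance condition: Drel = dmax - dmin must exceed the threshold.
    if d_max - d_min > rel_threshold:
        return nearest

    # Both conditions failed.
    return None
```

For example, `rrmd_decision(1.0, 4.0, 2.5, 1.0)` passes the restraining condition and returns 1, while `rrmd_decision(3.0, 2.8, 0.4, 1.0)` fails both conditions and returns `None`.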


Experiments and Results

  • Experiments

    • HTIMIT used for obtaining likelihood ratio parameters

      • 1000 same speaker and 1000 different speaker utterances computed

    • 100 conversations from Switchboard database used for evaluation



Indexing Results - Mahalanobis

[Figure panels: LPCC, MFCC]


Indexing Results – T-Square

[Figure panels: LPCC, MFCC]


Indexing Results - Bhattacharyya

[Figure panels: LPCC, MFCC]


Indexing Results - Summary

  • Mahalanobis distance yielded best results

  • LPCCs outperformed MFCCs



Speaker Count System

  • The Residual Ratio Algorithm (RRA)

  • Process is repeated K-1 times for counting up to K speakers

[Flowchart: a reference model is selected randomly, followed by DLR-based model comparison; this is repeated (. . .). If too little data is removed, another model is selected.]


Speaker Count

  • Added Residual Ratio:

  • Is the sum of the residual ratios in all elimination stages

  • Should be higher for greater number of speakers
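The slides do not define the residual ratio precisely. The sketch below assumes it is the fraction of the original models still left after the models matching a randomly chosen reference have been removed, and it sums that fraction over K-1 elimination rounds. The function name, the `is_match` callback (standing in for the DLR-based comparison), and the omission of the "too little data removed, select another model" step are all simplifications of this illustration.

```python
import random

def added_residual_ratio(models, is_match, max_speakers, seed=0):
    """Sum of residual ratios over max_speakers - 1 elimination rounds.
    Residual ratio = remaining models / total models (an assumed definition)."""
    rng = random.Random(seed)
    remaining = list(models)
    total = len(models)
    added = 0.0
    for _ in range(max_speakers - 1):
        if not remaining:
            break
        reference = rng.choice(remaining)  # reference model selected randomly
        remaining = [m for m in remaining if not is_match(reference, m)]
        added += len(remaining) / total
    return added

# Toy models: each "model" is just its true speaker label, and matching is exact.
same_label = lambda ref, m: ref == m
two_speakers = ["A"] * 5 + ["B"] * 5
three_speakers = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
```

In this toy setting the added residual ratio is 0.5 for the two-speaker list and about 1.0 for the three-speaker list (counting up to K = 3), consistent with the statement that the value should be higher for a greater number of speakers.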



Experiments and Results

  • Experiments

    • 4000 conversations generated from HTIMIT

    • All 40 conversations from new database used



Speaker Count Results - HTIMIT

[Figure panels: LPCC, MFCC]


Speaker Count Results - HTIMIT, cont’d

[Figure panels: LPCC, MFCC]


Speaker Count Results – TCD

[Figure panels: LPCC, MFCC]


Speaker Count Results – TCD, cont’d

[Figure panels: LPCC, MFCC]


Cross Evaluation

HTIMIT – LPCCs with the WDBC (Weighted Decision-Based Combination)

TCD – MFCCs with the T-Square


Speaker Counting-Indexing

  • The Residual Ratio speaker count algorithm is applied

  • Test models are associated with their matching reference models

  • Each unmatched model is assigned to the reference from which it has the minimum distance.



Speaker Counting/Indexing Results

[Figure legend: Solid - HTIMIT; Patterned – TCD]


Fusion of Distance Measures


Correlation Analysis

[Figure: Draftsman’s Display - LPCC]


“Best Distance”

  • Optimal Criteria for Fusion of Distances

    • Maximize inter-speaker variation

    • Minimize intra-speaker variation

    • Maximize T-test value between inter-class distance distributions
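The last criterion can be illustrated by computing a two-sample t statistic between a set of same-speaker distances and a set of different-speaker distances: the better a distance measure separates the two sets, the larger the value. The slides do not say which form of the t-test was used; the sketch assumes the unequal-variance (Welch) form, and the distance lists are made up.

```python
import math
from statistics import mean, variance

def t_value(same_dists, diff_dists):
    """Welch two-sample t statistic between different-speaker and
    same-speaker distance sets (the Welch form is an assumption)."""
    n1, n2 = len(same_dists), len(diff_dists)
    standard_error = math.sqrt(variance(same_dists) / n1 + variance(diff_dists) / n2)
    return (mean(diff_dists) - mean(same_dists)) / standard_error

same = [1.0, 1.2, 0.8, 1.1, 0.9]
well_separated = [3.0, 3.4, 2.6, 3.1, 2.9]
overlapping = [1.5, 1.9, 1.1, 1.6, 1.4]
```

Here `t_value(same, well_separated)` is considerably larger than `t_value(same, overlapping)`, so a fusion that yields the first kind of separation would be preferred under this criterion.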



Decision Level Fusion

D1 => match

D2 => no match

D3 => match

D4 => match

Match = ¾

No Match = ¼

Final Decision = Match
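The example on this slide amounts to a majority vote over the per-distance decisions, which can be sketched as below. The function name is hypothetical, and treating a tie as "no match" is an assumption, since the slide does not cover that case.

```python
def fuse_decisions(decisions):
    """Majority vote over per-distance match decisions (True = match).
    Returns (final_decision, fraction_of_match_votes); ties count as no match."""
    match_fraction = sum(decisions) / len(decisions)
    return match_fraction > 0.5, match_fraction

# The slide's example: D1, D3 and D4 say match, D2 says no match.
final, fraction = fuse_decisions([True, False, True, True])
```

For the slide's four decisions this gives a match fraction of 0.75 and a final decision of match.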


Weighted Decision Level Fusion

Ti = T-value corresponding to each distance
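The weighting formula itself is not preserved in this transcript. One plausible reading, sketched below purely as an assumption, is that each distance's vote is weighted by its T-value divided by the sum of all T-values, so that distances which separate speakers better have more say. The function name and the example T-values are hypothetical.

```python
def fuse_weighted(decisions, t_values):
    """Weighted vote: each decision counts in proportion to its T-value.
    Normalizing by the sum of T-values is an assumption of this sketch."""
    total = sum(t_values)
    match_weight = sum(t for d, t in zip(decisions, t_values) if d) / total
    return match_weight > 0.5, match_weight

decisions = [True, False, True, True]
equal_weights = fuse_weighted(decisions, [1.0, 1.0, 1.0, 1.0])
dominant_d2 = fuse_weighted(decisions, [1.0, 9.0, 1.0, 1.0])
```

With equal T-values the result reduces to the plain majority vote (match, weight 0.75); when the dissenting distance has a much larger T-value, its "no match" vote outweighs the other three (weight 0.25).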


Summary


Research Goal

  • To differentiate between speakers in a conversation

    • To determine the number of speakers present

    • To determine who is speaking when

  • To overcome the following challenges

    • No a priori information

    • Limited data size

    • No knowledge of change points

    • Co-channel speech



Summary of Accomplishments

  • Novel model formation technique

  • Three novel approaches for conversation-based speaker differentiation

  • Distance combination techniques to enhance performance



Observations

  • Mahalanobis Distance, LPCCs optimal for standard databases

  • T-Square Distance, MFCCs optimal for new database

  • Best fusion technique: weighted voting combination (most efficient)



Conclusion

  • The developed system yields about 6% EER, whereas state-of-the-art speaker indexing systems yield about a 10% error rate.

  • Methods for discrimination between speakers (speaker count or indexing) in CONVERSATIONS with more than two speakers have been introduced.



Further Research

  • Investigation of prosodic speaker discrimination features

  • Improving model formation technique by determining speaker change-points a priori

  • Exploring the use of individual phonemes to form models



Further Research, cont’d

  • Investigating the use of unvoiced speech, cautiously, in the formation of models

  • Speech enhancement techniques to handle distorted data

  • Implementation of other fusion techniques such as KL measure of divergence



Publications

  • U. Ofoegbu, A. Iyer, R. Yantorno and S. Wenndt, “Unsupervised Indexing of Noisy conversations with Short Speaker Utterances”, IEEE Aerospace Conference. March, 2007

  • U. Ofoegbu, A. Iyer, R. Yantorno, “Detection of a Third Speaker in Telephone Conversations”, ICSLP, INTERSPEECH 2006

  • U. Ofoegbu, A. Iyer, R. Yantorno, “A Simple Approach to Unsupervised Speaker Indexing”, IEEE ISPACS. 2006.

  • U. Ofoegbu, A. Iyer, R. Yantorno, “A Speaker Count System for Telephone Conversations”, IEEE ISPACS. 2006.



ACKNOWLEDGEMENT

To the greatest teacher in the world, and the one who has made the most impact in my life, Dr. Robert E. Yantorno.

To my best friend and the love of my life, Dr. Jude C. Abanulo

To Dr. Brian Butz, Dr. John Helferty, Dr. Saroj Biswas and Dr. Henry Sendaula

To my dissertation committee members, Dr. Iyad Obeid and Dr. Dennis Silage, and to Dr. Rena Krakow.

To my friend, Ananth Iyer

To Abdoul Fall, Joe Fitschgrund, Angela Linse and Ralph Oyini; and to the members of the Speech Processing Lab and the faculty of the electrical engineering department

To engineering administrators, Tamika Butler, Carol Dahlberg, Yvette Gibson and Cheryl Sharp, and to Louise, day time janitress for the engineering building

To the Temple students who volunteered as participants in the New Conversations Database

To Temple

To the Air Force Research Labs at Rome – Financial supporters of most of the research

To my parents, Ugo & Joseph Ofoegbu; my siblings, Amaka & Humphrey Onyendi, Nene, Obinna and Chibuzor Ofoegbu; and my grandmother, Cordelia Osuji

To God

Thank you.



Advisor:

Robert Yantorno, Ph.D.

Committee Members:

Brian Butz, Ph.D.

Dennis Silage, Ph.D.

Iyad Obeid, Ph.D.

Brett Smolenski, Ph.D.

Extra Slides


Cepstral Analysis

Frequency Analysis of Speech

STFT of Speech = Excitation Component × Vocal Tract Component

(fast varying harmonics × slowly varying formants)

Log of STFT = Log of Excitation + Log of Vocal Tract Component

IDFT of Log of STFT = Excitation + Vocal tract
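The chain above (spectrum, then logarithm, then inverse DFT) can be checked numerically on a single frame. The sketch below computes a real cepstrum with NumPy and confirms that the cepstrum of a filtered signal is the sum of the cepstra of its two components. The excitation, the decaying impulse response, and the use of circular convolution on one frame are illustrative choices rather than details from the presentation.

```python
import numpy as np

def real_cepstrum(frame):
    """Inverse DFT of the log magnitude spectrum of one frame."""
    spectrum = np.fft.fft(frame)
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)
    return np.fft.ifft(log_magnitude).real

rng = np.random.default_rng(2)
excitation = rng.normal(size=256)      # stand-in for the excitation component
vocal_tract = 0.9 ** np.arange(256)    # stand-in for a vocal tract impulse response

# Multiplying the spectra corresponds to (circular) convolution in time.
speech = np.fft.ifft(np.fft.fft(excitation) * np.fft.fft(vocal_tract)).real

additive = np.allclose(
    real_cepstrum(speech),
    real_cepstrum(excitation) + real_cepstrum(vocal_tract),
    atol=1e-8,
)
```

Because the product of the spectra becomes a sum after the logarithm, `additive` evaluates to `True` for this frame.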


Cepstral Features

  • Linear Predictive Cepstral Coefficients

    • Obtained Recursively from LPC Coefficients

Let LPC vector = [a0 a1 a2 … ap] and

LPCC vector = [c0 c1 c2 … cp … cn-1]
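The recursion itself is not preserved in this transcript. A commonly used textbook form, given here as an assumption about what the slide showed, applies to the all-pole model H(z) = 1 / (1 - sum over k of a_k z^-k): c_n = a_n + sum over k from 1 to n-1 of (k/n) c_k a_(n-k), with a_n taken as zero for n > p. The sketch omits the a0 and c0 (gain) terms, and the sign convention may differ from the one used in the research.

```python
def lpc_to_lpcc(a, n_ceps):
    """Convert predictor coefficients a = [a1, ..., ap] into cepstral
    coefficients [c1, ..., c_n_ceps] using the standard recursion."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is left unused (gain term omitted)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        # Only terms with 1 <= n - k <= p contribute.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

# Single-pole check: for a1 = 0.5 the exact cepstrum is c_n = 0.5**n / n.
lpcc = lpc_to_lpcc([0.5], n_ceps=3)
```

For the single-pole case the recursion reproduces the closed-form values 0.5, 0.125 and 0.5**3 / 3, which is a quick sanity check on the indexing.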




“Best Distance”, cont’d

  • Intra-speaker and inter-speaker distance lengths are always equal, therefore:

    P = sum of the covariance matrices of the two classes

    λ1 = maximum eigenvalue obtained by solving the generalized eigenvalue problem

    Q = square of the distance between the mean vectors of the two classes


“Best Distance”, cont’d

[Figure: axes: Distance Measure 1, Distance Measure 2]


RRMD Approach

  • Relative Distance Condition


Modeling Analysis

[Figure: N = 20 – 4 seconds of voiced speech]


Modeling Analysis, cont’d

[Figure: N = 5 – 1 second of voiced speech]


Distance Measures

  • Mahalanobis Distance

    • Measures the separation between the means of both classes

  • Hotelling’s T-Square Statistics

    • Measures the separation between the means of both classes and takes into consideration the data lengths

  • Kullback-Leibler Distance

    • Measures the separation between the distribution of both classes

  • Bhattacharyya Distance

    • Derived from measuring the classification error between both classes

  • Levene’s Test

    • Measures absolute deviation from the center of the class distribution
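For models that consist of a mean vector and a covariance matrix, the Kullback-Leibler and Bhattacharyya measures have closed forms when each model is treated as a Gaussian. The sketch below uses those standard closed forms; whether the research used exactly these expressions (for example, a symmetrized Kullback-Leibler distance) is not stated on the slides, so treat them as an assumption.

```python
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussian models."""
    cov = (cov1 + cov2) / 2
    diff = mu1 - mu2
    mean_term = diff @ np.linalg.solve(cov, diff) / 8
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    return float(mean_term + 0.5 * (logdet - 0.5 * (logdet1 + logdet2)))

def kl_divergence(mu1, cov1, mu2, cov2):
    """Kullback-Leibler divergence KL(model 1 || model 2) for Gaussian models."""
    k = len(mu1)
    diff = mu2 - mu1
    trace_term = np.trace(np.linalg.solve(cov2, cov1))
    mean_term = diff @ np.linalg.solve(cov2, diff)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    return float(0.5 * (trace_term + mean_term - k + logdet2 - logdet1))

mu_a, mu_b = np.zeros(2), np.array([2.0, 0.0])
identity = np.eye(2)
```

The Kullback-Leibler divergence is not symmetric in general; adding `kl_divergence(a, b)` and `kl_divergence(b, a)` is a common way to turn it into a symmetric distance.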


Speaker Recognition

  • Speaker Identification

    • Who is this speaker?

  • Speaker Verification

    • Is he who he claims to be?

[Block diagram: Reference Speech → Feature Extraction → Model Building; Test Speech → Feature Extraction → Comparison → Recognition Decision → System Output]


Speaker Segmentation

  • Broadcast News/Conference Data

  • Conversational Data


Procedural Set-up

  • 384-Speaker database used

  • Average Utterance Length = 5 seconds

[Flowchart, intra-speaker distance computations: randomly select 2 utterances from Speaker A (Utterance 1, Utterance 2) → window data → compute feature → compute distance]

[Flowchart, inter-speaker distance computations: randomly select an utterance from Speaker A and an utterance from Speaker B → window data → compute feature → compute distance]


Best ‘N’ Estimation

  • 245 conversations from SWITCHBOARD used

  • Results shown for T-Square distance

[Figure label: N = 5]


RRA Examples – 2 Speakers


RRA Examples – 3 Speakers


Comparison

[Figure panels: TWO-SPEAKER RESIDUAL; THREE-SPEAKER RESIDUAL. Axis label: Residual Ratio after 2nd round of RRA. Label: Speaker 2]


Effects of Fusion

[Figure: LPCCs]


Effects of Fusion, cont’d

[Figure: LPCCs]


Effects of Fusion, cont’d

[Figure: MFCCs]


Effects of Fusion, cont’d

[Figure: MFCCs]


Best Feature Size


Best Feature Size, cont’d


Correlation Analysis, cont’d

[Figure: Draftsman’s Display - MFCC]

