Multi-Speaker Detection

By Matt Fratkin, EE 6820, 3/9/05

Background
Speaker recognition is an active area of work, but a new problem arises when more than one speaker is present.

Uses for Multi-Speaker Detection
Multi-speaker detection is important whenever there is dialog between more than one person, such as in a meeting, debate, conference, or court hearing. It can also help when a key speaker needs to be tracked throughout a speech or debate.

Attempted Solutions
The first attempt at solving this problem was to transcribe the events, but that can cost up to $400/hr. A second attempt gave each speaker their own microphone, so that each microphone would represent one speaker. This proved unsuccessful because each microphone also picked up crosstalk from nearby speakers.

Possible Methods for Multi-Speaker Detection
• Pattern Recognition
• Dual Pitch Tracking
• Speaker Segmentation

Pattern Recognition
This method would start by hand-marking overlaps and then calculating features for each overlap. Possible features include critical-band loudness values, energy, and zero-crossing rate. A classifier would then be built to try to separate the two classes, overlapped and non-overlapped speech.
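As a rough illustration of the feature step, the sketch below computes two of the features named above (energy and zero-crossing rate) for each frame, then assigns a frame to the nearest class centroid. The frame length, hop size, function names, and the nearest-centroid classifier are all assumptions made for illustration; the slides do not say which classifier would be used, and critical-band loudness is left out.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def frame_features(samples, frame_len=400, hop=160):
    """One (energy, zero-crossing rate) pair per frame. Defaults are assumed values."""
    return [(short_time_energy(f), zero_crossing_rate(f))
            for f in frame_signal(samples, frame_len, hop)]

def centroid(rows):
    """Per-dimension mean of a list of equal-length feature tuples."""
    n = len(rows)
    return tuple(sum(r[k] for r in rows) / n for k in range(len(rows[0])))

def nearest_centroid(feature, centroids):
    """Label of the centroid closest to `feature` in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(feature, centroids[label]))

# Toy usage: the centroids would normally come from hand-marked training frames.
tone = [math.sin(2 * math.pi * 100 * n / 16000) for n in range(1600)]
features = frame_features(tone)
centroids = {"single": (0.5, 0.01), "overlap": (1.5, 0.05)}  # made-up values
print(nearest_centroid(features[0], centroids))
```

In practice the features would need to be scaled to comparable ranges before a distance-based classifier is applied, since energy and zero-crossing rate live on very different scales.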

Dual Pitch Tracking
Because a single speaker’s voice has only one pitch at a time, a comb filter tuned to that pitch can cancel its harmonics. Two comb filters tuned to different pitches would be able to eliminate overlapping vowels, which normally have different pitches. Frames where the second comb filter cancels the most energy would therefore indicate two different pitches in the frame.
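One way to picture the two-filter idea is the sketch below. It assumes a simple feedforward comb filter, y[n] = x[n] - x[n - lag], and a brute-force search over candidate pitch periods; the function names, lag range, and synthetic test voices are hypothetical and are not taken from the slides. The score is the share of the frame's energy that the second comb filter removes.

```python
import math

def energy(x):
    """Mean squared amplitude."""
    return sum(v * v for v in x) / len(x)

def comb_residual(x, lag, start):
    """Feedforward comb y[n] = x[n] - x[n - lag] for n >= start (start >= lag).
    It nulls every harmonic of a pitch whose period is `lag` samples."""
    return [x[n] - x[n - lag] for n in range(start, len(x))]

def best_lag(x, min_lag, max_lag):
    """(lag, residual energy) for the lag whose comb leaves the least energy.
    All lags are scored on the same samples, n >= max_lag."""
    candidates = ((lag, energy(comb_residual(x, lag, max_lag)))
                  for lag in range(min_lag, max_lag + 1))
    return min(candidates, key=lambda pair: pair[1])

def dual_comb_score(frame, min_lag, max_lag):
    """Return (lag1, lag2, fraction of frame energy removed by the second comb).
    Requires len(frame) > 2 * max_lag."""
    total = energy(frame[max_lag:])
    lag1, _ = best_lag(frame, min_lag, max_lag)
    residual1 = comb_residual(frame, lag1, max_lag)
    lag2, after_second = best_lag(residual1, min_lag, max_lag)
    before_second = energy(residual1[max_lag:])
    removed = (before_second - after_second) / total if total > 0 else 0.0
    return lag1, lag2, removed

def voice(period, amplitude, n_samples, n_harmonics=3):
    """Synthetic 'vowel': equal-amplitude harmonics of one pitch period (in samples)."""
    return [amplitude * sum(math.sin(2 * math.pi * k * n / period)
                            for k in range(1, n_harmonics + 1))
            for n in range(n_samples)]

# Toy usage: one voice alone versus the same voice plus a quieter second voice.
n_samples = 890
single = voice(80, 1.0, n_samples)
overlap = [a + b for a, b in zip(single, voice(50, 0.3, n_samples))]
print(dual_comb_score(single, 40, 90))
print(dual_comb_score(overlap, 40, 90))
```

With a single synthetic voice the first comb already removes almost everything, so the second comb has nothing left to cancel and the score stays near zero. With two voices the second comb removes a noticeable share of the energy. Real speech is not perfectly periodic, so a threshold on this score would have to be tuned on labeled data.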

Speaker Segmentation
Conventional speaker segmentation would make it possible to examine the boundaries it creates between different speakers. It would also be possible to examine events where a speaker is interrupted, to see whether these scenarios fit the classification of an overlap.
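The boundary idea can be sketched as follows, assuming a segmenter has already produced one speaker label per frame. The label format, the function names, and the `margin` parameter are assumptions for illustration; the slides do not specify how the boundaries would be examined. The code only marks windows around each speaker change as candidate regions to check for overlap.

```python
def change_points(labels):
    """Frame indices where the speaker label differs from the previous frame."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]

def candidate_windows(labels, margin):
    """(start, end) frame ranges of `margin` frames either side of each boundary,
    clipped to the recording and merged when neighbouring windows touch."""
    windows = []
    for i in change_points(labels):
        start = max(0, i - margin)
        end = min(len(labels), i + margin)
        if windows and start <= windows[-1][1]:
            windows[-1] = (windows[-1][0], end)  # extend the previous window
        else:
            windows.append((start, end))
    return windows

# Toy usage: speaker A, then B, then A again.
labels = list("AAAAABBBBBBBBAAAAA")
print(change_points(labels))
print(candidate_windows(labels, 2))
```

Each candidate window could then be passed to an overlap test, such as a classifier or a pitch-based score, rather than scanning the whole recording.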

Sound Sources
The sound used for this project will be taken from Professor Ellis’ ICSI Meeting Recorder project. It contains examples from a recorded meeting, making it possible to examine real-world overlaps.

Conclusion
By examining the three methods for multi-speaker detection, I will be able to choose the one that detects multiple speakers with the highest accuracy. I can then investigate that method more closely in the hope of proposing new ideas to improve it.

