
Multi-Speaker Detection

Multi-Speaker Detection. By Matt Fratkin, EE 6820, 3/9/05. Background: Currently, there is a lot of work being done on speaker recognition, but a new problem arises when more than one speaker is present. Uses for Multi-Speaker Detection.


Presentation Transcript


  1. Multi-Speaker Detection. By Matt Fratkin, EE 6820, 3/9/05

  2. Background: A lot of current work addresses speaker recognition, but a new problem arises when more than one speaker is present.

  3. Uses for Multi-Speaker Detection: Multi-speaker detection is important whenever there is dialog among several people, such as in a meeting, debate, conference, or court hearing. It can also help when a particular key speaker needs to be tracked throughout a speech or debate.

  4. Attempted Solutions: The first attempt at solving this problem was to transcribe the events by hand, but that can cost up to $400/hr. Another attempt gave each speaker their own microphone, so that each microphone would represent one speaker. This proved unsuccessful because each microphone also picked up crosstalk from nearby speakers.

  5. Possible Methods for Multi-Speaker Detection: • Pattern Recognition • Dual Pitch Tracking • Speaker Segmentation

  6. Pattern Recognition: This method starts by hand-marking the overlaps in the recording and then calculating features for each marked overlap. Possible features include critical-band loudness values, energy, and zero-crossing rate. A classifier would then be built to try to separate the two classes, overlapped speech and single-speaker speech.
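
As a rough illustration of this pipeline, the sketch below computes two of the listed features (energy and zero-crossing rate) per frame and fits a nearest-centroid classifier to hand-labelled frames. The slides do not name a classifier, so nearest-centroid is an assumption chosen for brevity. The helper names (`frame_features`, `train`, `classify`) are hypothetical, and critical-band loudness is left out.

```python
import math

def frame_features(frame):
    """Return (log energy, zero-crossing rate) for one frame of audio samples."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (math.log(energy + 1e-12), crossings / (len(frame) - 1))

def centroid(rows):
    """Mean feature vector of a list of feature tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))

def train(features, labels):
    """Build one centroid per class. Assumed labels: 1 = hand-marked overlap, 0 = single speaker."""
    return {c: centroid([f for f, l in zip(features, labels) if l == c])
            for c in set(labels)}

def classify(model, feat):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(feat, model[c])))

def tone(freqs, n=400, rate=8000):
    """Toy stand-in for speech: a sum of sinusoids, one per simulated speaker."""
    return [sum(math.sin(2 * math.pi * f * t / rate) for f in freqs) for t in range(n)]

# Toy training data: one tone = single speaker, two tones = overlap.
single = [tone([f]) for f in (100, 120, 140)]
overlap = [tone(pair) for pair in ((100, 180), (120, 200), (140, 260))]
feats = [frame_features(x) for x in single + overlap]
model = train(feats, [0, 0, 0, 1, 1, 1])
```

On real meeting audio the features would need scaling and the two classes would overlap far more than in this toy data, so a stronger classifier would likely be required.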

  7. Dual Pitch Tracking: A single speaker’s voice has only one pitch at a time, so a comb filter tuned to that pitch can cancel its harmonics. Two comb filters tuned to different pitches can together cancel overlapping vowels, which normally have different pitches. Frames in which the second comb filter cancels the most energy would therefore indicate that two different pitches are present.
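
The idea can be sketched with the simplest comb filter, y[n] = x[n] - x[n - T], which nulls every harmonic of a pitch whose period is T samples. The code below is a minimal sketch under several assumptions that are not in the slides: pitch periods are whole numbers of samples, vowels are simulated as sums of harmonics, and the two filters are tuned by a joint brute-force search over candidate periods. The function names are hypothetical. The returned gain is the extra energy that the second comb removes beyond the best single comb, normalized by the frame energy.

```python
import math

def harmonic_voice(period, n_samples, n_harmonics=4):
    """Synthetic 'vowel': a few harmonics of one pitch (period given in samples)."""
    return [sum(math.sin(2 * math.pi * k * n / period) / k
                for k in range(1, n_harmonics + 1))
            for n in range(n_samples)]

def comb(x, period):
    """Feed-forward comb filter y[n] = x[n] - x[n - period]."""
    return [x[n] - x[n - period] for n in range(period, len(x))]

def mean_energy(x):
    return sum(v * v for v in x) / len(x)

def second_comb_gain(frame, periods):
    """Return (gain, best single period, best period pair) for one frame.

    gain is the share of frame energy that a second comb filter removes
    beyond what the best single comb filter already removes.
    """
    e0 = mean_energy(frame)
    best_one = (float("inf"), None)
    best_two = (float("inf"), None)
    for i, p1 in enumerate(periods):
        r1 = comb(frame, p1)                  # residual after the first comb
        e1 = mean_energy(r1)
        if e1 < best_one[0]:
            best_one = (e1, p1)
        for p2 in periods[i + 1:]:            # second comb at a different pitch
            e2 = mean_energy(comb(r1, p2))
            if e2 < best_two[0]:
                best_two = (e2, (p1, p2))
    return (best_one[0] - best_two[0]) / e0, best_one[1], best_two[1]

periods = list(range(70, 111))                # candidate pitch periods in samples
solo = harmonic_voice(80, 800)
mix = [a + b for a, b in zip(harmonic_voice(80, 800), harmonic_voice(100, 800))]
solo_gain, solo_period, _ = second_comb_gain(solo, periods)
mix_gain, _, mix_pair = second_comb_gain(mix, periods)
```

With one voice the first comb already removes nearly everything, so the gain stays close to zero. With two voices a single comb cannot cancel both, so the gain is large. Real speech is only approximately periodic, so a practical detector would need a threshold on this gain rather than exact cancellation.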

  8. Speaker Segmentation: Conventional speaker segmentation makes it possible to examine the boundaries between different speakers’ turns. It would also be possible to examine events where one speaker interrupts another, to see whether these cases fit the classification of an overlap.
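
One way to make the interruption check concrete, assuming the segmenter outputs a list of (start, end, speaker) segments in seconds, is to scan for places where a new speaker starts before the previous one has finished. The segment format and the function `find_interruptions` are hypothetical and not taken from the slides.

```python
def find_interruptions(segments, min_overlap=0.0):
    """Find intervals where a new speaker starts before the current one ends.

    segments: list of (start_sec, end_sec, speaker) tuples (assumed format).
    Returns a list of (overlap_start, overlap_end, first_speaker, interrupter).
    """
    ordered = sorted(segments)
    found = []
    for i, (s1, e1, spk1) in enumerate(ordered):
        for s2, e2, spk2 in ordered[i + 1:]:
            if s2 >= e1:
                break  # sorted by start time: no later segment can overlap this one
            if spk2 != spk1 and min(e1, e2) - s2 > min_overlap:
                found.append((s2, min(e1, e2), spk1, spk2))
    return found

# Hypothetical segmentation output for a short meeting excerpt.
segments = [(0.0, 4.0, "A"), (3.5, 6.0, "B"), (6.0, 9.0, "A"), (8.2, 8.6, "C")]
overlaps = find_interruptions(segments)
```

Each returned interval is a candidate overlap that could then be checked against the hand-marked overlaps.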

  9. Sound Sources: The audio for this project will be taken from Professor Ellis’ ICSI Meeting Recorder project. It provides examples from recorded meetings, making it possible to examine real-world overlaps.

  10. Conclusion: After examining the three methods for multi-speaker detection, I will be able to choose the one that detects multiple speakers with the highest accuracy. I can then investigate that method more closely in the hope of proposing new ideas to improve it.

  11. References: • Brown, Guy J., Renals, Steve, Wan, Vincent, and Wrigley, Stuart N., “Speech and Crosstalk Detection in Multichannel Audio,” http://www.m4project.org/pdf/wrigleybrownwanrenals2005.pdf, 2005. • Ellis, Daniel P.W. and Kennedy, Lyndon S., “Pitch-Based Emphasis Detection for Characterization of Meeting Recordings,” http://www.ee.columbia.edu/~dpwe/pubs/asru01-sad.pdf, 2003. • Lu, Lie and Zhang, Hong-Jiang, “Speaker Change Detection and Tracking in Real-Time News Broadcasting Analysis,” http://delivery.acm.org/10.1145/650000/641127/p602-lu.pdf?key1=641127&key2=8647789011&coll=GUIDE&dl=GUIDE&CFID=39855083&CFTOKEN=82588953, 2002. • Martin, Alvin F. and Przybocki, Mark A., “Speaker Recognition in a Multi-Speaker Environment,” http://www.nist.gov/speech/publications/papersrc/euro01paperv2.7.pdf, 2000.
