
Speech and Face Recognition Semester Project Speaker Segmentation


Presentation Transcript


    1. Pedro Davalos & Hassan Kingravi, May 9, 2007. CPSC 689-604 Speech and Face Recognition, Semester Project: “Speaker Segmentation”

    2. Outline Goal Approach Results Conclusion

    3. Automatic Speaker Segmentation: Goal. Input: a speech signal containing a spoken conversation between an unknown number of people; single channel; no overlapping/simultaneous speakers; minimal background noise. Output: find the number of distinct speakers, and identify the segments (times) where each speaker is talking.

    4. Approach: Algorithm

    5. Algorithm: Pre-Processing
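The slide itself does not enumerate the pre-processing steps, but a typical stage of this kind (pre-emphasis, framing, windowing) can be sketched as follows; the pre-emphasis coefficient and frame sizes are illustrative assumptions, not values taken from the project:

```python
import numpy as np

def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
    """Pre-emphasis followed by framing with a Hamming window.

    frame_len=400 and hop=160 correspond to 25 ms / 10 ms at 16 kHz;
    these are common defaults, assumed here rather than given by the slides.
    """
    # Pre-emphasis: boost high frequencies, y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window to each
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([
        emphasized[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return frames

# Example: one second of a synthetic 220 Hz tone at 16 kHz
x = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)
frames = preprocess(x)
print(frames.shape)  # (98, 400)
```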

    6. LPC Filter (inverse)
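Applying the inverse LPC filter A(z) to a frame removes the vocal-tract envelope and leaves the prediction residual. A minimal sketch using the autocorrelation method; the model order below is an assumption (the conclusions note that the actual LPC order and window were tuned parameters):

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_residual(frame, order=12):
    """Inverse-filter a frame with its own LPC model, returning the residual.

    Autocorrelation-method LPC; order=12 is an assumed value, not one
    taken from the slides.
    """
    # Autocorrelation of the frame up to the LPC order
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1 :][: order + 1]
    # Solve the Toeplitz normal equations R a = r for the predictor coefficients
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])
    # Inverse filter A(z) = 1 - sum_k a_k z^{-k} whitens the frame
    return lfilter(np.concatenate(([1.0], -a)), [1.0], frame)

# Synthetic AR(1) frame: strongly predictable, so the residual shrinks
rng = np.random.default_rng(0)
frame = lfilter([1.0], [1.0, -0.9], rng.standard_normal(400))
residual = lpc_residual(frame)
print(residual.var() < frame.var())  # True: inverse filtering whitens the frame
```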

    7. Features: MFCCs & F0
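MFCCs are usually computed with a standard signal-processing library; F0 can be sketched with a simple autocorrelation pitch estimator. The slides do not say how F0 was estimated, so the method and search range below are assumptions:

```python
import numpy as np

def estimate_f0(frame, sr=16000, fmin=60.0, fmax=400.0):
    """Estimate F0 of one frame by autocorrelation peak picking.

    fmin/fmax bound the search to a plausible speech pitch range;
    both the method and the bounds are assumed, not from the slides.
    """
    frame = frame - frame.mean()
    # One-sided autocorrelation (lag >= 0)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1 :]
    # Restrict the lag search to the plausible pitch range
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag

# Example: a pure 120 Hz tone should come back near 120
sr = 16000
t = np.arange(2048) / sr
frame = np.sin(2 * np.pi * 120.0 * t)
print(estimate_f0(frame, sr))  # close to 120 Hz
```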

    8. Speaker Change Detection. Find sudden changes in the features: for each point in time, compute the difference (distance) between the current window and the next window. High distances represent “possible” speaker changes; each resulting segment is treated as a possible speaker. The distance used is the KL distance between the two windows.
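The exact KL formula on the slide is not reproduced in the transcript. Assuming each window of feature vectors is modeled as a Gaussian with diagonal covariance, the symmetric KL divergence between consecutive windows has a standard closed form, and sliding it over time yields a change curve whose peaks mark possible speaker changes:

```python
import numpy as np

def symmetric_kl(X, Y, eps=1e-6):
    """Symmetric KL divergence between diagonal Gaussians fit to X and Y.

    0.5 * sum( vx/vy + vy/vx + (mx-my)^2 * (1/vx + 1/vy) ) - D,
    the standard closed form for diagonal-covariance Gaussians.
    """
    mx, my = X.mean(0), Y.mean(0)
    vx, vy = X.var(0) + eps, Y.var(0) + eps
    d = (vx / vy + vy / vx + (mx - my) ** 2 * (1 / vx + 1 / vy)).sum()
    return 0.5 * d - X.shape[1]

def change_curve(features, win=50):
    """Distance from the current window to the next at each time step."""
    n = len(features) - 2 * win
    return np.array([
        symmetric_kl(features[t : t + win], features[t + win : t + 2 * win])
        for t in range(n)
    ])

# Toy data: two "speakers" with different feature means, boundary at frame 200
rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=(200, 13))
b = rng.normal(2.0, 1.0, size=(200, 13))
curve = change_curve(np.vstack([a, b]))
print(int(np.argmax(curve)))  # near 150, i.e. the windows straddle the boundary
```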

    9. Speaker Modeling & Identification. The goal is to find segments that come from the same speaker. Characterize each segment by the Gaussian mean of each feature over that segment; assigning a speaker label to each segment can then be achieved through k-means clustering. K-means clustering also addresses false-positive transitions: segments split by a spurious change point fall into the same cluster.
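The clustering step can be sketched with a plain k-means over per-segment mean vectors; the feature dimension and toy data below are illustrative assumptions, not the project's data:

```python
import numpy as np

def kmeans_labels(X, k, iters=50, seed=0):
    """Plain k-means; assigns a cluster (speaker) label to each segment.

    X holds one Gaussian-mean vector per segment. Segments produced by a
    false-positive change point end up in the same cluster.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each segment to its nearest center
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        # Move each center to the mean of its assigned segments
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels

# Toy example: segment mean vectors from two well-separated "speakers"
rng = np.random.default_rng(2)
segs = np.vstack([rng.normal(0, 0.1, (5, 13)), rng.normal(3, 0.1, (5, 13))])
labels = kmeans_labels(segs, k=2)
print(labels)  # e.g. [0 0 0 0 0 1 1 1 1 1] (label order may swap)
```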

    10. Results Summary

    11. Results 1 – “News7”

    12. Results 2 – “news7half”

    13. Results 3 – “mtc_se”

    14. Results 4 – “Mtc-SE-3”

    15. Results 5 – “Mtc_se_3b”

    16. Results 6 – “Mtc_se_3d”

    17. Results 7 – “npr3a”

    18. Results 8 – “npr4c”

    19. Results 9 – “npr4d”

    20. Results 10 – “npr3g”

    21. Conclusions (1/2). Feature Extraction: accurate speaker-dependent features are required; ideal features would have greater variability between speakers than between phonemes; numerous available speech features did not yield adequate speaker separation. Thresholds/Parameters: the segmentation process involves numerous thresholds and parameters, including the LPC order and window, pitch estimation, number of MFCCs, MFCC window, distance function, distance window, peak estimation, peak threshold window …

    22. Conclusions (2/2). Performance: segmentation proved successful under ideal conditions; false-positive transitions are handled by clustering, while missed true transitions degrade performance by mixing speakers within a segment. Future Work: estimating the number of speakers (clustering optimization).
