
Subband cocktail-party speech separation: CASA vs. BSS



  1. Subband cocktail-party speech separation: CASA vs. BSS • Seungjin Choi • Department of Computer Science and Engineering • POSTECH, Korea • seungjin@postech.ac.kr • Joint work with Frederic Berthommier • ICP, INPG, France

  2. Numbers95 Stereo Database (ST-Numbers95 Database, ICP/INP Grenoble; authors: E. Tessier and F. Berthommier). [Figure: left source, right source, reference, and mixture.] A large database of binary mixtures of sentences (n = 613) was recorded by [Tessier and Berthommier, 1999]: the Numbers95 signals are played through loudspeakers and recorded. The temporal overlap between words is about 75%, and the relative level is 0 dB. The setup is static. Only 332 mixture sentences, truncated at 1 s, are used in the present study.

  3. Filterbank decomposition. [Figure: subband processing; gain (0 to 1) versus frequency (100 to 4000 Hz) for nbsb = 1, 2, 3, 4.]
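The slide shows the 100 to 4000 Hz range split into nbsb subbands but does not give the filter shapes. The sketch below is hypothetical. It assumes equal-width bands on a linear frequency axis, hard cut-offs at 100 and 4000 Hz, and raised-cosine crossovers of width `trans` between neighbouring bands. With this design, the band gains add up to 1 inside the range. The function name `subband_gains` and the `trans` parameter are illustrative choices.

```python
import numpy as np

def subband_gains(freqs, nbsb, f_lo=100.0, f_hi=4000.0, trans=100.0):
    """Hypothetical filterbank: (nbsb, len(freqs)) gains in [0, 1].

    Assumes equal-width bands between f_lo and f_hi; gains sum to 1 for
    f_lo <= f < f_hi and to 0 elsewhere.
    """
    freqs = np.asarray(freqs, dtype=float)
    edges = np.linspace(f_lo, f_hi, nbsb + 1)

    def step_down(edge, smooth):
        # 1 below the edge, 0 above it; raised-cosine roll-off if smooth
        if not smooth:
            return (freqs < edge).astype(float)
        u = np.clip((freqs - (edge - trans / 2)) / trans, 0.0, 1.0)
        return 0.5 * (1.0 + np.cos(np.pi * u))

    # outer edges are hard cut-offs, inner edges are smooth crossovers
    steps = [step_down(e, 0 < j < nbsb) for j, e in enumerate(edges)]
    # band i = step at its upper edge minus step at its lower edge (telescopes to 1)
    return np.array([steps[i + 1] - steps[i] for i in range(nbsb)])

# Example: masks for 1024-point FFT frames, assuming 8 kHz sampling
freqs = np.fft.rfftfreq(1024, d=1 / 8000)
gains = subband_gains(freqs, nbsb=4)
```

To get the subband signals, multiply each frame's spectrum by one row of `gains` and resynthesize. Because the gains are complementary, summing the subbands gives back the 100 to 4000 Hz content.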

  4. The CASA Model: Filterbank decomposition → TDOA estimation and weighting → Resynthesis
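The second CASA stage estimates a time difference of arrival (TDOA) between the two channels. The slides do not name the estimator. The sketch below assumes plain cross-correlation peak picking within ±`max_lag` samples, applied to one frame or one subband. `estimate_tdoa` is a hypothetical name. The "weighting" step that follows is not specified on the slide, so it is not sketched here.

```python
import numpy as np

def estimate_tdoa(left, right, max_lag):
    """Lag (samples) maximising sum_t left[t] * right[t + k].

    Positive result: the right channel lags the left one.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n = len(left)
    lags = np.arange(-max_lag, max_lag + 1)
    # overlap the two signals at each candidate lag and take their inner product
    scores = [
        np.dot(left[max(0, -k):n - max(0, k)], right[max(0, k):n - max(0, -k)])
        for k in lags
    ]
    return int(lags[int(np.argmax(scores))])

# Example: right is the left signal delayed by 3 samples
rng = np.random.default_rng(1)
left = rng.standard_normal(2000)
right = np.concatenate([np.zeros(3), left[:-3]])
print(estimate_tdoa(left, right, max_lag=10))  # 3
```

In a subband version, you would call this separately on each band output of the filterbank, which yields one TDOA per band.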

  5. Reconstruction Accuracy. [Figure: reference left source Rl and left output Yl (frequency versus time), with RA (output) and RA (mixture) per frame; frames of 1024 bins with half overlap.]
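The slides report the reconstruction accuracy (RA) in dB per frame of 1024 bins with half overlap. Slide 8 adds that the RA of the mixture is subtracted to obtain the gain. The exact RA formula is not given. The sketch below assumes an SNR-like definition: 10·log10(‖reference‖² / ‖reference − estimate‖²). It computes this per frame and takes the gain as mean RA(output) minus mean RA(mixture). If the signals are sampled at 8 kHz (an assumption), a 1 s sentence yields 14 frames, which matches the 0 to 14 frame axis in the figures.

```python
import numpy as np

def frame_ra(ref, est, frame=1024, eps=1e-12):
    """Per-frame RA in dB, assumed to be 10*log10(||ref||^2 / ||ref - est||^2)."""
    ref = np.asarray(ref, dtype=float)
    est = np.asarray(est, dtype=float)
    hop = frame // 2  # half overlap
    ra = []
    for s in range(0, len(ref) - frame + 1, hop):
        r = ref[s:s + frame]
        err = r - est[s:s + frame]
        ra.append(10 * np.log10((np.sum(r ** 2) + eps) / (np.sum(err ** 2) + eps)))
    return np.array(ra)

def gain_db(ref, output, mixture, frame=1024):
    """Gain = mean RA of the output minus mean RA of the mixture (both against ref)."""
    return float(np.mean(frame_ra(ref, output, frame)) - np.mean(frame_ra(ref, mixture, frame)))

# Example: an output whose interference is 10x smaller in amplitude than in the mixture
rng = np.random.default_rng(0)
ref = rng.standard_normal(8000)
noise = rng.standard_normal(8000)
print(len(frame_ra(ref, ref + noise)))                 # 14 frames in 1 s at 8 kHz
print(round(gain_db(ref, ref + 0.1 * noise, ref + noise), 3))  # 20.0
```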

  6. Gain of CASA

  7. Gain of CASA: Relative Level. [Figure: gain on the left channel (dB), with RAY and RAX.]

  8. Subband effect for CASA. [Figure: three panels, RA left, RA right, and RA left+right (dB) versus nbsb, with curves for durations of 256 and 512 bins.] Effect of the number of subbands (nbsb) on the RA (in dB) for the CASA model. From left to right: the RA averaged over all frames for the left source, for the right source, and for left+right. The number of subbands varies from 1 to 5, and the two curves correspond to durations of 256 and 512 bins. The RA of the mixture, which is subtracted for gain evaluation, is labelled (*).

  9. Effect of nbsb: RA. [Figure: RA (dB) per frame (1024 bins with half overlap) for the left and right sources, for nbsb = 1, 2, 4 and for the mixture.]

  10. Subband effect for CASA: Gain. [Figure: gain (dB) versus relative level (-50 to 50 dB) for the left and right sources, nbsb = 1 and nbsb = 4.]
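The gain curves are plotted against the relative level of the two sources, from -50 to 50 dB. The slides do not say how the level was varied. The sketch below assumes two things. First, the relative level is the energy ratio of the left to the right source in dB, so a positive value means the left source is louder. Second, the level is set digitally by rescaling one source before mixing. Both function names are hypothetical.

```python
import numpy as np

def relative_level_db(left, right):
    """Assumed definition: 10*log10(E_left / E_right); positive = left louder."""
    return 10 * np.log10(np.sum(np.square(left)) / np.sum(np.square(right)))

def scale_to_level(left, right, target_db):
    """Rescale `right` so that relative_level_db(left, right) equals target_db."""
    current = relative_level_db(left, right)
    # multiplying by a changes the level by -20*log10(a)
    return np.asarray(right, dtype=float) * 10 ** ((current - target_db) / 20)

# Example: sweep the relative level as on the horizontal axis
rng = np.random.default_rng(2)
left, right = rng.standard_normal(8000), rng.standard_normal(8000)
for level in range(-50, 51, 10):
    mixture_left = left + scale_to_level(left, right, level)  # hypothetical mixing
```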

  11. The BSS Model. [Figure: feedback demixing network with inputs Xl(t), Xr(t), outputs Yl(t), Yr(t), and cross filters Wrl and Wlr of nbp taps, built from blocks labelled gain, nonlinear function, and delayed output; spectrogram of the outputs Yl(t) and Yr(t) over 1 second.]
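The diagram describes a feedback network. Each output is its own mixture channel minus the other, delayed output passed through a cross filter (Wlr or Wrl) with nbp taps, and a nonlinear function drives the adaptation. The minimal sketch below follows a Hérault-Jutten-style feedback rule. It assumes strictly delayed taps (k = 1..nbp), a tanh nonlinearity, and a fixed step size `mu`. These are assumptions for illustration only; this is not the authors' C++ program.

```python
import numpy as np

def feedback_bss(xl, xr, nbp=10, mu=1e-3):
    """Hypothetical two-channel feedback separator with adaptive cross filters.

    Assumed structure:
        yl(t) = xl(t) - sum_{k=1..nbp} w_lr[k-1] * yr(t-k)
        yr(t) = xr(t) - sum_{k=1..nbp} w_rl[k-1] * yl(t-k)
    Assumed adaptation: w_lr[k-1] += mu * tanh(yl(t)) * yr(t-k), and symmetrically.
    """
    xl = np.asarray(xl, dtype=float)
    xr = np.asarray(xr, dtype=float)
    n = len(xl)
    yl, yr = np.zeros(n), np.zeros(n)
    w_lr, w_rl = np.zeros(nbp), np.zeros(nbp)
    for t in range(n):
        lo = max(0, t - nbp)
        # past outputs, most recent first: y(t-1), y(t-2), ...
        past_r = yr[lo:t][::-1]
        past_l = yl[lo:t][::-1]
        m = len(past_r)  # fewer than nbp taps are available at the start
        yl[t] = xl[t] - w_lr[:m] @ past_r
        yr[t] = xr[t] - w_rl[:m] @ past_l
        # nonlinear (tanh) update on the delayed cross terms
        w_lr[:m] += mu * np.tanh(yl[t]) * past_r
        w_rl[:m] += mu * np.tanh(yr[t]) * past_l
    return yl, yr, w_lr, w_rl
```

The returned `w_lr` and `w_rl` are the learned cross filters. When the filters start at zero and `mu = 0`, the outputs equal the inputs. In a subband setup, one such network would presumably run on each band.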

  12. Gain of BSS: Relative Level. [Figure: gain on the left channel (dB), with RAY and RAX.]

  13. Subband effect for BSS. [Figure: three panels, RA left, RA right, and RA left+right (dB) versus nbsb.] Effect of the number of subbands (nbsb) on the RA (in dB) for the BSS model. From left to right: the RA averaged over all frames for the left source, for the right source, and for left+right. The number of subbands varies from 1 to 4, and the four curves correspond to nbp = 2, 3, 10, 100. The RA of the mixture is labelled (*). In each figure, two points are added at nbsb = 1: one for the "BSS giv" condition and one for the "BSS ori" data.

  14. RA and Gain for BSS. [Figure: RL (dB) per frame, and RA (dB) per frame for the left and right sources, comparing the mixture (RAX) with the output (RAY); frames of 1024 bins with half overlap.] Speech Separation Program (C++), POSTECH. Authors: S. Choi and H. Hong.

  15. Subband effect for BSS: Gain. [Figure: gain of BSS (nbp = 100) in dB versus relative level (-50 to 50 dB) for the left and right sources, nbsb = 1 and nbsb = 2.]

  16. Demixing filters. [Figure: demixing filters Wlr and Wrl for nbsb = 1, shown against time (bins) and against frequency.]

  17. Coherence spectrograms. [Figure: coherence spectrograms (frequency versus time) for the left output, between Yl(n) and Yl(n+1), and for the right output, between Yr(n) and Yr(n+1); NBP = 10, frames of 256 bins with half overlap; Mean(Coh) = 0.65.]

  18. Effect of nbp: Coherence spectrograms. [Figure: left and right coherence spectrograms for each NBP.]
      NBP    Coh (mean)
      3      0.68
      10     0.65
      100    0.60

  19. Coherence statistic. [Figure: two panels versus nbsb, left+right RA (dB) and coherence.] Effect of the number of subbands (nbsb) on the coherence index for the BSS model. Left: the left+right RA averaged over all frames. Right: the coherence, defined as the mean of the coherence spectrogram. The number of subbands varies from 1 to 4, and the four curves correspond to nbp = 2, 3, 10, 100. The RA of the mixture is labelled (*). The coherence CohX between the two mixture channels is labelled (*) in the right figure. In each figure, two points are added at nbsb = 1: one for the "BSS giv" condition and one for the "BSS ori" data.
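The coherence index is the mean of a coherence spectrogram, and CohX is the same statistic computed between the two mixture channels. The slides do not give the estimator. The sketch below assumes a Welch-style magnitude-squared coherence (MSC). It uses Hann-windowed frames of 256 samples with half overlap, as on slide 17, and averages the MSC over frequency. As a simplification, it averages the cross-spectra over the whole signal instead of producing a time-resolved spectrogram. `mean_coherence` is a hypothetical name.

```python
import numpy as np

def mean_coherence(x, y, nperseg=256):
    """Mean magnitude-squared coherence between x and y (Welch, half overlap)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    hop = nperseg // 2
    win = np.hanning(nperseg)
    starts = range(0, len(x) - nperseg + 1, hop)
    X = np.array([np.fft.rfft(win * x[s:s + nperseg]) for s in starts])
    Y = np.array([np.fft.rfft(win * y[s:s + nperseg]) for s in starts])
    # average auto- and cross-spectra over frames
    pxx = np.mean(np.abs(X) ** 2, axis=0)
    pyy = np.mean(np.abs(Y) ** 2, axis=0)
    pxy = np.mean(X * np.conj(Y), axis=0)
    coh = np.abs(pxy) ** 2 / (pxx * pyy + 1e-12)  # MSC per frequency bin, in [0, 1]
    return float(np.mean(coh))

# Example: CohX-style statistic between two (here synthetic) mixture channels
rng = np.random.default_rng(3)
s1, s2 = rng.standard_normal(8000), rng.standard_normal(8000)
coh_x = mean_coherence(s1 + 0.5 * s2, s2 + 0.5 * s1)
```

Identical channels give a mean coherence close to 1. Independent channels give a value near 0, plus a small bias that depends on the number of averaged frames.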

  20. Summary results … [Figure: "Hearing"; REF, CASA, and BSS for the left and right sources, and their mean.]
