Wild Dolphin Project 11-751 Speech Final Project

Presentation Transcript

  1. Wild Dolphin Project: 11-751 Speech Final Project, by Jiazhi Ou (jzou@cs.cmu.edu) and Tal Blum (blum@cs.cmu.edu)

  2. Outline • Wild Dolphin Project, Dolphin Speech • Data, Labeling, Labeling problems • Previous work • Models training • Experiments & Results • Conclusions

  3. The Wild Dolphin Project (WDP) • The Wild Dolphin Project (WDP), founded by Dr. Denise Herzing in 1985, is engaged in an ambitious, long-term scientific study of a specific pod of Atlantic spotted dolphins that live 40 miles off the coast of the Bahamas, in the Atlantic Ocean. For about 100 days each year, Phase I research has involved the photographing, videotaping, and audio taping of a group of resident dolphins, aiming to learn about their lives. • http://www.wilddolphinproject.org/index.cfm

  4. Dolphin’s Speech • Dolphin speech is very different from human speech • The range of frequencies is wider • Two mechanisms produce sound simultaneously • Some of the frequencies are directional • Sound is carried in water • Sound can travel large distances

  5. Dolphin’s Speech(2) • Is used for: • Identification • Communicating • Fighting • Defending • Courting • Warning • Calling • Hunting

  6. Dolphin’s Speech(3) • 3 main types • Whistles • Signature • Non-signature • Clicks • Spike trains

  7. What do we know • Not much • We know that each dolphin has a unique whistle, called its signature whistle. • A dolphin’s signature whistle is similar to those of the dolphins that were in close contact with it when it was a baby

  8. Data • 164 files, each containing sounds of one dolphin whose name is known • Average file length is 7 seconds • Total data length is less than 20 minutes, about half of which is silence • The data does not contain all of the relevant frequencies

  9. Labeling • Dolphin Names • Dolphin ID project • Pause, Noise, Dolphin Signature Whistles, Dolphin Non-Signature whistles.

  10. Labeling Problems • How do we distinguish between the two whistle types (signature and non-signature)? • How do we distinguish between whistles and non-whistles? • They co-occur • How do we determine the duration of a label? • Should labels that are close together be merged into one label? • This has an effect on the model • Some signals are weak, probably due to a change in the dolphin’s direction

  11. Mapping from Labels to Models

  12. Label Statistics

  13. Previous Work • Dolphin-ID Project by Tanja, Alan and Yue • Task: identify dolphins by their signature whistles • 51 files labeled by Alan • 13 HMMs: 10 (one for each dolphin) + DOLPHIN, PAUSE, and GARBAGE • Janus used for training and testing • Different kinds of features tried

  14. Our Work • Model Generalized Signature Whistles • Label More Files • Create HMMs for signature whistles, non-signature whistles, garbage, and pause • Train and test the HMMs using Janus • Evaluate the test results with our own method • Compare different model selections

  15. Signal Processing • Tanja scripts • Down sampling • High Pass Filter • FFT • LDA
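
The front-end steps listed on this slide can be illustrated with a minimal sketch. It is not the project's actual scripts: the function names, the block-averaging downsampler, the first-order filter, and the parameters `factor`, `alpha`, `frame_len`, and `hop` are all assumptions chosen for illustration. A naive DFT stands in for the FFT so the example needs only the standard library, and the LDA step is omitted.

```python
import cmath
import math


def downsample(samples, factor):
    """Average each block of `factor` samples and keep one value per block."""
    usable = len(samples) - len(samples) % factor
    return [sum(samples[i:i + factor]) / factor for i in range(0, usable, factor)]


def high_pass(samples, alpha=0.95):
    """First-order high-pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    if not samples:
        return []
    out = [0.0]
    for n in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[n] - samples[n - 1]))
    return out


def magnitude_spectrum(frame):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (an FFT yields the same values faster)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]


def frame_features(samples, frame_len=64, hop=32):
    """Cut the signal into overlapping frames and return one spectrum per frame."""
    return [
        magnitude_spectrum(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]
```

In a full pipeline, the per-frame spectra would then be projected to a lower-dimensional space (for example with LDA) before HMM training.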

  16. HMM Topologies [Figure: state diagrams, with states labeled b, m, and e, for Signature Whistles, Non-Signature Whistles, Garbage, and Pause (Water)]

  17. Model Selection • Scheme 1 • Signature Whistles, Non-Signature Whistles, GARBAGE, PAUSE • Scheme 2 • Signature Whistles, GARBAGE, PAUSE • Scheme 3 • 10 HMMs (one for each dolphin), GARBAGE, PAUSE
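
The three schemes differ only in which HMMs are trained. As a rough sketch, the model inventories can be written down as plain data; the identifier names below, including the per-dolphin placeholders `DOLPHIN_01` to `DOLPHIN_10`, are hypothetical and do not come from the slides.

```python
# Placeholder names for the ten per-dolphin models (hypothetical identifiers).
DOLPHIN_MODELS = [f"DOLPHIN_{i:02d}" for i in range(1, 11)]

# Model inventory for each scheme, as listed on the slide.
SCHEMES = {
    1: ["SIGNATURE", "NON_SIGNATURE", "GARBAGE", "PAUSE"],
    2: ["SIGNATURE", "GARBAGE", "PAUSE"],
    3: DOLPHIN_MODELS + ["GARBAGE", "PAUSE"],
}


def model_count(scheme):
    """Number of HMMs that have to be trained under the given scheme."""
    return len(SCHEMES[scheme])
```

Under these inventories, Scheme 1 trains 4 models, Scheme 2 trains 3, and Scheme 3 trains 12.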

  18. Evaluation • We cannot use word error rate (WER) here, since there are no words, just segments. • The method we used was to compute a confusion matrix over hidden states. • Janus treats silence differently and does not show silence classification, which complicates the evaluation.
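
A confusion matrix of this kind can be sketched as follows. This is an illustration under assumptions, not the authors' evaluation code: it assumes the reference labels and the decoder output have already been expanded to one class label per frame, and the function names and class names are hypothetical.

```python
from collections import Counter


def confusion_matrix(reference, hypothesis, classes):
    """Count frames by (reference label, hypothesized label).

    Rows follow the reference label and columns the hypothesized label,
    both in the order given by `classes`.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("label sequences must have the same number of frames")
    counts = Counter(zip(reference, hypothesis))
    return [[counts[(ref, hyp)] for hyp in classes] for ref in classes]


def per_class_recall(matrix):
    """Fraction of each reference class's frames that received the correct label."""
    return [
        row[i] / sum(row) if sum(row) else 0.0
        for i, row in enumerate(matrix)
    ]
```

Unlike WER, this measure needs no word boundaries: every frame contributes one count, so segment durations are reflected directly in the matrix.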

  19. Experiments • Data • 162 labeled files were used • Half of the data for training, half for testing • Swap the training set and test set • 162 test results all together • Features • The same as those in dolphin-ID project • Model Selection • 3 different schemes
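
The train/test swap described on this slide is a two-fold cross-validation. A minimal sketch follows, assuming the files are simply split in list order (the slides do not say how the halves were chosen); `train` and `test` are hypothetical callables standing in for the Janus training and decoding runs.

```python
def two_fold_swap(files, train, test):
    """Train on one half and test on the other, then swap the halves.

    Every file is tested exactly once, by a model that never saw it in training.
    """
    half = len(files) // 2
    fold_a, fold_b = files[:half], files[half:]
    results = {}
    for train_set, test_set in ((fold_a, fold_b), (fold_b, fold_a)):
        model = train(train_set)
        for name in test_set:
            results[name] = test(model, name)
    return results
```

With 162 files this yields 162 test results, one per file, which matches the count given on the slide.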

  20. Results – Scheme 1

  21. Results – Scheme 2

  22. Results – Scheme 3

  23. Analysis of Results • Results can only be as good as the labels • Scheme 3 is the best at aligning signature whistles -- speaker dependent • Scheme 1 is the worst – not enough data to model non-signature whistles and garbage • Scheme 2 is in the middle – speaker independent • Pause is the most difficult to model – it contains many different things, and we modeled it with only 1 state

  24. Conclusion • Analyzing dolphin sounds is quite different from analyzing human speech. The methods used have to be adjusted to the characteristics of dolphin sounds. • There is a lot of work to be done in the signal processing stage • Partly supervised training • It might be better to construct models only for the labels we are sure of and let the model learn what the signature whistles are, or which units discriminate between different labels.

  25. We also tried … • One-state model for non-signature whistles, garbage, and pause -- Segmentation fault in training • “Loop back” model for signature whistles -- The loop back transition makes no difference

  26. Acknowledgement Tanja Schultz Yue Pan Alan W Black Szu-Chen Stan Jou Hua Yu

  27. Thank You! Jiazhi Ou, Tal Blum {jzou, tblum}@cs.cmu.edu