
Mark D. Skowronski and John G. Harris Computational Neuro-Engineering Lab

Statistical automatic identification of microchiroptera from echolocation calls: Lessons learned from human automatic speech recognition. Mark D. Skowronski and John G. Harris, Computational Neuro-Engineering Lab, Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA

Presentation Transcript


  1. Statistical automatic identification of microchiroptera from echolocation calls: Lessons learned from human automatic speech recognition. Mark D. Skowronski and John G. Harris, Computational Neuro-Engineering Lab, Electrical and Computer Engineering, University of Florida, Gainesville, FL, USA, November 19, 2004

  2. Overview • Motivations for bat acoustic research • Review bat call classification methods • Contrast with 1970s human ASR • Experiments • Conclusions

  3. Bat research motivations • Bats are among: • the most diverse, • the most endangered, • and the least studied mammals. • ~1000 species, ~25% of all mammal species • Close relationship with insects; agricultural impact; disease vectors • Acoustic research is non-invasive and covers a significant domain (echolocation) • Simplified biological acoustic communication system (compared to human speech)

  4. Bat echolocation • Ultrasonic, brief chirps • Determine range and velocity of nearby objects (clutter, prey, conspecifics) • Tailored to task and environment • Figure: Tadarida brasiliensis (Mexican free-tailed bat) • Audio: 10x time-expanded search calls
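Time expansion changes only the playback rate: a call recorded at 200 kS/s and replayed at 20 kS/s has every frequency divided by 10 and every duration multiplied by 10, which moves ultrasonic energy into the audible range. A minimal sketch with Python's standard `wave` module, assuming 16-bit mono samples; the function names are hypothetical and not the authors' tooling:

```python
import io
import struct
import wave


def time_expand_wav(samples, fs, factor=10):
    """Return WAV bytes that replay 16-bit mono `samples` `factor` times slower.

    The samples are untouched; only the declared playback rate drops, so every
    frequency is divided by `factor` and every duration multiplied by it.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)             # 16-bit PCM
        w.setframerate(fs // factor)  # e.g. 200 kS/s recorded -> 20 kS/s playback
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()


def expanded_frequency(f_hz, factor=10):
    """Frequency heard after time expansion by `factor`."""
    return f_hz / factor
```

For example, `expanded_frequency(25000)` returns 2500.0, so a 25 kHz component is heard at 2.5 kHz.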

  5. Echolocation calls • Two characteristics • Frequency modulated (FM): range • Constant frequency (CF): velocity • Features (holistic) • Frequency extrema • Duration • Shape • Number of harmonics • Call interval • Figure: Mexican free-tailed calls, concatenated
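Holistic features such as frequency extrema and duration can be approximated from zero crossings, the approach behind the zero-crossing baseline on the experiments slide. This sketch takes frequency as the reciprocal of the interval between successive upward zero crossings; the function names, the linear interpolation of crossing times and the synthetic FM sweep are illustrative assumptions, not the authors' implementation. It also assumes a clean, already segmented call and does not compute Fmax_energy:

```python
import math


def linear_chirp(f0, f1, dur, fs):
    """Synthetic linear FM sweep from f0 to f1 Hz, a stand-in for a recorded call."""
    n = int(round(dur * fs))
    k = (f1 - f0) / dur  # sweep rate, Hz per second
    return [math.sin(2 * math.pi * (f0 * t + 0.5 * k * t * t))
            for t in (i / fs for i in range(n))]


def zero_crossing_features(x, fs):
    """Fmin, Fmax (Hz) and duration (s) of a segmented call from zero crossings.

    Frequency is taken as 1 / (interval between successive upward zero
    crossings), with linear interpolation for sub-sample crossing times.
    Duration is the span from the first to the last upward crossing.
    """
    ups = []  # upward zero-crossing times, seconds
    for n in range(1, len(x)):
        if x[n - 1] < 0 <= x[n]:
            frac = -x[n - 1] / (x[n] - x[n - 1])  # fraction of a sample past n - 1
            ups.append((n - 1 + frac) / fs)
    if len(ups) < 2:
        raise ValueError("need at least two upward zero crossings")
    freqs = [1.0 / (t1 - t0) for t0, t1 in zip(ups, ups[1:])]
    return {"fmin": min(freqs), "fmax": max(freqs), "duration": ups[-1] - ups[0]}
```

On a 10 ms sweep from 50 to 25 kHz sampled at 200 kS/s, the estimates land close to 50 kHz and 25 kHz, with a duration just under 10 ms. At only four samples per cycle near 50 kHz, the interpolated crossings carry errors of a few percent, one reason a higher-resolution frequency estimator can help.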

  6. Current classification methods • Expert sonogram readers • Manual or automatic feature extraction • Comparison with exemplar sonograms • Automatic classification • Decision trees • Discriminant function analysis • Artificial neural networks • Spectrogram correlation • These methods parallel the knowledge-based approach to human ASR of the 1970s (acoustic phonetics, expert systems, cognitive approach).

  7. Acoustic phonetics • Example transcription, "the football game is over": DH AH F UH T B AO L G EY EM IH Z OW V ER • Bottom-up paradigm • Frames, boundaries, groups, phonemes, words • Manual or automatic feature extraction • Formants, voicing, duration, intensity, transitions • Classification • Decision tree, discriminant functions, neural network, Gaussian mixture model, Viterbi path

  8. Acoustic phonetics limitations • Variability of conversational speech • Complex rules, difficult to train • Boundaries difficult to define • Coarticulation • Feature estimates brittle • Variable noise robustness • Hard decisions, errors accumulate • Human ASR therefore shifted to the information-theoretic paradigm, which better accounts for the variability of speech and noise.

  9. Information-theoretic ASR • Data-driven models from computer science • Non-parametric: dynamic time warping (DTW) • Parametric: hidden Markov model (HMM) • Frame-based • Expert information in feature extraction • Models account for feature and temporal variability • Information-theoretic ASR dominates state-of-the-art speech understanding systems.
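The non-parametric route compares an unknown call directly with labelled exemplars. A minimal pure-Python DTW sketch, assuming each call has been reduced to a 1-D frequency track (for example, per-frame frequency estimates in kHz) and is classified by its nearest template. It uses fixed endpoints and an absolute-difference local cost; the 10% endpoint range used in this talk's DTW experiments is not reproduced, and the function names are hypothetical:

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two 1-D feature sequences.

    Steps (1,0), (0,1), (1,1); local cost |a_i - b_j|; fixed endpoints, so the
    first frames and the last frames of both sequences must be aligned.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def classify_nearest_template(query, templates):
    """Label of the (label, sequence) template with the smallest DTW cost."""
    return min(templates, key=lambda lt: dtw_distance(query, lt[1]))[0]
```

Every query is aligned against every stored template at O(nm) cost each, which matches the conclusions' note that DTW trains quickly (templates are simply stored) but classifies slowly.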

  10. Data collection • UF Bat House, home to 60,000 bats • Mexican free-tailed bat (vast majority) • Evening bat • Southeastern myotis • Continuous recording • 90 minutes around sunset • ~20,000 calls • Equipment: • B&K microphone (4939), 100 kHz • B&K preamp (2670) • Custom amplifier/anti-aliasing filter • NI 6036E 200 kS/s A/D card • Laptop, Matlab
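The acquisition figures can be sanity-checked with a little arithmetic. This sketch assumes a single 16-bit channel (the sample width is an assumption, not stated on the slide) and reads the microphone's 100 kHz figure as its upper frequency limit. Under those assumptions the 200 kS/s rate puts the Nyquist frequency exactly at 100 kHz, 90 minutes of continuous recording comes to about 2.16 GB of raw samples, and ~20,000 calls works out to roughly 3.7 calls per second:

```python
# Figures from the data-collection slide; BITS and the single channel are assumptions.
FS = 200_000            # A/D sample rate, samples/s
MIC_LIMIT_HZ = 100_000  # reading "100 kHz" as the microphone's upper frequency
BITS = 16               # assumed sample width
MINUTES = 90            # continuous recording around sunset
CALLS = 20_000          # approximate number of calls

nyquist_hz = FS / 2
seconds = MINUTES * 60
samples = FS * seconds
raw_bytes = samples * BITS // 8
calls_per_second = CALLS / seconds

print(f"Nyquist {nyquist_hz / 1e3:.0f} kHz vs. mic limit {MIC_LIMIT_HZ / 1e3:.0f} kHz")
print(f"{samples:,} samples, about {raw_bytes / 1e9:.2f} GB of raw {BITS}-bit audio")
print(f"about {calls_per_second:.1f} calls per second on average")
```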

  11. Experiment design • Design and assumptions • All recorded bats are Mexican free-tailed • Calls divided into distinct intraspecies call classes • All calls are search phase • Hand-labeled call detection is complete (no calls discarded) • Hand labels • Narrowband spectrogram • Endpoints, class label • 436 calls in 261 0.5-s sequences (2% of data) • Four classes, a priori proportions: 34, 40, 20, 6% • All experiments on hand-labeled data only
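The a priori class proportions give a floor for judging the accuracies in the results. As a quick sketch (the four classes are unnamed on the slide, so they stay generic here): always guessing the most frequent class would be correct 40% of the time, and guessing at random in proportion to the priors about 31.9% of the time, because that accuracy is the sum of the squared priors:

```python
def chance_levels(priors):
    """Accuracy of two trivial classifiers, given class prior probabilities."""
    majority = max(priors)                     # always guess the commonest class
    proportional = sum(p * p for p in priors)  # guess at random according to the priors
    return majority, proportional


priors = [0.34, 0.40, 0.20, 0.06]  # a priori class proportions from the slide
majority, proportional = chance_levels(priors)
print(f"majority class: {majority:.1%}, prior-proportional guessing: {proportional:.1%}")
```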

  12. Experiments • Baseline • Features: Fmin, Fmax, Fmax_energy, and duration, from zero crossings and MUSIC • Classifier: Discriminant function analysis, quadratic boundaries • DTW and HMM • Frame-based features: fundamental frequency (MUSIC super-resolution estimate), log energy, temporal derivatives (HMM only) • DTW: MUSIC frequencies, 10% endpoint range • HMM: 5 states/model, 4 Gaussian mixtures/state, diagonal covariances • Tests • Leave one out • 75% train, 25% test, 1000 trials • Test on train (HMM only)
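The baseline classifier is discriminant function analysis with quadratic boundaries, which amounts to one Gaussian per class with its own full covariance. A minimal NumPy sketch of such a quadratic discriminant, meant for feature vectors like (Fmin, Fmax, Fmax_energy, duration); the class name, the small ridge added to each covariance and the use of training-set class frequencies as priors are assumptions, and this is not the authors' Matlab code:

```python
import numpy as np


class QuadraticDiscriminant:
    """Gaussian class-conditional classifier with one full covariance per class.

    A feature vector x goes to the class k maximising
        log pi_k - 0.5 * log|S_k| - 0.5 * (x - mu_k)^T S_k^{-1} (x - mu_k),
    which gives quadratic decision boundaries.
    """

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.params_ = []
        for k in self.classes_:
            Xk = X[y == k]
            mu = Xk.mean(axis=0)
            # Small ridge (an assumption) keeps the covariance invertible.
            cov = np.cov(Xk, rowvar=False) + 1e-6 * np.eye(X.shape[1])
            _, logdet = np.linalg.slogdet(cov)
            log_prior = np.log(len(Xk) / len(X))  # priors from class frequencies
            self.params_.append((mu, np.linalg.inv(cov), logdet, log_prior))
        return self

    def predict(self, X):
        X = np.asarray(X, float)
        scores = []
        for mu, inv_cov, logdet, log_prior in self.params_:
            diff = X - mu
            maha = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)  # squared Mahalanobis distance
            scores.append(log_prior - 0.5 * logdet - 0.5 * maha)
        return self.classes_[np.argmax(np.stack(scores, axis=1), axis=1)]
```

Leave-one-out testing, as in the experiments, would refit this model 436 times, each time holding out a single call.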

  13. Results • Baseline, zero crossing • Leave one out: 72.5% correct • Repeated trials: 72.5 ± 4% (mean ± std) • Baseline, MUSIC • Leave one out: 79.1% • Repeated trials: 77.5 ± 4% • DTW, MUSIC • Leave one out: 74.5% • Repeated trials: 74.1 ± 4% • HMM, MUSIC • Test on train: 85.3%
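MUSIC features raised the baseline's leave-one-out accuracy from 72.5% to 79.1%. A minimal NumPy sketch of a single-frame MUSIC frequency estimate for a frame dominated by one real tone; the subspace order, frequency grid and single-tone assumption are choices made here for illustration, not parameters from the talk, and `sliding_window_view` needs NumPy 1.20 or later:

```python
import numpy as np


def music_frequency(frame, fs, order=8, n_tones=1, n_grid=4001):
    """Estimate the dominant frequency of one frame with MUSIC (sketch).

    A real tone occupies a 2-D signal subspace of the order x order
    autocorrelation matrix; the remaining eigenvectors span the noise
    subspace. The MUSIC pseudospectrum 1 / ||E_n^T e(f)||^2 peaks where the
    steering vector e(f) is (nearly) orthogonal to the noise subspace, so the
    peak is found as the minimum of the denominator over a frequency grid.
    """
    x = np.asarray(frame, float)
    snaps = np.lib.stride_tricks.sliding_window_view(x, order)  # (n_snaps, order)
    R = snaps.T @ snaps / len(snaps)           # sample autocorrelation matrix
    _, vecs = np.linalg.eigh(R)                # eigenvalues in ascending order
    noise = vecs[:, : order - 2 * n_tones]     # eigenvectors of the smallest eigenvalues
    grid = np.linspace(0.0, fs / 2, n_grid)
    steer = np.exp(2j * np.pi * np.outer(np.arange(order), grid) / fs)  # (order, n_grid)
    denom = np.sum(np.abs(noise.T @ steer) ** 2, axis=0)
    return grid[np.argmin(denom)]
```

The eigendecomposition and grid search for every frame also illustrate why the conclusions describe MUSIC as robust but slow compared with counting zero crossings.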

  14. Confusion matrices (figure panels: baseline, zero crossing; baseline, MUSIC; DTW, MUSIC; HMM, MUSIC)

  15. Conclusions • Human ASR algorithms are applicable to bat echolocation calls • Experiments • Weakness: accuracy of class labels • No labeled calls excluded • HMM most accurate (tested on training data only), undertrained • MUSIC frequency estimate robust, slow • Machine learning • DTW: fast training, slow classification • HMM: slow training, fast classification
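The point that HMMs classify quickly follows from how a trained model scores a call: one forward pass per class model, linear in the number of frames, after which the call takes the label of the highest-scoring model. A minimal NumPy sketch of that forward pass with diagonal-covariance Gaussian-mixture emissions, the model family on the experiments slide (5 states, 4 mixtures per state); Baum-Welch training, the slow part, is omitted, and all names are hypothetical:

```python
import numpy as np


def gmm_frame_loglik(x, weights, means, variances):
    """log p(x | state) for a diagonal-covariance Gaussian mixture.

    weights: (K,); means, variances: (K, D); x: (D,).
    """
    diff = x - means
    comp = -0.5 * np.sum(np.log(2 * np.pi * variances) + diff ** 2 / variances, axis=1)
    return np.logaddexp.reduce(np.log(weights) + comp)


def hmm_log_likelihood(frames, log_pi, log_A, states):
    """Forward algorithm: log p(frames | model), summed over all state paths.

    frames: (T, D); log_pi: (N,); log_A: (N, N) with log_A[i, j] = log a_ij;
    states: list of N (weights, means, variances) tuples.
    """
    log_b = np.array([[gmm_frame_loglik(x, *s) for s in states] for x in frames])  # (T, N)
    alpha = log_pi + log_b[0]
    for t in range(1, len(frames)):
        # alpha_j(t) = log sum_i exp(alpha_i(t-1) + log a_ij) + log b_j(x_t)
        alpha = np.logaddexp.reduce(alpha[:, None] + log_A, axis=0) + log_b[t]
    return np.logaddexp.reduce(alpha)
```

To classify, compute `hmm_log_likelihood` under each class model and keep the largest. Working in the log domain with `np.logaddexp.reduce` avoids underflow on long calls and tolerates the zero transition probabilities (log 0 = -inf) of a left-to-right topology.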

  16. Future work • Find robust features of bat echolocation calls that match the assumptions of machine learning algorithms • Noise robust • Distribution modeled by Gaussian mixtures • Use the hand-labeled subset of data to create a call detection algorithm • Explore unsupervised learning • Self-organizing maps • Clustering • Real-time portable detection/classification system on a laptop PC

  17. Further information • http://www.cnel.ufl.edu/~markskow • markskow@cnel.ufl.edu • DTW reference: • L. Rabiner and B. Juang, Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ, 1993. • HMM reference: • L. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” in Readings in Speech Recognition, A. Waibel and K.-F. Lee, Eds., Morgan Kaufmann, San Mateo, CA, 1990, pp. 267–296.
