Automatic detection and classification of Microchiropteran echolocation calls: Why the current technology is wrong and what can be done about it

Mark D. Skowronski and John G. Harris
Computational Neuro-Engineering Lab, University of Florida, Gainesville, FL

[Figure: Receiver operating characteristic (ROC) curve]

ABSTRACT

Existing methods for automatic detection and classification of bat calls suffer from two significant limitations: inadequate feature extraction and simplistic modeling. A 10-ms call, sampled at 200 kHz, is represented by 2000 time-domain samples, yet the same call is reduced to fewer than 10 global feature values (minimum and maximum frequency, duration, shape features, energy) during feature extraction. Such a reduction in representation excludes discriminating information from the call, which increases classification and detection errors. Furthermore, detection is typically performed using an energy threshold, excluding frequency information altogether, and classification is performed using discriminant function analysis (DFA), which uses a single Gaussian kernel to model the probability distribution function of call features. For multi-modal distributions of features (e.g., Tadarida brasiliensis uses FM, CF, and FM-CF calls), a uni-modal Gaussian kernel is woefully inadequate.

What can be done to address these limitations? A viable solution, developed and refined over the past 3 decades, comes from automatic speech recognition (ASR). The ASR research community has shifted from expert-driven models to data-driven models (machine learning), primarily because machine learning methods employ superior statistical models which better account for the variations of human speech. The role of experts in the machine learning paradigm of ASR has focused on incorporating knowledge about the production and perception of speech into feature extraction algorithms.
We have recently applied two machine learning algorithms to the problem of automatic bat call detection and classification, a hidden Markov model (HMM) and a Gaussian mixture model (GMM), in an experiment using about 3000 hand-labeled calls from 5 species (Pipistrellus bodenheimeri, Molossus molossus, Lasiurus borealis, Lasiurus cinereus semotus, and Tadarida brasiliensis). We applied two techniques common in ASR to improve performance: a noise-reduction algorithm called spectral mean subtraction, and the use of temporal derivatives to add local shape information to the feature vectors. For detection, we compared a GMM to a baseline energy method. At equal sensitivity and specificity, the accuracy of the GMM was 96%, while the accuracy of the energy baseline was 68%. For classification, we compared the machine learning algorithms to a baseline DFA classifier using a cross-validation experiment in which 50% of the calls were used to train the models and the remaining 50% of the calls were used to test the models. Over 20 trials, classification accuracy for the GMM and HMM was 99.4 ± 0.2%, while the accuracy of the DFA was 83.1 ± 1.1% (mean ± st. dev.). The experimental results demonstrate the superior performance of machine learning algorithms, reducing detection and classification errors by an order of magnitude compared to the existing methods. Machine learning methods have the potential to profoundly impact the use of acoustic studies in bat research.

DETECTION

Confusion matrices at equal sensitivity and specificity: 105,090 detection blocks (20 ms)
Conventional method [1,2]:
Gaussian mixture model (GMM) [3]:

Detector output examples: Each gray column is a hand-labeled call from a pass of 25 calls from L. borealis. The black horizontal line represents θ for equal sensitivity and specificity.
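As a rough illustration of the conventional energy-threshold detector used as the baseline, the sketch below computes a per-frame energy E(k) and compares it with a threshold θ. It assumes E(k) is the sum of squared samples over non-overlapping frames of length L and that d(k) = 1 when E(k) exceeds θ; the poster's exact definitions may differ. The function names, the synthetic 40 kHz tone, and the threshold value are hypothetical and chosen only for the example.

```python
import math


def frame_energies(x, frame_len):
    """Return E(k), the sum of squared samples in each non-overlapping frame k.

    Assumption: sum-of-squares energy and non-overlapping frames.
    """
    n_frames = len(x) // frame_len
    return [
        sum(s * s for s in x[k * frame_len:(k + 1) * frame_len])
        for k in range(n_frames)
    ]


def energy_detect(x, frame_len, theta):
    """Return d(k) = 1 where E(k) > theta, else 0 (assumed decision rule)."""
    return [1 if e > theta else 0 for e in frame_energies(x, frame_len)]


# Synthetic test signal (hypothetical): 200 kHz sampling, ~1 ms frames.
fs = 200_000
L = fs // 1000  # 200 samples per 1-ms frame
tone = [math.sin(2 * math.pi * 40_000 * n / fs) for n in range(2 * L)]
signal = [0.0] * (3 * L) + tone + [0.0] * (3 * L)

decisions = energy_detect(signal, L, theta=1.0)
print(decisions)
```

Because the decision depends only on frame energy, any sufficiently loud frame is flagged regardless of its spectral content, which is the limitation the GMM detector is meant to address.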
x_k(n) - frame k of raw signal x(n)
E(k) - energy in frame k
L - frame length (~1 ms)
d(k) - detection decision
θ - energy threshold

x_k - input features for frame k: spectral peak amplitude, frequency at peak amplitude, first- and second-order temporal derivatives
ω_i - class of signal: i = 1 for background frames, i = 2 for call frames
p(x_k|ω_i) - class-conditional probability density for frame k of input feature vector x given class ω_i
G - Gaussian kernel with mean vector μ and covariance matrix Σ, estimated from hand-labeled data
w_{i,m}, μ_{i,m}, Σ_{i,m} - mixture weight, mean, and covariance of the mth kernel for class ω_i
d(k) - detection decision for frame k
θ - likelihood threshold

CLASSIFICATION

[Figure panels: Pipistrellus bodenheimeri, Lasiurus cinereus semotus, Molossus molossus, Lasiurus borealis, Tadarida brasiliensis]

Conventional method:
Features [2,4-8]: minimum frequency, maximum frequency, frequency at peak amplitude, and duration, extracted from hand-labeled calls using noise-robust methods [3].
Classifier [2,7-9]: discriminant function analysis (DFA) with stratified covariance matrices (quadratic).

Gaussian mixture model (GMM) classifier: Same as the GMM detector, except that each ω_i represents a species. The log likelihood averaged over all K frames of a call was calculated for each class, and the classifier output was the label of the class with the maximum averaged log likelihood.

Hidden Markov model (HMM) classifier [10]: State model of a nonstationary signal, in which each state represents a pseudo-stationary probability density function with a GMM. One model for each species was trained using the Baum-Welch algorithm on hand-labeled calls. Testing was performed using the Viterbi dynamic programming algorithm, which determines the log likelihood of the single most likely state sequence through a model.

Classification confusion matrices: Average and st. dev. over 20 trials of randomly selected test and train calls, 50% test, 50% train. The GMM and HMM results were statistically indistinguishable (t-test, p > 0.9).

BIBLIOGRAPHY

[1] M. K. Obrist, “Flexible bat echolocation: the influence of individual, habitat and conspecifics on sonar signal design,” Behav. Ecol. Sociobiol., vol. 36, pp. 207-219, 1995
[2] S. Parsons and G. Jones, “Acoustic identification of twelve species of echolocating bat by discriminant function analysis and artificial neural networks,” J. Exp. Biol., vol. 203, pp. 2641-2656, 2000
[3] M. D. Skowronski and J. G. Harris, “Acoustic detection and classification of microchiroptera using machine learning: lessons learned from automatic speech recognition,” J. Acoust. Soc. Am., 2005, submitted
[4] M. B. Fenton and G. P. Bell, “Recognition of species of insectivorous bats by their echolocation calls,” J. Mammal., vol. 62, no. 2, pp. 233-243, May 1981
[5] M. J. O'Farrell, B. W. Miller, and W. L. Gannon, “Qualitative identification of free-flying bats using the Anabat detector,” J. Mammal., vol. 80, no. 1, pp. 11-23, Jan. 1999
[6] M. K. Obrist, “Flexible bat echolocation: the influence of individual, habitat and conspecifics on sonar signal design,” Behav. Ecol. Sociobiol., vol. 36, pp. 207-219, 1995
[7] M. K. Obrist, R. Boesch, and P. F. Fluckiger, “Variability in echolocation call design of 26 Swiss bat species: consequences, limits and options for automated field identification with a synergetic pattern recognition approach,” Mammalia, vol. 68, no. 4, pp. 307-322, Dec. 2004
[8] R. F. Lance, B. Bollich, C. L. Callahan, and P. L. Leberg, “Surveying forest-bat communities with Anabat detectors,” in Bats and Forests Symposium, R. M. R. Barclay and R. M. Brigham, eds., Res. Br., B.C. Min. For., Victoria, B.C., CA, pp. 175-184, 1996
[9] D. Russo and G. Jones, “Identification of twenty-two bat species (Mammalia: Chiroptera) from Italy by analysis of time-expanded recordings of echolocation calls,” J. Zool., Lond., vol. 258, no. 1, pp. 91-103, Sept. 2002
[10] L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” in Readings in Speech Recognition, A. Waibel and K.-F. Lee, eds., Kaufmann, San Mateo, CA, pp. 267-296, 1990
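The GMM classifier decision rule described under CLASSIFICATION (average the per-frame log likelihood over the K frames of a call, then output the class with the maximum average) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes diagonal covariance matrices for brevity, and the species labels, feature dimensions, and mixture parameters are hypothetical values invented for the example rather than trained models.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian kernel with diagonal covariance (assumption)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """log p(x | class) for a mixture given as a list of (weight, mean, var)."""
    terms = [math.log(w) + log_gaussian_diag(x, mu, var) for w, mu, var in gmm]
    peak = max(terms)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def classify_call(frames, models):
    """Return (label, scores) using log likelihood averaged over all frames."""
    scores = {
        label: sum(gmm_log_likelihood(x, gmm) for x in frames) / len(frames)
        for label, gmm in models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical models; features are [frequency at peak (kHz), peak amplitude].
models = {
    "species_A": [(0.5, [40.0, 1.0], [4.0, 0.25]), (0.5, [45.0, 0.8], [4.0, 0.25])],
    "species_B": [(1.0, [25.0, 1.0], [4.0, 0.25])],
}
frames = [[41.0, 0.9], [43.0, 0.9], [44.0, 0.85]]  # K = 3 frames of one call

label, scores = classify_call(frames, models)
print(label, scores)
```

Averaging over frames keeps the score comparable across calls of different duration; with two classes (background and call) and a threshold on the per-frame likelihood, the same density function could serve as a detector.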
