Sound Detection

Presentation Transcript

  1. Sound Detection • Derek Hoiem • Rahul Sukthankar (mentor) • August 24, 2004

  2. Objective • Learn a model of a sound object from a few (10-20) examples and distinguish it from all other sounds • Examples of sound classes: • Gunshots, screams, laughter, car horns, meows, dog barks, etc.

  3. Applications • “Tell me if you hear a gunshot.” (monitoring) • “Get me video clips containing dogs barking.” (search and retrieval) • “What’s going on?” (scene understanding)

  4. Why it's difficult • Sound classes have large variations • Sounds are often ambiguous without context • Overlaid “noise” obscures the sound

  5. Sound or not? • Which of these sounds are not from their named classes? • Car horn • Dog bark • Laser gun

  6. Previous work • Sound Classification (Wold 1996, Casey 2001, etc.) • Categorize short sound clips • Reasonable accuracy (5-20% error) • Sound Detection (Dufaux 2000, Piamsa-nga 1999) • Localize and recognize sound objects in long clips • Poor performance or assumption of unrealistic conditions (e.g., very quiet background)

  7. Detection via Windowed Search • Break the long audio track into short overlapping clips (Clip 1, Clip 2, …, Clip N) • Independently classify each short clip as object or non-object with the clip classifier • Return the locations of the detected sound object
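
The windowed search on this slide can be sketched roughly as follows. This is a minimal illustration, not the presenters' implementation: the function name `windowed_search`, the window and hop sizes, the score threshold, and the toy `loud` classifier are all assumptions made for the example.

```python
def windowed_search(track, window, hop, classify, threshold=0.0):
    """Slide a fixed-length window over the track and keep the windows
    whose classifier score exceeds the threshold.

    Returns a list of (start, end, score) tuples in sample indices.
    """
    detections = []
    # Windows overlap whenever hop < window.
    for start in range(0, len(track) - window + 1, hop):
        clip = track[start:start + window]
        score = classify(clip)  # each clip is scored independently
        if score > threshold:
            detections.append((start, start + window, score))
    return detections


def loud(clip):
    """Hypothetical stand-in classifier: mean absolute amplitude minus 0.5."""
    return sum(abs(x) for x in clip) / len(clip) - 0.5


# Toy track: silence, a four-sample burst, silence.
track = [0.0] * 8 + [1.0] * 4 + [0.0] * 8
hits = windowed_search(track, window=4, hop=2, classify=loud)
print(hits)
```

In practice the stand-in classifier would be replaced by whatever clip classifier has been trained; the search loop itself does not depend on it.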

  8. Representation • Start from the raw representation of the sound (examples shown: meows, phone rings) • Time-frequency analysis: windowed Fourier transform • Extract power percentage in each band over time and total power over time • Compute features used for classification
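
A minimal sketch of the representation step, assuming a Hann window, equal-width frequency bands, and a plain DFT written out in pure Python. The slide does not give the frame length, window shape, or band layout, so those choices and the function names are illustrative only.

```python
import cmath
import math


def power_spectrum(frame):
    """One-sided power spectrum (bins 0..n/2) of a Hann-windowed frame."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_power_fractions(signal, frame_len, hop, n_bands):
    """For each frame, return the fraction of power in each band and the total power."""
    fractions, totals = [], []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        total = sum(spec)
        # Equal-width bands over the spectrum bins (an assumption).
        edges = [b * len(spec) // n_bands for b in range(n_bands + 1)]
        bands = [sum(spec[edges[b]:edges[b + 1]]) for b in range(n_bands)]
        fractions.append([p / total if total > 0 else 0.0 for p in bands])
        totals.append(total)
    return fractions, totals


# Toy signal: a low-frequency sine, two cycles per 32-sample frame.
signal = [math.sin(2 * math.pi * 2 * t / 32) for t in range(64)]
fractions, totals = band_power_fractions(signal, frame_len=32, hop=16, n_bands=4)
print(fractions[0])
```

A real system would use an FFT library rather than the quadratic-time DFT above; the loop is spelled out here only to keep the example self-contained.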

  9. Classification Features • Diverse feature set: • Different sound classes are distinctive in different ways • Means and standard deviations of power at different frequencies • Bandwidth, peaks, loudness, etc. • 138 features in all
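
The slide does not enumerate the 138 features, so the following is only a sketch of the kind of summary it describes: per-band means and standard deviations over time, plus two loudness-style statistics. The function name `summarize` and the toy input values are assumptions.

```python
import statistics


def summarize(fractions, totals):
    """Collapse per-frame band fractions and total power into one feature vector.

    fractions: list of frames, each a list of per-band power fractions.
    totals: list of total power per frame.
    """
    n_bands = len(fractions[0])
    features = []
    for b in range(n_bands):
        series = [row[b] for row in fractions]
        features.append(statistics.fmean(series))   # mean power share in band b
        features.append(statistics.pstdev(series))  # its variation over time
    features.append(statistics.fmean(totals))       # average loudness proxy
    features.append(max(totals) - min(totals))      # power amplitude range
    return features


# Hypothetical three-frame, two-band clip.
fractions = [[0.5, 0.5], [0.7, 0.3], [0.9, 0.1]]
totals = [1.0, 3.0, 2.0]
features = summarize(fractions, totals)
print(features)
```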

  10. Classification by Decision Trees • Try to find simple rules that discriminate object from non-object • Each decision is based on a threshold of a feature value • Assign confidence based on likelihood of data for object and non-object classes at each leaf node • Diagram labels: decision nodes, leaf nodes
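
To make the threshold-and-confidence idea concrete, here is a sketch of the simplest possible tree: a single decision node (a "stump") with two leaves. The split criterion and the smoothed log-ratio confidence are borrowed from confidence-rated boosting and are assumptions; the slide does not say how the presenters grew their trees or computed leaf confidences.

```python
import math


def fit_stump(X, y, w, eps=1e-3):
    """Find the single feature threshold that best separates the classes.

    X: list of feature vectors, y: labels (+1 object, -1 non-object),
    w: example weights summing to 1.
    Returns (feature_index, threshold, confidence_left, confidence_right).
    """
    best = None
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            # Weighted mass of each class on each side of the threshold.
            mass = {(side, label): 0.0 for side in (0, 1) for label in (-1, 1)}
            for row, label, wi in zip(X, y, w):
                mass[(int(row[f] > thr), label)] += wi
            # Impurity of the split: 0 when both leaves are pure.
            z = 2 * sum(math.sqrt(mass[(s, 1)] * mass[(s, -1)]) for s in (0, 1))
            if best is None or z < best[0]:
                # Leaf confidence: smoothed log-ratio of object to non-object mass.
                conf = [0.5 * math.log((mass[(s, 1)] + eps) / (mass[(s, -1)] + eps))
                        for s in (0, 1)]
                best = (z, f, thr, conf[0], conf[1])
    if best is None:
        raise ValueError("no feature has two distinct values to split on")
    return best[1:]


def stump_confidence(stump, row):
    """Positive means 'object', negative means 'non-object'."""
    f, thr, left, right = stump
    return right if row[f] > thr else left


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.1, 5.0], [0.2, 1.0], [0.8, 4.0], [0.9, 2.0]]
y = [-1, -1, 1, 1]
stump = fit_stump(X, y, [0.25] * 4)
print(stump)
```

A deeper tree would apply the same search recursively inside each leaf.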

  11. Boosted Trees • Problem: one decision tree by itself may not be a great classifier • Solution: use several trees, with each one focusing on the mistakes of previously learned trees • AdaBoost: • Weight training data uniformly • Learn a decision tree classifier on the weighted data • Re-weight the data, giving more weight to incorrectly classified examples • Final classification is based on a linear combination of confidences from all learned decision trees
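
The AdaBoost steps listed on this slide can be sketched as below, using one-node trees on a single feature as the weak learner. This is the confidence-rated variant of AdaBoost with a smoothing constant `eps`; whether the presenters used exactly this variant, and how deep their trees were, is not stated, so treat the details as assumptions.

```python
import math


def fit_stump(xs, ys, ws, eps=1e-3):
    """Weighted one-node tree on a 1-D feature.

    Returns (threshold, confidence_if_below, confidence_if_above).
    """
    values = sorted(set(xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        mass = {(side, label): 0.0 for side in (0, 1) for label in (-1, 1)}
        for x, y, w in zip(xs, ys, ws):
            mass[(int(x > thr), y)] += w
        z = 2 * sum(math.sqrt(mass[(s, 1)] * mass[(s, -1)]) for s in (0, 1))
        if best is None or z < best[0]:
            conf = [0.5 * math.log((mass[(s, 1)] + eps) / (mass[(s, -1)] + eps))
                    for s in (0, 1)]
            best = (z, thr, conf[0], conf[1])
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def stump_score(stump, x):
    thr, below, above = stump
    return above if x > thr else below


def adaboost(xs, ys, rounds):
    n = len(xs)
    ws = [1.0 / n] * n                      # weight training data uniformly
    stumps = []
    for _ in range(rounds):
        stump = fit_stump(xs, ys, ws)       # learn a tree on the weighted data
        stumps.append(stump)
        # Re-weight: examples the new tree gets wrong (y * score < 0) gain weight.
        ws = [w * math.exp(-y * stump_score(stump, x))
              for x, y, w in zip(xs, ys, ws)]
        total = sum(ws)
        ws = [w / total for w in ws]
    return stumps


def predict(stumps, x):
    """Final decision: sign of the summed confidences of all trees."""
    return 1 if sum(stump_score(s, x) for s in stumps) > 0 else -1


# Toy data that no single threshold can separate.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]


def training_errors(rounds):
    model = adaboost(xs, ys, rounds)
    return sum(predict(model, x) != y for x, y in zip(xs, ys))


print(training_errors(1), training_errors(3))
```

On this toy set a single stump necessarily misclassifies some points, while the combination of a few stumps fits all six.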

  12. Examples of Decision Trees • Trees shown: Meow, Gunshot • Decision labels in the diagrams: low percentage of power in low frequencies in mid-time of sound; high power amplitude range; very high power amplitude range • Second Gunshot tree: more complex tree that focuses on examples misclassified by the tree above

  13. Cascade of Classifiers • Goal: eliminate false positives with few false negatives in early stages • Advantages: • Allows use of a large set of negative training examples • Improves classification speed • Dangers: cannot recover from false negatives • Diagram: Sound Clip, then Stage 1, Stage 2, Stage 3, then Pass; each stage has a Fail exit; pass labels on the stages: 5%, 2%, 0.005%
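
A minimal sketch of how a cascade evaluates one clip, assuming each stage is a scoring function paired with a threshold. The two toy stages (peak amplitude, then mean energy) and their thresholds are hypothetical and stand in for the boosted classifiers described earlier.

```python
def cascade_classify(clip, stages):
    """Run the clip through the stages in order.

    stages: list of (score_fn, threshold) pairs.
    Returns (accepted, stages_passed). A clip rejected at any stage is never
    reconsidered, which is why an early false negative cannot be recovered.
    """
    for index, (score_fn, threshold) in enumerate(stages):
        if score_fn(clip) < threshold:
            return False, index
    return True, len(stages)


def peak(clip):
    """Cheap first-stage score: largest absolute amplitude."""
    return max(abs(x) for x in clip)


def mean_energy(clip):
    """Second-stage score: average squared amplitude."""
    return sum(x * x for x in clip) / len(clip)


stages = [(peak, 0.5), (mean_energy, 0.3)]

quiet = [0.01, 0.02, 0.01]
spike = [0.0, 0.9, 0.0, 0.0]
loud = [0.8, 0.9, 0.7, 0.8]
results = [cascade_classify(c, stages) for c in (quiet, spike, loud)]
print(results)
```

Most clips exit at the first, cheapest stage, which is where the speed advantage comes from.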

  14. Results: Classification Error • Figure panels: Best Performance, Worst Performance

  15. Results: ROC curves • Note: to approximate the negative error rate, divide FP by 25,000
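
The note on this slide amounts to converting a false-positive count into a rate. The sketch below computes ROC points as (FP count, true-positive rate) pairs and applies that conversion. It assumes 25,000 is the number of negative test clips, which the slide implies but does not state; the scores and labels are made up for illustration.

```python
def roc_points(scores, labels):
    """Return (false_positive_count, true_positive_rate) for each score threshold,
    from the strictest threshold to the most permissive."""
    n_pos = sum(1 for label in labels if label == 1)
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l != 1)
        points.append((fp, tp / n_pos))
    return points


def fp_rate(fp_count, n_negatives=25_000):
    """Approximate error rate on negatives; n_negatives is an assumed test-set size."""
    return fp_count / n_negatives


# Hypothetical classifier scores and ground-truth labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
labels = [1, 1, -1, 1, -1]
points = roc_points(scores, labels)
print(points)
print(fp_rate(50))
```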

  16. Results: Anecdotal • Gunshots • Female Laugh • Male Laugh • Swords • Scream