
Audio Meets Image Retrieval Techniques

Dave Kauchak, Department of Computer Science, University of California, San Diego (dkauchak@cs.ucsd.edu)


Presentation Transcript


  1. Audio Meets Image Retrieval Techniques Dave Kauchak Department of Computer Science University of California, San Diego dkauchak@cs.ucsd.edu

  2. Image vs. Audio [Figure: six unlabeled items marked “?”; category labels: Rock, Classical, Country]

  3. Image techniques to audio • Idea: Apply image retrieval (and classification) techniques to audio • Image is 2-D • Audio is 1-D 

  4. Benefits • Don’t have to reinvent the wheel • Image techniques have had fairly good success • More literature in image processing • Audio retrieval is a relatively new field

  5. Key Concepts and Goals • Image techniques to audio processing • Apply a number of different image techniques (and show they work) • Relate various parts of audio to counterparts in image • Novel data set with known ground truth • Multiple inputs for audio • Raw audio

  6. A first step… • Audio retrieval • Input: A number of songs • Output: “Similar” songs from an audio database • Histogramming methods (Puzicha et al.) • Wavelets instead of Gabor filters

  7. Basic Technique [Diagram: input songs → DWT → histogram → compare against database → most “similar” songs]
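The DWT stage of this pipeline can be sketched as follows. This is a minimal illustration only: the slides do not say which wavelet was used, so the Haar wavelet here is an assumption, and `haar_dwt` is a hypothetical helper name. The point it shows is that each decomposition level yields half as many coefficients as the one before.

```python
import math

def haar_dwt(signal, levels):
    """Multi-level Haar DWT (assumed wavelet; the slides do not name one).

    Returns (details, approx): a list of detail-coefficient lists, finest
    level first, plus the final approximation coefficients.
    Assumes len(signal) is divisible by 2**levels.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Detail = scaled difference, approximation = scaled sum of each pair.
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details, approx

samples = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
details, approx = haar_dwt(samples, levels=3)
print([len(d) for d in details])  # [4, 2, 1]
```

In practice a wavelet library would replace this hand-written transform; the sketch is only meant to make the per-level sample counts concrete.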

  8. Normal vs. Proportional Histogramming • Remember DWT: • Different number of samples per level • Normal: Histogram each level with same number of bins • Proportional: Histogram each level keeping samples/bin equal
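The two binning schemes can be sketched as below. This is an illustration under assumptions: the function names, the equal-width binning, and the shared `[lo, hi]` range are not from the slides, and the slides do not state whether the listed proportional bin sizes refer to a samples-per-bin value or to the bin count at a particular level.

```python
def histogram(values, n_bins, lo, hi):
    """Count values into n_bins equal-width bins over [lo, hi] (assumed range)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values
        counts[idx] += 1
    return counts

def normal_bins(level_sizes, n_bins):
    """Normal: every DWT level gets the same number of bins."""
    return [n_bins for _ in level_sizes]

def proportional_bins(level_sizes, samples_per_bin):
    """Proportional: bin count scales with level size, so samples/bin stays equal."""
    return [max(1, size // samples_per_bin) for size in level_sizes]

level_sizes = [1024, 512, 256, 128]  # hypothetical coefficients per level
print(normal_bins(level_sizes, 16))        # [16, 16, 16, 16]
print(proportional_bins(level_sizes, 32))  # [32, 16, 8, 4]
```

With normal binning, coarse levels end up with very few samples per bin; proportional binning trades bin resolution at coarse levels for a constant sample density.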

  9. Compare Histograms • Chi-square on each level • Sum the chi-square values over levels and use the total as the dissimilarity measure (lower is better) • Sum dissimilarity over all input songs
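A sketch of this comparison step follows. The slides do not give the exact chi-square formula, so the symmetric form used here (each histogram compared against the bin-wise mean of the two) is an assumption, as are the function names and the normalization of each histogram to unit sum.

```python
def normalize(hist):
    """Scale a histogram so its bins sum to 1 (assumed preprocessing)."""
    total = sum(hist)
    return [h / total for h in hist] if total else list(hist)

def chi_square(h, k):
    """Symmetric chi-square between two equal-length normalized histograms."""
    score = 0.0
    for a, b in zip(h, k):
        m = (a + b) / 2.0
        if m > 0:
            score += (a - m) ** 2 / m
    return score

def song_dissimilarity(song_a, song_b):
    """Sum chi-square over levels; a song is a list of per-level histograms."""
    return sum(chi_square(normalize(ha), normalize(hb))
               for ha, hb in zip(song_a, song_b))

def rank_database(inputs, database):
    """Rank database songs by dissimilarity summed over all input songs.

    database maps a song name to its per-level histograms; the most
    similar (lowest total) song comes first.
    """
    totals = {name: sum(song_dissimilarity(q, hists) for q in inputs)
              for name, hists in database.items()}
    return sorted(totals, key=totals.get)

inputs = [[[4, 0], [2, 2]]]  # one input song, two levels (toy data)
database = {"near": [[3, 1], [2, 2]], "far": [[0, 4], [2, 2]]}
print(rank_database(inputs, database))  # ['near', 'far']
```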

  10. Ground Truth Data Set • Songs by 4 different bands (10 songs each) • Dave Matthews Band • U2 • Blink 182 • Green Day • Mono, sampled at 22 kHz from a number of sources

  11. Experiment • Input = 5 songs by a single band • Goal = Pull out 5 other songs by that band • 10 random experiments per band (40 total) • Normal bins: 8, 16, 32, 64, 128, 192, 256, 320, 384, 448, 512 • Proportional bins: 4, 8, 16, 32, 64
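The trial setup can be sketched as a random 5/5 split of each band's songs, repeated 10 times per band. The helper name `make_trials`, the fixed seed, and the placeholder song identifiers are assumptions for illustration; the slides do not describe how the random selections were drawn.

```python
import random

def make_trials(songs_by_band, n_inputs=5, trials_per_band=10, seed=0):
    """Build (band, inputs, targets) trials from a band -> song-list mapping."""
    rng = random.Random(seed)
    trials = []
    for band, songs in songs_by_band.items():
        for _ in range(trials_per_band):
            inputs = rng.sample(songs, n_inputs)
            # The remaining songs by the same band are the ones to retrieve.
            targets = [s for s in songs if s not in inputs]
            trials.append((band, inputs, targets))
    return trials

bands = ["Dave Matthews Band", "U2", "Blink 182", "Green Day"]
# Placeholder song identifiers; the actual track list is not given.
songs_by_band = {b: [f"{b} #{i}" for i in range(1, 11)] for b in bands}
trials = make_trials(songs_by_band)
print(len(trials))  # 40
```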

  12. Scoring • By points: • 5 pts. Correct answer in first place • 4 pts. Correct answer in second place, etc. • Perfect = 5+4+3+2+1 = 15 • Percentage correct at each place • Percentage of trials with a correct answer at or before each place
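The scoring rules can be sketched as below. The function names are hypothetical, and the reading of the cumulative measure (a trial counts at place p if any of its first p results is correct) is an assumption about what the slide intends.

```python
def points(ranked, correct, top_n=5):
    """5 points for a correct song in first place, 4 in second, and so on."""
    return sum(top_n - i
               for i, name in enumerate(ranked[:top_n]) if name in correct)

def correct_by_place(ranked, correct, top_n=5):
    """Boolean per place: is the song returned at that place correct?"""
    return [name in correct for name in ranked[:top_n]]

def percent_correct_at_place(trials, top_n=5):
    """trials: list of per-trial boolean lists from correct_by_place."""
    n = len(trials)
    return [100.0 * sum(t[p] for t in trials) / n for p in range(top_n)]

def percent_correct_up_to_place(trials, top_n=5):
    """Share of trials with at least one correct song at or before each place."""
    n = len(trials)
    return [100.0 * sum(any(t[:p + 1]) for t in trials) / n
            for p in range(top_n)]

correct = {"a", "b", "c", "d", "e"}
print(points(["a", "x", "b", "y", "c"], correct))  # 9
print(points(["a", "b", "c", "d", "e"], correct))  # 15
```

For example, two trials with per-place results `[True, False, True, False, True]` and `[False, False, False, True, False]` give 50% correct at first place, and 100% of trials have a correct answer by fourth place.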

  13. Results: Points

  14. Results: Points Proportional

  15. Best Score Results: 16 bins 

  16. Different Bands

  17. Percentage correct

  18. One last result 

  19. Summary of Results • Overall, results are not amazing • Band choice has a large influence • Normal and Proportional perform somewhat similarly • Proportional is more even over all bands • Bin size doesn’t appear to be crucial • 75% chance that a song by the same band will end up in the top 5

  20. Next Step… • Adaptive Binning • Vary Parameters • Levels • Song length • Histogram comparison methods • Another image retrieval algorithm • Boosting for feature selection using large feature set? • Other? • Larger and more diverse database

  21. Conclusion • Even though results are not fabulous, image processing techniques CAN be used for audio processing • Using bands for testing allows for ground truth • Audio files are BIG!
