
Robust Audio Identification for Commercial Applications

Robust Audio Identification for Commercial Applications. Matthias Gruhne ghe@emt.iis.fhg.de Fraunhofer IIS, AEMT, D-98693 Ilmenau, Germany. Overview. What is AudioID? Requirements System Architecture MPEG 7 Recognition Performance Applications Conclusions Demonstration.


Presentation Transcript


  1. Robust Audio Identification for Commercial Applications Matthias Gruhne ghe@emt.iis.fhg.de Fraunhofer IIS, AEMT, D-98693 Ilmenau, Germany

  2. Overview
     • What is AudioID?
     • Requirements
     • System Architecture
     • MPEG-7
     • Recognition Performance
     • Applications
     • Conclusions
     • Demonstration

  3. What is AudioID?

  4. What is AudioID?
     Purpose:
     • Identify audio material (artist, song, etc.) by analysis of the signal itself
     • “Content-Based Identification”
     Conditions:
     • No associated information (headers, ID3 tags) required
     • No embedded signals (e.g. watermarks) required
     • Some knowledge about the music to be identified must be available (reference database)

  5. Requirements
     • Recognition rate: high recognition rates (> 95%), even with distorted signals
     • Robustness: robust against various distortions:
       • volume change, equalization, noise addition, audio coding (e.g. MP3), ...
       • “analog” artifacts (e.g. D/A, A/D)
     • Compactness: small “signature” size
     • Scalability: extensibility of the database (> 10^6 items) while keeping processing time low (a few ms/item)

  6. System Architecture - Overview

  7. System Architecture
     Feature Extractor:
     • Signal preprocessing
     • Extract the “essence” of the audio signal
     Feature Processor:
     • Increase discriminance & efficiency
     • Temporal grouping of features (super vector)
     • Statistics calculation (mean, variance, etc.)
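The temporal grouping and statistics steps named on this slide could look roughly like the following. This is a minimal sketch, not AudioID's implementation: it assumes per-frame feature vectors are plain lists of floats, and the function name `group_statistics`, the non-overlapping grouping, and the choice of mean plus population variance are all illustrative assumptions.

```python
from statistics import fmean, pvariance


def group_statistics(frames, group_size):
    """Summarize consecutive frames by per-dimension mean and variance.

    frames: list of equal-length feature vectors, one per analysis frame.
    Returns one "super vector" per complete group: all means, then all variances.
    (Hypothetical helper; grouping scheme and statistics are assumptions.)
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    super_vectors = []
    # Walk the frames in non-overlapping blocks; a trailing partial block is dropped.
    for start in range(0, len(frames) - group_size + 1, group_size):
        block = frames[start:start + group_size]
        columns = list(zip(*block))  # one tuple of values per feature dimension
        means = [fmean(col) for col in columns]
        variances = [pvariance(col) for col in columns]
        super_vectors.append(means + variances)
    return super_vectors
```

Summarizing several frames into one vector reduces the amount of data per item, which is one plausible reading of the "increase efficiency" bullet.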

  8. System Architecture
     Class generator:
     • Clustering of processed feature vectors:
       • further reduces the amount of data
       • enhances robustness (against overfitting)
     • Add class with associated metadata to database
     Classification:
     • Compare feature vectors against classes in database by means of some metric
     • Find class yielding the best approximation
     • Retrieve associated metadata
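The classification step (compare against classes, pick the best approximation, return its metadata) can be sketched as a nearest-class search. The slide only says "some metric"; Euclidean distance, the dictionary layout, and the names `euclidean` and `classify` are assumptions made for illustration, and each class is represented here by a small list of cluster centroids.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def classify(query, database, top_n=1):
    """Rank classes by their closest centroid to the query vector.

    database: dict mapping metadata (e.g. a title string) to a list of
    centroid vectors for that class. Returns the top_n (metadata, distance)
    pairs, best match first. (Hypothetical layout, not AudioID's format.)
    """
    scored = []
    for metadata, centroids in database.items():
        # A class is as close as its nearest centroid.
        best = min(euclidean(query, c) for c in centroids)
        scored.append((metadata, best))
    scored.sort(key=lambda pair: pair[1])
    return scored[:top_n]
```

For example, `classify(vector, database, top_n=10)` would return a ten-item candidate list rather than only the single best match.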

  9. MPEG-7 - Elements for Robust Audio Matching
     Low-level data:
     • “AudioSpectrumFlatness” LLD
       • Derived from: Spectral Flatness Measure (SFM)
       • Describes “un/flatness” of spectrum in frequency bands (tonal ↔ noise)
     “Fingerprint”:
     • “AudioSignature” Description Scheme
       • Statistical data summarization of “AudioSpectrumFlatness” LLD
       • Textual description in XML syntax
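The Spectral Flatness Measure is the ratio of the geometric mean to the arithmetic mean of the power spectrum values in a band. The sketch below shows only that ratio; it does not reproduce the MPEG-7 band layout or normative details, and the function names and the `band_edges` argument are assumptions for illustration.

```python
import math


def spectral_flatness(power):
    """Geometric mean divided by arithmetic mean of positive power values.

    Near 1 for a flat (noise-like) band, near 0 for a peaky (tonal) band.
    """
    if not power or any(p <= 0 for p in power):
        raise ValueError("power values must be non-empty and positive")
    # Compute the geometric mean in the log domain to avoid underflow.
    log_mean = sum(math.log(p) for p in power) / len(power)
    arithmetic_mean = sum(power) / len(power)
    return math.exp(log_mean) / arithmetic_mean


def band_flatness(power_spectrum, band_edges):
    """Flatness per band; band_edges are bin indices [e0, e1, ..., eK] (assumed layout)."""
    return [
        spectral_flatness(power_spectrum[lo:hi])
        for lo, hi in zip(band_edges, band_edges[1:])
    ]
```

Because numerator and denominator scale by the same factor, multiplying every power value by a constant leaves the flatness unchanged, so the measure is insensitive to a plain volume change.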

  10. MPEG-7 - Benefits
      • Standardized feature format guarantees worldwide interoperability
      • Published, open format: descriptive data can be produced easily
      • Large MPEG-7 compliant databases expected to be available in the near future (incl. “fingerprints”)
      • Long-term format stability / lifetime

  11. Recognition Performance - Conditions
      Conditions:
      • Training and test sets (mostly rock / pop):
        • 15,000 items
        • 90,000 items
      Considered feature:
      • Spectral Flatness Measure (SFM)
      Classification performance:
      • Number of correctly identified items (both “single best” and “within top 10”)
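The "single best" and "within top 10" figures correspond to what is commonly called top-1 and top-10 recognition rate. A minimal sketch of how such a rate can be computed from ranked candidate lists follows; the function name `top_k_rate` and the input layout are assumptions, and no measured results from the presentation are reproduced here.

```python
def top_k_rate(ranked_results, true_labels, k):
    """Fraction of queries whose true label appears among the first k candidates.

    ranked_results: one candidate list per query, best match first.
    true_labels: the correct label for each query, in the same order.
    """
    if not true_labels or len(ranked_results) != len(true_labels):
        raise ValueError("need one ranking per true label, and at least one query")
    hits = sum(
        1
        for ranking, truth in zip(ranked_results, true_labels)
        if truth in ranking[:k]
    )
    return hits / len(true_labels)
```

Calling it with `k=1` gives the "single best" rate and with `k=10` the "within top 10" rate for the same set of rankings.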

  12. Top 1 / Top 10 Recognition Performance - 15k items
      • 16 bands
      • Advanced matching with temporal tracking

  13. Recognition Performance - 90k items
      • 16 bands
      • Advanced matching with temporal tracking

  14. Applications
      • Retrieve associated metadata by identifying audio content
      • Automated search for audio content on the Internet
      • Broadcast monitoring by logging the transmission of audio material
      • Feature-based indexing of audio databases (similarity search)
      • ...

  15. Conclusions
      • High recognition rates (> 99%, tested with 90,000 items)
      • Robust to “real world” signal distortions
      • Fast and reliable extraction and classification
      • Underlying feature is specified in the MPEG-7 standard, which ensures worldwide interoperability; licensing is available to everyone

  16. Real-Time Demonstration
      • Demo running on a laptop (Pentium III @ 500 MHz)
      • Local database with 15,000 items (rock / pop genre)
      • Acoustic transmission: MP3 -> D/A -> Speakers -> Noisy Environment -> Microphone -> A/D -> AudioID

  17. Thanks for your attention!
