Robust Audio Identification for Commercial Applications

Matthias Gruhne

[email protected]

Fraunhofer IIS, AEMT, D-98693 Ilmenau, Germany

Overview
  • What is AudioID?
  • Requirements
  • System Architecture
  • MPEG-7
  • Recognition Performance
  • Applications
  • Conclusions
  • Demonstration
What is AudioID?
Purpose:
  • Identify audio material (artist, song, etc.) by analyzing the signal itself
  • “Content-Based Identification”
Conditions:
  • No associated information (headers, ID3 tags) required
  • No embedded signals (e.g., watermarks) required
  • Some knowledge about the music to be identified is available (reference database)

Requirements
  • Recognition rate: high recognition rates (> 95%), even with distorted signals
  • Robustness: withstand various distortions:
    • volume change, equalization, noise addition, audio coding (e.g., MP3), ...
    • “analog” artifacts (e.g., D/A and A/D conversion)
  • Compactness: small “signature” size
  • Scalability: extensible database (> 10⁶ items) while keeping processing time low (a few ms per item)

System Architecture
FeatureExtractor:
  • Signal preprocessing
  • Extract the “essence” of the audio signal
FeatureProcessor:
  • Increase discriminative power & efficiency
  • Temporal grouping of features (super vector)
  • Statistics calculation (mean, variance, etc.)
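The slide names the FeatureProcessor steps but not their parameters. The following is a minimal sketch under stated assumptions: the input is a matrix of frame-level features (one row per frame, one column per frequency band), `group_size` consecutive frames form one group, and the helper names `make_super_vectors` and `group_statistics` are hypothetical. It shows two ways to treat a group: stacking its frames into a super vector, or summarizing it by per-band mean and variance.

```python
import numpy as np

def make_super_vectors(frames: np.ndarray, group_size: int) -> np.ndarray:
    """Temporal grouping: concatenate `group_size` consecutive frame vectors.

    frames has shape (num_frames, num_bands); an incomplete trailing group
    is dropped. Returns shape (num_groups, group_size * num_bands).
    """
    num_frames, num_bands = frames.shape
    num_groups = num_frames // group_size
    usable = frames[: num_groups * group_size]
    return usable.reshape(num_groups, group_size * num_bands)

def group_statistics(frames: np.ndarray, group_size: int) -> np.ndarray:
    """Summarize each group of consecutive frames by per-band mean and variance.

    Returns shape (num_groups, 2 * num_bands): the means of all bands first,
    followed by the (population) variances of all bands.
    """
    num_frames, num_bands = frames.shape
    num_groups = num_frames // group_size
    grouped = frames[: num_groups * group_size].reshape(num_groups, group_size, num_bands)
    return np.concatenate([grouped.mean(axis=1), grouped.var(axis=1)], axis=1)
```

With 16 bands and `group_size=4`, each super vector has 64 components, while the mean/variance summary has 32 components regardless of the group size; the summary therefore stays compact as groups grow longer.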

System Architecture
Class generator:
  • Clustering of processed feature vectors:
    • further reduces the amount of data
    • enhances robustness (avoids overfitting)
  • Add class with associated metadata to the database
Classification:
  • Compare feature vectors against the classes in the database using a suitable distance metric
  • Find the class yielding the best approximation
  • Retrieve its associated metadata
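The slides leave both the clustering method and the metric open. One common choice, assumed here and not confirmed by the source, is vector quantization: k-means builds a small codebook per reference item (class generator), and a query is assigned to the item whose codebook reproduces its feature vectors with the smallest mean squared error (classification). The function names are hypothetical, and the database keys stand in for the associated metadata.

```python
import numpy as np

def train_codebook(vectors: np.ndarray, num_codewords: int,
                   iterations: int = 20, seed: int = 0) -> np.ndarray:
    """Class generator (assumed): k-means clustering of one item's feature vectors.

    Returns a (num_codewords, dim) array of centroids that replaces the
    item's many feature vectors in the database.
    """
    rng = np.random.default_rng(seed)
    # Start from randomly chosen, distinct training vectors.
    start = rng.choice(len(vectors), size=num_codewords, replace=False)
    centroids = vectors[start].astype(float)
    for _ in range(iterations):
        # Assign every vector to its nearest centroid (squared Euclidean distance).
        dists = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        for k in range(num_codewords):
            members = vectors[labels == k]
            if len(members) > 0:
                centroids[k] = members.mean(axis=0)
    return centroids

def approximation_error(query: np.ndarray, codebook: np.ndarray) -> float:
    """Mean squared distance from each query vector to its nearest codeword."""
    dists = ((query[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return float(dists.min(axis=1).mean())

def classify(query: np.ndarray, database: dict) -> str:
    """Return the key whose codebook approximates the query vectors best."""
    return min(database, key=lambda item: approximation_error(query, database[item]))
```

k-means is only one option; any clustering that yields a few representative vectors per item would fill the same role of shrinking the stored data while smoothing over individual noisy frames.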

MPEG-7 - Elements for Robust Audio Matching

Low-level data:
  • “AudioSpectrumFlatness” Low-Level Descriptor (LLD)
    • Derived from: Spectral Flatness Measure (SFM)
    • Describes the “un/flatness” of the spectrum in frequency bands (tonal ↔ noise-like)
“Fingerprint”:
  • “AudioSignature” Description Scheme
    • Statistical summarization of the “AudioSpectrumFlatness” LLD
    • Textual description in XML syntax
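The spectral flatness measure behind the AudioSpectrumFlatness LLD is the ratio of the geometric mean to the arithmetic mean of the power spectrum within a band. The sketch below computes that ratio per band for a single frame. It is not the MPEG-7 reference extractor: the Hann window, the equal-width band layout, the default of 16 bands (matching the band count on the results slides), and the function name are assumptions, and the standard's own band edges and framing are not reproduced.

```python
import numpy as np

def spectral_flatness_per_band(frame: np.ndarray, num_bands: int = 16,
                               eps: float = 1e-12) -> np.ndarray:
    """Per-band spectral flatness of one audio frame.

    Flatness = geometric mean / arithmetic mean of the power spectrum within
    a band: close to 1 for a flat (noise-like) band, close to 0 for a band
    dominated by a few strong (tonal) components.
    """
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2 + eps  # eps avoids log(0)
    # Assumption: equal-width bands over all bins except DC.
    bands = np.array_split(power[1:], num_bands)
    flatness = np.empty(num_bands)
    for i, band in enumerate(bands):
        geometric_mean = np.exp(np.mean(np.log(band)))
        flatness[i] = geometric_mean / np.mean(band)
    return flatness
```

A band containing a single strong sinusoid yields a value near 0, while white noise yields markedly higher values in every band; this tonal/noise contrast is what the descriptor captures.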

MPEG-7 - Benefits
  • A standardized feature format guarantees worldwide interoperability
  • Published, open format: descriptive data can be produced easily
  • Large MPEG-7-compliant databases (including “fingerprints”) are expected to be available in the near future
  • Long-term format stability / lifetime
Recognition Performance - Conditions

Conditions:
  • Training and test sets (mostly rock / pop):
    • 15,000 items
    • 90,000 items
Considered feature:
  • Spectral Flatness Measure (SFM)
Classification performance:
  • Number of correctly identified items (both “single best” and “within top 10”)
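Both counts can be derived from a ranked candidate list per query: “single best” checks only the first candidate, “within top 10” checks the first ten. A minimal sketch, assuming each query yields a best-first list of item IDs; the function names are hypothetical.

```python
def count_correct(ranked_results: list, ground_truth: list, k: int) -> int:
    """Count queries whose true item appears among the first k candidates.

    ranked_results: one best-first list of candidate IDs per query.
    ground_truth: the correct item ID for each query, in the same order.
    """
    if len(ranked_results) != len(ground_truth):
        raise ValueError("need exactly one ranked list per query")
    return sum(1 for ranked, truth in zip(ranked_results, ground_truth)
               if truth in ranked[:k])

def recognition_rate(ranked_results: list, ground_truth: list, k: int) -> float:
    """Percentage of queries identified within the top k candidates."""
    if not ground_truth:
        raise ValueError("no queries given")
    return 100 * count_correct(ranked_results, ground_truth, k) / len(ground_truth)
```

Top 1 corresponds to `k=1` and Top 10 to `k=10`; the Top 10 figure can never be lower than the Top 1 figure for the same run.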

Recognition Performance - 15k items
[Chart: Top 1 / Top 10 recognition rates]
  • 16 bands
  • Advanced matching with temporal tracking
Recognition Performance - 90k items

  • 16 bands
  • Advanced matching with temporal tracking
Applications
  • Retrieve associated metadata by identifying audio content
  • Automated search of audio content on the Internet
  • Broadcast monitoring by logging the transmission of audio material
  • Feature based indexing of audio databases (similarity search)
  • ...
Conclusions
  • High recognition rates (> 99%, tested with 90,000 items)
  • Robust to “real world” signal distortions
  • Fast and reliable extraction and classification
  • The underlying feature is specified in the MPEG-7 standard, which ensures worldwide interoperability; licensing is available to everyone
Real-Time Demonstration:
  • Demo running on a laptop (Pentium III @ 500 MHz)
  • Local database with 15,000 items (rock / pop genre)
  • Acoustic transmission: MP3 -> D/A -> speakers -> noisy environment -> microphone -> A/D -> AudioID