Robust Audio Identification for Commercial Applications

Matthias Gruhne

[email protected]

Fraunhofer IIS, AEMT, D-98693 Ilmenau, Germany


Overview

  • What is AudioID?

  • Requirements

  • System Architecture

  • MPEG-7

  • Recognition Performance

  • Applications

  • Conclusions

  • Demonstration


What is AudioID?

Purpose

  • Identify audio material (artist, song, etc.) by analysis of the signal itself

  • “Content-Based Identification”

Conditions

  • No associated information (headers, ID3 tags) required

  • No embedded signals (e.g. watermarks) required

  • Some knowledge about the music to be identified must be available (reference database)


Requirements

Recognition rate

  • High recognition rates (> 95%), even with distorted signals

Robustness

  • Robust against various distortions:

    • volume change, equalization, noise addition, audio coding (e.g. MP3), ...

    • “analog” artifacts (e.g. D/A, A/D conversion)

Compactness

  • Small “signature” size

Scalability

  • Extensibility of the database (> 10^6 items) while keeping processing time low (a few ms/item)


System Architecture - Overview


System Architecture

Feature Extractor

  • Signal preprocessing

  • Extract the “essence” of the audio signal

Feature Processor

  • Increase discriminative power and efficiency

  • Temporal grouping of features (super vector)

  • Statistics calculation (mean, variance, etc.)
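
The feature-processing steps named on this slide (temporal grouping into a super vector, then statistics such as mean and variance) can be illustrated with a minimal sketch. The function name `group_statistics`, the group size, and the toy frame values are assumptions made for illustration; the presentation does not specify how AudioID groups frames or which statistics it finally keeps.

```python
from statistics import fmean, pvariance


def group_statistics(frames, group_size):
    """Summarize consecutive per-frame feature vectors.

    frames: list of equal-length lists of floats, one vector per frame.
    Returns one "super vector" per complete group: the per-dimension
    means followed by the per-dimension population variances.
    A trailing incomplete group is dropped (a choice of this sketch).
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    super_vectors = []
    for start in range(0, len(frames) - group_size + 1, group_size):
        group = frames[start:start + group_size]
        columns = list(zip(*group))  # transpose: one tuple per feature dimension
        means = [fmean(col) for col in columns]
        variances = [pvariance(col) for col in columns]
        super_vectors.append(means + variances)
    return super_vectors


# Toy input: five frames with two feature values each (made-up numbers).
frames = [[1.0, 10.0], [3.0, 10.0], [5.0, 20.0], [7.0, 40.0], [9.0, 0.0]]
stats = group_statistics(frames, group_size=2)
```

Each output vector is twice as long as a frame vector but replaces `group_size` frames, which is one way to trade temporal resolution for a smaller, more stable description.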


System Architecture

Class Generator

  • Clustering of processed feature vectors:

    • further reduces the amount of data

    • enhances robustness (avoids overfitting)

  • Add the class, with its associated metadata, to the database

Classification

  • Compare feature vectors against the classes in the database by means of some metric

  • Find the class yielding the best approximation

  • Retrieve the associated metadata
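
A rough sketch of the class-generation and classification steps follows. The slide only says "clustering" and "some metric", so plain k-means and squared Euclidean distance are stand-ins chosen here, not a statement of what AudioID uses. The names `kmeans`, `classify`, the database layout, and all numbers are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Tiny k-means: the first k vectors seed the centroids (deterministic)."""
    centroids = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in vectors:
            nearest = min(range(len(centroids)),
                          key=lambda i: sq_dist(v, centroids[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def classify(query_vectors, database):
    """Return the metadata of the class whose codebook approximates the query best."""
    def distortion(codebook):
        # Sum over query vectors of the distance to the closest centroid.
        return sum(min(sq_dist(q, c) for c in codebook) for q in query_vectors)

    best = min(database, key=lambda name: distortion(database[name]["codebook"]))
    return database[best]["metadata"]


# Class generation: each item is reduced to a small codebook plus metadata.
database = {
    "item_a": {
        "codebook": kmeans([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.2, 1.0]], k=2),
        "metadata": {"title": "Song A"},
    },
    "item_b": {
        "codebook": kmeans([[5.0, 5.0], [5.2, 5.0], [8.0, 8.0], [8.2, 8.0]], k=2),
        "metadata": {"title": "Song B"},
    },
}

# Classification: a slightly distorted excerpt of item_a.
result = classify([[0.15, 0.05], [1.05, 0.9]], database)
```

Storing only the centroids is what reduces the amount of data per item; matching against centroids rather than raw vectors is also what makes the comparison tolerant of small deviations in the query.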


MPEG-7 - Elements for Robust Audio Matching

Low-level data

  • “AudioSpectrumFlatness” LLD (Low-Level Descriptor)

    • Derived from: Spectral Flatness Measure (SFM)

    • Describes the “un/flatness” of the spectrum in frequency bands (tonal vs. noise-like)

“Fingerprint”

  • “AudioSignature” Description Scheme

    • Statistical data summarization of the “AudioSpectrumFlatness” LLD

    • Textual description in XML syntax
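
The Spectral Flatness Measure is commonly defined as the ratio of the geometric mean to the arithmetic mean of the power-spectrum coefficients in a band: values near 1 indicate a flat, noise-like band, values near 0 a peaky, tonal one. The sketch below computes it per band on made-up spectra. The band edges, the zero-handling convention, and the function names are assumptions for illustration and are not taken from the MPEG-7 specification.

```python
import math


def spectral_flatness(power):
    """Geometric mean divided by arithmetic mean of one band's power values."""
    if not power or any(p < 0.0 for p in power):
        raise ValueError("band needs at least one non-negative coefficient")
    arithmetic = sum(power) / len(power)
    if arithmetic == 0.0 or min(power) == 0.0:
        return 0.0  # convention of this sketch when a coefficient has no energy
    log_mean = sum(math.log(p) for p in power) / len(power)
    return math.exp(log_mean) / arithmetic


def band_flatness(power_spectrum, band_edges):
    """Flatness for each band [edge_i, edge_i+1) of one frame's power spectrum."""
    return [spectral_flatness(power_spectrum[lo:hi])
            for lo, hi in zip(band_edges, band_edges[1:])]


# Toy frame: a flat (noise-like) band followed by a band with one strong peak.
flat_band = [2.0] * 8
peaky_band = [0.01] * 7 + [100.0]
spectrum = flat_band + peaky_band
edges = [0, 8, 16]  # hypothetical band edges (bin indices)

sfm = band_flatness(spectrum, edges)
louder = band_flatness([4.0 * p for p in spectrum], edges)  # same frame, higher volume
```

Because both means scale by the same factor when the signal is amplified, `louder` matches `sfm` up to rounding, which hints at why a flatness-based feature tolerates volume changes.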


MPEG-7 - Benefits

  • Standardized Feature Format guarantees worldwide interoperability

  • Published, open format: descriptive data can be produced easily

  • Large MPEG-7 compliant databases are expected to become available in the near future (incl. “fingerprints”)

  • Long-term format stability / lifetime


Recognition Performance - Conditions

Conditions

  • Training and test sets (mostly rock / pop):

    • 15,000 items

    • 90,000 items

Considered feature

  • Spectral Flatness Measure (SFM)

Classification performance

  • Number of correctly identified items (both “single best” and “within top 10”)
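
The two performance figures ("single best" and "within top 10") amount to top-k recognition rates. A minimal way to compute them from ranked match lists is sketched below; the function name `top_k_rates` and the toy rankings are invented for illustration and are unrelated to the presentation's measurements.

```python
def top_k_rates(ranked_results, true_ids, ks=(1, 10)):
    """Fraction of queries whose true item appears among the first k matches.

    ranked_results: one list of candidate ids per query, best match first.
    true_ids: the correct id for each query, in the same order.
    """
    if not true_ids or len(ranked_results) != len(true_ids):
        raise ValueError("need one ranked list per (non-empty) ground-truth entry")
    rates = {}
    for k in ks:
        hits = sum(1 for ranked, truth in zip(ranked_results, true_ids)
                   if truth in ranked[:k])
        rates[k] = hits / len(true_ids)
    return rates


# Toy example: four queries that should all be identified as item "a".
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"], ["b", "c", "d"]]
truth = ["a", "a", "a", "a"]
rates = top_k_rates(ranked, truth, ks=(1, 3))
```

With the default `ks=(1, 10)` the same function yields the "Top 1" and "Top 10" columns for a real test set.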


Recognition Performance - 15k items

[Results table: Top 1 / Top 10]

  • 16 bands

  • Advanced matching with temporal tracking


Recognition Performance - 90k items

  • 16 bands

  • Advanced matching with temporal tracking


Applications

  • Retrieve associated metadata by identifying audio content

  • Automated search of audio content on the Internet

  • Broadcast monitoring by logging the transmission of audio material

  • Feature based indexing of audio databases (similarity search)

  • ...


Conclusions

  • High recognition rates (> 99%, tested with 90,000 items)

  • Robust to “real world” signal distortions

  • Fast and reliable extraction and classification

  • The underlying feature is specified in the MPEG-7 standard, which ensures worldwide interoperability; licensing is available to everyone


Real-Time Demonstration

  • Demo running on a laptop (Pentium III @ 500 MHz)

  • Local database with 15,000 items (rock / pop genre)

  • Acoustic transmission: MP3 -> D/A -> speakers -> noisy environment -> microphone -> A/D -> AudioID


Thanks for your attention!

