AudioDB: Scalable approximate nearest-neighbor search with automatic radius-bounded indexing

Michael A. Casey

Digital Musics

Dartmouth College, Hanover, NH

ASA 156: Statistical Approaches for Analysis of Music and Speech Audio Signals

Scalable Similarity
  • 8M tracks in commercial collection
  • Petabyte-scale multimedia data
  • Require passage-level retrieval (~ 2 bars)
  • Require scalable nearest-neighbor methods
Specificity
  • Partial track retrieval
  • Alternate versions: remix, cover, live, album
  • Task is mid-to-high specificity
Example: remixing
  • Original Track
  • Remix 1
  • Remix 2
  • Remix 3
Audio Shingles
  • Shingles provide contextual information about features
  • Originally used for Internet search engines:
    • Andrei Z. Broder, Steven C. Glassman, Mark S. Manasse, Geoffrey Zweig: “Syntactic Clustering of the Web”. Computer Networks 29(8-13): 1157-1166 (1997)
  • Related to N-grams, overlapping sequences of features
  • Applied to audio domain by Casey and Slaney:
    • Casey, M., Slaney, M.: “The Importance of Sequences in Musical Similarity”, in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2006

A shingle is defined as the concatenation of l frames of m-dimensional features.
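As a minimal sketch of this definition, the following Python stacks l consecutive m-dimensional feature frames into overlapping shingles of l·m dimensions. The function name `make_shingles`, the hop of one frame between shingles, and the toy data are assumptions for illustration, not part of audioDB.

```python
from typing import List


def make_shingles(frames: List[List[float]], l: int) -> List[List[float]]:
    """Concatenate every run of l consecutive frames into one shingle.

    frames: list of feature frames, each of dimension m.
    Returns len(frames) - l + 1 shingles, each of dimension l * m
    (an empty list if there are fewer than l frames).
    """
    if l <= 0:
        raise ValueError("l must be positive")
    shingles = []
    for start in range(len(frames) - l + 1):
        shingle: List[float] = []
        for frame in frames[start:start + l]:
            shingle.extend(frame)
        shingles.append(shingle)
    return shingles


# Toy example: 5 frames of m = 2 features, shingle length l = 3.
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]]
shingles = make_shingles(frames, l=3)
```

Because consecutive shingles overlap by l - 1 frames, a track with T frames yields T - l + 1 shingles, so the database grows roughly with the number of frames rather than the number of tracks.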

Audio Shingle Similarity

For shingles with M dimensions (M = l·m; m = 12, 20; l = 30, 40):
  • a query shingle drawn from a query track {Q}
  • a database of audio tracks indexed by (n)
  • a database shingle from track n

Shingles are normalized to unit vectors, therefore:
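For reference, a standard identity for unit vectors relates squared Euclidean distance to the inner product. The symbols q (a query shingle) and x (a database shingle) are notation chosen here for illustration, not necessarily the notation used on the slide:

```latex
\[
\|q - x\|^2 = \|q\|^2 + \|x\|^2 - 2\,q^{\top}x = 2 - 2\,q^{\top}x,
\qquad \|q\| = \|x\| = 1 .
\]
```

Squared distances between unit-norm shingles therefore lie in [0, 4], and ranking by smallest distance is equivalent to ranking by largest inner product.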

AudioDB: Shingle Nearest Neighbor Search

  • Open source: search Google for “audioDB”
  • Management of tracks, sequences, salience
  • Automatic indexing parameters
  • OMRAS2, Yahoo!, AWAL, CHARM, more…
  • Web-services interface (SOAP / JSON)
  • Implementation of LSH for large N ~ 1B
  • 1-10 ms whole-track retrieval from 1B vectors
Whole-track similarity
  • Often want to know which tracks are similar
  • Similarity depends on specificity of task
    • Distortion / filtering / re-encoding (high)
    • Remix with new audio material (mid)
    • Cover song: same song, different artist (mid)
Whole-track resemblance: radius-bounded search

Compute the number of shingle collisions between two tracks:

  • Requires a threshold for considering shingles to be related
  • Need a way to estimate relatedness (threshold) for data set
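The collision count can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions: a "collision" is taken to be any (query shingle, track shingle) pair whose squared Euclidean distance, after unit normalization, is at most a threshold `xthresh`. The function names and toy vectors are hypothetical and do not reflect audioDB's implementation.

```python
import math
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Scale v to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def count_collisions(query: Sequence[Sequence[float]],
                     track: Sequence[Sequence[float]],
                     xthresh: float) -> int:
    """Count shingle pairs whose squared distance is at most xthresh."""
    q_unit = [normalize(q) for q in query]
    t_unit = [normalize(t) for t in track]
    return sum(1 for q in q_unit for t in t_unit if sq_dist(q, t) <= xthresh)


# Toy 2-dimensional "shingles" for illustration only.
query = [[1.0, 0.0], [0.0, 1.0]]
track_a = [[2.0, 0.1], [0.1, 3.0], [-1.0, 0.0]]  # two near matches to the query
track_b = [[-1.0, -1.0], [-2.0, 0.5]]            # no near matches
hits_a = count_collisions(query, track_a, xthresh=0.05)
hits_b = count_collisions(query, track_b, xthresh=0.05)
```

Ranking tracks by their collision count then gives a whole-track resemblance score; the result is only as good as the choice of `xthresh`.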
Distribution of minimum distances

Database: 1.4 million shingles. The left bump is the distribution of minimum distances between 1000 randomly selected query shingles and this database. The right bump is a small sampling (1/98 000 000) of the full histogram of all distances.

Radius-bounded retrieval performance: cover song (opus task)
  • Performance depends critically on xthresh, the collision threshold
  • Want to estimate xthresh automatically from unlabelled data
Order Statistics
  • Minimum-value distribution is analytic
  • Estimate the distribution parameters
  • Substitute into minimum value distribution
  • Define a threshold in terms of FP rate
  • This gives an estimate of xthresh
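The steps above follow a standard order-statistics argument. As a sketch, assume N i.i.d. background distances X_1, ..., X_N with a continuous, invertible CDF F. The symbols F, N, and p_FP are notation chosen here, not taken from the slides:

```latex
\begin{align*}
P\Bigl(\min_{1 \le i \le N} X_i \le x\Bigr) &= 1 - \bigl(1 - F(x)\bigr)^{N}, \\
1 - \bigl(1 - F(x_{\mathrm{thresh}})\bigr)^{N} = p_{\mathrm{FP}}
\;&\Longrightarrow\;
x_{\mathrm{thresh}} = F^{-1}\!\bigl(1 - (1 - p_{\mathrm{FP}})^{1/N}\bigr).
\end{align*}
```

The first line is the distribution of the minimum of N independent draws; the second sets the probability that an unrelated minimum falls below the threshold equal to the chosen false positive rate.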
Estimating xthresh from unlabelled data
  • Use theoretical statistics
  • Null Hypothesis:
    • H0: shingles are drawn from unrelated tracks
  • Assume elements i.i.d., normally distributed
  • M-dimensional shingles, d effective degrees of freedom:
  • Squared distance distribution for H0
ML for background distribution
  • Likelihood for N data points (distances squared)
  • d = effective degrees of freedom
  • M = shingle dimensionality
Background distribution parameters
  • Likelihood for N data points (distances squared)
  • d = effective degrees of freedom
  • M = shingle dimensionality
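The slides fit the background distribution by maximum likelihood, but the likelihood itself is not reproduced in this transcript. As a simpler stand-in, the sketch below uses the method of moments, assuming squared distances follow a scaled chi-squared distribution, sigma2 · χ²(d). The model form, the function name `fit_scaled_chi2`, and the synthetic data are assumptions for illustration only.

```python
import random
import statistics
from typing import Sequence, Tuple


def fit_scaled_chi2(sq_dists: Sequence[float]) -> Tuple[float, float]:
    """Method-of-moments fit of sq_dist ~ sigma2 * chi-squared(d).

    Uses mean = d * sigma2 and variance = 2 * d * sigma2**2.
    Returns (d, sigma2).
    """
    mean = statistics.fmean(sq_dists)
    var = statistics.variance(sq_dists)
    d = 2.0 * mean * mean / var
    sigma2 = var / (2.0 * mean)
    return d, sigma2


# Synthetic check: draw from a scaled chi-squared with known parameters.
rng = random.Random(0)
true_d, true_sigma2 = 20.0, 0.1
# chi-squared(d) is Gamma(shape=d/2, scale=2); multiplying by sigma2
# multiplies the scale parameter by sigma2.
sample = [rng.gammavariate(true_d / 2.0, 2.0 * true_sigma2) for _ in range(20000)]
d_hat, sigma2_hat = fit_scaled_chi2(sample)
```

A moment fit like this is also a reasonable starting point for an iterative maximum-likelihood fit.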
Estimate of xthresh

xthresh is chosen so that the minimum-value distribution yields the required false positive rate.
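One way to turn this into a number is sketched below. It assumes the background squared distances follow sigma2 · χ²(d), which is a Gamma distribution with shape d/2 and scale 2·sigma2, and that `n` is the number of background distances over which the minimum is taken. The function names, the series-based CDF, and the example parameter values are illustrative assumptions, not audioDB code.

```python
import math


def gamma_cdf(x: float, shape: float, scale: float) -> float:
    """Gamma(shape, scale) CDF via the power series for the regularized
    lower incomplete gamma function P(shape, x / scale)."""
    if x <= 0.0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    for k in range(1, 100000):
        term *= z / (shape + k)
        total += term
        if term < total * 1e-15:
            break
    log_prefactor = -z + shape * math.log(z) - math.lgamma(shape)
    return min(1.0, total * math.exp(log_prefactor))


def estimate_xthresh(d: float, sigma2: float, n: int, fp_rate: float) -> float:
    """Solve 1 - (1 - F(x))**n = fp_rate for x by bisection, where F is the
    CDF of sigma2 * chi-squared(d)."""
    shape, scale = d / 2.0, 2.0 * sigma2
    # Per-distance tail probability, computed stably for large n.
    target = -math.expm1(math.log1p(-fp_rate) / n)
    lo, hi = 0.0, d * sigma2
    while gamma_cdf(hi, shape, scale) < target:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Example values are arbitrary: d = 20, sigma2 = 0.1, one million distances.
x_thresh = estimate_xthresh(d=20.0, sigma2=0.1, n=1_000_000, fp_rate=0.01)
```

The threshold sits far in the lower tail of the background distribution, well below the mean squared distance d·sigma2, which is why a parametric model is used rather than an empirical quantile.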

Unlabelled data experiment
  • Unlabelled data set
  • Known to contain:
    • cover songs (same work, different performer)
    • near-duplicate recordings (misattribution, encoding)
  • Estimate background distance distribution
  • Estimate minimum value distribution
  • Set xthresh so FP rate is <= 1%
  • Whole-track retrieval based on shingle collisions
Scaling
  • Locality sensitive hashing
  • Trades exact NN search for approximate NN search to reduce time complexity
  • 3 to 4 orders of magnitude speed-up
  • No noticeable degradation in performance
    • For optimal radius threshold
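To illustrate how locality sensitive hashing supports radius-bounded search, here is a small sketch using Gaussian random projections quantized into buckets of width `w`. The class name `RadiusLSH` and the parameters `k`, `n_tables`, and `w` are hypothetical choices for this example and are not taken from audioDB.

```python
import math
import random
from collections import defaultdict
from typing import List, Sequence


class RadiusLSH:
    """Toy LSH index: n_tables hash tables, each keyed by k quantized projections."""

    def __init__(self, dim: int, k: int = 4, n_tables: int = 8,
                 w: float = 1.0, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w = w
        self.points: List[List[float]] = []
        self.tables = []
        for _ in range(n_tables):
            hashes = [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, w))
                      for _ in range(k)]
            self.tables.append((hashes, defaultdict(list)))

    def _key(self, hashes, v: Sequence[float]) -> tuple:
        # h(v) = floor((a . v + b) / w) for each of the k projections.
        return tuple(math.floor((sum(ai * x for ai, x in zip(a, v)) + b) / self.w)
                     for a, b in hashes)

    def add(self, v: Sequence[float]) -> int:
        idx = len(self.points)
        self.points.append(list(v))
        for hashes, buckets in self.tables:
            buckets[self._key(hashes, v)].append(idx)
        return idx

    def query(self, q: Sequence[float], sq_radius: float) -> List[int]:
        """Return indices of bucket-mates of q within the squared radius."""
        candidates = set()
        for hashes, buckets in self.tables:
            candidates.update(buckets.get(self._key(hashes, q), ()))
        return sorted(i for i in candidates
                      if sum((x - y) ** 2 for x, y in zip(q, self.points[i])) <= sq_radius)


index = RadiusLSH(dim=3)
for p in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]):
    index.add(p)
hits = index.query([1.0, 0.0, 0.0], sq_radius=0.05)
```

Only points that share a bucket with the query are checked, which is where the speed-up comes from. The search is approximate: a true neighbor that lands in a different bucket in every table is missed, and the bucket width has to be chosen relative to the search radius.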
Current deployment
  • Large commercial collections
    • AWAL ~ 100,000 tracks
    • Yahoo! 2M+ tracks, related song classifier
  • AudioDB: open-source, international consortium of developers
  • Google: “audioDB”
Conclusions
  • Radius-bounded retrieval model for tracks
  • Shingles preserve temporal information, high d
  • Implements mid-to-high specificity search
  • Optimal radius threshold from order statistics
    • null hypothesis: shingles are drawn from unrelated tracks
  • LSH requires a radius bound, which is estimated automatically
  • Scales to 1B+ shingles using LSH
Thanks
  • Malcolm Slaney, Yahoo! Research Inc.
  • Christophe Rhodes, Goldsmiths, U. of London
  • Michela Magas, Goldsmiths, U. of London
  • Funding: EPSRC: EP/E02274X/1