Locating cover songs and alternate performances in databases of raw audio

Robert Turetsky, Advent Workshop, May 17, 2002




Technology enables “liquid music”

Production

Distribution

Consumption


Content-Based Analysis: Motivation

  • Search on file-sharing systems (e.g. KaZaA) involves meta-data

    • Meta-data prone to errors, omission, distortion

    • Only works if user already knows what to look for

  • Musical Content Analysis means:

    • Query by humming

    • Query by segment/prototype

    • Recommendation engines and artist discovery

    • Machine feedback/collaboration in composition

  • Locating cover songs is a first step


Locating Cover Songs: Prior Work

  • Query By Humming

    • Mature field (kiosks, applets) but limited to monophonic music or manually transcribed polyphonic music

  • Jonathan Foote (FX Palo Alto)

    • ARTHUR (2000): aligns RMS energy. Works only on orchestral music; pop music has less dynamic range.

    • Content-Based Retrieval of Music and Audio (1997). Measures acoustic similarity, not equivalence.

  • Cheng Yang (Stanford)

    • Music Database Retrieval Based on Spectral Similarity (2001). Aligns MFCC at points of high energy using DTW.

    • MACS (2001). Aligns estimates of pitch likelihood. Indexing. “Bad” alignments discarded after linearity filter.


Why is locating cover songs so difficult?

  • Alternate performances can vary:

    • Studio vs. Live

      • Tempo (non-linear time shifting)

      • Pitch transposition

      • Production technique, acoustic character

      • Additions (e.g. audience interaction)

      • Alternate lyrics (e.g. Don’t Cry versions I and II)

    • Cover versions, artist re-interpretations

      • Vocalist, instrumentation, ornamentation

      • Entire character changes (e.g. Layla, dance remixes)

  • Yet we still know these songs are the same!


System Overview

  • Preprocessing: Locate Section Breaks, Identify Summary Sections, Pitch Extraction

  • Query: Tonic Estimation, Alignment


Phase 1: Locate Section Breaks

  • Employ Foote’s Similarity Matrix

  • Theory: Windows of the same section will have similar features. Windows of different sections will have different features.

  • Similarity Matrix: Cosine distance between every pair of fixed-width windows of the song

  • Novelty Score - measure of ‘newness’: correlation with checkerboard matrix.

  • Section breaks are peaks in the Novelty Score.
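
The steps on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system's code: the function names, the untapered +1/-1 checkerboard kernel, the `half_width` parameter and the peak threshold are all hypothetical, and windows are compared by the cosine of the angle between their feature vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def similarity_matrix(frames):
    """S[i][j] compares fixed-width window i with window j."""
    return [[cosine_similarity(a, b) for b in frames] for a in frames]

def novelty_score(S, half_width):
    """Correlate a checkerboard kernel with S along its main diagonal."""
    n = len(S)
    scores = [0.0] * n  # windows too close to either end keep a score of 0
    for t in range(half_width, n - half_width):
        total = 0.0
        for di in range(-half_width, half_width):
            for dj in range(-half_width, half_width):
                # +1 when both windows fall on the same side of t, -1 otherwise
                sign = 1.0 if (di < 0) == (dj < 0) else -1.0
                total += sign * S[t + di][t + dj]
        scores[t] = total
    return scores

def pick_peaks(scores, threshold):
    """Indices of local maxima above an (assumed) threshold."""
    return [t for t in range(1, len(scores) - 1)
            if scores[t] > threshold
            and scores[t] > scores[t - 1]
            and scores[t] >= scores[t + 1]]
```

On a toy input of eight windows sharing one feature vector followed by eight windows with an orthogonal one, `pick_peaks(novelty_score(S, 3), 10.0)` reports a single break at window 8.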


Phase 2: Summary Segments


  • Motivation: Only transcribe and align salient segments

  • Measure of salience: Repetition

  • Method: Search for largest off-diagonal line in Similarity Matrix for each segment to measure extent of repetition (“score”)

  • Summary segment is most repeated section. Prune rows/columns of similar sections in score matrix. Repeat until 45-75 sec of audio is kept
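
One possible reading of the repetition measure is sketched below. It assumes that the "line" for a section is the diagonal stripe comparing that section with an equally long, non-overlapping stretch elsewhere in the song, and that its strength is the mean similarity along the stripe. The function names are hypothetical, and the pruning and 45-75 second stopping rule are not implemented.

```python
def repetition_score(S, start, end):
    """Best mean similarity between windows [start, end) and any
    non-overlapping stretch of the same length elsewhere in the song."""
    n, length = len(S), end - start
    best = 0.0
    for other in range(0, n - length + 1):
        if abs(other - start) < length:
            continue  # skip the main diagonal and overlapping shifts
        line = sum(S[start + k][other + k] for k in range(length)) / length
        best = max(best, line)
    return best

def most_repeated_section(S, boundaries):
    """Return the (start, end) section with the highest repetition score,
    together with the scores of all sections."""
    sections = list(zip(boundaries[:-1], boundaries[1:]))
    scores = [repetition_score(S, a, b) for a, b in sections]
    best = max(range(len(sections)), key=scores.__getitem__)
    return sections[best], scores
```

Here `boundaries` would be the section breaks from Phase 1, with 0 and the number of windows added at the ends.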

[Figure: score matrix with Sec 1 to Sec 4 on both axes; Section 1 and Section 4 are marked]


Phase 3: Pitch Extraction

  • Multi-pitch extraction algorithm based on Klapuri et al., 2001.

  • Works well, except in presence of drums.

[Block diagram: Noise Suppression; Predominant Pitch Estimation; Estimate Pitched Sound Characteristics; Estimate # Voices and Iterate; Remove Found Sound from Mixture]

[Figure: extracted pitches over time (axes: Time, Pitch)]


Phase 3: MPE Details

Noise Reduction: RASTA style filter

Predominant pitch estimation: “Fuzzy search” for harmonic peaks

Spectral Smoothing to estimate sound parameters

Resynthesis

Repeat on mixture after removal

Resynthesis
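
The estimate-and-remove loop can be illustrated with a toy that is far simpler than the algorithm cited above. It assumes clean harmonic tones whose partials fall exactly on DFT bins, replaces the "fuzzy search" with a plain harmonic sum, removes a found sound by zeroing its bins, and skips noise reduction and spectral smoothing entirely. All names, the candidate range and `stop_ratio` are hypothetical.

```python
import cmath
import math

def harmonic_tone(f0_bin, amp, n, n_harmonics=3):
    """Test signal: partials at f0_bin * h with amplitude amp / h."""
    return [sum(amp / h * math.sin(2 * math.pi * f0_bin * h * i / n)
                for h in range(1, n_harmonics + 1)) for i in range(n)]

def magnitude_spectrum(signal):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (fine for a short frame)."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal)))
            for k in range(n // 2 + 1)]

def harmonic_salience(mag, f0_bin, n_harmonics):
    """Sum of spectral magnitude at the first few harmonics of a candidate."""
    return sum(mag[f0_bin * h] for h in range(1, n_harmonics + 1)
               if f0_bin * h < len(mag))

def estimate_pitches(mag, candidates, n_harmonics=3, max_voices=4, stop_ratio=0.1):
    """Pick the predominant pitch, remove it from the residual, and repeat."""
    residual = list(mag)
    found, first = [], None
    for _ in range(max_voices):
        best = max(candidates,
                   key=lambda b: harmonic_salience(residual, b, n_harmonics))
        salience = harmonic_salience(residual, best, n_harmonics)
        if first is None:
            first = salience
        # stop when nothing is left or the next voice is too weak
        if first == 0 or salience < stop_ratio * first:
            break
        found.append(best)
        for h in range(1, n_harmonics + 1):  # crude removal: zero its bins
            if best * h < len(residual):
                residual[best * h] = 0.0
    return found
```

The candidate range is kept under one octave (for example `range(40, 80)`) so that a sub-octave candidate cannot collect the same harmonics as the true pitch.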


Phase 4-5: Query-time alignment

  • Exhaustively align summary segments

  • Two alignments needed: Pitch and Time

  • Pitch Alignment: Tonic Estimation

    • Align two piano rolls at point of maximum cross-correlation between note histograms

  • Temporal Alignment: Dynamic Programming (Dynamic Time Warp)

    • Currently investigating different weights for rewarding note matches, penalizing mismatches
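
Both alignments can be sketched as follows, assuming a piano roll is a list of frames and each frame is a set of MIDI note numbers. The frame cost, the `match` and `mismatch` weights and the one-octave search range are assumptions made for illustration; the slide says the actual weights were still under investigation.

```python
from collections import Counter

def note_histogram(roll):
    """Count how often each pitch sounds across all frames of a piano roll."""
    return Counter(p for frame in roll for p in frame)

def best_transposition(query, reference, max_shift=12):
    """Semitone shift of `query` that maximizes the cross-correlation
    between the two note histograms."""
    hq, hr = note_histogram(query), note_histogram(reference)

    def corr(shift):
        return sum(count * hr.get(p + shift, 0) for p, count in hq.items())

    return max(range(-max_shift, max_shift + 1), key=corr)

def transpose(roll, shift):
    return [{p + shift for p in frame} for frame in roll]

def frame_cost(x, y, match=1.0, mismatch=1.0):
    """0.0 when two frames hold the same notes, 1.0 when they share none."""
    hits, misses = match * len(x & y), mismatch * len(x ^ y)
    return misses / (hits + misses) if hits + misses else 0.0

def dtw_cost(a, b, match=1.0, mismatch=1.0):
    """Classic dynamic time warp: minimum summed frame cost over all
    monotonic alignments of roll `a` with roll `b`."""
    inf = float("inf")
    D = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
            D[i][j] = frame_cost(a[i - 1], b[j - 1], match, mismatch) + step
    return D[len(a)][len(b)]
```

A query that is the reference played three semitones higher with some notes held longer gets a shift of -3 from `best_transposition`, and after `transpose` its `dtw_cost` against the reference is 0.0.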


Locating Cover Songs: Future Work

  • Indexing scheme, other alignment techniques to improve speed of query

  • Thematic extraction to find only melody or harmony lines

  • Include Beat Tracking as part of score

  • Investigate harmonic analysis (identifying chord structure) for better feature

  • Speech recognition on lyrics???

