
Introduction to Music Informatics: I548/N560, Spring 2011




Presentation Transcript


  1. Introduction to Music Informatics: I548/N560, Spring 2011 Instructor: Eric Nichols epnichols@gmail.com http://tinyurl.com/Info548

  2. Overview (Tues, Feb 15) • HW – questions? • HW: contest and output format • Dynamic Time Warping for Audio-to-MIDI alignment • Symbolic Representations • Reading: Dannenberg

  3. Polyphonic Audio Matching and Alignment • Ning Hu, Roger B. Dannenberg and George Tzanetakis • Goal: align polyphonic audio to a symbolic score • Does not perform transcription • Used to search MIDI databases for a match to a given audio recording

  4. Motivation • Query by Humming is an important problem, and it uses a symbolic database. • Why is symbolic better than audio matching for this problem? • Possible solution: do polyphonic transcription on the query. Then find best match. However, transcription is hard.

  5. Idea • Instead of transcription of the query, convert the symbolic database into audio! • Instead of using an entire spectrum, convert to a chroma vector. • Do dynamic time warping (DTW) on audio to look for matches.

  6. Chroma Vector • For each bin in the FFT • Assign the bin to the nearest half-step • Remove octave information • For each pitch class (1-12), average the value of its associated bins. • For this paper: 0.25 seconds of audio per chroma vector. Nonoverlapping windows. • Computing pitch from MIDI and vice versa • freq = 440 * 2^((MIDI-69) / 12.0) • MIDI = 69 + 12*log(freq/440.0) / log(2)
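The binning steps and conversion formulas on this slide can be sketched in Python. This is a toy illustration, not the paper's code; the function names and the magnitude-spectrum input format are my assumptions.

```python
import numpy as np

def midi_to_freq(midi):
    """freq = 440 * 2^((MIDI - 69) / 12)."""
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)

def freq_to_midi(freq):
    """MIDI = 69 + 12 * log2(freq / 440)."""
    return 69 + 12 * np.log2(freq / 440.0)

def chroma_vector(magnitudes, sample_rate, n_fft):
    """Collapse an FFT magnitude spectrum into a 12-bin chroma vector:
    assign each bin to the nearest half-step, remove the octave, and
    average the magnitudes that land in each pitch class."""
    chroma = np.zeros(12)
    counts = np.zeros(12)
    for k in range(1, len(magnitudes)):           # skip the DC bin
        freq = k * sample_rate / n_fft            # center frequency of bin k
        pc = int(round(freq_to_midi(freq))) % 12  # nearest half-step, octave removed
        chroma[pc] += magnitudes[k]
        counts[pc] += 1
    counts[counts == 0] = 1                       # avoid divide-by-zero
    return chroma / counts                        # average per pitch class
```

In the paper each chroma vector covers 0.25 seconds of audio with non-overlapping windows, so this would be applied once per 0.25-second frame.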

  7. Chroma Vectors

  8. Why chroma? • Not super-sensitive to spectral distribution – ignores many details of timbre by collapsing everything into one octave • Mostly sensitive to fundamental pitches and chords

  9. Converting MIDI to chroma • Two possibilities: • Render the MIDI with a synthesizer, and then compute the FFT and then the chroma vector. • Go directly from MIDI to chroma with a theoretical model (in this paper, it is assumed that no overtones are present in the chroma for each given MIDI pitch). • One difficulty: dealing with percussive sounds
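The second option (going directly from MIDI to chroma with the no-overtones assumption) might look like the sketch below. The note-tuple format and frame length are my assumptions, not the paper's.

```python
import math

def midi_notes_to_chroma(notes, total_sec, frame_sec=0.25):
    """Map MIDI notes straight to chroma frames, assuming each note
    contributes energy only at its fundamental's pitch class (no overtones).
    notes: list of (onset_sec, offset_sec, midi_pitch, velocity) tuples."""
    n_frames = int(math.ceil(total_sec / frame_sec))
    frames = [[0.0] * 12 for _ in range(n_frames)]
    for onset, offset, pitch, velocity in notes:
        pc = pitch % 12                                  # remove octave information
        first = int(onset / frame_sec)
        last = min(n_frames - 1, int(offset / frame_sec))
        for f in range(first, last + 1):                 # every frame the note sounds in
            frames[f][pc] += velocity
    return frames
```

Percussive MIDI events have no meaningful pitch class, which is one reason this direct model is harder to get right than rendering with a synthesizer.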

  10. Chroma Similarity • Now we have lists of chroma vectors for an audio query and for a database of MIDI files • Normalize all vectors to have mean 0 and variance 1 • This helps reduce differences in vectors due to absolute loudness • Compute the Euclidean distance between vectors (0 distance = perfect match) • Compute the entire similarity matrix between vector pairs.
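The normalization and pairwise-distance steps above can be sketched as follows (a minimal illustration; function names are mine):

```python
import numpy as np

def normalize(v):
    """Normalize a chroma vector to mean 0, variance 1.
    This reduces differences between vectors due to absolute loudness."""
    v = np.asarray(v, dtype=float)
    std = v.std()
    return (v - v.mean()) / std if std > 0 else v - v.mean()

def similarity_matrix(audio_frames, midi_frames):
    """Euclidean distance between every pair of normalized chroma vectors.
    0 = perfect match; entry [i, j] compares audio frame i to MIDI frame j."""
    A = np.array([normalize(v) for v in audio_frames])
    B = np.array([normalize(v) for v in midi_frames])
    D = np.zeros((len(A), len(B)))
    for i in range(len(A)):
        for j in range(len(B)):
            D[i, j] = np.linalg.norm(A[i] - B[j])
    return D
```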

  11. Similarity Matrix • Dark = highly similar • Black diagonal = matching path • Note start, end, and length disparity

  12. DTW computation
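A basic DTW accumulation over the similarity matrix can be sketched as below. This is the textbook dynamic-programming recurrence with diagonal, vertical, and horizontal steps; the paper may use different local path constraints.

```python
import numpy as np

def dtw(D):
    """Dynamic time warping over a frame-distance matrix D.
    Each cell accumulates its local distance plus the cheapest of the
    three allowed predecessors; the bottom-right cell holds the cost
    of the best full alignment path."""
    n, m = D.shape
    C = np.full((n, m), np.inf)
    C[0, 0] = D[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(
                C[i - 1, j - 1] if i > 0 and j > 0 else np.inf,  # diagonal step
                C[i - 1, j] if i > 0 else np.inf,                # vertical step
                C[i, j - 1] if j > 0 else np.inf,                # horizontal step
            )
            C[i, j] = D[i, j] + prev
    return C[-1, -1]
```

Backtracking through the accumulated matrix recovers the matching path itself (the black diagonal in the similarity-matrix slide).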

  13. Results: 10 Beatles songs

  14. Results 2

  15. Results 3

  16. Conclusion • More sophisticated DTW could be used to speed up the search • Gives an example of linking symbolic and audio domains

  17. Discussion • What elements/features of music should we represent? • Can we create a “dream” representation?
