
Introduction to Music Informatics: I548/N560, Spring 2011




Presentation Transcript


  1. Introduction to Music Informatics: I548/N560, Spring 2011 Instructor: Eric Nichols epnichols@gmail.com http://tinyurl.com/Info548

  2. Overview (Tues, Feb 15) • HW – questions? • HW: contest and output format • Dynamic Time Warping for Audio-to-MIDI alignment • Symbolic Representations • Reading: Dannenberg

  3. Polyphonic Audio Matching and Alignment • Ning Hu, Roger B. Dannenberg and George Tzanetakis • Goal: align polyphonic audio to a symbolic score • Does not perform transcription • Used to search MIDI databases for a match to a given audio recording

  4. Motivation • Query by Humming is an important problem, and it uses a symbolic database. • Why is symbolic better than audio matching for this problem? • Possible solution: do polyphonic transcription on the query. Then find best match. However, transcription is hard.

  5. Idea • Instead of transcription of the query, convert the symbolic database into audio! • Instead of using an entire spectrum, convert to a chroma vector. • Do dynamic time warping (DTW) on audio to look for matches.

  6. Chroma Vector • For each bin in the FFT • Assign the bin to the nearest half-step • Remove octave information • For each pitch class (1-12), average the value of its associated bins. • For this paper: 0.25 seconds of audio per chroma vector. Nonoverlapping windows. • Computing pitch from MIDI and vice versa • freq = 440 * 2^((MIDI-69) / 12.0) • MIDI = 69 + 12*log(freq/440.0) / log(2)
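The binning steps and conversion formulas on this slide can be sketched in Python. This is a toy illustration, not the paper's code; the function names and the magnitude-spectrum input format are my assumptions.

```python
import numpy as np

def midi_to_freq(midi):
    """freq = 440 * 2^((MIDI - 69) / 12)."""
    return 440.0 * 2.0 ** ((midi - 69) / 12.0)

def freq_to_midi(freq):
    """MIDI = 69 + 12 * log2(freq / 440)."""
    return 69 + 12 * np.log2(freq / 440.0)

def chroma_vector(magnitudes, sample_rate, n_fft):
    """Collapse an FFT magnitude spectrum into a 12-bin chroma vector:
    assign each bin to the nearest half-step, remove the octave, and
    average the magnitudes that land in each pitch class."""
    chroma = np.zeros(12)
    counts = np.zeros(12)
    for k in range(1, len(magnitudes)):           # skip the DC bin
        freq = k * sample_rate / n_fft            # center frequency of bin k
        pc = int(round(freq_to_midi(freq))) % 12  # nearest half-step, octave removed
        chroma[pc] += magnitudes[k]
        counts[pc] += 1
    counts[counts == 0] = 1                       # avoid divide-by-zero
    return chroma / counts                        # average per pitch class
```

In the paper each chroma vector covers 0.25 seconds of audio with non-overlapping windows, so this would be applied once per 0.25-second frame.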

  7. Chroma Vectors

  8. Why chroma? • Not super-sensitive to spectral distribution – ignores many details of timbre by collapsing everything into one octave • Mostly sensitive to fundamental pitches and chords

  9. Converting MIDI to chroma • Two possibilities: • Render the MIDI with a synthesizer, and then compute the FFT and then the chroma vector. • Go directly from MIDI to chroma with a theoretical model (in this paper, it is assumed that no overtones are present in the chroma for each given MIDI pitch). • One difficulty: dealing with percussive sounds
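The second option (going directly from MIDI to chroma with the no-overtones assumption) might look like the sketch below. The note-tuple format and frame length are my assumptions, not the paper's.

```python
import math

def midi_notes_to_chroma(notes, total_sec, frame_sec=0.25):
    """Map MIDI notes straight to chroma frames, assuming each note
    contributes energy only at its fundamental's pitch class (no overtones).
    notes: list of (onset_sec, offset_sec, midi_pitch, velocity) tuples."""
    n_frames = int(math.ceil(total_sec / frame_sec))
    frames = [[0.0] * 12 for _ in range(n_frames)]
    for onset, offset, pitch, velocity in notes:
        pc = pitch % 12                                  # remove octave information
        first = int(onset / frame_sec)
        last = min(n_frames - 1, int(offset / frame_sec))
        for f in range(first, last + 1):                 # every frame the note sounds in
            frames[f][pc] += velocity
    return frames
```

Percussive MIDI events have no meaningful pitch class, which is one reason this direct model is harder to get right than rendering with a synthesizer.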

  10. Chroma Similarity • Now we have lists of chroma vectors for an audio query and for a database of MIDI files • Normalize all vectors to have mean 0 and variance 1 • This helps reduce differences in vectors due to absolute loudness • Compute the Euclidean distance between vectors (0 distance = perfect match) • Compute the entire similarity matrix between vector pairs.
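The normalization and pairwise-distance steps above can be sketched as follows (a minimal illustration; function names are mine):

```python
import numpy as np

def normalize(v):
    """Normalize a chroma vector to mean 0, variance 1.
    This reduces differences between vectors due to absolute loudness."""
    v = np.asarray(v, dtype=float)
    std = v.std()
    return (v - v.mean()) / std if std > 0 else v - v.mean()

def similarity_matrix(audio_frames, midi_frames):
    """Euclidean distance between every pair of normalized chroma vectors.
    0 = perfect match; entry [i, j] compares audio frame i to MIDI frame j."""
    A = np.array([normalize(v) for v in audio_frames])
    B = np.array([normalize(v) for v in midi_frames])
    D = np.zeros((len(A), len(B)))
    for i in range(len(A)):
        for j in range(len(B)):
            D[i, j] = np.linalg.norm(A[i] - B[j])
    return D
```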

  11. Similarity Matrix • Dark = highly similar • Black diagonal = matching path • Note start, end, and length disparity

  12. DTW computation
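A basic DTW accumulation over the similarity matrix can be sketched as below. This is the textbook dynamic-programming recurrence with diagonal, vertical, and horizontal steps; the paper may use different local path constraints.

```python
import numpy as np

def dtw(D):
    """Dynamic time warping over a frame-distance matrix D.
    Each cell accumulates its local distance plus the cheapest of the
    three allowed predecessors; the bottom-right cell holds the cost
    of the best full alignment path."""
    n, m = D.shape
    C = np.full((n, m), np.inf)
    C[0, 0] = D[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            prev = min(
                C[i - 1, j - 1] if i > 0 and j > 0 else np.inf,  # diagonal step
                C[i - 1, j] if i > 0 else np.inf,                # vertical step
                C[i, j - 1] if j > 0 else np.inf,                # horizontal step
            )
            C[i, j] = D[i, j] + prev
    return C[-1, -1]
```

Backtracking through the accumulated matrix recovers the matching path itself (the black diagonal in the similarity-matrix slide).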

  13. Results: 10 Beatles songs

  14. Results 2

  15. Results 3

  16. Conclusion • More sophisticated DTW could be used to speed up the search • Gives an example of linking symbolic and audio domains

  17. Discussion • What elements/features of music should we represent? • Can we create a “dream” representation?
