
Transcription by Beat-Boxing Elliot Sinyor - MUMT 611 Feb 17, 2005

Presentation Transcript


  1. Transcription by Beat-Boxing • Elliot Sinyor - MUMT 611 • Feb 17, 2005

  2. Presentation • Introduction • Background • Making “beat-box” sounds • Some Common Methods • Related Work • “Query-by-beat-boxing: Music Retrieval for the DJ” • Kapur, Benning, Tzanetakis • “A Drum Pattern Retrieval Method by Voice Percussion” • Nakano, Ogata, Goto, Hiraga • “Towards Automatic Transcription of Expressive Oral Percussive Performances” • Hazan • Project for MUMT 605

  3. Introduction • Ways to input percussion: • Electronic Drums (Yamaha DD-5, Roland V-Drums) • Velocity-sensitive MIDI keyboard • Velocity-insensitive computer keyboard • Vocalized percussion: • Common practice - “beat-boxing”, tabla vocal notation • Few applications that explicitly use vocalized percussion as input.

  4. Introduction • Uses • Method of percussion input, for composition or performance • Method of transcription, along with expressive information • Method of retrieving stored percussion samples

  5. Background: Plosives (aka Stops) • /t/ sound being made • Step 1: tongue at alveolar ridge behind teeth • Air builds up behind tongue • Step 2: tongue released, along with air.

  6. Background: Fricatives • /z/ sound being made (voiced) • /s/ sound being made (unvoiced) • In both cases, the flow of air is constricted by the tongue and the alveolar ridge (right behind the teeth). • Turbulence results in a white-noise-like sound.

  7. Background • Why does this matter? • Plosives and fricatives yield short signals (approximately 30 ms) • The signals are noisy and non-deterministic • They vary greatly from person to person

  8. Common Methods • Segment the monophonic input stream (onset detection) • Distinguish between silence and “beats” • Analyse features • Temporal/spectral features • Classify each sound based on features and training data • (e.g., ANN, minimum-distance criteria) • A minimal sketch of this pipeline follows below
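
To make the pipeline concrete, here is a minimal Python/NumPy sketch of segment-analyse-classify. The energy threshold, frame size, and two-feature set are placeholders of mine, not values from any of the cited papers.

```python
import numpy as np

def segment(signal, frame=512, threshold=0.02):
    """Split a monophonic stream into 'beat' regions by simple energy gating."""
    frames = signal[:len(signal) // frame * frame].reshape(-1, frame)
    rms = np.sqrt((frames ** 2).mean(axis=1))  # energy per frame
    active = rms > threshold                   # beat vs. silence
    segments, start = [], None
    for i, a in enumerate(active):             # collect runs of active frames
        if a and start is None:
            start = i
        elif not a and start is not None:
            segments.append(signal[start * frame:i * frame]); start = None
    if start is not None:
        segments.append(signal[start * frame:])
    return segments

def features(seg):
    """Toy feature vector: RMS energy and zero-crossing rate."""
    rms = np.sqrt((seg ** 2).mean())
    zcr = np.mean(np.abs(np.diff(np.sign(seg))) > 0)
    return np.array([rms, zcr])

def classify(seg, training):
    """Minimum-distance classification against labelled (label, vector) pairs."""
    f = features(seg)
    return min(training, key=lambda item: np.linalg.norm(f - item[1]))[0]
```

In practice the cited systems use richer feature sets (MFCCs, spectral descriptors) and trained classifiers (ANNs, k-NN) in place of these placeholders.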

  9. High-level diagram [diagram not included in transcript]

  10. Analysis Features • Some time-domain features: • Root Mean Square (RMS) analysis - measure of energy level over a frame • Relative Difference Function (RDF) • used to determine perceptual onset • Zero-Crossing Rate (ZCR) analysis - used to estimate frequency components

  11. Analysis Features • Some Frequency-Domain features: • Spectral Flux • Measure of change from 1 frame to another • Spectral Centroid • “center of gravity” • Mel-frequency Cepstral Coefficients • Compact and perceptually relevant way to model the spectrum
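
As an illustration, spectral flux can be sketched in a few lines of NumPy; the frame and hop sizes here are arbitrary choices of mine.

```python
import numpy as np

def spectral_flux(signal, frame=1024, hop=512):
    """Frame-to-frame change in the magnitude spectrum (L2 norm of the
    difference), rectified so that only energy increases count."""
    window = np.hanning(frame)
    mags = []
    for start in range(0, len(signal) - frame, hop):
        spectrum = np.fft.rfft(window * signal[start:start + frame])
        mags.append(np.abs(spectrum))
    mags = np.array(mags)
    diff = np.diff(mags, axis=0)
    return np.sqrt((np.maximum(diff, 0.0) ** 2).sum(axis=1))
```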

  13. Analysis Features

  14. Onset Detection - Relative Difference Function • Klapuri (99): take the first difference of the logarithm of the amplitude envelope, RDF(t) = d/dt log A(t) = A'(t)/A(t), rather than the plain first difference of A(t)

  15. Relative Difference Function • “This is psychoacoustically relevant, since perceived increase in signal amplitude is in relation to its level, the same amount of increase being more prominent in a quiet signal.” • Can be used to find the perceptual onset, whereas physical onset may occur earlier.
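
A sketch of this idea in NumPy, computed as the first difference of the log of a frame-wise RMS envelope; the frame size and the `eps` guard against log(0) are my assumptions.

```python
import numpy as np

def relative_difference(signal, frame=256, eps=1e-8):
    """First difference of the log amplitude envelope: a small absolute
    increase at a quiet level scores high, matching Klapuri's observation."""
    frames = signal[:len(signal) // frame * frame].reshape(-1, frame)
    envelope = np.sqrt((frames ** 2).mean(axis=1))  # frame-wise RMS envelope
    return np.diff(np.log(envelope + eps))
```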

  16. Relative Difference Function

  17. Relative Difference Function [figures: /p/, /t/]

  18. Relative Difference Function [figures: /k/, /s/]

  19. Relative Difference Function [figure: loop2]

  20. Time Domain - RMS • Can be used as a measure of a signal’s energy for a given frame of N samples. • Usable for perceptual onset detection? • The following figures were taken with N = 100 samples. • A frame-wise RMS sketch follows below
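
A frame-wise RMS sketch in NumPy, over non-overlapping frames; the default N mirrors the slide.

```python
import numpy as np

def rms(signal, n=100):
    """Root-mean-square energy per non-overlapping frame of n samples."""
    frames = signal[:len(signal) // n * n].reshape(-1, n)
    return np.sqrt((frames ** 2).mean(axis=1))
```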

  21. RMS (/p/, /t/, /k/, /s/), N = 500 samples [figures]

  22. Relative Difference Function [figure: loop1]

  23. Zero-Crossing Rate • Quite simply: how many times does the signal cross zero in a given frame of samples? • Somewhat analogous to “frequency” • Should be used with a noise gate for silent portions. • Gouyon et al. (2000) • A gated ZCR sketch follows below
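
A sketch of frame-wise ZCR with a simple noise gate, following Gouyon et al.'s caveat about silent portions; the gate threshold is an assumed placeholder.

```python
import numpy as np

def zcr(signal, n=500, gate=0.01):
    """Zero-crossing rate per frame of n samples. Frames whose RMS falls
    below `gate` are reported as 0 rather than the spuriously high ZCR of
    low-level noise."""
    frames = signal[:len(signal) // n * n].reshape(-1, n)
    rates = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    quiet = np.sqrt((frames ** 2).mean(axis=1)) < gate
    rates[quiet] = 0.0
    return rates
```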

  24. Zero-Crossing Rate (/p/, /t/, /k/, /s/) [figures]

  25. Zero-Crossing Rate (loop1), N = 500 samples [figure]

  26. Zero-Crossing Rate (loop2), N = 500 samples [figure]

  27. Frequency-Domain Features [spectrograms: /s/, /k/, /t/, /p/]

  28. Frequency-Domain Features • Spectral Centroid (i.e. center of gravity): • For each frame: the sum of frequencies weighted by their magnitudes, divided by the sum of magnitudes • The midpoint of the spectral energy distribution • Can be used as a rough estimate of “brightness” • A one-frame sketch follows below
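
A one-frame spectral centroid sketch in NumPy; the Hann window and the small denominator guard are my choices.

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency: sum(f * |X(f)|) / sum(|X(f)|)."""
    mags = np.abs(np.fft.rfft(np.hanning(len(frame)) * frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return (freqs * mags).sum() / (mags.sum() + 1e-12)
```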

  29. Spectral Centroid (/p/) [figure]

  30. Spectral Centroid (/t/) [figure]

  31. Spectral Centroid (/k/) [figure]

  32. Spectral Centroid (/s/) [figure]

  33. “Query-by-beat-boxing: Music Retrieval for the DJ” • Kapur, Benning, Tzanetakis (ISMIR 2004) • Identify drum sound being made • Induce tempo of beat • Match the beat-boxed input to a drum loop stored in a sample bank

  34. “Query-by-beat-boxing: Music Retrieval for the DJ” • Pre-processed targets (drum loops created in Reason) • Used ZCR, spectral centroid, spectral rolloff, and LPC as features in an ANN • Experimented with features to determine the most reliable feature set

  35. “Query-by-beat-boxing: Music Retrieval for the DJ” • Bionic BeatBoxing Voice Processor • User provides 4 examples for each class of drum • User beat-boxes along to a click track • The input beat is segmented, and each sound is classified by an ANN using ZCR. • Can play back, or use as input to MuseScape

  36. “Query-by-beat-boxing: Music Retrieval for the DJ” • MuseScape • User enters tempo/style (e.g., Dub, RnB, House) • Can use the analyzed beat-boxed loop

  37. “A drum pattern retrieval method by voice percussion” • Nakano, Ogata, Goto, Hiraga (ISMIR 2004) • Use “onomatopoeia” to make monophonic bass/snare patterns • IOIs (inter-onset intervals) compared to stored drum sequences (all 4/4, 1 measure); a sketch of such a comparison follows below • Allows different consonants and vowels to be used to make drum sounds
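
A rough sketch of IOI-based matching, not Nakano et al.'s actual distance measure: normalize each pattern's inter-onset intervals and pick the nearest stored pattern.

```python
import numpy as np

def ioi_vector(onsets):
    """Inter-onset intervals, normalized so a pattern's total span is 1."""
    iois = np.diff(np.asarray(onsets, dtype=float))
    return iois / iois.sum()

def best_match(query_onsets, stored_patterns):
    """Return the name of the stored pattern whose normalized IOI vector is
    closest to the query's (only patterns with the same onset count are
    comparable in this simplistic scheme)."""
    q = ioi_vector(query_onsets)
    candidates = [(name, ioi_vector(p)) for name, p in stored_patterns
                  if len(p) == len(query_onsets)]
    return min(candidates, key=lambda item: np.linalg.norm(q - item[1]))[0]
```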

  38. “A drum pattern retrieval method by voice percussion” • Typical onomatopoeic expressions of drum sounds stored in a pronunciation dictionary (e.g., Don, Ton, Zu) • Onomatopoeic expression mapped to drum sound • Use MFCCs as the analysis feature (see the example below)
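
For illustration, MFCCs can be computed with an off-the-shelf library such as librosa; the file name and coefficient count below are illustrative, not the paper's settings.

```python
import librosa

# Load a short voice-percussion recording (hypothetical file) and take
# 13 MFCCs per frame; the result has shape (13, n_frames).
y, sr = librosa.load("voice_percussion.wav", sr=None)
mfccs = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
```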

  39. “Towards Automatic Transcription of Expressive Oral Percussive Performances” • Hazan • Goal: to create a symbolic representation of voice percussion that includes expressive features • Used 28 features (10 temporal, 18 spectral) • Tree induction and lazy learning (k-NN) were tested for accuracy.

  40. “Classification of Unvoiced Plosives and Fricatives for Control of Percussion” • Sought to distinguish between /p/, /t/, /k/, and /s/ sounds • Used 5 features and a minimum-distance criterion to classify (a sketch follows below) • Implemented in Matlab
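
A minimum-distance classifier is easy to sketch, shown here in Python/NumPy rather than the project's Matlab; the per-class mean templates are a generic version of the scheme, and feature extraction is assumed to happen elsewhere.

```python
import numpy as np

def train_templates(examples):
    """examples: list of (label, feature_vector) pairs.
    Template for each class = mean feature vector of its training examples."""
    labels = {lab for lab, _ in examples}
    return {lab: np.mean([f for l, f in examples if l == lab], axis=0)
            for lab in labels}

def classify(feature_vec, templates):
    """Minimum-distance criterion: pick the class with the nearest template."""
    return min(templates, key=lambda c: np.linalg.norm(feature_vec - templates[c]))
```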

  41.-43. “Classification of Unvoiced Plosives and Fricatives for Control of Percussion” [results figures not included in transcript]

  44. References • A. Kapur, M. Benning, and G. Tzanetakis, “Query-by-beat-boxing: music retrieval for the DJ”, Proc. Int. Conf. on Music Information Retrieval (ISMIR), Barcelona, Spain, 2004. • T. Nakano, J. Ogata, M. Goto, and Y. Hiraga, “A drum pattern retrieval method by voice percussion”, Proc. Int. Conf. on Music Information Retrieval (ISMIR), Barcelona, Spain, 2004. • A. Hazan, “Towards automatic transcription of expressive oral percussive performances”, Proc. of the 10th Int. Conf. on Intelligent User Interfaces (IUI), San Diego, USA, 2005. • F. Gouyon, F. Pachet, and O. Delerue, “On the use of zero-crossing rate for an application of classification of percussive sounds”, Proc. of the COST G-6 Conf. on Digital Audio Effects (DAFx-00), Verona, Italy, 2000. • A. Klapuri, “Sound onset detection by applying psychoacoustic knowledge”, Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 1999.
