
Musical Instrument Identification based on F0-dependent Multivariate Normal Distribution

Tetsuro Kitahara*, Masataka Goto**, Hiroshi G. Okuno*. *Grad. Sch’l of Informatics, Kyoto Univ. **PRESTO JST / Nat’l Inst. Adv. Ind. Sci. & Tech. ICASSP’03 (6-10th Apr. 2003 in Hong Kong).


Presentation Transcript


  1. Musical Instrument Identification based on F0-dependent Multivariate Normal Distribution. Tetsuro Kitahara*, Masataka Goto**, Hiroshi G. Okuno*. *Grad. Sch’l of Informatics, Kyoto Univ. **PRESTO JST / Nat’l Inst. Adv. Ind. Sci. & Tech. ICASSP’03 (6-10th Apr. 2003 in Hong Kong)

  2. Today’s talk
     • What is musical instrument identification?
     • What is difficult in musical instrument identification? The pitch dependency of timbre
     • How is the pitch dependency coped with? Approximate it as a function of F0
     • Musical instrument identification using F0-dependent multivariate normal distribution
     • Experimental results
     • Conclusions

  3. 1. What is musical instrument identification?
     • It is the task of obtaining the names of musical instruments from sounds (acoustical signals).
     • It is useful for automatic music transcription, music information retrieval, etc.
     • Research on it began relatively recently (in the 1990s).
     [Figure: feature extraction (e.g. decay speed, spectral centroid) → evaluate p(X|w_piano), p(X|w_flute), ... → ŵ = argmax_w p(w|X) = argmax_w p(X|w) p(w) → output <inst>piano</inst>]

  4. 2. What is difficult in musical instrument identification? The pitch dependency of timbre, e.g. low-pitch piano sound = slow decay; high-pitch piano sound = fast decay. [Figure: waveforms over time [s] for (a) Pitch = C2 (65.5 Hz) and (b) Pitch = C6 (1048 Hz)]

  5. 3. How is the pitch dependency coped with? Most previous studies have not dealt with the pitch dependency. Examples:
     • [Martin99] used hierarchical classification.
     • [Brown99] used cepstral coefficients.
     • [Eronen00] used both techniques.
     • [Kashino98] developed a system for computational music scene analysis.
     • [Kashino00] introduced template adaptation and musical contexts.

  6. 3. How is the pitch dependency coped with? Proposal: Approximate the pitch dependency of each feature as a function of fundamental frequency (F0).

  7. 3. How is the pitch dependency coped with? An F0-dependent multivariate normal distribution has the following two parameters:
     • F0-dependent mean function, which captures the pitch dependency (i.e. the position of the distribution at each F0)
     • F0-normalized covariance, which captures the non-pitch dependency

  8. 4. Musical instrument identification using F0-dependent multivariate normal distribution. 1st step: Feature extraction. 129 features, defined with reference to the literature, are extracted, e.g. spectral centroid (which captures the brightness of tones). [Figure: spectral centroid for Piano and for Flute]
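As a rough illustration of why the spectral centroid tracks brightness, the sketch below computes it as the magnitude-weighted mean frequency of a spectrum. The exact definition used in the talk is not given on the slide, so this formula, the synthetic tones, and the helper name `spectral_centroid` are assumptions for illustration only.

```python
import numpy as np

def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency of the spectrum, in Hz (assumed definition)."""
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(np.sum(freqs * magnitudes) / np.sum(magnitudes))

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate  # one second of samples

# Two synthetic tones on the same 220 Hz fundamental (not real instrument data).
dull = np.sin(2 * np.pi * 220 * t)
bright = dull + 0.8 * np.sin(2 * np.pi * 880 * t) + 0.6 * np.sin(2 * np.pi * 1760 * t)

centroid_dull = spectral_centroid(dull, sample_rate)
centroid_bright = spectral_centroid(bright, sample_rate)
```

Adding energy in the upper partials pulls the centroid upward, which is the sense in which the feature "captures brightness".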

  9. 4. Musical instrument identification using F0-dependent multivariate normal distribution. 1st step: Feature extraction. 129 features, defined with reference to the literature, are extracted, e.g. decay speed of power. [Figure: Flute = not decayed; Piano = decayed]

  10. 4. Musical instrument identification using F0-dependent multivariate normal distribution. 2nd step: Dimensionality reduction. First, the 129-dimensional feature space is transformed to a 79-dimensional space by PCA (principal component analysis) with a cumulative proportion value of 99%. Second, the 79-dimensional feature space is transformed to an 18-dimensional space by LDA (linear discriminant analysis).
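The PCA half of this step can be sketched as follows: keep the fewest principal components whose cumulative proportion of variance reaches 99%. This is a minimal sketch on synthetic data, not the talk's implementation; `pca_reduce` is a hypothetical helper, and the subsequent LDA projection is not shown.

```python
import numpy as np

def pca_reduce(features, proportion=0.99):
    """Project onto the fewest principal components whose cumulative
    proportion of variance reaches `proportion`."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal axes; squared singular values are
    # proportional to the variance along each axis.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    variances = singular_values ** 2
    cumulative = np.cumsum(variances) / np.sum(variances)
    n_components = int(np.searchsorted(cumulative, proportion) + 1)
    return centered @ vt[:n_components].T, n_components

rng = np.random.default_rng(0)
# Toy data: 200 samples of 10 features driven by 3 latent factors plus small noise.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 10))
features = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

reduced, n_components = pca_reduce(features)
```

On the toy data at most 3 components survive, in the same way that 129 features collapse to 79 dimensions at the 99% threshold in the talk.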

  11. 4. Musical instrument identification using F0-dependent multivariate normal distribution. 3rd step: Parameter estimation. First, the F0-dependent mean function is approximated as a cubic polynomial.
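A cubic mean function for a single feature could be fitted as in the sketch below. The slide does not state the fitting method or the F0 scale, so the least-squares fit, the normalized pitch axis, and the synthetic data are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training tones of one instrument: F0 per tone (here a
# normalized pitch value in [0, 1]) and one feature value per tone.
f0 = np.linspace(0.0, 1.0, 50)
true_coeffs = [2.0, -1.0, 0.5, 0.3]  # highest degree first
feature = np.polyval(true_coeffs, f0) + 0.01 * rng.normal(size=f0.size)

# Least-squares cubic fit: mean(f) = a3*f^3 + a2*f^2 + a1*f + a0
coeffs = np.polyfit(f0, feature, deg=3)

def mean_function(f):
    """Evaluate the fitted F0-dependent mean at pitch f."""
    return np.polyval(coeffs, f)
```

In a multivariate setting one such polynomial would be fitted per feature dimension and per instrument.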

  12. 4. Musical instrument identification using F0-dependent multivariate normal distribution. 3rd step: Parameter estimation. Second, the F0-normalized covariance is obtained from the features after subtracting the F0-dependent mean from each, which eliminates the pitch dependency.
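The effect of subtracting the F0-dependent mean can be seen in a small synthetic example: the covariance of the residuals is much tighter than the plain covariance, because the pitch-driven trend no longer inflates it. The data, the normalized pitch axis, and the division by the sample count are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
n_tones, n_dims = 300, 2
f0 = rng.uniform(0.0, 1.0, size=n_tones)  # hypothetical normalized pitch

# Synthetic features: a pitch-dependent trend plus pitch-independent scatter.
trend = np.column_stack([3.0 * f0 ** 3, -2.0 * f0])
features = trend + rng.normal(scale=0.1, size=(n_tones, n_dims))

# Fit one cubic mean function per feature dimension.
coeffs = [np.polyfit(f0, features[:, d], deg=3) for d in range(n_dims)]
means = np.column_stack([np.polyval(c, f0) for c in coeffs])

# Subtract the F0-dependent mean, then take the covariance of the residuals.
residuals = features - means
f0_normalized_cov = residuals.T @ residuals / n_tones

# For comparison: covariance that ignores pitch altogether.
plain_cov = np.cov(features, rowvar=False)
```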

  13. 4. Musical instrument identification using F0-dependent multivariate normal distribution. Final step: Using the Bayes decision rule. The instrument ŵ satisfying ŵ = argmax_w [log p(X|w; f) + log p(w; f)] is determined as the result. p(X|w; f) is the probability density function of the F0-dependent multivariate normal distribution, defined using the F0-dependent mean function and the F0-normalized covariance.
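The decision rule can be sketched for a toy two-instrument model as below. All numbers are invented for illustration, and the prior is taken as independent of F0 for simplicity, whereas the slide writes it as p(w; f); `models` and `identify` are hypothetical names.

```python
import numpy as np

def log_gaussian(x, mean, cov):
    """Log density of a multivariate normal distribution."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    mahalanobis = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + mahalanobis)

# Hypothetical model in a 2-D feature space. Each instrument has cubic
# mean-function coefficients per dimension (highest degree first), an
# F0-normalized covariance, and a prior (assumed F0-independent here).
models = {
    "piano": {"coeffs": [[0, 0, 1.0, 0.0], [0, 0, -1.0, 1.0]],
              "cov": np.eye(2) * 0.05, "prior": 0.5},
    "flute": {"coeffs": [[0, 0, 0.2, 1.0], [0, 0, 0.5, 0.0]],
              "cov": np.eye(2) * 0.05, "prior": 0.5},
}

def identify(x, f0):
    """Return the instrument maximizing log p(X|w; f) + log p(w)."""
    scores = {}
    for name, m in models.items():
        mean = np.array([np.polyval(c, f0) for c in m["coeffs"]])
        scores[name] = log_gaussian(x, mean, m["cov"]) + np.log(m["prior"])
    return max(scores, key=scores.get)
```

The mean is re-evaluated at the F0 of the input tone before scoring, so the same feature vector can be classified differently at different pitches.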

  14. 5. Experiments (Conditions)
      • Database: a subset of RWC-MDB-I-2001
        - Consists of solo tones of 19 real instruments over their full pitch ranges.
        - Contains 3 individuals and 3 intensities for each instrument.
        - Contains normal articulation only.
        - The total number of sounds is 6,247.
      • 10-fold cross validation is used.
      • Performance is evaluated both at the individual-instrument level and at the category level.

  15. 5. Experiments (Results)
      • Recognition rates: 79.73% (at individual level), 90.65% (at category level)
      • Improvement: 4.00% (at individual level), 2.45% (at category level)
      • Error reduction (relative): 16.48% (at individual level), 20.67% (at category level)
      [Figure: results at category level (8 classes) and individual level (19 classes)]
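The relative error reduction follows from the recognition rates: it is the share of the baseline error that the method removes. A small sketch for the individual-instrument level, using only the figures on this slide (`relative_error_reduction` is a hypothetical helper name):

```python
def relative_error_reduction(rate_before, rate_after):
    """Share of the baseline error rate that was removed, in percent."""
    error_before = 100.0 - rate_before
    error_after = 100.0 - rate_after
    return 100.0 * (error_before - error_after) / error_before

# Individual-instrument level: 79.73% after a 4.00-point improvement,
# so the baseline rate is 75.73% and the baseline error is 24.27%.
rate_after = 79.73
rate_before = rate_after - 4.00
reduction = relative_error_reduction(rate_before, rate_after)
print(f"{reduction:.2f}%")  # prints 16.48%
```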

  16. 5. Experiments (Results) The recognition rates of the following 6 instruments were improved by more than 7%. Piano: the most improved (74.21% → 83.27%), because the piano has a wide pitch range.

  17. 6. Conclusions
      • To cope with the pitch dependency of timbre in musical instrument identification, an F0-dependent multivariate normal distribution is proposed.
      • Experimental results: recognition rate 75.73% → 79.73% (using 6,247 solo tones of 19 instruments)
      • Future work:
        - Evaluation against mixtures of sounds
        - Development of application systems using the proposed method

  18. Recognition rates at category level
      [Figure: error reduction by category: 35%, 8%, 23%, 33%, 20%, 13%, 15%, 8%]
      • Recognition rates for all categories were improved.
      • Recognition rates for Piano, Guitar, Strings: 96.7%

  19. Bayes vs k-NN
      [Figure: compared methods: Bayes (18 dim; PCA+LDA) (adopted); Bayes (79 dim; PCA only); Bayes (18 dim; PCA only); 3-NN (18 dim; PCA+LDA); 3-NN (79 dim; PCA only); 3-NN (18 dim; PCA only)]
      • PCA+LDA+Bayes achieved the best performance.
      • 18 dimensions are better than 79 dimensions: the amount of training data is not enough for 79 dimensions.
      • The use of LDA improved the performance: LDA considers separation between classes.

  20. Bayes vs k-NN
      [Figure: compared methods: Bayes (18 dim; PCA+LDA) (adopted); Bayes (79 dim; PCA only); Bayes (18 dim; PCA only); 3-NN (18 dim; PCA+LDA); 3-NN (79 dim; PCA only); 3-NN (18 dim; PCA only)]
      Jain’s guideline (1982): Having 5 to 10 times as many training data as the number of dimensions seems to be a good practice.
      • PCA+LDA+Bayes achieved the best performance.
      • 18 dimensions are better than 79 dimensions: the amount of training data is not enough for 79 dimensions.
      • The use of LDA improved the performance: LDA considers separation between classes.
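A back-of-the-envelope check of Jain’s guideline against the experimental conditions is sketched below. It assumes the guideline is applied per instrument class and that the 6,247 tones are spread evenly over the 19 instruments; neither assumption is stated on the slides, so the numbers are indicative only.

```python
total_sounds = 6247
n_instruments = 19
train_fraction = 0.9  # 10-fold cross validation trains on 9 of 10 folds

# Assumption: tones are spread evenly across instruments.
train_per_class = total_sounds * train_fraction / n_instruments

def meets_guideline(n_dims, factor):
    """True if there are at least `factor` training tones per dimension."""
    return train_per_class >= factor * n_dims

for n_dims in (18, 79):
    print(n_dims, meets_guideline(n_dims, 5), meets_guideline(n_dims, 10))
```

Under these assumptions there are roughly 296 training tones per instrument, which satisfies even the 10x rule for 18 dimensions but falls short of the 5x rule for 79 dimensions.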

  21. Relationship between training data and dimension
      [Figure: PCA dimensions (proportion value): 14 dim. (85%), 18 dim. (88%), 20 dim. (89%), 23 dim. (90%), 32 dim. (93%), 41 dim. (95%), 52 dim. (97%), 79 dim. (99%)]
      • Hughes’s peaking phenomenon: at 23 dimensions, the performance peaked.
      • All results without LDA are worse than those with LDA.
