
Music Information Retrieval System based on Cascade Classifiers


Presentation Transcript


  1. www.kdd.uncc.edu. Music Information Retrieval System based on Cascade Classifiers. CCI, UNC-Charlotte. http://www.mir.uncc.edu. Presented by Zbigniew W. Ras. Research sponsored by NSF IIS-0414815 and IIS-0968647.

  2. Collaborators: Alicja Wieczorkowska (Polish-Japanese Institute of IT, Warsaw, Poland), Krzysztof Marasek (Polish-Japanese Institute of IT, Warsaw, Poland). PhD students supported by the two NSF grants: Elzbieta Kubera (Maria Curie-Sklodowska University, Lublin, Poland), Rory Lewis (University of Colorado at Colorado Springs, USA), Wenxin Jiang (Fred Hutchinson Cancer Research Center, Seattle, USA), Xin Zhang (University of North Carolina, Pembroke, USA), Jacek Grekow (Bialystok University of Technology, Poland), Amanda Cohen-Mostafavi (InfoBelt LCC, Charlotte, USA).

  3. MIRAI - Musical Database (mostly MUMS) [music pieces played by 57 different music instruments] Goal: Design and Implement a System for Automatic Indexing of Music by Instruments Outcome: Musical Database indexed by instruments.

  4. MIRAI - Musical Database [music pieces played by 57+ different music instruments (see below) and described by over 910 attributes] Alto Flute, Bach-trumpet, bass-clarinet, bassoon, bass-trombone, Bb trumpet, b-flat clarinet, cello, cello-bowed, cello-martele, cello-muted, cello-pizzicato, contrabassclarinet, contrabassoon, crotales, c-trumpet, ctrumpet-harmonStemOut, doublebass-bowed, doublebass-martele, doublebass-muted, doublebass-pizzicato, eflatclarinet, electric-bass, electric-guitar, englishhorn, flute, frenchhorn, frenchHorn-muted, glockenspiel, marimba-crescendo, marimba-singlestroke, oboe, piano-9ft, piano-hamburg, piccolo, piccolo-flutter, saxophone-soprano, saxophone-tenor, steeldrums, symphonic, tenor-trombone, tenor-trombone-muted, tuba, tubular-bells, vibraphone-bowed, vibraphone-hardmallet, viola-bowed, viola-martele, viola-muted, viola-natural, viola-pizzicato, violin-artificial, violin-bowed, violin-ensemble, violin-muted, violin-natural-harmonics, xylophone.

  5. Automatic Indexing of Polyphonic Music. What is needed and where is the problem? A database of monophonic and polyphonic music signals and their descriptions in terms of the standard MPEG-7 features and new features (including temporal ones). These signals are labeled by instruments, forming an additional feature called the decision feature. Why is it needed? To build classifiers for automatic indexing of musical sound by instruments.

  6. Automatic Indexing of Music

  7. MIRAI - Cooperative Music Information Retrieval System based on Automatic Indexing. [Architecture diagram: User → Query (Instruments, Durations, ...) → Query Adapter → Indexed Audio Database → Music Objects; an empty answer is routed back to the Query Adapter.]

  8. Feature extraction. [Diagram: signal data sampling → frame segmentation (0.12 s frame size, 0.04 s hop size) → lower-level raw data → feature extraction (MATLAB) → higher-level representations → feature database, which is manageable by traditional pattern recognition methods: classification, clustering, regression.]
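A minimal sketch of the frame segmentation step described above, written in Python rather than the MATLAB mentioned on the slide; the 44.1 kHz sampling rate and the helper name are illustrative assumptions, only the 0.12 s frame size and 0.04 s hop size come from the slide.

```python
import numpy as np

def segment_into_frames(signal, sr=44100, frame_sec=0.12, hop_sec=0.04):
    """Cut a 1-D audio signal into overlapping analysis frames.

    frame_sec and hop_sec follow the slide (0.12 s frames, 0.04 s hop);
    the sampling rate sr is an assumption, not stated on the slide.
    """
    frame_len = int(round(frame_sec * sr))
    hop_len = int(round(hop_sec * sr))
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frames.append(signal[start:start + frame_len])
    return np.array(frames)

# Example: one second of a 440 Hz sine wave
t = np.arange(0, 1.0, 1 / 44100)
frames = segment_into_frames(np.sin(2 * np.pi * 440 * t))
print(frames.shape)   # (number of frames, samples per frame)
```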

  9. MPEG-7 features. [Diagram of the extraction pipeline: Hamming window → FFT (NFFT points) → power spectrum / STFT → Spectral Centroid and Spectral Spread; signal envelope → Log Attack Time and Temporal Centroid; STFT → harmonic peaks detection → fundamental frequency → Instantaneous Harmonic Spectral Centroid, Spread, Deviation, and Variation.]
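A simplified sketch of the centroid/spread branch of that pipeline (Hamming window → FFT → power spectrum → descriptor); it is not the MPEG-7 reference computation, just an illustration of the idea.

```python
import numpy as np

def spectral_centroid_and_spread(frame, sr=44100):
    """Spectral centroid and spread of one analysis frame.

    Simplified version of the MPEG-7 style chain on the slide:
    Hamming window -> FFT -> power spectrum -> centroid / spread.
    """
    windowed = frame * np.hamming(len(frame))
    spectrum = np.abs(np.fft.rfft(windowed)) ** 2          # power spectrum
    freqs = np.fft.rfftfreq(len(windowed), d=1.0 / sr)
    total = spectrum.sum()
    if total == 0:
        return 0.0, 0.0
    centroid = (freqs * spectrum).sum() / total
    spread = np.sqrt(((freqs - centroid) ** 2 * spectrum).sum() / total)
    return centroid, spread
```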

  10. Derived Database: MPEG-7 features, non-MPEG-7 features, and new temporal features.

  11. New Temporal Features – S'(i), C'(i), S''(i), C''(i). S'(i) = [S(i+1) – S(i)]/S(i); C'(i) = [C(i+1) – C(i)]/C(i), where S(i+1), S(i) and C(i+1), C(i) are the spectrum spread and spectrum centroid of two consecutive frames, frame i+1 and frame i. The changing ratios of spectrum spread and spectrum centroid over two consecutive frames are treated as the first derivatives of the spectrum spread and spectrum centroid. Following the same method we calculate the second derivatives: S''(i) = [S'(i+1) – S'(i)]/S'(i); C''(i) = [C'(i+1) – C'(i)]/C'(i). Remark: the sequence [S(i), S(i+1), S(i+2), …, S(i+k)] can be approximated by a polynomial p(x) = a0 + a1*x + a2*x^2 + a3*x^3 + …; the coefficients a0, a1, a2, a3, … become new features.
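A short sketch of both ideas on this slide, the relative first/second differences and the polynomial-coefficient features; the degree-3 fit and the function names are assumptions for illustration.

```python
import numpy as np

def derivative_features(values):
    """First and second relative differences of a per-frame feature,
    e.g. S'(i) = [S(i+1) - S(i)] / S(i), as defined on the slide."""
    values = np.asarray(values, dtype=float)
    first = np.diff(values) / values[:-1]
    second = np.diff(first) / first[:-1]
    return first, second

def polynomial_features(values, degree=3):
    """Approximate the sequence S(i), S(i+1), ..., S(i+k) by a polynomial;
    the coefficients a0, a1, a2, a3 are used as new features."""
    x = np.arange(len(values))
    coeffs = np.polyfit(x, values, degree)   # highest power first
    return coeffs[::-1]                      # reorder to a0, a1, a2, a3
```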

  12. Experiment with WEKA: 19 instruments [flute, piano, violin, saxophone, vibraphone, trumpet, marimba, french-horn, viola, bassoon, clarinet, cello, trombone, accordion, guitar, tuba, english-horn, oboe, double-bass]. J48 with a 0.25 confidence factor for tree pruning and a minimum of 10 instances per leaf; KNN with 3 neighbors and Euclidean distance as the similarity function. Classification confidence with temporal features.
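A rough scikit-learn analogue of that WEKA configuration, offered only as a sketch: J48 is C4.5 while scikit-learn's DecisionTreeClassifier is CART, so the 0.25 pruning confidence factor has no direct counterpart here.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# min_samples_leaf=10 mirrors "minimum number of instances per leaf - 10";
# the J48 pruning confidence factor of 0.25 has no exact CART equivalent.
tree = DecisionTreeClassifier(min_samples_leaf=10)

# KNN with 3 neighbors and Euclidean distance, as in the experiment.
knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
```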

  13. Confusion matrices: left is from Experiment 1, right is from Experiment 3. The correctly classified instances are highlighted in green and the incorrectly classified instances are highlighted in yellow

  14. [Figures: precision, recall, and F-score of the decision tree for each instrument.]
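For reference, a small sketch of how those per-instrument scores follow from a confusion matrix; it assumes the usual convention of rows as true classes and columns as predicted classes.

```python
import numpy as np

def per_class_scores(conf_matrix):
    """Per-instrument precision, recall and F-score from a confusion matrix
    (rows = true classes, columns = predicted classes)."""
    cm = np.asarray(conf_matrix, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)              # TP / (TP + FP)
    recall = tp / cm.sum(axis=1)                 # TP / (TP + FN)
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score
```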

  15. Polyphonic sounds – how to handle them? • Single-label classification based on sound separation • Multi-labeled classifiers • Training classifiers on polyphonic sounds? Problems? [Sound Separation Flowchart: polyphonic sound → segmentation → get frame → feature extraction → classifier → get instrument → sound separation; information is lost during the signal subtraction.]
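A hypothetical sketch of the separation-based loop in that flowchart; `classify` and `estimate_instrument_signal` are placeholder helpers, not functions from the MIRAI system, and the subtraction step is where the information loss mentioned on the slide occurs.

```python
def separate_and_classify(polyphonic_frame, classify, estimate_instrument_signal,
                          max_instruments=3):
    """Illustrative single-label-with-separation loop: classify the dominant
    instrument, subtract its estimated signal, and repeat on the residual."""
    residual = polyphonic_frame
    labels = []
    for _ in range(max_instruments):
        instrument = classify(residual)          # single-label classifier
        labels.append(instrument)
        # Signal subtraction: this step loses information, as the slide notes.
        residual = residual - estimate_instrument_signal(residual, instrument)
    return labels
```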

  16. Multi-label classifier [a collection of N classifiers, where N is the number of instruments]. [Diagram: a 1-second window is segmented into 22 frames of 0.12 s with a 0.04 s hop size; features extracted from each frame are passed to the N classifiers, each returning a confidence score per instrument (e.g. 85%, 80%, 70%, 55%, 45%, 16%, 12%, ...).]
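A sketch of the one-classifier-per-instrument scheme; logistic regression is only a stand-in for whatever base classifiers are actually used, and the class and method names are assumptions.

```python
from sklearn.linear_model import LogisticRegression

class MultiLabelInstrumentClassifier:
    """One binary classifier per instrument; each outputs a confidence score
    for the current frame, as in the N-classifier scheme on the slide."""

    def __init__(self, instruments):
        self.instruments = instruments
        self.models = {name: LogisticRegression(max_iter=1000) for name in instruments}

    def fit(self, features, labels):
        # labels: dict mapping instrument name -> 0/1 vector (present or not)
        for name, model in self.models.items():
            model.fit(features, labels[name])
        return self

    def confidences(self, frame_features):
        # Probability that each instrument is present in the given frame.
        return {name: float(model.predict_proba(frame_features.reshape(1, -1))[0, 1])
                for name, model in self.models.items()}
```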

  17. Schema I - Hornbostel-Sachs. [Tree diagram: top-level families Idiophone, Membranophone, Aerophone, Chordophone; aerophone subcategories such as Lip Vibration, Single Reed, Free, Side, Whip; example instruments include C Trumpet, Tuba, Bassoon, Flute, French Horn, Oboe, Alto Flute.]

  18. Schema II - Play Methods. [Tree diagram: play methods such as Blow, Bowed, Muted, Picked, Pizzicato, Shaken, ...; example instruments: Alto Flute, Flute, Piccolo, Bassoon, ...]

  19. Instrument granularity: classifiers are trained at each level of the Hornbostel/Sachs hierarchical tree. We do not include membranophones, because instruments in this family usually do not produce harmonic sounds and therefore need special techniques to be identified.
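A minimal sketch of how such level-by-level classifiers could be chained at prediction time; the two-stage structure and argument names are assumptions, and both classifiers are assumed to be already trained (e.g. scikit-learn estimators).

```python
def cascade_classify(frame_features, family_clf, instrument_clfs):
    """Two-stage cascade following the Hornbostel/Sachs hierarchy:
    first predict the instrument family, then use the classifier trained
    for that family to predict the specific instrument."""
    family = family_clf.predict([frame_features])[0]          # e.g. "aerophone"
    instrument = instrument_clfs[family].predict([frame_features])[0]
    return family, instrument
```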

  20. Modules of the cascade classifier for single instrument estimation – Hornbostel/Sachs, pitch 3B. [Diagram: cascade stage accuracies of 96.02% and 98.94% combine to 96.02% * 98.94% = 95.00%, which is higher than 91.80%.]

  21. HIERARCHICAL STRUCTURE BUILT BY CLUSTERING ANALYSIS. Seven common methods to calculate the distance or similarity between clusters: single linkage (nearest neighbor), complete linkage (furthest neighbor), unweighted pair-group method using arithmetic averages (UPGMA), weighted pair-group method using arithmetic averages (WPGMA), unweighted pair-group method using the centroid average (UPGMC), weighted pair-group method using the centroid average (WPGMC), and Ward's method. Six most common distance functions: Euclidean, Manhattan, Canberra (examines the sum of a series of fractional differences between coordinates of a pair of objects), Pearson correlation coefficient (PCC, measures the degree of association between objects), Spearman's rank correlation coefficient, and Kendall's tau (counts the number of pairwise disagreements between two lists). Clustering algorithm: HCLUST (agglomerative hierarchical clustering), from the R package.
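The slide refers to R's hclust; a rough Python equivalent with SciPy is sketched below using Ward linkage and a Pearson-based distance, with random data standing in for the real flatness-coefficient features. Strictly speaking Ward linkage assumes Euclidean distances, so this mirrors the slide's configuration rather than being mathematically exact.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
from scipy.spatial.distance import pdist

# feature_matrix: one row per instrument sound, columns = flatness coefficients
# (illustrative random data stands in for the real features)
rng = np.random.default_rng(0)
feature_matrix = rng.normal(size=(57, 20))

# Pearson-based distance (1 - correlation) between instrument feature vectors.
distances = pdist(feature_matrix, metric="correlation")

# Agglomerative hierarchical clustering with Ward linkage, analogous to hclust.
tree = linkage(distances, method="ward")
dendrogram(tree, no_plot=True)   # set no_plot=False to draw the hierarchy
```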

  22. Clustering result from the hclust algorithm with the Ward linkage method and the Pearson distance measure; flatness coefficients are used as the selected features. "ctrumpet" and "bachtrumpet" are clustered in the same group. "ctrumpet_harmonStemOut" forms its own group instead of merging with "ctrumpet". Bassoon is clustered as the sibling of the regular French horn. "French horn muted" is clustered in a different group, together with "English Horn" and "Oboe".

  23. Looking for the optimal [classification method, data representation] pair in polyphonic music. Testing data: 49 polyphonic sounds are created by selecting three different single-instrument sounds from the training database and mixing them together. KNN (k=3) is used as the classifier in each experiment.
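A small sketch of how such polyphonic test sounds could be produced from three single-instrument recordings; equal-gain mixing and truncation to the shortest sound are assumptions, as the slide does not specify the mixing procedure.

```python
import numpy as np

def mix_three_sounds(sound_a, sound_b, sound_c):
    """Create one polyphonic test sound by mixing three single-instrument
    sounds, roughly as described on the slide."""
    sounds = [np.asarray(s, dtype=float) for s in (sound_a, sound_b, sound_c)]
    length = min(len(s) for s in sounds)
    # Equal-gain mix, scaled to avoid clipping.
    return sum(s[:length] for s in sounds) / 3.0
```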

  24. WWW.MIR.UNCC.EDU • Automatic indexing system for musical instruments • Intelligent query answering system for music instruments.

  25. Questions?

  26. [Diagram: the user enters a query; the user is not satisfied and enters a new query – Action Rules System.]

  27. Action Rule. An action rule is defined as a term [(ω) ∧ (α→β)] → (ϕ→ψ), where ω is the conjunction of fixed condition features shared by both groups, (α→β) describes the proposed changes in values of flexible features, and (ϕ→ψ) is the desired effect of the action on the decision in the information system.
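A hypothetical worked instance of this schema (the attributes and values below are illustrative, not taken from the MIRAI system):

[(Genre = jazz) ∧ (Tempo, slow → fast)] → (Rating, low → high)

Here Genre is the fixed condition feature shared by both groups, Tempo is the flexible feature whose value the rule proposes to change, and the change in Rating is the desired effect of the action.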

  28. Action Rules Discovery. Meta-actions based decision system S(d) = (X, A ∪ {d}, V), with A = {A1, A2, …, Am}. Influence Matrix: if E32 = [a2 → a2'], then E31 = [a1 → a1'] and E34 = [a4 → a4']. Candidate action rule: r = [(A1, a1 → a1') ∧ (A2, a2 → a2') ∧ (A4, a4 → a4')] → (d, d1 → d1'). Rule r is supported & covered by M3.

  29. "Action Rules Discovery without pre-existing classification rules", Z.W. Ras, A. Dardzinska, Proceedings of RSCTC 2008 Conference, in Akron, Ohio, LNAI 5306, Springer, 2008, 181-190 http://www.cs.uncc.edu/~ras/Papers/Ras-Aga-AKRON.pdf ROOT

  30. Since the window diminishes the signal at both edges, it leads to information loss due to the narrowing of the frequency spectrum. In order to preserve this information, consecutive analysis frames overlap in time. Empirical experiments show that the best overlap is two thirds of the window size. [Figure: overlapping analysis frames along the time axis.]

  31. Windowing: a Hamming window is applied to each analysis frame to reduce spectral leakage.
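An illustrative sketch of the leakage effect: a tone that does not fall exactly on an FFT bin spills energy into neighbouring bins, and windowing the frame with a Hamming window suppresses most of that spill. The 440.5 Hz test tone, frame length, and the simple out-of-peak energy measure are assumptions chosen for the demonstration.

```python
import numpy as np

sr = 44100
frame = np.sin(2 * np.pi * 440.5 * np.arange(5292) / sr)   # one 0.12 s frame

rect_spectrum = np.abs(np.fft.rfft(frame))                         # no window
hamm_spectrum = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))  # Hamming

def leakage(spectrum):
    """Fraction of spectral magnitude outside the peak bin and its neighbours,
    used here as a rough proxy for spectral leakage."""
    peak = int(spectrum.argmax())
    keep = np.ones(len(spectrum), dtype=bool)
    keep[max(peak - 2, 0):peak + 3] = False
    return spectrum[keep].sum() / spectrum.sum()

print(leakage(rect_spectrum), leakage(hamm_spectrum))   # Hamming value is much smaller
```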
