
  1. Automatic Description and Classification of Instrumental Sounds (Description et Classification automatique des sons instrumentaux) Geoffroy Peeters, Ircam (Analysis/Synthesis Team), peeters@ircam.fr

  2. 1. Introduction • Musical Instrument Sound Classification • numerous studies on sound classification • few of them address the generalization of sound sources (recognition of the same source possibly recorded in different conditions, with various instrument manufacturers and players) • Evaluation of system performance • training on a subset of the database, evaluation on the rest of the database • does not prove applicability to the classification of sounds that do not belong to the database • Martin [1999]: 76% (family), 39% for 14 instruments • Eronen [2001]: 77% (family), 35% for 16 instruments • Goal of this study • study classification on a large database • How? A new classification system: • extract a large set of features • a new feature selection algorithm • compare flat and hierarchical gaussian classifiers

  3. Feature extraction → Feature selection → Feature transform → Classification → Evaluation (confusion matrix, which features, class organization)

  4. 2. Feature extraction • Features for sound recognition come from: • the speech recognition community, previous studies on musical instrument sound classification, and results of psycho-acoustical studies • each feature set is supposed to perform well for a specific task • Principle: • 1) extract a large set of features • 2) filter the feature set a posteriori with a Feature Selection Algorithm

  5. 2. Feature extraction: audio feature taxonomy • Global descriptors • Instantaneous descriptors • Temporal modeling: mean, variance, modulation (pitch, energy)

  6. 2. Feature extraction: audio feature taxonomy • DT: temporal descriptors • DE: energy descriptors • DS: spectral descriptors • DH: harmonic descriptors • DP: perceptual descriptors

  7. 2. Feature extraction: DT/DE temporal/energy descriptors (computed from the sound's envelope and energy) • DT.log-attack time • DT.temporal increase • DT.temporal decrease • DT.temporal centroid • DT.effective duration • DT.zero-crossing rate • DT.auto-correlation • DE.total energy • DE.energy of harmonic part • DE.energy of noise part
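
As an illustration, here is a minimal numpy sketch of two of these temporal descriptors, the log-attack time and the temporal centroid, computed from an RMS energy envelope. The window length and the 20%/90% attack thresholds are illustrative assumptions, not the exact definitions used in the study.

```python
import numpy as np

def energy_envelope(x, sr, win=0.02):
    """RMS energy envelope over non-overlapping `win`-second frames."""
    hop = int(win * sr)
    frames = range(0, len(x) - hop + 1, hop)
    env = np.array([np.sqrt(np.mean(x[i:i + hop] ** 2)) for i in frames])
    return env, hop / sr  # envelope and its time step

def log_attack_time(env, dt, lo=0.2, hi=0.9):
    """log10 of the time the envelope takes to rise from lo*max to hi*max."""
    peak = env.max()
    t_lo = np.argmax(env >= lo * peak) * dt  # first frame above lo*max
    t_hi = np.argmax(env >= hi * peak) * dt  # first frame above hi*max
    return np.log10(max(t_hi - t_lo, dt))    # floor at one frame

def temporal_centroid(env, dt):
    """Energy-weighted mean time of the envelope."""
    t = np.arange(len(env)) * dt
    return float(np.sum(t * env) / np.sum(env))
```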

  8. 2. Feature extraction: DS spectral descriptors (computed on the windowed FFT of the sound) • DS.centroid, DS.spread, DS.skewness, DS.kurtosis • DS.slope, DS.decrease, DS.roll-off • DS.variation
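
A sketch of the first four spectral-moment descriptors on one windowed FFT frame. Treating the normalized magnitude spectrum as a distribution over frequency is the standard construction; the exact amplitude scale used in the study may differ (see slide 11).

```python
import numpy as np

def spectral_moments(x, sr, n_fft=2048):
    """Centroid, spread, skewness, kurtosis of one magnitude spectrum frame."""
    mag = np.abs(np.fft.rfft(x * np.hanning(len(x)), n_fft))
    freq = np.fft.rfftfreq(n_fft, 1.0 / sr)
    p = mag / (mag.sum() + 1e-12)           # normalized as a distribution
    centroid = np.sum(freq * p)
    spread = np.sqrt(np.sum((freq - centroid) ** 2 * p))
    skew = np.sum((freq - centroid) ** 3 * p) / spread ** 3
    kurt = np.sum((freq - centroid) ** 4 * p) / spread ** 4
    return centroid, spread, skew, kurt
```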

  9. 2. Feature extraction: DH harmonic descriptors (computed on a sinusoidal model of the windowed FFT) • DH.Centroid, DH.Spread, DH.Skewness, DH.Kurtosis • DH.Slope, DH.Decrease, DH.Roll-off • DH.Variation • DH.Fundamental frequency • DH.Noisiness, DH.OddEvenRatio, DH.Inharmonicity • DH.Tristimulus • DH.Deviation
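
Assuming the partial frequencies f and amplitudes a have already been extracted by the sinusoidal model, two of these harmonic descriptors might look like the sketches below; these follow the usual textbook definitions, and the exact normalizations in the study may differ.

```python
import numpy as np

def odd_even_ratio(a):
    """Ratio of odd-harmonic to even-harmonic energy (a[0] = fundamental)."""
    odd = np.sum(a[0::2] ** 2)              # harmonics 1, 3, 5, ...
    even = np.sum(a[1::2] ** 2) + 1e-12     # harmonics 2, 4, 6, ...
    return odd / even

def inharmonicity(f, a, f0):
    """Energy-weighted deviation of partial frequencies from k*f0."""
    k = np.arange(1, len(f) + 1)
    dev = np.abs(f - k * f0) / (k * f0)
    return np.sum(dev * a ** 2) / np.sum(a ** 2)
```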

  10. 2. Feature extraction: DP perceptual descriptors / DV various descriptors (computed on the windowed FFT after perceptual models: mid-ear filtering, Bark scale, Mel scale) • DP.Centroid, DP.Spread, DP.Skewness, DP.Kurtosis • DP.Slope, DP.Decrease, DP.Roll-off • DP.Variation • DP.Loudness, DP.RelativeSpecificLoudness • DP.Sharpness, DP.Spread • DP.Roughness, DP.FluctuationStrength • DV.MFCC, DV.Delta-MFCC, DV.Delta-Delta-MFCC • DV.SpectralFlatness, DV.SpectralCrest

  11. 2. Feature extraction: audio feature design • No consensus on the choice of amplitude and frequency scale • All features are computed using the following scales: • frequency scale: linear / log / Bark bands • amplitude scale: linear / power / log • note: log(0) = -infinity, so amplitudes are normalized to 24-bit resolution • Features must be independent of the recording level • normalization in linear and power scale • normalization in logarithmic scale • Features must be independent of the sampling rate • maximum frequency taken into account: 11025/2 Hz • resampling (for zero-crossing rate, auto-correlation)
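
A small sketch of the amplitude-scale handling described above: the spectrum is normalized by its maximum (recording-level independence) and floored at 24-bit resolution before taking the log, so log(0) never occurs. The dB convention here is my assumption.

```python
import numpy as np

EPS = 2.0 ** -24  # 24-bit floor, so the log of a zero bin stays finite

def to_log_amplitude(mag):
    """Level-independent log amplitude: normalize by the maximum,
    floor at 24-bit resolution, then convert to dB."""
    return 20.0 * np.log10(np.maximum(mag / mag.max(), EPS))
```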

  12. Feature extraction → Feature selection → Feature transform → Classification → Evaluation (confusion matrix, which features, class organization)

  13. 3. Feature selection algorithm (FSA) • Problem when using a large number of features: • some features can be irrelevant for the given task • overfitting of the model to the training set (especially with LDA) • classification models become difficult for humans to interpret • Goal of the feature selection algorithm: find the minimal set of features satisfying • criterion 1) informative features with respect to the classes • criterion 2) features that provide non-redundant information • Forms of feature selection algorithms • embedded: the FSA is part of the classifier • filter: the FSA is distinct from the classifier and applied before it • wrapper: the FSA makes use of the classification results

  14. 3. Feature selection algorithm: IRMFSP (Inertia Ratio Maximization using Feature Space Projection) • Criterion 1: informative features with respect to the classes • principle: "feature values for sounds belonging to a specific class should be separated from the values for all the other classes" • measure: for a specific feature i, the ratio r_i = B_i / T_i of the between-class inertia B_i = Σ_k N_k (μ_{k,i} − μ_i)² to the total inertia T_i = Σ_n (x_{n,i} − μ_i)² • Criterion 2: features that provide non-redundant information • apply an orthogonalization of the feature space after the selection of each new feature (Gram-Schmidt orthogonalization)
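
A minimal numpy sketch of the IRMFSP loop as described above: at each step, pick the feature maximizing the inertia ratio B_i/T_i, then Gram-Schmidt-project the remaining features against the selected one, so that only non-redundant information can be selected next. Details such as the stopping rule and exact inertia weighting follow the usual presentation and may differ from the paper.

```python
import numpy as np

def irmfsp(X, y, n_select):
    """IRMFSP sketch: X is (n_sounds, n_features), y the class labels."""
    Xw = X.astype(float).copy()
    classes = np.unique(y)
    selected = []
    for _ in range(n_select):
        mu = Xw.mean(axis=0)
        total = np.sum((Xw - mu) ** 2, axis=0) + 1e-12       # T_i
        between = np.zeros(Xw.shape[1])                       # B_i
        for c in classes:
            idx = (y == c)
            between += idx.sum() * (Xw[idx].mean(axis=0) - mu) ** 2
        ratio = between / total
        ratio[selected] = -np.inf        # never re-pick a feature
        i = int(np.argmax(ratio))
        selected.append(i)
        v = Xw[:, i] / (np.linalg.norm(Xw[:, i]) + 1e-12)
        Xw -= np.outer(v, v @ Xw)        # project out the chosen direction
    return selected
```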

  15. 3. Feature selection algorithm: IRMFSP • Example: sustained / non-sustained sound separation • computation of the B/T ratio for each feature • feature with the weakest ratio (r = 6.9e-6): specific loudness m8 mean • feature with the highest ratio (r = 0.58): energy temporal decrease • first three selected dimensions: • 1st: temporal decrease • 2nd: spectral centroid • 3rd: temporal increase

  16. Feature extraction → Feature selection → Feature transform → Classification → Evaluation (confusion matrix, which features, class organization)

  17. 4. Feature transformation: LDA • Linear Discriminant Analysis • find linear combinations of features that maximize the discrimination between classes: F -> F' • total inertia T, between-class inertia B • transform the initial feature space F by a matrix U chosen to maximize the ratio of between-class inertia to total inertia in the transformed space • solution: the eigenvectors of T^-1 B, associated with the largest eigenvalues (the eigenvalues measure the discriminative power of each axis)
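
A compact sketch of the transform: build the total and between-class inertia matrices and take the leading eigenvectors of T^-1 B. The small ridge that keeps T invertible is my numerical addition.

```python
import numpy as np

def lda(X, y, n_dim):
    """LDA sketch: the discriminant axes are the eigenvectors of T^-1 B."""
    mu = X.mean(axis=0)
    Xc = X - mu
    T = Xc.T @ Xc / len(X)                       # total inertia
    T += 1e-9 * np.eye(T.shape[0])               # ridge (numerical safety)
    B = np.zeros_like(T)
    for c in np.unique(y):
        d = (X[y == c].mean(axis=0) - mu)[:, None]
        B += (y == c).mean() * (d @ d.T)         # between-class inertia
    evals, evecs = np.linalg.eig(np.linalg.solve(T, B))
    order = np.argsort(evals.real)[::-1]         # by discriminative power
    U = evecs[:, order[:n_dim]].real
    return X @ U, U
```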

  18. Feature extraction → Feature selection → Feature transform → Classification → Evaluation (confusion matrix, which features, class organization)

  19. 5. Class modeling: flat classifiers • Flat gaussian classifier (F-GC) • "flat" = all classes considered on the same level • training: model each class k by a multi-dimensional gaussian pdf (mean vector, covariance matrix) • evaluation: Bayes formula • Flat KNN classifier (F-KNN) • instance-based algorithm • assign to the input sound the majority class among its K nearest neighbors in the feature space • Euclidean distance => how to weight the axes? • apply it to the output of the LDA (implicit weighting of the axes)
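
A sketch of the F-GC using scipy: each class is a frozen multivariate Gaussian, and evaluation applies the Bayes formula with priors estimated from class frequencies. The covariance ridge is my addition for small classes.

```python
import numpy as np
from scipy.stats import multivariate_normal

class FlatGaussianClassifier:
    """F-GC sketch: one multivariate Gaussian per class, Bayes formula
    at evaluation time (priors estimated from class frequencies)."""

    def fit(self, X, y):
        self.classes = np.unique(y)
        self.models, self.priors = {}, {}
        for c in self.classes:
            Xc = X[y == c]
            cov = np.cov(Xc.T) + 1e-6 * np.eye(X.shape[1])  # ridge: my addition
            self.models[c] = multivariate_normal(Xc.mean(axis=0), cov)
            self.priors[c] = len(Xc) / len(X)
        return self

    def predict(self, X):
        # log posterior = log likelihood + log prior (evidence is constant)
        scores = np.stack([self.models[c].logpdf(X) + np.log(self.priors[c])
                           for c in self.classes], axis=-1)
        return self.classes[np.argmax(scores, axis=-1)]
```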

  20-21. 5. Class modeling: hierarchical classifiers • Hierarchical gaussian classifier (H-GC) • training: a tree of flat gaussian classifiers; each node has its own FSA, FTA and F-GC • tree construction is supervised (unlike a decision tree) • only the subset of sounds belonging to the classes of the current node is used • evaluation: the local probability decides which branch of the tree to follow • Advantages of H-GC • learning facilities: it is easier to learn differences within a small subset of classes • reduced class confusion: benefits from the higher recognition rate at the upper levels of the tree • Hierarchical KNN classifier (H-KNN) • Decision trees: • Binary Entropy Reduction Tree (BERT) • C4.5 • Partial Decision Tree (PART)
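
A recursive sketch of the H-GC routing described above, reusing the FlatGaussianClassifier from the previous sketch. The per-node FSA/LDA stages are omitted for brevity, and the taxonomy encoding (a fine-label-to-coarse-class map per node) is my own convention, e.g. group_of = {'violin': 'sustained', 'piano': 'non-sustained', ...} at the root.

```python
import numpy as np

class HGCNode:
    """H-GC sketch. `group_of` maps each fine label to this node's coarse
    class (identity at the deepest level); `children` maps a coarse class
    to the HGCNode that refines it."""

    def __init__(self, group_of, children=None):
        self.group_of = group_of
        self.children = children or {}
        self.clf = FlatGaussianClassifier()   # from the sketch above

    def fit(self, X, y):
        coarse = np.array([self.group_of[label] for label in y])
        self.clf.fit(X, coarse)               # learn only this distinction
        for c, child in self.children.items():
            idx = coarse == c                 # each child sees its subset only
            child.fit(X[idx], y[idx])
        return self

    def predict_one(self, x):
        c = self.clf.predict(x[None, :])[0]   # local Bayes decision
        return self.children[c].predict_one(x) if c in self.children else c
```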

  22. Feature extraction → Feature selection → Feature transform → Classification → Evaluation (confusion matrix, which features, class organization)

  23. 6. Evaluation: taxonomy used • Three different levels • T1: sustained / non-sustained sounds • T2: instrument families • T3: instrument names

  24. 6. Evaluation: test set • 6 databases • Ircam Studio OnLine (1323 sounds, 16 instruments) • Iowa University database (816 sounds, 12 instruments) • McGill University database (585 sounds, 23 instruments) • Microsoft "Musical Instruments" CD-ROM (216 sounds, 20 instruments) • two commercial databases: Pro (532 sounds, 20 instruments) and Vi (691 sounds, 18 instruments) • total = 4163 sounds • notes: • 27 instruments have been considered • a large pitch range has been considered (4 octaves on average) • no muted or martelé/staccato sounds

  25. 6. Evaluation: evaluation process • 1) random 66%/33% partition of the database (50 sets) • 2) One to One (O2O) [Livshin 2003]: each database is used in turn to classify all other databases • 3) Leave One Database Out (LODO) [Livshin 2003]: all databases except one are used in turn to classify the remaining one (see the sketch below)
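
A sketch of the LODO protocol; the O2O protocol is the analogous pairwise loop over (train database, test database) pairs. The dict-of-databases layout and the `make_classifier` factory are assumptions for the example.

```python
import numpy as np

def lodo(databases, make_classifier):
    """Leave One Database Out: train on all databases but one, test on it.
    `databases` maps a name to an (X, y) pair."""
    scores = {}
    for held_out in databases:
        X_tr = np.vstack([X for n, (X, y) in databases.items() if n != held_out])
        y_tr = np.concatenate([y for n, (X, y) in databases.items() if n != held_out])
        X_te, y_te = databases[held_out]
        clf = make_classifier().fit(X_tr, y_tr)
        scores[held_out] = np.mean(clf.predict(X_te) == y_te)
    return scores  # mean over the held-out databases gives the LODO rate
```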

  26. 6. Evaluation: results O2O • [table of per-database-pair O2O recognition rates; not preserved in the transcript]

  27. 6. Evaluation: results O2O • O2O (mean value over the 30 (6*5) experiments) • Discussion • low recognition rate for O2O compared to the 66%/33% partition -> a generalization problem? • the system mainly learns the instrument instance instead of the instrument (each database contains a single instance of each instrument) • LODO (mean value over the 6 left-out databases) • goal: increase the number of instances of each instrument • how: by combining several databases

  28. Feature extraction → Feature selection → Feature transform → Classification → Evaluation (confusion matrix, which features, class organization)

  29. 6. Evaluation: confusion matrix • Low confusion between sustained / non-sustained sounds

  30. 6. Evaluation: confusion matrix • The largest confusions occur inside each instrument family

  31. 6. Evaluation: confusion matrix • The lowest recognition rates correspond to the smallest training sets

  32. 6. Evaluation: confusion matrix • Confusion between piano and guitar/harp

  33. 6. Evaluation: confusion matrix • Cross-family confusions

  34. 6. Evaluation: confusion matrix • Cross-family confusions: • Cornet -> Bassoon • Cornet -> English horn • Flute -> Clarinet • Oboe -> Flute • Trombone -> Flute

  35. Feature extraction → Feature selection → Feature transform → Classification → Evaluation (confusion matrix, which features, class organization)

  36. 6. Evaluation: main selected features • By the FSA (IRMFSP)

  37. 6. Evaluation: main selected features • By decision tree (C4.5)

  38. 6. Evaluation: main selected features • By decision tree with grouped decisions (PART)

  39. Feature extraction → Feature selection → Feature transform → Classification → Evaluation (confusion matrix, which features, class organization)

  40. 7. Instrument Class Similarity? • Goal: check that the proposed tree structure corresponds to a natural class organization (most people use Martin's hierarchy) • How? • 1) check the groupings among the decision tree leaves • 2) MDS on acoustic features? [Herrera, AES 114th] • compute the dissimilarity between each pair of classes: the between-group F-matrix between class models • observe the dissimilarity between the classes with an MDS (multi-dimensional scaling) analysis • MDS preserves as much as possible the distances between the data and allows representing them in a lower-dimensional space • usually MDS is used to represent dissimilarity judgements (timbre similarity); here it is used on acoustic features • MDS with Kruskal's STRESS formula 1 scaling method, 3-dimensional space
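
With sklearn, this MDS step could be sketched as below; `D` is assumed to be the symmetric class-to-class dissimilarity matrix derived from the between-group F-matrix, and `metric=False` requests Kruskal-style non-metric STRESS scaling.

```python
from sklearn.manifold import MDS

def embed_classes(D, n_dim=3):
    """Non-metric (Kruskal) MDS of a precomputed class dissimilarity
    matrix D into a 3-dimensional space, as on the slides."""
    mds = MDS(n_components=n_dim, dissimilarity='precomputed',
              metric=False, random_state=0)
    return mds.fit_transform(D)  # one 3-D point per instrument class
```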

  41-44. 7. Instrument Class Similarity • Clusters? • non-sustained sounds • bowed-string sounds • brass sounds (TRPU?) • a mix between single/double-reed and brass instruments

  45-47. 7. Instrument Class Similarity • Dimension 1: separates sustained from non-sustained sounds • negative values: PIAN, GUI, HARP, VLNP, VLAP, CELLP, DBLP • -> attack time, decrease time • Dimension 2: brightness • dark sounds: TUBB, BSN, TBTB, FHOR • bright sounds: PICC, CLA, FLUT • problem: DBL? • Dimension 3: ? • separation of bowed strings (VLN, VLA, CELL, DBL) • amount of modulation?

  48. Conclusion ?

  49. Conclusion • State of the art • Martin [1999]: 76% (family), 39% for 14 instruments • Eronen [2001]: 77% (family), 35% for 16 instruments • This study • 85% (family), 64% for 23 instruments • the increased recognition rates are mainly explained by the use of new features • Perspectives • derive the tree structure automatically (analysis of decision trees?) • test other classification algorithms (GMM, SVM, ...) • test the system on other sound classes (non-instrumental sounds, sound FX) • extend the system to musical phrases • extend the system to polyphonic sounds • extend the system to multi-source sounds • Links: http://www.cuidado.mu http://www.cs.waikato.ac.nz/ml/weka/
