Pitch-dependent Musical Instrument Identification and Its Application to Musical Sound Ontology

Tetsuro Kitahara*, Masataka Goto**, Hiroshi G. Okuno*
*Grad. Sch'l of Informatics, Kyoto Univ.  **PRESTO, JST / Nat'l Inst. Adv. Ind. Sci. & Tech.

IEA/AIE-2003 (24th June 2003, UK)





Today’s talk

  • Musical Instrument Identification

    • Difficulty: The pitch dependency of timbre

    • Solution: approximating it as a function of F0

    • Experiments

  • Musical Sound Ontology

    • A hierarchy of musical instrument sounds

    • Systematically constructed by C5.0



1. What is musical instrument identification?

  • To obtain the names of musical instruments from sounds (acoustical signals).

  • Useful for automatic music transcription, music information retrieval, etc.

[Diagram: identification pipeline]

Feature extraction (e.g. decay speed, spectral centroid) produces a feature vector X; the instrument is then chosen by the Bayes decision rule:

w = argmax p(w|X) = argmax p(X|w) p(w)

comparing, e.g., p(X|w_piano) and p(X|w_flute), and outputting a label such as <inst>piano</inst>.
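The decision rule above can be sketched in Python. Everything here is a toy stand-in: the single feature, the Gaussian parameters, and the priors are invented for illustration and are not the paper's actual models.

```python
import math

# Toy 1-D Gaussian likelihood models p(X|w) for two instruments.
# The (mean, std) values are invented for illustration only.
MODELS = {
    "piano": (0.2, 0.1),   # e.g. a fast-decay feature value
    "flute": (0.8, 0.1),   # e.g. a slow-decay feature value
}
PRIORS = {"piano": 0.5, "flute": 0.5}

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def identify(x):
    """Bayes decision: w = argmax_w p(x|w) p(w)."""
    return max(MODELS, key=lambda w: gaussian_pdf(x, *MODELS[w]) * PRIORS[w])

print(identify(0.25))  # a feature value near the piano model
```

In the real system X is a multi-dimensional feature vector and each p(X|w) is a multivariate normal distribution, as the later slides describe.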


2. What is the difficulty?

[Figure: waveforms at (a) pitch C2 (65.5 Hz) and (b) pitch C6 (1048 Hz), amplitude over 0-3 s]

The pitch dependency of timbre. E.g., low-pitch piano tones decay slowly; high-pitch piano tones decay fast.


2. What is the difficulty?

In previous studies, the pitch dependency was pointed out, but it had not been dealt with.



3. How is the pitch dependency coped with?

Our solution:

Approximate the pitch dependency of each feature as a function of fundamental frequency (F0)


i.e., modelling how each feature varies according to F0.



3. How is the pitch dependency coped with?

An F0-dependent multivariate normal distribution has the following two parameters:

F0-dependent mean function, which captures the pitch dependency (i.e. the position of the distribution at each F0)

F0-normalized covariance, which captures the non-pitch dependency



4. Musical instrument identification using F0-dependent multivariate normal distribution

The musical instrument identification method has the following four steps:

1st step: Feature extraction

2nd step: Dimensionality reduction

3rd step: Parameter estimation

Final step: Using the Bayes decision rule



4. Musical instrument identification using F0-dependent multivariate normal distribution: (1st) Feature extraction

129 features, defined by consulting the literature, are extracted.

(1) Spectral centroid (which captures brightness of tones)

[Figure: spectra of a piano tone and a flute tone, with their spectral centroids marked]
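As a sketch, the spectral centroid can be computed as the amplitude-weighted mean frequency of the magnitude spectrum; this particular formulation is an assumption for illustration, and the paper's exact definition may differ.

```python
import numpy as np

def spectral_centroid(signal, sample_rate):
    """Amplitude-weighted mean frequency of the magnitude spectrum.

    Higher values correspond to brighter tones, as in the flute
    example above.
    """
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return np.sum(freqs * spectrum) / np.sum(spectrum)

# A pure 440 Hz sine should have its centroid at about 440 Hz.
sr = 8000
t = np.arange(sr) / sr
print(spectral_centroid(np.sin(2 * np.pi * 440 * t), sr))
```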



4. Musical instrument identification using F0-dependent multivariate normal distribution: (1st) Feature extraction


(2) Decay speed of power

[Figure: power envelopes; the flute tone does not decay, the piano tone decays]
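Decay speed can be sketched as the slope of log-power over time. The frame size and the least-squares slope definition here are assumptions for illustration, not the paper's exact feature.

```python
import numpy as np

def decay_slope(signal, sample_rate, frame=1024):
    """Slope of log-power over time, fitted by least squares.

    A strongly negative slope means a fast decay (piano-like);
    a slope near zero means a sustained tone (flute-like).
    """
    n = len(signal) // frame
    power = np.array([np.mean(signal[i * frame:(i + 1) * frame] ** 2)
                      for i in range(n)])
    times = (np.arange(n) + 0.5) * frame / sample_rate
    slope, _ = np.polyfit(times, np.log(power + 1e-12), 1)
    return slope

sr = 8000
t = np.arange(2 * sr) / sr
print(decay_slope(np.exp(-3 * t) * np.sin(2 * np.pi * 440 * t), sr))  # decaying tone
print(decay_slope(np.sin(2 * np.pi * 440 * t), sr))                   # sustained tone
```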



4. Musical instrument identification using F0-dependent multivariate normal distribution: (2nd) Dimensionality reduction

The dimensionality of the feature space is reduced by the following two methods.

129-dimensional feature space
↓ PCA (principal component analysis, with a proportion value of 99%)
79-dimensional feature space
↓ LDA (linear discriminant analysis)
18-dimensional feature space
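The two-step reduction can be sketched with scikit-learn. The data here are random placeholders with the right shapes (129 features, 19 classes), not the actual feature set, and scikit-learn is a stand-in for whatever tooling the authors used.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Random placeholders shaped like the paper's data: 129-dimensional
# feature vectors, 19 instrument classes. Values are synthetic.
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 129))
y = rng.integers(0, 19, size=600)

# Step 1: PCA keeping 99% of the variance.
X_pca = PCA(n_components=0.99).fit_transform(X)

# Step 2: LDA projects onto at most n_classes - 1 = 18 dimensions.
X_lda = LinearDiscriminantAnalysis(n_components=18).fit_transform(X_pca, y)

print(X.shape, X_pca.shape, X_lda.shape)
```

Passing a float between 0 and 1 as `n_components` makes scikit-learn's PCA keep the smallest number of components whose cumulative explained variance reaches that proportion.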



4. Musical instrument identification using F0-dependent multivariate normal distribution: (3rd) Parameter estimation

First, the F0-dependent mean function is approximated as a cubic polynomial.



4. Musical instrument identification using F0-dependent multivariate normal distribution: (3rd) Parameter estimation

Second, the F0-normalized covariance is obtained by subtracting the F0-dependent mean from each feature.

eliminating the pitch dependency
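Both parameter-estimation steps can be sketched as follows, under assumptions not stated on the slides (e.g. fitting the cubic polynomial against log-F0 and pooling a single covariance over the residuals):

```python
import numpy as np

def fit_f0_dependent_model(f0s, features):
    """Fit an F0-dependent mean function and an F0-normalized covariance.

    f0s:      (n,) fundamental frequencies of the training notes
    features: (n, d) feature vectors
    Returns (d, 4) cubic-polynomial coefficients (one polynomial per
    feature dimension, in log-F0) and the (d, d) covariance of the
    residuals once the F0-dependent mean is subtracted.
    """
    log_f0 = np.log(f0s)
    coeffs = np.array([np.polyfit(log_f0, features[:, j], deg=3)
                       for j in range(features.shape[1])])
    # Subtracting the fitted mean eliminates the pitch dependency.
    means = np.array([np.polyval(c, log_f0) for c in coeffs]).T
    residuals = features - means
    return coeffs, np.cov(residuals, rowvar=False)
```

A real implementation would likely rescale F0 before fitting, since `np.polyfit` can be poorly conditioned on a narrow input range.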



4. Musical instrument identification using F0-dependent multivariate normal distribution: (Final) The Bayes decision rule

The instrument w satisfying

w = argmax [log p(X|w; f) + log p(w; f)]

is determined as the result.

p(X|w; f) is the probability density function of the F0-dependent multivariate normal distribution, defined by the F0-dependent mean function and the F0-normalized covariance.
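The rule can be sketched as follows. The model structure matches the description above, but the helper names and the use of log-F0 polynomials are assumptions carried over from the parameter-estimation sketch.

```python
import numpy as np

def log_density(x, f0, coeffs, cov):
    """log p(x | w; f): multivariate normal whose mean depends on F0.

    coeffs: (d, 4) cubic-polynomial coefficients in log-F0
    cov:    (d, d) F0-normalized covariance
    """
    mean = np.array([np.polyval(c, np.log(f0)) for c in coeffs])
    diff = x - mean
    _, log_det = np.linalg.slogdet(cov)
    return -0.5 * (len(mean) * np.log(2.0 * np.pi) + log_det
                   + diff @ np.linalg.solve(cov, diff))

def identify(x, f0, models, log_priors):
    """w = argmax_w [log p(x|w; f) + log p(w; f)]."""
    return max(models,
               key=lambda w: log_density(x, f0, *models[w]) + log_priors[w])
```

With constant mean polynomials this reduces to an ordinary Gaussian classifier; the F0-dependent mean is what distinguishes the proposed model.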



5. Experimental Conditions

  • Database: A subset of RWC-MDB-I-2001

    • Consists of solo tones of 19 real instruments over their full pitch ranges.

    • Contains 3 individuals and 3 intensitiesfor each instrument.

    • Contains normal articulation only.

    • Contains 6,247 sounds in total.

  • Evaluation uses 10-fold cross validation.

  • Performance is evaluated both at the individual-instrument level and at the category level.
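The 10-fold cross validation can be sketched without any ML library; the shuffling seed and split strategy here are arbitrary choices for illustration.

```python
import numpy as np

def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) index pairs for k-fold cross validation."""
    indices = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(indices, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

# With 6,247 sounds, each test fold holds roughly 625 sounds.
splits = list(k_fold_splits(6247))
print(len(splits), len(splits[0][1]))
```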



6. Experimental Results

Recognition rates: 79.73% (individual level, 19 classes), 90.65% (category level, 8 classes)

Improvement: 4.00% (individual level), 2.45% (category level)

Error reduction (relative): 16.48% (individual level), 20.67% (category level)



6. Experimental Results

The recognition rates of the following 6 instruments were improved by more than 7%.

Piano: the most improved (74.21% → 83.27%), because the piano has a wide pitch range.



7. Musical sound ontology

  • A hierarchy of musical instrument sounds

  • Important for various applications, e.g. category-level musical instrument recognition (such as strings or wind instruments) and support for music composing (or arrangement).

  • However, its systematic construction has not been reported.

  • We report the result of constructing an acoustics-based musical sound ontology using the C5.0 decision tree program.
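C5.0 itself is a commercial tool, so as an illustrative stand-in here is scikit-learn's decision tree inducing a split from toy acoustic features; the feature names and data are invented, not the paper's.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented 2-feature data: one cluster of fast-decaying tones and one
# of sustained tones, standing in for real acoustic measurements.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-2.0, 0.0], 0.3, size=(50, 2)),   # decaying
               rng.normal([2.0, 0.0], 0.3, size=(50, 2))])   # sustained
y = [0] * 50 + [1] * 50

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["decay_speed", "centroid"]))
```

The printed tree is a small hierarchy over sounds induced purely from acoustic features, which is the idea behind the ontology construction here.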


7. Musical sound ontology

[Figure: the obtained hierarchy of musical instrument sounds]

The hierarchy is different from the conventional one: acoustic characteristics depend on the pitch as well as on the sounding mechanism. This hierarchy had been known to musicians experientially, but had not previously been constructed by computer.



8. Conclusions

  • We proposed a method for musical instrument identification that takes the pitch dependency of timbre into consideration. → Recognition rate improved: 75.73% → 79.73%

  • We reported the construction of a musical sound ontology based on acoustic characteristics.

  • Future work:

    • Evaluation against mixtures of sounds

    • Development of application systems using the proposed method.


Temporal mean of kurtosis of spectral peaks

[Figure: spectral peaks, including non-harmonic components]

If the power of the non-harmonic components is stronger, the kurtosis of the spectral peaks becomes higher. → This feature captures how many non-harmonic components the spectrum contains.



Recognition rates at category level

Error reduction (relative), per category: 35%, 8%, 23%, 33%, 20%, 13%, 15%, 8%

  • Recognition rates for all categories were improved.

  • Recognition rates for Piano, Guitar, Strings: 96.7%


Bayes vs. k-NN

Compared methods (we adopted the first):
Bayes (18 dim; PCA+LDA)
Bayes (79 dim; PCA only)
Bayes (18 dim; PCA only)
3-NN (18 dim; PCA+LDA)
3-NN (79 dim; PCA only)
3-NN (18 dim; PCA only)

  • PCA+LDA+Bayes achieved the best performance.

  • 18 dimensions outperform 79 dimensions: the amount of training data is not enough for 79 dimensions.

  • The use of LDA improved the performance; LDA considers the separation between classes.



Jain's guideline (1982): having 5 to 10 times as many training samples as the number of dimensions seems to be good practice.




Relationship between training data and dimension

14 dim (85%), 18 dim (88%), 20 dim (89%), 23 dim (90%), 32 dim (93%), 41 dim (95%), 52 dim (97%), 79 dim (99%) — with the PCA proportion value in parentheses

Hughes’s peaking phenomenon

  • At 23 dimensions, the performance peaked.

  • All results without LDA were worse than those with LDA.


[Figure: conventional hierarchy (sounding-mechanism-based)]

