
Pitch-dependent Musical Instrument Identification and Its Application to Musical Sound Ontology




Tetsuro Kitahara*, Masataka Goto**, Hiroshi G. Okuno*

*Grad. Sch’l of Informatics, Kyoto Univ.
**PRESTO JST / Nat’l Inst. Adv. Ind. Sci. & Tech.

IEA/AIE-2003 (24th June 2003 in UK)

- Musical Instrument Identification
- Difficulty: The pitch dependency of timbre
- Solution: approximating it as a function of F0
- Experiments

- Musical Sound Ontology
- A hierarchy of musical instrument sounds
- Systematically constructed by C5.0

- To obtain the names of musical instruments from sounds (acoustical signals).
- Useful for automatic music transcription, music information retrieval, etc.

Feature Extraction

(e.g. decay speed, spectral centroid)

w = argmax_w p(w|X) = argmax_w p(X|w) p(w)

[The likelihoods p(X|w_piano), p(X|w_flute), … are compared, and the winner is output, e.g. <inst>piano</inst>]
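The decision rule above can be sketched in a few lines. The models, priors, and feature values below are made-up stand-ins for illustration, not the paper's trained parameters (the real system works in an 18-dimensional feature space):

```python
import numpy as np

# Hypothetical stand-ins for trained models: a spherical Gaussian per
# instrument in a 2-D feature space (made-up means and variances).
MODELS = {
    "piano": (np.array([0.2, 0.8]), 0.05),  # (mean, isotropic variance)
    "flute": (np.array([0.7, 0.3]), 0.05),
}
PRIORS = {"piano": 0.5, "flute": 0.5}

def log_gauss(x, mean, var):
    """Log-density of an isotropic Gaussian N(mean, var * I)."""
    d = len(mean)
    return -0.5 * (d * np.log(2 * np.pi * var) + np.sum((x - mean) ** 2) / var)

def identify(x):
    """Bayes decision rule w = argmax_w p(X|w) p(w), computed in the log domain."""
    x = np.asarray(x, dtype=float)
    return max(MODELS, key=lambda w: log_gauss(x, *MODELS[w]) + np.log(PRIORS[w]))
```

Working in the log domain avoids numerical underflow when the feature dimensionality grows.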

[Waveform plots: (a) Pitch = C2 (65.5 Hz), (b) Pitch = C6 (1048 Hz); amplitude vs. time [s]]

The pitch dependency of timbre: e.g., a low-pitch piano sound decays slowly, while a high-pitch piano sound decays fast.


In previous studies, the pitch dependency was pointed out, but it has not been dealt with.

Our solution:

Approximate the pitch dependency of each feature as a function of fundamental frequency (F0), i.e., model how each feature varies according to F0.

An F0-dependent multivariate normal distribution has the following two parameters:

- F0-dependent mean function, which captures the pitch dependency (i.e. the position of the distribution for each F0)
- F0-normalized covariance, which captures the non-pitch dependency

The musical instrument identification method has the following four steps:

1st step: Feature extraction

2nd step: Dimensionality reduction

3rd step: Parameter estimation

Final step: Using the Bayes decision rule

129 features, defined by consulting the literature, are extracted.

(1) Spectral centroid (which captures the brightness of tones)

[Spectra of a piano tone and a flute tone, with the spectral centroid marked on each]
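The spectral centroid can be computed as the amplitude-weighted mean frequency of the magnitude spectrum. This is the common textbook definition, not necessarily the exact variant of the feature used in the paper:

```python
import numpy as np

def spectral_centroid(signal, sr):
    """Amplitude-weighted mean frequency of the magnitude spectrum, in Hz."""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    return float(np.sum(freqs * mag) / np.sum(mag))

# A tone with a strong high harmonic ("bright") has a higher centroid than a
# pure low-frequency tone ("dark").
sr = 16000
t = np.arange(sr) / sr                        # 1 second of audio
dark = np.sin(2 * np.pi * 220 * t)
bright = dark + np.sin(2 * np.pi * 3520 * t)
```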


(2) Decay speed of power

[Power envelopes: the flute tone is not decayed; the piano tone is decayed]
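One simple way to measure decay speed is the slope of log-power over time; this is an illustrative assumption, not the paper's exact feature definition:

```python
import numpy as np

def decay_slope(signal, sr, frame=1024):
    """Slope of log-power over time; more negative means faster decay."""
    n = len(signal) // frame
    power = np.array([np.mean(signal[i * frame:(i + 1) * frame] ** 2)
                      for i in range(n)])
    t = (np.arange(n) + 0.5) * frame / sr     # frame-center times [s]
    slope, _intercept = np.polyfit(t, np.log(power + 1e-12), 1)
    return slope

sr = 8000
t = np.arange(2 * sr) / sr                    # 2 seconds
piano_like = np.exp(-3.0 * t) * np.sin(2 * np.pi * 440 * t)  # decaying tone
flute_like = 0.5 * np.sin(2 * np.pi * 440 * t)               # sustained tone
```

The decaying tone yields a strongly negative slope, while the sustained tone yields a slope near zero.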

The dimensionality of the feature space is reduced by the following two methods:

129-dimensional feature space
→ PCA (principal component analysis), with a proportion value of 99%
79-dimensional feature space
→ LDA (linear discriminant analysis)
18-dimensional feature space
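The two-stage reduction can be sketched as follows. The data here are synthetic random stand-ins for the real 129-dimensional feature vectors; PCA keeps 99% of the variance, and LDA is solved as the generalized eigenproblem on the within-class and between-class scatter matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in: 129-dim feature vectors from 19 instrument classes.
n_classes, n_per, n_dims = 19, 50, 129
class_means = rng.normal(scale=4.0, size=(n_classes, n_dims))
X = np.vstack([m + rng.normal(scale=3.0, size=(n_per, n_dims))
               for m in class_means])
y = np.repeat(np.arange(n_classes), n_per)

# --- Step 1: PCA, keeping 99% of the total variance ---
Xc = X - X.mean(axis=0)
evals, evecs = np.linalg.eigh(Xc.T @ Xc / (len(X) - 1))
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]
k = int(np.searchsorted(np.cumsum(evals) / evals.sum(), 0.99)) + 1
X_pca = Xc @ evecs[:, :k]

# --- Step 2: LDA, projecting down to n_classes - 1 = 18 dimensions ---
Sw = np.zeros((k, k))                         # within-class scatter
Sb = np.zeros((k, k))                         # between-class scatter
mu = X_pca.mean(axis=0)
for c in range(n_classes):
    Xi = X_pca[y == c]
    mi = Xi.mean(axis=0)
    Sw += (Xi - mi).T @ (Xi - mi)
    Sb += len(Xi) * np.outer(mi - mu, mi - mu)
w, V = np.linalg.eig(np.linalg.solve(Sw, Sb))
top = np.argsort(w.real)[::-1][:n_classes - 1]
X_lda = X_pca @ V[:, top].real
```

With 19 classes, LDA can produce at most 18 discriminant axes, which is where the 18-dimensional final space comes from.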

First, the F0-dependent mean function is approximated as a cubic polynomial.

Second, the F0-normalized covariance is obtained by subtracting the F0-dependent mean from each feature, which eliminates the pitch dependency.
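The two estimation steps above can be sketched for a single feature. The training data, the true mean curve, the noise level, and the semitone scaling of F0 are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical training data: one feature of one instrument observed at many
# F0s; curve shape and noise level are made up for illustration.
f0 = rng.uniform(65.0, 1050.0, size=300)
note = 12.0 * np.log2(f0 / 440.0)   # F0 on a semitone scale (an assumption)
feature = (2.0 - 0.05 * note + 0.002 * note**2
           + rng.normal(scale=0.2, size=note.size))

# Step 1: approximate the F0-dependent mean as a cubic polynomial.
mean_fn = np.poly1d(np.polyfit(note, feature, deg=3))

# Step 2: subtract the F0-dependent mean from each feature; the covariance of
# the residuals is the F0-normalized covariance (a variance in this 1-D case).
residual = feature - mean_fn(note)
f0_normalized_var = residual.var()
```

After the subtraction, the residuals carry no systematic trend in F0, so their covariance describes only the pitch-independent variation.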

The instrument w satisfying

w = argmax_w [log p(X|w; f) + log p(w; f)]

is determined as the result.

p(X|w; f) is a probability density function of the F0-dependent multivariate normal distribution, defined by the F0-dependent mean function and the F0-normalized covariance.
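An F0-dependent version of the decision rule can be sketched with 1-D models. The mean functions below are linear rather than cubic for brevity, and all numbers are made up:

```python
import numpy as np

# Hypothetical 1-D models: each instrument has a polynomial mean function of
# F0 plus an F0-normalized variance (all numbers are made up).
MODELS = {
    "piano": {"mean_fn": np.poly1d([-0.004, 3.0]), "var": 0.25},
    "flute": {"mean_fn": np.poly1d([0.001, 1.0]), "var": 0.25},
}
PRIORS = {"piano": 0.5, "flute": 0.5}

def log_likelihood(x, f0, model):
    """log p(X|w; f): Gaussian whose mean depends on the fundamental frequency."""
    mu, var = model["mean_fn"](f0), model["var"]
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

def identify(x, f0):
    """w = argmax_w [log p(X|w; f) + log p(w; f)]"""
    return max(MODELS, key=lambda w: log_likelihood(x, f0, MODELS[w])
               + np.log(PRIORS[w]))
```

Because the means move with F0, the same feature value can be assigned to different instruments at different pitches, which is exactly what an F0-independent model cannot do.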

- Database: a subset of RWC-MDB-I-2001
- Consists of solo tones of 19 real instruments over their entire pitch ranges.
- Contains 3 individuals and 3 intensities for each instrument.
- Contains normal articulation only.
- The number of all sounds is 6,247.

- 10-fold cross validation is used.
- The performance is evaluated both at the individual-instrument level and at the category level.
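The fold construction can be sketched as a small helper; the shuffling strategy is an assumption, since the slides do not specify how tones were assigned to folds:

```python
import numpy as np

def ten_fold_indices(n_items, seed=0):
    """Shuffle item indices and split them into 10 disjoint folds."""
    idx = np.random.default_rng(seed).permutation(n_items)
    return np.array_split(idx, 10)

# 6,247 solo tones, as in the database described above. In each round, one
# fold is the test set and the other nine are the training data.
folds = ten_fold_indices(6247)
```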

Recognition rates: 79.73% (individual level), 90.65% (category level)

Improvement: 4.00% (individual level), 2.45% (category level)

Error reduction (relative): 16.48% (individual level), 20.67% (category level)

Category level (8 classes)

Individual level (19 classes)

The recognition rates of the following 6 instruments were improved by more than 7%.

Piano: the most improved (74.21% → 83.27%), because the piano has a wide pitch range.

- A hierarchy of musical instrument sounds
- Important for various applications, e.g. category-level musical instrument recognition (such as strings or wind instruments) and support for music composition (or arrangement).
- However, its systematic construction has not been reported.
- We report the result of constructing an acoustics-based musical sound ontology using the C5.0 decision tree program.

Different from the conventional hierarchy.

Acoustic characteristics depend on the pitch as well as on the sounding mechanism.

This hierarchy was known to musicians experientially, but it had not previously been constructed by computer.

- We proposed a method for musical instrument identification which takes the pitch dependency of timbre into consideration. → Recognition rate improved: 75.73% → 79.73%
- We reported the construction of a musical sound ontology based on acoustic characteristics.
- Future work:
- Evaluation against mixtures of sounds
- Development of application systems using the proposed method.

Temporal mean of the kurtosis of spectral peaks

[Spectrum with harmonic spectral peaks and non-harmonic components marked]

If the power of the non-harmonic components is stronger, the kurtosis of the spectral peaks becomes higher.

→ This feature captures how much the spectrum contains non-harmonic components.

Error reduction (per category): 35%, 8%, 23%, 33%, 20%, 13%, 15%, 8%

- Recognition rates for all categories were improved.
- Recognition rates for Piano, Guitar, Strings: 96.7%

We adopted: Bayes (18 dim; PCA+LDA)

Compared against:
- Bayes (79 dim; PCA only)
- Bayes (18 dim; PCA only)
- 3-NN (18 dim; PCA+LDA)
- 3-NN (79 dim; PCA only)
- 3-NN (18 dim; PCA only)

- PCA+LDA+Bayes achieved the best performance.
- 18 dimensions are better than 79 dimensions: the amount of training data is not enough for 79 dimensions.
- The use of LDA improved the performance, since LDA considers the separation between classes.


Jain’s guideline (1982): having 5 to 10 times as many training samples as the number of dimensions seems to be good practice.


Dimensions kept by PCA vs. proportion of variance: 14 dim. (85%), 18 dim. (88%), 20 dim. (89%), 23 dim. (90%), 32 dim. (93%), 41 dim. (95%), 52 dim. (97%), 79 dim. (99%)

Hughes’s peaking phenomenon

- The performance peaked at 23 dimensions.
- All results without LDA were worse than those with LDA.

Conventional hierarchy (sounding-mechanism-based)