Automatic Genre Classification Using Large High-Level Musical Feature Sets
ISMIR 2004 presentation

Automatic Genre Classification Using Large High-Level Musical Feature Sets

Cory McKay and Ichiro Fujinaga

Dept. of Music Theory

Music Technology Area

McGill University

Montreal, Canada

Topics

  • Introduction

  • Existing research

  • Taxonomy

  • Features

  • Classification methodology

  • Results

  • Conclusions


Introduction

  • GOAL: Automatically classify symbolic recordings into pre-defined genre taxonomies

  • This is the first stage of a larger project:

    • General music classification system

    • Classifies audio

    • Simple interface


Why symbolic recordings?

  • Valuable high-level features can be used which cannot currently be extracted from audio recordings

    • Research provides groundwork that can immediately be taken advantage of as transcription techniques improve

  • Can classify music for which only scores exist (using OMR)

  • Can aid musicological and psychological research into how humans deal with the notion of musical genre

  • Chose MIDI because of diverse recordings available

    • Can convert to MusicXML, Humdrum, GUIDO, etc. relatively easily


Existing research

  • Automatic audio genre classification is becoming a well-researched field

    • Pioneering work: Tzanetakis, Essl & Cook

  • Audio results:

    • Less than 10 categories

    • Success rates generally below 80% for more than 5 categories

  • Less research done with symbolic recordings:

    • 84% for 2-way classifications (Shan & Kuo)

    • 63% for 3-way classifications (Chai & Vercoe)

  • Relatively little applied musicological work on general feature extraction. Two standouts:

    • Lomax 1968 (ethnomusicology)

    • Tagg 1982 (popular musicology)


Taxonomies used

  • Used hierarchical taxonomy

    • A recording can belong to more than one category

    • A category can be a child of multiple parents in the taxonomical hierarchy

  • Chose two taxonomies:

    • Small (9 leaf categories):

      • Used to loosely compare system to existing research

    • Large (38 leaf categories):

      • Used to test system under realistic conditions


Small taxonomy

  • Jazz

    • Bebop

    • Jazz Soul

    • Swing

  • Popular

    • Rap

    • Punk

    • Country

  • Western Classical

    • Baroque

    • Modern Classical

    • Romantic


Training and test data

  • 950 MIDI files

  • 5 fold cross-validation

    • 80% training, 20% testing
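As a concrete illustration of this evaluation setup, 5-fold cross-validation with an 80/20 split can be sketched as follows (a generic Python sketch, not the authors' code; `recordings` stands in for the 950 MIDI files):

```python
import random

def five_fold_splits(recordings, seed=0):
    """Yield (train, test) partitions: each fold serves once as the
    20% test set, with the remaining 80% used for training."""
    shuffled = list(recordings)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::5] for i in range(5)]
    for i in range(5):
        test = folds[i]
        train = [r for j in range(5) if j != i for r in folds[j]]
        yield train, test

# With 950 recordings, every fold yields 760 training / 190 test items
splits = list(five_fold_splits(range(950)))
```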


Features

  • 111 high-level features implemented:

    • Instrumentation

      • e.g. whether modern instruments are present

    • Musical Texture

      • e.g. standard deviation of the average melodic leap of different lines

    • Rhythm

      • e.g. standard deviation of note durations

    • Dynamics

      • e.g. average note-to-note change in loudness

    • Pitch Statistics

      • e.g. fraction of notes in the bass register

    • Melody

      • e.g. fraction of melodic intervals comprising a tritone

    • Chords

      • e.g. prevalence of most common vertical interval

  • More information available in Cory McKay’s master’s thesis (2004)
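To make the feature categories above concrete, here is a sketch of how two of the listed features might be computed from extracted note data. The note representation and the bass-register threshold are assumptions for illustration, not the paper's actual implementation:

```python
from statistics import pstdev

def note_duration_std(durations):
    """Rhythm feature: standard deviation of note durations
    (durations here are in quarter-note units)."""
    return pstdev(durations)

def bass_register_fraction(pitches, threshold=54):
    """Pitch-statistics feature: fraction of notes whose MIDI pitch
    falls below a bass-register cutoff (54 is an assumed threshold)."""
    return sum(p < threshold for p in pitches) / len(pitches)

# Hypothetical note data extracted from one MIDI recording
durations = [1.0, 0.5, 0.5, 2.0]
pitches = [36, 40, 60, 72]
```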


Overview of the classifier


Feature types

  • One-dimensional features

    • Consist of a single number that represents an aspect of a recording in isolation

    • e.g. an average or a standard deviation

  • Multi-dimensional features

    • Consist of vectors of closely coupled statistics

    • Individual values may have limited significance taken alone, but together may reveal meaningful patterns

    • e.g. bins of a histogram, instruments present
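The distinction can be illustrated with two toy examples (a Python sketch; the feature definitions are illustrative, not taken from the paper):

```python
def average_pitch(pitches):
    # One-dimensional feature: a single number that summarizes
    # one aspect of the recording in isolation
    return sum(pitches) / len(pitches)

def pitch_class_histogram(pitches):
    # Multi-dimensional feature: 12 closely coupled bins; one bin
    # means little alone, but the overall profile forms a pattern
    bins = [0] * 12
    for p in pitches:
        bins[p % 12] += 1
    return [b / len(pitches) for b in bins]
```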


Classifiers used

  • K-nearest neighbour (KNN)

    • Fast

    • One for all one-dimensional features

  • Feedforward neural networks

    • Can learn complex interrelationships between features

    • One for each multi-dimensional feature
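The KNN side of the ensemble can be sketched in a few lines (a generic k-nearest-neighbour implementation, not the authors' code):

```python
import math
from collections import Counter

def knn_classify(query, training, k=3):
    """training is a list of (feature_vector, label) pairs; return
    the majority label among the k nearest neighbours by Euclidean
    distance."""
    neighbours = sorted(
        (math.dist(query, vec), label) for vec, label in training
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```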


A “classifier ensemble”

  • Consists of one KNN classifier and multiple neural nets

  • An ensemble with n candidate categories classifies a recording into 0 to n categories

  • Input:

    • All available feature values

  • Output:

    • A score for each candidate category based on a weighted average of KNN and neural net output scores
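The score combination can be sketched like this; the equal weighting and the 0.5 acceptance threshold are assumptions for illustration (the system tunes its weights during training):

```python
def ensemble_scores(knn_scores, net_scores, knn_weight=0.5):
    """Weighted average of per-category scores from the KNN
    classifier and the neural nets."""
    return {cat: knn_weight * knn_scores[cat]
                 + (1 - knn_weight) * net_scores[cat]
            for cat in knn_scores}

def winning_categories(scores, threshold=0.5):
    """An ensemble may assign 0 to n categories: keep every
    candidate whose combined score clears the threshold."""
    return [cat for cat, s in scores.items() if s >= threshold]
```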


Feature and classifier selection/weighting

  • Some features more useful than others

  • Context dependent

    • e.g. the best features for distinguishing Baroque from Romantic differ from those for distinguishing Punk from Heavy Metal

  • Hierarchical and round-robin classifiers only trained on recordings belonging to candidate categories

    • Feature selection allows specialization to improve performance

  • Used genetic algorithms to perform:

    • Feature selection (fast) followed by

    • Feature weighting of survivors
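The selection stage of this two-step GA search can be sketched as follows. This is an illustrative reconstruction, not the paper's implementation, and the toy fitness function stands in for the classification accuracy the real system would measure:

```python
import random

def evolve_feature_mask(n_features, fitness, generations=30,
                        pop_size=20, mutation_rate=0.05, seed=0):
    """Genetic-algorithm feature selection sketch: each individual is
    a bit mask over the features; fitness would be the accuracy of a
    classifier trained on the selected features (stubbed here)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]          # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_features)   # one-point crossover
            child = a[:cut] + b[cut:]
            children.append([bit ^ (rng.random() < mutation_rate)
                             for bit in child])  # bit-flip mutation
        pop = survivors + children
    return max(pop, key=fitness)

# Toy fitness: pretend only the first three features are informative
best = evolve_feature_mask(8, fitness=lambda m: sum(m[:3]) - 0.1 * sum(m[3:]))
```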


Exploration of taxonomy space

  • Three kinds of classification performed:

    • Parent (hierarchical)

      • 1 ensemble for each category with children

      • Only promising branch(es) of taxonomy explored

      • Field initially narrowed using relatively easy broad classifications before proceeding to more difficult specialized classifications

    • Flat

      • 1 ensemble classifying amongst all leaf categories

    • Round-robin

      • 1 ensemble for each pair of leaf categories

      • Final results arrived at through averaging
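The round-robin averaging step can be sketched as follows (a generic pairwise-voting scheme; `pairwise_score` stands in for the trained two-category ensembles):

```python
from itertools import combinations
from collections import defaultdict

def round_robin(categories, pairwise_score, recording):
    """One classifier per pair of leaf categories:
    pairwise_score(recording, a, b) returns a score in [0, 1] for a
    (and 1 - score for b). Final scores average over all pairs."""
    totals = defaultdict(float)
    for a, b in combinations(categories, 2):
        s = pairwise_score(recording, a, b)
        totals[a] += s
        totals[b] += 1 - s
    n = len(categories) - 1  # each category appears in n pairs
    return {cat: totals[cat] / n for cat in categories}
```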


Overall average success rates across all folds

  • 9 Category Taxonomy

    • Leaf: 86%

    • Root: 96%

  • 38 Category Taxonomy

    • Leaf: 57%

    • Root: 75%


Importance of number of candidate features

  • Examined the effect on success rate of providing only subsets of the available features to the feature selection system


Conclusions

  • Success rates better than previous research with symbolic recordings and on the upper end of research involving audio recordings

    • True comparisons impossible to make without standardized testing

  • Effectiveness of high-level features clearly demonstrated

  • Large feature library combined with feature selection improves results

  • Not yet at a point where the system can effectively deal with large, realistic taxonomies, but it is approaching that point

