ISMIR04presentation

Automatic Genre Classification Using Large High-Level Musical Feature Sets

Cory McKay and Ichiro Fujinaga

Dept. of Music Theory

Music Technology Area

McGill University

Montreal, Canada

Topics
  • Introduction
  • Existing research
  • Taxonomy
  • Features
  • Classification methodology
  • Results
  • Conclusions


Introduction
  • GOAL: Automatically classify symbolic recordings into pre-defined genre taxonomies
  • This is the first stage of a larger project:
    • General music classification system
    • Classifies audio
    • Simple interface


Why symbolic recordings?
  • Valuable high-level features can be used which cannot currently be extracted from audio recordings
    • Research provides groundwork that can immediately be taken advantage of as transcription techniques improve
  • Can classify music for which only scores exist (using OMR)
  • Can aid musicological and psychological research into how humans deal with the notion of musical genre
  • Chose MIDI because of diverse recordings available
    • Can convert to MusicXML, Humdrum, GUIDO, etc. relatively easily


Existing research
  • Automatic audio genre classification is becoming a well-researched field
    • Pioneering work: Tzanetakis, Essl & Cook
  • Audio results:
    • Fewer than 10 categories
    • Success rates generally below 80% for more than 5 categories
  • Less research done with symbolic recordings:
    • 84% for 2-way classifications (Shan & Kuo)
    • 63% for 3-way classifications (Chai & Vercoe)
  • Relatively little applied musicological work on general feature extraction. Two standouts:
    • Lomax 1968 (ethnomusicology)
    • Tagg 1982 (popular musicology)


Taxonomies used
  • Used hierarchical taxonomy
    • A recording can belong to more than one category
    • A category can be a child of multiple parents in the taxonomical hierarchy
  • Chose two taxonomies:
    • Small (9 leaf categories):
      • Used to loosely compare system to existing research
    • Large (38 leaf categories):
      • Used to test system under realistic conditions
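
A taxonomy in which a category can have several parents is a directed acyclic graph rather than a tree. The following is a minimal sketch of one way to store and query such a structure. The category names and the `parents`, `ancestors`, and `leaves` names are hypothetical and do not come from the presentation.

```python
# Hypothetical category names; parents[child] lists every parent of that child.
parents = {
    "Root A": [],
    "Root B": [],
    "Child X": ["Root A"],
    "Child Y": ["Root A", "Root B"],  # one category with two parents
}

def ancestors(category, parents):
    """Return every category reachable by following parent links upward."""
    found = set()
    stack = list(parents[category])
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(parents[node])
    return found

def leaves(parents):
    """Return the categories that are not a parent of any other category."""
    has_child = {p for ps in parents.values() for p in ps}
    return sorted(c for c in parents if c not in has_child)
```

Storing parent links per child, rather than children per parent, makes the multiple-parent case a plain list with more than one entry.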


Small taxonomy
  • Jazz
    • Bebop
    • Jazz Soul
    • Swing
  • Popular
    • Rap
    • Punk
    • Country
  • Western Classical
    • Baroque
    • Modern Classical
    • Romantic


Training and test data
  • 950 MIDI files
  • 5-fold cross-validation
    • 80% training, 20% testing
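
As an illustration of the 80%/20% split, here is a minimal sketch of 5-fold cross-validation over 950 items. The slides do not say how files were assigned to folds, so the random shuffling, the `seed` parameter, and the function name `five_fold_splits` are assumptions.

```python
import random

def five_fold_splits(items, k=5, seed=0):
    """Yield (train, test) lists so that each item is tested exactly once."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # assumption: folds are formed at random
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# 950 integer indices stand in for the MIDI files.
splits = list(five_fold_splits(range(950)))
```

With 950 items and 5 folds, each round trains on 760 items and tests on the remaining 190.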


Features
  • 111 high-level features implemented:
    • Instrumentation
      • e.g. whether modern instruments are present
    • Musical Texture
      • e.g. standard deviation of the average melodic leap of different lines
    • Rhythm
      • e.g. standard deviation of note durations
    • Dynamics
      • e.g. average note-to-note change in loudness
    • Pitch Statistics
      • e.g. fraction of notes in the bass register
    • Melody
      • e.g. fraction of melodic intervals comprising a tritone
    • Chords
      • e.g. prevalence of most common vertical interval
  • More information available in Cory McKay’s master’s thesis (2004)
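
To make a few of the example features concrete, here is a rough sketch computed on a hypothetical monophonic line. The note representation, the function names, and the bass-register cutoff `bass_ceiling=55` are assumptions made for illustration. The exact definitions used by the authors are given in the thesis, not in these slides.

```python
from statistics import pstdev

# Hypothetical monophonic line: (duration_in_beats, midi_pitch) for each note.
notes = [(1.0, 60), (1.0, 66), (2.0, 67), (0.5, 48), (0.5, 54), (1.0, 60)]

def duration_std(notes):
    """Rhythm: standard deviation of note durations."""
    return pstdev([d for d, _ in notes])

def bass_fraction(notes, bass_ceiling=55):
    """Pitch statistics: fraction of notes below an assumed bass cutoff."""
    return sum(1 for _, p in notes if p < bass_ceiling) / len(notes)

def tritone_fraction(notes):
    """Melody: fraction of consecutive-note intervals spanning 6 semitones."""
    pitches = [p for _, p in notes]
    intervals = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    return sum(1 for i in intervals if i == 6) / len(intervals)
```

Each function returns a single number, so all three would count as one-dimensional features.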


Feature types
  • One-dimensional features
    • Consist of a single number that represents an aspect of a recording in isolation
    • e.g. an average or a standard deviation
  • Multi-dimensional features
    • Consist of vectors of closely coupled statistics
    • Individual values may have limited significance taken alone, but together may reveal meaningful patterns
    • e.g. bins of a histogram, instruments present
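
The distinction can be sketched with one feature of each type computed from the same hypothetical pitch list. The pitch-class histogram is chosen here only as a familiar example of histogram bins. The slides do not say which histograms were used.

```python
def pitch_class_histogram(pitches):
    """Multi-dimensional feature: 12 normalized bins, one per pitch class."""
    bins = [0.0] * 12
    for p in pitches:
        bins[p % 12] += 1
    return [b / len(pitches) for b in bins]

pitches = [60, 62, 64, 60, 67, 72]  # hypothetical MIDI pitches

# One-dimensional: a single number describing the recording in isolation.
mean_pitch = sum(pitches) / len(pitches)

# Multi-dimensional: a vector whose bins are meaningful mainly as a pattern.
histogram = pitch_class_histogram(pitches)
```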


Classifiers used
  • K-nearest neighbour (KNN)
    • Fast
    • A single KNN classifier handles all one-dimensional features
  • Feedforward neural networks
    • Can learn complex interrelationships between features
    • One for each multi-dimensional feature
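
A minimal sketch of the KNN half follows, assuming Euclidean distance and a score equal to the fraction of the k nearest neighbours in each category. Neither choice is stated in the slides. The training data and the name `knn_scores` are hypothetical, and the neural networks are not sketched here.

```python
import math
from collections import Counter

def knn_scores(train, query, k=3):
    """train: list of (feature_vector, label) pairs.

    Returns {label: fraction of the k nearest training points with that label}.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / k for label, count in votes.items()}

# Hypothetical vectors of two one-dimensional features each.
train = [
    ((0.10, 0.20), "Jazz"),
    ((0.20, 0.10), "Jazz"),
    ((0.15, 0.15), "Jazz"),
    ((0.90, 0.80), "Popular"),
    ((0.80, 0.90), "Popular"),
]
scores = knn_scores(train, (0.2, 0.2))
```

KNN needs no training beyond storing the examples, which is consistent with the slide's description of it as fast.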


A “classifier ensemble”
  • Consists of one KNN classifier and multiple neural nets
  • An ensemble with n candidate categories classifies a recording into 0 to n categories
  • Input:
    • All available feature values
  • Output:
    • A score for each candidate category based on a weighted average of KNN and neural net output scores
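
The output step can be sketched as a weighted average followed by a cutoff. The slides state that scores are combined by weighted average and that 0 to n categories can result, but they do not say how categories are chosen from the scores. The `threshold` rule, the weights, and the score values below are assumptions.

```python
def ensemble_scores(classifier_outputs, weights):
    """Weighted average of per-category scores from several classifiers."""
    total_weight = sum(weights)
    categories = classifier_outputs[0].keys()
    return {
        c: sum(w * out[c] for w, out in zip(weights, classifier_outputs)) / total_weight
        for c in categories
    }

def select_categories(scores, threshold=0.5):
    """Assumed rule: keep every category scoring at or above the threshold."""
    return [c for c, s in scores.items() if s >= threshold]

# Hypothetical outputs from the KNN classifier and one neural net.
knn_out = {"Bebop": 0.9, "Swing": 0.3, "Jazz Soul": 0.1}
net_out = {"Bebop": 0.7, "Swing": 0.5, "Jazz Soul": 0.1}
scores = ensemble_scores([knn_out, net_out], weights=[1.0, 1.0])
chosen = select_categories(scores)
```

A per-category cutoff is one simple way to allow zero, one, or several labels for a single recording. A rule that always picks the top score could not return zero categories.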


Feature and classifier selection/weighting
  • Some features more useful than others
  • Context dependent
    • e.g. the best features for distinguishing Baroque from Romantic differ from those for distinguishing Punk from Heavy Metal
  • Hierarchical and round-robin classifiers only trained on recordings belonging to candidate categories
    • Feature selection allows specialization to improve performance
  • Used genetic algorithms to perform:
    • Feature selection (fast) followed by
    • Feature weighting of survivors
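
The selection stage can be illustrated with a very small genetic algorithm over on/off feature masks. This is a generic sketch and not the authors' configuration. The population size, truncation selection, one-point crossover, single-bit mutation, and the toy fitness function are all assumptions. In a real system the fitness would be classification performance with the selected features. The weighting stage is not shown.

```python
import random

USEFUL = {0, 3, 5}  # hypothetical indices of the informative features

def toy_fitness(mask):
    """Stand-in for classifier accuracy: reward useful features, penalize size."""
    selected = [i for i, on in enumerate(mask) if on]
    return sum(1 for i in selected if i in USEFUL) - 0.1 * len(selected)

def ga_feature_selection(n_features, fitness, generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]       # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(n_features)       # flip one bit as mutation
            child[i] = not child[i]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

best_mask = ga_feature_selection(8, toy_fitness)
```

Because the better half of each generation is carried over unchanged, the best fitness found never decreases from one generation to the next.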


Exploration of taxonomy space
  • Three kinds of classification performed:
    • Parent (hierarchical)
      • 1 ensemble for each category with children
      • Only promising branch(es) of taxonomy explored
      • Field initially narrowed using relatively easy broad classifications before proceeding to more difficult specialized classifications
    • Flat
      • 1 ensemble classifying amongst all leaf categories
    • Round-robin
      • 1 ensemble for each pair of leaf categories
      • Final results arrived at through averaging
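
The round-robin averaging step might look like the sketch below. It assumes each pairwise ensemble returns one score in [0, 1] favouring the first category, with the complement credited to the second. The slides do not specify this. The `strength` table is a hypothetical stand-in for trained pairwise ensembles.

```python
from itertools import combinations

def round_robin(leaves, pairwise_score):
    """Average each leaf's score over all of its pairwise contests."""
    totals = {leaf: 0.0 for leaf in leaves}
    for a, b in combinations(leaves, 2):
        s = pairwise_score(a, b)  # assumed: s favours a, 1 - s favours b
        totals[a] += s
        totals[b] += 1.0 - s
    n_opponents = len(leaves) - 1
    return {leaf: total / n_opponents for leaf, total in totals.items()}

# Hypothetical stand-in for the trained pairwise ensembles.
strength = {"Bebop": 3.0, "Swing": 2.0, "Rap": 1.0}

def pairwise_score(a, b):
    return strength[a] / (strength[a] + strength[b])

averaged = round_robin(list(strength), pairwise_score)
```

With n leaf categories this needs n(n-1)/2 pairwise ensembles, which is 36 for the 9-leaf taxonomy and 703 for the 38-leaf one.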


Overall average success rates across all folds
  • 9 Category Taxonomy
    • Leaf: 86%
    • Root: 96%
  • 38 Category Taxonomy
    • Leaf: 57%
    • Root: 75%
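
One plausible reading of the leaf and root figures is that a prediction counts as correct at the root level when it falls under the right top-level genre, even if the leaf is wrong. The sketch below computes both rates under that assumption, using the small taxonomy and one label per recording. The slides do not define how the rates were measured, and the `truth` and `guess` lists are made up.

```python
SMALL_TAXONOMY = {
    "Jazz": ["Bebop", "Jazz Soul", "Swing"],
    "Popular": ["Rap", "Punk", "Country"],
    "Western Classical": ["Baroque", "Modern Classical", "Romantic"],
}
ROOT_OF = {leaf: root for root, leaves in SMALL_TAXONOMY.items() for leaf in leaves}

def success_rates(true_leaves, predicted_leaves):
    """Return (leaf_rate, root_rate), assuming one label per recording."""
    pairs = list(zip(true_leaves, predicted_leaves))
    leaf_rate = sum(t == p for t, p in pairs) / len(pairs)
    root_rate = sum(ROOT_OF[t] == ROOT_OF[p] for t, p in pairs) / len(pairs)
    return leaf_rate, root_rate

# Made-up labels for four recordings.
truth = ["Bebop", "Swing", "Rap", "Baroque"]
guess = ["Bebop", "Bebop", "Punk", "Romantic"]
rates = success_rates(truth, guess)
```

Under this reading the root rate can never be lower than the leaf rate, which matches the pattern in the reported figures.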


Importance of number of candidate features
  • Examined the effect on success rate of providing only subsets of the available features to the feature selection system:


Conclusions
  • Success rates better than previous research with symbolic recordings and on the upper end of research involving audio recordings
    • True comparisons impossible to make without standardized testing
  • Effectiveness of high-level features clearly demonstrated
  • Large feature library combined with feature selection improves results
  • Not yet at a point where the system can deal effectively with large realistic taxonomies, but approaching that point