
Musical Genre Classification

Prepared by Elliot Sinyor

for MUMT 611

March 3, 2005


Table of Contents

  • What is Genre?

  • Approaches to Genre Classification

    • Manual

    • Automatic

  • Related Work

    • Soltau 1998

    • Tzanetakis & Cook 2002 – prescriptive approach

    • Pachet et al. 2001 – emergent approach

  • Conclusion


    What is Genre?

    • A way of describing what an item shares with other items as well as what differentiates it from other items

    • From Aucouturier and Pachet

      • “The genesis of genre is therefore to be found in our natural and irrepressible tendency to classify”


    What is Genre?

    • A&P separate genre definitions into two broad categories

      • Intentional vs. Extensional


    What is Genre? - Intentional

    • More subjective

      • Relies on collective cultural knowledge

      • Social/Historical context

      • E.g. the ’60s, hippies, Brit-pop


    Problems with “Genre”

    • What do the names mean?

      • Rock? Pop?

  • No fixed semantics

    • Amazon.com organizes genres by:

      • Period (“60s pop”)

      • Topic (“love song”)

      • Country of Origin (“Japanese music”)

  • Genre is based on extrinsic habits rather than intrinsic properties

    • To a French listener, C. Aznavour is “variety” (variété)

    • To an English listener, C. Aznavour is “French music”


    What is Genre? - Extensional

    • Analysis-based

      • Describes the music itself

      • Tempo, timbre, pitch, language, etc.

      • (Sometimes) easier for automatic genre classification systems

      • E.g. fast rock, mellow classical


    Problems with “Genre”

    • What granularity to use?

      • By artist?

        • Please Please Me vs. Sgt. Pepper (same artist, very different albums)

      • By album?

        • Revolution 9 vs. Helter Skelter vs. Mother Nature’s Son (same album, very different tracks)

    • Genre does work for broad categories

      • Rock vs. Classical


    Problems with “Genre”

    • Does anyone agree?

      • Allmusic.com – 531 genres

      • Amazon.com – 719 genres

      • Mp3.com – 430 genres

        • Only 70 words common to the three taxonomies (Pachet and Cazaly 2000)


    Approaches to Genre Classification

    • Manual

      • Musicologists and Elbow Grease

    • Automatic

      • Prescriptive

        • Signal Analysis based

      • Emergent

        • Uses existing human-entered meta-data to group things together


    Manual Classification

    • Dannenberg et al. 2001:

      • To build a taxonomy for MSN Music Search Engine

      • “Few hundred thousand songs”

      • Hired full-time musicologists

      • Took 30 person-years of effort

      • “The details of the taxonomy and the design methodology are, however, not available”


    Manual Classification

    • Pachet and Cazaly 2001 (CUIDADO)

      • Separated descriptors – country, instrumentation, artist type, etc.

        • _____ Rock

      • Too sensitive to musical evolution, difficult to build, difficult to maintain

      • Changed focus to artists instead of titles.

      • In any case, insufficient for millions of titles


    Prescriptive – History

    • Originated from speech recognition work

    • Most early systems classified audio from TV into music/speech/environmental sounds


    Prescriptive – Various Approaches

    • Saunders 1996

      • Thresholding/ZCR techniques (sketched after this list)

    • Scheirer and Slaney 1997

      • Multiple features and statistical pattern recognition

    • Kimber and Wilcox 1996

      • MFCCs and HMM to classify into music, speech, laughter and nonspeech

    • Zhang and Kuo 2001

      • Rule-based system for classifying audio from movies and TV into:

        • Non-music

          • Pure speech, non-harmonic environmental sound

        • Music

          • Harmonic environmental sound, pure music, song, speech with music, environmental sound with music
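
As a rough illustration of the kind of signal-level cue these early systems relied on, here is a minimal Python sketch of ZCR-based speech/music discrimination in the spirit of Saunders. The frame length and variance threshold are illustrative assumptions, not Saunders' published parameters.

```python
import numpy as np

def frame_signal(signal, frame_len):
    """Split a mono signal into non-overlapping frames."""
    n_frames = len(signal) // frame_len
    return signal[:n_frames * frame_len].reshape(n_frames, frame_len)

def zero_crossing_rate(frames):
    """Fraction of adjacent-sample sign changes within each frame."""
    return np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)

def speech_or_music(signal, sr, frame_ms=20, zcr_std_threshold=0.05):
    """Speech alternates voiced (low ZCR) and unvoiced (high ZCR) segments,
    so its ZCR varies more across frames than music's typically does.
    The threshold here is illustrative, not a published value."""
    frames = frame_signal(signal, int(sr * frame_ms / 1000))
    zcr = zero_crossing_rate(frames)
    return "speech" if zcr.std() > zcr_std_threshold else "music"
```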


    Prescriptive

    • Soltau et al. 1998 – “Recognition of Music Types”

    • New approach – Explicit Time Modelling with Neural Network (ETM-NN)


    Prescriptive – Soltau et al. 1998

    • In a nutshell:

      • Transform acoustic signal into sequence of abstract sonic events

      • Look at statistical patterns derived from the sequences → combine into vectors that represent temporal structure

      • 3-layer feed-forward network
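
This pipeline can be sketched as follows. The event inventory, the bigram statistics, and the network size are assumptions for illustration (the paper's exact features differ), and scikit-learn's MLPClassifier stands in for the 3-layer feed-forward network.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def temporal_structure_vector(events, n_event_types):
    """Summarize a sequence of abstract sonic-event labels (integers in
    [0, n_event_types)) by normalized event-bigram frequencies, a crude
    stand-in for the temporal statistics Soltau et al. describe."""
    counts = np.zeros((n_event_types, n_event_types))
    for a, b in zip(events[:-1], events[1:]):
        counts[a, b] += 1
    return counts.ravel() / max(len(events) - 1, 1)

# Hypothetical usage, assuming event_sequences and genre_labels exist:
#   X = np.array([temporal_structure_vector(s, 16) for s in event_sequences])
#   net = MLPClassifier(hidden_layer_sizes=(64,))  # input/hidden/output
#   net.fit(X, genre_labels)
```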


    Prescriptive – Soltau et al. 1998

    • Experimental Results:

      • 3 hours of data (360 samples, 30 sec each)

      • Rock, Pop, Techno, Classical

      • 67% training, 13% cross-validation, 20% evaluation

    • Compare ETM-NN vs. HMM, using cepstral coefficients

      • ETM-NN: 86.1% vs. HMM: 79.2%


    “Musical Genre Classification of Audio Signals” – Tzanetakis and Cook, 2002

    • Timbral Texture Features

      • Spectral {Centroid, Rolloff, Flux}, ZCR, MFCC (5 coefficients)

    • Analysis window – short enough that features are stable – 23 ms

    • Texture window – “minimum amount of time to identify a ‘texture’” – 43 analysis windows ≈ 1 sec

      • “Memory of the past”

  • Statistics (means, variances) of features over the texture window
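
A minimal sketch of the two-level windowing, using spectral centroid as the example feature. The 23 ms and 43-window figures come from the slide; non-overlapping frames are a simplifying assumption.

```python
import numpy as np

def spectral_centroid(frame, sr):
    """Magnitude-weighted mean frequency of one analysis window."""
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return np.sum(freqs * mag) / (np.sum(mag) + 1e-12)

def texture_window_stats(signal, sr, analysis_ms=23, windows_per_texture=43):
    """Compute a per-frame feature over ~23 ms analysis windows, then
    mean and variance over ~1 s texture windows (43 analysis windows)."""
    flen = int(sr * analysis_ms / 1000)
    n = len(signal) // flen
    feats = np.array([spectral_centroid(signal[i * flen:(i + 1) * flen], sr)
                      for i in range(n)])
    stats = []
    for start in range(0, n - windows_per_texture + 1, windows_per_texture):
        w = feats[start:start + windows_per_texture]
        stats.append((w.mean(), w.var()))
    return np.array(stats)  # one (mean, variance) row per texture window
```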




    Timbral Texture Feature Vector – Tzanetakis and Cook, 2002

    • Statistics (means, variances) of features over the texture window

      • 19 dimensions

        • (m, v) of SC, SF, SR, ZCR, 5 MFCC

        • “Low energy feature”: fraction of analysis windows over the texture window that have less than average RMS energy

          • E.g. vocal music will have more silences
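
The low-energy feature is simple enough to sketch directly; this version treats the whole input as one texture window, with the 23 ms frame size taken from the analysis window above.

```python
import numpy as np

def low_energy_feature(signal, sr, analysis_ms=23):
    """Fraction of analysis windows whose RMS energy falls below the
    average RMS over the input (treated as one texture window); e.g.
    vocal music, with its silences, scores higher than continuous music."""
    flen = int(sr * analysis_ms / 1000)
    n = len(signal) // flen
    frames = signal[:n * flen].reshape(n, flen)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    return float(np.mean(rms < rms.mean()))

# The 19-dimensional timbral vector per texture window is then:
#   (mean, variance) of {centroid, rolloff, flux, ZCR, MFCC1-5} -> 18 dims
#   + low_energy_feature                                        -> 19 dims
```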


    Rhythmic Content – “Beat Histogram” – Tzanetakis and Cook, 2002

    • “Pitch detection with larger periods”

    • Use DWT to divide signal into frequency bands
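
A rough sketch of the beat-histogram idea, using PyWavelets for the DWT stage; envelope extraction, filtering, and normalization are simplified relative to the paper, and the BPM range is an assumption.

```python
import numpy as np
import pywt

def beat_histogram(signal, sr, bpm_lo=40, bpm_hi=200):
    """DWT octave bands -> rectified, mean-removed envelopes ->
    per-band autocorrelation -> accumulate at lags corresponding to
    candidate tempos. Simplified vs. Tzanetakis and Cook's version."""
    hist = np.zeros(bpm_hi - bpm_lo)
    coeffs = pywt.wavedec(signal, 'db4', level=4)
    # coeffs = [cA4, cD4, cD3, cD2, cD1]; detail band cDj has an
    # effective sample rate of sr / 2**j after decimation.
    for j, band in zip([4, 3, 2, 1], coeffs[1:]):
        band_sr = sr / 2 ** j
        env = np.abs(band) - np.abs(band).mean()
        ac = np.correlate(env, env, mode='full')[len(env) - 1:]
        for i, bpm in enumerate(range(bpm_lo, bpm_hi)):
            lag = int(band_sr * 60.0 / bpm)
            if 0 < lag < len(ac):
                hist[i] += ac[lag]
    return hist  # hist[i] ~ salience of tempo (bpm_lo + i) BPM
```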



    Features Taken from the Beat Histogram – Tzanetakis and Cook, 2002

    • A0, A1: relative amplitude (divided by the sum of amplitudes) of the first and second histogram peaks;

    • RA: ratio of the amplitude of the second peak to that of the first peak;

    • P1, P2: period of the first, second peak in bpm;

    • SUM: overall sum of the histogram (indication of beat strength).
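
Given such a histogram indexed in BPM, the six features are straightforward to read off. A sketch, assuming the histogram has at least two peaks:

```python
import numpy as np
from scipy.signal import find_peaks

def bh_features(hist, bpm_lo=40):
    """Extract A0, A1, RA, P1, P2, SUM from a beat histogram whose bin i
    corresponds to (bpm_lo + i) BPM. Assumes at least two peaks exist."""
    peaks, props = find_peaks(hist, height=0)
    order = peaks[np.argsort(props['peak_heights'])[::-1]]
    p1, p2 = order[0], order[1]
    total = hist.sum()
    return {
        'A0': hist[p1] / total,     # relative amplitude of first peak
        'A1': hist[p2] / total,     # relative amplitude of second peak
        'RA': hist[p2] / hist[p1],  # second-to-first amplitude ratio
        'P1': bpm_lo + p1,          # period of first peak, in BPM
        'P2': bpm_lo + p2,          # period of second peak, in BPM
        'SUM': total,               # overall beat strength
    }
```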


    Pitch Content Features – Tzanetakis and Cook, 2002

    • Used an enhanced autocorrelation function to create folded (1 octave) and unfolded (all notes) pitch histograms

    • Pitches mapped to MIDI note numbers

    • Folded – common pitch classes

    • Unfolded – pitch range

      • Higher for jazz, classical

  • FA0, UP0, UP1, IPO1 (interval between 2 highest peaks), SUM
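
A sketch of the two histograms, assuming pitch detection has already produced a list of MIDI note numbers; the feature definitions below only approximate the paper's.

```python
import numpy as np

def pitch_histograms(midi_notes):
    """Unfolded histogram over 128 MIDI note numbers (captures pitch
    range) and folded histogram over 12 pitch classes (captures the
    common pitch classes)."""
    notes = np.asarray(midi_notes, dtype=int)
    unfolded = np.bincount(notes, minlength=128)
    folded = np.bincount(notes % 12, minlength=12)
    return folded, unfolded

def pitch_features(folded, unfolded):
    """Approximations of the slide's features: FA0 = amplitude of the
    maximum folded peak, UP0/UP1 = the two highest unfolded peaks,
    IPO1 = pitch-class interval between the two highest folded peaks."""
    top_unfolded = np.argsort(unfolded)[::-1]
    top_folded = np.argsort(folded)[::-1]
    return {
        'FA0': folded.max() / max(folded.sum(), 1),
        'UP0': int(top_unfolded[0]),
        'UP1': int(top_unfolded[1]),
        'IPO1': abs(int(top_folded[0]) - int(top_folded[1])) % 12,
        'SUM': int(unfolded.sum()),
    }
```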


    Experimental Results – Tzanetakis and Cook, 2002

    • Used GMM classifiers with diagonal covariance matrices
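
A minimal sketch of this classification scheme with scikit-learn: one diagonal-covariance GMM per genre, with a test vector assigned to the genre whose model gives it the highest likelihood. The component count and data layout are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmms(features_by_genre, n_components=3):
    """Fit one diagonal-covariance GMM per genre. features_by_genre maps
    genre name -> (n_samples, n_features) array of feature vectors."""
    return {genre: GaussianMixture(n_components=n_components,
                                   covariance_type='diag').fit(X)
            for genre, X in features_by_genre.items()}

def classify(gmms, x):
    """Assign x to the genre whose GMM gives it the highest
    log-likelihood."""
    x = np.atleast_2d(x)
    return max(gmms, key=lambda genre: gmms[genre].score(x))
```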




    Prescriptive – Some Results (from Aucouturier & Pachet)

    • Gaussian and Gaussian Mixture Models: 48% successful classification in Ermolinskiy et al. (2001), using 100 songs per class in the training phase. This result has to be taken with care, since the system uses only pitch information.

    • Tzanetakis et al. (2001) achieve a rather disappointing 57%, but 75% is also reported in Tzanetakis and Cook (2000a) using 50 songs per class.

    • 90% in Lambrou and Sandler (1998) and 75% in Deshpande et al. (2001) on a very small training and test set, which may not be representative.

    • Pye (2000) reports 90% on a total set of 175 songs.

    • Soltau (1998) reports 80% with HMM, 86% with NN, with a database of 360 songs.


    Emergent

    • Unlike the prescriptive approach, it is unsupervised

    • Based on “cultural similarity from text documents”

    • Can extract similarities that are impossible to obtain from the audio signal alone


    Emergent – Collaborative Filtering

    • Shardanand & Maes 1995, Pestoni et al. 2001

    • There are patterns in tastes

    • Have users rate their music, match users with similar tastes, recommend items unknown to the target user (see the sketch below)

    • Problems

      • Good for naïve profiles, bad for broad, eclectic tastes

      • Favors “middle of the road” items liked by a large proportion of users

      • Only works some time after new music is released, once enough ratings have accumulated
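
The matching step can be sketched as nearest-neighbour search over rating vectors. This bare-bones version (cosine similarity, top-k neighbours) illustrates the idea rather than Shardanand and Maes' actual algorithm.

```python
import numpy as np

def recommend(ratings, user, k=5, n_items=3):
    """ratings: (n_users, n_items) matrix where 0 means 'unrated'.
    Find the k users most similar to `user` by cosine similarity of
    rating vectors, then score each item the user hasn't rated by the
    similarity-weighted ratings of those neighbours."""
    norms = np.linalg.norm(ratings, axis=1) + 1e-12
    sims = (ratings @ ratings[user]) / (norms * norms[user])
    sims[user] = -np.inf                       # exclude the user themself
    neighbours = np.argsort(sims)[::-1][:k]
    scores = sims[neighbours] @ ratings[neighbours]
    scores[ratings[user] > 0] = -np.inf        # only recommend unknown items
    return np.argsort(scores)[::-1][:n_items]  # indices of suggested items
```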


    Emergent – Co-occurrence Analysis

    • Pachet et al. 2001

    • Looks at online text sources for co-occurrences of songs (a form of data mining)

    • If two items appear in the same context (or share a common neighbour), this is evidence of some sort of similarity


    Co-occurrence

    • Pachet et al. 2001 “Musical Data Mining for Electronic Music Distribution”

    • Sources used

      • Track listing databases (CDDB)

        • Mostly look at compilations of similar artists

      • Radio Show playlists

        • Specialty programs better than daily commercial radio

        • Lists made by experts


    Co-occurrence

    • Build a matrix where:

      • Value of entry (i, j) corresponds to number of times title i co-occurs with title j

    • What about indirect co-occurrence?

      • E.g. Eleanor Rigby co-occurs with Good Vibrations, and Good Vibrations with God Only Knows → Eleanor Rigby and God Only Knows are likely similar

    • Captured with a correlation measure computed from the titles’ co-occurrence vectors
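
A sketch of both steps, assuming playlists/compilations are given as lists of integer title IDs. Correlating each title's co-occurrence profile captures indirect co-occurrence: titles that never appear together still come out similar if they co-occur with the same third titles.

```python
import numpy as np

def cooccurrence_matrix(playlists, n_titles):
    """C[i, j] = number of sources (compilations, radio playlists)
    in which titles i and j appear together."""
    C = np.zeros((n_titles, n_titles))
    for playlist in playlists:
        for i in playlist:
            for j in playlist:
                if i != j:
                    C[i, j] += 1
    return C

def correlation_similarity(C):
    """Pearson correlation between titles' co-occurrence profiles
    (rows of C), capturing indirect co-occurrence."""
    return np.corrcoef(C)
```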


    Experimental Results – Pachet et al. 2001

    • Using the resulting distance functions, apply Ascendant (agglomerative) Hierarchical Clustering (see the sketch below)

    • Used CDDB database, compared co-occurrence vs correlation

    • Manually examined results

    • “70% of clusters had interesting similarities”
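
A sketch of that clustering step with SciPy, assuming a precomputed title-similarity matrix (e.g. the correlation above) and an illustrative cluster count; "ascendant" hierarchical clustering is the bottom-up/agglomerative variant.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_titles(similarity, n_clusters=20):
    """Agglomerative (ascendant) hierarchical clustering: turn
    similarities into distances, build the dendrogram bottom-up,
    then cut it into n_clusters groups."""
    dist = 1.0 - similarity
    np.fill_diagonal(dist, 0.0)
    condensed = squareform(dist, checks=False)  # condensed distance vector
    Z = linkage(condensed, method='average')
    return fcluster(Z, t=n_clusters, criterion='maxclust')
```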




    Challenges

    • Name format is not strictly enforced (see the normalization sketch after this list)

      • The Beatles; Beatles, The; Beatles

    • Difficult to characterize the nature of the similarities

    • Cover songs can sound nothing alike
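
The first problem is partly mechanical. A toy normalizer like the one below collapses the trivial variants, though real catalogues need much more (typos, featuring credits, transliteration):

```python
import re

def normalize_artist(name):
    """Collapse trivial formatting variants so that "The Beatles",
    "Beatles, The", and "Beatles" all map to the same key."""
    key = name.strip().lower()
    key = re.sub(r',\s*the$', '', key)   # "beatles, the" -> "beatles"
    key = re.sub(r'^the\s+', '', key)    # "the beatles"  -> "beatles"
    return key
```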


    Conclusions and Future Directions

    • “It seems that samples of Techno and Classical are easy to discriminate … Rock and Pop seems to be more difficult” – Soltau et al. 1998

    • Manual classification is not feasible at scale

    • Why not combine prescriptive/emergent techniques?