Automatic Genre Classification of Music Content [A survey]

Automatic Genre Classification of Music Content [A survey] • Nicolas Scaringella, Giorgio Zoia, Daniel Mlynek • IEEE Signal Processing Magazine, March 2006 • By Yi-Tang Wang

Outline • Introduction • Feature extraction techniques • Genre classification paradigms • Classification results • Future directions & Conclusion

Introduction • EMD (electronic music distribution) • Restoration of analog archives • New content • Music catalogues become huge • What do you want to listen to? • 1 million tracks online • Efficient ways to browse & organize

Introduction (cont.) • Music Genres • Categories to characterize similarities • Boundaries are fuzzy • Automatic Classification • Finding a taxonomy • Hierarchical set of categories • Nontrivial task

Critical issues • Artists, albums, or titles? • One song to one genre(?) • Albums: heterogeneous material • Artists: several albums • Same titles? • No agreement on taxonomies • Allmusic, Amazon, Mp3 [2] F. Pachet and D. Cazaly, “A taxonomy of musical genres,” in Proc. Content-Based Multimedia Information Access (RIAO), Paris, France, 2000

Critical issues (cont.) • Ill-defined genre labels • Varied criteria (geographical, historical, etc.) • Dependent on cultural context • Scalability of genre taxonomies • New genres appear frequently • Merging or splitting • Automatic system

Feature extraction techniques • High-level model • Event-like format (MIDI) • Symbolic format (MusicXML) • Rarely available • Low-level • Audio samples • Low level and low density of information • Requires feature extraction • Timbre, Melody, Harmony, Rhythm

Timbre • Same pitch and loudness but sound different • Features to characterize timbre • Temporal features • Energy features • Spectral shape features • Perceptual features • Some have been standardized in MPEG-7
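
As a rough illustration of three of the feature families listed above, the sketch below computes one temporal feature (zero-crossing rate), one energy feature (RMS), and one spectral shape feature (spectral centroid) for a single frame. The function names, the 8 kHz sample rate, and the 64-sample test frame are assumptions chosen for the example, not taken from the survey, and the naive DFT is only practical for short frames.

```python
import cmath
import math

def zero_crossing_rate(frame):
    """Temporal feature: fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def rms_energy(frame):
    """Energy feature: root-mean-square amplitude of the frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def spectral_centroid(frame, sample_rate):
    """Spectral shape feature: magnitude-weighted mean frequency (naive DFT)."""
    n = len(frame)
    weighted = total = 0.0
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        magnitude = abs(coeff)
        weighted += (k * sample_rate / n) * magnitude
        total += magnitude
    return weighted / total if total > 0 else 0.0

# Hypothetical test frame: a 1 kHz sine sampled at 8 kHz (8 full cycles).
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]

zcr = zero_crossing_rate(frame)
rms = rms_energy(frame)
centroid = spectral_centroid(frame, sample_rate)
```

For a pure tone the centroid sits at the tone's frequency; richer timbres spread energy across bins and shift it.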

Timbre (cont.) • Transformations • Create new features or increase dimensionality • Transforming into a logarithmic decibel scale is suggested • Texture window • Larger window • Reduces computation • Increases classification accuracy • 1 s • Variable sizes and positions
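
The two ideas on this slide can be sketched as follows: a conversion of power values to a logarithmic decibel scale, and a grouping of consecutive analysis frames into overlapping texture windows. The frame rate (43 frames per second, so that 43 frames span roughly 1 s), the hop size, and the clamping floor are assumptions made for the example.

```python
import math

def to_decibels(power, floor=1e-10):
    """Convert a power value to dB; `floor` (an assumed constant) avoids log(0)."""
    return 10.0 * math.log10(max(power, floor))

def texture_windows(frames, window_len, hop):
    """Group per-frame features into fixed-length, possibly overlapping windows."""
    last_start = len(frames) - window_len
    return [frames[i:i + window_len] for i in range(0, last_start + 1, hop)]

# Hypothetical setup: 100 analysis frames at about 43 frames per second,
# so a 43-frame texture window covers roughly 1 s of audio.
frame_powers = [1.0 + i for i in range(100)]
frame_db = [to_decibels(p) for p in frame_powers]
windows = texture_windows(frame_db, window_len=43, hop=21)
```

Changing `window_len` and `hop` gives the variable window sizes and positions mentioned on the slide.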

Timbre (cont.) • Texture model • model of features over texture window: • 1) simple modeling with low-order statistics • 2) modeling with autoregressive model • 3) modeling with distribution estimation algorithms (for example, EM estimation of a GMM of frames)
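
Option 1 above (low-order statistics) is the simplest to illustrate. The sketch below summarizes a texture window by the per-dimension mean and variance of its frame features; the function name and the toy two-dimensional frames are assumptions. The autoregressive and GMM/EM options are not shown.

```python
from statistics import fmean, pvariance

def texture_model(window):
    """Summarize a texture window by per-dimension mean and variance.

    `window` is a list of frames, each frame a list of feature values.
    Returns the means followed by the variances as one flat vector.
    """
    dims = list(zip(*window))            # transpose: one tuple per feature dimension
    means = [fmean(d) for d in dims]
    variances = [pvariance(d) for d in dims]
    return means + variances

# Hypothetical window of three frames with two features each.
window = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
summary = texture_model(window)          # [mean_1, mean_2, var_1, var_2]
```

The resulting fixed-length vector is what a classifier or similarity measure would then consume, regardless of how many frames the window contained.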

Melody & Harmony • Melody • succession of pitched events • Horizontal element • Harmony • pitch simultaneity, chords • Vertical element

Melody & Harmony (cont.) • Pitch function • Characterizing pitch distribution • Amplitude, position of main peak, … • Unfolded • Contains pitch content and info of its range • Folded • Mapped to a single octave • Harmonic content
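
The difference between the unfolded and folded pitch functions can be sketched with simple histograms. The example assumes pitches are already available as MIDI note numbers (an assumption; the survey works from audio), and the function names are illustrative.

```python
from collections import Counter

def unfolded_histogram(midi_pitches):
    """Unfolded: one bin per absolute pitch, so the pitch range is preserved."""
    return Counter(midi_pitches)

def folded_histogram(midi_pitches):
    """Folded: all octaves mapped onto 12 pitch classes (C=0, ..., B=11)."""
    hist = [0] * 12
    for pitch in midi_pitches:
        hist[pitch % 12] += 1
    return hist

# Hypothetical note list: C4, E4, G4, C5, E5, C3.
notes = [60, 64, 67, 72, 76, 48]

unfolded = unfolded_histogram(notes)
pitch_range = max(unfolded) - min(unfolded)    # span in semitones

folded = folded_histogram(notes)
main_peak_position = folded.index(max(folded))  # most frequent pitch class
main_peak_amplitude = max(folded)
```

The folded histogram discards register and keeps only harmonic content, which is why the three C notes in different octaves collapse into a single peak.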

Rhythm • No precise definition • Generically, all of the temporal aspects • Periodicity function • Low-level approach, like the pitch function • 1) Tempo: periodicities typically in the range 0.3–1.5 s (i.e., 200–40 bpm) • 2) Musical pattern: periodicities between 2 and 6 s (corresponding to the length of one or more measures) • Gouyon et al. derive MFCC-like descriptors
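
The tempo range quoted above follows from bpm = 60 / period, so 0.3 s corresponds to 200 bpm and 1.5 s to 40 bpm. The sketch below adds a simple autocorrelation-based periodicity function over an onset-strength envelope; this is one common way to build such a function, not necessarily the method of any paper cited in the survey. The envelope, the frame rate of 8 frames per second, and the function names are assumptions.

```python
def period_to_bpm(period_s):
    """Convert a beat period in seconds to beats per minute."""
    return 60.0 / period_s

def autocorrelation(envelope, lag):
    """Unnormalized autocorrelation of the envelope at a given lag (in frames)."""
    return sum(a * b for a, b in zip(envelope, envelope[lag:]))

def dominant_period(envelope, frame_rate, min_s, max_s):
    """Return the period (in seconds) with the strongest autocorrelation peak."""
    first = max(1, round(min_s * frame_rate))
    last = round(max_s * frame_rate)
    best_lag = max(range(first, last + 1),
                   key=lambda lag: autocorrelation(envelope, lag))
    return best_lag / frame_rate

# Hypothetical onset envelope: one pulse every 4 frames at 8 frames per second,
# i.e. one beat every 0.5 s.
frame_rate = 8
envelope = [1.0 if i % 4 == 0 else 0.0 for i in range(32)]

period = dominant_period(envelope, frame_rate, min_s=0.3, max_s=1.5)
tempo = period_to_bpm(period)
```

Searching the 2 to 6 s range with the same function would target the musical-pattern periodicities instead of tempo.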

Extracting from segments • A small segment may contain sufficient information • Reduces required computation • Typically a 30 s segment • taken 30 s after the beginning • Artist classification • Voice is easier to identify than music only
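
The segment selection described above amounts to slicing the sample array. A minimal sketch, assuming the audio is already decoded into a list of samples; the function name and the toy sample rate are illustrative.

```python
def extract_segment(samples, sample_rate, offset_s=30.0, duration_s=30.0):
    """Return `duration_s` seconds of audio starting `offset_s` seconds in.

    If the track is too short, the slice is simply shorter (or empty).
    """
    start = int(offset_s * sample_rate)
    end = start + int(duration_s * sample_rate)
    return samples[start:end]

# Hypothetical 100 s track at a toy rate of 10 samples per second.
sample_rate = 10
track = list(range(100 * sample_rate))
segment = extract_segment(track, sample_rate)
```

Features are then computed on `segment` only, which cuts the processing cost relative to analysing the whole track.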

Local conclusion • Extraction of high-level descriptors from polyphonic audio signals is not yet state of the art • Focus on timbre modeling • Timbre may contain sufficient information • 250 ms: 53%, 3 s: 72% • Among 10 genres

Local conclusion (cont.) • Another point of view (pessimistic) • Timbre similarity measure & 20,000 titles distributed over 18 genres • Little correlation • May not be scalable • Take cultural features into account

Genre classification • Expert systems • Unsupervised approach • clustering • Supervised approach • Machine learning algorithms

Expert systems • A knowledge based system made up of a set of rules • No model based on it so far • Expensive to implement and maintain • May yield unexpected interactions

Expert systems (cont.) • Pachet and Cazaly’s work • Describes differences between genres with language-based descriptors, e.g., instrumentation

Unsupervised approach • Clustering with similarity measures • Similarity measures • If time invariant • Euclidean distance or cosine distance • Otherwise • Build statistical model (Gaussian or GMMs) • Kullback-Leibler divergence (relative entropy) • Sampling, earth mover’s distance, asymptotic likelihood approximation • Shao et al. use HMMs
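
The measures named above can be sketched in a few lines. Euclidean and cosine distance operate on fixed feature vectors; for single Gaussians the Kullback-Leibler divergence has a closed form, shown here for diagonal covariances and symmetrized so it can serve as a distance-like measure. For GMMs no closed form exists, which is why approximations such as sampling appear on the slide. Function names and the diagonal-covariance restriction are assumptions made for the example.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norms

def kl_diag_gaussians(mean_p, var_p, mean_q, var_q):
    """KL(p || q) for Gaussians with diagonal covariance (closed form)."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )

def symmetric_kl(mean_p, var_p, mean_q, var_q):
    """Symmetrized KL: KL(p || q) + KL(q || p)."""
    return (kl_diag_gaussians(mean_p, var_p, mean_q, var_q)
            + kl_diag_gaussians(mean_q, var_q, mean_p, var_p))

d_euclid = euclidean([0.0, 0.0], [3.0, 4.0])
d_cosine = cosine_distance([1.0, 0.0], [0.0, 1.0])
d_kl = kl_diag_gaussians([0.0], [1.0], [1.0], [1.0])
d_skl = symmetric_kl([0.0], [1.0], [1.0], [1.0])
```

Plain KL is not symmetric in general, so the symmetrized form is the one usually fed to a clustering algorithm.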

Unsupervised approach (cont.) • Clustering algorithms • K-means • Shao et al.’s work • Agglomerative hierarchical clustering • SOM (self-organizing map) • Artificial neural network • Maps high-dimensional data onto a lower-dimensional space • GHSOM (growing hierarchical SOM) • Rauber et al.
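
Of the algorithms listed, K-means is the easiest to sketch. The version below uses Euclidean distance and, for reproducibility, initializes the centroids with the first k points, which is an assumption of this example; practical implementations usually use random or smarter seeding. The toy 2-D points stand in for per-track feature vectors.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain K-means with Euclidean distance.

    Centroids start at the first k points (a simplification for determinism).
    Returns (centroids, labels).
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
              for p in points]
    return centroids, labels

# Hypothetical feature vectors forming two well-separated groups.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
          (10.0, 11.0), (1.0, 0.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
```

No genre labels are used at any point; the clusters emerge only from the distances between feature vectors.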

Supervised approach • A taxonomy of genres is given • vs. expert systems • No rules (or genre descriptions) • Supervised machine learning algorithms • KNN (K-Nearest Neighbor) • GMMs (Gaussian Mixture Models) • HMMs (Hidden Markov Models) • LDA (Linear Discriminant Analysis) • SVMs (Support Vector Machines) • ANNs (Artificial Neural Networks)
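
As a minimal illustration of the supervised setting, the sketch below implements a K-nearest-neighbor classifier: the genre taxonomy is given through labelled training examples, and no hand-written rules are involved. The two-dimensional features, the genre labels, and the function name are hypothetical.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training examples.

    `train` is a list of (feature_vector, label) pairs.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical labelled feature vectors (for example, two timbre statistics).
train = [
    ((0.9, 0.8), "rock"),
    ((1.0, 0.9), "rock"),
    ((0.8, 1.0), "rock"),
    ((0.1, 0.2), "classical"),
    ((0.2, 0.1), "classical"),
    ((0.0, 0.3), "classical"),
]

prediction_a = knn_predict(train, (0.85, 0.9))
prediction_b = knn_predict(train, (0.1, 0.1))
```

The other listed algorithms differ in how they model each class, but they share this interface: labelled feature vectors in, a predicted genre out.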

Classification results • MIREX genre classification contest • 1,005 / 510 songs over ten genres • 940 / 447 songs over six genres

Future directions • Classification into perceptual categories • Moods, emotions • Novelty detection • New or unknown data (does not belong to any class) • Classification with multiple labels • Probably closer to human experience • From taxonomies to folksonomies • Does the taxonomy fit the users?

Conclusion • Definitions of music genres are convoluted • Features → classification → result • Research is evolving from purely objective machine calculations to learning-based techniques • Machine learning plays a fundamental role in classification domains

Thank You
