Data Mining and Text Analytics in Music, by Audi Sugianto and Nicholas Tawonezvi

Presentation Transcript


  1. Data Mining and Text Analytics in Music • Audi Sugianto and Nicholas Tawonezvi

  2. Overview • Introduction • Building a ground truth set • Experiments • Results

  3. Introduction • Purpose: Music mood classification through lyric text mining approaches • MIR (Music Information Retrieval) • Use of audio datasets: • AMC (Audio Mood Classification) • USPOP, USCRAP, etc. • Use of social tags from last.fm • Challenges: • Natural subjectivity of music • Human perspectives on mood

  4. Generating Ground Truth: Data Collection • Combination of in-house and public audio tracks • Collect songs with at least one social tag from last.fm • Lyrics gathered mainly from Lyricwiki.org • Use of Lingua (Lingua::Ident, a statistical language identifier) to ensure data quality • Keep only songs that have both correct lyrics and tags

  5. Generating Ground Truth: Algorithms, Resources and Techniques • WordNet-Affect • Used to filter out junk tags • Assignment of labels to concepts (emotions, moods, responses) • Use of human expertise to identify mood-related words in the music domain • Affective Aspect • Judgemental Tags • Ambiguous Meanings • Use of WordNet to categorise tags into groups based on synonyms • Use of music experts to merge groups by musical similarity
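
The synonym-based grouping step on this slide can be sketched as follows. This is only an illustration: the slides do not say how the grouping was implemented, and the function `group_tags` and the hand-written `synonym_pairs` are hypothetical stand-ins for synonym relations that would come from WordNet.

```python
def group_tags(tags, synonym_pairs):
    """Merge tags into groups so that synonyms (directly or transitively) share a group.

    synonym_pairs is a hypothetical, hand-written list of (tag, tag) pairs;
    in the described workflow these relations would come from WordNet.
    """
    parent = {t: t for t in tags}

    def find(t):
        # Follow parent links to the group representative, compressing the path.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in synonym_pairs:
        if a in parent and b in parent:  # ignore pairs that mention unknown tags
            parent[find(a)] = find(b)

    groups = {}
    for t in tags:
        groups.setdefault(find(t), set()).add(t)
    return sorted(sorted(g) for g in groups.values())


tags = ["calm", "peaceful", "serene", "angry", "furious", "upbeat"]
pairs = [("calm", "peaceful"), ("peaceful", "serene"), ("angry", "furious")]
mood_groups = group_tags(tags, pairs)
```

The later merging of groups by musical similarity is an expert judgement and is not modelled here.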

  6. Generating Ground Truth: Selecting Songs • Approaches: • Tag identification • Lyric counts • Multi-label Classification

  7. Mood Categories and Song Distributions

  8. Experiments: Evaluation Measures and Classifiers • Use of 10-fold cross-validation • Break the data into 10 subsets of size n/10 • Train on 9 subsets and test on the remaining 1 • Repeat 10 times and take the mean accuracy • Classification with Support Vector Machines (SVM) • Algorithms to analyse data and recognise patterns
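
The 10-fold procedure above can be sketched in plain Python. The helper names (`k_fold_indices`, `cross_validate`, `majority_baseline`) are hypothetical, and the learner is a trivial majority-class baseline used only to keep the example self-contained. It is not an SVM; in practice an SVM trainer would be passed in as `train_fn`.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(features, labels, train_fn, k=10, seed=0):
    """Mean accuracy over k folds (assumes at least k examples).

    train_fn(train_X, train_y) must return a predict(x) callable.
    """
    accuracies = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train_X = [x for i, x in enumerate(features) if i not in held]
        train_y = [y for i, y in enumerate(labels) if i not in held]
        predict = train_fn(train_X, train_y)  # train on the other k-1 folds
        correct = sum(predict(features[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))  # test on the held-out fold
    return sum(accuracies) / k


def majority_baseline(train_X, train_y):
    """Placeholder learner (not an SVM): always predicts the most common label."""
    most_common = Counter(train_y).most_common(1)[0][0]
    return lambda x: most_common
```

Each example is held out exactly once, so the mean over the 10 folds uses every song for testing without ever testing on a song the model was trained on.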

  9. Experiments: Lyric Preprocessing • Facts: • Repetitions of words and sections: • Lack of verbatim transcripts • Consisting of sections: • Intro, interlude, verse, etc. in the annotations • Notes about song and instrumentation • Possible solution: • Identifying and converting repetition and annotation patterns to actual repeated segments
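
A minimal sketch of the proposed solution, expanding repetition annotations into actual repeated text. The annotation formats handled here (`[Section]` headers, `[Repeat Section]` references and a trailing `(xN)` marker) are assumptions chosen for illustration; the slides do not list the patterns that were actually converted, and `expand_lyrics` is a hypothetical helper.

```python
import re

SECTION = re.compile(r"^\[(.+?)\]$")          # e.g. "[Chorus]" or "[Repeat Chorus]"
REPEAT = re.compile(r"^(.*?)\s*\(x\s*(\d+)\)$", re.IGNORECASE)  # e.g. "Oh yeah (x2)"


def expand_lyrics(raw):
    """Return lyric lines with section references and (xN) markers written out."""
    sections = {}   # section name -> expanded lines seen under that header
    current = None  # name of the section currently being recorded
    out = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        header = SECTION.match(line)
        if header:
            name = header.group(1).strip().lower()
            if name.startswith("repeat "):
                # Replace the reference with the stored lines of that section.
                out.extend(sections.get(name[len("repeat "):], []))
                current = None
            else:
                current = name
                sections[current] = []
            continue  # annotation lines themselves are dropped
        m = REPEAT.match(line)
        expanded = [m.group(1)] * int(m.group(2)) if m else [line]
        out.extend(expanded)
        if current is not None:
            sections[current].extend(expanded)
    return out


raw = "[Verse 1]\nI walk alone\n[Chorus]\nOh yeah (x2)\n[Repeat Chorus]"
lines = expand_lyrics(raw)
```

Expanding repetitions before feature extraction means word counts reflect what is actually sung rather than what the transcriber abbreviated.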

  10. Experiments: Lyrics Features • Common text classification features: • Bag-of-words (BOW) • Collection of unordered words • Part-of-Speech (POS) • Use of Stanford Tagger • Function words (the, a, etc.) • Assigning of values: • Frequency • Tf-idf weight • Normalised frequency • Boolean value
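
The four value assignments listed above can be illustrated for bag-of-words features as follows. This is a sketch under stated assumptions: whitespace tokenisation, and tf-idf computed as raw term count times log(N / document frequency), which is one common variant and not necessarily the weighting used in the experiments. The helper name `bow_features` is hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Assumed tokeniser: lowercase and split on whitespace."""
    return text.lower().split()


def bow_features(docs):
    """Compute four bag-of-words representations for each document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for c in counts for term in c)
    features = []
    for c in counts:
        total = sum(c.values())
        features.append({
            "frequency": dict(c),
            "boolean": {t: 1 for t in c},
            "normalised": {t: f / total for t, f in c.items()},
            "tfidf": {t: f * math.log(n_docs / doc_freq[t]) for t, f in c.items()},
        })
    return features


docs = ["love love me", "hate me"]
feats = bow_features(docs)
```

In this toy corpus "me" occurs in every document, so its tf-idf weight is zero, while "love" is weighted up because it is frequent in one lyric and absent from the other.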

  11. Experiments: Stemming • Stemming: merging words with the same morphological roots • Snowball Stemmer • Irregular nouns and verbs as inputs
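
One way to read the last bullet is that irregular noun and verb forms are looked up in a table before the stemmer is applied. The sketch below shows that idea only. The `IRREGULAR` table is a tiny hypothetical sample, and `toy_stem` is a deliberately crude suffix stripper standing in for the Snowball stemmer, whose rules are far more careful.

```python
# Hypothetical sample of irregular forms mapped to their base forms.
IRREGULAR = {
    "went": "go",
    "gone": "go",
    "sang": "sing",
    "children": "child",
    "feet": "foot",
}


def toy_stem(word):
    """Crude stand-in for a real stemmer: strip one common suffix if at least 3 letters remain."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalise(word, stem=toy_stem):
    """Map irregular forms via the lookup table; stem everything else."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    return stem(word)


examples = [normalise(w) for w in ["Went", "children", "walking", "walked", "songs"]]
```

A rule-based stemmer cannot relate "went" to "go", which is why a lookup table for irregular forms is a natural complement to it.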

  12. Results • Text categorisation provides dimensionality and good generalisability • POS Boolean representation is poorer because of high content of POS types in lyrics • Content words are more useful in mood classification • 10th International Society for Music Information Retrieval Conference (ISMIR 2009)

  13. Acknowledgement • Hu, X. et al. 2009. Lyric Text Mining in Music Mood Classification. International Music Information Retrieval Systems Evaluation Laboratory, University of Illinois at Urbana-Champaign. [Online]. pp. 411-416. [Accessed 6 December 2013]. Available from: http://ismir2009.ismir.net/proceedings/PS3-4.pdf • Training and Testing Data Sets. 2013. Training and Testing Data Sets. [Online]. [Accessed 5 December 2013]. Available from: http://technet.microsoft.com/en-us/library/bb895173.aspx • Kohavi, Ron (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence 2 (12): 1137–1143. (Morgan Kaufmann, San Mateo, CA) • D. Ellis, A. Berenzweig, and B. Whitman: The USPOP2002 Pop Music Data Set. Available from: http://labrosa.ee.columbia.edu/projects/musicsim/uspop2002.html

  14. Software & Additional Resources • http://www.music-ir.org/mirex/2007/index.php/AMC • http://en.wikipedia.org/wiki/MoodLogic • http://search.cpan.org/search%3fmodule=Lingua::Ident - Statistical language identifier • http://snowball.tartarus.org/ • http://www.englishpage.com/irregularverbs/irregularverbs.htm - irregular verb list • http://www.esldesk.com/eslquizzes/irregular-nouns/irregular-nouns.htm - irregular noun list • http://nlp.stanford.edu/software/tagger.shtml - http://www.music-ir.org/mirex/2007/abs/AI_CC_GC_MC_AS_tzanetakis.pdf - POS Tagger • http://www.music-ir.org/archive/figs/18moodcat.htm - Mood Categories & Song Distributions • http://www.originlab.com/index.aspx?go=Products/Origin/Statistics/Nonparametric Tests&pid=1087 - Performance identifier
