Music Information Retrieval Information Universe Seongmin Lim hovern@snu.ac.kr Dept. of Industrial Engineering Seoul National University
Brief history of MIR and state of research • Cross-media retrieval supporting natural language queries such as mood or melody information • Contains semantic information taken from community databases • “A Music Search Engine Built upon Audio-based and Web-based Similarity Measures” • Query by example • The query is an example that has the same representation as the items in the database • For music search: humming, recorded by cell phones or microphones • “Music Structure Based Vector Space Retrieval”
Stages of First Paper • “A Music Search Engine Built upon Audio-based and Web-based Similarity Measures”
Stage 1: Preprocessing the Collection • Using information in the ID3 tag • Artist • Album • Title • All duplicates of tracks are excluded to avoid redundancies • Live or instrumental versions of the same song are removed
Stage 2: Web-based features addition • Search on the web for • “artist” music • “artist” “album” music review • “artist” “title” music review -lyrics
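The three query patterns above can be sketched as a small helper. This is only an illustration: the function name `build_web_queries` and the sample values are hypothetical, and the `-lyrics` term is assumed to be the search engine's exclusion operator.

```python
def build_web_queries(artist, album, title):
    """Build the three web queries listed on the slide for one track.

    The ID3 fields are quoted so they are matched as exact phrases;
    "-lyrics" is assumed to exclude lyrics pages from the results.
    """
    return [
        f'"{artist}" music',
        f'"{artist}" "{album}" music review',
        f'"{artist}" "{title}" music review -lyrics',
    ]


if __name__ == "__main__":
    for query in build_web_queries("Some Artist", "Some Album", "Some Title"):
        print(query)
```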
Stage 2: Web-based features addition (2) • Every term is weighted according to the term frequency × inverse document frequency (tf×idf) function, which gives the weight w(t,m) of a term t for music piece m; N is the total number of documents.
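The exact tf×idf formula is not reproduced in this text, so the following is a minimal sketch that assumes a common log-scaled variant, w(t,m) = (1 + log2 tf(t,m)) · log2(N / df(t)). It also assumes that all pages retrieved for a music piece are merged into one document, so N equals the number of pieces. The name `tfidf_weights` is hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(piece_terms):
    """Compute tf x idf weights per music piece.

    piece_terms maps a piece id to the list of terms found on its web pages.
    Assumed variant: w(t, m) = (1 + log2 tf(t, m)) * log2(N / df(t)).
    """
    n_docs = len(piece_terms)

    # df(t): number of pieces whose term list contains t at least once.
    df = Counter()
    for terms in piece_terms.values():
        df.update(set(terms))

    weights = {}
    for piece, terms in piece_terms.items():
        tf = Counter(terms)
        weights[piece] = {
            term: (1 + math.log2(count)) * math.log2(n_docs / df[term])
            for term, count in tf.items()
        }
    return weights
```

With this variant, a term that occurs in every piece gets weight 0, because log2(N / df) is 0 when df equals N.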
Stage 3: Audio-based similarity measures • For each audio track, Mel Frequency Cepstral Coefficients (MFCCs) are computed on short-time audio segments (called frames) • Each song is represented as a Gaussian Mixture Model (GMM) of the distribution of its MFCCs • The Kullback-Leibler divergence between two songs can be calculated from the means and covariance matrices • For each track, a ranked list of similar tracks is derived from this measure
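The ranking step can be sketched as follows. To stay short and dependency-free, the sketch makes two simplifying assumptions that are not stated on the slide: each track is summarized by a single Gaussian with a diagonal covariance (a per-dimension mean and variance of its MFCCs), and the divergence is symmetrized by adding both directions. MFCC extraction itself is omitted, and all function names are hypothetical.

```python
import math


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two Gaussians with diagonal covariance."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q)
    )


def symmetric_kl(model_a, model_b):
    """Symmetrized divergence; each model is a (means, variances) pair."""
    return kl_diag_gauss(*model_a, *model_b) + kl_diag_gauss(*model_b, *model_a)


def rank_similar(track_id, models):
    """Return the other track ids, most similar (smallest divergence) first."""
    query = models[track_id]
    scored = [
        (symmetric_kl(query, model), other_id)
        for other_id, model in models.items()
        if other_id != track_id
    ]
    return [other_id for _, other_id in sorted(scored)]
```

A full-covariance model or a true mixture would need matrix operations or sampling-based approximations in place of the closed form used here.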
GMM (Gaussian Mixture Model) • A probabilistic model for representing the presence of sub-populations within an overall population • The mixture distribution represents the probability distribution of observations in the overall population
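For reference, the standard density of a GMM with K components can be written as below. The symbols (mixing weights π_k, component means μ_k, covariances Σ_k) are the usual textbook notation and do not appear on the slide.

```latex
p(x) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \qquad \sum_{k=1}^{K} \pi_k = 1
```

Each component corresponds to one sub-population, and π_k is the probability that an observation comes from it.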
Stage 4: Dimensionality Reduction • The chi-square (χ²) test is used to select the terms that best discriminate between two document sets, s and d, formed using audio similarities • A is the number of documents in s which contain t • B is the number of documents in d which contain t • C is the number of documents in s without t • D is the number of documents in d without t • N is the total number of examined documents
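The slide's formula is not reproduced in this text. A minimal sketch, assuming the standard χ² statistic for a 2×2 contingency table built from the counts A, B, C and D defined above (with N = A + B + C + D), looks like this. The helper names `chi_square` and `top_terms` are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 table of term occurrence counts.

    a: documents in s containing t    b: documents in d containing t
    c: documents in s without t       d: documents in d without t
    """
    n = a + b + c + d
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    if denom == 0:
        # A row or column is empty, so the term carries no usable evidence.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def top_terms(term_counts, k):
    """Keep the k terms with the highest chi-square score.

    term_counts maps a term to its (a, b, c, d) tuple.
    """
    ranked = sorted(term_counts, key=lambda t: chi_square(*term_counts[t]), reverse=True)
    return ranked[:k]
```

A term that occurs equally often in s and d scores 0, while a term that occurs only in one of the two sets scores highest.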
Stage 5: Vector Adaptation • Smoothing for tracks for which no related information is available
Querying the Music Search Engine • A method to find the tracks that are most similar to a natural language query • Queries to the music search engine are extended by the word “music” and sent to Google • A query vector is constructed in the feature space from the top 10 pages retrieved • Euclidean distances to the collection tracks are calculated and a relevance ranking is obtained
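The final ranking step can be sketched as below, assuming the query vector and the track vectors are already expressed in the same term feature space. Building the query vector from the retrieved pages is omitted, and the names `euclidean` and `rank_tracks` are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def rank_tracks(query_vec, track_vecs):
    """Return track ids ordered by increasing distance to the query vector."""
    return sorted(track_vecs, key=lambda tid: euclidean(query_vec, track_vecs[tid]))
```

The track with the smallest distance is treated as the most relevant result for the query.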
Evaluating the System • To evaluate on “real-world” queries, a source of phrases that people use to describe music is needed • Tags provided by AudioScrobbler are used as ground truth • 227 tags are used as test queries
Goals of the evaluation • Effect of dimensionality on the feature space • Retrieving relevant information • Effect of re-weighting of the term vectors • Effect of query expansion • Metric used: precision values at various recall levels
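The metric can be illustrated with a short sketch. It assumes the usual interpolated definition (the precision at a recall level is the best precision reached at that recall or any higher one); the slide does not say which definition the paper uses. The function name and the default recall levels are hypothetical.

```python
def precision_at_recall_levels(ranking, relevant, levels=(0.1, 0.5, 1.0)):
    """Interpolated precision of a ranked result list at given recall levels.

    ranking: track ids in the order returned by the search engine.
    relevant: ids of the tracks carrying the query tag (the ground truth).
    """
    relevant = set(relevant)
    hits = 0
    points = []  # (recall, precision) measured at each relevant result
    for position, track_id in enumerate(ranking, start=1):
        if track_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / position))

    result = {}
    for level in levels:
        reachable = [precision for recall, precision in points if recall >= level]
        result[level] = max(reachable) if reachable else 0.0
    return result
```

Averaging these values over all test queries gives one precision figure per recall level for a system configuration.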
Performance Evaluation - I • Audio-based term selection has a very positive impact on retrieval • The 2/50 setting yields the best results
Performance Evaluation - II • Effect of re-weighting using various re-weighting techniques • The impact of audio-based vector re-weighting is only marginal
System Design of Second Paper • “Music Structure Based Vector Space Retrieval”
Stage 1: MUSIC INFORMATION MODELING • Music segmentation by smallest note length • Chord modeling • Music region content modeling
Stage 2: MUSIC INDEXING AND RETRIEVAL • Harmony Event and Acoustic Event • Each song’s chord and music region information is represented as a Gaussian Mixture Model (GMM) of the distribution of MFCCs • n-gram Vector • The harmony and acoustic decoders serve as the tokenizers for the music signal • An event is represented in a text-like format
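Once a song has been decoded into a sequence of text-like event tokens, an n-gram vector can be built by counting token n-grams. The sketch below assumes the tokens are plain strings such as chord labels; the labels and the function name `ngram_vector` are hypothetical, and the decoders themselves are not shown.

```python
from collections import Counter


def ngram_vector(events, n=2):
    """Count the n-grams of a token sequence.

    events: decoded event tokens for one song, in time order.
    Returns a sparse vector mapping each n-gram tuple to its count.
    """
    # Zipping n shifted copies of the sequence yields every run of n tokens.
    shifted = (events[i:] for i in range(n))
    return Counter(zip(*shifted))


if __name__ == "__main__":
    print(ngram_vector(["C", "G", "Am", "C", "G"], n=2))
```

Songs represented this way can be compared with the same vector space techniques used for text documents.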
Summary • Natural query vs. query by example • Information from web and audio • Audio frame segmentation • KL divergence vs. vector space modeling • Analyzing audio features • Data itself vs. metadata • domain knowledge of music