This paper presents a novel text-based search engine focused on lyrics, addressing the challenges of traditional search methods that rely on artist name, song title, and the lyrics themselves. It explores the need for thematic searches, allowing users to find songs based on themes not explicitly stated in the lyrics. Utilizing Probabilistic Latent Semantic Analysis (PLSA) and WordNet for query expansion, we enhance search capabilities and improve retrieval performance. Through comprehensive data collection and evaluation, we demonstrate the effectiveness of our approach and suggest future work on expanding the methodologies.
LyricSearch: A Text-Based Search Engine for Lyrics R96921033 徐兆良 R96942052 林佑璟
Outline • Introduction • System description • Data collection • Query Expansion – PLSA, WordNet • Evaluation • Conclusion • Future Work
Introduction • Traditional lyrics search methods: • Artist • Title of song • Lyrics • What if the user wants a playlist for a wedding? • Search by theme • What if the query terms do not appear in the lyrics? • The main issue is how to connect a song's theme with its lyrics
Framework • Indexing: Lyrics dataset → Text pre-processing (stemming & stop-word removal) → Model creation (vocabulary, term inverted file) • Retrieval: Query (theme) → Text pre-processing → Query expansion (PLSA or WordNet) → Text retrieval (Okapi BM25) → Ranked song list • Evaluation: Ranked song list + ground truth → Result evaluation → Evaluation scores
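The retrieval stage of the framework can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes lyrics are already stemmed and stop-word filtered, uses the commonly cited defaults k1 = 1.2 and b = 0.75, and uses the non-negative IDF variant log((N - df + 0.5) / (df + 0.5) + 1). The song IDs and tokens are made up.

```python
import math
from collections import Counter, defaultdict

def build_index(docs):
    """Build an inverted file: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    doc_len = {}
    for doc_id, tokens in docs.items():
        doc_len[doc_id] = len(tokens)
        for term, tf in Counter(tokens).items():
            index[term][doc_id] = tf
    return index, doc_len

def bm25_rank(query, index, doc_len, k1=1.2, b=0.75):
    """Score every document containing a query term with Okapi BM25."""
    n_docs = len(doc_len)
    avg_len = sum(doc_len.values()) / n_docs
    scores = defaultdict(float)
    for term in query:
        postings = index.get(term, {})
        if not postings:
            continue  # term absent from the vocabulary
        df = len(postings)
        # Assumption: non-negative IDF variant (adds 1 inside the log).
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        for doc_id, tf in postings.items():
            norm = k1 * (1 - b + b * doc_len[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / (tf + norm)
    # Highest score first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical pre-processed lyrics (illustrative only).
songs = {
    "song_a": ["love", "wed", "ring", "love"],
    "song_b": ["rain", "road", "alone"],
    "song_c": ["ring", "bell", "church"],
}
index, doc_len = build_index(songs)
ranking = bm25_rank(["wed", "ring"], index, doc_len)
```

Songs that share no term with the (expanded) query never enter the ranking, which is why query expansion matters when theme words are absent from the lyrics.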
Data Collection • Data collection • Annotations from AMG • Lyrics from Web Lyrics Search Engine • Data statistics • 500 albums • 3267 songs • 67 themes
WordNet • A large lexical database of English • Synonyms • Hypernyms • Antonyms • Query expansion • Find the synsets of the query terms • Example: regret → repent, rue, ruefulness, sorrow
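Synonym-based expansion can be sketched as below. To stay self-contained, the sketch uses a tiny hand-written table in place of WordNet; only the "regret" entry comes from the slide, and `TOY_SYNSETS`, `expand_query`, and `max_per_term` are hypothetical names. With NLTK and its WordNet corpus installed, the lookup could instead collect `lemma_names()` from `nltk.corpus.wordnet.synsets(term)`.

```python
# Toy stand-in for WordNet synsets (assumption: not the real database).
TOY_SYNSETS = {
    "regret": ["repent", "rue", "ruefulness", "sorrow"],
}

def expand_query(terms, synsets, max_per_term=None):
    """Return the original terms followed by their synonyms, without duplicates."""
    expanded = []
    seen = set()
    for term in terms:
        # Slicing with None keeps all synonyms.
        candidates = [term] + synsets.get(term, [])[:max_per_term]
        for word in candidates:
            if word not in seen:
                seen.add(word)
                expanded.append(word)
    return expanded

expanded = expand_query(["regret"], TOY_SYNSETS)
```

Terms without an entry pass through unchanged, so the expanded query always contains the original query.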
PLSA - Query Expansion • Find the top K most similar terms (KNN) • Fast search: KD-tree • [Figure: example terms sun, sky, fly; axis label P(z|w_i)]
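The nearest-term lookup can be sketched as follows, assuming each term is represented by its topic vector P(z|w) and that similarity is Euclidean distance (the slides do not state the metric). For brevity this sketch uses a brute-force scan rather than the KD-tree named on the slide; a KD-tree would return the same neighbors faster on a large vocabulary. The probability values are invented for illustration.

```python
import math

# Hypothetical P(z|w) vectors over 3 latent topics (made-up values).
TOPIC_GIVEN_TERM = {
    "sun": [0.70, 0.20, 0.10],
    "sky": [0.65, 0.25, 0.10],
    "fly": [0.55, 0.30, 0.15],
    "cry": [0.10, 0.15, 0.75],
    "tear": [0.05, 0.20, 0.75],
}

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_terms(query_term, vectors, k):
    """Return the k terms whose topic vectors are closest to the query term's."""
    target = vectors[query_term]
    distances = [
        (euclidean(target, vec), term)
        for term, vec in vectors.items()
        if term != query_term  # do not return the query itself
    ]
    distances.sort()
    return [term for _, term in distances[:k]]

neighbors = nearest_terms("sun", TOPIC_GIVEN_TERM, k=2)
```

The returned neighbors would then be appended to the query before retrieval.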
Evaluation – Query Expansion (1/2) • Average precision (AP) of each query (67 themes in total)
Evaluation – Query Expansion (2/2) • Mean average precision (MAP): PLSA expansion vs. random expansion
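The two scores can be computed as in the sketch below, which uses the standard definitions: AP averages the precision at each rank where a relevant song appears, and MAP averages AP over all queries. The function names and the example rankings are illustrative assumptions, not the evaluation code behind the slides.

```python
def average_precision(ranked, relevant):
    """AP: mean of precision@rank over the ranks of relevant items."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant items that were never retrieved contribute zero.
    return precision_sum / len(relevant)

def mean_average_precision(rankings, ground_truth):
    """MAP: mean AP over every query (theme) in the ground truth."""
    aps = [average_precision(rankings[q], ground_truth[q]) for q in ground_truth]
    return sum(aps) / len(aps)

# Hypothetical rankings and ground truth for two themes.
rankings = {"wedding": ["a", "b", "c", "d"], "regret": ["x", "a"]}
ground_truth = {"wedding": {"a", "c"}, "regret": {"a"}}
map_score = mean_average_precision(rankings, ground_truth)
```

Running the same function on rankings from PLSA expansion and from random expansion gives the two MAP values being compared.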
PLSA result • The top 10 words in the latent topics of PLSA • [Figure: panels "Lyrics" and "General Articles"; label P(w|z_i)]
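For reference, the quantity P(w|z_i) used here and the quantity P(z|w_i) used for query expansion are related by Bayes' rule. The sketch below assumes the standard symmetric PLSA parameterization with a topic prior P(z), which the slides do not spell out:

```latex
P(d, w) = \sum_{k} P(z_k)\, P(d \mid z_k)\, P(w \mid z_k),
\qquad
P(z_k \mid w_i) = \frac{P(w_i \mid z_k)\, P(z_k)}{\sum_{k'} P(w_i \mid z_{k'})\, P(z_{k'})}
```

Under this assumption, the term vectors searched during expansion follow directly from the per-topic word distributions.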
Conclusion • Most terms used in theme annotations do not appear in the corresponding lyrics • Query expansion is necessary • Query expansion with PLSA can improve the performance of lyrics search • Lyrics are often short and repetitive, so each song's lyrics contain few meaningful terms • The latent topics learned by PLSA are not clearly distinct from one another • The performance of PLSA is not yet good enough
Future Work • Use different expansion methods and compare the evaluation results with PLSA • WordNet • WordNet + PLSA • Others? • Lyrics expansion • WordNet