
MIR: Status and Trends (The Current State and Future of Music Information Retrieval)


Presentation Transcript


  1. MIR: Status and Trends (The Current State and Future of Music Information Retrieval) J.-S. Roger Jang (張智星) Multimedia Information Retrieval Lab CS Dept., Tsing Hua Univ., Taiwan http://www.cs.nthu.edu.tw/~jang

  2. Outline • Intro. to music information retrieval (MIR) • Our work on MIR (with demos) • Query by singing/humming (QBSH) • Singing voice separation • Conclusions

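  3. Types of MIR Systems • Text-based MIR • Text input • Song title, singer, lyrics, lyricist, composer • Metadata: genre, mood, cover versions • Content-based MIR • Symbolic input • Music score info: notes, beats, chords, etc. • Acoustic input • By example: the original recording as input • By humans: singing/humming, whistling, tapping, drum sounds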

  4. Span of MIR Research • Content analysis • Audio music • Low-level feature extraction • High-level feature representation • Symbolic music • High-level feature representation • Retrieval methods • Text-based information retrieval • Data clustering • Pattern recognition • Distance measures

  5. MIR Methods for Audio Music • Audio features • Low-level features • MFCC, spectral flux, rolloff freq, … • High-level features • Pitch, onset, beat, tempo, chord, key, … • Vocal extraction • Others • Collaborative filtering • Retrieval methods • Clustering • K-means, VQ, hierarchical clustering • Classification • SVM, GMM, LSA, HMM, ANN… • Distance measure • DTW, KL, cosine similarity, edit distance • Others: Learning to rank

  6. MIR Major Events • ISMIR/MIREX • Int. Sym. on Music Information Retrieval, since 2000 • Music Information Retrieval Evaluation eXchange, since 2005 • ICMC • Int. Computer Music Conference, since 1974 • ICASSP • Int. Conf. on Acoustics, Speech, and Signal Processing, since 1976

  7. ISMIR Growth: 2000-2009

  8. ISMIR Locations • 2000, Plymouth • 2001, Bloomington • 2002, Paris • 2003, Baltimore • 2004, Barcelona • 2005, London • 2006, Victoria • 2007, Vienna • 2008, Philadelphia • 2009, Kobe

  9. State-of-the-Art MIR: Tasks at MIREX • Audio music • High-level feature identification • Audio onset detection • Audio beat tracking • Audio tempo extraction • Audio key detection • Audio chord estimation • Multiple fundamental frequency estimation & tracking • Audio structural segmentation • Classification • Artist • Genre • Mood • Retrieval • Audio cover song identification • Audio tag classification • Audio music similarity and retrieval • Alignment • Real-time audio-to-score alignment (a.k.a. score following) • Symbolic music • Symbolic melodic similarity • Symbolic music similarity and retrieval • Hybrid • Query by singing/humming • Query by tapping

  10. MIREX: 2005 - 2008

  11. Our Work on MIR • QBSH: Query by Singing/Humming • Singing voice separation • Audio melody extraction

  12. Introduction to QBSH • QBSH: Query by Singing/Humming • Input: Singing or humming from a microphone • Output: A ranked list retrieved from the song database • Overview • First paper: around 1994 • Extensive studies since 2001 • State of the art: QBSH tasks at ISMIR/MIREX

  13. Challenges in QBSH Systems • Reliable pitch tracking for acoustic input • Input from mobile devices or noisy karaoke bar • Song database preparation • MIDIs, singing clips, or audio music • Efficient/effective retrieval • Karaoke machine: ~10,000 songs • Internet music search engine: ~500,000,000 songs

  14. QBSH: Goal and Approach • Goal: To retrieve songs effectively within a given response time, say 5 seconds or so • Our strategy • Multi-stage progressive filtering • Indexing for different comparison methods • Repeating pattern identification
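The multi-stage progressive filtering idea on this slide can be illustrated with a minimal sketch. It assumes each stage is a pair of a distance function and a number of candidates to keep, ordered from cheap to expensive; the names `progressive_filter`, `range_gap`, and `l1_distance` and the toy pitch vectors are hypothetical and not taken from the talk, and the sketch does not attempt to choose the per-stage survival counts optimally.

```python
def progressive_filter(query, songs, stages):
    """Narrow down candidates stage by stage.

    stages: list of (distance_fn, keep) pairs, ordered cheap to expensive.
    Each stage ranks the surviving candidates and keeps the best `keep`.
    """
    candidates = list(songs)
    for distance_fn, keep in stages:
        candidates.sort(key=lambda song: distance_fn(query, song))
        candidates = candidates[:keep]
    return candidates


def range_gap(q, s):
    """Cheap filter (illustrative): difference in pitch range."""
    return abs((max(q) - min(q)) - (max(s) - min(s)))


def l1_distance(q, s):
    """More expensive comparison (illustrative): point-wise L1 distance."""
    return sum(abs(a - b) for a, b in zip(q, s))


# Toy database of pitch vectors in semitones (hypothetical data).
songs = [[60, 62, 64, 62], [60, 60, 60, 60], [60, 64, 62, 60], [48, 72, 48, 72]]
query = [60, 62, 64, 62]
result = progressive_filter(query, songs, [(range_gap, 2), (l1_distance, 1)])
```

The cheap stage discards most of the database, so the expensive comparison only runs on the few survivors, which is what makes a fixed response time plausible for large collections.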

  15. Flowchart of QBSH • Two steps • Pitch tracking • Comparison methods

  16. Frame Blocking for Pitch Tracking • 256 points/frame • 84 points overlap • 11025/(256-84) ≈ 64 pitch values/sec • [Figure: zoomed-in waveform showing consecutive frames and their overlap]
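The frame arithmetic on this slide can be checked with a short sketch. It assumes that 11025 is the sample rate in Hz and that one pitch value is produced per frame; the function names are illustrative only.

```python
def frame_starts(num_samples, frame_size=256, overlap=84):
    """Return the start index of every full frame in the signal."""
    hop = frame_size - overlap  # 172 samples between consecutive frame starts
    return list(range(0, num_samples - frame_size + 1, hop))


def pitch_rate(sample_rate=11025, frame_size=256, overlap=84):
    """Pitch values per second, assuming one pitch value per frame."""
    return sample_rate / (frame_size - overlap)


starts = frame_starts(11025)  # frames that fit in one second of audio
rate = pitch_rate()           # about 64.1, which the slide rounds to 64
```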

  17. ACF: Auto-correlation Function • Frame: s(n) • Shifted frame: s(n-h), shown for h = 30 • acf(30) = inner product of the overlapping part = dot(s(30:256), s(1:227)) • Pitch period: the shift h at which acf(h) peaks (30 in this example)
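A minimal sketch of autocorrelation-based pitch estimation for a single frame follows. It uses 0-based Python indexing, so the index ranges differ from the MATLAB-style expression on the slide. The lag search range (11 to 134 samples, roughly 1047 Hz down to 82 Hz at an assumed 11025 Hz sample rate) and the synthetic sine frame are assumptions made for illustration.

```python
import math


def acf(frame, lag):
    """Inner product of the frame with a copy of itself shifted by `lag`."""
    return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))


def pitch_period(frame, min_lag, max_lag):
    """Lag within [min_lag, max_lag] at which the autocorrelation is largest."""
    return max(range(min_lag, max_lag + 1), key=lambda h: acf(frame, h))


# Synthetic 256-point frame: a sine wave with a period of 30 samples.
frame = [math.sin(2 * math.pi * n / 30) for n in range(256)]
period = pitch_period(frame, 11, 134)
freq = 11025 / period  # pitch in Hz under the assumed sample rate
```

Restricting the lag range keeps the search away from the trivial peak at lag 0. Because the overlap shrinks as the lag grows, the unnormalized sum in this example is larger at the first period than at its multiples.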

  18. Frequency to Semitone Conversion • Semitone: a musical scale based on A440 • Reasonable pitch range: • E2 - C6 • 82 Hz - 1047 Hz
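The conversion formula itself did not survive in this transcript. A common choice, assumed here, is the MIDI convention in which A440 maps to semitone 69 and each octave spans 12 semitones. Under that assumption the range E2 to C6 corresponds to semitones 40 to 84.

```python
import math


def freq_to_semitone(freq_hz):
    """Semitone number under the assumed MIDI convention (A440 -> 69)."""
    return 69 + 12 * math.log2(freq_hz / 440.0)


def semitone_to_freq(semitone):
    """Inverse mapping: semitone number back to frequency in Hz."""
    return 440.0 * 2 ** ((semitone - 69) / 12)


low = freq_to_semitone(82.41)     # E2, close to 40
high = freq_to_semitone(1046.5)   # C6, close to 84
```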

  19. Example of Pitch Tracking

  20. Typical Result of Pitch Tracking Pitch tracking via autocorrelation for 茉莉花 (Jasmine Flower)

  21. Comparison of Pitch Vectors Yellow line : Target pitch vector

  22. Linear Scaling (LS) • Scale the query linearly to match the candidate • A typical example of linear scaling

  23. Linear Scaling (LS) • Characteristics • One-shot for dealing with key transposition • Efficient and effective • Some indexing methods • Cannot deal with large tempo variations • #1 method for task 2 in QBSH/MIREX 2006 • Typical mapping path
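Linear scaling as described on these two slides can be sketched as follows. The set of scaling factors, the use of linear interpolation for resampling, the mean shift for one-shot key transposition, and the L1 distance are all assumptions chosen for illustration, and the function names are hypothetical rather than the author's.

```python
def resample(pitch, new_len):
    """Linearly interpolate a pitch vector to new_len points."""
    if new_len == 1:
        return [pitch[0]]
    out = []
    for i in range(new_len):
        pos = i * (len(pitch) - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, len(pitch) - 1)
        frac = pos - lo
        out.append(pitch[lo] * (1 - frac) + pitch[hi] * frac)
    return out


def mean(xs):
    return sum(xs) / len(xs)


def linear_scaling_distance(query, target,
                            factors=(0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0)):
    """Smallest mean L1 distance over a set of tempo scaling factors."""
    best = float("inf")
    for f in factors:
        n = round(len(query) * f)
        if n < 2 or n > len(target):
            continue  # scaled query would be degenerate or longer than the target
        scaled = resample(query, n)
        segment = target[:n]  # compare against the beginning of the target
        shift = mean(segment) - mean(scaled)  # one-shot key transposition
        dist = sum(abs(s + shift - t) for s, t in zip(scaled, segment)) / n
        best = min(best, dist)
    return best


target = [60 + i for i in range(9)]        # rising line, one semitone per frame
query = [65 + 2 * i for i in range(5)]     # same line, faster and 5 semitones higher
d_match = linear_scaling_distance(query, target)
d_other = linear_scaling_distance([73, 71, 69, 67, 65], target)
```

Each scaling factor costs only one resampling and one point-wise comparison, which is why the method is efficient. The fixed factor set is also why large or non-uniform tempo variations are not handled.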

  24. DTW Path of “Match Beginning”

  25. DTW Path of “Match Anywhere”

  26. DTW Path of “Match Anywhere”
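The difference between the "match beginning" and "match anywhere" DTW paths can be shown in a minimal sketch. It assumes an absolute-difference local cost, the basic three-way step pattern, and a free end point in the target; key transposition is not handled. None of these details are confirmed by the slides.

```python
def dtw_distance(query, target, anywhere=False):
    """DTW cost of aligning the whole query to part of the target.

    anywhere=False: the path must start at the first target frame.
    anywhere=True:  the path may start at any target frame.
    In both cases the path may end at any target frame.
    """
    inf = float("inf")
    n, m = len(query), len(target)
    # D[i][j]: minimum cost of a path ending at the pair (query[i-1], target[j-1]).
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0  # row 0 marks the allowed starting points
    if anywhere:
        for j in range(m + 1):
            D[0][j] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - target[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return min(D[n][1:])


target = [60, 62, 64, 65, 67, 69, 71, 72]
middle = [64, 65, 65, 67]  # query taken from the middle of the target
d_anywhere = dtw_distance(middle, target, anywhere=True)
d_beginning = dtw_distance(middle, target)
```

Freeing the start point lets a query from the middle of a song align at no cost, while "match beginning" penalizes it because the path is forced to start at the first target frame.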

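  27. QBSH at MIREX 2006 • Competition format: the organizers test the recognition performance of each participating team's code. Teams came from around the world, including Australia, Germany, France, Finland, Taiwan, Uruguay, the Netherlands, and China. • Corpus: • The sung/hummed test data consist of 2797 wav files (8 seconds long, 8 kHz/8-bit), recorded by 118 people, covering 48 children's songs, and freely downloadable. • The song database contains 2048 monophonic MIDI files; apart from the 48 children's songs above, the remaining songs were supplied by the organizers and are not public. • Evaluation tasks: • Use the 2797 wav files as queries to retrieve from the 2048 MIDI files: the metric is mean reciprocal rank, and we achieved 0.883 (third place among 13 participating teams worldwide) • Use the 2797 wav files as queries to retrieve from the other 2797 wav files: the metric is mean precision, and we achieved 0.926 (first place among 10 participating teams worldwide)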
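Mean reciprocal rank, the metric used in the first task, averages 1/rank of the correct song over all queries. The sketch below assumes that a query whose correct song is not retrieved contributes zero; the ranks are made-up values for illustration, not MIREX data.

```python
def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the correct song per query, or None if not retrieved."""
    return sum(1.0 / r for r in ranks if r) / len(ranks)


# Three hypothetical queries whose correct songs were ranked 1st, 2nd, and 4th.
mrr = mean_reciprocal_rank([1, 2, 4])  # (1 + 1/2 + 1/4) / 3
```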

  28. Demos of QBSH • Real-time pitch tracking demo • SAP toolbox (http://mirlab.org/jang/matlab/toolbox/sap) • goPtbyAcf.mdl • Demo of QBSH • http://mirlab.org/new/mir_products.asp#miracle • Most successful QBSH application • http://www.midomi.com

  29. Singing Voice Separation • Characteristics • Easier on karaoke stereo songs • Harder for monaural polyphonic songs • Important step for a number of MIR applications • Demo clips • http://sites.google.com/site/unvoicedsoundseparation/

  30. On-going Research at AIST, Japan • Systems for listening to singing voices • LyricSynchronizer: Automatic sync. of lyrics with polyphonic music recordings • Singer ID: Singer identification • MiruSinger: Singing skill visualization/training • Hyperlinking Lyrics: Creating hyperlinks between phrases in song lyrics • Breath Detection: Automatic detection of breath sounds in unaccompanied singing voice

  31. On-going Research at AIST, Japan (II) • Systems for music information retrieval based on singing voices • VocalFinder: Music information retrieval based on singing voice timbre • Voice Drummer: Music notation of drums using vocal percussion input • Systems for singing synthesis • SingBySpeaking: Speech-to-singing synthesis • VocaListener: Singing-to-singing synthesis

  32. The Grand Challenges of MIR • Polyphonic audio music transcription • Analogous to the problem of image understanding over semitranslucent overlaid images • As difficult as telling whether a turtle or a frog swam by just from watching the ripples on the water

  33. Conclusions • MIR research is on the rise! • MIR research over audio music (which accounts for 86% of MIREX tasks from 2005 to 2008) • High-level feature identification • Applications to genre/mood/tag classification/retrieval • Preexisting approaches shed light on MIR. • Speech recognition/synthesis • Text information retrieval • Music theory

  34. References • J. S. Downie, D. Byrd, and T. Crawford, "Ten Years of ISMIR: Reflections on Challenges and Opportunities", Keynote talk, ISMIR 2009, Kobe. • M. A. Casey, R. Veltkamp, M. Goto, M. Leman, C. Rhodes, and M. Slaney, "Content-Based Music Information Retrieval: Current Directions and Future Challenges", Proceedings of the IEEE, Vol. 96, No. 4, April 2008. • J.-S. R. Jang and H.-R. Lee, "A General Framework of Progressive Filtering and Its Application to Query by Singing/Humming", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 16, No. 2, pp. 350-358, Feb. 2008. • Z.-S. Chen and J.-S. R. Jang, "On the Use of Anti-word Models for Audio Music Annotation and Retrieval", IEEE Transactions on Audio, Speech, and Language Processing, 2009. • C.-L. Hsu and J.-S. R. Jang, "On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR-1K Dataset", IEEE Transactions on Audio, Speech, and Language Processing, 2009. • M. Goto, T. Saitou, T. Nakano, and H. Fujihara, "Singing Information Processing Based on Singing Voice Modeling", ICASSP 2010, pp. 5506-5509.
