
WP4 – Sound Object Representation

Presentation Transcript


  1. WP4 – Sound Object Representation
Enabling Access to Sound Archives through Integration, Enrichment and Retrieval

  2. Introduction to Workpackage – Overview
• Objectives:
  • How to represent audio for the purposes of efficient querying.
  • Segmentation of audio streams.
  • Distinct objects may then be recognized using musical instrument identification and speaker identification techniques.
  • Identification of higher-level features:
    • Speech-related: gender, emotion, laughter and language
    • Music-related: tempo, beat detection, rhythm…
• Tasks:
  • T4.1 Audio stream segmentation – speech/music separation…
  • T4.2 Source separation – instrument identification, speaker identification
  • T4.3 Sound object identification
  • T4.5 Transcription:
    • Music transcription
    • High-level speech phonetics and characteristics

  3. Deliverables and Milestones
• Deliverables:
  • D4.1 Prototype segmentation, separation and speaker/instrument identification system (Month 14)
  • D4.2 Prototype transcription system (Month 27)
  • D4.3 Final report on sound object representations (Month 30)
• Milestones and expected results:
  • M4.1 – Month 6: Speech/music separation methods implemented and tested
  • M4.2 – Month 10: Initial results on identification of sound objects; prototype segmenter and separator
  • M4.3 – Month 18: Identification of speech characteristics from segmented, separated audio streams
  • M4.4 – Month 24: Transcription of monophonic music from segmented, separated audio streams
  • M4.5 – Month 28: Testing and evaluation of complete system

  4. Workpackage Progress – Speech Related
• Prototype for speaker segmentation is ready.
• Preliminary prototype for speaker identification (SID) is ready.
• Pre-processing module implemented for ED and SID: an energy-based voice activity detector (a minimal sketch follows this slide).
• ED and laughter DLL is ready (NICE's API).
• LID algorithm evaluated on a UK English corpus, achieving over 85% accuracy.
• Trained on a testbed representing at least 10 (European) languages.
• Ongoing research on speaker identification (outlier detection and exclusion; how to deal with multiple speakers?).
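
As an illustration of the kind of energy-based voice activity detection mentioned above, here is a minimal Python/NumPy sketch; the frame size, hop size, noise-floor heuristic and threshold factor are illustrative assumptions, not the project's actual parameters.

```python
import numpy as np

def energy_vad(signal, sample_rate, frame_ms=25, hop_ms=10, threshold_factor=1.5):
    """Flag frames as speech when their short-term energy exceeds an
    adaptive threshold derived from the quietest frames.
    `signal` is a mono float array."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)

    # Short-term energy of each frame.
    energies = np.array([
        np.sum(signal[i * hop_len : i * hop_len + frame_len] ** 2)
        for i in range(n_frames)
    ])

    # Treat the quietest 10% of frames as background noise and place the
    # decision threshold a fixed factor above their mean energy.
    noise_floor = np.mean(np.sort(energies)[: max(1, n_frames // 10)])
    threshold = threshold_factor * noise_floor

    return energies > threshold  # boolean speech/non-speech mask per frame
```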

  5. Contributions and Connections with Other Workpackages
• This WP provides many inputs to other WPs and relies on a few outputs from other WPs.
• WP2:
  • The sound objects extracted in WP4 populate the ontology devised in WP2.
• WP3:
  • Sound object recognition used to enable enhanced retrieval:
    • Retrieval of speakers
    • Retrieval of key speech and music features
• WP5:
  • Sound objects used both in archiving and as access tools:
    • Source separation
    • Audio enhancement

  6. Upcoming Work Plan, Months 12-24 – Speech Related
• Speaker identification:
  • Retrieval of speakers (for use in WP3)
  • Research on outlier detection and exclusion
  • Research on new scoring methods
  • How to deal with multiple targets in speaker identification?
• ED, laughter and gender:
  • VAMP API
  • Ongoing research on robust methods
• LID:
  • Build a robust model for UK English and implement it.

  7. Demonstration: Speaker Identification

  8. Demonstration: Speaker Segmentation

  9. Music Transcription
• Reasonable detection accuracy in:
  • Onset detection (a minimal sketch follows this slide)
  • Tempo detection
  • Key detection
  • Monophonic pitch detection
• Unsolved or unexplored research areas:
  • Ornamentation detection
  • Time signature detection
  • Segmentation:
    • Bar line detection
    • Music structure detection
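
To make the onset-detection item concrete, here is a toy spectral-flux onset detector, a common textbook approach rather than the method actually used in the workpackage; all parameter values are illustrative assumptions.

```python
import numpy as np

def spectral_flux_onsets(signal, sample_rate, frame_len=2048, hop_len=512, delta=0.1):
    """Toy spectral-flux onset detector: frames whose magnitude spectrum
    grows sharply relative to the previous frame are onset candidates."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop_len

    # Magnitude spectrogram (one row per frame).
    mags = np.array([
        np.abs(np.fft.rfft(window * signal[i * hop_len : i * hop_len + frame_len]))
        for i in range(n_frames)
    ])

    # Half-wave rectified frame-to-frame spectral difference, normalised.
    flux = np.sum(np.maximum(mags[1:] - mags[:-1], 0.0), axis=1)
    flux /= flux.max() + 1e-12

    # Keep local maxima above the threshold; report onset times in seconds.
    return [
        (i + 1) * hop_len / sample_rate
        for i in range(1, len(flux) - 1)
        if flux[i] > delta and flux[i] > flux[i - 1] and flux[i] >= flux[i + 1]
    ]
```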

  10. Music Transcription: Ornamentation Detection
[Figure: ornament types – roll, cut, strike]
Gainza, M. and E. Coyle. Automating Ornamentation Transcription. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '07).

  11. Music Transcription: Time Signature Detection
• Music is highly repetitive: choruses, phrases, bars…
• The method utilises a multi-resolution audio similarity matrix to detect repetitive musical bars by building templates of time signature candidates (a simplified sketch follows this slide).
• The method depends only on musical structure, and does not depend on the presence of percussive instruments or strong musical accents.
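
A minimal sketch of the core data structure this method relies on, an audio similarity matrix, plus a naive way to score candidate bar lengths. The multi-resolution processing and template construction of the actual method are omitted, and the choice of feature (any per-frame representation such as spectra or chroma) is an assumption.

```python
import numpy as np

def similarity_matrix(features):
    """Cosine self-similarity between all pairs of feature frames.
    `features` is an (n_frames, n_dims) array, e.g. per-frame spectra."""
    unit = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    return unit @ unit.T  # (n_frames, n_frames)

def score_bar_candidates(sim, candidate_lags):
    """Score each candidate bar length (in frames) by the mean similarity
    along the diagonal offset by that lag; repeating bars score highly."""
    return {lag: float(np.mean(np.diag(sim, k=lag))) for lag in candidate_lags}
```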

  12. Music Transcription: Time Signature Detection
Gainza, M. and E. Coyle. Time Signature Detection by Using a Multi-Resolution Audio Similarity Matrix. In Audio Engineering Society 122nd Convention, Vienna, 2007.

  13. Music Transcription: Bar Line Segmentation
[Block diagram: Song → ASM → bar length and anacrusis; onset detector → bar line prediction and alignment]
• Detects the musical bar length and the anacrusis using an audio similarity matrix (ASM).
• Predicts and aligns the positions of future bar lines using an onset detector (a minimal sketch follows this slide).
Gainza, Mikel; Barry, Dan; Coyle, Eugene. Automatic Bar Line Segmentation. In Audio Engineering Society 123rd Convention, New York, 2007.
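
A minimal sketch of the prediction-and-alignment step, assuming the bar length and anacrusis have already been estimated (e.g. from the ASM) and that onset times come from an onset detector; the snapping tolerance is an illustrative assumption.

```python
import numpy as np

def predict_bar_lines(anacrusis, bar_length, duration, onsets, tolerance=0.05):
    """Predict bar-line times (in seconds) from the anacrusis offset and bar
    length, snapping each prediction to the nearest detected onset when one
    lies within the tolerance."""
    onsets = np.asarray(onsets, dtype=float)
    bar_lines = []
    t = anacrusis
    while t < duration:
        if onsets.size:
            nearest = onsets[np.argmin(np.abs(onsets - t))]
            if abs(nearest - t) <= tolerance:
                t = nearest
        bar_lines.append(t)
        # Re-anchoring on the aligned bar line lets the grid follow drift.
        t += bar_length
    return bar_lines
```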

  14. Music Transcription: Bar Line Segmentation
[Figure: example annotated with the detected anacrusis and bar length]

  15. Music Transcription: Music Structure Segmentation
[Block diagram: Song → ADRess → azimugram → PCA → ICA → orthogonality enforcement → segments]
• There are many mid-level representations: spectrogram, chromagram, MFCCs…
• Novel mid-level representation: the azimugram, a time-azimuth representation of a stereo field (a simplified sketch follows this slide).
• The system is based on the assumption that each section type (e.g. chorus) has a unique source location-intensity profile.
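
A crude sketch of a time-azimuth energy map in the spirit of the azimugram, built from per-bin left/right magnitude ratios. This is a simplification for illustration only: it omits the ADRess, PCA, ICA and orthogonality-enforcement stages in the diagram, and the frame sizes and number of azimuth bins are assumptions.

```python
import numpy as np

def azimugram(left, right, frame_len=2048, hop_len=1024, n_bins=21):
    """Crude time-azimuth energy map: every STFT bin votes its magnitude
    into an azimuth bin derived from the left/right magnitude ratio."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(left) - frame_len) // hop_len
    out = np.zeros((n_frames, n_bins))

    for i in range(n_frames):
        seg = slice(i * hop_len, i * hop_len + frame_len)
        mag_l = np.abs(np.fft.rfft(window * left[seg]))
        mag_r = np.abs(np.fft.rfft(window * right[seg]))
        # Panning index per bin in [-1, 1]: -1 = hard left, +1 = hard right.
        pan = (mag_r - mag_l) / (mag_l + mag_r + 1e-12)
        idx = np.clip(((pan + 1) / 2 * (n_bins - 1)).astype(int), 0, n_bins - 1)
        np.add.at(out[i], idx, mag_l + mag_r)  # accumulate energy per azimuth

    return out  # rows are time frames, columns are azimuth positions
```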

  16. Music Transcription: Music Structure Segmentation
[Figure: audio signal, its azimugram, and the resulting segmentation into intro, verse and chorus]
Barry, Dan; Gainza, Mikel; Coyle, Eugene. Music Structure Segmentation using the Azimugram in conjunction with Principal Component Analysis. In Audio Engineering Society 123rd Convention, New York, 2007.

  17. Upcoming Work Plan, Months 12-24
• Assess the robustness of the ornamentation detector for a variety of instruments.
• Dynamically adapt time signature and bar line detection to tempo variations.
• Assess the best mid-level representation for music segmentation.
• Combine the music structure and bar line segmentation systems, so that segments are aligned to the bar lines.
• Incorporate knowledge of music structure (e.g. 8 bars per section…).
• Migrate all MATLAB applications to C++.

  18. ALL – Workpackage Progress: Silence-to-Silence Segmentation
• Start/stop segmentation.
• Threshold algorithm – ALL uses this; it is sufficient for speech: wave energy under the threshold value is treated as silence.
• Multi-threshold: different threshold values for different situations.
• Trained HMM: manually segmented samples are used for the training.
• Usage: preparation phase for the manual segmentation of the training corpus.

  19. ALL – Workpackage Progress: Speech/Non-Speech Segmentation
• Trained HMM with Gaussian mixture distributions.
• Trained for: speech, music, singing, whistling, …
• Uses 26-dimensional MFCC feature vectors (a minimal sketch follows this slide).
• Usage: speech/non-speech segmentation filters the input for speech recognition.
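
A minimal sketch of the kind of Gaussian-mixture HMM classifier described here, using librosa for 26-dimensional MFCC-plus-delta features and hmmlearn for the models. The library choice, sample rate, number of states and mixtures, and the one-model-per-class scoring scheme are assumptions for illustration, not the project's actual configuration.

```python
import numpy as np
import librosa
from hmmlearn.hmm import GMMHMM

def mfcc_features(path, n_mfcc=13):
    """13 MFCCs plus deltas -> 26-dimensional vectors per frame, matching
    the feature dimensionality mentioned on the slide."""
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    delta = librosa.feature.delta(mfcc)
    return np.vstack([mfcc, delta]).T  # (n_frames, 26)

def train_class_models(training_files):
    """Train one Gaussian-mixture HMM per class (speech, music, singing, ...).
    `training_files` maps a class label to a list of audio file paths."""
    models = {}
    for label, paths in training_files.items():
        feats = [mfcc_features(p) for p in paths]
        X = np.vstack(feats)
        lengths = [f.shape[0] for f in feats]
        model = GMMHMM(n_components=3, n_mix=4, covariance_type="diag", n_iter=20)
        model.fit(X, lengths)
        models[label] = model
    return models

def classify(path, models):
    """Assign the class whose model yields the highest log-likelihood."""
    X = mfcc_features(path)
    return max(models, key=lambda label: models[label].score(X))
```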
