
Sound Source Separation using 3D Correlogram, Fuzzy Logic, and Neural Networks

A research project by Eduardo Dias Trama. Table of contents: Introduction; Project Overview; The Preprocessor; The Learning Processor; The Separation Processor; Project Experiments; Conclusion.


Presentation Transcript


  1. Sound Source Separation using 3D Correlogram, Fuzzy Logic, and Neural Networks A RESEARCH PROJECT Eduardo Dias Trama

  2. Table of Contents • INTRODUCTION • PROJECT OVERVIEW • THE PREPROCESSOR • THE LEARNING PROCESSOR • THE SEPARATION PROCESSOR • PROJECT EXPERIMENTS • CONCLUSION

  3. INTRODUCTION • Overview of sound source separation • Sound separation methods • Related applications of sound separation

  4. Overview of sound source separation • What is sound separation? • Psychoacoustic properties • Timbre • How can sound be modeled?

  5. Sound separation methods • CASA (Computational Auditory Scene Analysis), Marrian • Spatial and Periodicity-and-Harmonicity • CASA: 3D Correlogram analysis • Blind source separation and prediction-driven

  6. Related applications of sound separation • Sound and voice recognition • Noise removal • Compression

  7. PROJECT OVERVIEW • Overview • Auditory model analysis • Sound data library and classification • Sound data matching • Complete sound separation system

  8. Overview • What is a piano sound? • Memory • Clustering

  9. Auditory model analysis • Properties • Grouping • Past knowledge • Correlation

  10. Sound data library and classification • Sound memory • How much information is needed for later analysis? • Does it matter if audio data is compressed? • Structure of classification

  11. Sound data matching

  12. Complete sound separation system

  13. THE PREPROCESSOR • The Cochlea Filter Model • Correlogram • 3-D Correlogram

  14. The Cochlea Filter Model • Filtering: basilar membrane (BM) • Detection: inner hair cell (IHC) • Compression: automatic gain control (AGC) • Cochleagram
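
The three stages on this slide (filtering, detection, compression) can be illustrated with a deliberately crude sketch. It is not the Lyon model: a single two-pole resonator stands in for each basilar-membrane channel, half-wave rectification stands in for the inner hair cell, and a one-stage level tracker stands in for the AGC. The function names and the parameters `r`, `target`, and `rate` are assumptions made for illustration only.

```python
import math

def resonator(x, fs, fc, r=0.95):
    """Two-pole resonator: a crude stand-in for one basilar-membrane channel."""
    theta = 2.0 * math.pi * fc / fs
    a1, a2, gain = -2.0 * r * math.cos(theta), r * r, 1.0 - r
    y, y1, y2 = [], 0.0, 0.0
    for s in x:
        out = gain * s - a1 * y1 - a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y

def half_wave_rectify(x):
    """Detection stage: keep only the positive half of the waveform."""
    return [max(0.0, s) for s in x]

def simple_agc(x, target=0.1, rate=0.01, eps=1e-6):
    """Compression stage: divide by a slowly tracking level estimate."""
    level, out = target, []
    for s in x:
        out.append(s * target / (level + eps))
        level += rate * (abs(s) - level)
    return out

def cochleagram(x, fs, centre_freqs):
    """Rows are channels (one per centre frequency), columns are time samples."""
    return [simple_agc(half_wave_rectify(resonator(x, fs, fc)))
            for fc in centre_freqs]
```

A real cochlear model uses a cascade of filters and coupled multi-stage gain control; the sketch only shows the order of the stages and the shape of the resulting cochleagram.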

  15. Lyon cochlear model

  16. Correlogram • Short time auto-correlations of the neural firing rates as a function of cochlear place (best frequency) versus time • Correlogram movie
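
The definition above (short-time autocorrelation per cochlear channel) can be sketched as follows. This is a minimal illustration, assuming the cochlear channel outputs are already available as plain lists of samples; `short_time_autocorr` and `correlogram_frame` are hypothetical names, not functions from the project.

```python
def short_time_autocorr(x, start, win, max_lag):
    """Autocorrelation of x over `win` samples beginning at `start`."""
    acf = []
    for lag in range(max_lag + 1):
        total = 0.0
        for n in range(start, start + win):
            if n + lag < len(x):  # stop at the end of the signal
                total += x[n] * x[n + lag]
        acf.append(total)
    return acf

def correlogram_frame(channels, start, win, max_lag):
    """One frame: rows are cochlear channels, columns are lags."""
    return [short_time_autocorr(ch, start, win, max_lag) for ch in channels]
```

For a channel that fires periodically, the autocorrelation repeats its lag-0 peak at every multiple of the period, which is what produces the vertical structure seen in a Correlogram frame.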

  17. Correlogram • Speech processing • Extract the formants of voiced and unvoiced sounds • Short duration • Auto-correlation window size

  18. Correlogram Frame • Vertical axis shows low to high frequencies from bottom to top • Horizontal axis represents the lag or time delay

  19. Correlogram Frame • Dark areas in the image show activity in the Correlogram frame • Vertical lines: cochlear channels firing in the same period

  20. Correlogram Frame • Horizontal bands are indicators of large amounts of energy within a frequency band
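
One common way to read the vertical lines of a frame numerically is to sum the frame across channels (a summary autocorrelation) and take the lag of the strongest peak as the shared period. The sketch below assumes a frame stored as a list of per-channel lag lists; `summary_pitch` and `min_lag` are illustrative names, and this is not claimed to be the pitch extractor used in the project.

```python
def summary_pitch(frame, fs, min_lag=1):
    """Sum a Correlogram frame over channels and return (best_lag, pitch_hz).

    frame[c][k] is the autocorrelation of channel c at lag k (in samples).
    Lags below `min_lag` are skipped so the lag-0 peak is not selected.
    """
    n_lags = len(frame[0])
    summary = [sum(ch[k] for ch in frame) for k in range(n_lags)]
    best_lag = max(range(min_lag, n_lags), key=lambda k: summary[k])
    return best_lag, fs / best_lag
```

Channels that fire with the same period reinforce each other in the sum, so a vertical line in the image becomes a clear peak in the summary.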

  21. Slaney, Lyon structure to compute a Correlogram

  22. 3-D Correlogram • A series of Correlograms over time • Frequency information comes from a cochlea filter bank • A finite time/frequency analysis • It depends on the initial time
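
A 3-D Correlogram can be pictured as Correlogram frames stacked along time. The sketch below is a minimal illustration under assumed names (`correlogram_3d`, `hop`, `start`); it indexes the result as `frames[time][channel][lag]`, and the `start` argument shows why the analysis depends on the initial time.

```python
def autocorr(x, start, win, max_lag):
    """Short-time autocorrelation of x for lags 0..max_lag."""
    return [sum(x[n] * x[n + lag]
                for n in range(start, min(start + win, len(x) - lag)))
            for lag in range(max_lag + 1)]

def correlogram_3d(channels, win, hop, max_lag, start=0):
    """Stack Correlogram frames taken every `hop` samples from `start`."""
    n_samples = min(len(ch) for ch in channels)
    frames = []
    t = start
    # Only keep frames whose window plus longest lag fits in the signal.
    while t + win + max_lag <= n_samples:
        frames.append([autocorr(ch, t, win, max_lag) for ch in channels])
        t += hop
    return frames
```

Changing `start` shifts every window, so the same signal can yield a different number of frames and different frame contents.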

  23. Daniel Ellis signal-processing front-end implementation

  24. THE LEARNING PROCESSOR • Creating the network input • Classification • Artificial neuron network fuzzy classification

  25. Creating the network input • Responsible for learning each Correlogram frame of a selected sound • It should be exposed to many small variations of the target (selected) sound • The total number of neural nets (NN) is: NN = FB x CF

  26. Signal path to the network input

  27. Classification • Class • Family • Length • Frequency range • Number of Correlogram frames • Sufficient to classify one particular sound • Make the matching process faster • Intensive parallel processing

  28. Figure of a parallel neural network classification

  29. Artificial neuron network fuzzy classification • Fuzzy IF-THEN rules to describe a classifier • An adaptive-network-based fuzzy classifier to solve fuzzy classification problems • ANFIS (adaptive-network-based fuzzy inference system)
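
To make the idea of fuzzy IF-THEN rules concrete, here is a minimal zero-order Sugeno-style inference sketch with Gaussian membership functions, the kind of structure whose parameters ANFIS adapts during training. The rule base, the membership parameters, and the function names are hypothetical; the slides do not give the project's actual rules, and no training step is shown.

```python
import math

def gaussian_mf(x, centre, sigma):
    """Degree (0..1) to which x belongs to a fuzzy set centred at `centre`."""
    return math.exp(-0.5 * ((x - centre) / sigma) ** 2)

def sugeno_infer(inputs, rules):
    """Weighted average of rule consequents.

    Each rule is (antecedents, consequent), where antecedents holds one
    (centre, sigma) pair per input and the rule strength is the product
    of the membership degrees (fuzzy AND).
    """
    num = den = 0.0
    for antecedents, consequent in rules:
        strength = 1.0
        for x, (centre, sigma) in zip(inputs, antecedents):
            strength *= gaussian_mf(x, centre, sigma)
        num += strength * consequent
        den += strength
    return num / den if den > 0.0 else 0.0

# Hypothetical rule base over two normalised features:
# IF f1 is LOW  AND f2 is LOW  THEN class score = 0
# IF f1 is HIGH AND f2 is HIGH THEN class score = 1
RULES = [
    ([(0.2, 0.2), (0.2, 0.2)], 0.0),
    ([(0.8, 0.2), (0.8, 0.2)], 1.0),
]
```

Calling `sugeno_infer([0.8, 0.8], RULES)` gives a score close to 1, while an input halfway between the two rule centres gives a score near 0.5.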

  30. Block diagram of a general fuzzy inference system

  31. THE SEPARATION PROCESSOR • Choosing method for sound matching • The Matching Fuzzy Logic sound library • Sound separation

  32. Choosing method for sound matching • Preamble, search, matching and interpolation • Target and precision • Fuzzy clustering algorithms

  33. The Matching Fuzzy Logic sound library • A set of fuzzy sound elements will be used for matching (FIS) • The initial values for search need to be determined by external inputs • ANFIS (Adaptive Neuro-Fuzzy Inference Systems)

  34. Sound separation • Search, match and extract • Step 1: Input process • Step 2: Classification • Step 3: Choosing what to separate • Step 4: Dynamics and pitch extraction • Step 5: Re-synthesis

  35. Step 1: Input process • Analog to digital conversion • Cochlea filter bank • Cochleagram • Correlogram frames • Neuro-Fuzzy input matrix

  36. Step 2: Classification

  37. Step 3: Choosing what to separate • Rule 1: Assume that the human auditory system can recognize one or more sounds in the audio input mixture • Rule 2: One recognizable sound should be selected for separation • Rule 3: Assume that complete or partial information about the selected sound class exists in the sound library

  38. Step 4: Dynamics and pitch extraction

  39. Step 5: Re-synthesis • Re-synthesis of selected sound Correlogram frames at unit pitch • Apply dynamics to each Correlogram frame • Correlogram frame inversion

  40. PROJECT EXPERIMENTS • Experiment setup • Experiment procedures • Experiment results

  41. Experiment setup

  42. Experiment procedures • Recorded wave data: 5 sec. @ 44100 Hz sample rate, 16-bit resolution, two channels (stereo) • Down-sampled to 11025 Hz, one channel (mono) • Mixed combinations without delay • Mixed combinations with 0.5 sec. delay
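
The data preparation listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's tooling: the stereo-to-mono step averages the two channels, the down-sampling averages blocks of four samples (44100 / 4 = 11025) as a crude anti-alias filter, and the function names are hypothetical.

```python
def stereo_to_mono(left, right):
    """Average the two channels sample by sample."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

def downsample(x, factor=4):
    """Average each block of `factor` samples and keep one value per block."""
    n = len(x) // factor
    return [sum(x[i * factor:(i + 1) * factor]) / factor for i in range(n)]

def mix(a, b, delay_sec=0.0, fs=11025):
    """Add signal b to signal a, with b starting `delay_sec` seconds late."""
    offset = int(delay_sec * fs)  # 0.5 s at 11025 Hz truncates to 5512 samples
    out = [0.0] * max(len(a), offset + len(b))
    for i, s in enumerate(a):
        out[i] += s
    for i, s in enumerate(b):
        out[offset + i] += s
    return out
```

With two 5-second sources at 11025 Hz (55125 samples each), `mix(a, b)` keeps the length at 55125 samples and `mix(a, b, delay_sec=0.5)` extends it to 60637 samples.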

  43. Experiment results • Single sound source • Two sound sources without delay • Two sound sources with delay • Modeling ANFIS for Correlogram frames • Correlogram frame channel training (classification) • Correlogram frame channel evaluation (matching)

  44. Single Sound Source

  45. Single Sound Source

  46. Single Sound Source

  47. Two sound sources without delay

  48. Two sound sources without delay

  49. Two sound sources without delay

  50. Two sound sources with delay
