Analysis and Digital Implementation of the Talk Box Effect


Yuan Chen
Advisor: Professor Paul Cuff

Introduction
  • What is a talk box?
    • Allows a musician to add diction and intelligibility to an instrument’s sound
  • Motivation?
    • Popular as an analog device
    • Application of signal processing
  • Goals?
    • Analyze output
    • Digital implementation

Figure 1 – Talk Box

Background – Speech and Intelligibility
  • Human speech production is modeled as a convolution between a source excitation and a vocal tract filter (eq. 1)
    • The filter is not truly time invariant
    • The model is only valid for voiced speech
  • The frequencies of the formant peaks account for the intelligibility of speech (Lingard, McLoughlin)
    • Most important are the F2 and F3 formants, which occur in the 800 Hz – 3 kHz band
Complex Cepstrum
  • Formant peaks arise from the vocal tract filter, so a way to “deconvolve” it from the source is needed
  • Intuitively, the source excitation varies quickly in frequency, while the vocal tract response varies slowly in frequency (Deller)
  • Complex Cepstrum (eq. 2) (Deller):
  • Apply a low quefrency lifter to separate source and filter
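The reason a lifter can separate source and filter is that the logarithm turns the convolution into a sum. A short worked derivation follows; the symbols s(n) for speech, e(n) for the source excitation, and h(n) for the vocal tract response are assumed notation and may differ from the symbols on the original slide.

```latex
\begin{align}
s(n) &= e(n) * h(n) \\
S(\omega) &= E(\omega)\,H(\omega) \\
\log S(\omega) &= \log E(\omega) + \log H(\omega) \\
\hat{s}(n) &= \mathcal{F}^{-1}\{\log S(\omega)\} = \hat{e}(n) + \hat{h}(n)
\end{align}
```

Because the vocal tract term varies slowly in frequency, its cepstrum is concentrated at low quefrency, so keeping only the low-quefrency part of the cepstrum approximately isolates it.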
Analysis Results – Vowel Sounds
  • Talk box most successfully impresses F2, F3 peaks
    • Relative error in peak frequency: F1 – 19.6%, F2 – 9.33%, F3 – 6.22%
    • Error is due to the inability to replicate the sound exactly
  • For voice, ~90% of energy in 0 Hz – 1000 Hz
  • For talk box, ~10% of energy in 0 Hz – 1000 Hz
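The two measurements above (relative error in a peak frequency and the fraction of energy below 1000 Hz) can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB used in the project; the function names, the naive DFT, and the test signal are assumptions made for the example and are not taken from the slides.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (kept simple to stay self-contained)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def band_energy_fraction(x, fs, f_lo, f_hi):
    """Fraction of the spectral energy of a real signal x lying in [f_lo, f_hi] Hz."""
    spectrum = dft(x)
    n = len(x)
    total = 0.0
    band = 0.0
    # Only the non-negative-frequency bins are needed for a real input.
    for k in range(n // 2 + 1):
        energy = abs(spectrum[k]) ** 2
        total += energy
        if f_lo <= k * fs / n <= f_hi:
            band += energy
    return band / total


def relative_error(measured, reference):
    """Relative error of a measured peak frequency against a reference value."""
    return abs(measured - reference) / abs(reference)


# Hypothetical test signal: a strong 500 Hz tone plus a weaker 2000 Hz tone.
fs = 8000
signal = [3.0 * math.sin(2 * math.pi * 500 * t / fs)
          + 1.0 * math.sin(2 * math.pi * 2000 * t / fs) for t in range(160)]
low_fraction = band_energy_fraction(signal, fs, 0, 1000)
```

For this synthetic signal the amplitude ratio of 3 to 1 gives an energy ratio of 9 to 1, so about 90% of the energy falls in the 0 Hz – 1000 Hz band.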
Design Overview
  • Problem definition:
  • Implement in MATLAB
Vocal Tract Impulse Response Extraction
  • Calculate cepstrum (eq. 3):
  • Lifter: eliminate all quefrency components above the cutoff nc (eq. 4)
  • From liftered cepstrum, invert to calculate impulse/frequency response (eq. 5):
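The three steps above (cepstrum, lifter, inversion) can be sketched as below. This is a simplified illustration, not the project's MATLAB code: it assumes the real cepstrum (log magnitude only, which avoids phase unwrapping) instead of the complex cepstrum, uses a naive DFT to stay self-contained, and returns only the magnitude of the frequency response. The names `spectral_envelope` and `nc` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Naive O(N^2) inverse discrete Fourier transform."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def spectral_envelope(frame, nc):
    """Estimate the vocal tract magnitude response of one frame by cepstral liftering.

    Assumption: real cepstrum (log magnitude only), not the complex cepstrum.
    """
    n = len(frame)
    # Step 1: cepstrum = inverse DFT of the log magnitude spectrum.
    log_mag = [math.log(max(abs(v), 1e-12)) for v in dft(frame)]
    cepstrum = [c.real for c in idft(log_mag)]
    # Step 2: low-quefrency lifter. Indices q >= n - nc are the negative
    # quefrencies, kept so that the liftered cepstrum stays symmetric.
    liftered = [c if (q <= nc or q >= n - nc) else 0.0
                for q, c in enumerate(cepstrum)]
    # Step 3: invert back to a magnitude response.
    return [math.exp(v.real) for v in dft(liftered)]
```

A smaller cutoff `nc` gives a smoother envelope; with `nc = 0` only the average log magnitude survives and the envelope is flat, while a cutoff of half the frame length keeps the whole cepstrum and reproduces the original magnitude spectrum.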
Impulse Response Preprocessing
  • The calculated impulse response has too much low-frequency (0 – 1000 Hz) magnitude
  • Different frames of speech have different energy levels
    • Speech input should not directly determine output amplitude
  • Normalize, preprocess in frequency domain (eq. 6):
Synthesis
  • 50% overlap between successive frames
  • Define system response to be linear interpolation of vocal tract impulse responses in overlapping region (eq. 7):
  • α: relative index (eq. 8)
  • p: frame index (eq. 9)
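One plausible form of the interpolation, written out as a sketch: it assumes frames of length N that start every N/2 samples (50% overlap) and vocal tract impulse responses h_p(k) of length M. The exact expressions on the original slide (eq. 7 to eq. 9) are not available in this transcript and may differ.

```latex
\begin{align}
p &= \left\lfloor \frac{n}{N/2} \right\rfloor \\
\alpha &= \frac{n - p\,N/2}{N/2}, \qquad 0 \le \alpha < 1 \\
h_n(k) &= (1 - \alpha)\,h_p(k) + \alpha\,h_{p+1}(k), \qquad 0 \le k \le M - 1
\end{align}
```

At the start of frame p the system response equals h_p, and it moves linearly toward h_{p+1} as n approaches the start of the next frame.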
Synthesis (continued)
  • From causality, output at time n0 depends only on input occurring no later than n0
  • From finite-length impulse response, output at time n0 depends only on input occurring no earlier than n0 – M + 1
  • Closed-form expression for y(n) (eq. 11):
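The two constraints above mean that each output sample is a sum over at most M input samples, weighted by the system response in effect at that instant. A minimal Python sketch of such a time-varying FIR filter follows. It assumes the system response is a linear interpolation between the responses of the current and next frames; `talkbox_filter`, `responses`, and `hop` are hypothetical names, and the slide's actual eq. 11 may be written differently.

```python
def talkbox_filter(x, responses, hop):
    """Filter x with a time-varying FIR response.

    responses[p] is the length-M impulse response of frame p, and frames
    start every `hop` samples. Between frame starts the response is a
    linear interpolation of responses[p] and responses[p + 1].
    """
    m = len(responses[0])
    last = len(responses) - 1
    y = []
    for n in range(len(x)):
        p = min(n // hop, last)
        # Hold the last response once there is no next frame to blend toward.
        alpha = (n % hop) / hop if p < last else 0.0
        nxt = responses[min(p + 1, last)]
        acc = 0.0
        # Causality and finite length: only x[n - M + 1] .. x[n] contribute.
        for k in range(m):
            if n - k < 0:
                break
            h_k = (1.0 - alpha) * responses[p][k] + alpha * nxt[k]
            acc += h_k * x[n - k]
        y.append(acc)
    return y


# An impulse at n = 2 passed through a response that morphs from a
# pass-through filter into a one-sample delay.
out = talkbox_filter([0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
                     [[1.0, 0.0], [0.0, 1.0]], hop=4)
```

When all frames share the same impulse response, the function reduces to an ordinary FIR convolution truncated to the length of the input.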
Performance
  • F2, F3 peaks on vowel speech inputs:
    • Static implementation relative error: 3.0% F2, 3.5% F3
    • Dynamic implementation relative error: 3.7% F2, 3.2% F3
  • Qualitatively, output has similar intelligibility to analog talk box
  • Dynamic implementation can produce voiced non-vowel phonemes and whole words
    • Not always consistent, depends on alignment in time
Performance Issues
  • Even with linearly-interpolated system impulse response, noticeable transitions between frames
  • Computationally expensive: 2 FFTs, 2 IFFTs per frame
    • In MATLAB, the computation takes longer than the duration of a frame
  • Performance dependent on alignment of input signals
Conclusions and Further Considerations
  • Dynamic implementation closely models performance of analog talk box:
    • Can produce vowels and voiced phonemes
    • Real-time setup
  • Demonstrate possibility of fully digital implementation of talk box using speech input
  • Further considerations:
    • Improve transitions between frames
    • Decrease calculation time
    • Physical implementation