A Full Frequency Masking Vocoder for Legal Eavesdropping Conversation Recording

R. F. B. Sotero Filho, H. M. de Oliveira (qPGOM), R. Campello de Souza. Signal Processing Group, Federal University of Pernambuco – UFPE. E-mail: rsotero@hotmail.com.br, {hmo,ricardo}@ufpe.br

Presentation Transcript


  1. A Full Frequency Masking Vocoder for Legal Eavesdropping Conversation Recording. R. F. B. Sotero Filho, H. M. de Oliveira (qPGOM), R. Campello de Souza. Signal Processing Group, Federal University of Pernambuco – UFPE. E-mail: rsotero@hotmail.com.br, {hmo,ricardo}@ufpe.br

  2. Abstract: • New approach for a vocoder • Based on full frequency masking by octaves • Useful for saving bandwidth (applications requiring intelligibility) • Recommended for legal eavesdropping of long conversations.

  3. Introduction • Vocoder = contraction of "voice encoder": • it does not recreate the original waveform in appearance, • (but the synthesized signal should be perceptually similar to it) • first described by Homer Dudley at Bell Telephone Laboratories in 1939 • Parameters are extracted from the spectrum and updated every 10-25 ms • Properties of voice that are exploited: • limitations of the human auditory system • physiology of the voice generation process

  4. Psycho-Acoustics of the Human Auditory System • Frequency masking: masking in frequency, or "reduced audibility of a sound due to the presence of another" • Insensitivity to phase: the human ear has little sensitivity to the phase of signals

  5. Simplification of the spectrum via frequency masking. For each voice segment: FFT of blocklength 160 (frame of 20 ms). The spectrum is segmented into regions of influence (octaves). The range 32-64 Hz is removed; the following octaves are 64-128 Hz, 128-256 Hz, and so on. Each spectral sample corresponds to a multiple of 50 Hz (8 kHz / 160).
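
The masking step on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' routine: the four octave edges in `OCTAVES` are an assumption (the slides give the octave list only as "and so on", while slide 6 mentions 4 survivors), the function names are hypothetical, and a plain DFT stands in for the FFT.

```python
import cmath
import math

FS = 8000   # sample rate in Hz (slide 6)
N = 160     # frame length: 20 ms at 8 kHz

# ASSUMPTION: octave edges in Hz; not listed explicitly in the slides.
OCTAVES = [(256, 512), (512, 1024), (1024, 2048), (2048, 4096)]


def dft_magnitudes(frame):
    """Magnitudes of DFT bins 1 .. N/2 - 1 (79 lines when N = 160)."""
    n = len(frame)
    mags = {}
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        mags[k] = abs(acc)
    return mags


def full_masking(frame, octaves=OCTAVES, fs=FS):
    """Keep only the strongest spectral line of each octave."""
    df = fs / len(frame)          # 50 Hz between lines
    mags = dft_magnitudes(frame)
    survivors = []
    for lo, hi in octaves:
        bins = [k for k in mags if lo < k * df <= hi]
        if not bins:
            continue
        k_max = max(bins, key=lambda k: mags[k])
        survivors.append((k_max * df, mags[k_max]))
    return survivors


def tone(freq, amp, n=N, fs=FS):
    """One frame of a sine wave, used only to build a test signal."""
    return [amp * math.sin(2 * math.pi * freq * i / fs) for i in range(n)]


# Synthetic frame: two tones compete inside the 256-512 Hz octave.
components = [(400, 1.0), (450, 0.3), (700, 0.4), (1500, 0.5), (3000, 0.2)]
frame = [sum(s) for s in zip(*(tone(f, a) for f, a in components))]
survivors = full_masking(frame)
print([f for f, _ in survivors])
```

In this synthetic frame the 450 Hz tone is masked by the stronger 400 Hz tone in the same octave, so one line per octave survives.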

  6. Table 1. Number of spectral lines per octave (DFT of length N=160, sample rate 8 kHz). A total of 79 frequencies (DFT with N=160) is reduced to 4 survivors (holding about 5% of the spectral components).
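
The line counts behind this slide can be checked with a few lines of arithmetic. The sketch below assumes four octaves starting at 256 Hz; this choice is an assumption (the body of Table 1 did not survive in the transcript), picked because it yields 4 survivors and a total of 18 position bits, the figure quoted on slide 12.

```python
FS = 8000          # sample rate in Hz
N = 160            # DFT length
DF = FS // N       # 50 Hz between spectral lines

# ASSUMPTION: octave edges in Hz, not taken from the (missing) Table 1.
OCTAVES = [(256, 512), (512, 1024), (1024, 2048), (2048, 4096)]


def lines_in_octave(lo, hi, df=DF, n=N):
    """Frequencies of the DFT lines that fall inside (lo, hi]."""
    return [k * df for k in range(1, n // 2) if lo < k * df <= hi]


total_lines = N // 2 - 1                       # lines 50 Hz .. 3950 Hz
counts = [len(lines_in_octave(lo, hi)) for lo, hi in OCTAVES]
# Bits needed to address one line inside each octave.
position_bits = [(c - 1).bit_length() for c in counts]

print(total_lines, counts, position_bits, sum(position_bits))
```

Under these assumed edges the octaves hold 5, 10, 20 and 39 lines, so one survivor per octave needs 3, 4, 5 and 6 position bits.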

  7. Figure 1. The spectrum of a voice frame computed by the FFT: original spectrum and simplified full-masking spectrum. This technique is called full frequency masking.

  8. Signal synthesis via spectral filling. The beta distribution is a probability distribution defined over $0 \le x \le 1$, characterized by a pair of parameters $\alpha$ and $\beta$: $P(x) = \frac{1}{B(\alpha,\beta)}\, x^{\alpha-1} (1-x)^{\beta-1}$, $1 < \alpha, \beta < +\infty$, whose normalizing factor is $B(\alpha,\beta) = \frac{\Gamma(\alpha)\,\Gamma(\beta)}{\Gamma(\alpha+\beta)}$, where $\Gamma(\cdot)$ is the generalized Euler factorial function and $B(\cdot,\cdot)$ is the Beta function. Figure 2. Envelope shape of the survivor tone for different parameters $\alpha$ and $\beta$.
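
A short worked step, added for clarity: the peak (mode) of this density on [0, 1] follows from setting the derivative of its logarithm to zero. This is the standard result for the beta distribution with parameters greater than 1, and it is the quantity that slide 9 rescales to each octave.

```latex
\frac{d}{dx}\ln P(x) = \frac{\alpha-1}{x} - \frac{\beta-1}{1-x} = 0
\quad\Longrightarrow\quad
x_{\mathrm{mode}} = \frac{\alpha-1}{\alpha+\beta-2},
\qquad \alpha, \beta > 1 .
```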

  9. The upper limit is equivalent to the difference between the upper ($f_M$) and lower ($f_m$) normalized cutoff frequencies of each octave, i.e., $f_M - f_m$. Fitting the mode to the octave gives: new mode $= \frac{\alpha-1}{\alpha+\beta-2}\,(f_M - f_m) + f_m$. To fill the spectrum in each frame: $P(x) = \frac{1}{(f_M - f_m)^{\alpha+\beta-2}}\,(x - f_m)^{\alpha-1}\,(f_M - x)^{\beta-1}$.
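
One way to turn the envelope on this slide into a filling routine is sketched below. The slides do not say how the two shape parameters are chosen for a given survivor, so the rule in `fit_shape` (fix the sum of the parameters to a hypothetical `spread` value and place the mode on the survivor) is an assumption, as are all function names and the scaling of the peak to the survivor amplitude.

```python
def beta_envelope(x, f_lo, f_hi, alpha, beta):
    """Envelope from slide 9, written with u = (x - f_lo) / (f_hi - f_lo)."""
    if not f_lo <= x <= f_hi:
        return 0.0
    u = (x - f_lo) / (f_hi - f_lo)
    return u ** (alpha - 1) * (1 - u) ** (beta - 1)


def mode(f_lo, f_hi, alpha, beta):
    """Position of the envelope peak ("new mode" on slide 9)."""
    return (alpha - 1) / (alpha + beta - 2) * (f_hi - f_lo) + f_lo


def fit_shape(f_peak, f_lo, f_hi, spread=6.0):
    """ASSUMPTION: choose alpha + beta = spread with the mode at f_peak."""
    if not f_lo < f_peak < f_hi:
        raise ValueError("survivor must lie strictly inside the octave")
    u = (f_peak - f_lo) / (f_hi - f_lo)
    alpha = 1 + (spread - 2) * u
    return alpha, spread - alpha


def fill_octave(f_peak, amplitude, f_lo, f_hi, df=50, nyquist=4000, spread=6.0):
    """Amplitudes for every line of the octave, peaking at the survivor."""
    alpha, beta = fit_shape(f_peak, f_lo, f_hi, spread)
    peak = beta_envelope(f_peak, f_lo, f_hi, alpha, beta)
    first = (f_lo // df + 1) * df               # first line above f_lo
    last = min(f_hi, nyquist - df)              # stay below the Nyquist line
    return {f: amplitude * beta_envelope(f, f_lo, f_hi, alpha, beta) / peak
            for f in range(first, last + 1, df)}


# Survivor at 700 Hz with amplitude 40 in the (assumed) 512-1024 Hz octave.
alpha, beta = fit_shape(700, 512, 1024)
filled = fill_octave(700, 40.0, 512, 1024)
for f in sorted(filled):
    print(f, round(filled[f], 2))
```

A larger `spread` gives a narrower envelope around the survivor; a value close to 2 gives an almost flat fill.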

  10. Figure 3. Full masking and spectral filling ( – piece of speech from radio) (vocoder with Hamming windowing). A few audio files generated by this vocoder are available at the URL http://www2.ee.ufpe.br/codec/vocoder.html

  11. Quantization and Coding of Speech Signals. The maximum excursion of the full spectrum was divided into 256 intervals of equal length, each represented by one byte. There are no negative samples to be quantized, so the quantizer is unipolar rather than bipolar. • Table 2. Bit allocation in a voice frame (20 ms). • The required number of bits is expressed as A + P, where A is the number of bits for the spectral line amplitude and P the number of bits to express the relative position within the octave.
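
The quantizer described on this slide can be sketched as a uniform, unipolar 8-bit quantizer. The function names are hypothetical, and how the maximum excursion `v_max` is obtained or shared with the decoder is not stated in the slides, so here it is simply passed in as a parameter. Reconstruction at the midpoint of each interval is also an assumption.

```python
LEVELS = 256   # one byte per amplitude (slide 11)


def quantize(value, v_max):
    """Map a non-negative magnitude in [0, v_max] to an index 0..255."""
    if v_max <= 0:
        raise ValueError("v_max must be positive")
    step = v_max / LEVELS
    index = int(value / step)
    return min(max(index, 0), LEVELS - 1)   # clamp, so value == v_max -> 255


def dequantize(index, v_max):
    """ASSUMPTION: reconstruct at the midpoint of the interval."""
    step = v_max / LEVELS
    return (index + 0.5) * step


v_max = 80.0
code = quantize(33.3, v_max)
print(code, dequantize(code, v_max))
```

With midpoint reconstruction the error for values inside the range stays within half a step, i.e. `v_max / 512`.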

  12. Each voice frame needs 50 bits (18 for identifying positions and 32 for the amplitudes of the masking tones). The vocoder rate is 50 bits / 20 ms = 2.5 kbps. The binary format .voz: in the representation of a voice frame in this format (extension .voz), the 50 bits are distributed into four sub-blocks, each indicating the value of the spectral sample followed by its respective position in the spectrum. Voice files recorded in the .wav format are converted to this binary format by a Matlab routine.
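
The frame layout described here can be illustrated by packing four (amplitude, position) pairs into one 50-bit word. This is a sketch only: the per-octave position widths of 3, 4, 5 and 6 bits are an assumption that merely sums to the stated 18 bits, the bit order inside the word is a guess, and it is not the authors' Matlab routine nor a specification of the real .voz file.

```python
AMPLITUDE_BITS = 8
POSITION_BITS = [3, 4, 5, 6]      # ASSUMPTION: per-octave widths, sum = 18
FRAME_MS = 20

FRAME_BITS = 4 * AMPLITUDE_BITS + sum(POSITION_BITS)
BIT_RATE_BPS = FRAME_BITS * 1000 // FRAME_MS


def pack_frame(tones):
    """Pack four (amplitude, position) pairs: amplitude first, then position."""
    if len(tones) != len(POSITION_BITS):
        raise ValueError("expected one tone per octave")
    word = 0
    for (amp, pos), pbits in zip(tones, POSITION_BITS):
        if not 0 <= amp < (1 << AMPLITUDE_BITS):
            raise ValueError("amplitude out of range")
        if not 0 <= pos < (1 << pbits):
            raise ValueError("position out of range")
        word = (word << AMPLITUDE_BITS) | amp
        word = (word << pbits) | pos
    return word


def unpack_frame(word):
    """Inverse of pack_frame: read the sub-blocks back, last one first."""
    tones = []
    for pbits in reversed(POSITION_BITS):
        pos = word & ((1 << pbits) - 1)
        word >>= pbits
        amp = word & ((1 << AMPLITUDE_BITS) - 1)
        word >>= AMPLITUDE_BITS
        tones.append((amp, pos))
    return tones[::-1]


tones = [(200, 2), (17, 9), (255, 19), (0, 38)]
word = pack_frame(tones)
print(FRAME_BITS, BIT_RATE_BPS, unpack_frame(word) == tones)
```

The four sub-blocks are 11, 12, 13 and 14 bits long under these assumed widths, which adds up to the 50 bits per frame and 2500 bits/s quoted on the slide.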

  13. Figure 4. Frame of files in the format .voz (20 ms). Intelligibility and voice quality versus bit rate: voice quality is estimated using the Mean Opinion Score (MOS). • Table 3. MOS scores for the voice signals synthesized by four different techniques.

  14. Conclusions. New vocoder: represents the voice signal using fewer samples of the spectrum. Voice of acceptable quality at a rate of a few kbit/s. A new technique of spectral filling: it does not improve the voice quality, but it improves naturalness. APPLICATIONS • maintenance voice channels in large plants • speaker recognition systems • monitoring voice conversations under authorized eavesdropping • THAT’S ALL FOLKS! TKS.

  15. Pre-signal processing • Shannon sampling theorem: a signal band-limited to fm Hz must be sampled at a rate of at least 2fm equally spaced samples per second. Low-pass filtering (LPF) enforces the band limit. • Voice segmentation and windowing: partition of the speech signal into pieces (stationary frames) of about 10-40 ms. • Hamming window chosen for its smoothness at the edges. • Pre-emphasis: the speech signal radiated from the lips has a spectral decay of about -6 dB/octave. This spectral distortion can be eliminated by applying a filter with a response of approximately +6 dB/octave: • y(n) = x(n) - a·x(n-1), for 1 ≤ n < M, where M is the number of samples of x(n), y(n) is the emphasized signal and the constant "a" is normally set to 0.95.
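
The pre-processing chain on this slide (pre-emphasis, segmentation, Hamming windowing) can be sketched in plain Python. The function names are hypothetical, and two details are assumptions because the slide does not fix them: the first output sample is copied unchanged, and the frames do not overlap.

```python
import math


def pre_emphasis(x, a=0.95):
    """y(n) = x(n) - a*x(n-1) for 1 <= n < M; y(0) = x(0) is an assumption."""
    if not x:
        return []
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]


def hamming(m):
    """Standard Hamming window of length m (m >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (m - 1)) for n in range(m)]


def split_frames(x, size=160):
    """ASSUMPTION: non-overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, size)]


# 40 ms of a 200 Hz tone sampled at 8 kHz, i.e. two 20 ms frames.
signal = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(320)]
window = hamming(160)
windowed = [[s * w for s, w in zip(frame, window)]
            for frame in split_frames(pre_emphasis(signal))]
print(len(windowed), len(windowed[0]))
```

Each windowed frame is then ready for the length-160 transform described on slide 5.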
