What is speech coding?

EE2F1 Multimedia (1): Speech & Audio Technology. Lecture 9: Speech Coding. Martin Russell, Electronic, Electrical & Computer Engineering, School of Engineering, The University of Birmingham. What is speech coding? Digitisation of speech for transmission or storage.


Presentation Transcript


  1. EE2F1 Multimedia (1): Speech & Audio Technology • Lecture 9: Speech Coding • Martin Russell • Electronic, Electrical & Computer Engineering • School of Engineering • The University of Birmingham

  2. What is speech coding? • Digitisation of speech for transmission or storage • Aim to minimise bits per second (bps) while preserving speech quality: • intelligibility and naturalness • Main kinds of speech coding scheme: • waveform coder • vocoder

  3. Approaches • Waveform coding • Works for all audio signals • Generic methods for bit reduction • Exploits properties of human hearing • Vocoders • Optimised for speech coding • Assume that the signal to be encoded is speech

  4. Waveform coders • PCM (Pulse Code Modulation) • DPCM (Differential PCM) • ADPCM (Adaptive Differential PCM) • Delta modulation (1 bit ADPCM)

  5. Pulse Code Modulation (PCM) • How many quantization points? • How many samples per second (sample rate)? [Figure labels: sample point, quantization point, quantization error]
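
The two questions on this slide (how many quantization points, how many samples per second) can be made concrete with a minimal sketch of a uniform PCM quantiser. The function names, the `full_scale` parameter and the choice of a uniform (rather than companded) quantiser are assumptions made for illustration, not part of the lecture.

```python
import math

def pcm_encode(samples, bits=8, full_scale=1.0):
    """Map each sample in [-full_scale, full_scale) to one of 2**bits integer codes."""
    levels = 2 ** bits
    step = 2.0 * full_scale / levels           # width of one quantisation interval
    codes = []
    for x in samples:
        code = math.floor((x + full_scale) / step)
        codes.append(max(0, min(levels - 1, code)))  # clip out-of-range samples
    return codes

def pcm_decode(codes, bits=8, full_scale=1.0):
    """Reconstruct each sample as the mid-point of its quantisation interval."""
    step = 2.0 * full_scale / (2 ** bits)
    return [(c + 0.5) * step - full_scale for c in codes]

samples = [-1.0, -0.5, 0.0, 0.5, 0.999]
codes = pcm_encode(samples)
decoded = pcm_decode(codes)
# For in-range samples the quantisation error is at most half a step.
max_error = max(abs(x - y) for x, y in zip(samples, decoded))
```

More bits give more quantization points and a smaller step, so the worst-case quantization error shrinks, at the cost of a higher bit rate.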

  6. Differential PCM • Encode the differences between values at successive quantisation points [Figure labels: sample point, quantization point, quantization error]
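
A minimal sketch of the idea, assuming a fixed step size and a simple "previous reconstructed value" predictor; `dpcm_encode`, `dpcm_decode`, `step` and `bits` are illustrative names and values, not taken from the lecture. The encoder quantises the difference against what the decoder will have reconstructed, so errors do not accumulate.

```python
def dpcm_encode(samples, step=0.05, bits=4):
    """Encode each sample as a small signed code for its change since the last output."""
    max_code = 2 ** (bits - 1) - 1           # symmetric signed range, e.g. -7..7
    prediction = 0.0
    codes = []
    for x in samples:
        code = round((x - prediction) / step)
        code = max(-max_code, min(max_code, code))   # large jumps are clipped
        codes.append(code)
        prediction += code * step            # mirror the decoder's reconstruction
    return codes

def dpcm_decode(codes, step=0.05):
    """Rebuild the signal by accumulating the decoded differences."""
    value = 0.0
    out = []
    for code in codes:
        value += code * step
        out.append(value)
    return out

samples = [0.0, 0.1, 0.2, 0.25, 0.2, 0.1]
codes = dpcm_encode(samples)
decoded = dpcm_decode(codes)
```

Because successive speech samples are usually close in value, the differences fit in fewer bits than the samples themselves; the clipping step is where a fixed step size runs into trouble on large changes.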

  7. Adaptive DPCM • Use small number of bits to encode differences in DPCM • Adjust quantisation step size to accommodate large changes in the signal

  8. Delta Modulation • 1 bit ADPCM • Sequence of ‘all 1s’ or ‘all 0s’ indicates need to change step size • ‘Slope Overload’ indicated by excessive use of 1s or 0s [Figure: example bit sequence 1 1 1 0 0 0 0 0]
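
The step-size rule can be sketched as follows. This is one possible adaptation scheme, chosen for illustration: the run length of 3, the growth factor of 1.5 and the step limits are assumptions, and `adm_encode` / `adm_decode` are hypothetical names. The decoder applies the same rule to the received bits, so no step-size information needs to be transmitted.

```python
from collections import deque

def _adapt_step(step, recent, run, min_step, max_step):
    """Grow the step after a run of identical bits, shrink it when bits alternate."""
    if len(recent) == run and len(set(recent)) == 1:
        return min(max_step, step * 1.5)     # slope overload: estimate is lagging
    if len(recent) >= 2 and recent[-1] != recent[-2]:
        return max(min_step, step / 1.5)     # hunting around a flat signal
    return step

def adm_encode(samples, step=0.05, run=3, min_step=0.01, max_step=1.0):
    """Send 1 if the sample is at or above the running estimate, else 0."""
    estimate, recent, bits = 0.0, deque(maxlen=run), []
    for x in samples:
        bit = 1 if x >= estimate else 0
        bits.append(bit)
        recent.append(bit)
        step = _adapt_step(step, recent, run, min_step, max_step)
        estimate += step if bit else -step
    return bits

def adm_decode(bits, step=0.05, run=3, min_step=0.01, max_step=1.0):
    """Rebuild the estimate from the bits alone, using the same adaptation rule."""
    estimate, recent, out = 0.0, deque(maxlen=run), []
    for bit in bits:
        recent.append(bit)
        step = _adapt_step(step, recent, run, min_step, max_step)
        estimate += step if bit else -step
        out.append(estimate)
    return out

# A steep ramp followed by a flat section.
samples = [i * 0.2 for i in range(10)] + [1.8] * 10
bits = adm_encode(samples)
decoded = adm_decode(bits)
```

On the ramp the coder emits a run of 1s (slope overload) until the step has grown enough to catch up; on the flat section the bits start to alternate and the step shrinks again.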

  9. Waveform coding summarised • PCM, with 8 bits per sample (amplitude compression) and 8 kHz sampling rate, gives a bit rate of 64 kbps • DPCM (a.k.a. Delta PCM): the difference between samples needs fewer bits for the same accuracy • Adaptive DPCM: scaling of bits varied, depending on dynamic range • Delta modulation = 1-bit DPCM • can adapt step size to avoid slope overload • gives reasonable intelligibility at just 16 kbps
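
The bit rates quoted on this slide follow from bits per sample times samples per second. A short worked check, with function names chosen here for illustration:

```python
def bit_rate_bps(bits_per_sample, samples_per_second):
    """Bit rate of a coder that sends a fixed number of bits for every sample."""
    return bits_per_sample * samples_per_second

def sample_rate_for(bit_rate, bits_per_sample):
    """Sample rate implied by a given bit rate and code size."""
    return bit_rate / bits_per_sample

pcm_rate = bit_rate_bps(8, 8000)            # 8-bit PCM at 8 kHz
dm_sample_rate = sample_rate_for(16000, 1)  # 1-bit delta modulation at 16 kbps
```

Since delta modulation sends one bit per sample, a 16 kbps stream corresponds to 16,000 samples per second, twice the PCM sampling rate but a quarter of its bit rate.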

  10. Vocoders • Coders designed specifically for speech • Sometimes called analysis-synthesis coders • Exploit source-filter model of speech

  11. Vocoders • Encoding • Estimate and encode source • Estimate and encode vocal tract filter • Store as feature vector • Transmission • Transmit at low data rate (~50-100 vectors per second) • Can do this because of relatively slow movement of vocal tract • Decoding • Recover source information • Recover vocal tract filter information • Convert into synthesiser control parameters • Synthesise speech

  12. Example: Channel Vocoder • 19 band-pass filters, spanning 0-4 kHz • centre-frequencies arranged non-linearly on frequency axis • bandwidths increase with frequency, like ear’s critical bands • Energies from filter outputs averaged over 20 ms
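
As a rough sketch of the analysis stage, the code below measures the energy in a few frequency bands for one 20 ms frame. It substitutes a direct DFT for the bank of band-pass filters described on the slide, and the three band edges are hypothetical: the slide specifies 19 non-linearly spaced bands but does not give their edges.

```python
import cmath
import math

def band_energies(frame, sample_rate_hz, band_edges_hz):
    """Sum the DFT power of `frame` in each band [lo, hi) between consecutive edges."""
    n = len(frame)
    # One-sided power spectrum by direct DFT (O(n^2), acceptable for a short frame).
    power = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2 / n)
    energies = []
    for lo, hi in zip(band_edges_hz, band_edges_hz[1:]):
        energies.append(sum(p for k, p in enumerate(power)
                            if lo <= k * sample_rate_hz / n < hi))
    return energies

SAMPLE_RATE_HZ = 8000
FRAME_LEN = 160                               # 20 ms at 8 kHz
EDGES_HZ = [0, 500, 1500, 4000]               # hypothetical, wider at high frequency

# Test frame: a pure 1 kHz tone, which should land in the middle band.
frame = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE_HZ) for t in range(FRAME_LEN)]
energies = band_energies(frame, SAMPLE_RATE_HZ, EDGES_HZ)
loudest_band = energies.index(max(energies))
```

Repeating this every 20 ms gives one vector of band energies per frame, which is the spectrum-shape information the vocoder goes on to encode.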

  13. Example: Channel Vocoder • Spectrum shape (filter-bank energies) coded by DPCM • Combined with binary ‘voiced/unvoiced’ flag plus estimate of fundamental frequency f0 if ‘voiced’ • 1 ‘frame’ of data (48 bits) transmitted 50 times per second • Bit rate: 48 × 50 = 2,400 bps

  14. Example: Channel Vocoder • Voiced/unvoiced flag plus f0 used to select source • Spectrum shape decoded and used to configure filterbank

  15. Example: Channel Vocoder [Figure: analyser and synthesiser]

  16. Linear Predictive Coding (LPC) • Basic idea • Assume that the value of the speech signal at time t can be written as a weighted sum of its values at times t-1, t-2,…, t-N • Nth order Linear Predictive Coding (LPC) • The coefficients a1,…,aN can be thought of as the parameters of a digital filter (lecture 3) • They define the vocal tract filter at time t • Used in LPC vocoder
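
One common way to estimate the prediction weights, sketched here under stated assumptions, is the autocorrelation method solved with the Levinson-Durbin recursion. The lecture does not say which estimation method it has in mind, so this is an illustration rather than the course's method; no windowing is applied and the function names are chosen for this example. The predictor is s(t) ≈ a_1 s(t-1) + … + a_N s(t-N).

```python
def autocorrelation(signal, max_lag):
    """r[lag] = sum over t of signal[t] * signal[t - lag], for lag = 0..max_lag."""
    n = len(signal)
    return [sum(signal[t] * signal[t - lag] for t in range(lag, n))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return (a, error): predictor weights a_1..a_order and residual energy."""
    a = []
    error = r[0]
    for i in range(1, order + 1):
        if error <= 0.0:                      # silent or perfectly predicted frame
            a += [0.0] * (order - len(a))
            break
        # Reflection coefficient for order i.
        k = (r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))) / error
        # Update the lower-order weights, then append the new one.
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        error *= 1.0 - k * k
    return a, error

def lpc(signal, order):
    """Nth-order LPC weights for one frame of samples."""
    return levinson_durbin(autocorrelation(signal, order), order)

# A signal that decays by half each sample is predicted exactly by a_1 = 0.5.
frame = [0.5 ** t for t in range(40)]
weights, residual_energy = lpc(frame, 1)
```

In a vocoder the weights would be re-estimated for every short frame, since the vocal tract filter they describe changes over time.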

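  17. Finite Impulse Response (FIR) digital filter [Figure: input x(n) feeds a chain of unit delays z^-1; the delayed samples are weighted by a1, a2, …, aN and summed to give the output y(n)]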

  18. LPC Vocoders • Quality of LPC vocoded speech depends critically on the quality of the excitation signal • Two particular forms of LPC used for speech coding in GSM mobile phones • RELP: Residual Excited LPC • CELP: Codebook Excited LPC

  19. Example: CELP Vocoders • Vocal tract filter: • LPC analysis conducted over short (~20ms) section of speech to give LPC coefficients • Source: • Excitation source estimated over window • Compared with a finite set of ‘reference’ excitation signals e1,…,eC • Code for most similar reference transmitted • The set of references is called a codebook • Hence Codebook Excited LPC
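
The codebook lookup described above can be sketched as a nearest-neighbour search. This follows the slide's description (compare the estimated excitation directly with each reference); practical CELP coders typically search by passing each candidate through the synthesis filter, which is not shown here. The tiny four-entry codebook and the function names are made up for illustration.

```python
def nearest_codeword(excitation, codebook):
    """Return the index of the reference excitation with the smallest squared error."""
    def squared_error(reference):
        return sum((e - r) ** 2 for e, r in zip(excitation, reference))
    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))

def index_bits(codebook_size):
    """Bits needed to transmit one codebook index."""
    return max(1, (codebook_size - 1).bit_length())

# Hypothetical codebook of C = 4 reference excitations, 4 samples each.
codebook = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, -1.0, 1.0, -1.0],
    [0.0, 0.0, 0.0, 0.0],
]
excitation = [0.9, -0.8, 1.1, -1.2]
best = nearest_codeword(excitation, codebook)
bits_sent = index_bits(len(codebook))
```

Only the index is transmitted, so a frame of excitation costs a handful of bits however long the frame is; the receiver looks the index up in its own copy of the codebook.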

  20. Formant Vocoder • A formant vocoder exploits the importance of F1, F2 and F3 for speech perception • Formant frequencies, amplitudes and bandwidths estimated and used to model vocal tract filter • Transmitted, with V/UV and f0 information at 50-100 frames per second • Speech decoded using a formant synthesiser • Using 5-6 bits for each of the 10 control parameters results in 2.5-6 kbps bit rate
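
The quoted range follows from the figures on this slide, taking the lowest and highest values of bits per parameter and frame rate (a worked check, not additional data):

```latex
\begin{align*}
R_{\min} &= 5~\text{bits} \times 10~\text{parameters} \times 50~\text{frames/s} = 2500~\text{bps} = 2.5~\text{kbps} \\
R_{\max} &= 6~\text{bits} \times 10~\text{parameters} \times 100~\text{frames/s} = 6000~\text{bps} = 6~\text{kbps}
\end{align*}
```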

  21. Recognition-Synthesis Coder [Figure: input speech (“recce report…”) → speech recognizer → phone-level transcription (r E k i r @ p O t ..) → transmitter → 50 bps → receiver → phone-level transcription (r E k i r @ p O t ..) → speech synthesiser → output speech (“recce report…”)]

  22. Recognition-synthesis coders • New technology – still in research labs • Very low data rates: • Each of the sounds of English (~46 phonemes) can be specified using 6 bits • Talking at 8 phonemes per second, the linguistic content can be encoded in about 50 bps (6 × 8 = 48) • Computationally complex
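
The 6-bit and 50 bps figures can be checked with a few lines, assuming a fixed-length code per phoneme (the function names are illustrative):

```python
def bits_per_symbol(alphabet_size):
    """Smallest fixed code length that can distinguish `alphabet_size` symbols."""
    return max(1, (alphabet_size - 1).bit_length())

def symbol_stream_bps(alphabet_size, symbols_per_second):
    """Bit rate of a stream of fixed-length symbol codes."""
    return bits_per_symbol(alphabet_size) * symbols_per_second

phoneme_bits = bits_per_symbol(46)        # 2**5 = 32 < 46 <= 64 = 2**6
speech_bps = symbol_stream_bps(46, 8)     # 8 phonemes per second
```

This counts only the phoneme identities; timing, pitch and anything else the synthesiser would need are not included in the figure.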

  23. Use of ‘knowledge’ • Bit rates reduced by exploiting properties of the speech signal: • waveform coders: limited bandwidth • vocoders: signal contains resonances • recognition-synthesis: signal is speech • Highest-level models give lowest bit rates • Paralinguistic properties of the speech are sacrificed: • speaker’s identity • state of health • emotional/psychological state

  24. Summary of coding • Waveform coders • PCM, DPCM, ADPCM • Delta modulation • Vocoders • Channel vocoder, RELP, CELP • Segment vocoder • Recognition-synthesis coders • Trade-offs
