
Concepts of Multimedia Processing and Transmission

Concepts of Multimedia Processing and Transmission. IT 481, Lecture #4 Dennis McCaughey, Ph.D. 25 September, 2006. Introduction to Linear Systems. The Modified Discrete Cosine Transform (MDCT) was introduced in the lecture on MP3 encoding


Presentation Transcript


  1. Concepts of Multimedia Processing and Transmission IT 481, Lecture #4 Dennis McCaughey, Ph.D. 25 September, 2006

  2. Introduction to Linear Systems • The Modified Discrete Cosine Transform (MDCT) was introduced in the lecture on MP3 encoding • How does it relate to the Discrete Cosine Transform (DCT), and why are we concerned? • The DCT and MDCT are important enablers in data compression of both audio and video • The DCT is a special case of the Discrete Fourier Transform (DFT), a key component in digital signal processing • The Fast Fourier Transform (FFT) is a computationally efficient form of the DFT IT 481, Fall 2006

  3. Linear System Definition

  4. Linear System Response to a Series of Sampled Data Inputs

  5. Linear System Input/Output • The input/output relationship is denoted as the convolution of f(t) and h(t)
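The equation on this slide did not survive in the transcript. As a hedged reconstruction, assuming f(t) is the input, h(t) the impulse response, and g(t) the output (the symbol g is an assumption, borrowed from the n_g notation on the convolution-sum slide), the convolution is conventionally written as:

```latex
g(t) = (f * h)(t) = \int_{-\infty}^{\infty} f(\tau)\, h(t - \tau)\, d\tau
```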

  6. Fourier Transform - Non-periodic Signal • Let g(t) be a continuous and non-periodic function of t • The Fourier Transform of g(t) is defined in terms of ω = 2πf, the angular frequency in radians/sec, where f is the frequency in Hz • The Inverse Fourier Transform recovers g(t) from its transform
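The transform pair itself is not preserved in the transcript. One common convention, given here as an assumption (the slide may have used a different normalization or the symbol G for the transform), is:

```latex
G(\omega) = \int_{-\infty}^{\infty} g(t)\, e^{-j\omega t}\, dt,
\qquad
g(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} G(\omega)\, e^{j\omega t}\, d\omega,
\qquad \omega = 2\pi f
```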

  7. Fourier Transform Example

  8. Relationship Between the Fourier Transform and Convolution

  9. A Very Important Property

  10. Convolution Sum Example • n_g = n_f + n_h − 1 • f(k) = h(k) = 0 for k > 2
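A minimal sketch of the convolution sum, assuming two length-3 sequences (nonzero only for k = 0, 1, 2, as the slide states). The sample values of `f` and `h` are illustrative assumptions, not the ones used on the original slide.

```python
def convolve(f, h):
    """Discrete convolution: g[n] = sum over k of f[k] * h[n - k]."""
    g = [0] * (len(f) + len(h) - 1)  # n_g = n_f + n_h - 1
    for i, fi in enumerate(f):
        for j, hj in enumerate(h):
            g[i + j] += fi * hj
    return g

f = [1, 2, 3]  # illustrative input samples
h = [1, 1, 1]  # illustrative impulse response
g = convolve(f, h)
print(g)       # [1, 3, 6, 5, 3]
```

The output has 3 + 3 − 1 = 5 samples, matching the length rule above.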

  11. Integer Arithmetic Example • Multiplication of 2 integers is a form of discrete convolution
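The idea can be sketched as follows, assuming non-negative integers in base 10: the digit sequences are convolved, and carry propagation then turns the raw sums back into digits. The function name `multiply_via_convolution` is hypothetical.

```python
def multiply_via_convolution(a, b):
    """Multiply two non-negative integers by convolving their decimal digits."""
    da = [int(ch) for ch in reversed(str(a))]  # least significant digit first
    db = [int(ch) for ch in reversed(str(b))]
    raw = [0] * (len(da) + len(db) - 1)
    for i, x in enumerate(da):
        for j, y in enumerate(db):
            raw[i + j] += x * y                # plain discrete convolution
    digits, carry = [], 0
    for value in raw:                          # carry propagation
        carry, digit = divmod(value + carry, 10)
        digits.append(digit)
    while carry:
        carry, digit = divmod(carry, 10)
        digits.append(digit)
    product = int("".join(str(d) for d in reversed(digits)))
    return raw, product

raw, product = multiply_via_convolution(123, 456)
print(raw)      # [18, 27, 28, 13, 4]
print(product)  # 56088
```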

  12. Discrete Convolution in Matrix Form
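The matrix on this slide is not preserved in the transcript. As a sketch under the assumption that the intended form is the usual banded (Toeplitz) matrix built from shifted copies of h, the convolution g = H f can be written as a matrix-vector product. The helper names and sample values are illustrative.

```python
def convolution_matrix(h, n_f):
    """Build the (n_f + n_h - 1) x n_f matrix whose columns are shifted copies of h."""
    n_g = len(h) + n_f - 1
    return [[h[r - c] if 0 <= r - c < len(h) else 0 for c in range(n_f)]
            for r in range(n_g)]

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

h = [1, 1, 1]  # illustrative impulse response
f = [1, 2, 3]  # illustrative input samples
H = convolution_matrix(h, len(f))
g = matvec(H, f)
print(g)       # [1, 3, 6, 5, 3]
```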

  13. Enter the Discrete Fourier Transform

  14. Discrete Fourier Transform (DFT) • A discrete-time version of the Fourier Transform that can be implemented in the digital domain • Given an N-point time-sampled sequence {x_0, x_1, …, x_(N-1)}, the DFT is described by a transform pair with complexity O(N^2)
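The transform pair is missing from the transcript. A minimal sketch, assuming the common convention with a negative exponent in the forward transform and a 1/N factor in the inverse (the slide may have normalized differently), makes the O(N^2) cost visible as two nested loops over N terms:

```python
import cmath

def dft(x):
    """Direct DFT: X[k] = sum over n of x[n] * exp(-j*2*pi*n*k/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

def idft(X):
    """Inverse DFT: x[n] = (1/N) * sum over k of X[k] * exp(+j*2*pi*n*k/N)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N)) / N
            for n in range(N)]

x = [1.0, 2.0, 0.0, -1.0]  # illustrative samples
X = dft(x)
x_back = idft(X)           # recovers x up to rounding error
```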

  15. Fast Fourier Transform (FFT) • The FFT is a computationally efficient algorithm for the DFT, with complexity O(N log2 N) • Recall the DFT transform • It can be shown that the N-point DFT can be written in terms of G_n and H_n, two half-sized DFTs of the even and odd terms
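The even/odd split can be sketched as a recursive radix-2 decimation-in-time FFT. This assumes the input length is a power of two and uses the forward convention X[k] = sum of x[n] exp(-j2πnk/N). The names `G` and `H` follow the slide, while `fft` and `dft_direct` are illustrative helpers.

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT; len(x) is assumed to be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    G = fft(x[0::2])  # half-size DFT of the even-indexed terms
    H = fft(x[1::2])  # half-size DFT of the odd-indexed terms
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * H[k]  # twiddle factor times H[k]
        out[k] = G[k] + t
        out[k + N // 2] = G[k] - t
    return out

def dft_direct(x):
    """O(N^2) reference implementation used only for comparison."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

x = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, -2.0, -3.0]
max_diff = max(abs(a - b) for a, b in zip(fft(x), dft_direct(x)))
```

Here `max_diff` should be at the level of floating-point rounding error, confirming that both routines compute the same transform.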

  16. The FFT Efficient Implementation • Each half-size DFT can in turn be divided into a pair of quarter-size DFTs • The end result is a partition and reordering of the time-domain inputs using what is known as bit-reverse addressing • Each stage of the FFT consists of N complex multiply-accumulates in a straightforward implementation • Further simplification from eight to six real operations by the “butterfly” • Further simplification when the time-domain sequence is real
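Bit-reverse addressing can be illustrated with a short sketch, assuming N is a power of two. Repeatedly splitting the index list into even and odd halves yields the same ordering as reversing the bits of each index. Both helper names are hypothetical.

```python
def bit_reverse_indices(n):
    """Return the indices 0..n-1 with their binary representations reversed."""
    bits = n.bit_length() - 1  # number of address bits for a power-of-two n
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]

def even_odd_order(idx):
    """Order produced by recursively splitting into even and odd positions."""
    if len(idx) <= 1:
        return list(idx)
    return even_odd_order(idx[0::2]) + even_odd_order(idx[1::2])

order = bit_reverse_indices(8)
print(order)  # [0, 4, 2, 6, 1, 5, 3, 7]
```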

  17. The FFT Structure

  18. The Discrete Cosine Transform (DCT) • The DCT is a Fourier-related transform similar to the discrete Fourier transform (DFT), but using only real numbers • It is equivalent to a DFT of roughly twice the length, operating on real data with even symmetry (since the Fourier transform of a real and even function is real and even)
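The "DFT of roughly twice the length" relationship can be checked numerically. The sketch below assumes the unnormalized DCT-II (the slide does not say which DCT variant it means) and computes it two ways: directly, and from a length-2N DFT of the input followed by its mirror image, corrected by a half-sample phase factor.

```python
import cmath
import math

def dct2_direct(x):
    """Unnormalized DCT-II: X[k] = sum over n of x[n] * cos(pi * (n + 1/2) * k / N)."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
            for k in range(N)]

def dct2_via_dft(x):
    """Same DCT-II obtained from a 2N-point DFT of the even-symmetric extension."""
    N = len(x)
    y = list(x) + list(reversed(x))  # mirrored sequence of length 2N
    M = 2 * N
    out = []
    for k in range(N):
        Yk = sum(y[n] * cmath.exp(-2j * math.pi * n * k / M) for n in range(M))
        # The half-sample shift of the symmetry point appears as a phase factor.
        out.append((cmath.exp(-1j * math.pi * k / M) * Yk).real / 2)
    return out

x = [1.0, 3.0, -2.0, 0.5]  # illustrative samples
max_diff = max(abs(a - b) for a, b in zip(dct2_direct(x), dct2_via_dft(x)))
```

Only real arithmetic is needed in the direct form, which is the practical attraction of the DCT.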

  19. The Modified Discrete Cosine Transform (MDCT) • The MDCT operates on blocks that overlap by 50%, which makes it very useful when coefficients are quantized, as it effectively removes the otherwise easily detectable blocking artifacts at block boundaries
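The 50% overlap can be sketched as follows. This is an illustration under stated assumptions, not the formulation from the slide: it uses the common MDCT definition with N coefficients per 2N-sample block, a sine window applied at both analysis and synthesis, and a 2/N scale factor in the inverse. With these choices the aliasing introduced in each block cancels when overlapping halves are added.

```python
import math

def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block):
    """Windowed MDCT: 2N input samples -> N coefficients."""
    N = len(block) // 2
    w = sine_window(N)
    return [sum(w[n] * block[n] *
                math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs):
    """Windowed inverse MDCT: N coefficients -> 2N aliased samples."""
    N = len(coeffs)
    w = sine_window(N)
    return [w[n] * (2.0 / N) *
            sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for k in range(N))
            for n in range(2 * N)]

def mdct_roundtrip(signal, N):
    """Analyze and resynthesize with hop size N; len(signal) must be a multiple of N."""
    padded = [0.0] * N + list(signal) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        y = imdct(mdct(padded[start:start + 2 * N]))
        for n in range(2 * N):
            out[start + n] += y[n]  # overlap-add cancels the aliasing
    return out[N:N + len(signal)]

signal = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
rebuilt = mdct_roundtrip(signal, 8)
max_err = max(abs(a - b) for a, b in zip(signal, rebuilt))
```

Each block yields only N coefficients for 2N input samples, so the overlap costs no extra data, while the smooth window avoids hard block edges.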

  20. In Matrix Notation (2 Length-8 Blocks)

  21. Fourier Transform Summary • Physical interpretation: describes the frequency content of a real-world signal; for real-world signals, frequency content tails off as frequencies get higher • Mathematical interpretation: convolution in the time domain becomes multiplication in the frequency domain; the DFT is a matrix that diagonalizes a circulant convolution matrix • The DCT is a special case of the DFT

  22. Adaptive Transform Coding (ATC) • Another frequency-domain technique, for the bit rate range of 9.6 – 20 Kbps, that involves block transformation of windowed input segments of the speech waveform • Each segment is represented by a set of transform coefficients, which are quantized and transmitted in lieu of the signal itself • At the receiver, the quantized coefficients are inverse-transformed to get back to the original waveform • The most attractive and frequently used transform is the Discrete Cosine Transform (DCT) and the corresponding Inverse Discrete Cosine Transform (IDCT)

  23. ATC Practicality • Bit allocation among the different coefficients is varied adaptively from frame to frame while keeping the total number of bits constant • Time-varying statistics control the bit allocation procedure and have to be transmitted as side information (an overhead of about 2 Kbps) • Side information is also used to determine the step size of the various coefficient quantizers • In practice, the DCT and IDCT are not evaluated directly from their defining formulas but rather by a computationally efficient algorithm such as the FFT

  24. Source Coding - Vocoders • A class of speech coding systems that analyze the voice signal at the transmitter, derive the parameters, and transmit them to the receiver, where the voice is synthesized using these parameters • All vocoders attempt to model the speech generation process by a dynamic system and quantify the physical parameters of the system • In general they are much more complex than waveform coders and achieve very high economy in transmission bit rate • They tend to be less robust, and their performance is very much speaker-dependent

  25. Channel Vocoder • The first among many analysis-synthesis systems to be demonstrated • A frequency-domain vocoder that determines the envelope of the speech signal for a number of frequency bands and then samples, encodes, and multiplexes these samples with the encoded outputs of the other filters • The sampling is done synchronously every 10 ms to 30 ms • Along with the energy information for each band, the voiced/unvoiced decision and the pitch frequency for voiced speech are also transmitted

  26. Cepstrum Vocoder • The cepstrum vocoder separates the excitation and vocal tract spectrum by taking the inverse Fourier transform of the log magnitude spectrum of the signal • The low-frequency coefficients in the cepstrum correspond to the vocal tract spectral envelope • The high-frequency excitation coefficients form a periodic pulse train at multiples of the pitch period • At the receiver, the vocal tract cepstral coefficients are Fourier transformed to produce the vocal tract impulse response • By convolving the impulse response with a synthetic excitation signal, the original speech is reconstructed

  27. Linear Predictive Coders (LPC) • The time-domain LPC extracts the significant features of speech from its waveform • Computationally intensive, but by far the most popular among the class of low bit rate vocoders. It is possible to transmit good quality voice at 4.8 Kbps and poorer quality voice at lower rates • LPC models the vocal tract as an all-pole digital filter and uses a weighted sum of the past p samples to estimate the present sample (10 ≤ p ≤ 15), with e_n being the prediction error
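The prediction equation is missing from the transcript. In the notation used on the neighboring slides (s_n for the speech samples, a_k for the coefficients, e_n for the prediction error), the weighted sum of past samples is usually written as below; the sign convention is an assumption and varies between texts.

```latex
s_n = \sum_{k=1}^{p} a_k\, s_{n-k} + e_n
```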

  28. LPC Coefficients • The LPC coefficients a_n are found by solving a system of equations • Where C_mk are the correlation coefficients computed from the m-th and k-th lags of s_n • A matrix inversion is needed, hence the high computational load • The reflection coefficients, a related set of coefficients, are transmitted in practice
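The system of equations is not preserved in the transcript. The sketch below swaps in a different, widely used technique: the autocorrelation method solved by the Levinson-Durbin recursion, which avoids a general matrix inversion and yields the reflection coefficients as a by-product. It is not necessarily the formulation on the slide, and it assumes the predictor form s_n ≈ sum of a_k s_(n-k). Function names are illustrative.

```python
def autocorrelation(s, p):
    """Autocorrelation lags r[0..p] of the sample sequence s."""
    return [sum(s[n] * s[n - k] for n in range(k, len(s))) for k in range(p + 1)]

def levinson_durbin(r, p):
    """Solve the order-p normal equations; return (a[1..p], reflection coeffs, error)."""
    a = [0.0] * (p + 1)  # a[0] is unused; a[k] multiplies s[n - k]
    reflection = []
    err = r[0]           # prediction error energy of the order-0 predictor
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err    # i-th reflection coefficient
        reflection.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:], reflection, err

# Illustrative lags of a first-order process with coefficient 0.5.
r = [1.0, 0.5, 0.25]
coeffs, reflection, err = levinson_durbin(r, 2)
# coeffs is approximately [0.5, 0.0]: only the first past sample is useful.
```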

  29. LPC Transmitted Parameters • Each reflection coefficient can be adequately represented by 6 bits • A p = 10 predictor needs 72 bits per frame, including 60 bits for the coefficients, 5 bits for a gain parameter, and 6 bits for a pitch period • If the parameters are estimated every 15 – 30 msec, the resulting bit rate has a range of 2400 – 4800 bps • Additional savings can be achieved via a non-linear transformation of the coefficients prior to coding, to reduce sensitivity to quantization error
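The arithmetic behind these figures can be checked with a short sketch. The constants come from the slide; the helper `bit_rate_bps` and the set of frame intervals evaluated are illustrative. Note that the itemized fields add up to 71 bits, and the slide does not say what the remaining bit of the quoted 72 is used for.

```python
BITS_PER_COEFF = 6
PREDICTOR_ORDER = 10
GAIN_BITS = 5
PITCH_BITS = 6
BITS_PER_FRAME = 72  # total quoted on the slide

coeff_bits = BITS_PER_COEFF * PREDICTOR_ORDER    # 60
itemized = coeff_bits + GAIN_BITS + PITCH_BITS   # 71

def bit_rate_bps(bits_per_frame, frame_ms):
    """Bit rate in bits per second for one frame every frame_ms milliseconds."""
    return bits_per_frame * 1000 / frame_ms

rates = {ms: bit_rate_bps(BITS_PER_FRAME, ms) for ms in (15, 20, 30)}
print(rates)  # {15: 4800.0, 20: 3600.0, 30: 2400.0}
```

A 15 ms update interval gives the 4800 bps upper figure, and the 2400 bps lower figure corresponds to a 30 ms interval.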

  30. LPC Receiver Processing • At the receiver, the coefficients are used for a synthesis filter • The various LPC methods differ in how the synthesis filter is excited • Multi-pulse Excited LPC: typically 8 pulses with proper positions are used as excitation • Code-Excited LPC (CELP): the transmitter searches its code book for a stochastic excitation to the LPC filter that gives the best perceptual match to the sound. The index into the code book is then transmitted • CELP coders are extremely complex and can require more than 500 MIPS • However, high quality is achieved when the excitation is coded at 0.25 bits/sample, at transmission bit rates as low as 4.8 Kbps

  31. Various LPC Vocoders

  32. ITU-T Speech Coding Standards

  33. Speech Coder Performance • Objective measures: how well does the reconstructed speech signal quantitatively approximate the original version? Examples are Mean Square Error (MSE) distortion, frequency-weighted MSE, and segmental Signal-to-Noise Ratio (SNR) • Subjective measures: conducted by playing the sample to a number of listeners who judge the quality of the speech in terms of overall quality, listening effort, intelligibility, and naturalness • Diagnostic Rhyme Test (DRT): the most popular test for intelligibility • Diagnostic Acceptability Measure (DAM): evaluates the acceptability of a speech coding system • Mean Opinion Score (MOS): the most popular ranking system
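Two of the objective measures can be sketched briefly. The definitions below are common ones and are assumptions here, since the slide gives no formulas: MSE is the average squared sample difference, and segmental SNR is the mean of per-frame SNR values in dB. The frame length and test signal are illustrative.

```python
import math

def mse(ref, test):
    """Mean square error between a reference and a reconstructed signal."""
    return sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)

def segmental_snr_db(ref, test, frame_len):
    """Average of per-frame SNRs in dB; silent or error-free frames are skipped."""
    snrs = []
    for start in range(0, len(ref) - frame_len + 1, frame_len):
        r = ref[start:start + frame_len]
        t = test[start:start + frame_len]
        signal = sum(v * v for v in r)
        noise = sum((a - b) ** 2 for a, b in zip(r, t))
        if signal > 0 and noise > 0:
            snrs.append(10 * math.log10(signal / noise))
    if not snrs:
        raise ValueError("no frame with both nonzero signal and nonzero error")
    return sum(snrs) / len(snrs)

ref = [1.0, -1.0] * 8             # illustrative reference signal
test = [v + 0.1 for v in ref]     # reconstruction with a constant 0.1 error
error = mse(ref, test)            # about 0.01
seg_snr = segmental_snr_db(ref, test, frame_len=4)  # about 20 dB
```

Averaging per frame rather than over the whole signal keeps loud passages from masking errors in quiet ones.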

  34. Mean Opinion Score (MOS) • Most popular ranking system

  35. MOS for Speech Coders
