
Introduction to Image and Video Coding Algorithms


Olivia

Presentation Transcript


  1. Introduction to Image and Video Coding Algorithms

  2. Outline • Transform-based Image and Video Coding • Linear Transformation – DCT • Quantization • Scalar Quantization • Vector Quantization • Entropy Coding • Video Coding – Motion Compensation

  3. Transform-based Image Coding: Input Image → Linear Transform → Quantization → Entropy Coding → Binary bit stream

  4. Linear Transform: If the signal is formatted as a vector, a linear transform can be formulated as a matrix-vector product that transforms the signal into a different domain. Examples: Karhunen-Loève (K-L) expansion, discrete Fourier transform, discrete cosine transform, discrete wavelet transform. Energy compaction property: the transformed signal vector has a few large coefficients and many small, nearly zero coefficients. These few large coefficients can be encoded efficiently with few bits while retaining most of the energy of the original signal.
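The matrix-vector formulation and the energy compaction property can be illustrated with a small sketch. It assumes an orthonormal DCT-II matrix as the transform, and the eight input samples are made up for the illustration; the helper names `dct_matrix` and `transform` are not from the slides.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows

def transform(matrix, signal):
    """Linear transform as a matrix-vector product y = T x."""
    return [sum(t * x for t, x in zip(row, signal)) for row in matrix]

# Smooth, made-up samples standing in for one row of image pixels.
signal = [100, 102, 104, 107, 109, 111, 113, 115]
coeffs = transform(dct_matrix(8), signal)

# An orthonormal transform preserves energy, so the share carried by the
# two largest coefficients measures how well the energy is compacted.
energy = sum(c * c for c in coeffs)
top2 = sum(c * c for c in sorted(coeffs, key=abs, reverse=True)[:2])
print([round(c, 2) for c in coeffs])
print("share of energy in 2 largest coefficients:", top2 / energy)
```

For a smooth input like this one, almost all of the energy ends up in the first one or two coefficients, which is the behavior the slide describes.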

  5. Block-based Image Coding: An image is a 2D signal of pixel intensities (including colors). A block-based image coding scheme partitions the entire image into 8 by 8 or 16 by 16 (or other size) blocks, and the coding algorithm is then applied to each block independently. Blocks may be overlapping or non-overlapping. Advantage: individual blocks can be processed in parallel. On hand-held devices, only one block needs to be loaded into main memory at a time.

  6. JPEG Image Coding Algorithms: JPEG Encoding Process. 8x8 block → DCT → Q (quantization matrix). DC path: DPCM → DC Huffman (code books). AC path: zig-zag scan → AC Huffman (code books).

  7. JPEG Decoding Process: DC path: DC Huffman → IDPCM. AC path: AC Huffman. Both paths → IQ → IDCT → 8x8 block.

  8. Pre-Processing: Color sub-sampling: a color image is converted from RGB to YUV color space, with 1 byte per pixel for each component. The U and V planes are sub-sampled (4:1:1 scheme), so for every 16 by 16 block of a color image, six 8 by 8 blocks are encoded: four 8×8 blocks of luminance pixels plus two 8×8 sub-sampled chrominance blocks make up a 16 by 16 macro-block. Level shifting: 128 is subtracted from each pixel value so that it ranges over [–128, 127].
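The pre-processing steps can be sketched as follows. The slide does not give the color conversion matrix or the sub-sampling filter, so this sketch assumes the JFIF YCbCr coefficients and a plain 2x2 average; all function names are illustrative.

```python
def rgb_to_ycbcr(r, g, b):
    """RGB to luminance/chrominance (JFIF coefficients, an assumption here)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def subsample_2x2(plane):
    """Average each 2x2 neighbourhood, halving both dimensions (assumed filter)."""
    return [[(plane[r][c] + plane[r][c + 1] +
              plane[r + 1][c] + plane[r + 1][c + 1]) / 4.0
             for c in range(0, len(plane[0]), 2)]
            for r in range(0, len(plane), 2)]

def level_shift(plane):
    """Subtract 128 so 8-bit samples are centred on zero."""
    return [[v - 128 for v in row] for row in plane]

def count_blocks(plane, n=8):
    """Number of non-overlapping n-by-n blocks in a plane."""
    return (len(plane) // n) * (len(plane[0]) // n)

# One flat grey 16x16 macro-block of (R, G, B) pixels, made up for the demo.
macro_block = [[(128, 128, 128)] * 16 for _ in range(16)]
planes = [[rgb_to_ycbcr(*px) for px in row] for row in macro_block]
y_plane = level_shift([[p[0] for p in row] for row in planes])
cb_plane = level_shift(subsample_2x2([[p[1] for p in row] for row in planes]))
cr_plane = level_shift(subsample_2x2([[p[2] for p in row] for row in planes]))

total_blocks = (count_blocks(y_plane) + count_blocks(cb_plane)
                + count_blocks(cr_plane))
print(total_blocks)  # four luminance blocks plus two chrominance blocks
```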

  9. Discrete Cosine Transform: 8×8 two-dimensional separable DCT. The DCT is chosen because it leads to superior energy compaction for natural images. F(0,0): the DC coefficient ranges over (–128×64/4, 127×64/4) = (–2048, 2032) and needs 12 bits to represent (including the sign bit). 12 bits are more than enough for the remaining AC coefficients (u > 0 or v > 0).

  10. Inverse DCT (IDCT) • 8×8 two-dimensional separable IDCT • IDCT can be computed using the same routine as DCT
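A minimal sketch of the separable 8×8 DCT and its inverse is given here. It assumes the orthonormal scaling, under which the DC coefficient equals 8 times the block mean; a different scaling constant changes the coefficient range but not the structure. The forward and inverse transforms share the same matrix-product routine and differ only in which side the transposed matrix is applied.

```python
import math

N = 8

def dct_matrix(n=N):
    """Orthonormal 1-D DCT-II matrix (scaling is an assumption of this sketch)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    """Plain matrix product, shared by the forward and inverse transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

T = dct_matrix()
T_t = transpose(T)

def dct2(block):
    """Separable 2-D DCT: 1-D DCT along columns, then along rows."""
    return matmul(matmul(T, block), T_t)

def idct2(coeffs):
    """Inverse 2-D DCT: same routine with the transposed matrix."""
    return matmul(matmul(T_t, coeffs), T)

flat = [[10] * N for _ in range(N)]   # constant block: only DC is nonzero
F = dct2(flat)
print(round(F[0][0], 6))
```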

  11. DCT Basis Functions

  12. Quantization of DCT Coefficients

  13. DPCM of DC coefficients: DC coding: the DC coefficients of all 8 by 8 blocks of the entire image are collected into a sequence of DC coefficients. Next, DPCM is applied: DiffDC(block_i) = DC(block_i) – DC(block_{i–1}). The DiffDC values are then encoded using Huffman entropy coding. Example: Original: 1216  1232  1224  1248  1248  1208. After DPCM: 1216  +16  -8  +24  0  -40.
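The DPCM step can be sketched in a few lines. The initial predictor value of 0 is an assumption chosen so that the first coefficient passes through unchanged, as in the slide's example.

```python
def dpcm_encode(dc_values):
    """Replace each DC value by its difference from the previous one."""
    diffs, previous = [], 0   # predictor starts at 0 (assumption)
    for value in dc_values:
        diffs.append(value - previous)
        previous = value
    return diffs

def dpcm_decode(diffs):
    """Undo DPCM by accumulating the differences."""
    values, previous = [], 0
    for diff in diffs:
        previous += diff
        values.append(previous)
    return values

original = [1216, 1232, 1224, 1248, 1248, 1208]
encoded = dpcm_encode(original)
print(encoded)  # [1216, 16, -8, 24, 0, -40]
```

The differences are much smaller in magnitude than the DC values themselves, which is what makes the subsequent entropy coding effective.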

  14. Huffman Entropy Coding: Entropy coding: Task: to assign a variable-length binary code to each symbol of a finite alphabet. Goal: to minimize the average code length (number of bits) per symbol. Approach: shorter codes for symbols that occur more frequently, longer codes for infrequent ones. Optimal solution: when the average code length approaches the entropy of the source. Huffman coding: code words are derived from a (perhaps) unbalanced binary tree. Arithmetic coding is another entropy coding method.
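As a rough illustration of how Huffman code words come out of a binary tree, the sketch below repeatedly merges the two least probable subtrees. The four-symbol source and its probabilities are hypothetical, chosen as powers of 1/2 so that the average code length meets the entropy exactly.

```python
import heapq
import math

def huffman_code(probabilities):
    """Build a prefix code by merging the two lightest subtrees repeatedly.

    Assumes at least two symbols. Heap items are
    (weight, tie_breaker, {symbol: code_so_far}).
    """
    heap = [(w, i, {s: ""})
            for i, (s, w) in enumerate(sorted(probabilities.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w0, _, codes0 = heapq.heappop(heap)
        w1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (w0 + w1, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}  # hypothetical source
codes = huffman_code(probs)
average_length = sum(probs[s] * len(codes[s]) for s in probs)
entropy = -sum(p * math.log2(p) for p in probs.values())
print(codes)
print(average_length, entropy)
```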

  15. Huffman Encoding of DC Coefficients: Encoding and decoding of Huffman codes is done via look-up tables. In JPEG, DC coefficients (after DPCM) are first grouped into categories according to their magnitudes. Each category is treated as a symbol and assigned a code from a Huffman table. For example, –7 to –4 and 4 to 7 form category 3, which has the code "00". If the number is positive, its binary representation is appended directly to the Huffman code of the category. For example, 6 is encoded as 00110. If the number is negative, the appended bits are the 1's complement of its magnitude. For example, -5 is encoded as 00010. Question: Given such a table, how can dedicated hardware be devised to implement the encoding procedure?

  16. JPEG Huffman Table: Categories

  17. JPEG DC Entropy Coding: Example: -9 is in category 4, hence base code = 101. 1's complement of 9: 1C(1001) = 0110. Code word = 101 + 0110 = 1010110. Note that category 3 occurs most frequently and hence has the shortest base code word.
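The category-plus-magnitude scheme can be sketched as below. Only the two base codes quoted in the slides (category 3 and category 4) are included; the full category table is not reproduced in this transcript, so `BASE_CODE` is deliberately partial and the function names are illustrative.

```python
# Base codes quoted in the slides; every other category is left out on purpose.
BASE_CODE = {3: "00", 4: "101"}

def category(value):
    """Number of bits needed for |value|; e.g. 4..7 and -7..-4 give 3."""
    return abs(value).bit_length()

def magnitude_bits(value):
    """Binary of a positive value, or the 1's complement of |value| if negative."""
    size = category(value)
    if size == 0:
        return ""
    if value < 0:
        value += (1 << size) - 1   # 1's complement of |value| in `size` bits
    return format(value, "0{}b".format(size))

def encode_dc(diff):
    """Base code of the category followed by the magnitude bits."""
    return BASE_CODE[category(diff)] + magnitude_bits(diff)

print(encode_dc(6), encode_dc(-5), encode_dc(-9))  # 00110 00010 1010110
```

Because a negative value always starts with a 0 bit and a positive value with a 1 bit, the decoder can recover the sign from the first appended bit.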

  18. AC Coefficients: AC coefficients are first weighted with a quantization matrix, Cq(i,j) = C(i,j)/q(i,j), and then quantized. They are then scanned in zig-zag order into a 1D sequence, which is passed to AC Huffman encoding. Question: Given an 8 by 8 array, how can it be converted into a vector according to the zig-zag scan order? What is the algorithm? Zig-zag scan order:
       1  2  6  7 15 16 28 29
       3  5  8 14 17 27 30 43
       4  9 13 18 26 31 42 44
      10 12 19 25 32 41 45 54
      11 20 24 33 40 46 53 55
      21 23 34 39 47 52 56 61
      22 35 38 48 51 57 60 62
      36 37 49 50 58 59 63 64
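One possible answer to the question above is to walk the anti-diagonals of the array, alternating direction on each one. The sketch below is one such algorithm, not necessarily the one the author had in mind; `zigzag_order` and `zigzag_scan` are illustrative names.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs in zig-zag scan order for an n-by-n block."""
    order = []
    for s in range(2 * n - 1):            # s = row + col indexes a diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:                    # even diagonals run bottom-left to top-right
            rows = reversed(rows)
        for r in rows:
            order.append((r, s - r))
    return order

def zigzag_scan(block):
    """Flatten a square block into a 1-D list in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

# Rebuild the scan-position table: entry (r, c) is the 1-based scan index.
positions = [[0] * 8 for _ in range(8)]
for index, (r, c) in enumerate(zigzag_order(8), start=1):
    positions[r][c] = index
for row in positions:
    print(" ".join("{:2d}".format(v) for v in row))
```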

  19. AC Coefficients Huffman Encoding: The symbols for encoding AC coefficients consist of both the number of significant bits of a nonzero AC coefficient and the run of 0s preceding it. For example, 5 0 2 0 0 –1 is encoded as: 100101 11100110 110110, according to the AC Huffman table.
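The grouping into (run, size) symbols can be sketched as follows. The sketch stops at the symbol level because the AC Huffman table is not reproduced in this transcript; it also assumes an end-of-block marker for trailing zeros and ignores runs longer than 15 zeros, which JPEG handles with a separate symbol.

```python
def run_size_symbols(ac_sequence):
    """Turn zig-zag ordered AC coefficients into (run, size, value) triples.

    run   = number of zeros preceding the nonzero coefficient
    size  = number of significant bits of the coefficient's magnitude
    Trailing zeros collapse into a single "EOB" marker (assumed here).
    """
    symbols, run = [], 0
    for value in ac_sequence:
        if value == 0:
            run += 1
        else:
            symbols.append((run, abs(value).bit_length(), value))
            run = 0
    if run > 0:
        symbols.append("EOB")
    return symbols

print(run_size_symbols([5, 0, 2, 0, 0, -1]))
# [(0, 3, 5), (1, 2, 2), (2, 1, -1)]
```

Each triple corresponds to one of the three code words in the example: the (run, size) pair selects the Huffman code and the value supplies the appended magnitude bits.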

  20. Huffman Decoding: A look-up table procedure. Challenge: how to perform decoding fast? Example: a Huffman table for six symbols (A to F). The decoding process can be modeled as a finite state machine that decodes one bit of the input bit stream per clock cycle. [State diagram: five states a to e; transitions are labeled input bit/output symbol: 0/A, 0/B, 0/C, 1/D, 0/E, 1/F, and 0/- or 1/- where no symbol is output.] Question: How to make this process fast enough to match any input bit rate?
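A bit-serial decoder of this kind can be sketched as a table-driven state machine. The six-symbol code table below is hypothetical: it is one prefix code whose transitions are consistent with the labels in the state diagram, but the slide's actual table is not available in this transcript.

```python
# Hypothetical prefix code for six symbols (an assumption, see above).
CODES = {"A": "0", "B": "100", "C": "1010", "D": "1011", "E": "110", "F": "111"}

def build_fsm(codes):
    """Return transitions[state][bit] = (next_state, emitted_symbol_or_None).

    State 0 is the start state; every proper prefix of a code word
    becomes one state, and completing a code word returns to state 0.
    """
    states = {"": 0}
    transitions = [{}]
    for symbol, word in codes.items():
        prefix = ""
        for position, bit in enumerate(word):
            state = states[prefix]
            if position == len(word) - 1:
                transitions[state][bit] = (0, symbol)
            else:
                prefix += bit
                if prefix not in states:
                    states[prefix] = len(transitions)
                    transitions.append({})
                transitions[state][bit] = (states[prefix], None)
    return transitions

def decode(bits, transitions):
    """Consume one input bit per step, as one clock cycle would."""
    state, output = 0, []
    for bit in bits:
        state, symbol = transitions[state][bit]
        if symbol is not None:
            output.append(symbol)
    return "".join(output)

fsm = build_fsm(CODES)
stream = "".join(CODES[s] for s in "ABCDEF")
print(len(fsm), decode(stream, fsm))
```

Since this machine consumes exactly one bit per step, its symbol rate depends on code word length, which is the bottleneck the slide's closing question points at.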

  21. Video Coding • Video coding is often implemented as encoding a sequence of images. Motion compensation is used to exploit temporal redundancy between successive frames. • Examples: MPEG-1, MPEG-2, MPEG-4, H.261, H.263, H.263+, etc. • Existing video coding standards combine JPEG-style image compression with motion compensation.

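  22. MPEG Encoding (simplified block diagram; the encoding of intra-coded frames is not shown). Forward path: the predicted frame is subtracted from the current frame x(t) to give the residue r(t) → DCT → Q → VLC → bit stream buffer → bit stream, with buffer control feeding back from the buffer to Q. Reconstruction loop: Q → Q⁻¹ → IDCT gives Q[r(t)], the reconstructed residue, which is added to the predicted frame to form the reconstructed current frame, stored in the frame buffer. Motion estimation & compensation takes x(t) and the buffered previous frame x(t–1) and produces the predicted frame and the motion vectors.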

  23. MPEG Decoding (VLD: Variable Length Decoding). Received bit stream → bit stream buffer → VLD → Q⁻¹ → IDCT gives Q[r(t)], the reconstructed residue, which is added to the predicted frame to form the reconstructed current frame. Motion compensation produces the predicted frame from the decoded motion vectors and the previous frame x(t–1) held in the frame buffer.

  24. Motion Estimation: Three types of frames: Intra (I): the frame is coded as if it were a still image. Predicted (P): predicted from a preceding I or P frame. Bi-directional (B): forward and backward predicted from a pair of I or P frames. A typical frame arrangement is (subscripts are used to distinguish them): I1 B1 B2 P1 B3 B4 P2 B5 B6 I2. P1 is forward-predicted from I1, and P2 from P1. B1, B2 are interpolated from I1 and P1; B3, B4 from P1 and P2; and B5, B6 from P2 and I2.
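Because each B frame depends on a later I or P frame, the frames cannot be coded in display order. A minimal sketch of the reordering, assuming frames are given as labels whose first letter is the frame type:

```python
def coding_order(display_order):
    """Move each I or P frame ahead of the B frames that depend on it."""
    ordered, pending_b = [], []
    for frame in display_order:
        if frame.startswith("B"):
            pending_b.append(frame)      # wait for the next reference frame
        else:
            ordered.append(frame)        # reference frame goes first
            ordered.extend(pending_b)    # then the B frames it unblocks
            pending_b = []
    ordered.extend(pending_b)
    return ordered

display = "I1 B1 B2 P1 B3 B4 P2 B5 B6 I2".split()
print(" ".join(coding_order(display)))
# I1 P1 B1 B2 P2 B3 B4 I2 B5 B6
```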

  25. Forward Motion Estimation: the current frame is constructed from different parts of the reference frame.
      Reference frame:
         1  2  3  4
         5  6  7  8
         9 10 11 12
        13 14 15 16
      Current frame:
         2  4  1  3
         8  5  7  6
        12 11  9 10
        15 13 16 14

  26. Block Motion Estimation: MAD is the mean absolute difference between the (i,j)-th pixel of the current block, x(i,j), and the (i+m,j+n)-th pixel of the reference frame. The displacement (m,n), with –p ≤ m,n ≤ p, that minimizes the MAD is the motion vector of the macro-block; p sets the search range. It is similar to DPCM in the temporal domain, and has less to do with object motion. [Figure: the current block in the current frame, its search area in the reference frame, and the motion vector between them.]

  27. Video sequence : Tennis frame 0 Prepared by Surin Kittitornkun

  28. Video sequence: Tennis frame 1

  29. Frame Difference

  30. What is motion estimation?

  31. What is motion compensation?

  32. Motion Compensated Frame Difference

  33. 6-Level Nested Do Loop:
      Do h=0 to Nh-1
        Do v=0 to Nv-1
          MV(h,v)=(0,0)
          Dmin(h,v)=∞
          Do m=-p to p (-1)
            Do n=-p to p (-1)
              MAD(m,n)=0
              Do i=hN to hN+N-1
                Do j=vN to vN+N-1
                  MAD(m,n)=MAD(m,n)+|x(i,j)-y(i+m,j+n)|
                End do j
              End do i
              If Dmin(h,v)>MAD(m,n)
                Dmin(h,v)=MAD(m,n)
                MV(h,v)=(m,n)
              End if
            End do n
          End do m
        End do v
      End do h
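The nested loop translates fairly directly into runnable code. The sketch below follows the same loop structure, with two assumptions that the pseudocode leaves open: candidate blocks that fall outside the reference frame are skipped, and the displacement range is taken as -p to p inclusive. As in the pseudocode, the accumulated value is a sum of absolute differences (dividing by N×N would give the mean without changing the minimizer). The test frames are synthetic.

```python
def full_search(current, reference, N, p):
    """Exhaustive block matching: one motion vector per N-by-N block."""
    block_rows, block_cols = len(current) // N, len(current[0]) // N
    motion, distortion = {}, {}
    for h in range(block_rows):
        for v in range(block_cols):
            best, best_mv = float("inf"), (0, 0)
            for m in range(-p, p + 1):
                for n in range(-p, p + 1):
                    top, left = h * N + m, v * N + n
                    # Skip candidates outside the reference frame (assumption).
                    if (top < 0 or left < 0 or top + N > len(reference)
                            or left + N > len(reference[0])):
                        continue
                    total = 0
                    for i in range(h * N, h * N + N):
                        for j in range(v * N, v * N + N):
                            total += abs(current[i][j] - reference[i + m][j + n])
                    if best > total:
                        best, best_mv = total, (m, n)
            motion[(h, v)], distortion[(h, v)] = best_mv, best
    return motion, distortion

# Synthetic 12x12 frames: the current frame is the reference displaced by (1, -1).
reference = [[16 * i + j for j in range(12)] for i in range(12)]
current = [[16 * (i + 1) + (j - 1) for j in range(12)] for i in range(12)]
motion, distortion = full_search(current, reference, N=4, p=2)
print(motion[(1, 1)], distortion[(1, 1)])  # (1, -1) 0
```

The cost grows with the number of blocks, the square of the search range, and the block area, which is why this full search is a common target for fast algorithms and dedicated hardware.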
