# Motion Compensated Prediction and the Role of the DCT in Video Coding

Michael Horowitz, Applied Video Compression
michael@avcompression.com

**Outline**

• Overview of block-based hybrid motion compensated predictive video coding
  • ITU-T standards: H.261, H.263, H.264
  • ISO/IEC standards: MPEG-1, MPEG-2 & MPEG-4
• Survey of motion estimation & compensation
• Discrete cosine transform (DCT)
  • Coding efficiency
  • Computational complexity
  • Perceptual implications

**Block-Based Hybrid Motion Compensated Predictive Coding**

• Video picture is partitioned into macroblocks
• A macroblock (MB) has three components
  • One luma component: "Y", represents "lightness"; 16x16 luma samples
  • Two chroma components: "Cb" & "Cr", represent color; 16x16, 8x16, or 8x8 chroma samples

*(Figure: 4:4:4, 4:2:2 and 4:2:0 sub-sampling patterns for Y, Cb and Cr.)*

**Block-Based Hybrid Motion Compensated Predictive Coding (continued)**

• The Human Visual System is more sensitive to luma
• Chroma is frequently sub-sampled; sub-sampling examples: 4:4:4, 4:2:2, 4:2:0
• Two coding modes for macroblocks

**Inter-Picture Macroblock Coding**

*(Figure: search region around the location of the input MB in the reference picture.)*

• Estimate the motion of blocks from picture to picture
• Search previously coded (reference) pictures
• Encode
  • Location of the motion estimate (motion vector)
  • Difference between the input MB and the motion estimate

**Intra-Picture Macroblock Coding**

• Input MB is coded using intra-picture prediction
• Prediction is derived from spatially adjacent MBs
• Earlier algorithms offer no intra-picture prediction
• Significantly lower coding efficiency than inter-coded MBs at low data rates
• Useful when the motion estimate is poor
• Can be used to stop error propagation

**Survey of Motion Estimation and Motion Compensation**

• Motion models
  • Translational (focus of this talk)
  • Location of the kth motion compensated block: (Xk + MVx,k, Yk + MVy,k), where (Xk, Yk) is the location of the kth input block and (MVx,k, MVy,k) is the motion vector (MV) for the kth block
  • Affine motion models: rotation, scaling
  • Video standards do not use affine models

**Motion Estimation**

• Estimate inter-picture block translation
• Uses luma samples (and sometimes chroma)
• Example distortion measure: Sum of Absolute Differences (SAD)
  • Low complexity; commonly used in real-time production encoders
• Find the (MVx,k, MVy,k) that minimizes the SAD between the input block sk(i,j) and the motion compensated prediction in the reference picture r(i,j), subject to the search range:

  SADk(MVx, MVy) = Σi,j |sk(i,j) − r(i + MVx, j + MVy)|

**Motion Estimation (continued)**

• Fast motion estimation algorithms

*(Figure: sample locations and search range around (Xk, Yk) in the reference picture r(i,j).)*

**Fractional Sample Motion Estimation**

• Estimate content between samples
• Example: bilinear interpolation of z(x*, y*) from the four surrounding samples z(x1,y1), z(x2,y1), z(x1,y2), z(x2,y2), with x1 ≤ x* < x2 and y1 ≤ y* < y2:

  fx = (x* − x1)/(x2 − x1)
  fy = (y* − y1)/(y2 − y1)
  z(x*, y*) = (1 − fx)(1 − fy) z(x1,y1) + fx(1 − fy) z(x2,y1) + fx fy z(x2,y2) + (1 − fx) fy z(x1,y2)

**Fractional Sample Motion Estimation (continued)**

• H.261: no fractional sample motion estimation
• MPEG-1, MPEG-2 and H.263: 1/2-sample, bilinear interpolation
• H.264 | MPEG-4 AVC & SVC
  • Luma: 1/2-sample, 6-tap interpolation; 1/4-sample, simple average
  • Chroma: 1/8-sample, bilinear

**Fractional Sample Motion Estimation (continued)**

• Coding efficiency gain for H.263 [from Wang 2002]

**Multiple Motion Vectors per MB**

• One motion vector for each sub-block
• H.264 results [Bjontegaard 2001]

**Multiple Reference Pictures [Wiegand, Zhang & Girod 1997]**

• Coding gains
  • Uncovered areas
  • More integer motion vector estimates

*(Figure: integer sample locations in reference pictures t−3, t−2, t−1 and the current picture t0, with the direction of motion indicated.)*

**Multiple Reference Pictures (continued)**

• H.263 Annex U [Horowitz 2000]

**Multi-Hypothesis Motion Compensated Prediction [Flierl, Wiegand & Girod 1998]**

• Linear combination of multiple predictions
• One motion vector for each prediction
• Bi-predicted pictures are a special case (2 MVs)
• Predictions may be forward & backward in time

**Multi-Hypothesis for H.263**

• Sequences: Mobile & Calendar and Foreman
• Results [Flierl 1998]

**Overlapped Block Motion Compensation [Orchard & Sullivan 1994]**

• Special case of multi-hypothesis coding
• H.263 advanced prediction mode (Annex F)
  • Overlapped block motion compensation: 1 coded + 2 "derived" motion vectors, with non-uniform spatial weighting of samples
  • 4 motion vectors per macroblock

**Rate-Distortion Optimization**

• The MV resulting in the lowest distortion is often not optimal
• Goal: find the best tradeoff between distortion and rate
• Strategy [Everett III 1963], [Shoham & Gersho 1988]
  • Minimize Jk = Dk + λ Rk for each block k separately, using a common λ, where Dk and Rk are the distortion and rate for block k
  • The total bit-rate is Σk Rk and the total distortion is Σk Dk

**Perceptual Tuning**

• Prevent transparent foreground macroblocks
• Blurring of fast moving objects
• Deblocking filter
• Artifacts in the motion wake

**Coding Summary**

• Macroblock-based coding
• Two basic macroblock coding modes
  • Inter-coded MB: motion compensated prediction
  • Intra-coded MB

**1-D Discrete Cosine Transform**

• Type-II forward DCT [Ahmed et al. 1974]:

  X(k) = α(k) Σn=0..N−1 x(n) cos[π(2n + 1)k / (2N)], k = 0, …, N−1

• Type-II inverse DCT:

  x(n) = Σk=0..N−1 α(k) X(k) cos[π(2n + 1)k / (2N)], with α(0) = √(1/N) and α(k) = √(2/N) for k > 0

**2-Dimensional DCT**

• Forward (separable: 1-D DCT along rows, then along columns):

  X(k,l) = α(k) α(l) Σm Σn x(m,n) cos[π(2m + 1)k / (2N)] cos[π(2n + 1)l / (2N)]

• Inverse:

  x(m,n) = Σk Σl α(k) α(l) X(k,l) cos[π(2m + 1)k / (2N)] cos[π(2n + 1)l / (2N)]

**Why Choose the DCT?**

• Coding efficiency
• Computational complexity
• Perceptual implications

**Coding Efficiency**

*(Figure: X1 and X2 quantized by Q1 and Q2 to produce X̂1 and X̂2, which form the reconstruction X̂.)*

• Source X = [X1, X2]
  • Xi is a Gaussian random variable with mean 0 and variance σi²
• Rate of quantizer Qi is Ri (bits / index)
• Total rate R = R1 + R2

**Coding Efficiency (continued)**

• Distortion: square error
• High-rate assumption
  • High rate implies R ≥ 3 bits / sample
  • Often works well for lower rates
• Asymptotic quantization theory [Gray & Neuhoff 1998]: Di = h σi² 2^(−2Ri), for a source-dependent constant h
• Total distortion: D = D1 + D2 = h σ1² 2^(−2R1) + h σ2² 2^(−2R2)

**Rate Allocation Problem**

• What is the smallest D = D*, subject to R1 + R2 ≤ R?
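The two-quantizer question can be explored numerically. A minimal sketch, assuming the high-rate model Di = h σi² 2^(−2Ri) with h = 1 and made-up variances (neither the variances nor h = 1 come from the slides), sweeps R1 over a grid and compares the grid optimum with the standard closed-form split:

```python
import math

# High-rate distortion model: D_i = h * sigma_i^2 * 2^(-2 R_i), with h = 1
# for simplicity. The variances and rate budget below are made-up examples.
sigma1_sq, sigma2_sq = 16.0, 1.0   # variances of X1 and X2
R = 6.0                            # total rate budget, R = R1 + R2

def total_distortion(r1):
    """Total distortion D = D1 + D2 for a given split R1 = r1, R2 = R - r1."""
    r2 = R - r1
    return sigma1_sq * 2 ** (-2 * r1) + sigma2_sq * 2 ** (-2 * r2)

# Sweep R1 over a fine grid and keep the best split.
grid = [i / 1000 for i in range(int(R * 1000) + 1)]
r1_best = min(grid, key=total_distortion)

# Closed-form optimum: R1* = R/2 + (1/4) log2(sigma1^2 / sigma2^2)
r1_star = R / 2 + 0.25 * math.log2(sigma1_sq / sigma2_sq)

print(r1_best, r1_star)          # the grid optimum matches the formula
print(total_distortion(r1_star))
```

Note that at the optimum the two quantizers contribute equal distortion: the higher-variance source simply receives more bits.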
• Find the optimal value for R1 (with R2 = R − R1)
• Minimizing D with respect to R1 yields

  R1* = R/2 + (1/4) log2(σ1²/σ2²)

**Rate Allocation Problem (continued)**

It follows that

  R2* = R/2 + (1/4) log2(σ2²/σ1²)

and

  D1* = D2* = h σ1 σ2 2^(−R)

which implies

  D* = 2h (σ1² σ2²)^(1/2) 2^(−R)

**Generalize for k Quantizers**

*(Figure: X1, X2, X3 quantized by Q1, Q2, Q3 to produce X̂1, X̂2, X̂3; X12 groups the first two quantizers.)*

• Rate: R = Σi Ri
• Distortion: D = Σi h σi² 2^(−2Ri)
• Recall the two-quantizer result above

**Generalization (continued)**

• Replace the first 2 quantizers with their jointly optimal combination, D12*(R12) = 2h (σ1² σ2²)^(1/2) 2^(−R12), subject to R12 + R3 ≤ R
• Minimize D12*(R12) + h σ3² 2^(−2R3) with respect to R3

**Generalization (continued)**

• It follows that the optimal rates again equalize the per-quantizer distortions
• Generalize to k quantizers by induction

**Optimal Rate and Distortion [Huang & Schultheiss 1963]**

• Rate: Ri* = R/k + (1/2) log2[ σi² / (σ1² σ2² ⋯ σk²)^(1/k) ]
• Distortion: D* = k h (σ1² σ2² ⋯ σk²)^(1/k) 2^(−2R/k)

**Observations and Comments**

• #1: The optimal rate for Qi grows with log2 σi² (relative to the geometric mean of the variances)
• #2: The optimal distortion is split equally among the quantizers: Di* = D*/k
• #3: In practice, systems use positive [Segall 1976] integer [Farber & Zeger 2005] rates Ri

**Question**

• Given a Gaussian source X and a fixed encoder structure (i.e., k scalar quantizers), how can we minimize D subject to Σi Ri ≤ R?

**Transform Coding [Kramer & Mathews 1956]**

*(Figure: X is transformed by T into Y1, …, Yk; each Yi is quantized by Qi; the inverse transform T⁻¹ produces X̂ from Ŷ1, …, Ŷk.)*

• For orthogonal T, square error distortion is preserved: ||X − X̂||² = ||Y − Ŷ||²

**Fact 1**

• The Karhunen-Loeve Transform (KLT) produces the smallest D* [Huang et al. 1963], given:
  • a) Gaussian input random variables
  • b) High-rate quantizers
  • c) The rate of each quantizer is an arbitrary real value
  • d) Square error distortion measure

**Fact 2**

• The autocorrelation matrix of the KLT transform vector is diagonal.
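This diagonalization is easy to check numerically. The sketch below assumes an AR(1) covariance model with ρ = 0.95 (a common stand-in for image rows; the model and ρ are assumptions, not from the slides), computes the KLT as the eigenvectors of the covariance matrix, and verifies that the transform coefficients are uncorrelated:

```python
import numpy as np

# Covariance of a stationary AR(1) source: C[i, j] = rho^|i - j|.
# (An assumed image-row model; rho = 0.95 is an illustrative value.)
N, rho = 8, 0.95
C = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

# KLT basis = eigenvectors of the covariance matrix.
eigvals, T = np.linalg.eigh(C)      # columns of T are the KLT basis vectors

# Autocorrelation matrix of the transform vector Y = T^T X is T^T C T.
Cy = T.T @ C @ T
off_diag = Cy - np.diag(np.diag(Cy))
print(np.max(np.abs(off_diag)))     # ~0: KLT coefficients are uncorrelated
print(np.allclose(np.diag(Cy), eigvals))
```

Here Cy comes out as diag(eigvals): the coefficient variances are the eigenvalues of C, and all cross-correlations vanish, which is exactly the diagonal autocorrelation matrix the slide refers to.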
• KLT coefficients are uncorrelated
• There is no general theorem stating that uncorrelated quantities can be more efficiently quantized than correlated ones

**Fact 3**

• If the KLT produces distortion D*KLT, any other orthogonal transform T produces D*T ≥ D*KLT (under the assumptions of Fact 1)
• Energy compaction: by the Huang & Schultheiss result, D* is proportional to the geometric mean of the coefficient variances, which the KLT minimizes

**Practical Considerations**

• The KLT is impractical for many systems
  • Computational complexity
  • The transform is signal dependent: a transform must be computed and applied for each input
• Consider Fourier-based transforms
  • Fast algorithms exist
• Examine the loss of coding efficiency resulting from the loss of energy compaction

**Energy Compaction of Some Discrete Transforms**

• 1x32 blocks in natural images [Lohscheller]

**2-D Energy Compaction [from Hedberg & Nilsson 2004]**

• KLT, DCT & DFT compared

**Computational Complexity**

• Recall that the DCT may be derived from the DFT
  • First N coefficients of a 2N-point DFT
  • Requires appropriate input sequence symmetry
  • Requires scaling [Tseng & Miller 1978] by e^(−jπm/(2N)), where fm is the mth DFT coefficient
• Leverage the FFT to compute the DCT

**Computational Complexity (continued)**

• 1-D 8-point DCT from a 16-point DFT
  • 13 mults, 29 adds [Arai et al. 1988]
  • 8 final scaling multiplies can be rolled into quantization
  • Net 5 mults, 29 adds (best known)
• Fast 2-D DCT (8x8)
  • Separable [from Pennebaker & Mitchell 1992]: 80 mults, 464 adds (best known)
  • Non-separable [Feig 1992]: 54 mults, 416 adds, 6 shifts

**Perceptual Implications**

• Contrast sensitivity of the HVS
  • See the last page of the handout [Barlow & Mollon 1982]
• Perceptually tuned quantization tables [Watson]
• Filter coefficients prior to quantization
  • Shape the frequency content of the source
  • Exploit HVS contrast sensitivity

**Concluding Summary**

• Motion estimation & compensation
  • Translation-based motion models
  • Fractional sample motion estimation
  • Multiple motion vectors per macroblock
  • Multiple reference pictures
  • Multi-hypothesis motion compensated prediction
  • Overlapped block motion compensation

**Concluding Summary (continued)**

• DCT
  • Near-optimal R-D performance for a wide range of sources (under Gaussian, high-rate assumptions)
  • Simple relationship to the DFT enables fast implementations
  • Perceptual relevance

**References**

• N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE Trans. Comput., vol. C-23, pp. 90–93, Jan. 1974.
• Y. Arai, T. Agui, and M. Nakajima, "A fast DCT-SQ scheme for images," Trans. of the IEICE, vol. E-71, no. 11, p. 1095, Nov. 1988.
• E. Feig and S. Winograd, "Fast algorithms for the discrete cosine transform," IEEE Trans. Signal Processing, vol. 40, pp. 2174–2193, 1992.
• H. B. Barlow and J. D. Mollon, The Senses. Cambridge: Cambridge University Press, 1982.
• G. Bjontegaard, "Objective simulation results," Document VCEG-M34, Video Coding Experts Group (VCEG), Thirteenth Meeting: Austin, Texas, USA, 2–4 April 2001.
• H. Everett III, "Generalized Lagrangian multiplier method for solving problems of optimum allocation of resources," Operations Research, vol. 11, pp. 399–417, 1963.
• B. Farber and K. Zeger, "Quantization of multiple sources using integer bit allocation," Data Compression Conference (DCC), Salt Lake City, Utah, March 2005 (to appear).

**References (continued)**

• M. Flierl, T. Wiegand, and B. Girod, "Locally optimal design algorithm for block-based multi-hypothesis motion-compensated prediction," Proc. IEEE Data Compression Conference (DCC '98), pp. 239–248, Snowbird, USA, Apr. 1998.
• A. Gersho and R. M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, Boston, 1992.
• B. Girod, lecture notes for EE368b, Video and Image Compression, Stanford University.
• R. M. Gray and D. L. Neuhoff, "Quantization," IEEE Transactions on Information Theory, vol. 44, pp. 2325–2384, Oct. 1998.
• R. M. Haralick, "A storage efficient way to implement the discrete cosine transform," IEEE Transactions on Computers, vol. 25, no. 6, pp. 764–765, 1976.
• H. Hedberg and P. Nilsson, "A survey of various discrete transforms used in digital image compression algorithms," Proceedings of the Swedish System-on-Chip Conference 2004, Bastad, Sweden, April 13–14, 2004.
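As a closing illustration of the DCT–DFT relationship discussed under Computational Complexity, the sketch below (assuming the orthonormal Type-II normalization; the 8-sample input is arbitrary) recovers an 8-point DCT from the first 8 coefficients of a 16-point FFT of the symmetrically extended sequence, and checks it against a direct O(N²) evaluation:

```python
import numpy as np

def dct2_via_fft(x):
    """Orthonormal DCT-II computed from a 2N-point FFT of the
    even-symmetric extension of x (the DFT-based construction)."""
    N = len(x)
    y = np.concatenate([x, x[::-1]])          # symmetric 2N-point sequence
    F = np.fft.fft(y)[:N]                     # first N DFT coefficients f_m
    k = np.arange(N)
    # Scaling step: c_k = (1/2) Re[e^(-j*pi*k/(2N)) * f_k]
    c = 0.5 * np.real(np.exp(-1j * np.pi * k / (2 * N)) * F)
    alpha = np.sqrt(2.0 / N) * np.ones(N)
    alpha[0] = np.sqrt(1.0 / N)
    return alpha * c

def dct2_direct(x):
    """Direct O(N^2) Type-II DCT, for comparison."""
    N = len(x)
    n, k = np.arange(N), np.arange(N)[:, None]
    alpha = np.sqrt(2.0 / N) * np.ones(N)
    alpha[0] = np.sqrt(1.0 / N)
    return alpha * np.sum(x * np.cos(np.pi * (2 * n + 1) * k / (2 * N)), axis=1)

x = np.array([52.0, 55, 61, 66, 70, 61, 64, 73])    # arbitrary 8-sample row
print(np.allclose(dct2_via_fft(x), dct2_direct(x)))  # True
```

Because the transform is orthonormal, it also preserves energy (Σ X(k)² = Σ x(n)²), which is what makes the rate-allocation analysis in the transform domain carry over directly.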