
Implementation of H.264 Based System on Multi-DSPs Board

Implementation of H.264 Based System on Multi-DSPs Board. 陳奕安, 2008.02.13. Outline: system description (architecture, MEX board, TMS320DM642), communication interface, software development, error resilience.


Presentation Transcript


  1. Implementation of H.264 Based System on Multi-DSPs Board • 陳奕安 • 2008.02.13

  2. Outline • System description • Architecture • MEX Board • TMS320DM642 • Communication interface • Software development • Error resilience

  3. Architecture • MEX Board 1 / PC 1 (sender): Capture Frame -> H.264 Encode -> Send to Network • MEX Board 2 / PC 2 (receiver): Receive from Network -> H.264 Decode -> Display

  4. MEX Board • The MEX board is composed of: • 4 TMS320DM642 DSPs, with their memory, for data stream compression (video/audio) • 2 FPGAs for a flexible architecture • 8 SAA7113H video chips (ADC) • 4 CS4221 stereo audio chips (ADC)

  5. MEX Board • (Figure: block diagram of the MEX board [1], highlighting the 4 DM642 DSPs, the video/audio chips, and the 2 FPGAs)

  6. MEX Board Block Diagram • (Figure: block diagram of the MEX board [1])

  7. TMS320DM642 • Performance: 4000-4800 MIPS • Two-level cache: L2: 256 KB, L1P: 16 KB, L1D: 16 KB • 3 video ports • 8-bit McASP • Ethernet MAC • 32-bit HPI • 66 MHz PCI • 64-bit EMIF • (Figure: DM642 DSP block diagram [2])

  8. TMS320DM642 • Peripherals will be used: • Enhanced DMA (EDMA) • Video ports (VP0~VP2) • Inter-integrated circuit (I2C) bus • External memory interface (EMIF) • Ethernet media access controller(EMAC) • Management data input/output (MDIO)

  9. Outline • System description • Communication interface • Host/ MEX Communication • Video capturing/ Displaying • Network Transmit • Software development • Error resilience

  10. Host/MEX Communication • Data transfer from the 4 DSPs (SDRAM) to PCI [1]; flowchart labels in extraction order • MEX side: Start EDMA; Unreset DSP1 FIFO; Clear PCI interrupt; Set DSP FIFO direction; Set FIFO full flag value; DSP FIFO is reset; DSP started: fill memory; Initialize transfer; DSP to PCI transfer request; Start transfer; Transfer finished • PC side: PCI started: wait for interrupt; Initialize transfer; PCI to DSP start transfer request; Wait for transfer finished; Transfer finished; Set transfer size; Set PCI FIFO direction; Select DSP data sources; Set transfer destination address; Start PCI FIFO; Clear DSP interrupt

  11. Video Capture • Signal path on the MEX board: Camera -> video chip SAA7113H (ADC) -> DM642 video ports (VP0, VP1, VP2) -> DMA -> raw data; the I2C bus links the DM642 and the video chip • Camera output: NTSC (analog, 525 lines per frame, 30 frames per second) or PAL (analog, 625 lines per frame, 25 frames per second) • Video chip output: ITU656 (digital, for PAL or NTSC)

  12. TMS320DM642 Video Port [3]

  13. Network Architecture • MEX Board 1: DM642 (EMAC, MDIO) -> PHY LXT971ALC • RJ45 link between the two boards • MEX Board 2: DM642 (EMAC, MDIO) -> PHY LXT971ALC

  14. TMS320DM642 EMAC • DM642 networking using EMAC and MDIO • (Figure: DM642 networking [4])

  15. Outline • System description • Communication interface • Software development • H.264 Codec • Optimization • Parallelization • Memory Issue • Error resilience

  16. H.264 Encoder Block Diagram

  17. H.264 Decoder Block Diagram

  18. Optimization on Single Chip • Based on "Realization and Optimization of DSP Based H.264 Encoder" [5] • Optimization of H.264 on the DSP platform • Code porting and primary optimization • Optimization of the key modules • Using the TI C64x IMGLIB • Data scheduling and storage allocation • Data scheduling with EDMA • Storage allocation (code section/data section)

  19. Parallelization on Chips • One GOP per DSP • Each DSP handles a whole group of pictures (GOP), e.g. IPPP… or IBBPBB…; there are no dependencies between GOPs. • One frame or one macroblock per DSP • Each DSP handles one frame or one macroblock; there are dependencies between frames and between macroblocks.
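The GOP-level option can be sketched as a simple round-robin assignment. This is only an illustration under stated assumptions: the function name `assign_gops`, the fixed GOP length, and the round-robin policy are hypothetical and are not taken from the slides; only the count of 4 DSPs comes from the MEX board description.

```python
def assign_gops(num_frames, gop_size, num_dsps=4):
    """Map each frame index to a DSP index so that a whole GOP stays on one DSP.

    Assumption: GOPs have a fixed length and are handed out round-robin.
    """
    assignment = {}
    for frame in range(num_frames):
        gop_index = frame // gop_size          # which GOP the frame belongs to
        assignment[frame] = gop_index % num_dsps  # round-robin over the DSPs
    return assignment


schedule = assign_gops(num_frames=24, gop_size=6)
for frame, dsp in schedule.items():
    print(f"frame {frame:2d} -> DSP {dsp}")
```

Because all frames of a GOP land on the same DSP, no reference frames need to cross DSP boundaries in this scheme; the price is a latency of several GOPs before all DSPs are busy.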

  20. Macroblock Dependencies • Data dependencies induced by inter prediction: • The motion vector MVcur of the current block is predicted from the neighboring motion vectors MVA to MVD • (Figure: data dependencies induced by MV prediction, showing MVA, MVB, MVC, MVD and MVcur in the current frame and the reference frame [6])
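As a rough illustration of why MVcur depends on its neighbors: in H.264 the usual predictor is the component-wise median of three neighboring motion vectors. The sketch below assumes the common labeling A = left, B = above, C = above-right, D = above-left (the slide figure may label them differently), uses D only when C is unavailable, and ignores the special cases for particular partition shapes. The function name `predict_mv` is hypothetical.

```python
def predict_mv(mv_a, mv_b, mv_c, mv_d=None):
    """Component-wise median predictor for a motion vector.

    Each motion vector is an (x, y) tuple. Assumed labeling: A = left,
    B = above, C = above-right, D = above-left. D replaces C when C is
    unavailable (passed as None). Other special cases are not handled.
    """
    if mv_c is None:
        mv_c = mv_d
    if mv_c is None:
        raise ValueError("neither C nor D is available")
    # sorted(...)[1] picks the middle of three values, i.e. the median
    return tuple(sorted(components)[1] for components in zip(mv_a, mv_b, mv_c))


print(predict_mv((4, -2), (1, 0), (3, 5)))
```

Since the predictor needs the neighbors' final motion vectors, the current block cannot be processed until those neighbors are done, which is the dependency the slide describes.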

  21. Macroblock Dependencies • Data dependencies induced by intra prediction: • Left, upper-left, upper, and upper-right MBs • (Figure: data dependencies induced by intra prediction [6])

  22. Macroblock Dependencies • Data dependencies induced by the deblocking filter: • Top 4 rows of pixels and leftmost 4 columns • (Figure: data dependencies induced by the deblocking filter [6])

  23. Macroblock Dependencies • Possible spatial data dependencies for a macroblock • (Figure: possible spatial data dependencies for the current MB [6]; the neighboring MBs are labeled with intra prediction, MV prediction, and deblocking filter dependencies)

  24. Macroblock Dependencies • Macroblock Dependencies: • Data dependencies between frames • Data dependencies between MB rows in the same frame • Data dependencies in the same MB row

  25. Wave-front parallelization • Partition for MB region • (Figure: wave-front of macroblock region partition [7])
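A minimal sketch of a macroblock-level wave-front order, assuming each MB depends on its left, upper-left, upper and upper-right neighbors as listed on the dependency slides. Under that assumption MB (x, y) can be placed in wave x + 2*y, and all MBs in one wave are independent of each other. The function name `wavefront_schedule` is hypothetical, and the partition used in [7] may differ from this one.

```python
def wavefront_schedule(mb_cols, mb_rows):
    """Group macroblock coordinates (x, y) into waves that can run in parallel.

    Assumes each MB depends on its left, upper-left, upper and upper-right
    neighbours; all of those have a smaller value of x + 2*y.
    """
    waves = {}
    for y in range(mb_rows):
        for x in range(mb_cols):
            waves.setdefault(x + 2 * y, []).append((x, y))
    return [waves[index] for index in sorted(waves)]


waves = wavefront_schedule(mb_cols=4, mb_rows=3)
for step, group in enumerate(waves):
    print(step, group)
```

With this ordering a picture of W x H macroblocks needs W + 2*(H - 1) steps instead of W*H, provided enough processors are available for the widest wave.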

  26. Wave-front parallelization • Partition for frames • (Figure: wave-front of frame partition [7])

  27. Memory Issue • Limited memory of the DM642 • Use memory buffers to reduce memory accesses • (Figure: two-level cache architecture of the DM642: DSP core; L1P cache, direct mapped, 16 KB total; L1D cache, 2-way set associative, 16 KB total; L2 cache/memory, 256 KB total; EDMA controller; peripherals)

  28. Memory Issue • Memory hierarchy for inter prediction • (Figure: memory hierarchy [8])

  29. Memory Issue • Slice memory buffer for intra prediction and deblocking filter • (Figure: slice memory [9])

  30. Outline • System description • Communication interface • Software development • Error resilience • Error-Resilience Tools in H.264/AVC • Error resilience of JM source code

  31. Error Resilience Tools in H.264/AVC • Redundant slices (RSs) [10] • An encoder can place a redundant representation of the same MBs into the same bit stream. • e.g. one slice is coded twice, using different quantization parameters (QPs). • If the slice with the lower QP is available, the decoder discards the RS; otherwise, the decoder reconstructs the RS. • (Figure: Slice A coded with QP1 and Slice A coded with QP2, both sent to the decoder)
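The decoder-side rule for redundant slices can be sketched as a small selection function. This is an illustration only: the function name `choose_slice`, the use of `None` for a lost slice, and the "conceal" fallback label are assumptions made for the example, not part of the JM software or the standard.

```python
def choose_slice(primary, redundant):
    """Pick which copy of a slice to decode.

    primary   -- payload of the low-QP (higher quality) slice, or None if lost
    redundant -- payload of the redundant slice, or None if lost
    Returns a (decision, payload) pair.
    """
    if primary is not None:
        return "primary", primary      # redundant copy is simply discarded
    if redundant is not None:
        return "redundant", redundant  # fall back to the coarser copy
    return "conceal", None             # both lost: error concealment is needed


print(choose_slice(b"slice A, QP1", b"slice A, QP2"))
print(choose_slice(None, b"slice A, QP2"))
print(choose_slice(None, None))
```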

  32. Error Resilience Tools in H.264/AVC • Parameter sets [10] • Include the picture size, entropy coding method, MV resolution, and so on. • Sequence parameter set (SPS) • Contains all information related to the picture sequence between two IDR (Instantaneous Decoding Refresh) pictures. • Picture parameter set (PPS) • Contains all information related to all slices in a picture. • e.g. multiple copies of the SPSs can be sent to increase the chance that one arrives. • e.g. SPSs can be sent out-of-band.

  33. Error Resilience Tools in H.264/AVC • Flexible macroblock ordering (FMO) [10] • 7 modes • The bit overhead depends heavily on the picture format, the content, and the QP. • < 5% penalty at QP = 16; on average 20% at QP = 28. • (Figure: 6 modes of FMO [10])
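To make the idea of FMO concrete, here is a sketch of one macroblock-to-slice-group map: a checkerboard with two slice groups, which is a typical dispersed arrangement. The slides do not say which mode the project uses, so the choice of pattern and the function name `checkerboard_slice_groups` are assumptions for illustration.

```python
def checkerboard_slice_groups(mb_cols, mb_rows):
    """Return a 2-D map assigning every macroblock to slice group 0 or 1.

    Neighbouring macroblocks (left/right/up/down) always end up in
    different slice groups.
    """
    return [[(x + y) % 2 for x in range(mb_cols)] for y in range(mb_rows)]


group_map = checkerboard_slice_groups(mb_cols=4, mb_rows=2)
for row in group_map:
    print(row)
```

If one slice group is lost, every missing macroblock still has its horizontal and vertical neighbors from the other group, which gives the concealment stage more to work with.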

  34. Error Concealment of H.264/AVC • Error concealment scheme provided in JM • Intra • Inter • (Figure: error concealment for macroblocks [11])

  35. Future Work • Optimize the H.264 codec for real-time operation • Implement different concealment methods • Propose corresponding error resilience methods

  36. References • [1] VITEC MULTIMEDIA, "MEX User Manual, Revision 1.7." • [2] Texas Instruments, Incorporated, "TMS320C64x DSP Generation Product Bulletin" (sprt236). • [3] Texas Instruments, Incorporated, "TMS320DM64x Video Port to Video Port Communication" (spraaf3). • [4] Texas Instruments, Incorporated, "TMS320C6000 DSP Ethernet Media Access Controller (EMAC)/Management Data Input/Output (MDIO) Module Reference Guide" (spru628a). • [5] Zhe Wei and Canhui Cai, "Realization and Optimization of DSP Based H.264 Encoder," ISCAS 2006 Circuits and Systems, May 2006. • [6] Y. Chen, E. Li, X. Zhou, and S. Ge, "Implementation of H.264 Encoder and Decoder on Personal Computers," Journal of Visual Communications and Image Representation 17 (2006). • [7] Zhuo Zhao and Ping Liang, "Data partition for wave-front parallelization of H.264 video encoder," 31st IEEE International Conference on Acoustics, Speech, and Signal Processing (2006). • [8] K. Denolf, De Vleeschouwer, et al., "Memory centric design of an MPEG-4 video encoder," IEEE Trans. CSVT, vol. 15, no. 5, pp. 609-619, May 2005. • [9] Tsu-Ming Liu et al., "A 125μW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications," ISSCC Digest of Technical Papers, pp. 402-403, Feb. 2006. • [10] S. Wenger, "H.264/AVC over IP," IEEE Trans. Cir. Syst. Video Technol., vol. 13, pp. 645-656, July 2003. • [11] "Non-normative error concealment algorithms," ITU-T VCEG-N62, 2001-09.

  37. H.264 Partitions • (Figure: frame partitions and macroblock partitions, with 16x16, 8x8 and 4x4 blocks)

  38. H.264 Intra-Mode Decision

  39. H.264 Intra-Mode Decision • (Figure: example prediction modes: 4x4 horizontal, 16x16 plane)

  40. Fast integer & fractional pixel motion estimation • Integer pixel search scheme • Step 1. Unsymmetrical-cross search • Step 2-1. Local full search around the starting point • Step 2-2. Uneven multi-hexagon search • Step 3-1. Extended hexagon-based search; the search continues until the point with the minimal matching error is the center of the new hexagon. • Step 3-2. Center-biased search • The scheme covers both small and large motions; the search point that gives the smallest matching error in one step is the starting point of the next step. • Assume the guessed starting point is (0,0). About 130 points are searched in this algorithm, so the saving relative to a full search is (33x33 - 130)/(33x33) ≈ 88%, i.e. roughly 90%. If 3 starting points are tried, the saving is around 64%. • (Figure: integer pixel search scheme, plotted over a search window from -15 to 15 on both axes, with the points of step1, step2-1, step2-2, step3-1 and step3-2 marked)
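The saving figures on this slide can be checked with a few lines of arithmetic. The sketch assumes a full search over a 33x33 window (a search range of ±16) and about 130 points per starting point, as stated on the slide; the function name `search_saving` is hypothetical.

```python
def search_saving(search_range, points_per_start, num_starts=1):
    """Fraction of full-search points avoided by the fast search."""
    full_search_points = (2 * search_range + 1) ** 2   # 33 x 33 = 1089 for ±16
    fast_search_points = points_per_start * num_starts
    return (full_search_points - fast_search_points) / full_search_points


one_start = search_saving(search_range=16, points_per_start=130)
three_starts = search_saving(search_range=16, points_per_start=130, num_starts=3)
print(f"1 starting point : {one_start:.1%}")
print(f"3 starting points: {three_starts:.1%}")
```

This gives about 88% for one starting point (the slide rounds it to 90%) and about 64% for three starting points.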

  41. Fast integer & fractional pixel motion estimation • Fractional pixel search scheme • Start from the best matching integer point found by the integer motion search • Search its 1/2-pixel neighbors • Search its 1/4-pixel neighbors • Search its 1/8-pixel neighbors • The optimal point of each step is the search center of the next step.
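The fractional refinement can be sketched as a coarse-to-fine search around the best integer point. Positions are kept in 1/8-pixel units so that the 1/2, 1/4 and 1/8 steps become offsets of 4, 2 and 1. The function name `refine_fractional` and the toy cost function are assumptions for illustration; a real encoder would evaluate a matching error such as SAD on interpolated samples.

```python
def refine_fractional(start, cost, steps=(4, 2, 1)):
    """Refine a motion vector given in 1/8-pixel units.

    start -- best integer-pel point, already scaled to 1/8-pel units
    cost  -- callable returning the matching error of a candidate (x, y)
    steps -- neighbour spacing per stage: 4 = 1/2 pel, 2 = 1/4 pel, 1 = 1/8 pel
    """
    best = start
    for step in steps:
        center = best  # the winner of the previous stage is the new center
        candidates = [
            (center[0] + dx * step, center[1] + dy * step)
            for dx in (-1, 0, 1)
            for dy in (-1, 0, 1)
        ]
        best = min(candidates, key=cost)
    return best


# Toy cost: squared distance to a made-up true motion of (13, -6) eighths of a pixel.
true_motion = (13, -6)
toy_cost = lambda p: (p[0] - true_motion[0]) ** 2 + (p[1] - true_motion[1]) ** 2
refined = refine_fractional(start=(16, -8), cost=toy_cost)
print(refined)
```

Each stage tests only the center and its 8 neighbors, so three stages cost at most 27 evaluations instead of an exhaustive search over every 1/8-pixel position.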
