
Implementation and Parallelization of H.264 Based System on Multi-DSPs Board

陳奕安, 2008.06.11


Presentation Transcript


  1. Implementation and Parallelization of H.264 Based System on Multi-DSPs Board • 陳奕安 • 2008.06.11

  2. Outline • System Architecture • Multithreading of this system • Reference framework 5 • Parallelism of H.264 • Memory issue

  3. System Architecture • PC 1 / MEX Board 1: Capture Frame → H.264 Encode → Send to Network • PC 2 / MEX Board 2: Receive from Network → H.264 Decode → Display

  4. System Architecture • Encoder side: Camera → Input task → H.264 Encode processing task → TX networking task • Decoder side: RX networking task → H.264 Decode processing task → Output task → Computer

  5. Host/MEX Communication • Figure: data transfer from the 4 DSPs (SDRAM) to PCI [7] • MEX side, flowchart labels: Set DSP FIFO direction; Set FIFO Full Flag value; DSP FIFO is reset; Start EDMA; Unreset DSP1 FIFO; Clear PCI interrupt; DSP started: fill memory; Initialize transfer; DSP to PCI transfer request; Start transfer; Transfer finished • PC side, flowchart labels: PCI started: wait for interrupt; Wait for transfer finished; Transfer finished; Initialize transfer; PCI to DSP start transfer request; Set transfer size; Set PCI FIFO direction; Select DSP data sources; Set transfer destination address; Start PCI FIFO; Clear DSP interrupt

  6. Host/MEX Communication • Data • Image

  7. System Architecture • Encoder side: Camera → Input task → H.264 Encode processing task → TX networking task • Decoder side: RX networking task → H.264 Decode processing task → Output task → Computer

  8. Networking of H.264 Video • H.264 High Level Architecture • Figure: H.264 VCL and NAL [6] • Figure labels: Application; Supplemental Enhancement Information; Reconstructed picture; Video Coding Layer; Parameter Sets; VCL Data; Network Abstraction Layer; NAL-unit; Bitstream Adoption; Packet Adoption; AVC / H.264 Transport; H.320 System; MPEG-2 System; AVC Storage; RTP Payload

  9. Networking of H.264 Video • Video Packetization (TMS320C6000 Network Developer's Kit) • Application layer: Video Packet (NAL-Unit of H.264) • Session layer: RTP header + Video Packet • Transport layer: UDP header + RTP header + Video Packet • Network layer: IP header + UDP header + RTP header + Video Packet • Data link layer: MAC header + IP header + UDP header + RTP header + Video Packet • Physical layer
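The session-layer step on slide 9 (prepending an RTP header to a NAL unit) can be sketched as follows. This is a minimal illustration, not the code used on the DSP board: it builds the fixed 12-byte RTP header and carries one NAL unit per packet. The function name `build_rtp_packet`, the dynamic payload type 96, and the sample NAL bytes are assumptions made for the example; UDP, IP and MAC headers would be added by the network stack below this layer.

```python
import struct

RTP_VERSION = 2
PAYLOAD_TYPE = 96  # assumption: a dynamic payload type chosen for H.264


def build_rtp_packet(nal_unit: bytes, seq: int, timestamp: int,
                     ssrc: int, marker: bool = False) -> bytes:
    """Prepend a fixed 12-byte RTP header to a single NAL unit."""
    # Byte 0: version (2 bits), padding = 0, extension = 0, CSRC count = 0.
    byte0 = RTP_VERSION << 6
    # Byte 1: marker bit followed by the 7-bit payload type.
    byte1 = (int(marker) << 7) | PAYLOAD_TYPE
    header = struct.pack(
        "!BBHII",                 # network byte order, 1+1+2+4+4 = 12 bytes
        byte0,
        byte1,
        seq & 0xFFFF,             # 16-bit sequence number
        timestamp & 0xFFFFFFFF,   # 32-bit timestamp
        ssrc & 0xFFFFFFFF,        # 32-bit synchronization source id
    )
    return header + nal_unit


# Illustrative NAL unit: header byte 0x65 followed by two payload bytes.
nal = bytes([0x65, 0x88, 0x84])
packet = build_rtp_packet(nal, seq=1, timestamp=90000, ssrc=0x1234, marker=True)
```

Incrementing `seq` per packet lets the receiver detect loss and reordering, which matters because UDP itself gives no delivery guarantee.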

  10. System Architecture • Encoder side: Camera → Input task → H.264 Encode processing task → TX networking task • Decoder side: RX networking task → H.264 Decode processing task → Output task → Computer

  11. I/O buffer management • Input buffers • Output buffers • Figure: buffer queues with Head and Tail pointers; slots labeled Inputting and Outputting

  12. I/O buffer management • Input / output buffers • Figure: buffer queues with Head and Tail pointers; slots labeled Inputting and Outputting
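The Head/Tail bookkeeping shown on slides 11 and 12 can be sketched as a fixed-size ring of frame slots. This is only an illustration under the assumption that the buffers are circular; the class name `FrameRing` and its methods are hypothetical and do not come from the slides.

```python
class FrameRing:
    """Fixed number of frame slots managed with head and tail indices."""

    def __init__(self, slots: int):
        self.buf = [None] * slots
        self.head = 0    # next slot to read (oldest frame)
        self.tail = 0    # next slot to write
        self.count = 0   # number of occupied slots

    def put(self, frame) -> bool:
        """Store a frame at the tail; return False if every slot is in use."""
        if self.count == len(self.buf):
            return False
        self.buf[self.tail] = frame
        self.tail = (self.tail + 1) % len(self.buf)
        self.count += 1
        return True

    def get(self):
        """Remove and return the frame at the head, or None if empty."""
        if self.count == 0:
            return None
        frame = self.buf[self.head]
        self.buf[self.head] = None
        self.head = (self.head + 1) % len(self.buf)
        self.count -= 1
        return frame


ring = FrameRing(3)
ring.put("frame0")
ring.put("frame1")
oldest = ring.get()  # the input side can now reuse the freed slot
```

Keeping an explicit `count` avoids the usual ambiguity where `head == tail` could mean either an empty or a full ring.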

  13. System Architecture • Multithreading of this system • Encoder side: Camera → Input task → H.264 Encode processing task → TX networking task • Decoder side: RX networking task → H.264 Decode processing task → Output task → Computer

  14. Reference framework for DSP • Reference framework 5 • DSP/BIOS • TMS320 DSP Algorithm Standard • Processing flow of RF5 • Figure labels: Split; Joint; task; channel; cell; F0, V0; F1, V1; F2, V2; Fi, Vi: XDAIS algorithm

  15. Reference framework for DSP • Data communication of RF5 • SIO: Task & Device • SCOM: Task & Task • Figure labels (SIO): task; SIO object; data pointer; data buffer; device driver • Figure labels (SCOM): writer task; reader task; SCOM queue; SCOM message; data pointer; data buffer
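The task-to-task exchange on slide 15 passes messages that point at data buffers rather than copying the data. The sketch below imitates that pattern with Python's standard `queue.Queue` and two threads; it is a stand-in for illustration and does not use or reproduce the DSP/BIOS SCOM API. The function `run_pipeline` and the message layout (a dict holding a `bytearray`) are assumptions.

```python
import queue
import threading


def run_pipeline(n_frames: int = 5, n_buffers: int = 2, buf_size: int = 4):
    """Writer fills buffers, reader consumes them and hands them back."""
    to_reader = queue.Queue()  # stand-in for the reader task's message queue
    to_writer = queue.Queue()  # stand-in for the writer task's message queue
    # Pre-allocate the buffers once; afterwards only references circulate.
    for _ in range(n_buffers):
        to_writer.put({"buffer": bytearray(buf_size)})
    received = []

    def writer_task():
        for i in range(n_frames):
            msg = to_writer.get()                        # wait for a free buffer
            msg["buffer"][:] = bytes([i % 256]) * buf_size
            to_reader.put(msg)                           # send the filled buffer
        to_reader.put(None)                              # end-of-stream marker

    def reader_task():
        while True:
            msg = to_reader.get()
            if msg is None:
                break
            received.append(msg["buffer"][0])            # consume the data
            to_writer.put(msg)                           # return buffer for reuse

    threads = [threading.Thread(target=writer_task),
               threading.Thread(target=reader_task)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


frames_seen = run_pipeline()
```

Because a buffer is only rewritten after the reader has handed it back, the two tasks never touch the same buffer at the same time, and no frame data is copied between them.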

  16. Reference framework for DSP • Data communication of RF5 • ICC: Cell & Cell • Figure labels: cells 1, 2, 3, each with in and out; data pointer; data buffer; ICC object describing a buffer; element in a list of pointers to ICC objects

  17. Reference framework for DSP • Application Control of RF5 • Task receiving both SCOM messages and control messages • Figure labels: task; SCOM queue for data messages; SCOM message; MBX mailbox for control messages

  18. System Architecture • The present system • Figure labels: Input task; H.264 Encode Processing task; Frame i; Slice; NAL; Frame i+1; Rx; TX networking task; Control task

  19. System Architecture • Multithreading of this system • Figure labels: Input task; H.264 Encode Processing task; MB; Frame i; MB; NAL; Frame i+1; MB; Rx; TX networking task; Control task

  20. Parallelizing H.264 • Task-level Decomposition • Divide the algorithm into balanced tasks • Accelerate each task • Data-level Decomposition • GOP-level Parallelism • Frame-level Parallelism • Slice-level Parallelism • Macroblock-level Parallelism

  21. H.264 Encoder Block Diagram • Figure blocks: Fn (current); F'n-1 (reference); ME; MC; Choose Intra prediction; Intra prediction; T; Q; Reorder; Entropy encode; NAL; Q⁻¹; T⁻¹; Filter; F'n (reconstructed) • Figure signals: Dn; X; P (Inter / Intra); D'n; uF'n

  22. H.264 Decoder Block Diagram • Figure blocks: NAL; Entropy decode; Reorder; Q⁻¹; T⁻¹; Intra prediction; MC; F'n-1 (reference); Filter; F'n (reconstructed) • Figure signals: P (Inter / Intra); D'n; uF'n

  23. Task-level Decomposition • Task profile for H.264 [2]

  24. Parallelizing H.264 • H.264 data structure • Video Sequence: GOP0, GOP1, GOP2, …, GOPn • Group of pictures: F0, F1, F2, …, Fn • Frame: Slice 0, Slice 1, Slice 2, Slice 3 • Slice: MB0, MB1, MB2, …, MBn • Macroblock: Y, Cb, Cr

  25. Data-level Decomposition • GOP-level Parallelism: high latency, large memory • Frame-level Parallelism: I, P, B frame imbalance • Slice-level Parallelism: bitrate increases • Macroblock-level Parallelism

  26. Macroblock-level Parallelism • Spatial parallelism • Temporal parallelism • Spatial & temporal parallelism • Possible data dependencies for a macroblock • Figure labels: frame i; frame i + 1; Current MB; search window; Intra Pred.; MV Pred.; Deblocking Filter

  27. Macroblock-level Parallelism • Spatial parallelism • Figure legend: MBs processed; MBs being processed; MBs to be processed
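Spatial parallelism inside one frame is commonly organized as a wavefront. The sketch below assumes that each macroblock depends on its left, top-left, top and top-right neighbours (the usual pattern for intra prediction, MV prediction and deblocking); under that assumption, macroblock (x, y) can run in time step x + 2y, and all macroblocks sharing a step can be processed concurrently. The function name `wavefront_schedule` is hypothetical, and the slides do not state that this exact schedule was used.

```python
def wavefront_schedule(mb_cols: int, mb_rows: int):
    """Group macroblock coordinates (x, y) by the earliest step they can run.

    Assumes each MB depends on its left, top-left, top and top-right
    neighbours, so MB (x, y) becomes ready at step x + 2 * y.
    """
    steps = {}
    for y in range(mb_rows):
        for x in range(mb_cols):
            steps.setdefault(x + 2 * y, []).append((x, y))
    return [steps[t] for t in sorted(steps)]


# Toy frame of 4 x 3 macroblocks.
schedule = wavefront_schedule(4, 3)
for step, mbs in enumerate(schedule):
    print(step, mbs)
```

The number of macroblocks per step ramps up, plateaus, then ramps down, so the available parallelism is limited at the start and end of every frame. That ramp is one motivation for overlapping consecutive frames (temporal parallelism).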

  28. Macroblock-level Parallelism • Temporal parallelism • Figure labels: frame i; frame i + 1 • Figure legend: MBs processed; MBs being processed; MBs to be processed

  29. Macroblock-level Parallelism • Spatial & temporal parallelism frame i + 1 frame i

  30. System Architecture • Multithreading of this system • Figure labels: Input task; H.264 Encode Processing task; MB; Frame i; MB; NAL; Frame i+1; MB; Rx; TX networking task; Control task

  31. Memory Issue • Limited memory of DM642 • Use memory buffer to reduce memory access • Figure: two-level cache architecture of DM642 • DM642 DSP core • L1P cache: direct mapped, 16 Kbytes total • L1D cache: 2-way set associative, 16 Kbytes total • L2 cache/memory: 256 Kbytes total • EDMA controller • Peripherals
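A short calculation shows why the 256-Kbyte L2 on slide 31 is a constraint. The figures below assume CIF resolution (352×288), YUV 4:2:0 sampling, 16×16 macroblocks and a ±16-pixel motion search range; none of these parameters are stated in the slides, and the helper names are hypothetical.

```python
L2_BYTES = 256 * 1024  # L2 cache/memory size given on the slide


def yuv420_bytes(width: int, height: int) -> int:
    """Bytes for a 4:2:0 picture area: full-size luma plus two quarter-size chroma planes."""
    return width * height * 3 // 2


def search_window_bytes(mb_size: int = 16, search_range: int = 16) -> int:
    """Luma bytes covered by a square motion-search window around one MB."""
    side = mb_size + 2 * search_range
    return side * side


cif_frame = yuv420_bytes(352, 288)   # one whole frame (assumed CIF)
mb_row = yuv420_bytes(352, 16)       # one row of macroblocks
window = search_window_bytes()       # one search window

print("frame bytes:", cif_frame)
print("current + reference frames fit in L2:", 2 * cif_frame <= L2_BYTES)
print("MB row bytes:", mb_row)
print("search window bytes:", window)
```

Under these assumptions a current frame plus one reference frame already exceed L2, while a macroblock row or a search window is a few kilobytes. Keeping only such small working sets on chip and streaming the rest from external memory is the kind of buffering the slide refers to.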

  32. Memory Issue • Memory hierarchy for inter prediction Memory hierarchy [4]

  33. Memory Issue • Slice memory buffer for intra prediction and deblocking filter • Figure: Slice Memory [5]

  34. Reference • [1] Texas Instruments, Incorporated, "Reference Frameworks for eXpressDSP Software: RF5, An Extensive, High-Density System," SPRU795A. • [2] TC Chen, HC Fang, CJ Lian, CH Tsai, "Algorithm analysis and architecture design for HDTV applications - a look at the H.264/AVC video compressor system," IEEE Circuits & Devices Magazine, May/June 2006. • [3] Cor Meenderinck, Arnaldo Azevedo and Ben Juurlink, "Parallel Scalability of Video Decoders," April 29, 2008. • [4] Denolf, K. De Vleeschouwer, et al., "Memory centric design of an MPEG-4 video encoder," IEEE Trans. CSVT, Vol. 15, No. 5, pp. 609-619, May 2005. • [5] Tsu-Ming Liu et al., "A 125μW, Fully Scalable MPEG-2 and H.264/AVC Video Decoder for Mobile Applications," ISSCC Digest of Technical Papers, pp. 402-403, Feb. 2006. • [6] T. Wiegand et al., "Overview of H.264/AVC Video Coding Standard," IEEE Trans. on Circ. and Sys. for Video Technology, Vol. 13, No. 7, pp. 560-576, July 2003. • [7] VITEC MULTIMEDIA, "MEX User manual Revision 1.7".
