
Efficient Real-Time Multicore Image Processing on TI C66x final presentation

Efficient Real-Time Multicore Image Processing on TI C66x final presentation by Yaron Doweck and Yael Einziger. Supervisor: Mike Sumszyk. Spring 2011 semester project. Goal: learn to use the new TI C6678 multi-core platform and exploit its abilities and advantages.


Presentation Transcript


  1. Efficient Real-Time Multicore Image Processing on TI C66x: Final Presentation. Yaron Doweck, Yael Einziger. Supervisor: Mike Sumszyk. Spring 2011 Semester Project

  2. Project Goal: Learn to use the new TI C6678 multi-core platform and exploit its abilities and advantages. • Implement a real-time tracking algorithm using multi-core programming and VLIB. • Create a framework for multi-core processing, Ethernet video streaming and DSP-FPGA communication.

  3. Contents: Keystone Architecture • SYS/BIOS • VLIB • Tracking Algorithm • Tracking System • Performance Analysis • Encountered Difficulties • Future Projects

  4. Learning the Platform: In the first part of the project, the main goal was to learn: • The C6678 platform • TI development environments • The Multicore SDK (MCSDK) • The SYS/BIOS real-time OS

  5. KeyStone Architecture: • 8 cores • External Memory Controller • 3 EDMA controllers • Multicore Navigator • Network Coprocessor • Semaphore module

  6. KeyStone Architecture, TeraNet: • DDR3: up to 10666 MB/s • Shared memory: 4 access ports, each up to 16000 MB/s • TeraNet switch fabric: up to 256 GB/s

  7. C66 CorePac: • C66 DSP: up to 20 GFLOPS @ 1.25 GHz • 32KB L1D cache/SRAM: 32000 MB/s • 32KB L1P cache/SRAM: 32000 MB/s • 512KB L2 cache/SRAM: 16000 MB/s

  8. Comparison with Previous Generations

  9. Contents: Keystone Architecture • SYS/BIOS • VLIB • Tracking Algorithm • Tracking System • Performance Analysis • Encountered Difficulties • Future Projects

  10. SYS/BIOS: SYS/BIOS is an advanced, lightweight real-time operating system from Texas Instruments. • It is designed for embedded applications that need real-time scheduling and synchronization. • SYS/BIOS is delivered as a set of pre-compiled packages that provide the modules that make up the OS. • Each module can be loaded and configured separately; only the selected modules are loaded, keeping the OS as light as possible.

  11. SYS/BIOS: Main SYS/BIOS modules used in the project: • BIOS: manages the OS. • Task: creates and manages threads. • HWI: hardware interrupts. • Semaphore: creates and manages semaphores. • IPC: inter-processor communication. • Timestamp: provides a timestamp service for performance analysis.

  12. Contents: Keystone Architecture • SYS/BIOS • VLIB • Tracking Algorithm • Tracking System • Performance Analysis • Encountered Difficulties • Future Projects

  13. VLIB: VLIB is an extensible library of more than 40 software kernels optimized for TI's C64x+ digital signal processor (DSP) core. • These kernels perform background modeling and subtraction, object feature extraction, tracking, recognition and low-level pixel processing, providing a foundation for developing video analytics applications.

  14. VLIB: TI also provides developers with a bit-exact version of the library for testing and debugging in a PC (Windows) environment. • The VLIB version used in this project is an unofficial release compiled with C66x support, obtained from TI's Video Surveillance team (VLIB's developers).

  15. Contents: Keystone Architecture • SYS/BIOS • VLIB • Tracking Algorithm • Tracking System • Performance Analysis • Encountered Difficulties • Future Projects

  16. Tracking Algorithm (diagram: steps 1 and 2)

  17. Statistical Background Subtraction: Unlike moving objects, the background of the image is essentially static. • However, it still shows small variations over time due to changes in luminosity, camera noise, moving trees, etc. • Hence, by studying how each pixel varies over time, we can decide whether it belongs to a moving object or to the background. • API: VLIB_subtractBackgroundS16
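A minimal per-pixel sketch of the statistical idea above, in plain C. It is not the VLIB_subtractBackgroundS16 interface (whose signature is not given here): it assumes a single running Gaussian per pixel stored as floats, a learning rate `alpha`, a threshold of `k` standard deviations and a variance floor. `bg_model_t` and `bg_subtract` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch only: hypothetical per-pixel Gaussian background model,
 * NOT the VLIB API. Floats are used for clarity instead of fixed point. */
typedef struct {
    float *mean;   /* running mean of each pixel's intensity */
    float *var;    /* running variance of each pixel's intensity */
    size_t n;      /* number of pixels */
} bg_model_t;

/* Classify each pixel of `frame` as foreground (1) or background (0)
 * and update the model for background pixels only.
 *   alpha   - learning rate in (0, 1]
 *   k       - threshold in standard deviations
 *   min_var - floor that keeps the variance from collapsing to zero */
void bg_subtract(bg_model_t *m, const uint8_t *frame, uint8_t *fg,
                 float alpha, float k, float min_var)
{
    for (size_t i = 0; i < m->n; ++i) {
        float d = (float)frame[i] - m->mean[i];
        if (d * d > k * k * m->var[i]) {
            fg[i] = 1;                     /* too far from the model: moving object */
        } else {
            fg[i] = 0;                     /* consistent with the background */
            m->mean[i] += alpha * d;       /* slowly follow lighting changes */
            float v = (1.0f - alpha) * m->var[i] + alpha * d * d;
            m->var[i] = v < min_var ? min_var : v;
        }
    }
}
```

Updating only the pixels classified as background is one common design choice (assumed here); it keeps a slowly moving object from being absorbed into the model.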

  18. Connected Components Labeling: Groups foreground pixels that have other foreground pixels as 8-connected neighbors and labels each discrete grouping as a component. • Once this is done, component properties such as bounding box, centroid and area can be measured and used to extract foreground information. • API: VLIB_createConnectedComponentsList (Figure: binary foreground image and its connected components labeling.)
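The labeling step can be sketched as an explicit-stack flood fill over the 8 neighbours, measuring area, bounding box and centroid while each component is traversed. This is an illustrative assumption, not VLIB_createConnectedComponentsList, whose internal algorithm is not described here; `component_t` and `label_components` are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Sketch only: hypothetical component record, not a VLIB type. */
typedef struct {
    int area;
    int min_x, min_y, max_x, max_y;   /* bounding box (inclusive) */
    float cx, cy;                     /* centroid */
} component_t;

/* Label 8-connected groups of nonzero pixels in `fg` (w x h, row-major).
 * `labels` receives 0 for background and 1..n for components.
 * Returns the number of components written to `out` (at most max_out),
 * or -1 on allocation failure. */
int label_components(const uint8_t *fg, int w, int h, int *labels,
                     component_t *out, int max_out)
{
    /* every pixel is pushed at most once, so w * h entries suffice */
    int *stack = malloc(sizeof(int) * (size_t)w * (size_t)h);
    if (!stack) return -1;
    for (int i = 0; i < w * h; ++i) labels[i] = 0;

    int n = 0;
    for (int start = 0; start < w * h && n < max_out; ++start) {
        if (!fg[start] || labels[start]) continue;
        component_t c = { 0, w, h, -1, -1, 0.0f, 0.0f };
        long sx = 0, sy = 0;
        int top = 0;
        stack[top++] = start;
        labels[start] = n + 1;
        while (top > 0) {
            int p = stack[--top], x = p % w, y = p / w;
            c.area++; sx += x; sy += y;
            if (x < c.min_x) c.min_x = x;
            if (x > c.max_x) c.max_x = x;
            if (y < c.min_y) c.min_y = y;
            if (y > c.max_y) c.max_y = y;
            for (int dy = -1; dy <= 1; ++dy)        /* visit the 8 neighbours */
                for (int dx = -1; dx <= 1; ++dx) {
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    int q = ny * w + nx;
                    if (fg[q] && !labels[q]) {
                        labels[q] = n + 1;          /* mark before pushing */
                        stack[top++] = q;
                    }
                }
        }
        c.cx = (float)sx / c.area;
        c.cy = (float)sy / c.area;
        out[n++] = c;
    }
    free(stack);
    return n;
}
```

Two pixels that touch only diagonally end up in the same component, which is what distinguishes 8-connectivity from 4-connectivity.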

  19. Tracking: Each component is associated with the closest tracker from the previous image. • If no existing tracker is close enough, a new tracker is created for the component. • After all components are associated, any remaining unmatched trackers are discarded.
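The association rule can be sketched as a greedy nearest-centroid match. Assumptions beyond the slide: distance is Euclidean between centroids, "close enough" is a fixed radius `max_dist`, and each tracker can be claimed by at most one component. `tracker_t` and `associate` are hypothetical names; the caller passes scratch space so the function avoids heap allocation.

```c
#include <stddef.h>

/* Sketch only: hypothetical tracker record (id + last centroid). */
typedef struct { int id; float x, y; } tracker_t;

/* Greedy nearest-centroid association.
 *   prev, n_prev  - trackers from the previous frame
 *   cx, cy, n_c   - centroids of the current frame's components
 *   max_dist      - beyond this distance a component starts a new tracker
 *   next_id       - in/out counter that gives new trackers unique ids
 *   out           - receives exactly n_c trackers, one per component
 *   matched       - scratch array of n_prev entries
 * Trackers in `prev` that match no component are not copied to `out`,
 * i.e. they are discarded. */
void associate(const tracker_t *prev, size_t n_prev,
               const float *cx, const float *cy, size_t n_c,
               float max_dist, int *next_id, tracker_t *out,
               unsigned char *matched)
{
    for (size_t t = 0; t < n_prev; ++t) matched[t] = 0;
    for (size_t c = 0; c < n_c; ++c) {
        size_t best = n_prev;                    /* n_prev means "none found" */
        float best_d2 = max_dist * max_dist;
        for (size_t t = 0; t < n_prev; ++t) {
            if (matched[t]) continue;            /* assumption: one component per tracker */
            float dx = cx[c] - prev[t].x, dy = cy[c] - prev[t].y;
            float d2 = dx * dx + dy * dy;
            if (d2 <= best_d2) { best_d2 = d2; best = t; }
        }
        if (best < n_prev) {
            matched[best] = 1;
            out[c].id = prev[best].id;           /* keep the existing identity */
        } else {
            out[c].id = (*next_id)++;            /* nothing close enough: new tracker */
        }
        out[c].x = cx[c];
        out[c].y = cy[c];
    }
}
```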

  20. Contents: Keystone Architecture • SYS/BIOS • VLIB • Tracking Algorithm • Tracking System • Performance Analysis • Encountered Difficulties • Future Projects

  21. Tracking System: The developed tracking system demonstrates multicore programming on the DSP using its powerful features: • Network coprocessor • EDMA engine • Multicore Navigator • Synchronization modules • Event-driven operation

  22. Tracking System: General Flow (diagram, stages 1 to 4). *Stages (1) and (4) were implemented using OpenCV 2.3 with two separate threads.

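  23. Tracking System: DSP Data Flow (diagram: the image arrives from the PC through the Ethernet controller and Packet DMA into DDR3; EDMA3 moves it into a double buffer in shared memory; processing goes through the cache controller and L1 cache; the trackers' data is sent back to the PC via Packet DMA)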
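The double buffer in this data flow can be illustrated with a ping-pong loop: while one half is processed, the next frame is transferred into the other half. In this sketch `memcpy` stands in for the EDMA3 transfer and the loop is single-threaded, so it only shows the buffer bookkeeping; on the DSP the transfer would run concurrently and processing would have to wait for its completion. `FRAME_BYTES`, `dma_copy`, `process` and `run_double_buffer` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define FRAME_BYTES 16   /* hypothetical tiny frame size for illustration */

/* Stand-in for an EDMA3 transfer: here just a synchronous memcpy. */
static void dma_copy(uint8_t *dst, const uint8_t *src)
{
    memcpy(dst, src, FRAME_BYTES);
}

/* Stand-in for the per-frame image processing: sums the pixels. */
static uint32_t process(const uint8_t *buf)
{
    uint32_t s = 0;
    for (int i = 0; i < FRAME_BYTES; ++i) s += buf[i];
    return s;
}

/* Ping-pong over two buffers: while frame i is processed from one half,
 * frame i + 1 is (conceptually) transferred into the other half.
 * `frames` holds n_frames frames back to back, standing in for DDR3. */
void run_double_buffer(const uint8_t *frames, int n_frames, uint32_t *results)
{
    static uint8_t buf[2][FRAME_BYTES];          /* the double buffer */
    if (n_frames <= 0) return;
    dma_copy(buf[0], frames);                    /* prime the first half */
    for (int i = 0; i < n_frames; ++i) {
        int cur = i & 1;
        if (i + 1 < n_frames)                    /* start filling the other half */
            dma_copy(buf[cur ^ 1], frames + (size_t)(i + 1) * FRAME_BYTES);
        /* on real hardware: wait here until the transfer into buf[cur] is done */
        results[i] = process(buf[cur]);
    }
}
```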

  24. Tracking System: Shared Memory (diagram: core 0 processes each image and sends core 1 a message that the foreground image is ready; core 1 then processes it. Shared memory holds the image at T, the image at T-1, the background model (mean and variance), a queue of binary foreground images and the list of trackers. DDR3, Packet DMA and EDMA3 (to Ethernet) are also shown.)

  25. Tracking System: Pipeline. Each block is event-driven; that is, it wakes up only when a specific event happens. (Diagram: on core 0, an EDMA interrupt and semaphore sync trigger statistical background subtraction, which sends a multicore message to core 1; on core 1, the message triggers connected components and tracking. Messages go through the Multicore Messaging Service.)
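The event-driven pipeline can be mimicked on a PC with POSIX threads and semaphores. This is a stand-in, not SYS/BIOS code: two threads play core 0 and core 1, semaphore posts from the main loop play the EDMA interrupt, and a one-slot mailbox guarded by two semaphores replaces the Multicore Messaging Service. Unnamed POSIX semaphores (`sem_init`) are assumed to be available, as on Linux, and all names are hypothetical.

```c
#include <pthread.h>
#include <semaphore.h>

/* Sketch only: POSIX stand-ins for SYS/BIOS semaphores and IPC messages. */
#define N_FRAMES 5

static sem_t frame_ready;   /* posted by the simulated EDMA interrupt */
static sem_t msg_ready;     /* a "message" for core 1 is in the mailbox */
static sem_t msg_free;      /* the single-slot mailbox is empty */
static int   mailbox;       /* payload: index of the ready foreground image */
static int   tracked[N_FRAMES];
static int   n_tracked;

/* "Core 0": sleeps until a frame arrives, runs background subtraction,
 * then notifies core 1 that the foreground image is ready. */
static void *core0(void *arg)
{
    (void)arg;
    for (int f = 0; f < N_FRAMES; ++f) {
        sem_wait(&frame_ready);     /* woken only by the EDMA event */
        /* ... statistical background subtraction on frame f ... */
        sem_wait(&msg_free);
        mailbox = f;
        sem_post(&msg_ready);       /* message to core 1 */
    }
    return NULL;
}

/* "Core 1": sleeps until a message arrives, then runs connected
 * components and tracking on the referenced foreground image. */
static void *core1(void *arg)
{
    (void)arg;
    for (int k = 0; k < N_FRAMES; ++k) {
        sem_wait(&msg_ready);
        int f = mailbox;
        sem_post(&msg_free);
        /* ... connected components + tracking on frame f ... */
        tracked[n_tracked++] = f;
    }
    return NULL;
}

/* Runs the two-stage pipeline; returns 0 on success. */
int run_pipeline(void)
{
    pthread_t t0, t1;
    n_tracked = 0;
    if (sem_init(&frame_ready, 0, 0) || sem_init(&msg_ready, 0, 0) ||
        sem_init(&msg_free, 0, 1))
        return -1;
    if (pthread_create(&t0, NULL, core0, NULL)) return -1;
    if (pthread_create(&t1, NULL, core1, NULL)) return -1;
    for (int f = 0; f < N_FRAMES; ++f)
        sem_post(&frame_ready);     /* simulated "EDMA transfer complete" events */
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    sem_destroy(&frame_ready);
    sem_destroy(&msg_ready);
    sem_destroy(&msg_free);
    return 0;
}
```

Each thread blocks in `sem_wait` until its event arrives, which mirrors the slide's "wakes up only when a specific event happens"; the single-slot mailbox also keeps frames in order between the two stages.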

  26. Contents: Keystone Architecture • SYS/BIOS • VLIB • Tracking Algorithm • Tracking System • Performance Analysis • Encountered Difficulties • Future Projects

  27. Ethernet: PCDSP10MB/s > 34FPS DDR3 Ethernet Controller Packet DMA DDR3 memory throughput: 10666MB/s via EDMA3. To PC Packet DMA Shared Memory Double Buffer Performance Analysis SRAM (local\shared) memory throughput: 16000MB/s, direct access. Processing L1 Cache Cache Controller

  28. Performance Analysis: In conclusion, the system can process frame sizes of 120x160 or 240x320 at up to 30 FPS. • Shared SRAM size: L2 double buffering requires 2 bytes/pixel and the Gaussian model (for background subtraction) requires 4 bytes/pixel, so the largest possible frame size is 240x320. • Webcam: up to 30 FPS, with frame sizes of 120x160, 240x320 or 480x640. • By processing only part of the image at a time, the double buffer can be made significantly smaller, allowing larger frame sizes.
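The memory figures can be checked with simple arithmetic. Following slide 29, this sketch assumes the 2 bytes/pixel are two 8-bit images (T and T-1) and the 4 bytes/pixel are a 16-bit mean and a 16-bit variance; the foreground image queue, tracker list and connected-components buffers are not counted. `tracking_memory_bytes` is a hypothetical helper.

```c
#include <stddef.h>

/* Sketch only: bytes per pixel as quoted on the slide. */
#define DOUBLE_BUFFER_BPP  2   /* image at T + image at T-1, 8 bits each */
#define GAUSSIAN_MODEL_BPP 4   /* 16-bit mean + 16-bit variance */

/* Memory needed for the double buffer plus the background model. */
size_t tracking_memory_bytes(size_t width, size_t height)
{
    return width * height * (DOUBLE_BUFFER_BPP + GAUSSIAN_MODEL_BPP);
}
```

This gives 115,200 bytes for 120x160, 460,800 bytes for 240x320 and 1,843,200 bytes for 480x640. For reference, 460,800 bytes is below the 512KB (524,288 bytes) L2 size listed on slide 7, while 1,843,200 bytes is well above it.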

  29. Performance Analysis: Reduced Memory Analysis (diagram: shared memory holding the image at T (char), the image at T-1 (char), the background model mean (short) and variance (short), a queue of binary foreground images and the list of connected components). • Memory can be significantly reduced by processing part of the frame at a time. • The implementation of the connected components algorithm will be more complicated.
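Processing part of the frame at a time can be sketched as a loop over horizontal strips, staging one strip at a time so the working buffer needs `strip_rows * w` bytes instead of `w * h`. The per-pixel test is a simplified fixed-threshold stand-in for background subtraction, `memcpy` stands in for the DMA transfer, and all names are hypothetical. A per-pixel operation gives the same result strip by strip; connected components do not, because a component can cross a strip boundary and its labels must then be merged, which is why that part becomes more complicated.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only: simplified per-pixel foreground test (fixed threshold). */
static uint8_t is_foreground(uint8_t pixel, uint8_t mean, int thr)
{
    return abs((int)pixel - (int)mean) > thr;
}

/* Processes a w x h frame in horizontal strips of `strip_rows` rows
 * (strip_rows > 0). Only one strip is staged in the working buffer at a
 * time. Returns 0 on success, -1 on allocation failure. */
int process_in_strips(const uint8_t *frame, const uint8_t *mean, uint8_t *fg,
                      int w, int h, int strip_rows, int thr)
{
    uint8_t *strip = malloc((size_t)strip_rows * (size_t)w);
    if (!strip) return -1;
    for (int y0 = 0; y0 < h; y0 += strip_rows) {
        int rows = (h - y0 < strip_rows) ? h - y0 : strip_rows;  /* last strip may be shorter */
        size_t off = (size_t)y0 * w, len = (size_t)rows * w;
        memcpy(strip, frame + off, len);                          /* stand-in for a DMA transfer */
        for (size_t i = 0; i < len; ++i)
            fg[off + i] = is_foreground(strip[i], mean[off + i], thr);
    }
    free(strip);
    return 0;
}
```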

  30. VLIB’s Performance: VLIB is optimized for the TI C64x+ DSP. • As part of the performance analysis, VLIB’s performance on the C66 core and on the C64x+ core was compared (comparison table). *Since Connected Components requires a lot of memory, the image was placed in L1 while the Connected Components buffer was placed in L2 memory.

  31. Contents: Keystone Architecture • SYS/BIOS • VLIB • Tracking Algorithm • Tracking System • Performance Analysis • Encountered Difficulties • Future Projects

  32. Encountered Difficulties, Documentation: • Incomplete platform documentation. • Lack of documentation for the MCSDK examples. • Had to learn by trial and error. • Posted questions on TI’s E2E forums.

  33. Encountered Difficulties, Software Bugs: • Software bugs in the development environment. • Software bugs in the MCSDK examples. • Frequent software version updates. • Some bugs are still unsolved. • Unable to receive large UDP transmissions.

  34. TI Support Forums: TI’s E2E forums were highly effective in solving problems. • Posted questions were answered by TI employees almost immediately.

  35. Contents: Keystone Architecture • SYS/BIOS • VLIB • Tracking Algorithm • Tracking System • Performance Analysis • Encountered Difficulties • Future Projects

  36. Possible Improvements: The tracking system can be enhanced to support larger frame sizes. • Motion estimation (e.g. a Kalman filter) can be added for better tracking.
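A Kalman filter for tracker motion could start from a constant-velocity model per axis: the state is position and velocity, and each frame's component centroid is the measurement. This is one possible design, not part of the project; it assumes a simplified process noise Q = diag(q, q), a scalar measurement noise r, and hypothetical names (`kf1d_t`, `kf_init`, `kf_predict`, `kf_update`). One instance would be kept per tracker for x and another for y.

```c
/* Sketch only: one-axis constant-velocity Kalman filter.
 * State x = [position, velocity]; measurement z = position (H = [1, 0]). */
typedef struct {
    double p, v;              /* state estimate */
    double P00, P01, P11;     /* covariance (symmetric, P10 == P01) */
    double q, r;              /* process and measurement noise variances */
} kf1d_t;

void kf_init(kf1d_t *k, double pos, double q, double r)
{
    k->p = pos;  k->v = 0.0;
    k->P00 = r;  k->P01 = 0.0;
    k->P11 = 1e3;             /* large: initial velocity is unknown */
    k->q = q;    k->r = r;
}

/* Predict: x = F x, P = F P F^T + Q, with F = [[1, dt], [0, 1]]. */
void kf_predict(kf1d_t *k, double dt)
{
    k->p += dt * k->v;
    double P00 = k->P00 + dt * (2.0 * k->P01 + dt * k->P11) + k->q;
    double P01 = k->P01 + dt * k->P11;
    k->P00 = P00;
    k->P01 = P01;
    k->P11 += k->q;
}

/* Update with a measured position z: K = P H^T / (H P H^T + r),
 * x = x + K (z - H x), P = (I - K H) P. */
void kf_update(kf1d_t *k, double z)
{
    double s  = k->P00 + k->r;               /* innovation variance */
    double k0 = k->P00 / s, k1 = k->P01 / s; /* Kalman gain */
    double y  = z - k->p;                    /* innovation */
    k->p += k0 * y;
    k->v += k1 * y;
    double P00 = (1.0 - k0) * k->P00;
    double P01 = (1.0 - k0) * k->P01;
    double P11 = k->P11 - k1 * k->P01;
    k->P00 = P00;
    k->P01 = P01;
    k->P11 = P11;
}
```

The position returned by `kf_predict` could then replace the last centroid when searching for the closest tracker, which helps when objects move several pixels between frames.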

  37. The Project as a Framework: The final report was written as a user’s guide to the DSP, the development environment and the tracking system. • The program on the DSP side is highly modular and can easily be adapted to any type of multicore pipeline processing.

  38. Thank You for Listening
