
Efficient Real-Time Multicore Image Processing on TI C66x midterm presentation

Midterm presentation by Yaron Doweck and Yael Einziger. Supervisor: Mike Sumszyk. Spring 2011 Semester Project. Contents: Project Goals, Development Tools, Learning Steps, What’s next.


Presentation Transcript


  1. Efficient Real-Time Multicore Image Processing on TI C66x, midterm presentation. Yaron Doweck, Yael Einziger. Supervisor: Mike Sumszyk. Spring 2011 Semester Project

  2. Contents • Project Goals • Development Tools • Learning Steps • What’s next

  3. Project Goal • Learn to use the new TI C66x platform and exploit its capabilities and advantages. • Implement a real-time computer vision algorithm using multi-core programming.

  4. Contents • Project Goals • Development Tools • Learning Steps • What’s next

  5. Development tools • Hardware: TMS320C6678 Multicore Fixed and Floating-Point Digital Signal Processor • Software: Code Composer Studio v5 with BIOS MCSDK 2.0

  6. TMS320C6678 • 8 C66x CorePac DSPs • Based on TI’s KeyStone Multicore Architecture • 320 GMAC / 160 GFLOP @ 1.25 GHz • 32 KB L1P, 32 KB L1D, 512 KB L2 per core • 4 MB shared L2 • 64-bit DDR3 interface (DDR3-1600)
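
The headline throughput figures on this slide can be reproduced with simple arithmetic. The sketch below assumes 32 16-bit MACs and 16 single-precision floating-point operations per core per cycle; these per-cycle figures are not stated on the slide and are used here only because they are consistent with the quoted totals. The function name is illustrative.

```c
#include <assert.h>

/* Peak rate in giga-operations per second:
 * number of cores x clock in GHz x operations per core per cycle. */
static double peak_giga_ops(int cores, double clock_ghz, int ops_per_cycle)
{
    assert(cores > 0 && clock_ghz > 0.0 && ops_per_cycle > 0);
    return cores * clock_ghz * ops_per_cycle;
}
```

With 8 cores at 1.25 GHz, `peak_giga_ops(8, 1.25, 32)` gives the 320 GMAC figure and `peak_giga_ops(8, 1.25, 16)` gives the 160 GFLOP figure. Both are theoretical peaks, not measured rates.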

  7. Contents • Project Goals • Development Tools • Learning Steps • What’s next

  8. Learning steps • CCS Simulator and Profiler • Cache configuration • DMA data transfer • Interrupts • Fixed and floating-point libraries (DSPLib, IMGLib, VLib, …) • SYS/BIOS • Multi-core programming

  9. Step 1: CCS Simulator and Profiler • CCS v5 can simulate the C6678 processor and some peripherals. • The profiler reports execution time and statistics for functions and individual code lines.

  10. Step 1: CCS Simulator and Profiler • Graph viewer – plots data from memory in the time or frequency domain. • Image Analyzer – displays an image stored in memory or in a file. Supports grayscale, RGB and YUV color formats.

  11. Step 2: Cache • 32 KB L1P cache. L1P is read-allocate and direct-mapped. • 32 KB L1D cache. L1D is read-allocate, write-back and 2-way set-associative. • Each L1 memory can be configured as 0, 4, 8, 16 or 32 KB of cache. • 512 KB L2 cache. L2 is read- and write-allocate and 4-way set-associative. • L2 can be configured as 0, 32, 64, 128, 256 or 512 KB of cache. • All configurations can be changed at run time.
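
To make the direct-mapped versus set-associative distinction concrete, here is a minimal host-side sketch of how an address maps to a cache set. The line sizes used in the usage note (32 bytes for L1P, 64 bytes for L1D, 128 bytes for L2) are assumptions not given on the slide, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Number of sets = cache size / (line size x associativity).
 * A direct-mapped cache is the special case ways == 1. */
static uint32_t cache_num_sets(uint32_t cache_bytes, uint32_t line_bytes, uint32_t ways)
{
    assert(line_bytes > 0 && ways > 0);
    assert(cache_bytes > 0 && cache_bytes % (line_bytes * ways) == 0);
    return cache_bytes / (line_bytes * ways);
}

/* Set that a byte address falls into: line number modulo number of sets. */
static uint32_t cache_set_index(uint32_t addr, uint32_t cache_bytes,
                                uint32_t line_bytes, uint32_t ways)
{
    return (addr / line_bytes) % cache_num_sets(cache_bytes, line_bytes, ways);
}
```

Under these assumptions a 32 KB direct-mapped L1P has 1024 sets, so two code addresses exactly 32 KB apart compete for the same line. A 32 KB 2-way L1D has 256 sets, so addresses 16 KB apart share a set, but two such lines can be resident at once.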

  12. Step 2: Cache • Achievements: • Configuring different L1 and L2 cache sizes before or during run time. • Using L1 and L2 as SRAM (entirely SRAM, or part SRAM and part cache). • Controlling where variables are placed (L1, L2 or DDR3 memory).

  13. Step 3: DMA • C66x processors have 3 EDMA3 controllers, each with 64 DMA channels + 8 QDMA channels. • EDMA3 supports data transfer to/from cache, shared memory or external memory. • EDMA3 supports the use of hardware interrupts. • In addition, each core has a faster IDMA controller for internal transfers.

  14. Step 3: DMA • Achievements: • Using IDMA to transfer data inside a core (L2↔L1). • Using EDMA3 to transfer data to/from L1, L2 and DDR3.
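
For image processing, the typical DMA job is moving a rectangular tile between a large image in external memory and a small buffer in fast memory. The sketch below is a host-side model of such a strided 2D copy. The struct and function are hypothetical; the field names only echo EDMA3 parameter terminology (ACNT, BCNT, BIDX) and this is not the TI driver or CSL API. `memcpy` stands in for the hardware transfer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for a 2D transfer (not a TI data structure). */
typedef struct {
    const uint8_t *src;      /* first byte to read */
    uint8_t       *dst;      /* first byte to write */
    size_t         acnt;     /* bytes in each contiguous array (tile row) */
    size_t         bcnt;     /* number of arrays (tile rows) */
    ptrdiff_t      src_bidx; /* byte stride between rows at the source */
    ptrdiff_t      dst_bidx; /* byte stride between rows at the destination */
} Transfer2D;

/* Copy bcnt rows of acnt bytes each, stepping by the given strides. */
static void transfer2d_run(const Transfer2D *t)
{
    assert(t != NULL && t->src != NULL && t->dst != NULL);
    for (size_t b = 0; b < t->bcnt; ++b) {
        memcpy(t->dst + (ptrdiff_t)b * t->dst_bidx,
               t->src + (ptrdiff_t)b * t->src_bidx,
               t->acnt);
    }
}
```

Setting `src_bidx` to the image width and `dst_bidx` to the tile width packs a tile from a wide image into a dense buffer; swapping the two strides writes the tile back.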

  15. Step 4: Interrupts • The interrupt controller supports up to 128 system events, which consist of both internally generated events (within the C66x CorePac) and chip-level events.

  16. Step 4: Interrupts • From the event inputs, the interrupt controller outputs 15 signals to the core: • One maskable hardware exception • 12 maskable hardware interrupts • One non-maskable signal • One reset signal

  17. Step 4: Interrupts • Achievements: • Configuring manually triggered events. • Configuring an EDMA transfer-completion routine using the EDMA system event.
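
The routing of many system events onto a few CPU interrupts can be pictured with a small host-side model. Everything below is a hypothetical sketch, not the CorePac register interface: the type and function names are invented, the 12 maskable interrupts are numbered 4 to 15, and the model assumes that a lower interrupt number is serviced first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { NUM_EVENTS = 128, FIRST_INT = 4, LAST_INT = 15 };

/* Hypothetical model of an event-to-interrupt selector. */
typedef struct {
    int      event_for_int[LAST_INT + 1];  /* event routed to each CPU interrupt, -1 if none */
    uint32_t event_flags[NUM_EVENTS / 32]; /* one latched flag bit per system event */
} IntcModel;

static void intc_init(IntcModel *m)
{
    memset(m->event_flags, 0, sizeof m->event_flags);
    for (int i = 0; i <= LAST_INT; ++i) m->event_for_int[i] = -1;
}

/* Route one system event to one of the maskable CPU interrupts. */
static void intc_map(IntcModel *m, int cpu_int, int event)
{
    assert(cpu_int >= FIRST_INT && cpu_int <= LAST_INT);
    assert(event >= 0 && event < NUM_EVENTS);
    m->event_for_int[cpu_int] = event;
}

/* Latch an event, as a peripheral or a manual trigger would. */
static void intc_raise(IntcModel *m, int event)
{
    assert(event >= 0 && event < NUM_EVENTS);
    m->event_flags[event / 32] |= (uint32_t)1 << (event % 32);
}

/* Return the next CPU interrupt to service and clear its event flag,
 * or -1 if no mapped event is pending. */
static int intc_next(IntcModel *m)
{
    for (int i = FIRST_INT; i <= LAST_INT; ++i) {
        int ev = m->event_for_int[i];
        if (ev >= 0 && ((m->event_flags[ev / 32] >> (ev % 32)) & 1u)) {
            m->event_flags[ev / 32] &= ~((uint32_t)1 << (ev % 32));
            return i;
        }
    }
    return -1;
}
```

In this model, a DMA completion event would be mapped to one interrupt with `intc_map`, raised by the transfer, and picked up by `intc_next`, which is where a completion routine would be called.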

  18. Step 5: Libraries • DSPLib – an optimized DSP function library that includes general-purpose signal-processing routines for real-time applications. (Figure: LPF example)
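
As a point of reference for the low-pass filter (LPF) example named on this slide, here is a plain C FIR filter. It is a readable, unoptimized sketch and not a DSPLib routine; the function name and the zero-history convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Direct-form FIR filter: y[n] = sum over k of h[k] * x[n - k].
 * Samples before x[0] are treated as zero. */
static void fir_filter(const float *x, size_t nx,
                       const float *h, size_t nh, float *y)
{
    assert(x != NULL && h != NULL && y != NULL && nh > 0);
    for (size_t n = 0; n < nx; ++n) {
        float acc = 0.0f;
        for (size_t k = 0; k < nh && k <= n; ++k) {
            acc += h[k] * x[n - k];
        }
        y[n] = acc;
    }
}
```

With the two-tap averaging kernel `{0.5f, 0.5f}`, a constant input passes through unchanged after the first sample, while an input alternating between +1 and -1 is cancelled to zero, which is the low-pass behavior. A reference like this is also a convenient baseline when checking the output of an optimized library call.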

  19. Step 5: Libraries • IMGLib – an optimized image/video processing function library that includes general-purpose image/video processing routines for real-time applications. (Figures: Histogram, Derivative, Edge Detection)
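
The histogram example named on this slide can be written in a few lines of plain C. The function below is a hypothetical reference version for 8-bit grayscale images, not the IMGLib API; an optimized library routine would be expected to produce the same counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count how many pixels take each of the 256 possible 8-bit values. */
static void histogram_u8(const uint8_t *img, size_t n, uint32_t hist[256])
{
    assert(img != NULL && hist != NULL);
    memset(hist, 0, 256 * sizeof hist[0]);
    for (size_t i = 0; i < n; ++i) {
        hist[img[i]]++;
    }
}
```

The bin counts always sum to the number of pixels, which makes a quick sanity check when comparing against a library result.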

  20. Step 5: Libraries • Some more libraries: • VLib – a collection of computer vision algorithms that are optimized for TI DSPs. • IQMath – a collection of highly optimized fixed-point arithmetic, trigonometric and mathematical functions, typically used in real-time applications. • fastMath – optimized arithmetic and trigonometric functions for floating-point devices.
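
To illustrate what fixed-point arithmetic involves, here is a minimal Q15 sketch in plain C (16-bit values representing numbers in the range -1 to just under 1). The type and function names are hypothetical and are not the IQMath API; the code assumes that right-shifting a negative integer is an arithmetic shift, as on mainstream compilers.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t q15_t; /* value = raw / 32768 */

/* Convert a real number in [-1, 1] to Q15, rounding to nearest and saturating. */
static q15_t q15_from_double(double v)
{
    assert(v >= -1.0 && v <= 1.0);
    double scaled = v * 32768.0;
    int32_t r = (int32_t)(scaled >= 0.0 ? scaled + 0.5 : scaled - 0.5);
    if (r > INT16_MAX) r = INT16_MAX; /* +1.0 is not representable */
    if (r < INT16_MIN) r = INT16_MIN;
    return (q15_t)r;
}

static double q15_to_double(q15_t v)
{
    return v / 32768.0;
}

/* Multiply two Q15 values: the 32-bit product is Q30, so add half an LSB
 * and shift right by 15 to return to Q15, then saturate. */
static q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;
    p = (p + (1 << 14)) >> 15;
    if (p > INT16_MAX) p = INT16_MAX; /* only -1 x -1 reaches this case */
    return (q15_t)p;
}
```

The rounding and saturation steps are where fixed-point results can differ from floating-point ones, so they are the natural things to measure in an accuracy comparison between the two kinds of library.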

  21. Step 5: Libraries • Achievements: • Using DSPLib for a simple signal-processing application with floating-point arrays. • Using IMGLib for a simple image-processing application. Still left: • Studying the VLib, IQMath and fastMath libraries. • Comparing actual running time to the running time specified in the User Guide.

  22. Step 6: SYS/BIOS • SYS/BIOS is a real-time operating system for applications that require real-time scheduling and synchronization. • SYS/BIOS provides preemptive multi-threading, hardware abstraction, real-time analysis, and configuration tools. • SYS/BIOS is designed to minimize memory and CPU requirements on the target.

  23. Step 6: SYS/BIOS • Achievements: • Using SYS/BIOS modules to configure the DSP’s memory (cache sizes, memory sections, heap and stack size). • Running a multi-threaded program with protected shared variables. Still left: • Using SYS/BIOS modules to configure DSP peripherals (LAN, SRIO, PCIe).

  24. Learning steps • CCS Simulator and Profiler – done • Cache configuration – done • DMA data transfer – done • Interrupts – done • Fixed and floating-point libraries (DSPLib, IMGLib, VLib, …) – in progress • SYS/BIOS – in progress • Multi-core programming

  25. Contents • Project Goals • Development Tools • Learning Steps • What’s next

  26. What’s next • Implementation of a bidirectional data flow between DDR3 and L1, possibly through L2. (3 weeks) • Performance analysis (throughput, latency and accuracy) when using floating-point versus fixed-point libraries. (2 weeks) • Use of hardware semaphores for parallel data access and of Multicore Navigator for message communication between cores. (4 weeks)
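
One common way to organize a bidirectional flow between external and fast memory is double (ping-pong) buffering: while one small buffer is being processed, the next block is fetched into the other. The slide does not say which scheme the project will use, so the following is only a host-side sketch under that assumption. `memcpy` stands in for DMA transfers, the local array stands in for L1/L2 SRAM, and the block size, the pixel-inverting kernel and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 64 /* hypothetical block size in bytes */

/* Length of the block that starts at offset off (the last one may be short). */
static size_t block_len(size_t total, size_t off)
{
    return (total - off < BLOCK) ? total - off : BLOCK;
}

/* Hypothetical kernel: invert 8-bit pixels in place. */
static void process_block(uint8_t *buf, size_t n)
{
    for (size_t i = 0; i < n; ++i) buf[i] = (uint8_t)(255 - buf[i]);
}

/* Stream total bytes from ext_in through two fast buffers to ext_out. */
static void stream_process(const uint8_t *ext_in, uint8_t *ext_out, size_t total)
{
    uint8_t fast[2][BLOCK]; /* stand-in for two buffers in on-chip SRAM */
    assert(ext_in != NULL && ext_out != NULL);
    if (total == 0) return;

    memcpy(fast[0], ext_in, block_len(total, 0)); /* prefetch first block */
    for (size_t off = 0, b = 0; off < total; off += BLOCK, ++b) {
        size_t cur = b & 1;
        size_t len = block_len(total, off);
        /* Fetch the next block into the other buffer. On hardware this
         * transfer would run in the background during process_block(). */
        if (off + BLOCK < total) {
            memcpy(fast[cur ^ 1], ext_in + off + BLOCK, block_len(total, off + BLOCK));
        }
        process_block(fast[cur], len);
        memcpy(ext_out + off, fast[cur], len); /* write result back */
    }
}
```

On the host the copies run sequentially, so this only shows the buffer bookkeeping. The throughput benefit appears when the fetch and write-back are handed to a DMA controller and overlap with the computation.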
