

Efficient Real-Time Multicore Image Processing on TI C66x: Final Presentation

Yaron Doweck, Yael Einziger

Supervisor: Mike Sumszyk

Spring 2011

Semester Project

Project Goal
  • Learn to use the new TI C6678 multi-core platform and to exploit its abilities and advantages.
  • Implement a real-time tracking algorithm using multi-core programming and VLIB.
  • Create a framework for multi-core, Ethernet video streaming and DSP-FPGA communication.
Contents
  • KeyStone Architecture
  • SYS/BIOS
  • VLIB
  • Tracking Algorithm
  • Tracking System
  • Performance Analysis
  • Encountered Difficulties
  • Future Projects
Learning the Platform
In the first part of the project, the main goal was learning:
    • The C6678 platform
    • TI development environments
    • The Multicore SDK (MCSDK)
    • The SYS/BIOS real-time OS
KeyStone Architecture
[Figure: KeyStone block diagram. Labeled blocks: 8 cores, External Memory Controller, 3 EDMA Controller, Multicore Navigator, Network Coprocessor, Semaphore Module]


KeyStone Architecture
  • DDR3: up to 10666 MB/s
  • Shared memory: 4 access ports, each up to 16000 MB/s
  • TeraNet switch fabric: up to 256 GB/s


C66 CorePac
  • C66 DSP: up to 20 GFLOPS @ 1.25 GHz
  • 32 KB L1D cache/SRAM: 32000 MB/s
  • 32 KB L1P cache/SRAM: 32000 MB/s
  • 512 KB L2 cache/SRAM: 16000 MB/s


SYS/BIOS
  • SYS/BIOS is an advanced, lightweight real-time operating system from Texas Instruments.
  • It is designed for use in embedded applications that need real-time scheduling and synchronization.
  • SYS/BIOS is delivered as a set of pre-compiled packages that provide the modules that make up the OS.
  • Each module can be loaded and configured separately (only the selected modules are loaded, making the OS as light as possible).
SYS/BIOS
Main SYS/BIOS modules used in the project:
  • BIOS: manages the OS.
  • Task: creates and manages threads.
  • HWI: hardware interrupts.
  • Semaphore: creates and manages semaphores.
  • IPC: inter-processor communication.
  • Timestamp: provides a timestamp service for performance analysis.
VLIB
  • VLIB is an extensible library of more than 40 software kernels that are optimized for TI's C64+ digital signal processor (DSP) core.
  • These kernels execute background modeling and subtraction, object feature extraction, tracking, recognition and low-level pixel processing to provide a foundation for video analytics applications development.
  • TI also provides developers with a bit-exact version of the library for testing and debugging in a PC (Windows) environment.
  • The VLIB version used in this project is an unofficial release compiled with C66x support, obtained from TI's Video Surveillance team (VLIB’s developers).
Statistical Background Subtraction
  • Unlike moving objects, the background of the image doesn’t change.
  • However, there are still small variations over time due to illumination changes, camera noise, trees, etc.
  • Hence, by studying how each pixel varies over time, we can deduce whether it belongs to a moving object or to the background.
  • API: VLIB_subtractBackgroundS16
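
The per-pixel idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it keeps a floating-point running mean and variance per pixel and does not call or reproduce VLIB_subtractBackgroundS16, which works on fixed-point data. The function name `update_and_subtract`, the learning rate `alpha` and the threshold factor `k` are hypothetical.

```python
def update_and_subtract(frame, mean, var, alpha=0.05, k=3.0):
    """Return a binary foreground mask and update the background model in place.

    frame, mean and var are flat lists with one entry per pixel.
    alpha (learning rate) and k (threshold factor) are illustrative values.
    """
    mask = []
    for i, pixel in enumerate(frame):
        diff = pixel - mean[i]
        # Foreground if the pixel deviates more than k standard deviations.
        is_foreground = diff * diff > k * k * var[i]
        mask.append(1 if is_foreground else 0)
        if not is_foreground:
            # Only background pixels are allowed to adapt the model.
            mean[i] += alpha * diff
            var[i] = (1.0 - alpha) * var[i] + alpha * diff * diff
    return mask


mean = [100.0, 100.0]
var = [4.0, 4.0]
mask = update_and_subtract([101, 150], mean, var)
print(mask)  # [0, 1]
```

The small change at the first pixel is absorbed into the model, while the large jump at the second pixel is flagged as foreground and leaves the model untouched.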
Connected Components Labeling
  • Groups foreground pixels that have other foreground pixels as 8-connected neighbors, and labels discrete groupings as components.
  • Once accomplished, component properties can be measured and used to extract foreground information. These properties include bounding box, centroid and area.
  • API: VLIB_createConnectedComponentsList
[Figure: binary foreground image and the result of connected components labeling]
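
As a rough illustration of 8-connected labeling and the measured properties, the following Python sketch uses a breadth-first flood fill. It is not the VLIB implementation and does not call VLIB_createConnectedComponentsList; the function name `connected_components` and the output dictionary layout are assumptions made for this example.

```python
from collections import deque


def connected_components(mask):
    """Label 8-connected foreground regions of a binary mask (list of rows).

    Returns one dict per component with its area, bounding box
    (x_min, y_min, x_max, y_max) and centroid (x, y).
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Start a flood fill from this unvisited foreground pixel.
            seen[r][c] = True
            todo = deque([(r, c)])
            pixels = []
            while todo:
                y, x = todo.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            todo.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            components.append({
                "area": len(pixels),
                "bbox": (min(xs), min(ys), max(xs), max(ys)),
                "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
            })
    return components


example = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
for comp in connected_components(example):
    print(comp)
```

The two diagonal pixels in the top-left corner form a single component because diagonal neighbors count under 8-connectivity.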

Tracking
  • Tracker association is done by matching each component to the closest tracker from the previous image.
  • If no existing tracker is close enough, a new tracker is associated with the component.
  • After all components are associated, any remaining unmatched trackers are discarded.
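
The association rule can be sketched as a greedy nearest-neighbor match. This is a minimal Python sketch, not the project's code: the function name `associate`, the distance threshold `max_dist`, the use of Euclidean distance between centroids, and the rule that each tracker is matched to at most one component are all assumptions.

```python
import math


def associate(trackers, centroids, max_dist=20.0, next_id=0):
    """Match component centroids to the trackers of the previous frame.

    trackers maps tracker id -> (x, y) position in the previous frame.
    Returns the trackers for the current frame and the next unused id.
    """
    unmatched = dict(trackers)
    updated = {}
    for cx, cy in centroids:
        best_id, best_dist = None, max_dist
        for tid, (tx, ty) in unmatched.items():
            dist = math.hypot(cx - tx, cy - ty)
            if dist <= best_dist:
                best_id, best_dist = tid, dist
        if best_id is None:
            # No tracker is close enough: start a new one.
            best_id = next_id
            next_id += 1
        else:
            del unmatched[best_id]
        updated[best_id] = (cx, cy)
    # Trackers still in `unmatched` are dropped by not copying them over.
    return updated, next_id


previous = {0: (10, 10), 1: (50, 50)}
current, next_id = associate(previous, [(12, 11), (200, 200)], next_id=2)
print(current)  # {0: (12, 11), 2: (200, 200)}
```

Tracker 0 follows the nearby component, the distant component gets the new id 2, and tracker 1 is discarded because nothing matched it.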
Tracking System
The developed tracking system demonstrates multicore programming on the DSP using its powerful features:
      • Network coprocessor.
      • EDMA engine.
      • Multicore Navigator.
      • Synchronization modules.
      • Event-driven operations.
Tracking System: General Flow
[Figure: general flow diagram with four numbered stages, 1* to 4*]
* Stages (1) and (4) were implemented using OpenCV 2.3 with 2 separate threads.

Tracking System: DSP Data Flow
[Figure: DSP data flow diagram. Labeled elements: From PC, Ethernet Controller, Packet DMA, DDR3 (Image), EDMA3, Shared Memory Double Buffer, Cache Controller, L1 Cache, Processing, Trackers’ data, Packet DMA, To PC]


Tracking System: Shared Memory
[Figure: shared memory diagram. CORE 0 and CORE 1 each run a Processing block; CORE 0 sends a message to CORE 1: "Notify that foreground image is ready". Other labeled elements: DDR3, EDMA3, Packet DMA (to Ethernet). Shared memory contents: Queue, Background Model (Mean, Var), List of Trackers, Image at T, Image at T-1, Binary foreground Images]


Tracking System: Pipeline Image Processing
Each block is event driven. That is, it wakes up only when a specific event happens.
[Figure: pipeline diagram. CORE 0: EDMA Interrupt, Statistical Background Subtraction. CORE 1: Connected Component and Tracking. Links between blocks: Semaphore Sync., Multicore Message (Multicore Messaging Service)]
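
The event-driven pipeline can be imitated on a PC with two threads. The sketch below is an analogy only, written with Python's standard `threading` and `queue` modules: the semaphore stands in for the EDMA-completion event, the queue stands in for the multicore message, and the two stages stand in for CORE 0 and CORE 1. It does not use SYS/BIOS or IPC, and the per-stage processing (a fixed threshold and a pixel count) is a placeholder.

```python
import queue
import threading
from collections import deque


def run_pipeline(frames, threshold=128):
    """Run frames through two event-driven stages and return stage 1 results."""
    frame_ready = threading.Semaphore(0)  # stands in for the EDMA interrupt
    inbox = deque()                       # frames already "transferred"
    messages = queue.Queue()              # stands in for the multicore message
    results = []

    def stage0():
        # Wakes up once per frame, produces a binary mask, notifies stage 1.
        for _ in range(len(frames)):
            frame_ready.acquire()
            frame = inbox.popleft()
            mask = [1 if p > threshold else 0 for p in frame]
            messages.put(mask)
        messages.put(None)                # end-of-stream marker

    def stage1():
        # Wakes up only when a message arrives from stage 0.
        while True:
            mask = messages.get()
            if mask is None:
                break
            results.append(sum(mask))     # placeholder for labeling/tracking

    t0 = threading.Thread(target=stage0)
    t1 = threading.Thread(target=stage1)
    t0.start()
    t1.start()
    for frame in frames:
        inbox.append(frame)
        frame_ready.release()             # signal "frame transfer complete"
    t0.join()
    t1.join()
    return results


print(run_pipeline([[0, 200, 255], [10, 20, 30], [129, 128, 127]]))  # [2, 0, 1]
```

Neither stage polls: each one blocks until its event arrives, which is the property the slide describes.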

Performance Analysis
  • Ethernet (PC-DSP): 10 MB/s > 34 FPS.
  • DDR3 memory throughput: 10666 MB/s via EDMA3.
  • SRAM (local/shared) memory throughput: 16000 MB/s, direct access.
[Figure: DSP data flow diagram annotated with the throughputs above]
Performance Analysis
In conclusion, the system can process frames of 120x160 or 240x320 at up to 30 FPS.
    • Shared SRAM size: L2 double buffering requires 2 bytes/pixel. The Gaussian model (for background subtraction) requires 4 bytes/pixel. Largest possible frame size: 240x320.
    • Webcam: up to 30 FPS. Frame size 120x160, 240x320 or 480x640.
  • By processing only a part of the image at a time, the size of the double buffer can be significantly reduced, allowing a larger frame size.
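
The per-pixel figures above translate into a memory footprint per frame size. The Python sketch below works through that arithmetic. The 2 and 4 bytes/pixel values come from the slide; comparing the total against the 512 KB L2 size quoted for the CorePac is an assumption about which memory sets the limit, and the names `frame_memory_bytes` and `fits` are hypothetical.

```python
DOUBLE_BUFFER_BYTES_PER_PIXEL = 2    # from the slide
GAUSSIAN_MODEL_BYTES_PER_PIXEL = 4   # from the slide
L2_CAPACITY_BYTES = 512 * 1024       # assumption: 512 KB L2 is the limit


def frame_memory_bytes(height, width):
    """Bytes needed for the double buffer plus the background model."""
    per_pixel = DOUBLE_BUFFER_BYTES_PER_PIXEL + GAUSSIAN_MODEL_BYTES_PER_PIXEL
    return height * width * per_pixel


def fits(height, width, capacity=L2_CAPACITY_BYTES):
    """True if the frame's buffers fit in the given capacity."""
    return frame_memory_bytes(height, width) <= capacity


for height, width in [(120, 160), (240, 320), (480, 640)]:
    print(height, width, frame_memory_bytes(height, width), fits(height, width))
```

With these numbers 240x320 needs 460800 bytes and fits in 512 KB, while 480x640 needs 1843200 bytes and does not, which is consistent with the largest frame size stated on the slide.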
Performance Analysis: Reduced Memory Analysis
[Figure: shared memory layout. Contents: Queue, Background Model with Mean (Short) and Var (Short), List of Connected Components, Image at T (Char), Image at T-1 (Char), Binary foreground Images]
  • Memory can be significantly reduced by processing a part of the frame at a time.
  • Implementation of the Connected Component algorithm will be more complicated.
VLIB’s Performance
  • VLIB is optimized for the TI C64+ DSP.
  • As a part of the performance analysis, VLIB’s performance on the C66 core and on the C64+ core was compared:

*Since Connected Components requires a lot of memory, the image was placed in L1 but the Connected Components buffer was placed in L2 memory.

Encountered Difficulties
Documentation:
  • Incomplete platform documentation.
  • Lack of documentation for the MCSDK examples.
  • Had to learn by trial and error.
  • Posted questions on TI’s E2E forums.
Encountered Difficulties
Software Bugs:
  • Software bugs in the development environment.
  • Software bugs in the MCSDK examples.
  • Software versions had to be updated repeatedly.
  • Some bugs are still unsolved.
        • Unable to receive large UDP transmissions.
TI Support Forums
  • TI’s E2E forums were highly effective in solving problems.
  • The posted questions were answered by TI’s employees almost immediately.
Possible Improvements
  • The tracking system can be enhanced to support larger frame sizes.
  • Motion estimation (e.g. Kalman filter) can be added for better tracking capabilities.
The Project as a Framework
  • The final report was written as a user’s guide to the DSP, the development environment and the tracking system.
  • The DSP-side program is highly modular and can easily be adapted to any type of multicore pipeline processing.