Efficient Real-Time Multicore Image Processing on TI C66x: Final Presentation


Yaron Doweck, Yael Einziger. Supervisor: Mike Sumszyk. Spring 2011 Semester Project. Project goal: learn to use the new TI C6678 multicore platform and to exploit its abilities and advantages.

Presentation Transcript
Efficient Real-Time Multicore Image Processing on TI C66x: Final Presentation

Yaron Doweck, Yael Einziger

Supervisor: Mike Sumszyk

Spring 2011

Semester Project


Project Goal

  • Learn to use the new TI C6678 multicore platform and to exploit its abilities and advantages.


Contents

  • KeyStone Architecture

  • SYS/BIOS

  • VLIB

  • Tracking Algorithm

  • Tracking System

  • Performance Analysis

  • Encountered Difficulties

  • Future Projects



Learning the Platform


KeyStone Architecture

  • 8 cores

  • External Memory Controller

  • 3 EDMA Controllers

  • Multicore Navigator

  • Network Coprocessor

  • Semaphore Module


KeyStone Architecture

  • DDR3: up to 10666 MB/s

  • Shared Memory: 4 access ports, each up to 16000 MB/s

  • TeraNet Switch Fabric: up to 256 GB/s


C66 CorePac

  • C66 DSP: up to 20 GFLOPS @ 1.25 GHz

  • 32 KB L1D Cache/SRAM: 32000 MB/s

  • 32 KB L1P Cache/SRAM: 32000 MB/s

  • 512 KB L2 Cache/SRAM: 16000 MB/s





SYS/BIOS

  • SYS/BIOS is an advanced, lightweight real-time operating system from Texas Instruments.

  • It is designed for use in embedded applications that need real-time scheduling and synchronization.

  • SYS/BIOS is delivered as a set of pre-compiled packages that provide the modules that make up the OS.

  • Each module can be loaded and configured separately (only the selected modules are loaded, making the OS as light as possible).


SYS/BIOS

Main SYS/BIOS modules used in the project:

  • BIOS - Manages the OS.

  • Task - Creates and manages threads.

  • HWI - Hardware interrupts.

  • Semaphore - Creates and manages semaphores.

  • IPC - Inter-processor communication.

  • Timestamp - Provides a timestamp service for performance analysis.




VLIB

  • VLIB is an extensible library of more than 40 software kernels that are optimized for TI's C64x+ digital signal processor (DSP) core.

  • These kernels perform background modeling and subtraction, object feature extraction, tracking, recognition and low-level pixel processing, providing a foundation for developing video analytics applications.


VLIB

  • TI also provides developers with a bit-exact version of the library for testing and debugging in a PC (Windows) environment.

  • The VLIB version used in this project is an unofficial release compiled with C66x support, obtained from TI's Video Surveillance team (VLIB’s developers).




Tracking Algorithm


Statistical Background Subtraction

  • Unlike moving objects, the background of the image doesn’t change.

  • However, there are still some small variations over time due to lighting changes, camera noise, trees, etc.

  • Hence, by studying how each pixel varies over time, we can deduce whether it belongs to a moving object or to the background.

  • API: VLIB_subtractBackgroundS16
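The per-pixel idea above can be sketched in a few lines of plain Python. This is only an illustration of a running mean/variance background model, not VLIB's implementation: the function name `update_and_classify` and the parameters `alpha` (learning rate), `k` (threshold in standard deviations) and `min_var` (variance floor) are assumptions made for this example and are not taken from the presentation or from the VLIB_subtractBackgroundS16 interface.

```python
def update_and_classify(frame, mean, var, alpha=0.05, k=3.0, min_var=4.0):
    """Classify each pixel as foreground (1) or background (0).

    frame, mean and var are lists of rows with identical shape.
    mean and var are updated in place, for background pixels only.
    All parameter names and defaults are illustrative assumptions.
    """
    mask = []
    for r, row in enumerate(frame):
        mask_row = []
        for c, pixel in enumerate(row):
            diff = pixel - mean[r][c]
            # Foreground if the pixel is more than k standard deviations
            # away from its running mean (compared in squared form).
            foreground = diff * diff > k * k * max(var[r][c], min_var)
            mask_row.append(1 if foreground else 0)
            if not foreground:
                # Background pixels slowly refine the model, which absorbs
                # gradual lighting changes and camera noise.
                mean[r][c] += alpha * diff
                var[r][c] += alpha * (diff * diff - var[r][c])
        mask.append(mask_row)
    return mask
```

Updating the model only where the pixel was classified as background is one common design choice; it keeps a moving object from being absorbed into the background while it passes through.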


Connected Components Labeling

  • Groups foreground pixels that have other foreground pixels as 8-connected neighbors, and labels each discrete group as a component.

  • Once this is done, component properties can be measured and used to extract foreground information. These properties include bounding box, centroid and area.

  • API: VLIB_createConnectedComponentsList

[Figure: Binary Foreground Image and its Connected Components Labeling]
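As a rough illustration of the labeling step, the sketch below groups 8-connected foreground pixels with a breadth-first flood fill and reports the area, bounding box and centroid of each component. It is a minimal example under stated assumptions, not the algorithm used inside VLIB_createConnectedComponentsList; the function name `connected_components` and the dictionary layout of the result are hypothetical.

```python
from collections import deque


def connected_components(mask):
    """Return one dict per 8-connected group of nonzero pixels in mask.

    mask is a list of rows of 0/1 values. Each result holds the area,
    the bounding box (min_row, min_col, max_row, max_col) and the
    centroid (row, col). The result layout is an illustrative choice.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Flood fill from this unvisited foreground pixel.
            pending = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while pending:
                y, x = pending.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            pending.append((ny, nx))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            components.append({
                "area": len(pixels),
                "bbox": (min(ys), min(xs), max(ys), max(xs)),
                "centroid": (sum(ys) / len(ys), sum(xs) / len(xs)),
            })
    return components
```

Because diagonal neighbors count under 8-connectivity, two pixels that touch only at a corner end up in the same component.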


Tracking




Tracking System


Tracking System: General Flow

[Figure: flow diagram with four numbered stages, 1* to 4*]

* Stages (1) and (4) were implemented using OpenCV 2.3 with 2 separate threads.


Tracking System: DSP Data Flow

[Figure: DSP data flow diagram. Blocks shown:]

  • From PC: image, via Ethernet Controller and Packet DMA

  • DDR3

  • EDMA3

  • Shared Memory Double Buffer

  • Cache Controller and L1 Cache

  • Processing

  • To PC: trackers’ data, via Packet DMA


Tracking System: Shared Memory

[Figure: shared memory diagram. Elements shown:]

  • Core 0 (processing) sends a message to Core 1 (processing) to notify it that the foreground image is ready.

  • Shared memory contents: queue, background model (mean, var), list of trackers, image at T, image at T-1, binary foreground images.

  • DDR3, EDMA3 and Packet DMA (to Ethernet).


Tracking System: Pipeline Image Processing

Each block is event driven. That is, it wakes up only when a specific event happens.

  • Core 0: Statistical Background Subtraction, woken by the EDMA interrupt (semaphore sync).

  • Core 1: Connected Components and Tracking, woken by a multicore message (Multicore Messaging Service).
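The event-driven, two-stage structure can be mimicked on a PC with two threads. The sketch below is an analogy only: Python's `threading.Semaphore` stands in for the semaphore posted from the EDMA interrupt, and `queue.Queue` stands in for the multicore messaging service. The function `run_pipeline` and its arguments `subtract` and `label` are hypothetical names; none of this is SYS/BIOS or IPC code.

```python
import queue
import threading
from collections import deque


def run_pipeline(frames, subtract, label):
    """Run frames through two event-driven stages and return the results.

    subtract(frame) plays the role of the Core 0 stage and label(mask)
    the role of the Core 1 stage; both are caller-supplied callables.
    """
    frame_ready = threading.Semaphore(0)  # stand-in for the EDMA-interrupt semaphore
    inbox = deque()                       # stand-in for the input frame buffer
    messages = queue.Queue()              # stand-in for the multicore message queue
    results = []

    def core0():
        for _ in range(len(frames)):
            frame_ready.acquire()         # sleep until a frame has arrived
            messages.put(subtract(inbox.popleft()))
        messages.put(None)                # sentinel: no more frames

    def core1():
        while True:
            mask = messages.get()         # sleep until a message arrives
            if mask is None:
                break
            results.append(label(mask))

    workers = [threading.Thread(target=core0), threading.Thread(target=core1)]
    for worker in workers:
        worker.start()
    for frame in frames:
        inbox.append(frame)               # "transfer complete"
        frame_ready.release()             # "interrupt" wakes the first stage
    for worker in workers:
        worker.join()
    return results
```

Each stage blocks until its own event fires, so while the second stage works on one frame the first stage is free to start on the next.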




Performance Analysis

  • Ethernet (PC-DSP): 10 MB/s > 34 FPS.

  • DDR3 memory throughput: 10666 MB/s via EDMA3.

  • SRAM (local/shared) memory throughput: 16000 MB/s, direct access.


Performance Analysis

  • In conclusion, the system can process frame sizes of 120x160 or 240x320 at up to 30 FPS.

    • Shared SRAM size: L2 double buffering requires 2 bytes/pixel. The Gaussian model (for background subtraction) requires 4 bytes/pixel. Largest frame size possible: 240x320.

    • Webcam: up to 30 FPS. Frame size 120x160, 240x320 or 480x640.

  • By processing only a part of the image at a time, the size of the double buffer can be significantly reduced, allowing a larger frame size.
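The arithmetic behind these limits can be checked with a short script. The per-pixel costs (2 bytes for the double buffer, 4 bytes for the Gaussian model) come from the slide. Two things are assumptions made for this sketch and are not stated on the slide: the memory budget is taken to be the 512 KB L2 SRAM listed for the C66 CorePac, and frames on the Ethernet link are taken to be 8-bit grayscale with 1 MB = 2^20 bytes.

```python
DOUBLE_BUFFER_BYTES_PER_PIXEL = 2   # from the slide
GAUSSIAN_MODEL_BYTES_PER_PIXEL = 4  # from the slide

L2_BUDGET_BYTES = 512 * 1024        # assumption: 512 KB L2 SRAM is the limit
ETHERNET_BYTES_PER_SECOND = 10 * 2**20  # assumption: 10 MB/s with MB = 2**20


def working_set_bytes(rows, cols):
    """Bytes needed for the double buffer plus the background model."""
    per_pixel = DOUBLE_BUFFER_BYTES_PER_PIXEL + GAUSSIAN_MODEL_BYTES_PER_PIXEL
    return rows * cols * per_pixel


def fits(rows, cols, budget_bytes=L2_BUDGET_BYTES):
    """True if the whole-frame working set fits in the given budget."""
    return working_set_bytes(rows, cols) <= budget_bytes


def max_fps(rows, cols, link_bytes_per_second=ETHERNET_BYTES_PER_SECOND,
            bytes_per_pixel=1):
    """Upper bound on frame rate imposed by the link alone."""
    return link_bytes_per_second / (rows * cols * bytes_per_pixel)


if __name__ == "__main__":
    for rows, cols in [(120, 160), (240, 320), (480, 640)]:
        print(rows, cols, working_set_bytes(rows, cols),
              fits(rows, cols), round(max_fps(rows, cols), 1))
```

Under these assumptions a 240x320 frame needs 460,800 bytes (450 KB) and fits, while 480x640 needs 1,843,200 bytes (1800 KB) and does not, which agrees with the largest frame size quoted above. The same assumptions put the Ethernet bound for 480x640 frames at just over 34 FPS.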


Performance Analysis: Reduced Memory Analysis

  • Shared memory contents: queue, background model (mean: short, var: short), list of connected components, image at T (char), image at T-1 (char), binary foreground images.

  • Memory can be significantly reduced by processing a part of the frame at a time.

  • The implementation of the Connected Components algorithm will be more complicated.
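A minimal sketch of the strip-at-a-time idea follows. It splits a frame into horizontal strips, runs a per-pixel operation on each strip, and shows how the double buffer shrinks with the strip height. The 2 bytes/pixel figure is the one quoted for double buffering earlier; the function names, the strip height and the simple threshold used as the per-pixel operation are all hypothetical stand-ins chosen for illustration.

```python
def strips(rows, strip_rows):
    """Yield (start, stop) row ranges that cover a frame of `rows` rows."""
    for start in range(0, rows, strip_rows):
        yield start, min(start + strip_rows, rows)


def double_buffer_bytes(strip_rows, cols, bytes_per_pixel=2):
    """Double-buffer size when only one strip is resident at a time."""
    return strip_rows * cols * bytes_per_pixel


def threshold_rows(rows_of_pixels, level):
    """Stand-in per-pixel operation: 1 where the pixel exceeds `level`."""
    return [[1 if p > level else 0 for p in row] for row in rows_of_pixels]


def process_by_strips(frame, strip_rows, level):
    """Apply the per-pixel operation strip by strip and stitch the result."""
    out = []
    for start, stop in strips(len(frame), strip_rows):
        out.extend(threshold_rows(frame[start:stop], level))
    return out
```

For a purely per-pixel operation the stitched result is identical to processing the whole frame at once. Connected components is different: a component can straddle a strip boundary, so labels from neighboring strips would have to be merged, which is one way to read the remark that its implementation becomes more complicated.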


VLIB’s Performance

  • VLIB is optimized for the TI C64x+ DSP.

  • As part of the performance analysis, VLIB’s performance on the C66 core and on the C64x+ core was compared:

* Since Connected Components requires a lot of memory, the image was located in L1 but the Connected Components buffer was located in L2 memory.




Encountered Difficulties

Documentation:

  • Incomplete platform documentation.

  • Lack of documentation for the MCSDK examples.

  • Had to learn by trial and error.

  • Posted questions on TI’s E2E forums.


Encountered Difficulties

Software Bugs:

  • Software bugs in the development environment.

  • Software bugs in the MCSDK examples.

  • Repeatedly updated software versions.

  • Some bugs are still unsolved.

    • Unable to receive large UDP transmissions.


TI Support Forums




Possible Improvements


The Project as a Framework


Thank You for Listening