An FPGA Co-Processor for Statistical Pattern Recognition Applications

Jason Isaacs and Simon Y. Foo

Machine Intelligence Laboratory

FAMU-FSU College of Engineering

Department of Electrical and Computer Engineering

Project Goal
  • To develop and implement a real-time image content analysis system using an FPGA co-processor.

Isaacs 248

Outline
  • Pattern Recognition
  • Image Database
  • System Layout
  • Image Content Analysis
  • Hardware Implementation
  • Conclusions
  • Future Work


Pattern Recognition Overview
  • Pattern Recognition: “the act of taking raw data and taking an action based on the category of the pattern.”
  • Common Applications: speech recognition, fingerprint identification (biometrics), DNA sequence identification
  • Related Terminology:
    • Machine Learning: The ability of a machine to improve its performance based on previous results.
    • Machine Understanding: acting on the intentions of the user generating the data.
  • Related Fields: artificial intelligence, signal processing and discipline-specific research (e.g., target recognition, speech recognition, natural language processing).


Design Flow

[Flow diagram: Start → Collect Data → Choose Features → Choose Model → Train Classifier → Evaluate Classifier → End]
  • Key issues:
    • "There is no data like more data."
    • Are the features perceptually meaningful?
    • How do we find the best model?
    • How do we estimate parameters?
    • How do we evaluate performance?


Common Misconceptions
  • "I got 100% accuracy on..."
    • Almost any algorithm works some of the time, but few real-world problems have ever been completely solved.
  • Training on the evaluation data is forbidden.
    • Once you use evaluation data, you should discard it.
  • "My algorithm is better because..."
    • Statistical significance and experimental design play a big role in determining the validity of a result.
    • There is always some probability that a random choice of algorithm will produce a better result.

System Layout

[System diagram: View Source (<…jpg>), URL → Gigabit Ethernet → Spider → Dual P4 - XP → Analyze and Classify → 32/64-bit PCI → Store Original Image and Class Vector]

Classification System

[Block diagram: a Spider (webbot) takes a URL from a URL list, downloads HTML from the WEB, and performs a text search for hyperlinks; text content, images, video, and audio are each routed to their own classifier (text, image, video, audio), producing a URL feature vector. Current research is focused on the image path (shown in red in the original figure).]

Image Database: Web-Mining for Images
  • Images are an important class of data.
  • The Web is presently regarded as the largest global multimedia data repository, encompassing different types of images in addition to other multimedia data types.
  • To search the web for images, a crawler (also called a spider, mobile agent, or bot) is utilized.
  • src="home_page/images/rover_spin.jpg" alt="&quot;
  • width="124" height="70"></a><a
  • href="images/home_page/pgt_in_use.jpg"><img src="images/home_page/pgt_in_use_small.jpg"
  • The agent searches HTML documents for strings of type jpg, gif, and tif, and stores the image and URL.
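The string search the agent performs can be sketched in Python. This is an illustrative sketch only, not the actual agent; `find_image_urls` and the regular expression are assumptions:

```python
import re

# Match src/href attributes ending in the image extensions the spider looks for.
IMG_PATTERN = re.compile(r'(?:src|href)="([^"]+\.(?:jpg|gif|tif))"', re.IGNORECASE)

def find_image_urls(html):
    """Return the image paths referenced by an HTML document."""
    return IMG_PATTERN.findall(html)

html = '<img src="images/index_01.jpg"><a href="images/home_page/pgt_in_use.jpg">'
print(find_image_urls(html))  # ['images/index_01.jpg', 'images/home_page/pgt_in_use.jpg']
```

A real crawler would also resolve relative paths against the page URL before storing each image.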

Web Mining Example: Software Process

[root@Nebula getURL]# ./getImages
Enter URL: eng.fsu.edu
./getURL http://www.eng.fsu.edu > out.txt
images/index_01.jpg
images/index_02_new_2.jpg
images/index_03.jpg
images/index_04.jpg
images/index_05.jpg
images/index_06.jpg
images/index_07.jpg
images/index_08_new.jpg
images/index_01.jpg length: 19
./getURL http://www.eng.fsu.edu/images/index_01.jpg > images/engA.jpg
images/index_02_new_2.jpg length: 25
./getURL http://www.eng.fsu.edu/images/index_02_new_2.jpg > images/engB.jpg
images/index_03.jpg length: 19
./getURL http://www.eng.fsu.edu/images/index_03.jpg > images/engC.jpg
images/index_04.jpg length: 19
./getURL http://www.eng.fsu.edu/images/index_04.jpg > images/engD.jpg
images/index_05.jpg length: 19
./getURL http://www.eng.fsu.edu/images/index_05.jpg > images/engE.jpg
images/index_06.jpg length: 19
./getURL http://www.eng.fsu.edu/images/index_06.jpg > images/engF.jpg
images/index_07.jpg length: 19
./getURL http://www.eng.fsu.edu/images/index_07.jpg > images/engG.jpg
images/index_08_new.jpg length: 23
./getURL http://www.eng.fsu.edu/images/index_08_new.jpg > images/engH.jpg

Web Mining Example Images
  • Example results from our "getImages" software are shown to the right.
  • These are from the news.bbc.co.uk website (more interesting than the ones from our engineering site).
  • This can prove useful when looking for faces or particular objects, such as the space shuttle.
  • We are able to search a particular group of sites, randomly search all known sites (not limited to the US or Western Europe), or search all pages within a certain domain, say nytimes.com.

Example Image Objects
  • These are sample objects that could be the target objects of a specific search. These particular objects are from the COIL database.
  • They are used to train the analysis system.

Image Analysis

[Implementation model for image recognition: observed input (RGB image X) → PREPROCESSING → X* → FEATURE EXTRACTION → Y → PATTERN RECOGNITION, matched against stored patterns Q → W* → MATCHED VECTOR → W → recognized image]

Feature extraction is the process of determining a vector Y that represents an observed input X and that enables accurate implementation of pattern recognition schemes. For this process, a mapping takes place such that X* is mapped to a vector Y.

5x5 Scaled Spatial Filters Used for Feature Extraction

% Gabor Filter 1
gabor1 = [-16 -19 -20 -19 -16; ...
          -36 -43 -46 -43 -36; ...
            0   0   0   0   0; ...
           36  43  46  43  36; ...
           16  19  20  19  16];
gaborDiv = 1/1000;
mask = zeros(5,5,1);
mask(:,:,1) = gabor1;
maskDiv = [gaborDiv];
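Applying this kernel can be sketched in plain Python as a cross-check of the filter definition; `filter2d` is a hypothetical helper (the MATLAB code above is the actual design source):

```python
def filter2d(image, kernel, div):
    """'Valid' 2-D correlation of image with kernel, scaled by div."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            acc = 0
            for u in range(kh):
                for v in range(kw):
                    acc += kernel[u][v] * image[i + u][j + v]
            row.append(acc * div)
        out.append(row)
    return out

gabor1 = [[-16, -19, -20, -19, -16],
          [-36, -43, -46, -43, -36],
          [  0,   0,   0,   0,   0],
          [ 36,  43,  46,  43,  36],
          [ 16,  19,  20,  19,  16]]

# The coefficients sum to zero, so a flat image gives zero response.
flat = [[7] * 6 for _ in range(6)]
print(filter2d(flat, gabor1, 1 / 1000))  # [[0.0, 0.0], [0.0, 0.0]]
```

The kernel is antisymmetric about its middle row, so it responds to horizontal edges and rejects constant regions.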

Wavelet Review

Wavelet Transform:

W(a, b) = ∫ f(t) · ψ((t − b) / a) dt

where ψ(t) is the "mother" wavelet.

[Figure: the wavelet shown at Scale1 and Scale2]

The Wavelet Transform has variable window lengths that allow it greater flexibility when analyzing signals. Therefore, it becomes an attractive tool for signal analysis.

Wavelet Review

Given a basis function ψ(t):
The dilation operation is indicated by ψa(t) = (1/√a) · ψ(t/a).
Then, a mother wavelet is defined by ψa,b(t) = (1/√a) · ψ((t − b)/a).

FIR Coefficients for Daubechies "7"

g(n): high-pass filter
h(n): low-pass filter

FIR Implementation
  • The approximation is down-sampled and input to the next level.
  • The detail is stored as coefficients.
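The two steps above can be sketched in Python. The Daubechies-7 coefficients are not reproduced in this transcript, so this sketch substitutes the simpler Haar filter pair; `dwt_level` is a hypothetical helper:

```python
import math

# Stand-in filter pair (Haar); the actual design uses Daubechies-7 taps.
h = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # h(n): low-pass
g = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # g(n): high-pass

def dwt_level(signal):
    """One decomposition level: filter, then keep every other sample."""
    def filt_down(taps):
        full = [sum(taps[k] * signal[n - k] for k in range(len(taps)))
                for n in range(len(taps) - 1, len(signal))]
        return full[::2]  # down-sample by 2
    return filt_down(h), filt_down(g)  # (approximation, detail)

approx, detail = dwt_level([4, 4, 2, 2, 6, 6, 8, 8])
print(approx)  # half-length smoothed approximation
print(detail)  # all zeros here: paired neighbors are equal
```

Feeding `approx` back into `dwt_level` gives the next level of the multi-level decomposition described above, while each `detail` vector is stored as coefficients.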

The Spectral Histogram Representation
  • Properties
    • A spectral histogram is translation invariant.
    • A spectral histogram is a nonlinear operator.
    • With sufficient filters, a spectral histogram can uniquely represent any image up to a translation.
    • All the images sharing a spectral histogram define an equivalence class.
  • Preprocessing steps in classification
    • Choose N image filter kernels to convolve with the image.
    • Perform the convolutions, generating N resultant responses.
    • For each response, generate a response-image histogram.
    • Concatenate the histograms and send the result to the classifier.
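The four preprocessing steps can be sketched in Python. This is illustrative only; the kernel choices, bin ranges, and helper names are assumptions, not the deployed filter bank:

```python
def convolve_valid(img, k):
    """Step 2: 'valid' correlation of img with a square kernel k."""
    s = len(k)
    return [[sum(k[u][v] * img[i + u][j + v] for u in range(s) for v in range(s))
             for j in range(len(img[0]) - s + 1)]
            for i in range(len(img) - s + 1)]

def histogram(resp, bins, lo, hi):
    """Step 3: fixed-range histogram of one filter response."""
    counts = [0] * bins
    w = (hi - lo) / bins
    for row in resp:
        for x in row:
            counts[min(bins - 1, max(0, int((x - lo) / w)))] += 1
    return counts

def spectral_histogram(img, filters, bins=4, lo=-255, hi=255):
    """Steps 1-4: convolve with each chosen filter, histogram each response,
    and concatenate the histograms into one feature vector."""
    feat = []
    for f in filters:
        feat += histogram(convolve_valid(img, f), bins, lo, hi)
    return feat

delta = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]    # intensity filter
grad_x = [[0, 0, 0], [-1, 0, 1], [0, 0, 0]]  # horizontal differencing filter
img = [[10, 20, 30, 40]] * 4
print(spectral_histogram(img, [delta, grad_x]))  # [0, 0, 4, 0, 0, 0, 4, 0]
```

Because only the histograms of the responses are kept, shifting the image leaves the feature vector unchanged, which is the translation-invariance property listed above.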

The Spectral Histogram Representation
  • 1st step: choose N image filter kernels to convolve with the image.
    • Filter kernels are chosen carefully from several image filter banks, including the intensity filter δ(x,y), differencing or gradient filters, and Laplacian of Gaussian filters, where t determines the scale of the filter, and finally the Gabor filter defined by sine and cosine components.
  • 2nd step: perform the convolutions, generating N resultant responses.
    • To calculate each response pixel value, roughly m x n multiplies and adds must be performed, where m x n is the dimension of the chosen kernel. Here m = n.
    • Thus, for an M x N image, a total of Σk M·N·nk² multiplies and adds must be performed, where nk is the kernel size of the kth filter.
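The operation count M·N·n² per filter can be checked numerically; `conv_op_count` is a hypothetical helper, with the sizes taken from the hardware discussion later in the talk:

```python
def conv_op_count(M, N, kernel_sizes):
    """Multiplies (= adds) for 'same'-size convolution of an M x N image
    with one square n x n kernel per entry of kernel_sizes."""
    return sum(M * N * n * n for n in kernel_sizes)

# 128 x 128 image, ten 5 x 5 filters.
per_filter = conv_op_count(128, 128, [5])
total = conv_op_count(128, 128, [5] * 10)
print(per_filter, total)  # 409600 4096000
```

So each 5x5 filter costs about 410k multiply-adds on a 128x128 image, and the full ten-filter bank about 4.1 million.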

Feature Vector
  • Our feature vector comprises the spectral histograms of the images resulting from filtering.
  • The feature vector is laid out as follows:

Gabor Features | Haar Features | LoG Features | Wavelet Features

Pattern Recognition: Neural Decision Tree
  • After the feature vectors have been created, they are sent back to the host PC and tested against a neural decision tree to determine the presence of selected objects or textures, e.g., faces, cars, or brick.

Artificial Neural Network Model

[Figure: feedforward neural network model with a feature-vector input layer x0 … x80, a hidden layer S0 … S7, and an output layer Y0 … Yk, where the number of outputs k equals the number of branches at node n; i, j, and k index the input, hidden, and output layers.]
  • Each node in the tree comprises an artificial neural network trained to separate the input into k classes. As the tree is traversed, the leaf nodes represent objects or textures of interest.
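A single tree node's forward pass can be sketched in Python, using the layer sizes from the figure (81 inputs, 8 hidden units); the weights, branch count k, and sigmoid activation are illustrative assumptions:

```python
import math, random

def forward(x, W1, b1, W2, b2):
    """One tree node: feature vector -> hidden layer -> k branch scores."""
    sig = lambda v: 1.0 / (1.0 + math.exp(-v))
    hidden = [sig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sig(sum(w * h for w, h in zip(row, hidden)) + b)
            for row, b in zip(W2, b2)]

random.seed(0)
n_in, n_hid, k = 81, 8, 3          # input/hidden sizes from the figure; k is illustrative
W1 = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hid)]
b1 = [0.0] * n_hid
W2 = [[random.uniform(-0.1, 0.1) for _ in range(n_hid)] for _ in range(k)]
b2 = [0.0] * k
scores = forward([0.5] * n_in, W1, b1, W2, b2)
print(scores)  # k scores in (0, 1); the largest selects the branch to descend
```

Descending from the root by repeatedly taking the highest-scoring branch eventually reaches a leaf, which names the object or texture of interest.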


Other Pattern Recognition Techniques
  • Density Estimation
    • Histogram approach
    • Parzen-window method
    • kn-Nearest-Neighbor estimation
  • Principal Components Analysis
  • Fisher Linear Discriminant
  • Multiple Discriminant Analysis (MDA)

Our future work aims at creating a library of generic modules implementing all of these discrimination techniques. These modules were planned for completion prior to this submission but have been delayed.

Summary of These Techniques
  • kn-Nearest-Neighbor Estimation
    • To estimate p(x) from n training samples, we center a cell about x and let it grow until it captures kn samples, where kn is some specified function of n.
    • These samples are the kn nearest neighbors of x.
    • If the density is high near x, the cell will be relatively small, giving good resolution.
  • Component Analysis and Discriminants
    • How do we reduce excessive dimensionality? Answer: combine features.
    • Linear methods project high-dimensional data onto a lower-dimensional space.
    • Principal Components Analysis (PCA) seeks the projection that best represents the data in a least-squares sense.
    • Fisher Linear Discriminant seeks the projection that best separates the data in a least-squares sense.
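The kn-nearest-neighbor estimate above can be sketched in one dimension, where the "volume" of the cell is just an interval length; `knn_density` and the sample data are illustrative assumptions:

```python
def knn_density(x, samples, kn):
    """p(x) ~ kn / (n * V), where V is the smallest interval centered at x
    that captures the kn nearest samples (1-D case)."""
    n = len(samples)
    dists = sorted(abs(s - x) for s in samples)
    radius = dists[kn - 1]   # grow the cell until it holds kn samples
    volume = 2 * radius      # 1-D "volume" is the interval length
    return kn / (n * volume)

samples = [0.1, 0.2, 0.25, 0.3, 0.9, 1.5, 2.0, 3.0]
# Where samples crowd together the cell stays small, so the estimate is
# higher there -- the "good resolution" property noted above.
print(knn_density(0.25, samples, 3) > knn_density(2.0, samples, 3))  # True
```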

Summary of These Techniques, Continued
  • Generalized Linear Discriminant Functions
    • The linear discriminant function g(x) can be written as g(x) = w0 + Σi wi·xi.
    • By adding d(d+1)/2 additional terms involving the products of pairs of components of x, we obtain the quadratic discriminant function g(x) = w0 + Σi wi·xi + Σi Σj wij·xi·xj.
    • The separating surface defined by g(x) = 0 is a second-degree or hyperquadric surface.
    • By continuing to add terms such as wijk·xi·xj·xk, we can obtain the class of polynomial discriminant functions.
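The step from linear to quadratic discriminants can be illustrated by explicitly forming the d(d+1)/2 product terms, so that a linear discriminant over the augmented vector is quadratic over the original x; `quadratic_features` is a hypothetical name:

```python
def quadratic_features(x):
    """Augment x with the d(d+1)/2 products x_i * x_j (i <= j)."""
    d = len(x)
    return list(x) + [x[i] * x[j] for i in range(d) for j in range(i, d)]

x = [2.0, 3.0]
print(quadratic_features(x))  # [2.0, 3.0, 4.0, 6.0, 9.0] -> d + d(d+1)/2 terms
```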

So, Why Move to Hardware?
  • Speed of classification is limited in software, and with such a large database (the Web), the faster the better.
  • For example, given a 128x128 8-bit grayscale image, generating the spectral histogram with ten 5x5 filters requires roughly 410k multiplies and 410k adds per filter, about 4.1 million of each in total.
  • This is the main computational bottleneck.
  • A general-purpose microprocessor can only perform one or two multiply/adds simultaneously (depending on the processor).
  • Some FPGAs allow up to 88 simultaneous multiply operations and many adds to be performed in one or two clock cycles.
  • The filtering algorithm is inherently parallelizable and therefore well suited to a pipelined hardware implementation.

Target Hardware: Avnet's Virtex-II Pro Board
  • Uses the Virtex-II Pro XC2VP20.
  • Many options for I/O.
  • The 32-bit PCI bus has a data throughput of over 100 MB per second.

Hardware vs. Software Tradeoffs
  • Not all tasks see such a drastic speedup in hardware.
    • Memory accesses
      • Only one address per clock cycle can be read in SDRAM, Flash, or SRAM.
      • We require more than 32 bits per action, so we waste time reading data.
      • It is possible to store more data in BRAM to create an initial data stack that would overcome future read times.
  • Combine hardware and software for optimal ease of design and speed of execution.
    • We need to determine the optimal compromise.

11x11 Filter Model Top Level

This four-filter 11x11 bank was our first test design. We felt that an 11x11 kernel would allow the best representation of our filter-bank set.

Filter Model: Filter MAC System

An addressable shift register (ASR) implements the input delay buffer. The address port runs n times faster than the data port, where n is the number of filter taps. The filter coefficients are stored in a ROM configured to use block memory.

A down sampler reduces the capture register sample period to the output sample period. The block is configured with latency to obtain the most efficient hardware implementation. The down sampling rate is equal to the coefficient array length.

A comparator generates the reset and enable pulse for the accumulator and capture register. The pulse is asserted when the address is 0 and is delayed to account for pipeline stages.
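The time-multiplexed MAC described above can be modeled in software. This is a behavioral sketch only, not the System Generator design; `mac_filter` and the data are assumptions:

```python
def mac_filter(samples, coeffs):
    """Serial MAC model: for each output, the ASR address sweeps the n taps
    n times faster than the data rate; a comparator resets the accumulator
    at address 0, and the capture register latches the finished sum."""
    n = len(coeffs)
    out = []
    for t in range(n - 1, len(samples)):
        acc = 0                            # comparator pulse: reset accumulator
        for addr in range(n):              # address port: n cycles per output sample
            acc += coeffs[addr] * samples[t - addr]
        out.append(acc)                    # capture register, down-sampled by n in time
    return out

print(mac_filter([1, 2, 3, 4, 5], [1, 1, 1]))  # [6, 9, 12]
```

In the hardware, the down-sampler recovers one output per n clocks, which is why the down-sampling rate equals the coefficient array length.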


Device Utilization Summary: Four 11x11 Image Filters

Selected Device: 2vp20ff896-6
  Number of Slices:            7913 out of  9280    85%
  Number of Slice Flip Flops: 10644 out of 18560    57%
  Number of 4 input LUTs:      8770 out of 18560    47%
  Number of bonded IOBs:         67 out of   556    12%
  Number of GCLKs:                1 out of    16     6%

TIMING REPORT
Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
clk                                | BUFGP                  | 15322 |
-----------------------------------+------------------------+-------+
Timing Summary (Speed Grade: -6):
  Minimum period: 4.542ns (Maximum Frequency: 220.192MHz)
  Minimum input arrival time before clock: 3.006ns
  Maximum output required time after clock: 3.615ns
  Maximum combinational path delay: No path found

The four-filter 11x11 bank's device utilization left little room for other logic on our target device. Since we felt that an 11x11 kernel would allow the best representation of our filter-bank set, we decided to target additional devices to leave our options open.

Device Utilization Summary: Six 11x11 Image Filters with New Target

Selected Device: 4vsx55ff1148-11
  Number of Slices:            9543 out of 24576    38%
  Number of Slice Flip Flops: 11616 out of 49152    23%
  Number of 4 input LUTs:      9816 out of 49152    19%
  Number of bonded IOBs:         99 out of   642    15%
  Number of GCLKs:                1 out of    32     3%
  Number of DSP48s:              66 out of   512    12%

TIMING REPORT
Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
clk                                | BUFGP                  | 18732 |
-----------------------------------+------------------------+-------+
Timing Summary (Speed Grade: -11):
  Minimum period: 6.632ns (Maximum Frequency: 150.790MHz)
  Minimum input arrival time before clock: 3.217ns
  Maximum output required time after clock: 3.546ns
  Maximum combinational path delay: No path found

The six-filter 11x11 bank left more room for other logic on the new target device. However, we did not possess this device and therefore had to consider our in-house options; thus, we moved toward a more V2P20-friendly design.

Device Utilization Summary: 5x5 with 10 Histograms

Selected Device: 2vp20ff896-6
  Number of Slices:            8775 out of  9280    94%
  Number of Slice Flip Flops: 10768 out of 18560    58%
  Number of 4 input LUTs:     10274 out of 18560    55%
  Number of bonded IOBs:        343 out of   556    61%
  Number of MULT18X18s:          50 out of    88    56%
  Number of GCLKs:                1 out of    16     6%

TIMING REPORT
Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal                       | Clock buffer(FF name)  | Load  |
-----------------------------------+------------------------+-------+
clk                                | BUFGP                  | 16755 |
-----------------------------------+------------------------+-------+
Timing Summary (Speed Grade: -6):
  Minimum period: 4.758ns (Maximum Frequency: 210.172MHz)
  Minimum input arrival time before clock: 2.987ns
  Maximum output required time after clock: 6.322ns
  Maximum combinational path delay: No path found

Note that a pipelined implementation without explicit use of the embedded multipliers exceeds the available slices, at 108% utilization.

Mcode Block for Histogram Bin-Sorter

function [bin10,bin9,bin8,bin7,bin6,bin5,bin4,bin3,bin2,bin1] = xhist(input1)
bin10 = 0; bin9 = 0; bin8 = 0; bin7 = 0; bin6 = 0;
bin5 = 0; bin4 = 0; bin3 = 0; bin2 = 0; bin1 = 0;
if input1 >= 224; bin10 = 1;
elseif input1 >= 180; bin9 = 1;
elseif input1 >= 158; bin8 = 1;
elseif input1 >= 136; bin7 = 1;
elseif input1 >= 114; bin6 = 1;
elseif input1 >= 92; bin5 = 1;
elseif input1 >= 70; bin4 = 1;
elseif input1 >= 48; bin3 = 1;
elseif input1 >= 26; bin2 = 1;
else bin1 = 1; end;
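The MCode thresholds can be cross-checked with a small software model; `xhist_bin` is a hypothetical helper for verification only, not part of the hardware:

```python
# Bin thresholds from the MCode block, highest first: value >= threshold -> bin.
THRESHOLDS = [(224, 10), (180, 9), (158, 8), (136, 7), (114, 6),
              (92, 5), (70, 4), (48, 3), (26, 2)]

def xhist_bin(value):
    """Return the 1-based bin index the MCode sorter would assert."""
    for threshold, bin_index in THRESHOLDS:
        if value >= threshold:
            return bin_index
    return 1  # the 'else' branch: everything below 26

print([xhist_bin(v) for v in (0, 26, 100, 224, 255)])  # [1, 2, 5, 10, 10]
```

Exactly one bin output is asserted per input, so accumulating these one-hot outputs over a filter response produces its histogram.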

ModelSim Waveform Snapshot

Histogram results for Gabor Filter 2, with the bin ranges shown on the previous slide. Note that there is a 16-clock-cycle delay before the bin-sort result is posted.

Conclusions / Future Work
  • In addition to the other pattern recognition techniques mentioned above, we intend to optimize the PC/FPGA interfacing to create our own low-cost integrated system.
    • Our problems currently reside in the PCI interface design shipped with the Avnet development board. We are working hard to resolve this issue, but in the end we may have to consider another board.
  • We also wish to time the results (how many images can we process per second): is it real-time?
  • We may move to a board with better interfacing tools, as well as faster interfacing via PCI-X, PCI Express, or DMA capabilities.
  • Finally, we will optimize the computational efficiency of the image analysis algorithm, i.e., consider a multi-stage pipeline with more efficient memory-access algorithms.
  • The ultimate goal is to perform real-time search and recognition utilizing FPGAs as co-processors.