by nikhil suryanarayanan n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Efficient Architectures for Eigen Value Decomposition PowerPoint Presentation
Download Presentation
Efficient Architectures for Eigen Value Decomposition

Loading in 2 Seconds...

play fullscreen
1 / 48

Efficient Architectures for Eigen Value Decomposition - PowerPoint PPT Presentation


  • 348 Views
  • Uploaded on

By NIKHIL SURYANARAYANAN. Efficient Architectures for Eigen Value Decomposition. Outline. Motivation Eigen Value Decomposition & Applications Exact Jacobi Parallel Decomposition using Systolic Array Optimization of Systolic Array Interconnect optimized Systolic Array Conclusion.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Efficient Architectures for Eigen Value Decomposition' - Gabriel


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
outline
Outline
  • Motivation
  • Eigen Value Decomposition & Applications
  • Exact Jacobi
  • Parallel Decomposition using Systolic Array
  • Optimization of Systolic Array
  • Interconnect optimized Systolic Array
  • Conclusion
motivation
Motivation
  • Required in various fields
  • High Performance and Real time applications demands Hardware implementation
  • SDMA Communication
  • Realizing optimal architectures with respect to speed and power for respective applications
eigen value decomposition
Eigen Value Decomposition
  • Angle of Arrival Estimation
  • Face Detection
  • Image Compression
  • Eigen Beam-forming
  • Signal Subspace Estimation
  • PCA
  • MUSIC & ESPRIT
evd methods
EVD Methods
  • Exact Jacobi
  • Systolic Array
  • Approximate Jacobi
  • Algebraic Method (only for 3x3 matrix)
eigen value decomposition evd
Eigen Value Decomposition (EVD)
  • Special case of Singular Value Decomposition (SVD) where the Matrix is Square-Symmetric
  • Consider a Matrix AЄRmxn

SVD: A = UDVT

EVD: A = UDUT

where,

DЄRmxn is diagonal matrix,

UЄRmxn & VЄRmxn are orthogonal

cordic
CORDIC
  • COordinate Rotation DIgital Computer
  • Set of Shift Add Algorithms for computing Sine, Cosine, Arc, Hyperbolic, Coordinate Rotation etc
  • Eliminates complex computations
  • Single Shift-Add Multiplier, ROM/RAM for lookup & Basic Logic gates
  • Hardware friendly
  • Iterative Algorithm
cordic modules
CORDIC Modules

ArcTan Module

Used to compute the tan-1 / angle for constructing the Jacobi Rotation Matrix

Sine/Cosine Module

cos sin

-sin cos

2x2 matrix is constructed using the angle from the ArcTan module

exact jacobi
Exact Jacobi
  • Aims at annihilating the off diagonal elements using a series of orthogonal transformations
  • A(k+1) = JTpq A(k) Jpq,

where A(0)=A

Jpq is called the Jacobi Rotation Defined by the parameter (c s, -s c)

exact jacobi1
Exact Jacobi
  • A=UDVT
  • UTAV=D After n iterations,
  • Ai+1=JiTAiJi
  • Repeating for all possible pairs, A can be effectively diagonalized

1......0......0......0

. . . .

0......c......s......0 p

. . . .

0......-s......c......0 q

. . . .

0......0......0......1

pq

limitations of exact jacobi implementation
Limitations of Exact Jacobi Implementation
  • Jacobi iterations are serial
  • Inability to derive parallelism as iterations have large inter-loop Data Dependency
  • Inability to pipeline
  • Every iteration involves transfer of 4N-4 matrix elements to the processor
  • Even though it is “MATRIX” operation, parallelism cannot be derived
how to parallelize
How to parallelize?
  • Systolic Array
  • Solve 2x2 EVD sub problems
  • For a matrix of size N we have N/2xN/2 EVD sub problems
  • If N=6; possible sets are

{ (1,2), (3,4) }

{ (1,3), (2,4) }

{ (1,4), (2,3) }

  • Parallel Reordering
systolic array for evd
Systolic Array for EVD

PE

PE

PE

PE

PE

PE

PE

PE

PE

structure of pe
Structure of PE

CORDIC

ATAN

CORDIC

ROT

REG

REG

data exchange sequence
Data Exchange Sequence

βin

αin

α

β

PEij

γ

δ

γin

δin

data exchange pe 11
Data Exchange PE11

βin

αin

α

β

γ

δ

γin

δin

data exchange pe 1j
Data Exchange PE1j

βin

αin

α

β

γ

δ

γin

δin

data exchange pe i1
Data Exchange PEi1

βin

αin

γ

δ

α

β

γin

δin

slide20

Data Exchange PEij

βin

αin

γ

δ

β

α

γin

δin

array cycle 12
Array Cycle ∆ = 1

DATA EXCHANGE

1

1

1

1

1

1

1

1

1

1

1

array cycle 13
Array Cycle ∆ = 1

DATA EXCHANGE

1

2

1

1

2

1

1

2

1

1

2

1

1

2

array cycle 14
Array Cycle ∆ = 1

DATA EXCHANGE

1

2

1

2

1

2

2

1

2

2

1

2

1

2

1

array cycle 15
Array Cycle ∆ = 1

DATA EXCHANGE

1

2

1

2

2

1

2

2

1

2

1

array cycle 16
Array Cycle ∆ = 1

DATA EXCHANGE

1

3

2

1

3

2

1

3

2

1

3

2

1

3

array cycle 17
Array Cycle ∆ = 1

DATA EXCHANGE

1

3

2

3

1

3

3

1

3

3

1

3

2

3

1

array cycle 18
Array Cycle ∆ = 1

DATA EXCHANGE

1

3

1

3

3

1

3

3

1

3

1

staggered processing
Staggered Processing?
  • Not realistic to broadcast row and column angles in real time
  • ∆ij is the distance of the processor Pij from the diagonal
  • Also Pij needs data from neighbors Pi+-1,j+-1 (1< i, j < n/2)
  • Can be made faster by allowing off-diagonal PE to allow execution as soon as the diagonal PE produce angles
optimizations
Optimizations

CYCLE 2

1’

1

CYCLE 1

1

1’

1

1

1’

1

1

1’

1

1

1’

Improves the utilization time for each PE from 1/3 rd to 2/3 rd

comparisons matrix 8x8
Comparisons…. Matrix 8x8
  • Iterations for Convergence ≈ 3
  • Additions ≈ 3500
  • Multiplications ≈ 7000
  • Swaps/Exchange ≈ 0
  • Slower
  • Iterations for Convergence ≈ 22-25
  • Additions ≈ 1500 (less than half)
  • Multiplications ≈ 3000
  • Swaps/Exchange = 368
  • Faster

EXACT JACOBI

SYSTOLIC ARRAY

optimized architecture
Optimized Architecture
  • In the final Stages of Analyzing a simpler Systolic Architecture

Matrix size=4x4

Φ1

Φ2

PE

PE

PE

PE

goals
GOALS
  • Achieved:
  • Pipelined Jacobi Architecture
  • S/W Implementation of Systolic Array
  • Simultaneous execution of off diagonal PE to improve timing and reduce idle time
  • Optimized Systolic Array architecture for minimum swaps and angle transmission
references
References
  • Andraka, Ray, “Survey of CORDIC algorithms for FPGA based computers”, ACM 1998
  • A Novel Implementation of CORDIC Algorithm Using Backward Angle Recoding (BAR), Yu Hen Hu & Homer H.M. Chern, IEEE Transactions on Computers, December 1996
  • Parallel Eigen Value Decomposition for Toeplitz and Related Matrices, Yu Hen Hu, IEEE Transactions-1989
  • Kim Y, Kim Y, Doyle James, “A Low Power CMOS CORDIC Processor Design for Wireless Telecommunications”, IEEE 2007
  • Hemkumar N, Masters Thesis, Rice University
  • Yang Liu et al, “Hardware Efficient Architectures for Eigen Value Computation;, EDA 2006
  • ASIC Implementation of Autocorrelation and CORDIC algorithm for OFDM based WLAN, Sudhakar Reddy & Ramchandra Reddy, European Journal of Scientific Research, 2009
  • Advanced Algorithmic Evaluation for Imaging, Communication and Audio Applications – Eigenvalue Decomposition using CATAPULT C Algorithmic Synthesis Methodology
  • Efficient Implementation of SVD on a Reconfigurable System, Christophe Bobda, Klaus Danne and Andre Linarth, Springer-Verlag Berlin Heidelberg 2003
  • Hardware Implementation of Smart Antenna Systems, H. Wang and M. Glesner, Adv in Radio Sciences 2006
  • Spectral Estimation using MUSIC Algorithm, Jawed Qumar, Nios II Embedded Processor Design Contest-2005
  • Hardware Efficient Architectures for Eigen Value Computation, Yang Liu, Christis-Savvas Bouganis, Peter Y.K. Cheung, Philip H.W. Leong, Stephen J. Motley, EDAA 2006
  • A Novel Fast Eigenvalue Decomposition based on Cyclic Jacobi Rotation and its application in eigen-beamforming, Tech Report of IEICE-Japan
  • Efficient Hardware Architectures for Eigenvector and Signal Subspace Estimation, Fan Xu & Alan Wilson, IEEE Transactions on Circuits & Systems-204
  • 16 BIT CORDIC Rotator for High Sped Wireless LAN, Koushik Maharatna, Alfonso Troya, Swapna Banerjee, Eckhard Grass, Milos Krstic, IEEE Transactions-2004
  • Survey of CORDIC Algorithms for FPGA Based computers, Ray Andraka, ACM-1998
  • Smart Antennas for Wireless Communications, Frank B Gross, Mc-Graw Hill,2005 ( Used for Facts & References for Comparison purposes and Specifications of Different wireless standards)
eigen value and eigen vector
Eigen Value and Eigen Vector
  • The non zero vector of any linear transformation when applied to the vector changes the magnitude but not the direction is an Eigen Vector
  • The scalar value associated with this vector is called the Eigen Value
  • Ax=λx
  • A is the transformation, x is the Eigen vector & λ is the corresponding Eigen Value
cordic contd
CORDIC… contd
  • Convergence depends on number of iterations
  • Unrolled for Systolic and Pipeline implementations
  • Iterative architecture unsuitable for FPGA
  • Pipelined preferred as less complex H/W & operates at data rate
  • Registers present on Logic cells in FPGAs support pipelining better
  • Runs at 52 MHz on XC4013E-2 [1]
cordic iteration equations
CORDIC Iteration Equations
  • Given’s rotation transformation

x = x cosΦ – y sinΦ

y = y cosΦ + x sinΦ

The iteration equations are given as

xi+1 = xi – yi . di . 2-i

yi+1 = yi + xi. di. 2-i

zi+1 = zi – di . tan-1(2-i)

cordic algorithms
CORDIC Algorithms

for i=1:n

x1= x – y * d * (2^-(i-1)) ;

y1= y + x * d * (2^-(i-1)) ;

angle = angle – d * (W(i)); // W(i) is the ith ROM reference

if (angle==0)

d=0;

elseif (angle>0)

d=1;

else

d=-1;

end

x=x1;

y=y1;

end

exact jacobi algorithm
Exact Jacobi Algorithm

for j=1:n-1

for i=n:-1:j+1

J = jacobi_rot( A( i, i), A( j, j), A( i, j) );

A( [ i, j], :)=J'*A( [ i, j], : );

A( : , [ i, j])=A( :, [ i, j] )*J;

end

end

REPEAT for n iterations for accuracy

evd contd
EVD …contd
  • Also UTU=I and VTV=I
  • The Diagonal matrix contains Eigen Values along its diagonals
  • U are the left Singular Vectors & V are right singular vectors
  • U = {u1,u2,………,un}
  • V = {v1,v2,………,vn}