
Technion - Israel Institute of Technology

Department of Electrical Engineering

High Speed Digital Systems Laboratory

Sub-Nyquist Sampling DSP & Support Change Detector: Final Presentation

Performed by: Omer Kiselov, Daniel Primor

Supervised by: Moshe Mishali, Inna Rivkin



The Whole System Overview

[Block diagram. Blocks: CTF (support recovery), Analog back-end (real time), DSP (baseband), Expand 1:q, Memory, Detector]



The Main Objective

[Block diagram. Blocks: Support generation, DSP (baseband), FIFO for delay]



The Block Interface

DSP & Support Change Detector block signals:

  • A matrix vector: 432 bits

  • Reconstructed samples: 432 bits

  • Support analysis vector: 101 bits

  • Valid samples: 1 bit

  • Support changed: 1 bit

  • First beta (for QR decomposition): 36 bits

  • A matrix address: 9 bits

  • Samples bundle: 432 bits

  • Valid supports: 1 bit



The Complex Numbers

  • To avoid complex multiplications entirely, we changed the structure of the matrix so that it holds only real numbers.

  • The resulting matrix is 4 times bigger. Every complex vector multiplication can then be carried out as an ordinary real vector multiplication and still give the correct result.
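One common way to obtain this effect, and one that is consistent with the factor of 4, is to replace a complex n×m matrix by the real 2n×2m block matrix [[Re A, -Im A], [Im A, Re A]] and to stack the real and imaginary parts of the vector. The slides do not give the exact layout, so the block ordering and the function names below are assumptions; this is a minimal plain-Python sketch, not the authors' hardware structure.

```python
def real_embedding(matrix):
    """Real block matrix [[Re, -Im], [Im, Re]] of a complex matrix (assumed layout)."""
    top = [[z.real for z in row] + [-z.imag for z in row] for row in matrix]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in matrix]
    return top + bottom


def stack_vector(vector):
    """Stack real parts on top of imaginary parts."""
    return [z.real for z in vector] + [z.imag for z in vector]


def real_matvec(matrix, vector):
    """Ordinary real matrix-vector product (multiply-accumulate only)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def complex_matvec_via_reals(matrix, vector):
    """Complex product computed with real multiplications only."""
    stacked = real_matvec(real_embedding(matrix), stack_vector(vector))
    half = len(matrix)
    return [complex(stacked[i], stacked[half + i]) for i in range(half)]


A = [[1 + 2j, 3 - 1j], [1j, 2 + 0j]]
x = [1 - 1j, 2 + 3j]
result = complex_matvec_via_reals(A, x)
```

A 2×2 complex matrix becomes a 4×4 real matrix here, which is the fourfold growth mentioned on the slide.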



Inner Block Data-paths

  • The DSP & Support Change Detector block contains 3 parallel data paths: the pseudo inverse, the real-time vector multiplier and the support change detector.

  • The DSP receives the matrix A and the samples bundle y, and then solves a system of equations to reconstruct the signal from the samples.



Pseudo Inverse Data Path

  • The pseudo inverse is the largest block on the FPGA. In Matlab, the pseudo inverse of a matrix A is simply pinv(A);

  • The options for inverting a non-square matrix were:

    • The known way

    • To attempt a matrix decomposition to get better performance.


Pseudo Inverse Inner Data Path

  • We chose the algorithm that allows better performance.

  • The pseudo inverse is created from:

    • A matrix decomposition (QR decomposition)

    • Sub-matrix inversion (inverting an upper triangular matrix)

    • Multiplying the sub-matrices (matrix multiplier)
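The three steps (decompose, invert a triangular block, multiply) follow from a short derivation. It is written here for a real n×m matrix A with n ≥ m and full column rank; the symbol R_1 for the top m×m block of R is our notation and does not appear on the slides.

```latex
\begin{aligned}
A &= QR = Q \begin{bmatrix} R_1 \\ 0 \end{bmatrix}, \qquad Q^{T}Q = I, \\
A^{T}A &= R^{T}Q^{T}QR = R_1^{T}R_1, \\
A^{\dagger} &= (A^{T}A)^{-1}A^{T}
  = (R_1^{T}R_1)^{-1}\begin{bmatrix} R_1^{T} & 0 \end{bmatrix}Q^{T}
  = \begin{bmatrix} R_1^{-1} & 0 \end{bmatrix}Q^{T}.
\end{aligned}
```

Only the square upper triangular block R_1 has to be inverted, and Q needs nothing more than a transpose.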



In Hardware

[Hardware block diagram. Blocks: QR Decomposition, Matrix Multiplier, Matrix Inversion]



The Matrix Decomposition Algorithm

  • The algorithms we examined for the matrix decomposition were:

    • Cholesky decomposition: has high hardware requirements. Multiplying three matrices, inverting two and transposing is more complicated than the chosen algorithm.

    • Singular Value Decomposition: this algorithm was dropped after we saw that finding the singular values of a non-square matrix in VHDL is both time consuming and complicated.

    • QR decomposition: decomposes the matrix into two matrices, one upper triangular and one unitary. This algorithm was chosen because a unitary matrix does not need inverting (its inverse is its conjugate transpose) and because it makes the calculation much easier to understand. In Matlab it is again a single command: qr(A);


The Matrix Decomposition Algorithm

  • To adapt the QR decomposition to hardware we found two algorithms:

    • Using the Gram-Schmidt process: performing the Gram-Schmidt process on the matrix and then rearranging the equation system in a suitable way. This is the result of the GS process:

    • and eventually we get

    • This algorithm was rejected since it returned us to the same situation we came to solve: inverting a non-square matrix.

    • Using Householder reflections: this is a transform similar to Gram-Schmidt. We take each column vector of the matrix and perform:

    • This method has greater numerical stability than the Gram-Schmidt method. The operations per iteration step for an n×m matrix are:

[Diagram. Blocks: Phase 1, Phase 2, Aux1, Aux2]
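The Householder step itself is not preserved in the transcript, but it can be read off the Aux1 and Aux2 MATLAB listings in this presentation. In the notation below (ours, not the authors'), a is the part of the current column on and below the diagonal, e_1 is the first unit vector and sign(0) is taken as +1.

```latex
\begin{aligned}
v &= a + \operatorname{sign}(a_1)\,\lVert a\rVert_2\, e_1, \\
H &= I - \frac{2}{v^{T}v}\, v v^{T}, \qquad Ha = -\operatorname{sign}(a_1)\,\lVert a\rVert_2\, e_1, \\
HA &= A + \beta\, v\,(v^{T}A), \qquad \beta = -\frac{2}{v^{T}v}.
\end{aligned}
```

Aux1 additionally divides v by a_1 + sign(a_1)‖a‖ so that v_1 = 1; this scaling leaves H unchanged because H depends only on the direction of v.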



The QR Decomposition Algorithm

Top level:

    B = Phase1(Acore);        % Householder vectors stored below the diagonal
    Qtranse = phase2(B);      % accumulate Q' from the stored vectors
    Rm = Qtranse*Acore;       % upper triangular factor
    Qm = Qtranse';            % unitary factor

Phase 1:

    [n,m] = size(A);
    for k = 1:min(n-1,m)
        v(k:n,1) = aux1(A(k:n,k));
        A(k:n,k:m) = aux2(A(k:n,k:m),v(k:n,1));
        A(k+1:n,k) = v(k+1:n,1);
    end

Phase 2:

    for k = 1:n-1
        v = ones(n+1-k,1);
        if(k<o)
            v(2:n+1-k) = A(k+1:n,k);
        end
        Qk = eye(n);
        Qk(k:n,k:n) = eye(n+1-k) - (2/(v'*v))*(v*v');
        Q = Qk*Q;
    end

Aux1:

    if (a(1) >= 0)
        beta = a(1) + norm(a);
    else
        beta = a(1) - norm(a);
    end
    v(2:n) = 1/beta * v(2:n);
    v(1) = 1;

Aux2:

    beta = -2/(v'*v);
    w = v'*A;
    A = A + beta*v*w;
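For readers who want to run the algorithm end to end, the following is a minimal plain-Python sketch of a Householder QR that follows the same Aux1/Aux2 arithmetic. It is not the authors' code: the function name is hypothetical, the vector v is not normalized to v(1) = 1 and is not stored inside A, and the transposed Q is accumulated directly instead of being rebuilt in a second phase.

```python
import math


def householder_qr(a):
    """Return (q, r) with a = q*r, q orthogonal (n x n), r upper triangular (n x m)."""
    n, m = len(a), len(a[0])
    r = [list(map(float, row)) for row in a]
    q_t = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(min(n - 1, m)):
        col = [r[i][k] for i in range(k, n)]
        norm = math.sqrt(sum(x * x for x in col))
        if norm == 0.0:
            continue  # column already zero, nothing to reflect
        v = col[:]
        # Aux1: add the norm with the sign of the leading entry (avoids cancellation)
        v[0] += norm if col[0] >= 0 else -norm
        # Aux2: beta = -2 / (v' * v)
        beta = -2.0 / sum(x * x for x in v)
        # Apply H = I + beta*v*v' to rows k..n-1 of both R and the accumulated Q'
        for target, width in ((r, m), (q_t, n)):
            for j in range(width):
                w = sum(v[i] * target[k + i][j] for i in range(len(v)))
                for i in range(len(v)):
                    target[k + i][j] += beta * v[i] * w
    q = [[q_t[j][i] for j in range(n)] for i in range(n)]
    return q, r


a = [[3.0, 1.0], [4.0, 2.0], [0.0, 5.0]]
q, r = householder_qr(a)
```

Entries of r below the diagonal come out as zero up to rounding error, so a consumer of r should treat them as zero rather than test for exact equality.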



QR decomposition on FPGA

[Block diagram. Blocks: Phase 1, Phase 2, Aux 2, Beta calculation unit, 24 multipliers]



The QR Decomposition Hardware Requirements

  • The QR decomposition unit: QRDEC

  • Resources:

    • 6000 ALUTs

    • 1000 registers

    • 10000 block memory bits

    • 76 DSP blocks (18-bit multipliers)

  • During the implementation we moved the Aux1 unit into the phases and created separate units for the beta calculation and the vector multiplications.

  • Resources per sub-unit:

    • Phase 2 + Aux1/2: 1300 ALUTs, 10 registers

    • Phase 1 + Aux1/2: 2500 ALUTs, 450 registers, 10000 block memory bits

    • Beta_calc: 1900 ALUTs, 550 registers, 26 DSP blocks

    • 24_mults: 850 ALUTs, 48 DSP blocks

    • Aux2 (24 mults block + beta calc): 1500 ALUTs, 10 registers



Matrix Inversion Algorithm


  • Matrix inversion is a serious bottleneck: it is extremely slow.

  • The candidate ways to invert the matrix were:

    • Gaussian elimination (the ordinary way): row-reduce the matrix until we reach the identity matrix.

    • Analytic solution (adjugate method): the inverse is the adjugate of R, built from its minors, divided by the determinant.

    • LU decomposition: decomposing this matrix is a waste of time, since it is already triangular and no further decomposition is required.

    • Alternative analytic methods (the Neumann series, the blockwise inversion method, etc.): the amount of calculation needed is greater, and it is still like inverting the triangular matrix the ordinary way.

  • The chosen algorithm is Gaussian elimination.


Matrix Inversion Algorithm

  • The unit works on reusable hardware.

  • There is an inner unit which inverts one vector at a time.

  • The external unit feeds in the vectors in a loop whose length is the support size.

  • The matrix we are to invert has, as said before, more rows than columns. In order to invert it we can therefore remove the rows of zeros below the support rows and then invert, which makes the matrix smaller and saves time.


Matrix Inversion Unit

The vector inverse runs on a faster clock: this work clock is 2 or 3 times faster than the main clock (more if possible), since the multipliers only work at a rate of 50 MHz. There is also a division unit which works at 20 MHz at most.

Resources:

  • 7000 ALUTs

  • 880 registers

  • 26 DSP blocks (18-bit multipliers)

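    s = size(R);              % R is upper triangular
    Rinv = zeros(s(2));       % accumulators must start at zero
    for m = 1:s(2)            % one column of the inverse per iteration
        for n = 1:(m-1)
            for k = 1:(m-1)
                Rinv(n,m) = Rinv(n,m) + Rinv(n,k)*R(k,m);
            end
        end
        for w = 1:(m-1)
            Rinv(w,m) = -Rinv(w,m)/R(m,m);   % requires R(m,m) ~= 0
        end
        if (R(m,m) ~= 0)
            Rinv(m,m) = 1/R(m,m);
        end
    end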
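The same column-by-column scheme can be sketched in plain Python. The function name is hypothetical, and unlike the MATLAB listing this version refuses a zero diagonal entry up front instead of dividing by it. For the R produced by a QR decomposition of a tall matrix, only the top square block (after dropping the zero rows) would be passed in.

```python
def invert_upper_triangular(r):
    """Invert a square upper triangular matrix one column at a time."""
    size = len(r)
    r_inv = [[0.0] * size for _ in range(size)]
    for m in range(size):
        if r[m][m] == 0:
            raise ZeroDivisionError("R is singular: zero on the diagonal")
        for n in range(m):
            # Entries of row n left of the diagonal are zero, so start at k = n
            acc = sum(r_inv[n][k] * r[k][m] for k in range(n, m))
            r_inv[n][m] = -acc / r[m][m]
        r_inv[m][m] = 1.0 / r[m][m]
    return r_inv


R = [[2.0, 1.0, 3.0], [0.0, 4.0, 5.0], [0.0, 0.0, 8.0]]
R_inv = invert_upper_triangular(R)
```

Each column m needs only the columns computed before it plus a single division by R(m,m), which matches the idea of an inner unit that handles one vector at a time.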

The matrix inverse unit holds:

  • 14000 memory bits

  • 12500 registers

  • 10000 ALUTs

  • 30 DSP blocks


Matrix Decomposition Unit

[Block diagram. Blocks: FIFO for original R matrix, Vector inverter]



Matrix Multiplier Block


Matrix Multiplier's Interface: a block that decides which matrix goes where, since the multiplier is shared by all blocks.

Resources for the whole block:

  • ALUTs: 60000

  • Memory bits: 30000

  • Registers: 11000

  • DSP blocks: 380


Interface To The Matrix Multiplier in Hardware

[Block diagram. Blocks: RAM, Matrix Multiplier]



Matrix Multiplier

Vector Multiplier



Vector Multiplier

[Block diagram: 12 DSP blocks]



Pseudo Inverse Resources

Pseudo inverse resources (device utilization in parentheses where the slide gives it):

  • ALUTs: 80000

  • Memory bits: 60000 (<1%)

  • Registers: 30000 (34%)

  • DSP blocks: 450 (50%)



Real Time Multiplier

  • Real Time Vector Multiplier

  • The real-time multiplier is identical to the matrix multiplier: it multiplies one vector (the samples bundle) by the pseudo inverse of A.

  • Resources (device utilization in parentheses where the slide gives it):

    • ALUTs: 50000

    • Memory bits: 10000 (<1%)

    • Registers: 1 (<1%)

    • DSP blocks: 380 (42%)
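As a small worked example of this step, the sketch below multiplies a pseudo inverse by one samples bundle. The matrix, its pseudo inverse (written out by hand) and the function name are illustrative assumptions in plain Python, not data from the project.

```python
def reconstruct(pinv_a, sample_bundle):
    """Multiply the pseudo inverse (a list of rows) by one samples bundle."""
    return [sum(p * y for p, y in zip(row, sample_bundle)) for row in pinv_a]


# Hypothetical 3x2 matrix A and its pseudo inverse.
A = [[1.0, 0.0], [0.0, 2.0], [0.0, 0.0]]
PINV_A = [[1.0, 0.0, 0.0], [0.0, 0.5, 0.0]]

x_true = [3.0, 4.0]
y = [sum(a * x for a, x in zip(row, x_true)) for row in A]  # samples y = A*x
x_hat = reconstruct(PINV_A, y)
```

Because the pseudo inverse is fixed while the support is unchanged, each new bundle costs only one dot product per output row.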



Support Change Detector


  • After simulations we reached a threshold value which gives at most 20% false alarms and no missed detections of support changes. We examined this on the sample we were given and found that 0.1 is a nominal amount of energy for a signal which is not noise.

  • The support change detector is a vector multiplier: it is given one row of the pseudo-inverted A matrix and multiplies it by the signal to see whether any of the energy there is not noise.

  • Resources:

    • 100 ALUTs

    • 400 registers

    • 26 DSP blocks (18-bit multipliers)
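A minimal sketch of such a detector is given below. Only the row-times-samples product and the 0.1 threshold come from the slide; averaging the squared product over a window of sample vectors, and all function names, are assumptions made for illustration.

```python
ENERGY_THRESHOLD = 0.1  # nominal non-noise energy quoted on the slide


def projection_energy(pinv_row, sample_vectors):
    """Mean squared value of (pinv_row . y) over a window of sample vectors."""
    total = 0.0
    for y in sample_vectors:
        projected = sum(c * s for c, s in zip(pinv_row, y))
        total += projected * projected
    return total / len(sample_vectors)


def support_changed(pinv_row, sample_vectors, threshold=ENERGY_THRESHOLD):
    """Flag a support change when the projected energy exceeds the threshold."""
    return projection_energy(pinv_row, sample_vectors) > threshold


row = [1.0, 0.0]
quiet = [[0.1, 5.0], [-0.1, 5.0]]   # small projection: treated as noise
active = [[1.0, 0.0], [1.0, 0.0]]   # large projection: treated as signal
```

Lowering the threshold reduces the chance of missing a change at the cost of more false alarms, which is the trade-off the first bullet describes.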



Full System

Total system requirements (device utilization in parentheses):

  • ALUTs: 75000 (EP3SE260: 60%, EP3SL110: 87%)

  • Memory bits: 70000 (EP3SE260: 0.05%, EP3SL110: 0.1%)

  • Registers: 30000 (EP3SE260: 15%, EP3SL110: 34%)

  • DSP blocks: 805 (EP3SE260: 101%, EP3SL110: 91%)

  • Pins: 1000

All hardware requirements were given by Quartus during synthesis.



Faults in the design

  • Underflow and overflow (we changed the number representation to one different from the rest of the system: from 18 bits with a 14-bit mantissa to 18 bits with a 9-bit mantissa).

  • A non-invertible matrix: R must be invertible.

  • Zero columns in the control vector for the SCD.

  • Rapid support changes one after the other, which compromise the delays.

  • The energy remainder for the SCD gives no way to tell noise from signal.

  • More than 11 support vectors: impossible to handle.

  • If the first supports are wrong.

  • Excess noise: impossible to reconstruct the signal.

  • Changes in the complex enhance may cause changes to the matrix's features.
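The trade-off behind the change of representation can be made concrete with a short calculation. The sketch assumes that "18 bit, 14 mantissa" means a signed two's-complement fixed-point word with 14 fractional bits (and likewise 9); the slide does not define the format, so this reading and the function names are assumptions.

```python
def fixed_point_limits(total_bits, fraction_bits):
    """Return (min, max, resolution) of a signed fixed-point format."""
    scale = 1 << fraction_bits
    min_value = -(1 << (total_bits - 1)) / scale
    max_value = ((1 << (total_bits - 1)) - 1) / scale
    return min_value, max_value, 1 / scale


def quantize(value, total_bits, fraction_bits):
    """Round to the nearest representable value; raise on overflow."""
    scale = 1 << fraction_bits
    raw = round(value * scale)
    low, high = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    if raw < low or raw > high:
        raise OverflowError(f"{value} does not fit in {total_bits} bits")
    return raw / scale


wide_fraction = fixed_point_limits(18, 14)   # range about [-8, 8)
wide_range = fixed_point_limits(18, 9)       # range about [-256, 256)
```

Under this reading, moving 5 bits from the fraction to the integer part widens the range by a factor of 32 and coarsens the resolution by the same factor, so overflow becomes less likely and underflow more likely.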


Simulation

The pseudo inverse module completed the simulation.

The support vectors and A_S were taken from the Matlab simulation, together with the samples Yn, which were multiplied by the matrix in Matlab.

The pseudo inverse takes about 200,000 clock cycles, depending on the number of supports.



Performance

  • The time it takes to compute the pseudo inverse depends on the number of support vectors.

  • The maximal possible delay is 2.5 mega samples, so a FIFO at the input is needed.

  • The working frequencies are: 100 MHz for RAM management, 50 MHz for the secondary work clock, and the main clock is still 20 MHz.
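The link between a cycle count, a clock frequency and the FIFO depth is simple arithmetic, sketched below. The presentation gives a figure of about 200,000 cycles for the pseudo inverse but does not say which clock those cycles are counted in, nor the input sample rate, so both are left as parameters; the 100 MS/s value in the example is purely hypothetical.

```python
def latency_seconds(clock_cycles, clock_hz):
    """Time taken by a given number of cycles at a given clock frequency."""
    return clock_cycles / clock_hz


def fifo_depth_samples(clock_cycles, clock_hz, sample_rate_hz):
    """Samples arriving during the latency, rounded up (exact integer math)."""
    return -(-clock_cycles * sample_rate_hz // clock_hz)


CYCLES = 200_000  # approximate pseudo inverse duration quoted in the presentation

at_main_clock = latency_seconds(CYCLES, 20_000_000)   # if counted at 20 MHz
at_work_clock = latency_seconds(CYCLES, 50_000_000)   # if counted at 50 MHz
depth = fifo_depth_samples(CYCLES, 20_000_000, 100_000_000)  # hypothetical 100 MS/s
```

The latency is 10 ms if the cycles are main-clock cycles and 4 ms if they are work-clock cycles; the required FIFO depth scales linearly with whichever sample rate applies.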



Future Work

  • There are still some glitches in the system, mostly related to the change of representation and to the new RAM blocks which were inserted.

  • Error management.

  • Handling singular cases.

  • Hardware debugging.

  • Timing simulation

  • Signal tapping

  • Full system integration.



Part B Gantt Chart

