Sub-Nyquist Sampling: Continuous to Finite Module, Final Presentation – Part B

Sub-Nyquist Sampling: Continuous to Finite Module
Final Presentation – Part B
Performed by: Yoni Smolin
Supervisors: Inna Rivkin & Moshe Mishali
Winter 2009 – Spring 2010

Agenda
• Project overview
• Algorithm background
• Implementation
• Merging frame and OMP
• SBR2 vs. SBR4
• Stopping condition
• Future work

Project overview
• Characterization
• Implementation
• Verification

Project overview
• Characterization
  • # of samples CTF collects (Nframe)
  • Frame method
  • Compressive sampling algorithm
• Implementation
• Verification
[Figure: characterization result, Nframe = 70]

Project overview
• Characterization
  • # of samples CTF collects (Nframe)
  • Frame method
  • Compressive sampling algorithm
• Implementation
• Verification
[Figure: frame method, matrices Q, V, W]

Project overview
• Characterization
  • # of samples CTF collects (Nframe)
  • Frame method
  • Compressive sampling algorithm
• Implementation
• Verification
[Figure: compressive sampling algorithm, OMP]

Project overview
• Characterization
• Implementation
  • Algorithm adaptation for hardware
  • Architecture design
  • VHDL coding
• Verification
[Figure: OMP steps before and after adaptation: solution approximation, residual update; modified Gram-Schmidt, residual update]

Project overview
• Characterization
• Implementation
  • Algorithm adaptation for hardware
  • Architecture design
  • VHDL coding
• Verification
[Figure: architecture with MAMU, keep max, support, stop?, ADD/SUB blocks]


Project overview
• Characterization
• Implementation
• Verification
  • Functional
  • Post synthesis
  • On chip
[Figure: functional verification setup: Modulated Wideband Converter; Expand emulation & quantization; input text files; test bench with CTF functional model and A memory; output text files; Matlab (fixed point) = VHDL]

Project overview
• Characterization
• Implementation
• Verification
  • Functional
  • Post synthesis
  • On chip
[Figure: post-synthesis verification setup: Modulated Wideband Converter; Expand emulation & quantization; input text files; test bench with CTF synthesized model and A memory; output text files; total runtime ≤ 10 μs]

Project overview
• Characterization
• Implementation
• Verification
  • Functional
  • Post synthesis
  • On chip
[Figure: on-chip verification setup: Modulated Wideband Converter; input text files; FPGA with CTF shell (debug mode), CTF and A memory; output text files; performance analysis]
GUI demo coming up soon.

Sub-Nyquist system
• On every clock cycle (20 MHz), the digital system solves $y = A z$ for a sparse vector $z$, where $A$ is the sampling matrix.
• How?
  • Expand computes y from the input samples.
  • CTF locates the nonzero elements of z, i.e., the support $S$.
  • DSP computes their values, $z_S = A_S^{\dagger} y$.
[Figure: system block diagram: MWC (4), FIFO, Expand (×3, 12), CTF, DSP; sampling and recovery stages]
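The DSP stage's job can be illustrated with a small least-squares sketch. It assumes, as in the standard MWC reconstruction, that the values on the recovered support are $z_S = A_S^{\dagger} y$, and that $A_S$ has full column rank so the pseudo-inverse reduces to the normal equations $(A_S^H A_S) z_S = A_S^H y$. The helper names (`recover_on_support`, `solve`, `matmul`, `conj_transpose`) are hypothetical, and plain Python lists stand in for the hardware's fixed-point matrices.

```python
# Hypothetical sketch: values on a known support via least squares,
# z_S = pinv(A_S) y, computed from the normal equations (A_S^H A_S) z_S = A_S^H y.

def conj_transpose(M):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[M[r][c].conjugate() for r in range(len(M))] for c in range(len(M[0]))]

def matmul(X, Y):
    """Plain matrix product X @ Y for lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def solve(G, B):
    """Solve G Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(G)
    M = [list(G[i]) + list(B[i]) for i in range(n)]  # augmented matrix [G | B]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    # The left block is now diagonal; divide each row by its pivot
    return [[M[i][n + j] / M[i][i] for j in range(len(B[0]))] for i in range(n)]

def recover_on_support(A, y, support):
    """Least-squares values of z on `support`; y is an m x 1 column (list of rows)."""
    A_S = [[row[j] for j in support] for row in A]  # keep only the support columns
    A_SH = conj_transpose(A_S)
    return solve(matmul(A_SH, A_S), matmul(A_SH, y))
```

For example, with `A = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]]` and `z = [2, 0, -1]`, the measurement is `y = [[2], [0], [-1], [1]]`, and `recover_on_support(A, y, [0, 2])` returns values close to `[[2.0], [-1.0]]`.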

CTF – SBR4 algorithm flow
• The algorithm is performed in two phases: frame construction, which outputs the frame matrix, and support recovery, which outputs the support.

CTF – SBR4
• The frame matrix Q serves as the basis for recovery:
$Q = [y_1 \; y_2 \; \cdots \; y_{70}]\,[y_1 \; y_2 \; \cdots \; y_{70}]^H = \sum_{n=1}^{70} y_n y_n^H$
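A floating-point sketch of the frame construction, assuming each snapshot $y_n$ is one column of Expand's output and that Q is accumulated as a sum of outer products, one snapshot per step (consistent with the Nframe = 70 accumulation cycles quoted for the frame-construction datapath). `frame_matrix` is a hypothetical name.

```python
# Hypothetical sketch: frame matrix Q = sum_n y_n y_n^H over Nframe snapshots.

def frame_matrix(snapshots):
    """snapshots: list of Nframe vectors, each a list of m (complex) samples.
    Returns the m x m Hermitian matrix Q as a list of rows."""
    m = len(snapshots[0])
    Q = [[0j] * m for _ in range(m)]
    for y in snapshots:                  # one accumulation step per snapshot
        for i in range(m):
            for k in range(m):
                Q[i][k] += y[i] * y[k].conjugate()   # add the outer product y y^H
    return Q
```

For two snapshots `[1, 1j]` and `[2, 0]`, `frame_matrix` returns `[[5, -1j], [1j, 1]]` (as complex numbers), which is Hermitian as expected.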

CTF – SBR4
• Support recovery applies OMP to Q: find the matrix $U_Q$ with the fewest nonzero rows such that $Q = A U_Q$; the support is the set of indices of its nonzero rows.

OMP – adapted for SBR4
• Matching
• Modified Gram-Schmidt
• Residual update
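The three OMP steps above can be sketched in plain Python for the multiple-measurement case, where the "signal" is the whole matrix Q. This is a floating-point illustration under stated assumptions, not the VHDL datapath: the matching metric $\sum_c |a_j^H r_c|^2$, the fixed iteration count `n_s` used in place of the energy-based stopping condition, and the name `omp_mmv` are all assumptions.

```python
import math

def _inner(a, b):
    """Complex inner product a^H b."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def omp_mmv(A, Q, n_s):
    """Hypothetical OMP sketch for SBR4: greedily pick up to n_s columns of A
    whose span captures the columns of Q. A is m x L (list of rows) with
    normalized columns; Q is m x p. Returns the selected column indices."""
    m, L = len(A), len(A[0])
    cols = [[A[i][j] for i in range(m)] for j in range(L)]
    res = [[Q[i][c] for i in range(m)] for c in range(len(Q[0]))]  # residual, by column
    basis, support = [], []
    for _ in range(n_s):
        candidates = [k for k in range(L) if k not in support]
        if not candidates:
            break
        # Matching: correlation energy sum_c |a_k^H r_c|^2 of every unused column
        j = max(candidates, key=lambda k: sum(abs(_inner(cols[k], r)) ** 2 for r in res))
        # Modified Gram-Schmidt: orthogonalize a_j against the current basis
        u = cols[j][:]
        for b in basis:
            p = _inner(b, u)
            u = [ui - p * bi for ui, bi in zip(u, b)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in u))
        if norm < 1e-12:
            break  # a_j adds no new direction
        u = [x / norm for x in u]
        basis.append(u)
        support.append(j)
        # Residual update: r_c <- r_c - u (u^H r_c) for every residual column
        new_res = []
        for r in res:
            p = _inner(u, r)
            new_res.append([ri - p * ui for ri, ui in zip(r, u)])
        res = new_res
    return support
```

For example, if A has the normalized columns e1, e2, e3 and (e1 + e2)/√2, and Q = diag(1, 0, 1), then `omp_mmv(A, Q, 2)` selects columns 0 and 2. Orthonormalizing each chosen column once keeps the residual update to a single projection per iteration, which mirrors the "modified Gram-Schmidt + residual update" split above.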

Datapath – OMP iteration (SBR4): matching
• Matching takes L = 101 cycles.
[Figure: datapath with MAMU, keep max, support, stop?, SUB]

Datapath – OMP iteration (SBR4): Gram-Schmidt
[Figure: datapath with MAMU, keep max, support, stop?, SUB; labels "best" and "symmetric"]

Datapath – OMP iteration (SBR4): residual update & stopping condition
[Figure: datapath with MAMU, keep max, support, stop?, SUB; labels "best" and "symmetric"]

Datapath – frame construction
• Frame construction takes Nframe = 70 cycles.
[Figure: datapath with MAMU, keep max, support, stop?, ADD/SUB, SUB]

Utilizing OMP’s hardware to calculate the frame
Separate Qframe and OMP units vs. CTF (OMP hardware also computes the frame):
• Logic consumption (without architecture shell): 36% + 32% vs. 38%
• DSP block consumption: 772 vs. 604
• Runtime (SBR2 mode):
• Word length of Q's elements: 36 bits vs. 18 bits
• Support recovery performance: identical
• Fits in either the 110 or the 260 FPGA.

SBR4 vs. SBR2
• SBR4: y (20 MHz) → Q → OMP → A → z (20 MHz)

SBR4 vs. SBR2
• SBR2: y_i (2 MHz each) → Q_1 → OMP → Q_2 → OMP → … → Q_10 → OMP → A → z_i (2 MHz each)

Why do we need SBR2?
• Under reasonable assumptions, SBR2 can handle twice as many support elements as SBR4:

SBR2 – algorithm flow
• Task flow: frame construction → support recovery → merge.
• Frame construction and support recovery are implemented in the current hardware; merge requires additional hardware.
[Figure: pipelined schedule of iterations 1 … 10 for the frame, OMP and merge tasks]

SBR2 – merging concept (suggestion)
• On every SBR2 iteration:
  • Sort the newly recovered support in descending order of energy.
  • Merge it with the accumulated support so that the result contains the most energetic index pairs.
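Since the merge step is only a suggestion, the following sketch shows one way the "keep the most energetic index pairs" rule could work. It assumes each SBR2 iteration reports a mapping from index pair to energy, that a pair found in both inputs keeps its larger energy, and that at most `n_pairs` pairs are retained; `merge_supports` and these conventions are hypothetical.

```python
# Hypothetical merge rule for SBR2: keep the n_pairs most energetic index pairs.

def merge_supports(accumulated, new, n_pairs):
    """accumulated, new: dicts mapping an index pair (tuple) to its energy.
    Returns a dict with at most n_pairs entries, the most energetic ones."""
    combined = dict(accumulated)
    for pair, energy in new.items():
        # Assumption: a pair seen in both inputs keeps the larger energy
        combined[pair] = max(energy, combined.get(pair, 0.0))
    # Sort in descending order of energy and keep the top n_pairs
    ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:n_pairs])
```

For example, merging `{(1, 100): 5.0, (3, 98): 1.0}` with `{(3, 98): 4.0, (7, 94): 2.0}` and `n_pairs=2` keeps `(1, 100)` and `(3, 98)`. Summing energies across iterations instead of taking the maximum would be an equally plausible design choice.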

CTF – runtime analysis
• Upper bounds:
• How can we accelerate?
  • Avoid decimation in Expand (in which case SBR2 would run in only 55 μs).
  • In SBR2, send intermediate support approximations to the DSP.
  • Optimize the OMP for shorter runtime (coming up).

OMP - stopping condition issues
• The energy contribution of the i-th OMP iteration: SBR4: SBR2:
• The stopping condition detects a decline of the energy. Stop when: SBR4: SBR2:
• N_s is the maximal support size.
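The exact energy formulas are not reproduced here, so the sketch below uses a hypothetical rule of the same flavor: stop at the first iteration whose energy contribution falls below `threshold` times the first iteration's contribution, and never exceed `n_s` iterations. The function name `support_size` and the energies in the usage example are made up; they only illustrate how the number of returned elements depends on the threshold.

```python
# Hypothetical energy-decline stopping rule (not the project's exact formula).

def support_size(energies, threshold, n_s):
    """energies[i] is the energy contribution of OMP iteration i.
    Keep iterations while energies[i] >= threshold * energies[0],
    and never keep more than n_s of them (n_s = maximal support size)."""
    kept = 0
    for i, e in enumerate(energies[:n_s]):
        if i > 0 and e < threshold * energies[0]:
            break  # energy declined below the threshold: stop here
        kept += 1
    return kept
```

With the made-up energies `[10, 8, 7, 6, 0.5, 0.4]` (four strong elements), only thresholds between roughly 0.05 and 0.6 return exactly 4: `threshold=0.01` returns 6 and `threshold=0.65` returns 3. That narrow window, and its dependence on how many elements are actually present, is the kind of sensitivity discussed next.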

OMP - stopping condition issues
• Difficulties:
  • Only a narrow range of threshold values triggers the stopping condition on time.
  • This range varies with the actual number of support elements.

OMP - stopping condition issues (Original support contains 4 elements)

OMP - stopping condition issues
• Difficulties:
  • Only a narrow range of threshold values triggers the stopping condition on time.
  • This range varies with the actual number of support elements.
• Possible solutions:
  • Approximate or learn the threshold adaptively.
  • Ignore the energy condition and always return N_s elements.

CTF - parameters & constants
• Defined on startup:
  • Nframe
  • Ns
  • threshold
  • # of SBR2 iterations
  • A (columns must be normalized)
• Defined on synthesis:
  • m, q
  • L
  • Internal word length & fraction length

Future Work – internal word length
• Simulations suggest that the internal word length can be reduced from 18 to 9 bits.
• Benefit: a reduction in hardware utilization.
• Possible drawback: performance may decline as the number of channels grows.
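To make the word-length trade-off concrete, here is a sketch of signed fixed-point quantization with a given word length and fraction length (the synthesis parameters listed for the CTF). The function name `quantize` and the fraction lengths used in the example (16 of 18 bits, 7 of 9 bits) are assumptions for illustration, not the project's actual settings.

```python
# Hypothetical fixed-point model: two's-complement word of word_len bits,
# of which frac_len are fractional bits.

def quantize(x, word_len, frac_len):
    """Round x to the fixed-point grid (step 2**-frac_len), saturating to the
    range of a signed word_len-bit integer code."""
    scale = 1 << frac_len
    lo, hi = -(1 << (word_len - 1)), (1 << (word_len - 1)) - 1
    code = round(x * scale)            # nearest integer code (ties to even)
    code = max(lo, min(hi, code))      # saturate instead of wrapping around
    return code / scale
```

For instance, `quantize(0.3, 18, 16)` is within 1e-5 of 0.3, while `quantize(0.3, 9, 7)` gives 0.296875; values outside the 9-bit range saturate, e.g. `quantize(5.0, 9, 7)` gives 1.9921875. Halving the word length roughly halves the multiplier and adder widths, at the cost of a quantization step that is 2^9 times coarser in this example.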

Future Work – scaling hardware
• MaMu consumes 40% of the logic and 95% of the DSP elements required for the CTF.
• By trading hardware for time, MaMu can be shrunk.
• Expected effects:
  • MaMu's hardware requirements will decrease by .
  • OMP's runtime will increase by .
  • f_max may grow (the critical path is located in MaMu).
• Shrinking can also be applied to the matrix adder/subtractor (19% of the required logic).
[Figure: MaMu structure, lanes 1 … 12 feeding adders]
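The hardware-for-time trade can be modeled in a few lines, assuming MaMu evaluates length-12 dot products with 12 parallel multipliers (as the lane labels 1 … 12 suggest) and that a shrunk version reuses fewer multipliers over several clock cycles. `folded_dot` is a hypothetical behavioral model, not the VHDL.

```python
# Hypothetical behavioral model of a shrunk (time-multiplexed) MaMu.

def folded_dot(a, b, lanes):
    """Dot product of a and b using `lanes` multipliers: each clock cycle
    multiplies one group of `lanes` element pairs and accumulates the sum.
    Returns (result, cycles)."""
    acc, cycles = 0, 0
    for start in range(0, len(a), lanes):
        group = zip(a[start:start + lanes], b[start:start + lanes])
        acc += sum(x * y for x, y in group)   # one cycle's worth of products
        cycles += 1
    return acc, cycles
```

In this model, going from 12 lanes to 4 cuts the multipliers by 3× and triples the cycles per dot product: `folded_dot(list(range(1, 13)), [1] * 12, 4)` returns `(78, 3)`, while 12 lanes return `(78, 1)`.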

Future Work – optimizing the OMP algorithm
• Accelerate the matching stage:
• Expected runtime reduction: 2/3.
• Save hardware at the expense of accuracy by using  to approximate . This also allows replacing  with .

Summary
• The current implementation was verified to recover up to 4 support elements in SBR4.
• A future implementation of SBR2 may allow recovering at least twice as many.
• The frame calculation in SBR2 is a major bottleneck.
• Optimization directions include scaling the datapath and reducing the word length.
