
Implementing algorithms for advanced communication systems -- My bag of tricks

Implementing algorithms for advanced communication systems -- My bag of tricks. Sridhar Rajagopal Electrical and Computer Engineering. This work is supported by Nokia, TI, TATP and NSF. Motivation. Build wireless multimedia communication systems - K bps to M bps


Presentation Transcript


  1. Implementing algorithms for advanced communication systems -- My bag of tricks • Sridhar Rajagopal, Electrical and Computer Engineering • This work is supported by Nokia, TI, TATP and NSF

  2. Motivation • Build wireless multimedia communication systems (Kbps to Mbps) • Sophisticated algorithms have exponential complexity • Approaches: • Sub-optimal algorithms with O(n^2) to O(n^3) complexity • Better hardware implementations are needed

  3. Contributions • Develop algorithms suitable for implementation • Bit-level extensions to microprocessors • Pipelining to reduce latency and memory • On-line arithmetic for Most Significant Digit First Computations.

  4. Outline • Advanced communication systems • Algorithms for efficient implementation • Pipelining • On-line arithmetic • Bit-level extensions to microprocessors • Summary

  5. Communication System - Physical layer: Transmitter • [Block diagram] Information bits (from higher layers) • Digital: Coding, Spreading • D/A • Analog: RF unit • Antenna

  6. Communication System - Physical layer: Channel • Multipath reflections, attenuation, noise, multiple-user interference

  7. Communication System - Physical layer: Receiver • [Block diagram] Antenna • Analog: RF unit • A/D • Digital: Channel estimation, Detection, Decoding • Information bits (to higher layers)

  8. Questions • Higher data rates => sophisticated algorithms => strain on hardware => lower data rates • 1. Which algorithm is best suited for implementation? • 2. How should the digital part be built: VLSI, DSP, FPGA, microprocessor, or a combination of these?

  9. Outline • Advanced communication systems • Algorithms for efficient implementation • Pipelining • On-line arithmetic • Bit-level extensions to microprocessors • Summary

  10. Multiuser Channel Estimation Algorithm • Training/tracking bits: values in {+1, -1} • Received signal: 8-bit integers (complex) • N = spreading gain (typically fixed, e.g. 32) • K = number of users (variable, <= N) • Output: Maximum Likelihood channel estimate

  11. Iterative hardware-efficient scheme • Bit-streaming: suitable for tracking (window length L) • Method of gradient descent • Stable convergence behavior • Simple fixed-point VLSI architecture
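
A minimal sketch of a gradient-descent channel estimator, under assumptions that are not in the slides: the estimate A is taken to solve Rbb A = Rbr, where Rbb and Rbr are correlation matrices accumulated from the training bits and the received vectors, and the update is A <- A - mu (Rbb A - Rbr). The names `estimate_channel`, `Rbb`, `Rbr` and `mu` are hypothetical, and real-valued data stands in for the complex received signal.

```python
def estimate_channel(bits, received, mu=0.1, iters=100):
    """Gradient-descent least-squares channel estimate (illustrative only).

    bits:     list of L vectors of length K with entries +1/-1
    received: list of L vectors of length N (real-valued here)
    Returns a K x N matrix A such that r is approximately A^T b.
    """
    L, K, N = len(bits), len(bits[0]), len(received[0])
    Rbb = [[0.0] * K for _ in range(K)]  # bit auto-correlation
    Rbr = [[0.0] * N for _ in range(K)]  # bit/received cross-correlation
    for b, r in zip(bits, received):
        for k in range(K):
            for m in range(K):
                Rbb[k][m] += b[k] * b[m] / L
            for n in range(N):
                Rbr[k][n] += b[k] * r[n] / L

    A = [[0.0] * N for _ in range(K)]
    for _ in range(iters):
        # Gradient of the squared error, computed from the previous A.
        grad = [[sum(Rbb[k][m] * A[m][n] for m in range(K)) - Rbr[k][n]
                 for n in range(N)] for k in range(K)]
        A = [[A[k][n] - mu * grad[k][n] for n in range(N)] for k in range(K)]
    return A
```

The iteration needs only multiply-accumulate steps and no matrix inversion. It converges when mu is below 2 divided by the largest eigenvalue of Rbb.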

  12. Comparison of Bit Error Rates (BER) • [Plot: BER (10^-3 to 10^-1) versus Signal to Noise Ratio (SNR, 4 to 12); curves: MF, ActMF, ML, ActML; complexity annotations: O(K^2 N) and O(K^3 + K^2 N)] • Simulations: static multipath channel, SINR = 0 dB, paths = 3, preamble = 150, spreading N = 31, users K = 15

  13. Outline • Advanced communication systems • Algorithms for efficient implementation • Pipelining • On-line arithmetic • Bit-level extensions to microprocessors • Summary

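  14. Multiuser interference • [Timing diagram: received windows r(i-2), r(i-1), r(i), r(i+1) along the time axis for User 1 through User j, with bits b(i) and b(i+1) marked] • In window r(i), the desired user sees interference from previous bits of other users and from future bits of other users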

  15. Block Based Detector • [Diagram] Window of bits 1-12: Matched Filter, Stage 1, Stage 2, Stage 3 -> detected bits 2-11 • Window of bits 11-22: Matched Filter, Stage 1, Stage 2, Stage 3 -> detected bits 12-21

  16. Detection • Matched filter • Iterate for convergence
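
A sketch of "matched filter, then iterate", assuming the iteration is parallel interference cancellation (PIC, the stage name used on the FPGA slide), with synchronous users, unit amplitudes and real-valued spreading codes. The function names are hypothetical and the slide's own equations are not reproduced here.

```python
def sign(x):
    """Hard decision to +1/-1 (ties go to +1)."""
    return 1 if x >= 0 else -1

def matched_filter(codes, r):
    """Correlate the received chips r with each user's spreading code."""
    return [sum(c * x for c, x in zip(code, r)) for code in codes]

def pic_detect(codes, r, stages=3):
    """Matched-filter start followed by `stages` rounds of interference cancellation."""
    K = len(codes)
    y = matched_filter(codes, r)
    # Cross-correlations between the users' codes.
    R = [[sum(a * b for a, b in zip(codes[i], codes[j])) for j in range(K)]
         for i in range(K)]
    b = [sign(v) for v in y]  # initial matched-filter decisions
    for _ in range(stages):
        # Subtract every other user's estimated contribution, then re-decide.
        b = [sign(y[i] - sum(R[i][j] * b[j] for j in range(K) if j != i))
             for i in range(K)]
    return b
```

Each stage uses only the previous stage's decisions, so all users can be updated in parallel.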

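  17. Pipelined detection scheme • [Timing diagram: received windows r(i-2), r(i-1), r(i), r(i+1) along the time axis for User 1 through User j, with bits b(i) and b(i+1) marked] • In window r(i), the desired user sees interference from previous bits of other users and from future bits of other users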

  18. Pipelined Detector • [Diagram] Bits 1-12 stream through the Matched Filter, Stage 1, Stage 2 and Stage 3 in succession

  19. Chip being built as part of the Elec 422 VLSI course project

  20. Outline • Advanced communication systems • Algorithms for efficient implementation • Pipelining • On-line arithmetic • Bit-level extensions to microprocessors • Summary

  21. On-line arithmetic • Only the sign of the dot-product computations is needed • Conventional arithmetic performs high-precision operations just to find the sign • This can be avoided with Most Significant Digit First computation using redundant number systems
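
A small sketch of why digit-serial, most-significant-digit-first results allow an early sign decision. It assumes the dot product arrives as a finite stream of radix-2 signed digits in {-1, 0, 1}, which is one common redundant number system. The function names are hypothetical.

```python
def sd_value(digits):
    """Value of a radix-2 signed-digit fraction: sum of d_i * 2^-(i+1)."""
    return sum(d * 2.0 ** -(i + 1) for i, d in enumerate(digits))

def sign_msd_first(digits):
    """Scan digits most significant first and stop at the first nonzero one.

    Returns (sign, digits_examined). The first nonzero digit fixes the sign
    because all later digits together weigh strictly less than it does.
    """
    for n, d in enumerate(digits, start=1):
        if d != 0:
            return d, n
    return 0, len(digits)
```

For example, the digits `[0, 1, -1, -1, 0, 1]` encode a positive value, and the scan stops after two digits without computing the low-order part.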

  22. Outline • Advanced communication systems • Algorithms for efficient implementation • Pipelining • On-line arithmetic • Bit-level extensions to microprocessors • Summary

  23. DSP/microprocessor implementations • Further acceleration needed for real-time performance • Matrix-based, massively parallel algorithms • Detection of bits {+1, -1}: bit-level operations • DSPs: • Bit multiplications not needed (add/subtract on FPGA) • Bit storage not convenient • Not fully able to exploit parallelism

  24. FPGAs for acceleration • [Block diagram] Received bits -> Multiuser estimation, Code matched filter detector, PIC (Stage 1), PIC (Stage 2) -> Detected bits, mapped onto DSP1, DSP2, FPGA1 and FPGA2 • Flexibility of ASICs • Good for parallelism and bit-level operations

  25. Multiprocessor simulations • [Plot: execution time in seconds (10^-6 to 10^-2) versus number of users (0 to 35); curves: single DSP implementation, 2 DSP implementation, 2 DSPs + 2 FPGAs; reference: target data rate of 128 Kbps/user]

  26. Instruction Set Extensions • To accelerate bit-level computations in wireless • Real/complex integer-bit multiplications: used in multiuser detection, decoding • Bit-bit multiplications: used in outer product updates (correlation, channel estimation) • Complex integer-integer multiplications: useful in other signal processing applications (speech, video, ...)

  27. SIMD Parallelism • [Diagram] 64-bit Register A and 64-bit Register B, split into 8-bit lanes, feed parallel add (+) and multiply (x) units that write 64-bit Register C
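
A sketch of lane-wise SIMD addition on a 64-bit word, emulated with Python integers. It assumes eight unsigned 8-bit lanes with wrap-around (modulo 256) and no saturation. The helper names are hypothetical and do not correspond to any particular instruction set.

```python
LANES, WIDTH = 8, 8
MASK64 = (1 << 64) - 1
HIGH = 0x8080808080808080  # top bit of every lane
LOW = 0x7F7F7F7F7F7F7F7F   # lower seven bits of every lane

def pack(values):
    """Pack eight 8-bit values into one 64-bit word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (WIDTH * i)
    return word

def unpack(word):
    """Split a 64-bit word back into eight 8-bit lanes."""
    return [(word >> (WIDTH * i)) & 0xFF for i in range(LANES)]

def simd_add8(a, b):
    """Add all eight lanes at once without letting carries cross lanes."""
    # Add the low 7 bits of each lane, then fold in the top bits with XOR.
    return (((a & LOW) + (b & LOW)) ^ ((a ^ b) & HIGH)) & MASK64
```

One 64-bit operation replaces eight separate 8-bit additions.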

  28. Integer - Bit Multiplications • For i = 1..8, j = 1..8: D[i][j] = D[i][j] + b[i]*C[j] (Cross-Correlation) • [Diagram] 64-bit Register D[i][j] and 64-bit Register C[j] feed 8-bit add/subtract (+/-) units selected by the 8-bit Control Register b[i]; the result is written back to 64-bit Register D[i][j]
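
The update D[i][j] = D[i][j] + b[i]*C[j] can be sketched with the multiplication replaced by a conditional add or subtract, since b[i] is +1 or -1. This is an illustrative software model, not the hardware datapath: indices are 0-based, 8-bit overflow is not modelled, and the function name is hypothetical.

```python
def cross_correlation_update(D, b, C):
    """In place: D[i][j] += b[i] * C[j] for b[i] in {+1, -1}, without multiplying."""
    for i, bit in enumerate(b):
        row = D[i]
        for j, c in enumerate(C):
            if bit > 0:
                row[j] += c  # b[i] = +1: add
            else:
                row[j] -= c  # b[i] = -1: subtract
    return D
```

The control bit b[i] selects add or subtract for a whole row, which matches the role of the 8-bit control register on the slide.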

  29. Computational Savings • Avoid bit multiplications and control structures • Four 8-bit multiplies: latency 3 cycles • Eight 8-bit adds: latency 1 cycle • Cross-Correlation example: 64 multiplies, 64 adds

  30. Bit-Bit Multiplications • D = D + b*b^T, e.g. Auto-Correlation • [Diagram] 64-bit Register A = b1 and 64-bit Register B = b2 -> XNOR -> 64-bit Register C = b1*b2

  31. 8-bit to 64-bit conversions • D = D + b*b^T, e.g. Auto-Correlation • b1 = b(1:8), b(1:8), ..., b(1:8) • b2 = b(1)b(1)...b(8)b(8) • [Diagram] 8-bit Register b, holding b(1)..b(8), is expanded into 64-bit Register A either by repeating the whole vector b(1)..b(8) or by repeating each bit b(1) b(1) ... b(8) b(8)

  32. Increment/Decrement • D = D + b*b^T, e.g. Auto-Correlation • [Diagram] 64-bit Register D passes through +/- 1 units controlled by the 8-bit Register b1*b2, giving 64-bit Register (D + b1*b2)
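
The three steps for D = D + b*b^T (expand 8 bits to 64, XNOR, increment/decrement) can be sketched together as follows. The encoding is an assumption: a stored 1 means +1 and a stored 0 means -1, so XNOR of two stored bits gives the stored form of their product. Function names are hypothetical and indices are 0-based.

```python
MASK64 = (1 << 64) - 1

def encode(b):
    """Pack eight +1/-1 values into 8 bits (1 means +1, 0 means -1)."""
    return sum(1 << i for i, v in enumerate(b) if v > 0)

def replicate_vector(byte):
    """b1: the whole 8-bit vector repeated eight times."""
    return byte * 0x0101010101010101

def replicate_bits(byte):
    """b2: each bit repeated eight times, so bit i fills byte i."""
    word = 0
    for i in range(8):
        if (byte >> i) & 1:
            word |= 0xFF << (8 * i)
    return word

def autocorrelation_update(D, b):
    """In place: D += b b^T using XNOR products and +/-1 steps."""
    byte = encode(b)
    # Bit 8*i + j of prod is 1 exactly when b[i] * b[j] == +1.
    prod = ~(replicate_vector(byte) ^ replicate_bits(byte)) & MASK64
    for i in range(8):
        for j in range(8):
            D[i][j] += 1 if (prod >> (8 * i + j)) & 1 else -1
    return D
```

A single 64-bit XNOR yields all 64 bit products of the outer product, and each product bit then drives one increment or decrement.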

  33. Truncated Multipliers • [Diagram] ALU; Multipliers: Multiplier 1, Multiplier 2, Truncated Multiplier • Many applications need only approximate computations • Adaptive algorithms: Y = Y + mu*(Y*C) • Truncate lower bits • Truncated multipliers: half the area, half the delay • Can do 2 truncated multiplies in parallel with regular multiplication
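
A sketch of what "truncate lower bits" can mean for an n x n unsigned multiplier: only partial-product bits of weight 2^n or higher are summed, which drops about half of the partial-product array. This is one simple variant chosen for illustration. Practical designs often add a correction constant, which is omitted here, and the function names are hypothetical.

```python
def truncated_multiply(a, b, n=8):
    """Upper n bits of a * b, summing only partial-product bits of weight >= 2^n."""
    total = 0
    for i in range(n):
        if not (a >> i) & 1:
            continue
        for j in range(n):
            if i + j >= n and (b >> j) & 1:
                total += 1 << (i + j)  # keep this partial-product bit
    return total >> n

def full_multiply_high(a, b, n=8):
    """Upper n bits of the exact product, for comparison."""
    return (a * b) >> n
```

In this variant the truncated result never exceeds the exact upper half and falls short of it by at most n - 1, which an adaptive update such as Y = Y + mu*(Y*C) may be able to tolerate.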

  34. Open Questions • VLIW simulator? • Showing performance improvement for different algorithms • Compiler and software support

  35. Outline • Advanced communication systems • Algorithms for efficient implementation • Pipelining • On-line arithmetic • Bit-level extensions to microprocessors • Summary

  36. Conclusions • Data rates for advanced communication systems are limited by hardware, not by algorithms • Need to find efficient solutions to tackle this problem: hardware-software co-design • Presented my ways of attacking this problem

  37. Future Work • RENÉ: • Single re-configurable hardware to switch between 2 communication standards • Designing algorithms, conditioned on the availability of only finite precision • http://www.ece.rice.edu/~sridhar/research.htm • http://cmc.rice.edu
