Advanced Wireless Receivers: Algorithmic and Architectural Optimizations

Suman Das, Rice University, Department of Electrical and Computer Engineering & Center for Multimedia Communication.

Presentation Transcript


  1. Advanced Wireless Receivers: Algorithmic and Architectural Optimizations. Suman Das, Rice University, Department of Electrical and Computer Engineering & Center for Multimedia Communication

  2. Introduction • Wireless is one of the fastest growing industries • “By 2002, a lot more cellular phones are going to have internet access than PCs.” Larry Ellison, CEO, Oracle. [Figure: millions of cell-phone users (0 to 700) by year, 1993 to 2001. Source: Ericsson]

  3. Ubiquitous wireless connectivity • Wireless cellular • Ad-hoc network • Bluetooth/home networks • Wireless LAN

  4. Why advanced receiver algorithms? • The number of wireless subscribers is growing • Multimedia data is replacing voice traffic • Higher and varied data rates (144 Kbps to 2 Mbps) • Stricter quality of service (QoS) • Wireless bandwidth remains a critical resource • Current-generation receivers are suboptimal

  5. Performance of advanced receivers • Huge performance improvement [Figure: bit error rate (10^0 down to 10^-4) versus SNR (4 to 16 dB) for the current receiver, the advanced receiver, and the theoretical limit]

  6. Computational requirements of advanced receivers • A 15-user system transmitting at 0.5 Mbps needs • ~20 billion additions per second • ~15 billion multiplications per second • Requires 32-bit floating-point precision • 50 floating-point DSPs running at 200 MHz are needed to sustain the computation!
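
As a rough sanity check on the figures above, the following sketch divides the quoted operation rate by the quoted DSP count and clock rate. It uses only the numbers on the slide; the variable names are illustrative, and treating additions and multiplications as equally costly operations is an assumption.

```python
# Figures quoted on the slide.
additions_per_second = 20e9
multiplications_per_second = 15e9
num_dsps = 50
clock_hz = 200e6

# Assumption: additions and multiplications count as one operation each.
total_ops_per_second = additions_per_second + multiplications_per_second

# Operations each DSP would have to sustain on every clock cycle.
ops_per_cycle_per_dsp = total_ops_per_second / (num_dsps * clock_hz)
print(ops_per_cycle_per_dsp)
```

Under these assumptions each of the 50 DSPs would have to sustain 3.5 arithmetic operations per cycle, which suggests why the talk looks for algorithmic simplification rather than more hardware.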

  7. My research • Receiver design • High performance • Low complexity • Approach • Algorithmic simplification • Efficient architectural mapping

  8. Wireless channel model • Channel effects • Background noise • Fading • Multiple paths • Multiple users • Multiple access interference (MAI) [Figure: base station receiving direct and reflected paths from User 1 and User 2, plus noise]

  9. Code Division Multiple Access (CDMA) • Wideband CDMA: technology of choice • Users distinguished by spreading sequence [Figure: spreading waveform S(t) over time; one bit spans 7 chips (spreading gain = 7) with chip sequence -1 -1 1 -1 1 1 1] • Received signal [equation not preserved], with K: number of users, P: number of paths, w: attenuation, t: delay, b: data bits
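
To make the spreading idea concrete, here is a minimal sketch using the 7-chip sequence shown on the slide. It assumes a single user, a single path, and no noise, so it illustrates only the spread/correlate principle, not the multiuser receiver discussed later; the function names are illustrative.

```python
# Chip sequence from the slide (spreading gain = 7).
CODE = [-1, -1, 1, -1, 1, 1, 1]

def spread(bits, code=CODE):
    """Replace each +/-1 data bit with the chip sequence scaled by that bit."""
    return [b * c for b in bits for c in code]

def despread(chips, code=CODE):
    """Correlate each bit-length block with the code and keep the sign."""
    n = len(code)
    decisions = []
    for start in range(0, len(chips), n):
        block = chips[start:start + n]
        correlation = sum(x * c for x, c in zip(block, code))
        decisions.append(1 if correlation >= 0 else -1)
    return decisions

bits = [1, -1, 1]
tx = spread(bits)        # 3 bits become 21 chips
rx_bits = despread(tx)   # recovers the original bits in this ideal case
```

With several users and several delayed paths, the correlations no longer separate cleanly, which is the multiple access interference the later slides address.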

  10. CDMA system • Proposed advanced/multiuser receiver modules • Designed in isolation • Suboptimal design [Figure: block diagram. Transmitter blocks: data, encoding, modulation, spreading, combined with other users. Receiver blocks: channel estimation, detection, demodulation, decoding, producing the detected bits of all K users]

  11. Integrated receiver design • Joint channel estimation and detection • Joint detection and decoding [Figure: receiver block diagram with channel estimation, detection, demodulation, and decoding, producing the detected bits of all K users]

  12. Why separate channel estimation and detection? • Operate on different statistics [Figure: blocks for the received signal, chip-matched filter, channel estimation, code-matched filter, and detection; timeline showing the delay, bits b_i and b_{i+1}, observation r_i, and the processing window for channel estimation]

  13. Towards an integrated solution • Reuse computation from channel estimation step • Use same discretized filter output • Avoid alignment to bit interval of each user • Reduce computation • Save hardware

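  14. Components of the observation vector [Figure: for one path with a given delay and attenuation w_{k,p}, the observation window spans bit i = +1 and bit i+1 = -1; the spreading code -1 -1 1 -1 1 1 1 is followed by its negation 1 1 -1 1 -1 -1 -1, and the delayed, zero-padded contributions are weighted by w_{k,p} and -w_{k,p} and summed]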

  15. Matrix representation • r = U Z b_preamble [Figure: with bit i = +1 and bit i+1 = +1, the spreading code -1 -1 1 -1 1 1 1 repeats; user k contributes U_k Z_k b_k(i), plus the other users]

  16. Efficient statistics • Parametric approach • Build channel model (number of paths) • Estimate delay, attenuation • Produce the code-matched filter output • Our approach • Estimate effective spreading code (UZ) • Code-matched filter y = (UZ)^T r
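
The effective-spreading-code idea can be sketched as follows. This is not the estimator from the talk, which the transcript does not give: it assumes a simplified model r_i = A b_i with A standing in for UZ, one bit per observation window, no noise, and a known preamble whose bit sequences are mutually orthogonal, so that averaging r_i b_i^T recovers A. All names are hypothetical.

```python
def estimate_effective_codes(received, preamble):
    """A_hat = (1/L) * sum_i r_i b_i^T, returned as n rows of K columns.

    received: L observation vectors of length n.
    preamble: L known bit vectors of length K (entries +/-1).
    """
    L = len(received)
    n = len(received[0])
    K = len(preamble[0])
    return [[sum(received[i][row] * preamble[i][k] for i in range(L)) / L
             for k in range(K)]
            for row in range(n)]

def matched_filter(A, r):
    """y = A^T r: correlate r with each user's effective code."""
    K = len(A[0])
    return [sum(A[row][k] * r[row] for row in range(len(A))) for k in range(K)]

def observe(A, bits):
    """Noise-free observation r = A b (assumed model)."""
    return [sum(a * b for a, b in zip(row, bits)) for row in A]

# Hypothetical effective codes for K = 2 users over n = 4 chips.
A_true = [[1.0, 0.5], [-1.0, 0.5], [1.0, -0.5], [0.5, 1.0]]
# Orthogonal preamble: user 1 sends 1,1,1,1 and user 2 sends 1,-1,1,-1.
preamble = [[1, 1], [1, -1], [1, 1], [1, -1]]
received = [observe(A_true, b) for b in preamble]

A_hat = estimate_effective_codes(received, preamble)
y = matched_filter(A_hat, observe(A_true, [1, -1]))
```

The point of the sketch is that the matched filter only needs the product UZ, so delays and attenuations never have to be extracted individually.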

  17. Simulation parameters • System parameters • 15 users • 3 paths • Spreading gain: 31 • Hardware platform • TI C62 and C67 EVM boards • 64 KB each of internal program and data memory • 256 KB SBSRAM, 8 MB SDRAM (external) • Code Composer 1.0 to profile code

  18. Effectiveness of integrated design • 2 dB gain in performance [Figure: bit error rate (10^0 down to 10^-4) versus SNR (-4 to 16 dB); single-user and multiuser curves for the parametric approach, the UZ approach, and actual parameters]

  19. Computational savings • Avoid extraction of actual channel parameters • Avoid realignment of data for code-matched filtering • Reduce intermediate storage requirement • Avoid divisions (28 cycles) and square roots (38 cycles) on the DSP

  20. Fixed point behavior • Fixed point advantages • Speed • Power • Cost • Fixed point analysis • 12 bits of precision required instead of 32 bits! • Pack two 16-bit operations in 32-bit registers • More packing with • Saturation arithmetic • User power control!
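
The packing and saturation ideas can be emulated in software as below. This is a sketch only: it models two signed 16-bit lanes inside one 32-bit word and a lane-wise saturating add, and it does not use or claim any particular DSP instruction. The helper names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def sat16(x):
    """Clamp to the signed 16-bit range instead of wrapping around."""
    return max(INT16_MIN, min(INT16_MAX, x))

def pack2x16(hi, lo):
    """Pack two signed 16-bit values into one unsigned 32-bit word."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def unpack2x16(word):
    """Recover the two signed 16-bit lanes from a packed word."""
    def signed(v):
        return v - 0x10000 if v & 0x8000 else v
    return signed((word >> 16) & 0xFFFF), signed(word & 0xFFFF)

def add2_sat(a, b):
    """Lane-wise saturating add of two packed words."""
    a_hi, a_lo = unpack2x16(a)
    b_hi, b_lo = unpack2x16(b)
    return pack2x16(sat16(a_hi + b_hi), sat16(a_lo + b_lo))

# The upper lane overflows and saturates; the lower lane adds normally.
result = unpack2x16(add2_sat(pack2x16(30000, -5), pack2x16(10000, -7)))
```

Saturation keeps an overflowing sum pinned at the range limit rather than flipping its sign, which matters when the result feeds a sign-based bit decision.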

  21. Time requirement • 2.39X speedup [Figure: bar chart of normalized time (0 to 100): original 100, unified synch + detect 68.5, 16-bit fixed-point 41.8]

  22. Integrated receiver design • Joint channel estimation and detection • Effective spreading code approach • Optimized detector design • Joint detection and decoding [Figure: receiver block diagram with channel estimation, detection, demodulation, and decoding, producing the detected bits of all K users]

  23. Linear multiuser detector • Received signal r = (UZ) b + n • Channel estimation (UZ) • Matched filter output y = (UZ)^T r • Linear detector: solve R b + n = y • R = (UZ)^T (UZ) • Size of the linear system: NK • Direct inverse takes O((NK)^3) operations • N: block length, K: number of users
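
The linear detector above can be sketched for a toy case. The example assumes two users, one bit, no noise, and a hypothetical effective code matrix A in place of UZ, with a weak first user and a strong, correlated second user. The system R b = y is solved here by plain Gaussian elimination, which is the direct (cubic) route that the later slides aim to avoid.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sign(v):
    return 1 if v >= 0 else -1

def solve(R, y):
    """Solve R x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(R)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# Hypothetical effective codes: column 0 is a weak user, column 1 a strong one.
A = [[0.5, 2.0], [-0.5, -2.0], [0.5, -2.0], [0.5, 2.0]]
bits = [1, -1]
r = [dot(row, bits) for row in A]                  # r = A b, noise-free

cols = [list(c) for c in zip(*A)]
y = [dot(c, r) for c in cols]                      # y = A^T r
R = [[dot(ci, cj) for cj in cols] for ci in cols]  # R = A^T A

mf_decision = [sign(v) for v in y]                 # matched filter only
dec_decision = [sign(v) for v in solve(R, y)]      # linear (decorrelating) detector
```

In this toy case the matched filter alone gets the weak user's bit wrong because of the strong interferer, while solving R b = y recovers both bits.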

  24. Outline of the Kronecker algorithm • Correlation matrix is block-Toeplitz • Approximate it as a block-circulant system • Kronecker representation • Isolates structure and the matrix blocks • Fourier transform converts it to a block-diagonal system • Solve N independent order-K systems iteratively • Computationally optimal
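
The reason a circulant approximation helps can be shown in the scalar case. The sketch below assumes K = 1, so each block is a single number and the system is an ordinary circulant matrix; the Fourier transform then diagonalizes it and the solve becomes one division per frequency. With K users, each frequency would instead give an independent K-by-K system. A naive DFT is used for self-containment; an FFT would take its place in practice. This illustrates the principle only and is not the algorithm from the talk.

```python
import cmath

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    direction = 1 if inverse else -1
    out = []
    for f in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(direction * 2j * cmath.pi * f * t / n)
        out.append(acc / n if inverse else acc)
    return out

def circulant_matvec(first_col, x):
    """Multiply the circulant matrix defined by its first column with x."""
    n = len(x)
    return [sum(first_col[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

def solve_circulant(first_col, y):
    """Solve C x = y by dividing in the frequency domain."""
    eigenvalues = dft(first_col)
    Y = dft(y)
    X = [Yf / lam for Yf, lam in zip(Y, eigenvalues)]
    return [v.real for v in dft(X, inverse=True)]

first_col = [4.0, 1.0, 0.0, 1.0]   # hypothetical correlation values
x_true = [1.0, -1.0, 1.0, 1.0]
y = circulant_matvec(first_col, x_true)
x_hat = solve_circulant(first_col, y)
```

The division step costs one operation per frequency, so the transform dominates the cost, which is consistent with the N log N term quoted on the next slide.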

  25. Speedup in detector • Complexity O(N^2 K^3) vs O(N K^2 + K N log N) [Figure: bar chart of achievable data rate (0 to 90 Kbps): decorrelator 10.4 Kbps, Kronecker 83.1 Kbps]

  26. Pipelining and parallelization • Mostly matrix based operations • Detector - iterative algorithm • Pipeline various iterations • Parallelize operations • Add more functional units • Distribute data across functional units • Distribute computations

  27. Projected computation time [Figure: bar chart of achievable data rate (0 to 600 Kbps) across base multiuser algorithm, hardware pipelining, and pipelining + parallelization; annotations: 20.75 Kbps, DSP only 154.3 Kbps, and 564.5 Kbps with DSP + coprocessor support (30 adders and multipliers)]

  28. Integrated receiver design • Joint channel estimation and detection • Effective spreading code approach • Optimized detector design • Joint detection and decoding [Figure: receiver block diagram with channel estimation, detection, demodulation, and decoding, producing the detected bits of all K users]

  29. Maximum a-posteriori (MAP) decoding • Received signal: r = UZd + n • Optimum decoding rule • Constrained optimization problem • Decode all users simultaneously • Exponential complexity in number of users [Figure: transmitter block diagram with data bits b, encoding, coded bits d, modulation, spreading, and other users]

  30. Single-user detection and decoding • Suboptimum alternatives • Isolate detection and decoding [Figure: r feeds matched filters MF 1 ... MF K; their outputs y_1 ... y_K go to Decoder 1 ... Decoder K, producing the estimates b̂_1 ... b̂_K]

  31. Decoding matched filter outputs • Huge performance loss! [Figure: BER (10^0 down to 10^-4) versus SNR (1 to 8 dB) for MF + Viterbi and optimal]

  32. Iterative detection and decoding • User of concern c, interfering users I • r = (UZ)_c d_c + (UZ)_I d_I + z • Estimate d_I • Eliminate interference: ŷ_c = (UZ)_c^T (r - (UZ)_I d_I) • Estimate d_c for the next step • Complexity linear in number of users
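
One cancellation step of the scheme above can be sketched as follows. The example assumes two users, no noise, a hypothetical effective code matrix A in place of UZ, and a plain sign decision standing in for each user's decoder; the talk places a channel decoder at that point, which is omitted here.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sign(v):
    return 1 if v >= 0 else -1

def column(A, k):
    return [row[k] for row in A]

def cancel_and_detect(A, r, d_est):
    """For each user c, subtract the estimated interference, then correlate."""
    K = len(A[0])
    decisions = []
    for c in range(K):
        residual = list(r)
        for k in range(K):
            if k != c:
                col = column(A, k)
                residual = [x - col[i] * d_est[k] for i, x in enumerate(residual)]
        decisions.append(sign(dot(column(A, c), residual)))
    return decisions

# Hypothetical effective codes: a weak user (column 0) and a strong one (column 1).
A = [[0.5, 2.0], [-0.5, -2.0], [0.5, -2.0], [0.5, 2.0]]
true_bits = [1, -1]
r = [dot(row, true_bits) for row in A]

stage0 = [sign(dot(column(A, k), r)) for k in range(2)]  # matched filter only
stage1 = cancel_and_detect(A, r, stage0)                 # after one cancellation pass
```

Here the first pass misjudges the weak user, but the strong user's bit is right, so removing its contribution lets the second pass recover both bits. Each pass touches every user once, in line with the linear complexity claimed on the slide.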

  33. Reduction in decoding complexity • Convolutional code • Coded bits depend on past data bits • Performance improves with memory length • Viterbi algorithm for decoding • Complexity exponential in memory length • Our suboptimal approach • Maximal weight basis decoding • Complexity quadratic in memory length

  34. Joint detection and decoding performance [Figure: BER (10^0 down to 10^-5) versus SNR (1 to 8 dB) for MF + Viterbi, Iter1 + Subopt, Iter1 + Viterbi, Iter3 + Subopt, and optimal; rate = 1/2, k = 7]

  35. Joint detection and decoding • Huge performance gain • Suboptimal approximation • Insignificant performance loss • Significant computational gain • Architecture for suboptimal decoding? • Viterbi algorithm: butterfly architecture • Have a sliding window implementation

  36. Summary of contributions • Integrated channel estimation and detection model [WCNC] • Optimized detection algorithm [PIMRC, Tr. Com] • Fixed point implementation [ICASSP, SPIE] • Parallel architecture [Asilomar] • Joint detection and decoding [Globecom, Tr. Com] • Suboptimal decoding algorithm [Asilomar, Tr. Inf. Th.]

  37. Future research • Wireless cellular • Ad-hoc network • Bluetooth/home networks • Wireless LAN

  38. Future research • Universal wireless receiver • Reconfigurable solution • Power efficient • Automate design? • Network level interaction • Resource allocation • Quality of service guarantee • Application level interaction

  39. Further details http://www.ece.rice.edu/~suman http://www.ece.rice.edu/CMC

  40. Convolutional codes • Rate: 1/2, memory (k): 2 • d_odd: systematic bits • d_even: parity bits • d2 = d1 • d4 = d1 + d3 • d6 = d1 + d3 + d5 • d8 = d3 + d5 + d7 • d10 = d5 + d7 + d9 [Figure: encoder mapping data bits b to systematic bits d_odd and parity bits d_even]
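
The parity equations above can be turned into a small encoder. The sketch assumes bits are 0/1 and that "+" on the slide means addition modulo 2 (XOR); odd positions carry the data bits and each even position carries the XOR of the current data bit and the two before it. Function names are illustrative.

```python
def encode(data_bits):
    """Rate-1/2 systematic encoder with memory 2, following the slide's equations."""
    out = []
    for j, u in enumerate(data_bits):
        prev1 = data_bits[j - 1] if j >= 1 else 0
        prev2 = data_bits[j - 2] if j >= 2 else 0
        out.append(u)                  # systematic bit (d1, d3, d5, ...)
        out.append(u ^ prev1 ^ prev2)  # parity bit (d2, d4, d6, ...)
    return out

def is_codeword(d):
    """A sequence is a codeword if re-encoding its systematic bits reproduces it."""
    return encode(d[0::2]) == list(d)

codeword = encode([1, 0, 1, 1, 0])

# Flipping a single parity bit breaks the constraints.
corrupted = list(codeword)
corrupted[3] ^= 1
```

The `is_codeword` check also shows why bit-by-bit hard decisions can fail: nothing forces an arbitrary sign pattern to satisfy the parity equations.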

  41. Suboptimal single user channel decoder • y = (y1, …, yN) • d = (d1, …, dN) • Viterbi algorithm: • Complexity grows exponentially with k • If no codeword constraint: d = sgn(y) • Estimated d may not be a codeword!

  42. Maximum weight basis decoding • d2 = d1 • d4 = d1 + d3 • d6 = d1 + d3 + d5 • d8 = d3 + d5 + d7 • d10 = d5 + d7 + d9 • More variables than equations • NR independent variables (N: block length, R: rate) • Choice depends on y_i • y = 7.5: d = 1 • y = -4.5: d = -1 • y = 0.5: d = ? • Want to choose the maximally independent subset with the largest total weight

  43. Selection of maximally independent subset • Set I = ∅ • Given y, sort the weights |y_i|, i ∈ {1..N} • While |I| < NR • Choose the location e from {1..N} with the largest weight such that I ∪ {e} is still an independent subset of {1..N} • Set I = I ∪ {e}
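
The greedy selection above can be sketched for the rate-1/2, memory-2 code from slide 40. Each codeword position is represented by a bitmask recording which data bits it is the XOR of, and a set of positions is treated as independent when those masks are linearly independent over GF(2). That representation and the helper names are assumptions made for illustration; positions are 0-indexed, so index p corresponds to d_(p+1).

```python
def position_masks(num_data_bits):
    """Bitmask per codeword position: which data bits that position XORs together."""
    masks = []
    for j in range(num_data_bits):
        masks.append(1 << j)          # systematic position
        m = 1 << j
        if j >= 1:
            m |= 1 << (j - 1)
        if j >= 2:
            m |= 1 << (j - 2)
        masks.append(m)               # parity position
    return masks

def try_insert(basis, mask):
    """GF(2) elimination: add mask to the basis if it is independent of it."""
    while mask:
        lead = mask.bit_length() - 1
        if lead not in basis:
            basis[lead] = mask
            return True
        mask ^= basis[lead]
    return False

def select_basis(y, masks, rank):
    """Greedily pick the most reliable positions that remain independent."""
    order = sorted(range(len(y)), key=lambda i: -abs(y[i]))
    basis, chosen = {}, []
    for e in order:
        if len(chosen) == rank:
            break
        if try_insert(basis, masks[e]):
            chosen.append(e)
    return chosen

masks = position_masks(5)             # N = 10 positions, NR = 5
y = [0.2, 3.1, -2.5, 0.4, 1.9, -2.8, 0.3, -1.1, -0.6, 2.2]
chosen = select_basis(y, masks, rank=5)
```

In this example the position with weight 1.9 is skipped even though it is the fifth most reliable, because it is already determined by three stronger positions through the equation d6 = d1 + d3 + d5 (with d2 = d1).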

  44. Suboptimal decoding algorithm • Choose M maximum independent subsets • For each independent subset I • Set d_e = sgn(y_e) for e in I • Compute the codeword d_I • Compute the likelihood p(y | d_I) • Choose the codeword with the largest likelihood • Decoding complexity reduced from O(2^k) to O(k^2)
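
The final selection step can be sketched as below. It assumes ±1 codeword symbols observed in additive white Gaussian noise, in which case maximizing the likelihood p(y | d) is equivalent to maximizing the correlation between y and d; the transcript does not state the noise model, so this equivalence is an assumption. The candidate list is hypothetical and would in practice come from the M independent subsets.

```python
def correlation(y, d):
    """Under the assumed Gaussian noise model, larger correlation means larger likelihood."""
    return sum(yi * di for yi, di in zip(y, d))

def pick_most_likely(y, candidates):
    """Return the candidate codeword (entries +/-1) that best matches y."""
    return max(candidates, key=lambda d: correlation(y, d))

y = [0.9, -0.2, 1.1, 0.8]
candidates = [
    [1, 1, 1, 1],
    [1, -1, 1, 1],
    [-1, -1, 1, 1],
]
best = pick_most_likely(y, candidates)
```

Only M candidates are scored, rather than every codeword, which is where the reduction in decoding effort comes from.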

  45. Performance improvement • Performance approaches single-user bound [Figure: BER (10^0 down to 10^-4) versus SNR (1 to 8 dB) for MF + MAP, 2-stage + MAP, and single user]
