Advanced Wireless Receivers: Enhancing Performance with Algorithmic Innovations

Advanced Wireless Receivers: Algorithmic and Architectural Optimizations
Suman Das
Rice University, Department of Electrical and Computer Engineering & Center for Multimedia Communication

Introduction
• Wireless is one of the fastest-growing industries
[Figure: millions of cell-phone users by year, 1993 to 2001. Source: Ericsson]
“By 2002, a lot more cellular phones are going to have internet access than PCs.” Larry Ellison, CEO, Oracle

Ubiquitous wireless connectivity: wireless cellular, ad-hoc networks, Bluetooth/home networks, wireless LAN

Why advanced receiver algorithms?
• The number of wireless subscribers is growing
• Multimedia data is replacing voice traffic
• Higher and varied data rates (144 Kbps to 2 Mbps)
• Stricter quality of service (QoS)
• Wireless bandwidth remains a critical resource
Current-generation receivers are suboptimal

Performance of advanced receivers
[Figure: bit error rate (10^0 to 10^-4) vs SNR (4 to 16 dB) for the current receiver, an advanced receiver, and the theoretical limit]
Huge performance improvement

Computational requirements of advanced receivers
• A 15-user system transmitting at 0.5 Mbps needs:
• ~20 billion additions per second
• ~15 billion multiplications per second
• 32-bit floating-point precision
That takes 50 floating-point DSPs running at 200 MHz to sustain the computation!
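
As a back-of-envelope check on the last figure, the sketch below divides the quoted operation rates by the DSP budget. It assumes additions and multiplications are lumped together as floating-point operations and that the load splits evenly across the 50 DSPs; neither assumption is stated on the slide.

```python
# Figures quoted on the slide
additions_per_s = 20e9
multiplications_per_s = 15e9
num_dsps = 50
clock_hz = 200e6

# Assumption: both kinds of operation count equally and the load divides evenly.
total_ops_per_s = additions_per_s + multiplications_per_s
ops_per_cycle_per_dsp = total_ops_per_s / (num_dsps * clock_hz)
print(f"{ops_per_cycle_per_dsp:.1f} operations per cycle per DSP")  # 3.5 operations per cycle per DSP
```

Under these assumptions, each DSP would have to retire about 3.5 floating-point operations every cycle.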

My research
• Receiver design: high performance, low complexity
• Approach: algorithmic simplification, efficient architectural mapping

Wireless channel model
• Channel effects: background noise, fading, multiple paths
• Multiple users: multiple access interference (MAI)
[Figure: base station receiving direct and reflected paths from User 1 and User 2, plus noise]

Code Division Multiple Access (CDMA)
• Wideband CDMA: the technology of choice
• Users are distinguished by their spreading sequence S(t)
[Figure: one bit spread into 7 chips (spreading gain = 7) using the sequence -1 -1 1 -1 1 1 1]
• Received signal parameters: K: number of users, P: number of paths, w: attenuation, τ: delay, b: data bits
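
A minimal spreading and despreading sketch: each bit is multiplied by the 7-chip sequence shown above, and the receiver correlates each 7-chip interval with the user's code. It assumes a single synchronous path and no noise, so the attenuation w and delay τ are left out. The second user's sequence `CODE_2` is hypothetical, picked only because its cross-correlation with the slide's sequence is -1.

```python
import numpy as np

# Spreading sequence from the slide (spreading gain G = 7).
CODE_1 = np.array([-1, -1, 1, -1, 1, 1, 1])
# Hypothetical second user's sequence, chosen only for this illustration.
CODE_2 = np.array([1, -1, 1, 1, -1, 1, -1])

def spread(bits, code):
    """Replace each data bit b in {+1, -1} by the G chips b * code."""
    return np.concatenate([b * code for b in bits])

def despread(chips, code):
    """Correlate each bit interval of G chips with the user's code."""
    return chips.reshape(-1, len(code)) @ code

# Two users share the channel; each receiver recovers its own bits by correlation.
r = spread([1, -1, 1], CODE_1) + spread([-1, -1, 1], CODE_2)
print(np.sign(despread(r, CODE_1)))  # [ 1 -1  1]
print(np.sign(despread(r, CODE_2)))  # [-1 -1  1]
```

Each correlation yields 7 times the wanted bit plus the small cross-correlation term, which is why the sign decision survives the other user.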

CDMA system
[Block diagram: transmitter (data → encoding → modulation → spreading), combined in the channel with other users; receiver (channel estimation → detection → demodulation → decoding → detected bits of all K users)]
• Proposed advanced/multiuser receiver modules are designed in isolation
• Suboptimal design

Integrated receiver design
[Receiver block diagram: channel estimation, detection, demodulation, decoding → detected bits of all K users]
• Joint channel estimation and detection
• Joint detection and decoding

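Why separate channel estimation and detection?
[Figure: the received signal feeds channel estimation through a chip-matched filter and detection through a code-matched filter; a timing diagram shows the path delay, bits $b_i$ and $b_{i+1}$, the observation $r_i$, and the processing window for channel estimation]
• The two blocks operate on different statistics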

Towards an integrated solution
• Reuse computation from the channel estimation step
• Use the same discretized filter output
• Avoid alignment to the bit interval of each user
• Reduce computation
• Save hardware

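Components of the observation vector
[Figure: a path with delay and attenuation $w_{k,p}$ spreads user k's code across two bit intervals. With bit i = +1 and bit i+1 = -1, the observation window contains the tail of bit i's code scaled by $w_{k,p}$ and the head of bit i+1's code scaled by $-w_{k,p}$]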

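Matrix representation
[Figure: with bit i = +1 and bit i+1 = +1, the window contains the code -1 -1 1 -1 1 1 1 split across the two bits; summing over paths gives $U_k Z_k b_k(i)$, plus the other users]
$r = UZ\,b_{\text{preamble}}$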
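
The split of one path's contribution into two segments can be checked numerically. The sketch builds, for a single user with integer chip delays, a matrix $U_k$ whose columns are the delayed code segments falling inside the window and a matrix $Z_k$ holding the attenuations $w_{k,p}$, then compares $U_k Z_k [b_k(i), b_k(i+1)]^T$ with a direct simulation of the delayed chip stream. The exact layout of $U_k$ and $Z_k$ on the slides is not recoverable, so this arrangement, the integer-chip delays, and the helper names (`effective_code`, `window_direct`) are assumptions.

```python
import numpy as np

def effective_code(code, delays, gains):
    """Return (U_k, Z_k) for one user; the window holds bit i's tail and bit i+1's head."""
    G = len(code)
    cols = []
    for tau in delays:                                        # integer delays, 0 <= tau < G
        tail = np.concatenate([code[G - tau:], np.zeros(G - tau)])  # end of bit i, pushed into the window
        head = np.concatenate([np.zeros(tau), code[:G - tau]])      # start of bit i+1
        cols += [tail, head]
    U = np.column_stack(cols)                                 # G x 2P
    Z = np.zeros((2 * len(delays), 2))
    for p, w in enumerate(gains):
        Z[2 * p, 0] = w                                       # tail segment scales bit i
        Z[2 * p + 1, 1] = w                                   # head segment scales bit i+1
    return U, Z

def window_direct(code, delays, gains, b_i, b_next):
    """Delay and attenuate the chip stream of bits i, i+1, then cut out bit i+1's interval."""
    G = len(code)
    x = np.concatenate([b_i * code, b_next * code])
    r = np.zeros(G)
    for tau, w in zip(delays, gains):
        delayed = np.concatenate([np.zeros(tau), x])[: 2 * G]
        r += w * delayed[G: 2 * G]
    return r

code = np.array([-1, -1, 1, -1, 1, 1, 1], dtype=float)       # sequence from the slide
U, Z = effective_code(code, delays=[0, 2, 5], gains=[1.0, 0.6, 0.3])
print(U @ Z @ np.array([1.0, -1.0]))                          # bit i = +1, bit i+1 = -1
```

The product $U_k Z_k$ is a $G \times 2$ matrix: an effective spreading code that already folds in every path's delay and attenuation.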

Efficient statistics
• Parametric approach: build a channel model (number of paths), estimate delay and attenuation, then produce the code-matched filter output
• Our approach: estimate the effective spreading code $UZ$ directly and apply the code-matched filter $y = (UZ)^T r$
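
One way to estimate the effective spreading code directly is a least-squares fit over the known preamble bits, following $r = UZ\,b_{\text{preamble}}$. The slides do not name the estimator, so least squares, the random test data, and the function names below are assumptions. Each column of `B_preamble` stacks, for every user, the two bits that overlap one observation window.

```python
import numpy as np

def estimate_effective_code(R_obs, B_preamble):
    """Least-squares estimate of UZ from known preamble bits (assumed estimator).

    R_obs:      G x M matrix, one observation window per column.
    B_preamble: 2K x M matrix of the known bits seen in each window.
    Minimizes ||R_obs - UZ @ B_preamble||_F, i.e. UZ = R B^T (B B^T)^{-1}.
    """
    UZ_T, *_ = np.linalg.lstsq(B_preamble.T, R_obs.T, rcond=None)
    return UZ_T.T

def code_matched_filter(UZ_hat, r):
    """y = (UZ)^T r for one observation window r: no path delays or gains needed."""
    return UZ_hat.T @ r

rng = np.random.default_rng(0)
G, K, M = 31, 3, 60                      # spreading gain 31; 3 users; 60 preamble windows
UZ_true = rng.standard_normal((G, 2 * K))
B = rng.choice([-1.0, 1.0], size=(2 * K, M))
UZ_hat = estimate_effective_code(UZ_true @ B, B)
```

Note that no delay, attenuation, division, or square root is extracted along the way; the estimate feeds the matched filter as is.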

Simulation parameters
• System parameters: 15 users, 3 paths, spreading gain 31
• Hardware platform: TI C62 and C67 EVM boards
• Memory: 64 KB each of internal program and data memory; 256 KB SBSRAM and 8 MB SDRAM (external)
• Code Composer 1.0 to profile the code

Effectiveness of integrated design
[Figure: bit error rate (10^0 to 10^-4) vs SNR (-4 to 16 dB) for single-user and multiuser detection, comparing the parametric approach, the UZ approach, and actual parameters]
2 dB gain in performance

Computational savings
• Avoid extracting the actual channel parameters
• Avoid realigning data for code-matched filtering
• Reduce intermediate storage requirements
• Avoid divisions (28 cycles) and square roots (38 cycles) on the DSP

Fixed-point behavior
• Fixed-point advantages: speed, power, cost
• Fixed-point analysis: only 12 bits of precision are required instead of 32 bits!
• Pack two 16-bit operations into 32-bit registers
• More packing is possible with saturation arithmetic and user power control
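
The register-packing idea can be emulated in Python: two signed 16-bit values share one 32-bit word and are added lane by lane with saturation. This is only a sketch; on the DSP a single packed instruction would handle both lanes, whereas here the lanes are unpacked explicitly, and the helper names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def saturate16(x):
    """Clamp to the signed 16-bit range instead of wrapping around."""
    return max(INT16_MIN, min(INT16_MAX, x))

def pack2(hi, lo):
    """Place two signed 16-bit values in one 32-bit word (hi in bits 31..16)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def unpack2(word):
    """Recover the two signed 16-bit lanes from a 32-bit word."""
    def to_signed(v):
        return v - 0x10000 if v & 0x8000 else v
    return to_signed((word >> 16) & 0xFFFF), to_signed(word & 0xFFFF)

def add2_saturating(a, b):
    """Lane-wise saturating add of two packed words."""
    ah, al = unpack2(a)
    bh, bl = unpack2(b)
    return pack2(saturate16(ah + bh), saturate16(al + bl))

print(unpack2(add2_saturating(pack2(30000, -5), pack2(10000, 7))))  # (32767, 2)
```

Saturation keeps an overflowing lane pinned at the largest representable value rather than flipping sign, which is what makes tighter packing tolerable.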

Time requirement
[Figure: normalized execution time. Original: 100; unified synch + detect: 68.5; 16-bit fixed-point: 41.8]
2.39× speedup (100 / 41.8)

Integrated receiver design
[Receiver block diagram: channel estimation, detection, demodulation, decoding → detected bits of all K users]
• Joint channel estimation and detection: effective spreading code approach, optimized detector design
• Joint detection and decoding

Linear multiuser detector
• Received signal: $r = (UZ)b + n$
• Channel estimation: $UZ$
• Matched filter output: $y = (UZ)^T r = Rb + (UZ)^T n$
• Linear detector: solve $R\hat{b} = y$, where $R = (UZ)^T(UZ)$
• Size of the linear system: NK
• Direct inversion takes $O((NK)^3)$ operations
(N: block length, K: number of users)
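
The linear detector can be written directly from these definitions. This sketch uses a random Gaussian matrix in place of the estimated effective code (an assumption for illustration; the real UZ is structured) and a dense solve, which is exactly the $O((NK)^3)$ cost the slide flags. The function name is hypothetical.

```python
import numpy as np

def decorrelating_detector(UZ, r):
    """Linear (decorrelating) multiuser detector.

    y = (UZ)^T r is the matched-filter output and R = (UZ)^T (UZ) the
    correlation matrix; solving R b_hat = y removes the cross-correlation.
    The dense solve costs O((NK)^3) for an NK x NK system.
    """
    y = UZ.T @ r
    R = UZ.T @ UZ
    return np.linalg.solve(R, y)

rng = np.random.default_rng(2)
N, K, G = 4, 3, 31                        # block length, users, spreading gain
UZ = rng.standard_normal((N * G, N * K))  # stand-in for the estimated effective code
b = rng.choice([-1.0, 1.0], size=N * K)
b_hat = decorrelating_detector(UZ, UZ @ b + 0.05 * rng.standard_normal(N * G))
print(np.sign(b_hat) == b)
```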

Outline of the Kronecker algorithm
• The correlation matrix is block-Toeplitz
• Approximate it as a block-circulant system
• The Kronecker representation isolates the structure and the matrix blocks
• The Fourier transform converts it to a block-diagonal system
• Solve N independent order-K systems iteratively
• Computationally optimal
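
The Fourier step can be sketched in NumPy. The sketch starts from an exactly block-circulant matrix (the slides obtain one by approximating the block-Toeplitz correlation matrix), builds it from its Kronecker representation $R = \sum_j P^j \otimes A_j$ with $P$ the cyclic shift, and solves the N per-frequency $K \times K$ systems with a direct `np.linalg.solve` rather than the iterative solver the slide mentions. Function names are hypothetical, and symmetry of R is not enforced.

```python
import numpy as np

def block_circulant(blocks):
    """Kronecker representation R = sum_j kron(P^j, A_j), P the N x N cyclic shift.

    blocks has shape (N, K, K); block (m, n) of R equals A_{(m - n) mod N}.
    """
    N = blocks.shape[0]
    P = np.roll(np.eye(N), 1, axis=0)                  # P[m, m-1] = 1 (cyclic down-shift)
    return sum(np.kron(np.linalg.matrix_power(P, j), blocks[j]) for j in range(N))

def solve_block_circulant(blocks, y):
    """Solve R b = y for block-circulant R via an FFT over the block index.

    The DFT block-diagonalizes R (Y_f = A_f B_f), leaving N independent
    K x K systems, solved directly here rather than iteratively.
    """
    N, K, _ = blocks.shape
    A_f = np.fft.fft(blocks, axis=0)                   # N frequency-domain K x K blocks
    Y_f = np.fft.fft(y.reshape(N, K), axis=0)
    B_f = np.linalg.solve(A_f, Y_f[..., None])[..., 0]
    return np.fft.ifft(B_f, axis=0).real.reshape(N * K)

rng = np.random.default_rng(0)
N, K = 8, 3
blocks = rng.uniform(-0.5, 0.5, size=(N, K, K))
blocks[0] += 20 * np.eye(K)                            # keeps every A_f well conditioned
b = rng.choice([-1.0, 1.0], size=N * K)
R = block_circulant(blocks)
```

The full $NK \times NK$ matrix is built here only so the result can be compared against it; the solver itself touches just the N blocks.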

Speedup in detector
[Figure: achievable data rate (Kbps). Decorrelator: 10.4 Kbps; Kronecker: 83.1 Kbps]
Complexity: $O(N^2K^3)$ vs $O(NK^2 + KN\log N)$

Pipelining and parallelization
• Mostly matrix-based operations
• Detector: iterative algorithm
• Pipeline the various iterations
• Parallelize operations: add more functional units, distribute data across functional units, distribute computations

Projected computation time
[Figure: achievable data rate (Kbps) for base multiuser, algorithm, hardware pipelining, and pipelining + parallelization, split into DSP only and DSP + coprocessor support. Labeled rates: 20.75 Kbps, 154.3 Kbps, and 564.5 Kbps with 30 adders and multipliers]

Maximum a-posteriori (MAP) decoding
[Diagram: transmitter, data b → encoding → coded bits d → modulation → spreading, combined with other users]
• Received signal: $r = UZd + n$
• Optimum decoding rule: a constrained optimization problem
• Decode all users simultaneously
Exponential complexity in the number of users
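
To see where the exponential cost comes from, here is an exhaustive joint decoder for a toy setting. Assumptions: every user encodes one block with a hypothetical length-3 even-parity code (4 codewords, mapped 0 to +1 and 1 to -1), codewords are equiprobable, and the noise is white Gaussian, so the block-wise MAP rule reduces to minimizing $\|r - UZd\|^2$. The function name is hypothetical.

```python
import itertools
import numpy as np

# Hypothetical toy code: the length-3 even-parity code, mapped 0 -> +1, 1 -> -1.
CODEBOOK = [np.array(c, dtype=float) for c in
            ([1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1])]

def joint_ml_decode(UZ, r, num_users):
    """Exhaustive joint decoding over every combination of codewords.

    With equiprobable codewords and white Gaussian noise, the minimum
    ||r - UZ d||^2 is the block-wise MAP (= ML) decision.
    """
    best, best_cost, tried = None, np.inf, 0
    for combo in itertools.product(CODEBOOK, repeat=num_users):   # |C|^K candidates
        d = np.concatenate(combo)
        cost = np.sum((r - UZ @ d) ** 2)
        tried += 1
        if cost < best_cost:
            best, best_cost = d, cost
    return best, tried

rng = np.random.default_rng(1)
K = 3
UZ = rng.standard_normal((40, 3 * K))
d_true = np.concatenate([CODEBOOK[2], CODEBOOK[0], CODEBOOK[3]])
d_hat, tried = joint_ml_decode(UZ, UZ @ d_true, K)
print(tried)  # 64
```

With 4 codewords per user, the search visits $4^K$ combinations; adding one user multiplies the work by 4.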

Single-user detection and decoding
[Diagram: r feeds matched filters MF 1 … MF K; each output $y_k$ goes to its own decoder k, producing $\hat{b}_k$]
• Suboptimum alternative
• Isolates detection and decoding

Decoding matched filter outputs
[Figure: BER (10^0 to 10^-4) vs SNR (1 to 8 dB), MF + Viterbi vs optimal]
Huge performance loss!

Iterative detection and decoding
User of concern c, interfering users I: $r = (UZ)_c d_c + (UZ)_I d_I + n$
• Estimate $d_I$
• Eliminate interference: $\hat{y}_c = (UZ)_c^T\,(r - (UZ)_I \hat{d}_I)$
• Estimate $d_c$ for the next step
Complexity linear in the number of users
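
A parallel version of the cancellation loop can be sketched as follows. Assumptions: each user sends a single bit, column c of `UZ` is user c's effective code, and a hard sign decision stands in for the channel decoder, so this illustrates only the interference-cancellation structure, not the joint decoding itself. Reusing the full reconstruction `UZ @ d` keeps each iteration's cost linear in the number of users. The function name is hypothetical.

```python
import numpy as np

def iterative_cancellation(UZ, r, iterations):
    """Parallel interference cancellation; hard decisions stand in for the decoders."""
    d = np.sign(UZ.T @ r)                        # start from matched-filter decisions
    for _ in range(iterations):
        total = UZ @ d                           # reconstructed signal of all users
        d_new = np.empty_like(d)
        for c in range(UZ.shape[1]):
            # r - (UZ)_I d_I: add user c's own term back instead of rebuilding the sum
            cleaned = r - total + UZ[:, c] * d[c]
            d_new[c] = np.sign(UZ[:, c] @ cleaned)   # y_c = (UZ)_c^T (...), then decide
        d = d_new
    return d

# Near-far example: user 2 is three times stronger than user 1.
UZ = np.column_stack([np.ones(4), 3.0 * np.array([1.0, 1.0, 1.0, -1.0])])
r = UZ @ np.array([-1.0, 1.0])
print(iterative_cancellation(UZ, r, 0))   # [1. 1.]   matched filter gets user 1 wrong
print(iterative_cancellation(UZ, r, 2))   # [-1.  1.]
```

Here the strong user pushes the weak user's matched-filter output to the wrong sign; after one cancellation pass the decisions become (-1, +1) and stay there.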

Reduction in decoding complexity
• Convolutional code: coded bits depend on past data bits; performance improves with memory length
• Viterbi algorithm for decoding: complexity exponential in memory length
• Our suboptimal approach: maximal weight basis decoding, with complexity quadratic in memory length

Joint detection and decoding performance
[Figure: BER (10^0 to 10^-5) vs SNR (1 to 8 dB) for MF + Viterbi, Iter1 + Subopt, Iter1 + Viterbi, Iter3 + Subopt, and optimal; rate = 1/2, k = 7]

Joint detection and decoding
• Huge performance gain
• Suboptimal approximation: insignificant performance loss, significant computational gain
• Architecture for suboptimal decoding?
• Viterbi algorithm: butterfly architecture
• Sliding-window implementation

Summary of contributions
• Integrated channel estimation and detection model [WCNC]
• Optimized detection algorithm [PIMRC, Tr. Com]
• Fixed-point implementation [ICASSP, SPIE]
• Parallel architecture [Asilomar]
• Joint detection and decoding [Globecom, Tr. Com]
• Suboptimal decoding algorithm [Asilomar, Tr. Inf. Th.]

Future research
[Diagram: wireless cellular, ad-hoc networks, Bluetooth/home networks, wireless LAN]

Future research
• Universal wireless receiver: reconfigurable solution, power efficient, automated design?
• Network-level interaction: resource allocation, quality-of-service guarantees
• Application-level interaction

Further details http://www.ece.rice.edu/~suman http://www.ece.rice.edu/CMC

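Convolutional codes
[Diagram: rate 1/2, memory k = 2 encoder; data bits b produce systematic bits $d_{\text{odd}}$ and parity bits $d_{\text{even}}$]
Parity equations (addition modulo 2): $d_2 = d_1$, $d_4 = d_1 + d_3$, $d_6 = d_1 + d_3 + d_5$, $d_8 = d_3 + d_5 + d_7$, $d_{10} = d_5 + d_7 + d_9$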
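
The parity equations can be turned into an encoder by reading off the pattern $d_{2t} = d_{2t-1} + d_{2t-3} + d_{2t-5}$ (mod 2), i.e. parity taps on the current and two previous data bits. Bits are in {0, 1}; the function name is hypothetical.

```python
def encode(bits):
    """Systematic rate-1/2, memory-2 encoder read off the parity equations.

    bits: data bits b_1..b_T in {0, 1}. Returns d_1..d_2T, where odd positions
    (d_1, d_3, ...) are systematic and even positions are parity:
    d_{2t} = b_t + b_{t-1} + b_{t-2} (mod 2), with earlier bits taken as 0.
    """
    out = []
    for t, b in enumerate(bits):
        parity = b
        if t >= 1:
            parity ^= bits[t - 1]
        if t >= 2:
            parity ^= bits[t - 2]
        out += [b, parity]
    return out

print(encode([1, 0, 1, 1]))  # [1, 1, 0, 1, 1, 0, 1, 0]
```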

Suboptimal single-user channel decoder
• $y = (y_1, \ldots, y_N)$, $d = (d_1, \ldots, d_N)$
• Viterbi algorithm: complexity grows exponentially with k
• Without the codeword constraint, $d = \operatorname{sgn}(y)$
• But the estimated d may not be a codeword!

Maximum weight basis decoding
• Parity equations: $d_2 = d_1$, $d_4 = d_1 + d_3$, $d_6 = d_1 + d_3 + d_5$, $d_8 = d_3 + d_5 + d_7$, $d_{10} = d_5 + d_7 + d_9$
• More variables than equations: only NR independent variables (N: block length, R: rate)
• The choice depends on $y_i$: $y_i = 7.5 \Rightarrow d_i = 1$; $y_i = -4.5 \Rightarrow d_i = -1$; $y_i = 0.5 \Rightarrow d_i = ?$
• Want to choose a maximally independent subset with the largest total weight

Selection of maximally independent subset
• Set $I = \emptyset$
• Given y, sort the weights $|y_i|$, $i \in \{1, \ldots, N\}$
• While $|I| < NR$: choose the location $e \in \{1, \ldots, N\}$ with the largest weight such that $I \cup \{e\}$ is still an independent subset of $\{1, \ldots, N\}$; set $I = I \cup \{e\}$ and $d_e = \operatorname{sgn}(y_e)$

Suboptimal decoding algorithm
• Choose M maximally independent subsets
• For each independent subset: compute the codeword $d_I$ and the likelihood $p(y \mid d_I)$
• Choose the codeword with the largest likelihood
Decoding complexity reduced from $O(2^k)$ to $O(k^2)$
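
The selection and decoding steps can be sketched for the rate-1/2, k = 2 code of the parity-equation slide. Assumptions: bits are handled in {0, 1}, with $d_i = +1$ mapped to bit 0 and $d_i = -1$ to bit 1; only one subset is built (M = 1), because the slides do not say how the M subsets are generated, so the likelihood comparison is omitted; helper names are hypothetical. Independence is tested by incremental Gaussian elimination over GF(2) on generator-matrix columns, which makes the selection loop the usual greedy maximum-weight-basis algorithm.

```python
def code_columns(T):
    """Generator-matrix columns of the rate-1/2, k = 2 code as bitmasks over b_1..b_T."""
    cols = []
    for t in range(T):
        cols.append(1 << t)                        # systematic: d_{2t+1} = b_{t+1}
        parity = 0
        for lag in range(3):                       # parity taps on b_t, b_{t-1}, b_{t-2}
            if t - lag >= 0:
                parity |= 1 << (t - lag)
        cols.append(parity)
    return cols

def try_insert(basis, v):
    """Add v to a GF(2) basis keyed by leading bit; return False if v is dependent."""
    while v:
        top = v.bit_length() - 1
        if top not in basis:
            basis[top] = v
            return True
        v ^= basis[top]
    return False

def gf2_solve(rows, rhs, n):
    """Solve a square, nonsingular GF(2) system by Gauss-Jordan elimination."""
    rows, rhs = list(rows), list(rhs)
    for col in range(n):
        p = next(i for i in range(col, n) if (rows[i] >> col) & 1)
        rows[col], rows[p] = rows[p], rows[col]
        rhs[col], rhs[p] = rhs[p], rhs[col]
        for i in range(n):
            if i != col and (rows[i] >> col) & 1:
                rows[i] ^= rows[col]
                rhs[i] ^= rhs[col]
    return rhs                                     # rows[i] is now 1 << i

def max_weight_basis_decode(y, T):
    """Greedy maximum weight basis decoding of one block of T data bits (M = 1)."""
    cols = code_columns(T)
    order = sorted(range(len(y)), key=lambda i: -abs(y[i]))   # most reliable first
    basis, chosen = {}, []
    for e in order:
        if len(chosen) == T:                       # |I| = NR reached
            break
        if try_insert(basis, cols[e]):             # I ∪ {e} is still independent
            chosen.append(e)
    hard = [1 if y[e] < 0 else 0 for e in chosen]  # d_e = sgn(y_e), as bits
    u = gf2_solve([cols[e] for e in chosen], hard, T)
    u_mask = sum(bit << s for s, bit in enumerate(u))
    return [bin(c & u_mask).count("1") % 2 for c in cols]     # re-encode the codeword

y = [0.5, -3.0, 2.5, -2.8, -2.2, 2.6, -2.4, 2.1]   # first coded bit is weak and wrong
print(max_weight_basis_decode(y, 4))  # [1, 1, 0, 1, 1, 0, 1, 0]
```

In this example the hard decisions sgn(y) are not a codeword, because the first coded bit has a small value of the wrong sign. The greedy basis never needs that position, and re-encoding from the four most reliable independent positions restores it.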

Performance improvement
[Figure: BER (10^0 to 10^-4) vs SNR (1 to 8 dB) for MF + MAP, 2-stage + MAP, and single user]
Performance approaches the single-user bound
