Explore advancements in wireless receiver algorithms to improve performance amidst the growing demand for higher data rates and quality of service in wireless communication. Discover how optimized receiver designs tackle complex channel effects. With a focus on advanced techniques like joint channel estimation and detection, achieve substantial performance gains and computational efficiency.
Advanced Wireless Receivers: Algorithmic and Architectural Optimizations • Suman Das • Rice University, Department of Electrical and Computer Engineering & Center for Multimedia Communication
Introduction • Wireless is one of the fastest growing industries • “By 2002, a lot more cellular phones are going to have internet access than PCs.” Larry Ellison, CEO, Oracle. [Figure: millions of cell-phone users (0 to 700) by year, 1993 to 2001. Source: Ericsson]
Ubiquitous wireless connectivity • Wireless cellular • Ad-hoc network • Bluetooth / home networks • Wireless LAN
Why advanced receiver algorithms? • The number of wireless subscribers is growing • Multimedia data is replacing voice traffic • Higher and varied data rates (144 Kbps to 2 Mbps) • Stricter quality of service (QoS) • Wireless bandwidth remains a critical resource • Current-generation receivers are suboptimal
Performance of advanced receivers • Huge performance improvement [Figure: bit error rate (10^0 down to 10^-4) versus SNR (4 to 16 dB) for the current receiver, the advanced receiver, and the theoretical limit]
Computational requirements of advanced receivers • A 15-user system transmitting at 0.5 Mbps needs • ~20 billion additions per second • ~15 billion multiplications per second • Requires 32-bit floating-point precision • 50 floating-point DSPs running at 200 MHz are needed to sustain the computation!
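As a rough check of the sizing on this slide, the quoted operation counts can be turned into a per-DSP load. This is a back-of-the-envelope sketch: the variable names are illustrative, and counting each addition or multiplication as one operation is an assumption, not something the slide states.

```python
# Figures quoted on the slide.
additions_per_s = 20e9
multiplications_per_s = 15e9
num_dsps = 50
clock_hz = 200e6

# Assumption: one addition or one multiplication counts as one operation.
total_ops_per_s = additions_per_s + multiplications_per_s
ops_per_dsp_per_s = total_ops_per_s / num_dsps
ops_per_dsp_per_cycle = ops_per_dsp_per_s / clock_hz

print(f"total load: {total_ops_per_s:.2e} ops/s")
print(f"per DSP: {ops_per_dsp_per_s:.2e} ops/s, "
      f"{ops_per_dsp_per_cycle:.1f} ops per clock cycle")
```

Under these assumptions, each of the 50 DSPs would have to sustain about 3.5 arithmetic operations per clock cycle.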
My research • Receiver design • High performance • Low complexity • Approach • Algorithmic simplification • Efficient architectural mapping
Wireless channel model • Channel effects • Background noise • Fading • Multiple paths • Multiple users • Multiple access interference (MAI) [Diagram: base station, User 1, and User 2, with a direct path, reflected paths, and noise]
Code Division Multiple Access (CDMA) • Wideband CDMA: technology of choice • Users distinguished by spreading sequence • Received signal: K: # of users, P: # of paths, w: attenuation, t: delay, b: data bits [Diagram: signal S(t) versus time; one bit spans seven chips -1 -1 1 -1 1 1 1; spreading gain = 7]
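A minimal sketch of spreading and despreading with the seven-chip sequence shown on this slide. It assumes a single user, a single path, and no noise; the function names `spread` and `despread` are illustrative and do not come from the talk.

```python
# Spreading sequence from the slide (spreading gain = 7 chips per bit).
CODE = [-1, -1, 1, -1, 1, 1, 1]


def spread(bits, code=CODE):
    """Replace each data bit (+1 or -1) by that bit times the chip sequence."""
    return [b * c for b in bits for c in code]


def despread(chips, code=CODE):
    """Correlate each block of len(code) chips with the code; keep the sign."""
    n = len(code)
    bits = []
    for start in range(0, len(chips), n):
        corr = sum(x * c for x, c in zip(chips[start:start + n], code))
        bits.append(1 if corr >= 0 else -1)
    return bits


bits = [1, -1, 1]
chips = spread(bits)          # 3 bits become 21 chips
recovered = despread(chips)   # correlation of +7 or -7 per bit
print(chips)
print(recovered)
```

With several users and several delayed paths, the correlations no longer come out as clean multiples of 7, which is the multiple access interference that the following slides address.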
CDMA system • Proposed advanced/multiuser receiver modules • Designed in isolation • Suboptimal design [Diagram: transmitter blocks (data, modulation, encoding, spreading) and other users; receiver blocks (channel estimation, detection, demodulation, decoding) producing the detected bits of all K users]
Integrated receiver design • Joint channel estimation and detection • Joint detection and decoding [Diagram: receiver blocks (channel estimation, detection, demodulation, decoding) producing the detected bits of all K users]
Why separate channel estimation and detection? • Operate on different statistics [Diagram labels: received signal, chip-matched filter, channel estimation, code-matched filter, detection; timeline with delay, bits b_i and b_{i+1}, observation r_i, and the processing window for channel estimation]
Towards an integrated solution • Reuse computation from channel estimation step • Use same discretized filter output • Avoid alignment to bit interval of each user • Reduce computation • Save hardware
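Components of the observation vector [Diagram: contributions of bit i = +1 and bit i+1 = -1 to the observation vector; the delayed spreading sequence is scaled by the attenuation w_{k,p} for bit i and by -w_{k,p} for bit i+1, and the two contributions are added]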
Matrix representation • r = U Z b_preamble [Diagram: bit i = +1 and bit i+1 = +1; user k contributes U_k Z_k b_k(i), plus other users]
Efficient statistics • Parametric approach • Build channel model (number of paths) • Estimate delay, attenuation • Produce the code-matched filter output • Our approach • Estimate effective spreading code (UZ) • Code-matched filter y = (UZ)^T r
Simulation parameters • System parameters • 15 users • 3 paths • Spreading gain - 31 • Hardware platform • TI C62 and C67 EVM boards • 64 KB each internal program & data memory • 256 KB SBSRAM, 8 MB SDRAM (external) • Code-composer 1.0 to profile code
Effectiveness of integrated design • 2 dB gain in performance [Figure: bit error rate (10^0 down to 10^-4) versus SNR (-4 to 16 dB); legend: Single User, Multiuser, Parametric approach, UZ approach, Actual Parameters]
Computational savings • Avoid extraction of actual channel parameters • Avoid realignment of data for code-matched filtering • Reduce intermediate storage requirement • Avoid divisions (28 cycles) and square-root (38 cycles) in DSP.
Fixed point behavior • Fixed-point advantages • Speed, power, cost • Fixed-point analysis • 12 bits of precision required instead of 32 bits! • Pack two 16-bit operations in 32-bit registers • More packing with • Saturation arithmetic • User power control!
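The packing and saturation ideas on this slide can be sketched in plain Python. This emulates two signed 16-bit lanes inside one 32-bit word and a lane-wise saturating add; the helper names (`sat16`, `pack2`, `unpack2`, `sat_add2`) are hypothetical and are not DSP intrinsics.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def sat16(x):
    """Clamp an integer to the signed 16-bit range instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, x))


def pack2(hi, lo):
    """Pack two signed 16-bit values into one 32-bit word."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def unpack2(word):
    """Split a 32-bit word back into two signed 16-bit values."""
    def signed(half):
        return half - 0x10000 if half & 0x8000 else half
    return signed((word >> 16) & 0xFFFF), signed(word & 0xFFFF)


def sat_add2(a, b):
    """Add two packed words lane by lane, saturating each 16-bit lane."""
    a_hi, a_lo = unpack2(a)
    b_hi, b_lo = unpack2(b)
    return pack2(sat16(a_hi + b_hi), sat16(a_lo + b_lo))


small = unpack2(sat_add2(pack2(1, 2), pack2(3, 4)))
clipped = unpack2(sat_add2(pack2(30000, -30000), pack2(10000, -10000)))
print(small, clipped)
```

Saturation keeps an overflowing lane at the largest representable value rather than letting it wrap to the opposite sign, which is why it pairs well with a narrow word length.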
Time requirement • 2.39X speedup [Bar chart: normalized time (0 to 100) for Original, Unified Synch + Detect, and 16-bit fixed-point; labeled bar values 68.5 and 41.8]
Integrated receiver design • Joint channel estimation and detection • Effective spreading code approach • Optimized detector design • Joint detection and decoding [Diagram: receiver blocks (channel estimation, detection, demodulation, decoding) producing the detected bits of all K users]
Linear multiuser detector • Received signal r = (UZ) b + n • Channel estimation (UZ) • Matched filter output y = (UZ)^T r • Linear detector: solve R b + n = y • R = (UZ)^T (UZ) • Size of the linear system: NK • Direct inverse takes O((NK)^3) operations • N: block length, K: # of users
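A small sketch of the linear detector for K = 2 users and a single bit (N = 1), without noise. The second user's code, the amplitudes, and the helper names are hypothetical; the point is only that solving R b = y recovers a weak user's bit when the plain matched filter does not.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sign(v):
    return 1 if v >= 0 else -1


def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


CODE_1 = [-1, -1, 1, -1, 1, 1, 1]   # sequence from the CDMA slide
CODE_2 = [1, -1, 1, 1, -1, 1, -1]   # hypothetical second user's code
# Effective spreading codes (columns of UZ); user 2 is 10x stronger.
uz = [[1.0 * c for c in CODE_1], [10.0 * c for c in CODE_2]]

b = [1, 1]                                              # transmitted bits
r = [uz[0][j] * b[0] + uz[1][j] * b[1] for j in range(7)]

y = [dot(uz[k], r) for k in range(2)]                   # y = (UZ)^T r
R = [[dot(uz[i], uz[j]) for j in range(2)] for i in range(2)]

matched_filter_bits = [sign(v) for v in y]
decorrelator_bits = [sign(v) for v in solve(R, y)]
print(matched_filter_bits, decorrelator_bits)
```

In this example the matched filter output for user 1 is negative because the strong interferer dominates the cross-correlation term, while the solution of the linear system returns both bits correctly.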
Outline of the Kronecker algorithm • Correlation matrix is block-Toeplitz • Approximate it as a block-circulant system • Kronecker representation • Isolates structure and the matrix blocks • Fourier transform converts it to a block-diagonal system • Solve N independent order-K systems iteratively • Computationally optimal
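The key step, that a Fourier transform diagonalizes a circulant system, can be illustrated with the scalar case (blocks of size 1). This is a sketch only: the matrix entries are made up, a direct O(N^2) DFT stands in for an FFT, and in the block-circulant case each scalar division below would become an order-K linear solve.

```python
import cmath


def dft(x, inverse=False):
    """Direct discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    s = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(s * 2j * cmath.pi * m * t / n)
               for t in range(n)) for m in range(n)]
    return [v / n for v in out] if inverse else out


def solve_circulant(c, y):
    """Solve C x = y where C[i][j] = c[(i - j) % N] (first column is c)."""
    c_f, y_f = dft(c), dft(y)
    # In the Fourier domain the system is diagonal: one division per bin.
    x_f = [yv / cv for yv, cv in zip(y_f, c_f)]
    return [v.real for v in dft(x_f, inverse=True)]


c = [4.0, 1.0, 0.0, 1.0]            # hypothetical first column of C
x_true = [1.0, -1.0, 1.0, 1.0]
n = len(c)
# Build y = C x_true explicitly so the result can be checked.
y = [sum(c[(i - j) % n] * x_true[j] for j in range(n)) for i in range(n)]

x = solve_circulant(c, y)
print([round(v, 6) for v in x])
```

The N frequency bins are independent of each other, which is what allows the block version to be solved as N separate small systems.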
Speedup in detector • Complexity O(N^2 K^3) vs O(N K^2 + K N log N) [Bar chart: achievable data rate (Kbps, 0 to 90) for Decorrelator and Kronecker; labeled values 10.4 Kbps and 83.1 Kbps]
Pipelining and parallelization • Mostly matrix based operations • Detector - iterative algorithm • Pipeline various iterations • Parallelize operations • Add more functional units • Distribute data across functional units • Distribute computations
Projected computation time [Bar chart: achievable data rate (Kbps, 0 to 600) for Base Multiuser Algorithm, Hardware Pipelining, and Pipelining + Parallelization; labeled values 20.75 Kbps, 154.3 Kbps, and 564.5 Kbps; annotations: DSP only, DSP + coprocessor support, 30 adders and multipliers]
Integrated receiver design • Joint channel estimation and detection • Effective spreading code approach • Optimized detector design • Joint detection and decoding [Diagram: receiver blocks (channel estimation, detection, demodulation, decoding) producing the detected bits of all K users]
Maximum a-posteriori (MAP) decoding • Received signal: r = UZd + n • Optimum decoding rule • Constrained optimization problem • Decode all users simultaneously • Exponential complexity in number of users [Diagram: transmitter blocks (modulation, encoding, spreading) with signals labeled b and d, plus other users]
Single-user detection and decoding • Suboptimum alternatives • Isolate detection and decoding [Diagram: received signal r feeds matched filters MF 1 ... MF K; each output y_k goes to Decoder k, which produces the bit estimate b̂_k]
Decoding matched filter outputs • Huge performance loss! [Figure: BER (10^0 down to 10^-4) versus SNR (1 to 8 dB) for MF+Viterbi and Optimal]
Iterative detection and decoding • User of concern c, interfering users I • r = (UZ)_c d_c + (UZ)_I d_I + z • Estimate d_I • Eliminate interference: ŷ_c = (UZ)_c^T (r - (UZ)_I d̂_I) • Estimate d_c for the next step • Complexity linear in number of users
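The interference-elimination step on this slide can be sketched for one user of concern and one interferer. The interferer's code and the amplitudes are hypothetical, there is no noise, and the decoder that would refine the interferer estimate between iterations is replaced here by a plain sign decision.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sign(v):
    return 1 if v >= 0 else -1


CODE_C = [-1, -1, 1, -1, 1, 1, 1]   # user of concern: sequence from the CDMA slide
CODE_I = [1, -1, 1, 1, -1, 1, -1]   # hypothetical interferer code
uz_c = [1.0 * c for c in CODE_C]    # (UZ)_c, weak user
uz_i = [10.0 * c for c in CODE_I]   # (UZ)_I, strong interferer

d_c, d_i = 1, 1                     # transmitted symbols
r = [a * d_c + b * d_i for a, b in zip(uz_c, uz_i)]

# Step 1: estimate the interferer from its own matched filter output.
d_i_hat = sign(dot(uz_i, r))

# Step 2: eliminate its contribution, then match-filter the residual.
residual = [x - u * d_i_hat for x, u in zip(r, uz_i)]
y_c_before = dot(uz_c, r)           # matched filter without cancellation
y_c_after = dot(uz_c, residual)     # y_c = (UZ)_c^T (r - (UZ)_I d_I_hat)

print(sign(y_c_before), sign(y_c_after))
```

Each user needs one subtraction pass over the other users' estimates per iteration, which is consistent with the linear complexity noted on the slide.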
Reduction in decoding complexity • Convolutional code • Coded bits depend on past data bits • Performance improves with memory length • Viterbi algorithm for decoding • Complexity exponential in memory length • Our suboptimal approach • Maximal weight basis decoding • Complexity quadratic in memory length
Joint detection and decoding performance [Figure: BER (10^0 down to 10^-5) versus SNR (1 to 8 dB) for MF + Viterbi, Iter1 + Subopt, Iter1 + Viterbi, Iter3 + Subopt, and Optimal; rate = 1/2, k = 7]
Joint detection and decoding • Huge performance gain • Suboptimal approximation • Insignificant performance loss • Significant computational gain • Architecture for suboptimal decoding? • Viterbi algorithm: butterfly architecture • Have a sliding window implementation
Summary of contributions • Integrated channel estimation and detection model [WCNC] • Optimized detection algorithm [PIMRC, Tr. Com] • Fixed point implementation [ICASSP, SPIE] • Parallel architecture [Asilomar] • Joint detection and decoding [Globecom, Tr. Com] • Suboptimal decoding algorithm [Asilomar, Tr. Inf. Th.]
Future research • Wireless cellular • Ad-hoc network • Bluetooth / home networks • Wireless LAN
Future research • Universal wireless receiver • Reconfigurable solution • Power efficient • Automate design? • Network level interaction • Resource allocation • Quality of service guarantee • Application level interaction
Further details http://www.ece.rice.edu/~suman http://www.ece.rice.edu/CMC
Convolutional codes • Rate: 1/2, memory (k): 2 • d_odd: systematic bits, d_even: parity bits • d2 = d1 • d4 = d1 + d3 • d6 = d1 + d3 + d5 • d8 = d3 + d5 + d7 • d10 = d5 + d7 + d9 [Diagram: encoder with input b and outputs d_odd and d_even]
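The parity equations on this slide describe a systematic rate-1/2 encoder in which each parity bit is the modulo-2 sum of the current data bit and the two previous ones. A minimal sketch, assuming bits are represented as 0/1 and that the encoder starts from an all-zero state; the function name `encode` is illustrative.

```python
def encode(data):
    """Systematic rate-1/2 encoder with memory 2.

    Output positions (1-based) d1, d3, d5, ... carry the data bits and
    d2, d4, d6, ... carry parity = current XOR previous XOR one before that.
    """
    out = []
    for j, bit in enumerate(data):
        parity = bit
        if j >= 1:
            parity ^= data[j - 1]
        if j >= 2:
            parity ^= data[j - 2]
        out += [bit, parity]
    return out


data = [1, 0, 1, 1, 0]
codeword = encode(data)
print(codeword)  # [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
```

For this input, d2 = d1 = 1, d4 = d1 + d3 = 1, and d10 = d5 + d7 + d9 = 1 + 1 + 0 = 0 (mod 2), matching the equations on the slide.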
Suboptimal single user channel decoder • y = (y1, …, yN) • d = (d1, …, dN) • Viterbi algorithm: complexity grows exponentially with k • If no codeword constraint: d = sgn(y) • Estimated d may not be a codeword!
Maximum weight basis decoding • d2 = d1, d4 = d1 + d3, d6 = d1 + d3 + d5, d8 = d3 + d5 + d7, d10 = d5 + d7 + d9 • More variables than equations • NR independent variables (N: block length, R: rate) • Choice depends on y_i • y = 7.5: d = 1 • y = -4.5: d = -1 • y = 0.5: d = ? • Want to choose the maximally independent subset with the largest total weight
Selection of maximally independent subset • Set I = ∅ • Given y, sort the weights |y_i|, i ∈ {1..N} • While |I| < NR • Choose the location e from {1..N} with the largest weight such that I ∪ {e} is still an independent subset of {1..N} • Set I = I ∪ {e}
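The greedy selection on this slide can be sketched as follows. It assumes that "independent" means the chosen codeword positions are linearly independent over GF(2) as functions of the data bits, and it uses the memory-2 parity equations from the convolutional-code slide; positions are 0-based, each position is stored as a bitmask over the data bits, and all names and soft values are hypothetical.

```python
def column_masks(n_data):
    """Bitmask of data bits feeding each codeword position (0-based)."""
    cols = []
    for j in range(n_data):
        systematic = 1 << j
        parity = systematic
        if j >= 1:
            parity |= 1 << (j - 1)
        if j >= 2:
            parity |= 1 << (j - 2)
        cols += [systematic, parity]
    return cols


def select_independent(y, cols, n_data):
    """Greedily pick the heaviest positions that stay independent over GF(2)."""
    order = sorted(range(len(y)), key=lambda i: -abs(y[i]))
    basis = {}      # leading bit -> reduced mask (XOR linear basis)
    chosen = []
    for e in order:
        v = cols[e]
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v      # e adds a new dimension: keep it
                chosen.append(e)
                break
            v ^= basis[lead]         # reduce against the current basis
        if len(chosen) == n_data:
            break
    return chosen


# Hypothetical soft outputs for a block of 5 data bits (10 coded positions).
y = [7.5, 4.0, -0.5, 3.0, 6.0, -5.0, 0.2, -4.5, -2.0, -1.0]
cols = column_masks(5)
chosen = select_independent(y, cols, 5)
print(chosen)  # [0, 4, 5, 7, 8]
```

Position 1 is skipped even though its weight is large, because d2 = d1 makes it dependent on position 0, which was already selected.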
Suboptimal decoding algorithm • Choose M maximum independent subsets • For each independent subset I • Compute the codeword d_I, with d_e = sgn(y_e) for e in I • Compute the likelihood p(y|d_I) • Choose the codeword with the largest likelihood • Decoding complexity reduced from O(2^k) to O(k^2)
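A sketch of the decoding loop on this slide, under several assumptions: the code is the memory-2 example from the convolutional-code slide, bit 1 maps to +1 and bit 0 to -1, the candidate subsets are supplied by hand, and candidates are ranked by the correlation sum, which orders them the same way as the likelihood p(y|d_I) under Gaussian noise. All names and soft values are hypothetical.

```python
def column_masks(n_data):
    """Bitmask of data bits feeding each codeword position (0-based)."""
    cols = []
    for j in range(n_data):
        parity = 1 << j
        if j >= 1:
            parity |= 1 << (j - 1)
        if j >= 2:
            parity |= 1 << (j - 2)
        cols += [1 << j, parity]
    return cols


def solve_gf2(masks, rhs, n):
    """Gauss-Jordan over GF(2); masks must be n independent equations."""
    rows = list(zip(masks, rhs))
    for bit in range(n):
        p = next(i for i in range(bit, n) if (rows[i][0] >> bit) & 1)
        rows[bit], rows[p] = rows[p], rows[bit]
        pm, pr = rows[bit]
        for i in range(n):
            if i != bit and (rows[i][0] >> bit) & 1:
                rows[i] = (rows[i][0] ^ pm, rows[i][1] ^ pr)
    return [rows[bit][1] for bit in range(n)]


def codeword_from_subset(y, subset, cols, n_data):
    """Fix hard decisions on the subset, solve for data bits, re-encode."""
    rhs = [1 if y[e] > 0 else 0 for e in subset]
    data = solve_gf2([cols[e] for e in subset], rhs, n_data)
    x = sum(bit << j for j, bit in enumerate(data))
    return [bin(m & x).count("1") % 2 for m in cols]


def decode(y, subsets, cols, n_data):
    """Return the candidate codeword with the largest correlation to y."""
    best_metric, best_cw = None, None
    for subset in subsets:
        cw = codeword_from_subset(y, subset, cols, n_data)
        metric = sum(v * (1 if c else -1) for v, c in zip(y, cw))
        if best_metric is None or metric > best_metric:
            best_metric, best_cw = metric, cw
    return best_cw


# Soft outputs with one weak, wrong-signed value at position 2.
y = [7.5, 4.0, 0.5, 3.0, 6.0, -5.0, 0.2, -4.5, -2.0, -1.0]
cols = column_masks(5)
subsets = [[0, 4, 5, 7, 8], [0, 2, 4, 6, 8]]   # hand-picked candidates
decoded = decode(y, subsets, cols, 5)
print(decoded)
```

The second subset trusts the unreliable position 2 and yields a codeword that correlates poorly with y, so the first candidate wins; every candidate is a valid codeword by construction, unlike a plain sgn(y) decision.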
Performance improvement • Performance approaches single-user bound [Figure: BER (10^0 down to 10^-4) versus SNR (1 to 8 dB) for MF+MAP, 2stage + MAP, and Single User]