
Algorithms and Architectures for Future Wireless Base-Stations

Algorithms and Architectures for Future Wireless Base-Stations. Sridhar Rajagopal and Joseph Cavallaro, ECE Department, Rice University, April 19, 2000. This work is supported by Texas Instruments, Nokia, Texas Advanced Technology Program and NSF.


Presentation Transcript


  1. Algorithms and Architectures for Future Wireless Base-Stations Sridhar Rajagopal and Joseph Cavallaro ECE Department Rice University April 19, 2000 This work is supported by Texas Instruments, Nokia, Texas Advanced Technology Program and NSF

  2. Overview • Future Base-Stations • Current DSP Implementation • Our Approach • Make Algorithms Computationally effective • Task Partitioning for pipelining, parallelism • Processor Design for Accelerating Wireless TI Meeting

  3. Evolution of Wireless Communications • First Generation: Voice • Second/Current Generation: Voice + Low-rate Data (9.6 Kbps) • Third Generation (W-CDMA): Voice + High-rate Data (2 Mbps) + Multimedia

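  4. Communication System: Uplink [Figure: User 1 and User 2 transmit to the base station over a direct path and reflected paths; the received signal is corrupted by noise and multiple-access interference (MAI).]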

  5. Baseband Layer of Base-Station Receiver • Main Processing Blocks: Channel Estimation, Detection, Decoding

  6. Proposed Base-Station [Figure: TI's Wireless Basestation (http://www.ti.com/sc/docs/psheets/diagrams/basestat.htm), annotated "No Multiuser Detection".]

  7. Real-Time Requirements • Multiple Data Rates by Varying Spreading Factors • Detection needs to be done in real time • 1953 cycles available on a C6x DSP at 250 MHz to detect 1 bit at 128 Kbps (250 MHz / 128 Kbps ≈ 1953)
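
A quick check of the cycle budget, as a minimal sketch: the number of cycles per bit is the clock rate divided by the bit rate, rounded down. The function name is hypothetical; the 166 MHz C67 clock is taken from slide 8.

```python
def cycles_per_bit(clock_hz: int, bit_rate_bps: int) -> int:
    """Whole processor cycles available per detected bit (rounded down)."""
    return clock_hz // bit_rate_bps


budget_c6x = cycles_per_bit(250_000_000, 128_000)  # C6x at 250 MHz
budget_c67 = cycles_per_bit(166_000_000, 128_000)  # C67 at 166 MHz (slide 8)
print(budget_c6x, budget_c67)  # 1953 1296
```

Any per-bit detection work that does not fit inside this budget pushes the achievable data rate below 128 Kbps.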

  8. Current DSP Implementation [Figure: Data Rate Comparisons for Matched Filter and Multiuser Detector. Data rates achieved ($\times 10^4$ bps) versus number of users (9 to 15) for the matched filter and the multiuser detector on a C67 at 166 MHz and projected (8x) on a C64*, against the targeted data rate of 128 Kbps.]

  9. Complexity • Algorithm Choice Limited by Complexity • Multistage detection reduces the data rate by half • Main Features • Matrix-based operations • High levels of parallelism • Bit-level computations • 32×32 problem size for the Detector shown • Estimation, Decoding assumed pipelined

  10. Reasons • Sophisticated, Compute-Intensive Algorithms • Need more MIPS/FLOPS performance • Unable to fully exploit pipelining or parallelism • Bit-level computations / Storage

  11. Our Approach • Make algorithms computationally effective • without sacrificing error rate performance • Task Partitioning on Multiple Processing Elements • DSPs: Core • FPGAs: Application Specific / Bit-level Computations • Processor with reconfigurable support and extensions for wireless

  12. Algorithms • Channel Estimation • Avoid inversion by iterative scheme • Detection • Avoid block-based detection by pipelining

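  13. Computations Involved • Model: received bits $r_i$ of spreading length N for K users; bits of the K asynchronous users are aligned at times $i$ and $i-1$ [Figure: user bits $b_i$ and $b_{i+1}$ and the received window $r_i$ on a time axis, offset by each user's delay] • Compute Correlation Matrices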

  14. Multishot Detection • Solve for the channel estimate, $A_i$

  15. Differencing Multistage Detection • Stage 0: Matched Filter • Stage 1 • Successive Stages • $S = \mathrm{diag}(A^H A)$; $y$: soft decisions; $d$: detected bits (hard decisions)
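
The multistage recursion can be sketched as follows, under simplifying assumptions not stated on the slide: a synchronous, real-valued BPSK model with correlation matrix $R = A^H A$, matched-filter outputs $y$, and $S = \mathrm{diag}(R)$. The conventional stage computes $y - (R - S)d$; the differencing form instead updates the previous soft decisions using only $d^{(l-1)} - d^{(l-2)}$, whose entries are 0 or $\pm 2$ after the first stage, so bits whose decisions did not change cost nothing. Function names are hypothetical.

```python
def hard(x):
    """BPSK hard decision; ties go to +1 (assumption)."""
    return 1 if x >= 0 else -1


def multistage_direct(R, y, stages):
    """Conventional multistage detection: z = y - (R - S) d at every stage."""
    K = len(y)
    z = list(y)                          # stage 0: matched-filter outputs
    d = [hard(v) for v in z]
    for _ in range(stages):
        # j != k skips the diagonal, i.e. uses R - S
        z = [y[k] - sum(R[k][j] * d[j] for j in range(K) if j != k)
             for k in range(K)]
        d = [hard(v) for v in z]
    return d, z


def multistage_differencing(R, y, stages):
    """Same recursion, updating z with only the change in decisions."""
    K = len(y)
    z = list(y)
    d_prev = [0] * K                     # no decisions exist before stage 0
    d = [hard(v) for v in z]
    for _ in range(stages):
        # entries are 0 or +/-2 once d_prev holds real decisions
        delta = [d[j] - d_prev[j] for j in range(K)]
        z = [z[k] - sum(R[k][j] * delta[j] for j in range(K)
                        if j != k and delta[j] != 0)
             for k in range(K)]
        d_prev, d = d, [hard(v) for v in z]
    return d, z


R = [[2, 3, 0], [3, 2, 3], [0, 3, 2]]
y = [1, 1, -1]
print(multistage_direct(R, y, 2))
print(multistage_differencing(R, y, 2))
```

With integer inputs both functions return identical results, which is a simple way to confirm that the differencing form is an exact rewrite of the recursion rather than an approximation.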

  16. Iterative Scheme • Tracking • Method of Steepest Descent • Stable convergence behavior • Same performance as direct inversion (slide 17)
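
A minimal sketch of the iterative idea, assuming a real-valued, symmetric positive-definite system solved one column at a time (for example one column of $R_{bb}A^H = R_{br}$ from slide 21). Steepest descent replaces the matrix inversion with repeated matrix-vector products; tracking is modeled here, as an assumption, by warm-starting from the previous estimate. All names are hypothetical.

```python
def dot(u, v):
    """Inner product of two real vectors."""
    return sum(x * y for x, y in zip(u, v))


def matvec(M, v):
    """Matrix-vector product with M given as a list of rows."""
    return [dot(row, v) for row in M]


def steepest_descent(R, b, x0, iters):
    """Approximate R x = b for symmetric positive-definite R without inverting R."""
    x = list(x0)
    for _ in range(iters):
        r = [bi - yi for bi, yi in zip(b, matvec(R, x))]  # residual = -gradient
        rr = dot(r, r)
        if rr == 0.0:
            break                                         # exact solution reached
        alpha = rr / dot(r, matvec(R, r))                 # exact line-search step
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    return x


R = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x_cold = steepest_descent(R, b, [0.0, 0.0], 50)    # from scratch
x_warm = steepest_descent(R, b, [0.09, 0.63], 5)   # tracking: start near the old estimate
print(x_cold)  # close to [1/11, 7/11]
```

Fewer iterations are needed when the warm start is already close to the new solution; the iteration counts above are illustrative and not taken from the slides.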

  17. Comparison of Bit Error Rates (BER) [Figure: BER ($10^{-3}$ to $10^{-1}$) versus signal-to-noise ratio (SNR, 4 to 12) for MF and ML with the iterative $O(K^2N)$ estimate, and ActMF and ActML with the $O(K^3 + K^2N)$ inversion-based estimate.] Simulations: AWGN channel, detection window = 12, SINR = 0, paths = 3, preamble L = 150, spreading N = 31, users K = 15, 10000 bits/user. MF: Matched Filter; ML: Maximum Likelihood; Act: using inversion.

  18. Fading Channel with Tracking [Figure: BER ($10^{-3}$ to $10^{0}$) versus SNR (4 to 12) for MF and ML, each with static and tracking channel estimates.] Doppler = 10 Hz, 1000 bits, 15 users, 3 paths

  19. Block Based Detector [Figure: the matched filter and Stages 1, 2 and 3 each process the window of bits 1 to 12 and output bits 2 to 11; the next window covers bits 11 to 22 and outputs bits 12 to 21.]
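
The window bookkeeping in the figure can be written out as below. This is a sketch assuming a 12-bit window that overlaps its neighbour by 2 bits and outputs only its interior bits, matching the 1 to 12 / 2 to 11 and 11 to 22 / 12 to 21 indices above; dropping the edge bits is presumably because their neighbouring bits fall outside the window. The function name is hypothetical.

```python
def block_windows(num_windows, window=12):
    """1-based (first, last, out_first, out_last) bit indices for each window.

    Windows overlap by 2 bits and drop their first and last bit, so the
    output ranges tile the bit stream without gaps or repeats.
    """
    step = window - 2                  # new output bits per window
    result = []
    for k in range(num_windows):
        first = k * step + 1
        last = first + window - 1
        result.append((first, last, first + 1, last - 1))
    return result


print(block_windows(2))  # [(1, 12, 2, 11), (11, 22, 12, 21)]
```

Each window computes 12 bits to output 10, so about 20% of the per-bit work is repeated, and in this block-based scheme no bit is output until all three stages have finished its whole window.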

  20. Pipelined Detector [Figure: bits 1 to 12 flow through the matched filter and Stages 1, 2 and 3 as a continuous pipeline rather than in fixed blocks.]

  21. Task Decomposition [Asilomar99] [Figure: Blocks I to IV. Channel estimation from pilot and data bits: correlation matrices (per bit) $R_{bb}$, $O(K^2)$, and $R_{br}[R]$, $R_{br}[I]$, $O(KN)$ each; inverse, solving $R_{bb}A^H = R_{br}[R]$ and $R_{bb}A^H = R_{br}[I]$, $O(K^2N)$ each; matrix products $A_0^HA_0$, $A_0^HA_1$, $A_1^HA_1$, $O(K^2N)$ each. Multistage detector: matched filter $A^Hr$, $O(KND)$, and multistage detection (per window), $O(DK^2M_e)$, producing detected bits $d$; multiplexers (MUX) select the bits $b$ or $d$ fed to the correlation matrices.]

  22. Achieved Data Rates [Figure: Data Rates for Different Levels of Pipelining and Parallelism. Data rate ($\times 10^5$ bps, 0 to 3) versus number of users (9 to 15) for Sequential A + B, (Parallel A) B, (Parallel A) (Pipe B) and (Parallel A) (Parallel+Pipe B), against the data rate requirement of 128 Kbps.]

  23. VLSI Implementation • Channel Estimation as a Case Study • Area-Time Efficient Architecture • Real-Time Implementation • Bit-Level Computations: FPGAs • Core Operations: DSPs

  24. Motivation for Architecture • Wireless, the next wave after Multimedia • Highly Compute-Intensive Algorithms • Real-Time Requirements

  25. Outline • Processor Core with Reconfigurable Support • Permutation Based Interleaved Memory • Processor Architecture: EPIC • Instruction Set Extensions • Truncated Multipliers • Software Support Needed

  26. Characteristics of Wireless Algorithms • Massive Parallelism • Bit-level Computations • Matrix Based Operations • Memory Intensive • Complex-valued Data • Approximate Computations

  27. What’s wrong with Current Architectures for these applications?

  28. Problems with Current Architectures • UltraSPARC, C6x, MMX, IA-64 • Not enough MIPS/FLOPS • Unable to fully exploit parallelism • Bit Level Computations • Memory Bottlenecks • Specialized Instructions for Wireless Communications

  29. Why Reconfigurable [Figure: Home Area Wireless LAN, Outdoor CDMA Cellular Network, High Speed Office Wireless LAN] • Adapt algorithms to environment • Seamless and Continuous Data Processing during Handoffs

  30. Reconfigurable Support [Figure: protocol stack. OSI Layers 3-7: User Interface, Translation, Synchronization, Transport, Network; OSI Layer 2: Data Link Layer (converts frames to bits); OSI Layer 1: Physical Layer (hardware; raw bit stream)]

  31. Different Protocols • MPEG-4, H.723: Voice, Multimedia • Convolutional, Turbo: Channel Coding [Figure: transmitter with Source Coding and Channel Coding; receiver with Channel Estimation, Multiuser Detection, Channel Decoding and Source Decoding]

  32. A New Architecture [Figure: Processor Core (GPP/DSP) with cache and main memory, queues (Q), crossbar, reconfigurable logic, RF unit supplying a real-time I/O bit stream, add-on PCMCIA card processor]

  33. Why Reconfigurable • Process initial bit-level computations • Optimize for fast I/O transfer [Figure: RF unit feeding a real-time I/O bit stream into the reconfigurable logic]

  34. Reconfigurable Support [Figure: GARP Architecture at UC Berkeley: 64-bit datapath, two 64-bit data buses, one 64-bit address bus, control blocks, Boolean values, fast I/O, configuration caches, sequencer]

  35. Reconfigurable Support • Wide Path to Memory • Data Transfer • Minimize Load Times • Configuration Caches • Recently Displaced Configurations (5 cycles) • Can hold 4 full-size Configurations • Independent Execution

  36. Reconfigurable Support • Access to same Memory System as Processor • Minimize overhead • When idle • Load Configurations • Transfer Data

  37. Memory Interface [Figure: Processor Core (GPP/DSP) with instruction cache and L1 data cache, main memory, queues (Q), crossbar, FPGA] • Access to Main Memory and L1 Data Cache • Large, fast Memory Store • Memory Prefetch Queues for Sequential Accesses • Read-aheads and Write-behinds

  38. Permutation Based Interleaved Memory (PBI) • High Memory Bandwidth Needed • Stride-Insensitive Memory System for Matrices • Multiple Banks • Sustained Peak Throughput (95%) [Figure: main memory and L1 data cache]
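
The slides do not spell out the PBI permutation, so the sketch below substitutes a common XOR-based bank mapping to illustrate what "stride-insensitive" means, assuming 8 banks and word addresses. Walking down a column of an 8×8 row-major matrix (stride 8) hits a single bank under plain low-order interleaving but all 8 banks under the XOR mapping. The stand-in is not conflict-free for every stride: stride 16 still reaches only 4 banks. All names are hypothetical.

```python
NUM_BANKS = 8   # assumed; must be a power of two
BANK_BITS = 3   # log2(NUM_BANKS)


def low_order_bank(addr: int) -> int:
    """Conventional interleaving: bank = address mod number of banks."""
    return addr % NUM_BANKS


def xor_bank(addr: int) -> int:
    """XOR-permuted interleaving (stand-in for PBI): mix in higher address bits."""
    return (addr ^ (addr >> BANK_BITS)) % NUM_BANKS


def banks_touched(bank_of, start: int, stride: int, count: int) -> set:
    """Distinct banks hit by `count` accesses starting at `start` with `stride`."""
    return {bank_of(start + i * stride) for i in range(count)}


# Column walk of an 8x8 row-major matrix: stride 8.
print(len(banks_touched(low_order_bank, 0, 8, 8)))  # 1
print(len(banks_touched(xor_bank, 0, 8, 8)))        # 8
```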

  39. Processor Core [Figure: Processor Core (GPP/DSP), cache, queues (Q), crossbar, FPGA] • 64-bit EPIC Architecture with Extensions (IA-64/C6x) • Statically determined Parallelism; exploit ILP • Execution Time Predictability

  40. EPIC Principle • Explicitly Parallel Instruction Computing • Evolution of VLIW Computing • Compiler: Key role • Architecture to assist Compiler • Better cope with dynamic factors • which limited VLIW Parallelism

  41. Instruction Set Extensions • To accelerate Bit-level computations in Wireless • Real/Complex Integer-Bit Multiplications • Used in Multiuser Detection, Decoding • Bit-Bit Multiplications • Used in Outer Product Updates • Correlation, Channel Estimation • Complex Integer-Integer Multiplications • Useful in other Signal Processing applications • Speech, Video, ...

  42. Architecture Support • Support via Instruction Set Extensions • Minimal ALU Modifications necessary • Transparent to Register Files/Memory • Additional 8-bit Special Purpose Registers

  43. Integer-Bit Multiplications • $D = D + b \cdot C$, e.g., Cross-Correlation [Figure: 64-bit registers C, A and D, an 8-bit register b, and a row of +/- units] • Register Renaming?
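
A software model of the integer-bit multiply-accumulate $D = D + b \cdot C$, assuming (not stated on the slide) that the 64-bit registers hold eight signed 8-bit lanes that wrap on overflow, and that bit $j$ of the 8-bit register b encodes $+1$ when set and $-1$ when clear. Because multiplying an integer by a $\pm 1$ bit is only a sign choice, each lane needs an adder/subtractor rather than a multiplier. All names are hypothetical.

```python
LANES = 8
LANE_BITS = 8
LANE_MASK = (1 << LANE_BITS) - 1
SIGN_BIT = 1 << (LANE_BITS - 1)


def get_lane(word: int, j: int) -> int:
    """Signed value of 8-bit lane j of a 64-bit word (two's complement)."""
    v = (word >> (j * LANE_BITS)) & LANE_MASK
    return v - (1 << LANE_BITS) if v & SIGN_BIT else v


def pack_lanes(values) -> int:
    """Pack eight integers into a 64-bit word, wrapping each to 8 bits."""
    word = 0
    for j, v in enumerate(values):
        word |= (v & LANE_MASK) << (j * LANE_BITS)
    return word


def int_bit_mac(d: int, c: int, b: int) -> int:
    """D + b*C: bit j of b set means +C_j, clear means -C_j (assumed encoding)."""
    lanes = []
    for j in range(LANES):
        cj, dj = get_lane(c, j), get_lane(d, j)
        lanes.append(dj + cj if (b >> j) & 1 else dj - cj)
    return pack_lanes(lanes)


d = pack_lanes([0] * 8)
c = pack_lanes([1, 2, 3, 4, 5, 6, 7, 8])
d = int_bit_mac(d, c, 0b10101010)
print([get_lane(d, j) for j in range(8)])  # [-1, 2, -3, 4, -5, 6, -7, 8]
```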

  44. 8-bit to 64-bit Conversions • $D = D + b b^T$, e.g., Auto-Correlation • $b_1 = b(1{:}8), b(1{:}8), \ldots, b(1{:}8)$ (the 8-bit register b repeated 8 times) • $b_2 = b(1) \ldots b(1), \ldots, b(8) \ldots b(8)$ (each bit of b repeated 8 times) [Figure: 8-bit register b expanded into 64-bit register A]

  45. Bit-Bit Multiplications • $D = D + b b^T$, e.g., Auto-Correlation • $b_1 \cdot b_2$ computed with a bitwise Ex-NOR: 64-bit register A = $b_1$, 64-bit register B = $b_2$, 64-bit register C = $b_1 \cdot b_2$
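
The register expansions of slide 44 and the Ex-NOR of slide 45 together produce the 64 sign products of the outer product $b b^T$. This sketch assumes bit value 1 encodes $+1$ and 0 encodes $-1$; under that encoding the product of two signs is $+1$ exactly when the bits are equal, which is Ex-NOR. Bit $8j + k$ of the result holds entry $(j, k)$. Function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def expand_b1(b: int) -> int:
    """b1: the 8-bit word b repeated 8 times, so bit 8*j + k is b(k)."""
    return sum(b << (8 * j) for j in range(8))


def expand_b2(b: int) -> int:
    """b2: each bit of b repeated 8 times, so bit 8*j + k is b(j)."""
    out = 0
    for j in range(8):
        if (b >> j) & 1:
            out |= 0xFF << (8 * j)
    return out


def xnor64(x: int, y: int) -> int:
    """Bitwise Ex-NOR of two 64-bit words."""
    return ~(x ^ y) & MASK64


def outer_sign_bits(b: int) -> int:
    """Sign products b(j)*b(k) for all 64 (j, k) pairs, one bit each."""
    return xnor64(expand_b1(b), expand_b2(b))


print(hex(outer_sign_bits(0b10110010)))
```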

  46. Increment/Decrement • $D = D + b b^T$, e.g., Auto-Correlation [Figure: an 8-bit slice of $b_1 \cdot b_2$ selects +1 or -1 (+/-) for each element of 64-bit register D, producing $D + b_1 \cdot b_2$]
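
The increment/decrement step accumulates the sign products into D. The sketch below models D as an 8×8 matrix of ordinary integers and consumes one 8-bit slice of the product word per row, which is an assumption about how the 8-bit register in the figure is used. It assumes bit value 1 encodes $+1$ and 0 encodes $-1$, and computes the product bits directly with an equality test rather than register expansion. Names are hypothetical.

```python
def outer_sign_bits(b: int) -> int:
    """Bit 8*j + k is 1 when b(j) == b(k), i.e. when the sign product is +1."""
    word = 0
    for j in range(8):
        for k in range(8):
            if ((b >> j) & 1) == ((b >> k) & 1):
                word |= 1 << (8 * j + k)
    return word


def inc_dec_update(D, prod: int) -> None:
    """D = D + b*b^T using only +1/-1 updates driven by the product bits."""
    for j in range(8):
        row_bits = (prod >> (8 * j)) & 0xFF    # one 8-bit slice per row
        for k in range(8):
            D[j][k] += 1 if (row_bits >> k) & 1 else -1


# Accumulate the auto-correlation of a few bit vectors.
D = [[0] * 8 for _ in range(8)]
for b in (0b10110010, 0b01100111, 0b11110000):
    inc_dec_update(D, outer_sign_bits(b))
print(D[0][0], D[0][1])  # 3 1
```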

  47. Complex-valued Data Processing • Is it easy to add? • Is it worth additional ALU support? • Typically supported by software!
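
For reference, complex multiplication in software costs four real multiplies and two additions; a well-known rearrangement uses three multiplies and five additions, trading a multiplier for adders. This is a generic sketch, not a description of the proposed ALU; function names are hypothetical.

```python
def cmul4(a: int, b: int, c: int, d: int) -> tuple:
    """(a + jb)(c + jd) using four real multiplies."""
    return a * c - b * d, a * d + b * c


def cmul3(a: int, b: int, c: int, d: int) -> tuple:
    """Same product using three real multiplies and five additions."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2


print(cmul4(3, 4, 5, -2), cmul3(3, 4, 5, -2))  # (23, 14) (23, 14)
```

With fixed-point data the three-multiply form needs one extra bit of headroom for the sums $a + b$, $d - c$ and $c + d$.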

  48. Truncated Multipliers [Figure: ALU multipliers: Multiplier 1, Multiplier 2, Truncated Multiplier] • Many applications need approximate computations • Adaptive Algorithms: $Y = Y + \mu (Y C)$ • Truncate lower bits • Truncated Multipliers: half the area, half the delay • Can do 2 truncated multiplies in parallel with regular
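
A bit-level model of an unsigned $n \times n$ truncated multiplier, sketched under the assumption of the simplest variant: partial-product bits in the $n$ low-order columns are never generated, and only the upper $n$ bits of the product are returned. Skipping those columns removes roughly half of the partial-product array. The dropped columns can hold at most $\sum_{k=0}^{n-1}(k+1)2^k = (n-1)2^n + 1$, so the result is never above $\lfloor ab / 2^n \rfloor$ and at most $n - 1$ below it. Practical designs usually add a correction constant to reduce this bias. The function name is hypothetical.

```python
def truncated_mul(a: int, b: int, n: int) -> int:
    """Approximate (a * b) >> n for n-bit unsigned a and b.

    Only partial-product bits a(i)*b(j) in columns i + j >= n are summed,
    modelling a multiplier that omits its n low-order columns.
    """
    acc = 0
    for i in range(n):
        if (a >> i) & 1:
            for j in range(n - i, n):           # columns i + j >= n only
                if (b >> j) & 1:
                    acc += 1 << (i + j)
    return acc >> n   # acc is a multiple of 2**n, so this shift is exact


exact = (200 * 150) >> 8
approx = truncated_mul(200, 150, 8)
print(exact, approx)
```

In an adaptive update such as $Y = Y + \mu (Y C)$ with a small step size, an error of a few least-significant units in each product may be tolerable, which is the trade the slide proposes.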

  49. Software Support • Greater Interaction between Compilers and Architectures • EPIC • Reconfigurable Logic • Compiler needs to find and exploit bit-level computations • Reconfigurable Logic Programming

  50. Other Uses • Reconfigurable Logic • For accelerating loops of general purpose processors • Bit-Level Support • For other voice, video and multimedia applications
