
Architectural Exploration: 802.11a Transmitter Arvind, Nirav Dave, Steve Gerding, Mike Pellauer

Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology. MIT-Nokia Architecture Group, Helsinki, June 5, 2006.


Presentation Transcript


  1. Architectural Exploration: 802.11a Transmitter. Arvind, Nirav Dave, Steve Gerding, Mike Pellauer. Computer Science & Artificial Intelligence Laboratory, Massachusetts Institute of Technology. MIT-Nokia Architecture Group, Helsinki, June 5, 2006

  2. Why architectural exploration
     • Architects are clever people and can think of a variety of designs
     • But they often cannot determine which design is best for a given metric (e.g., power)
     • There is too little time and manpower to take several designs far enough for proper evaluation
     ⇒ Guesswork instead of architectural exploration
     New design tools can change all that

  3. This talk
     • Architectural exploration of 802.11a transmitter
     • The goal is to show that it is easy and economical to do so in Bluespec
     • You don't have to know 802.11a or Bluespec to understand the talk

  4. 802.11a Transmitter Overview
     [Block diagram: headers and data (24 uncoded bits) → Controller → Scrambler → Encoder → Interleaver → Mapper → IFFT → Cyclic Extend]
     • Depending upon the transmission rate, consumes 1, 2 or 4 tokens to produce one OFDM symbol
     • One OFDM symbol is 64 complex numbers
     • IFFT transforms 64 (frequency domain) complex numbers into 64 (time domain) complex numbers; it accounts for > 95% of the area
     • Must produce one OFDM symbol every 4 μs

  5. Combinational IFFT
     [Diagram: inputs in0 … in63 pass through three stages of 16 Radix-4 nodes, each stage followed by a permutation (Permute_1, Permute_2, Permute_3), to outputs out0 … out63. Inset: a Radix-4 node built from multipliers (* t0 … * t3, * j), adders and subtractors]
     All numbers are complex and represented as two sixteen-bit quantities. Fixed-point arithmetic is used to reduce area, power, ...

  6. Design Tradeoffs
     • We can decrease the area by multiplexing some circuits. It may be a win if the throughput requirements can be met without increasing the frequency
     • Power can be lowered by lowering the frequency, which in turn allows the voltage to be lowered: power ∝ (voltage)²
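
The tradeoff on this slide can be made concrete with the usual first-order model of CMOS dynamic power, P ∝ C·V²·f. That model is an assumption added here, not something stated in the talk, and the function name is hypothetical. The sketch below only computes ratios relative to a baseline design.

```python
def relative_dynamic_power(voltage_scale, frequency_scale):
    """Dynamic power relative to a baseline, assuming P is proportional to V^2 * f.

    voltage_scale and frequency_scale are ratios to the baseline
    (1.0 means unchanged). Switched capacitance is assumed constant.
    """
    return voltage_scale ** 2 * frequency_scale


# Halving both the clock and the supply voltage cuts power to one eighth
# under this model; halving the clock alone only halves it.
both = relative_dynamic_power(0.5, 0.5)
clock_only = relative_dynamic_power(1.0, 0.5)
```

Under this model the energy per operation scales with V² alone, which is why a design that meets the throughput target at a lower clock (and hence a lower voltage) can win on energy even if it uses more area.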

  7. Combinational IFFT: Opportunity for reuse
     [Diagram: the same three stages of 16 Radix-4 nodes, each followed by a permutation (Permute_1, Permute_2, Permute_3), from in0 … in63 to out0 … out63]
     Reuse the same circuit three times

  8. Circular pipeline: Reusing the Pipeline Stage
     [Diagram: one stage of 16 Radix-4 nodes between in0 … in63 and out0 … out63, with Permute_1, Permute_2 and Permute_3, 64 4-way muxes and a stage counter]
     16 Radix 4s can be shared but not the three permutations. Hence the need for muxes

  9. Superfolded circular pipeline: Just one Radix-4 node!
     [Diagram: a single Radix-4 node with 4 16-way muxes and 4 16-way demuxes driven by an index counter (0 to 15); 64 4-way muxes with Permute_1, Permute_2 and Permute_3 driven by a stage counter (0 to 2); inputs in0 … in63, outputs out0 … out63]
     Designs with 2, 4, and 8 Radix-4 modules make sense too!
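
A rough sketch of what folding costs in clock rate. It assumes that the available Radix-4 nodes process one batch per clock cycle, that only the IFFT limits throughput, and that the 802.11a symbol period is 4 μs. The function names and the one-batch-per-cycle model are assumptions for illustration, not figures from the talk.

```python
SYMBOL_PERIOD_S = 4e-6   # one OFDM symbol every 4 microseconds
STAGES = 3               # three Radix-4 stages in the 64-point IFFT
NODES_PER_STAGE = 16     # sixteen Radix-4 nodes per stage


def cycles_per_symbol(radix4_units):
    """Clock cycles needed for one symbol with the given number of shared nodes."""
    if radix4_units <= 0 or NODES_PER_STAGE % radix4_units != 0:
        raise ValueError("radix4_units must divide 16")
    return STAGES * (NODES_PER_STAGE // radix4_units)


def min_clock_hz(radix4_units):
    """Lowest clock frequency that still produces one symbol per symbol period."""
    return cycles_per_symbol(radix4_units) / SYMBOL_PERIOD_S


# Cycles per symbol for the folded designs mentioned on the slides.
table = {n: cycles_per_symbol(n) for n in (16, 8, 4, 2, 1)}
```

With one Radix-4 node the model needs 48 cycles per symbol, so the clock has to run 16 times faster than in the design that shares a full stage of 16 nodes. Whether the area saved outweighs the higher frequency is exactly the question the exploration is meant to answer.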

  10. Which design consumes the least energy to transmit a symbol?
      • Can we quickly code up all the alternatives?
      • Single source with parameters?
      Not practical in traditional hardware description languages like Verilog/VHDL

  11. Expressing the designs in Bluespec

  12. Bluespec code: Radix-4 Node
      [Diagram: Radix-4 node built from four multipliers, a multiplication by j, and two layers of adders and subtractors]

      function Vector#(4,Complex) radix4(Vector#(4,Complex) t, Vector#(4,Complex) k);
        Vector#(4,Complex) m = newVector(), y = newVector(), z = newVector();
        // Multiply each input by its twiddle factor
        m[0] = k[0] * t[0]; m[1] = k[1] * t[1];
        m[2] = k[2] * t[2]; m[3] = k[3] * t[3];
        // First layer of additions and subtractions
        y[0] = m[0] + m[2]; y[1] = m[0] - m[2];
        y[2] = m[1] + m[3]; y[3] = i*(m[1] - m[3]);  // i is the imaginary unit (*j in the diagram)
        // Second layer of additions and subtractions
        z[0] = y[0] + y[2]; z[1] = y[1] + y[3];
        z[2] = y[0] - y[2]; z[3] = y[1] - y[3];
        return(z);
      endfunction

      Polymorphic code: works on any type of numbers for which *, + and - have been defined
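
For readers who want to check the arithmetic of the node, here is a software model in Python rather than Bluespec. It mirrors the slide's statements one for one using Python's built-in complex numbers, so it does not model the 16-bit fixed-point representation. The argument names follow the slide.

```python
def radix4(t, k):
    """Software model of the slide's Radix-4 node.

    t and k are sequences of four numbers; the node multiplies them
    element-wise and then applies two layers of adds and subtracts.
    """
    m = [k[n] * t[n] for n in range(4)]
    # First layer
    y = [m[0] + m[2],
         m[0] - m[2],
         m[1] + m[3],
         1j * (m[1] - m[3])]   # multiplication by the imaginary unit
    # Second layer
    z = [y[0] + y[2],
         y[1] + y[3],
         y[0] - y[2],
         y[1] - y[3]]
    return z


# With unit twiddles, an impulse spreads evenly over the four outputs
# and a constant input collapses onto output 0.
impulse = radix4([1, 1, 1, 1], [1, 0, 0, 0])
constant = radix4([1, 1, 1, 1], [1, 1, 1, 1])
```

Expanding the two layers gives z[1] = m[0] + i·m[1] − m[2] − i·m[3], so with unit twiddles the node computes an unnormalized 4-point inverse DFT of its inputs.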

  13. Combinational IFFT: Can be used as a reference
      [Diagram: three stages of 16 Radix-4 nodes, each followed by a permutation (Permute_1, Permute_2, Permute_3), from in0 … in63 to out0 … out63]
      stage_f function: repeat it three times

  14. Bluespec Code for Combinational IFFT
      The code is unfolded to generate a combinational circuit.

      function SVector#(64, Complex) ifft (SVector#(64, Complex) in_data);
        // Declare vectors: stage_data[s] is the input to stage s
        SVector#(4,SVector#(64, Complex)) stage_data = replicate(newSVector);
        stage_data[0] = in_data;
        for (Integer stage = 0; stage < 3; stage = stage + 1)
          stage_data[stage+1] = stage_f(fromInteger(stage), stage_data[stage]);
        return(stage_data[3]);
      endfunction

      Stage function:

      function SVector#(64, Complex) stage_f(Bit#(2) stage, SVector#(64, Complex) stage_in);
        // Declarations of stage_temp and stage_out are not shown on the slide
        begin
          // 16 Radix-4 nodes, each working on four consecutive elements
          for (Integer i = 0; i < 16; i = i + 1)
            begin
              Integer idx = i * 4;
              let twid = getTwiddle(stage, fromInteger(i));
              let y = radix4(twid, stage_in[idx:idx+3]);
              stage_temp[idx] = y[0]; stage_temp[idx + 1] = y[1];
              stage_temp[idx + 2] = y[2]; stage_temp[idx + 3] = y[3];
            end
          // Permutation
          for (Integer i = 0; i < 64; i = i + 1)
            stage_out[i] = stage_temp[permute[i]];
        end
        return(stage_out);
      endfunction
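
The three-stage structure can be mirrored in a small Python model. The slides do not give the twiddle values or the permutation tables, so this sketch takes them as parameters: get_twiddle and permutes are hypothetical placeholders, and the example run uses unit twiddles and identity permutations purely to exercise the structure. It is not a working 64-point IFFT.

```python
def radix4(t, k):
    """Radix-4 node: element-wise multiply, then two add/subtract layers."""
    m = [k[n] * t[n] for n in range(4)]
    y = [m[0] + m[2], m[0] - m[2], m[1] + m[3], 1j * (m[1] - m[3])]
    return [y[0] + y[2], y[1] + y[3], y[0] - y[2], y[1] - y[3]]


def stage_f(stage, stage_in, get_twiddle, permute):
    """One stage: 16 Radix-4 nodes on consecutive groups of 4, then a permutation."""
    stage_temp = [0j] * 64
    for i in range(16):
        idx = 4 * i
        stage_temp[idx:idx + 4] = radix4(get_twiddle(stage, i),
                                         stage_in[idx:idx + 4])
    return [stage_temp[permute[j]] for j in range(64)]


def ifft_structure(in_data, get_twiddle, permutes):
    """Chain three stages; permutes[s] is the permutation used after stage s."""
    data = list(in_data)
    for stage in range(3):
        data = stage_f(stage, data, get_twiddle, permutes[stage])
    return data


# Placeholder parameters (assumptions): unit twiddles, identity permutations.
unit_twiddle = lambda stage, i: [1, 1, 1, 1]
identity = list(range(64))
x = [1] + [0] * 63
out = ifft_structure(x, unit_twiddle, [identity] * 3)
```

Because stage_f takes the stage number as an ordinary argument, the same function serves the combinational, pipelined and folded designs; only the code that calls it changes.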

  15. Synchronous pipeline
      [Diagram: x → inQ → f1 → sReg1 → f2 → sReg2 → f3 → outQ]

      rule sync_pipeline (True);
        inQ.deq();
        sReg1 <= f1(inQ.first());
        sReg2 <= f2(sReg1);
        outQ.enq(f3(sReg2));
      endrule

      This is real IFFT code; just replace f1, f2 and f3 with stage_f code
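
A cycle-level Python model of this rule may help show its timing. A rule reads the old register values and commits all writes together, which the model imitates by computing the next values before assigning them. The use of None to mark an empty register, and draining the pipeline once the input runs out, are assumptions added for the model; the slide's rule has no valid bits and simply waits for input.

```python
from collections import deque


def run_sync_pipeline(inputs, f1, f2, f3):
    """Fire the pipeline once per cycle until every input has left it."""
    in_q, out_q = deque(inputs), []
    s_reg1 = s_reg2 = None          # None marks an empty register (assumption)
    cycles = 0
    while in_q or s_reg1 is not None or s_reg2 is not None:
        # Read old state first, then commit all updates at once.
        if s_reg2 is not None:
            out_q.append(f3(s_reg2))
        next2 = f2(s_reg1) if s_reg1 is not None else None
        next1 = f1(in_q.popleft()) if in_q else None
        s_reg1, s_reg2 = next1, next2
        cycles += 1
    return out_q, cycles


# Three inputs through three stages: one result per cycle once the pipe is full.
results, cycles = run_sync_pipeline(
    [1, 2, 3], lambda v: v + 1, lambda v: v * 2, lambda v: v - 3)
```

In this model n inputs take n + 2 cycles, so the pipeline sustains one result per cycle at the cost of three copies of the stage logic.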

  16. Folded pipeline
      [Diagram: x → inQ → f (f1, f2 or f3, selected by stage) → outQ, with sReg feeding the result back to the input of f]

      function f (stage,sx);
        case (stage)
          1: return f1(sx);
          2: return f2(sx);
          3: return f3(sx);
        endcase
      endfunction

      rule folded_pipeline (True);
        if (stage==1)
          begin inQ.deq(); sxIn = inQ.first(); end
        else sxIn = sReg;
        sxOut = f(stage,sxIn);
        if (stage==3) outQ.enq(sxOut);
        else sReg <= sxOut;
        stage <= (stage==3)? 1 : stage+1;
      endrule

      This is real IFFT code too ...
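
The folded rule can be modeled the same way in Python. The sketch follows the slide's control flow step by step; the loop condition that stops the simulation when the input is exhausted is an addition for the model.

```python
from collections import deque


def run_folded_pipeline(inputs, f1, f2, f3):
    """One shared stage function applied three times per input."""
    fs = {1: f1, 2: f2, 3: f3}
    in_q, out_q = deque(inputs), []
    stage, s_reg, cycles = 1, None, 0
    while in_q or stage != 1:
        # Stage 1 takes a fresh input; later stages reuse the register.
        sx_in = in_q.popleft() if stage == 1 else s_reg
        sx_out = fs[stage](sx_in)
        if stage == 3:
            out_q.append(sx_out)
        else:
            s_reg = sx_out
        stage = 1 if stage == 3 else stage + 1
        cycles += 1
    return out_q, cycles


# Same functions and inputs as a three-stage pipeline would use.
results, cycles = run_folded_pipeline(
    [1, 2, 3], lambda v: v + 1, lambda v: v * 2, lambda v: v - 3)
```

The folded design produces the same values as three separate stages would, but takes three cycles per input: area is traded for a threefold higher clock at the same throughput.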

  17. Expressing these designs in Bluespec is easy
      • All these designs were done in less than one day!
      • Area and power estimates?
      How long will it take to write these designs in Verilog? VHDL? SystemC?

  18. Bluespec Tool flow
      [Flow diagram: Bluespec SystemVerilog source → Bluespec Compiler → Verilog 95 RTL or C]
      • C: cycle-accurate Bluespec C sim
      • Verilog 95 RTL: Verilog sim (VCD output, Debussy visualization) and RTL synthesis
      • RTL synthesis → gates: power estimation tool (Sequence Design PowerTheater), FPGA, Place & Route → Physical → Tapeout

  19. 802.11a Transmitter Synthesis results for various IFFT designs TSMC .18 micron; numbers reported are before place and route. Some areas will be larger after layout.

  20. Algorithmic Improvements
      [Diagram: three stages of 16 Radix-4 nodes, each followed by a permutation (Permute_1, Permute_2, Permute_3), from in0 … in63 to out0 … out63]
      1. All three permutations can be made identical ⇒ more saving in area
      2. One multiplication can be removed from Radix-4

  21. 802.11a Transmitter Synthesis results: old vs. new IFFT designs ??? expected TSMC .18 micron; numbers reported are before place and route.

  22. 802.11a Transmitter Synthesis results with new IFFT designs TSMC .18 micron; numbers reported are before place and route.

  23. 802.11a Transmitter with new IFFT designs: Power Estimates (work in progress)
      • c3 = min clock × scaling factor
      • c4 is raw data collected by the Sequence Design PowerTheater
      • c5 = c4 × c3 / 100 MHz / voltage scaling (= 10)
      • c6 = c5 × 4 μsec
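
The column formulas above can be written out as a small Python function. This is a sketch of the arithmetic only. It assumes c4 is a power figure taken at a 100 MHz reference clock, which is one reading of the division by 100 MHz, and it leaves the unit of c4 open because the transcript does not give it. The function and parameter names are hypothetical, and the numbers in the example are made up for illustration; they are not results from the talk.

```python
def power_estimate(c4_raw, c3_clock_hz,
                   ref_clock_hz=100e6, voltage_scaling=10.0,
                   symbol_period_s=4e-6):
    """Apply the slide's formulas for columns c5 and c6.

    c5 rescales the raw figure from the reference clock to the design's
    clock and divides by the voltage-scaling factor; c6 multiplies c5 by
    the symbol period to get a per-symbol figure.
    """
    c5 = c4_raw * c3_clock_hz / ref_clock_hz / voltage_scaling
    c6 = c5 * symbol_period_s
    return c5, c6


# Illustrative inputs only: a raw figure of 100 units and a 10 MHz clock.
c5, c6 = power_estimate(100.0, 10e6)
```

If c4 is in milliwatts, c6 comes out in millijoules per symbol, which is the quantity needed to answer which design uses the least energy to transmit a symbol.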

  24. Summary
      • It is essential to do architectural exploration for better (area, power, performance, ...) designs.
      • It is possible to do so with new design tools and methodologies.
      • Better and faster tools for estimating area, timing and power would dramatically increase our capability to do architectural exploration.
      Thanks
