
Architectural Exploration: Area-Performance tradeoff in 802.11a Transmitter Arvind


Presentation Transcript


  1. Architectural Exploration: Area-Performance Tradeoff in 802.11a Transmitter. Arvind, Computer Science & Artificial Intelligence Lab, Massachusetts Institute of Technology. http://csg.csail.mit.edu/arvind

  2. 802.11a Transmitter Overview
     Block chain: Controller -> Scrambler -> Encoder -> Interleaver -> Mapper -> IFFT -> Cyclic Extend
     Inputs: headers and data (24 uncoded bits).
     • Depending upon the transmission rate, 1, 2 or 4 tokens are consumed to produce one OFDM symbol.
     • The IFFT transforms 64 (frequency-domain) complex numbers into 64 (time-domain) complex numbers, i.e. one OFDM symbol. It accounts for 85% of the area.
     • The transmitter must produce one OFDM symbol every 4 µsec.
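To make the IFFT's role concrete, here is a minimal Python sketch of a direct 64-point inverse DFT. It is not the Bfly4-based hardware structure from the slides, only a reference model of the same input/output relationship; the `1/N` scaling and the name `idft` are assumptions made for illustration.

```python
import cmath

def idft(freq):
    """Direct O(N^2) inverse DFT. The 1/N scaling is an assumed convention."""
    n = len(freq)
    return [
        sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

# One OFDM symbol: 64 frequency-domain values in, 64 time-domain values out.
symbol_in = [0j] * 64
symbol_in[1] = 1 + 0j            # energy on a single subcarrier
symbol_out = idft(symbol_in)     # a complex sinusoid with one cycle over 64 samples
```

A hardware IFFT built from radix-4 butterflies computes the same transform with far fewer multiplications, which is why the slides organize it as three stages of Bfly4 nodes.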

  3. Preliminary results [MEMOCODE 2006]: Dave, Gerding, Pellauer, Arvind

     Design Block     Lines of Code (BSV)   Relative Area
     Controller        49                     0%
     Scrambler         40                     0%
     Conv. Encoder    113                     0%
     Interleaver       76                     1%
     Mapper           112                    11%
     IFFT              95                    85%
     Cyc. Extender     23                     3%

     Complex arithmetic libraries constitute another 200 lines of code.

  4. Combinational IFFT
     [Figure: inputs in0 … in63 pass through three stages of Bfly4 nodes (x16 per stage), each stage followed by a Permute block, producing out0 … out63.]
     Reuse the same circuit three times to reduce area.

  5. Design Alternatives
     [Figure: alternative arrangements of blocks f and g.]
     Reuse a block over multiple cycles. We expect:
     • Throughput to decrease (less parallelism)
     • Area to decrease (reusing a block)
     The clock needs to run faster for the same throughput ⇒ hyper-linear increase in energy.

  6. Circular pipeline: Reusing the Pipeline Stage
     [Figure: in0 … in63 feed a single stage of Bfly4 nodes followed by a Permute block; a stage counter controls the reuse, and results leave as out0 … out63.]

  7. Superfolded circular pipeline: Just one Bfly-4 node!
     [Figure: in0 … in63, a single Bfly4 node, a Permute block, out0 … out63.] Labels on the figure:
     • 64 2-way muxes
     • 4 16-way muxes
     • 4 16-way demuxes
     • Stage: 0 to 2
     • Index: 0 to 15, with an "Index == 15?" check
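The cost of folding can be put in numbers with a small Python sketch: the minimum clock each variant needs to sustain one OFDM symbol per symbol period. The cycle counts follow the slide annotations (three passes for the folded design; 3 stages times 16 indices for the superfolded one). The 4 microsecond symbol period is an assumption taken from the 802.11a symbol timing, and all names are hypothetical.

```python
SYMBOL_PERIOD_S = 4e-6  # assumption: 802.11a OFDM symbol period of 4 microseconds

# Clock cycles the IFFT spends per OFDM symbol in each design variant.
CYCLES_PER_SYMBOL = {
    "combinational": 1,
    "folded (16 Bfly4, 3 passes)": 3,
    "superfolded (1 Bfly4, 3 stages x 16 indices)": 3 * 16,
}

def min_clock_hz(cycles_per_symbol, symbol_period_s=SYMBOL_PERIOD_S):
    """Lowest clock frequency that still delivers one symbol per symbol period."""
    return cycles_per_symbol / symbol_period_s

min_clocks = {name: min_clock_hz(c) for name, c in CYCLES_PER_SYMBOL.items()}
for name, hz in min_clocks.items():
    print(f"{name}: {hz / 1e6:.2f} MHz")
```

Under these assumptions every variant's minimum clock is modest, so the more heavily folded designs can still meet the throughput requirement while using less area.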

  8. Pipelining a block
     • Combinational (C): inQ -> f1 f2 f3 -> outQ
     • Pipeline (P): inQ -> f1 | f2 | f3 -> outQ
     • Folded Pipeline (FP): inQ -> f -> outQ
     Clock? Area? Throughput?
     • Clock: C < P ≈ FP
     • Area: FP < C < P
     • Throughput: FP < C < P

  9. Synchronous pipeline
     [Figure: x -> inQ -> f1 -> sReg1 -> f2 -> sReg2 -> f3 -> outQ]

         rule sync_pipeline (True);
            inQ.deq();
            sReg1 <= f1(inQ.first());
            sReg2 <= f2(sReg1);
            outQ.enq(f3(sReg2));
         endrule

     This rule can fire only if:
     - inQ has an element
     - outQ has space
     Atomicity: either all or none of the state elements inQ, outQ, sReg1 and sReg2 will be updated.
     This is real IFFT code; just replace f1, f2 and f3 with stage_f code.
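The guard and atomicity described above can be modeled in a few lines of Python. This is a behavioral sketch, not Bluespec semantics in full: the class name, the FIFO capacity and the reset values of 0 are all assumptions made for illustration.

```python
from collections import deque

class SyncPipeline:
    """Behavioral model of the three-stage rule: fires atomically or not at all."""

    def __init__(self, f1, f2, f3, out_capacity=4):
        self.f1, self.f2, self.f3 = f1, f2, f3
        self.in_q = deque()
        self.out_q = deque()
        self.out_capacity = out_capacity  # hypothetical outQ depth
        self.s_reg1 = 0                   # arbitrary reset values (assumption)
        self.s_reg2 = 0

    def can_fire(self):
        # Implicit guard: inQ has an element and outQ has space.
        return len(self.in_q) > 0 and len(self.out_q) < self.out_capacity

    def step(self):
        if not self.can_fire():
            return False                  # nothing is updated
        # Every right-hand side reads the state from before the rule fired.
        new1 = self.f1(self.in_q.popleft())
        new2 = self.f2(self.s_reg1)
        out = self.f3(self.s_reg2)
        # All updates are committed together.
        self.s_reg1, self.s_reg2 = new1, new2
        self.out_q.append(out)
        return True

pipe = SyncPipeline(lambda x: x + 1, lambda x: x * 2, lambda x: x - 3)
pipe.in_q.extend([10, 20, 30])
fired = [pipe.step() for _ in range(4)]   # the fourth attempt finds inQ empty
```

In this run the first two outputs are computed from the reset values rather than real data, and the tokens for 20 and 30 stay stuck in the registers once inQ drains. Both effects come from the rule having no notion of an empty register.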

  10. Stage functions f1, f2 and f3

          function f1(x);
             return (stage_f(1,x));
          endfunction

          function f2(x);
             return (stage_f(2,x));
          endfunction

          function f3(x);
             return (stage_f(3,x));
          endfunction

      The stage_f function was given earlier.

  11. Problem: What about pipeline bubbles?
      [Figure: x -> inQ -> f1 -> sReg1 -> f2 -> sReg2 -> f3 -> outQ, with a red and a green token in the registers.]

          rule sync_pipeline (True);
             inQ.deq();
             sReg1 <= f1(inQ.first());
             sReg2 <= f2(sReg1);
             outQ.enq(f3(sReg2));
          endrule

      • Red and Green tokens must move even if there is nothing in the inQ!
      • Also, if there is no token in sReg2 then nothing should be enqueued in the outQ.
      Modify the rule to deal with these conditions, using valid bits or the Maybe type.

  12. The Maybe type data in the pipeline
      Each value carries data plus a valid/invalid tag.

          typedef union tagged {
             void   Invalid;
             data_T Valid;
          } Maybe#(type data_T);

      Registers contain Maybe type values.

          rule sync_pipeline (True);
             if (inQ.notEmpty())
                begin sReg1 <= tagged Valid f1(inQ.first()); inQ.deq(); end
             else sReg1 <= tagged Invalid;
             case (sReg1) matches
                tagged Valid .sx1: sReg2 <= tagged Valid f2(sx1);
                tagged Invalid:    sReg2 <= tagged Invalid;
             endcase
             case (sReg2) matches
                tagged Valid .sx2: outQ.enq(f3(sx2));
             endcase
          endrule
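A short Python model shows how the valid/invalid tag lets tokens drain out of the pipeline when inQ is empty. It is a sketch under simplifying assumptions: `None` stands in for `Invalid` (so `None` cannot be a payload), outQ is treated as unbounded, and the class name is hypothetical.

```python
from collections import deque

INVALID = None  # stands in for the Invalid tag; valid payloads must not be None

class MaybePipeline:
    """Three-stage pipeline whose registers hold either a value or INVALID."""

    def __init__(self, f1, f2, f3):
        self.f1, self.f2, self.f3 = f1, f2, f3
        self.in_q, self.out_q = deque(), deque()   # outQ assumed unbounded
        self.s_reg1 = INVALID
        self.s_reg2 = INVALID

    def step(self):
        # Stage 1: take a token if one is available, otherwise insert a bubble.
        next1 = self.f1(self.in_q.popleft()) if self.in_q else INVALID
        # Stage 2: propagate either the computed value or the bubble.
        next2 = INVALID if self.s_reg1 is INVALID else self.f2(self.s_reg1)
        # Stage 3: enqueue only when sReg2 actually holds a token.
        if self.s_reg2 is not INVALID:
            self.out_q.append(self.f3(self.s_reg2))
        # Commit register updates together, after all reads of the old state.
        self.s_reg1, self.s_reg2 = next1, next2

pipe = MaybePipeline(lambda x: x + 1, lambda x: x * 2, lambda x: x - 3)
pipe.in_q.extend([10, 20])
for _ in range(4):
    pipe.step()
```

After four steps both tokens have left the pipeline even though inQ ran dry after the second step, and no output was produced from an empty register.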

  13. Folded pipeline
      [Figure: x -> inQ -> f -> outQ, with sReg feeding back into f and a stage register selecting the stage.]

          rule folded_pipeline (True);
             if (stage==0)
                begin sxIn = inQ.first(); inQ.deq(); end
             else
                sxIn = sReg;
             sxOut = f(stage,sxIn);
             if (stage==n-1) outQ.enq(sxOut);
             else sReg <= sxOut;
             stage <= (stage==n-1) ? 0 : stage+1;
          endrule

      • Notice that stage is a dynamic parameter now!
      • There is no for-loop.
      • Type declarations are needed for sxIn and sxOut.
      • The same code will work for superfolded pipelines by changing n and the stage function f.
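The folded rule translates almost line for line into a Python model, which makes the role of the dynamic `stage` counter easy to trace. This is an illustrative sketch: the class name is hypothetical, outQ is assumed unbounded, and the stage function used in the example is made up so that each pass is visible in the result.

```python
from collections import deque

class FoldedPipeline:
    """One copy of f reused over n cycles, selected by a run-time stage counter."""

    def __init__(self, f, n):
        self.f, self.n = f, n
        self.in_q, self.out_q = deque(), deque()   # outQ assumed unbounded
        self.stage = 0
        self.s_reg = None

    def step(self):
        if self.stage == 0:
            if not self.in_q:
                return False            # guard: stage 0 needs an input token
            sx_in = self.in_q.popleft()
        else:
            sx_in = self.s_reg
        sx_out = self.f(self.stage, sx_in)
        if self.stage == self.n - 1:
            self.out_q.append(sx_out)   # last pass: result leaves the pipeline
        else:
            self.s_reg = sx_out         # otherwise feed it back for the next pass
        self.stage = 0 if self.stage == self.n - 1 else self.stage + 1
        return True

# Hypothetical stage function: appends the stage number as a decimal digit.
folded = FoldedPipeline(lambda stage, x: x * 10 + stage, n=3)
folded.in_q.extend([7, 8])
fired = [folded.step() for _ in range(7)]
```

Each token occupies the single stage for n cycles, so throughput drops by a factor of n compared with the synchronous pipeline, in exchange for instantiating f once.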

  14. 802.11a Transmitter Synthesis results (only the IFFT block is changing)
      [Table of synthesis results not preserved in the transcript; surviving label: "The same source code".]
      All these designs were done in less than 24 hours!
      TSMC 0.18 micron; numbers reported are before place and route.

  15. Why are the areas so similar?
      • Folding should have given a 3x improvement in IFFT area
      • BUT a constant twiddle allows low-level optimization on a Bfly-4 block
        • a 2.5x area reduction!

  16. Parameterize the synchronous pipeline
      [Figure: x -> inQ -> f1 -> sReg[1] -> … -> sReg[n-1] -> fn -> outQ]
      n and stage are static parameters.

          Vector#(n, Reg#(t)) sReg <- replicateM(mkReg(tagged Invalid));

          rule sync_pipeline (True);
             if (inQ.notEmpty())
                begin (sReg[1]) <= tagged Valid f(0,inQ.first()); inQ.deq(); end
             else (sReg[1]) <= tagged Invalid;
             for (Integer stage = 1; stage < n-1; stage = stage+1)
                case (sReg[stage]) matches
                   tagged Valid .sx: (sReg[stage+1]) <= tagged Valid f(stage,sx);
                   tagged Invalid  : (sReg[stage+1]) <= tagged Invalid;
                endcase
             case (sReg[n-1]) matches
                tagged Valid .sx: outQ.enq(f(n-1,sx));
             endcase
          endrule
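The index arithmetic of the parameterized rule (registers 1 to n-1, stage functions 0 to n-1) is easy to get wrong, so here is a small Python model of one clock step. It is a sketch with assumed names: `None` stands in for `Invalid`, outQ is unbounded, slot 0 of the register list is left unused to mirror the slide's indexing, and n is assumed to be at least 2.

```python
from collections import deque

INVALID = None  # stands in for the Invalid tag

def step(in_q, out_q, s_reg, f, n):
    """One clock of the n-stage pipeline; returns the next register contents."""
    nxt = list(s_reg)
    # Stage 0 writes register 1, or a bubble when inQ is empty.
    nxt[1] = f(0, in_q.popleft()) if in_q else INVALID
    # Stages 1 .. n-2 move data from register `stage` to register `stage + 1`.
    for stage in range(1, n - 1):
        cur = s_reg[stage]              # always read the pre-step value
        nxt[stage + 1] = INVALID if cur is INVALID else f(stage, cur)
    # Stage n-1 enqueues only when the last register holds a token.
    if s_reg[n - 1] is not INVALID:
        out_q.append(f(n - 1, s_reg[n - 1]))
    return nxt

n = 4
weights = [1, 10, 100, 1000]            # hypothetical: stage k adds weights[k]
f = lambda stage, x: x + weights[stage]
in_q, out_q = deque([0, 5]), deque()
regs = [INVALID] * n                    # regs[0] is never used
for _ in range(n + 1):
    regs = step(in_q, out_q, regs, f, n)
```

Because every stage adds a distinct power of ten, a result of `x + 1111` confirms that each of the n stage functions was applied exactly once.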

  17. Syntax: Vector of Registers
      • Register
        • Suppose x and y are both of type Reg. Then x <= y means x._write(y._read())
      • Vector of (say) Int
        • x[i] means sel(x,i)
        • x[i] = y[j] means x = update(x,i, sel(y,j))
      • Vector of Registers
        • x[i] <= y[j] does not work. The parser thinks it means (sel(x,i)._read)._write(sel(y,j)._read), which will not type check
        • (x[i]) <= y[j] does work!
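The two readings of a register-vector assignment can be mimicked in Python with a toy `Reg` class. This is only an analogy: Python reports the problem at run time as an `AttributeError`, whereas the Bluespec compiler rejects it during type checking. The `Reg` class and the `sel` helper below are illustrative stand-ins, not the Bluespec library definitions.

```python
class Reg:
    """Toy register exposing the _read/_write interface named on the slide."""

    def __init__(self, value):
        self._value = value

    def _read(self):
        return self._value

    def _write(self, value):
        self._value = value

def sel(vec, i):
    """Stand-in for vector selection."""
    return vec[i]

x = [Reg(0) for _ in range(4)]
y = [Reg(v) for v in (10, 20, 30, 40)]

# Intended meaning of (x[i]) <= y[j]: select the register, then write into it.
sel(x, 1)._write(sel(y, 2)._read())

# Mis-parsed meaning of x[i] <= y[j]: read the register first, then try to
# write into the value that came out, which is a plain int with no _write.
try:
    sel(x, 1)._read()._write(sel(y, 2)._read())
    misparse_accepted = True
except AttributeError:
    misparse_accepted = False
```

The parentheses in `(x[i]) <= y[j]` force the first reading, where the selected element is still a register when the write is applied.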
