1 / 28

Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab

Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology. Now we call this Guarded Atomic Actions. Bluespec: Two-Level Compilation. Bluespec (Objects, Types, Higher-order functions). Lennart Augustsson

dena
Download Presentation

Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Combinational Circuits in Bluespec Arvind Computer Science & Artificial Intelligence Lab Massachusetts Institute of Technology http://csg.csail.mit.edu/6.375

  2. Now we call this Guarded Atomic Actions Bluespec: Two-Level Compilation Bluespec (Objects, Types, Higher-order functions) Lennart Augustsson @Sandburst 2000-2002 • Type checking • Massive partial evaluation and static elaboration Level 1 compilation Rules and Actions (Term Rewriting System) • Rule conflict analysis • Rule scheduling Level 2 synthesis James Hoe & Arvind @MIT 1997-2000 Object code (Verilog/C) http://csg.csail.mit.edu/6.375

  3. elaborate w/params Software Toolflow: Hardware Toolflow: source source compile .exe design1 design2 design3 run w/ params run w/ params run1 run run1 run1.1 run1 run1 run2.1 run1 run1 run3.1 run1 run1 … … … … Static Elaboration At compile time • Inline function calls and unroll loops • Instantiate modules with specific parameters • Resolve polymorphism/overloading, perform most data structure operations http://csg.csail.mit.edu/6.375

  4. + + Bfly4 in0 out0 Bfly4 - - Permute Bfly4 in1 out1 Bfly4 x16 out2 in2 Bfly4 Bfly4 Bfly4 Permute Permute in3 out3 + + … … in4 out4 Bfly4 Bfly4 … … * t0 out63 in63 - - * t1 * t2 *j * t3 Combinational IFFT All numbers are complex and represented as two sixteen bit quantities. Fixed-point arithmetic is used to reduce area, power, ... http://csg.csail.mit.edu/6.375

  5. t0 t1 t2 t3 + + k0 k1 k2 k3 - - + + * - - * * *i * 4-way Butterfly Node function Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) k); • BSV has a very strong notion of types • Every expression has a type. Either it is declared by the user or automatically deduced by the compiler • The compiler verifies that the type declarations are compatible http://csg.csail.mit.edu/6.375

  6. + + - - + + m z y * - - * * *i * BSV code: 4-way Butterfly function Vector#(4,Complex) bfly4 (Vector#(4,Complex) t, Vector#(4,Complex) k); Vector#(4,Complex) m, y, z; m[0] = k[0] * t[0]; m[1] = k[1] * t[1]; m[2] = k[2] * t[2]; m[3] = k[3] * t[3]; y[0] = m[0] + m[2]; y[1] = m[0] – m[2]; y[2] = m[1] + m[3]; y[3] = i*(m[1] – m[3]); z[0] = y[0] + y[2]; z[1] = y[1] + y[3]; z[2] = y[0] – y[2]; z[3] = y[1] – y[3]; return(z); endfunction Polymorphic code: works on any type of numbers for which *, + and - have been defined Note: Vector does not mean storage http://csg.csail.mit.edu/6.375

  7. Complex Arithmetic • Addition • zR = xR+ yR • zI = xI+ yI • Multiplication • zR = xR* yR- xI*yI • zI = xR* yI+ xI*yR The actual arithmetic for FFT is different because we use a non-standard fixed point representation http://csg.csail.mit.edu/6.375

  8. What is the type of this + ? BSV code for Addition typedef struct{ Int#(t) r; Int#(t) i; } Complex#(numeric type t) deriving (Eq,Bits); function Complex#(t) \+ (Complex#(t) x, Complex#(t) y); Int#(t) real = x.r + y.r; Int#(t) imag = x.i + y.i; return(Complex{r:real, i:imag}); endfunction http://csg.csail.mit.edu/6.375

  9. Bfly4 in0 out0 Bfly4 Permute Bfly4 in1 out1 Bfly4 x16 out2 in2 Bfly4 Bfly4 Bfly4 Permute Permute out3 in3 … … in4 out4 Bfly4 Bfly4 … … in63 out63 Combinational IFFT stage_f function function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in); function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data); repeat stage_f three times http://csg.csail.mit.edu/6.375

  10. BSV Code: Combinational IFFT function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data); //Declare vectors Vector#(4,Vector#(64, Complex)) stage_data; stage_data[0] = in_data; for (Integer stage = 0; stage < 3; stage = stage + 1) stage_data[stage+1] = stage_f(stage,stage_data[stage]); return(stage_data[3]); The for-loop is unfolded and stage_f is inlined during static elaboration Note: no notion of loops or procedures during execution http://csg.csail.mit.edu/6.375

  11. BSV Code: Combinational IFFT- Unfolded function Vector#(64, Complex) ifft (Vector#(64, Complex) in_data); //Declare vectors Vector#(4,Vector#(64, Complex)) stage_data; stage_data[0] = in_data; for (Integer stage = 0; stage < 3; stage = stage + 1) stage_data[stage+1] = stage_f(stage,stage_data[stage]); return(stage_data[3]); stage_data[1] = stage_f(0,stage_data[0]); stage_data[2] = stage_f(1,stage_data[1]); stage_data[3] = stage_f(2,stage_data[2]); Stage_f can be inlined now; it could have been inlined before loop unfolding also. Does the order matter? http://csg.csail.mit.edu/6.375

  12. twid’s are mathematically derivable constants Bluespec Code for stage_f function Vector#(64, Complex) stage_f (Bit#(2) stage, Vector#(64, Complex) stage_in); begin for (Integer i = 0; i < 16; i = i + 1) begin Integer idx = i * 4; let twid = getTwiddle(stage, fromInteger(i)); let y = bfly4(twid, stage_in[idx:idx+3]); stage_temp[idx] = y[0]; stage_temp[idx+1] = y[1]; stage_temp[idx+2] = y[2]; stage_temp[idx+3] = y[3]; end //Permutation for (Integer i = 0; i < 64; i = i + 1) stage_out[i] = stage_temp[permute[i]]; end return(stage_out); http://csg.csail.mit.edu/6.375

  13. Higher-order functions:Stage functions f1, f2 and f3 function f0(x); return (stage_f(0,x)); endfunction function f1(x); return (stage_f(1,x)); endfunction function f2(x); return (stage_f(2,x)); endfunction What is the type of f0(x) ? function Vector#(64, Complex) f0 (Vector#(64, Complex) x); http://csg.csail.mit.edu/6.375

  14. Bfly4 in0 out0 Bfly4 Permute Bfly4 in1 out1 Bfly4 x16 in2 out2 Bfly4 Bfly4 Bfly4 Permute Permute in3 out3 … … in4 out4 Bfly4 Bfly4 … … in63 out63 Reuse the same circuit three times to reduce area Suppose we want to reuse some part of the circuit ... But why? http://csg.csail.mit.edu/6.375

  15. Architectural Exploration: Area-Performance tradeoff in 802.11a Transmitter http://csg.csail.mit.edu/6.375

  16. Depending upon the transmission rate, consumes 1, 2 or 4 tokens to produce one OFDM symbol Cyclic Extend Controller Scrambler Encoder Interleaver Mapper IFFT IFFT Transforms 64 (frequency domain) complex numbers into 64 (time domain) complex numbers One OFDM symbol (64 Complex Numbers) accounts for 85% area 802.11a Transmitter Overview headers Must produce one OFDM symbol every 4 msec 24 Uncoded bits data http://csg.csail.mit.edu/6.375

  17. Preliminary results[MEMOCODE 2006] Dave, Gerding, Pellauer, Arvind Relative Area 0% 0% 0% 1% 11% 85% 3% Design Lines of Block Code (BSV) Controller 49 Scrambler 40 Conv. Encoder 113 Interleaver 76 Mapper 112 IFFT 95 Cyc. Extender 23 Complex arithmetic libraries constitute another 200 lines of code http://csg.csail.mit.edu/6.375

  18. Bfly4 in0 out0 Bfly4 Permute Bfly4 out1 in1 Bfly4 x16 in2 out2 Bfly4 Bfly4 Bfly4 Permute Permute out3 in3 … … in4 out4 Bfly4 Bfly4 … … out63 in63 Reuse the same circuit three times to reduce area Combinational IFFT http://csg.csail.mit.edu/6.375

  19. f f g f g Design Alternatives Reuse a block over multiple cycles we expect: Throughput to Area to decrease – less parallelism decrease – reusing a block The clock needs to run faster for the same throughput  hyper-linear increase in energy http://csg.csail.mit.edu/6.375

  20. Bfly4 in0 out0 Permute … in1 out1 Bfly4 in2 out2 in3 out3 in4 out4 … … in63 out63 Circular pipeline: Reusing the Pipeline Stage Stage Counter http://csg.csail.mit.edu/6.375

  21. in0 out0 in1 out1 Permute in2 out2 in3 out3 in4 out4 … … in63 out63 Superfolded circular pipeline: Just one Bfly-4 node! Bfly4 64, 2-way Muxes Stage 0 to 2 4, 16-way Muxes 4, 16-way DeMuxes Index: 0 to 15 Index == 15? http://csg.csail.mit.edu/6.375

  22. Combinational f1 f2 f3 C inQ outQ Pipeline f1 f2 f3 P inQ outQ Folded Pipeline f FP inQ outQ Clock? Area? Throughput? Pipelining a block Clock: C < P  FP Area: FP < C < P Throughput: FP < C < P http://csg.csail.mit.edu/6.375

  23. x inQ sReg1 sReg2 outQ f0 f1 f2 Inelastic pipeline This rule can fire only if rule sync-pipeline (True); inQ.deq(); sReg1 <= f0(inQ.first()); sReg2 <= f1(sReg1); outQ.enq(f2(sReg2)); endrule - inQ has an element - outQ has space Atomicity: Either all or none of the state elements inQ, outQ, sReg1 and sReg2 will be updated This is real IFFT code; just replace f0, f1 and f2 with stage_f code http://csg.csail.mit.edu/6.375

  24. Stage functions f1, f2 and f3 function f0(x); return (stage_f(0,x)); endfunction function f1(x); return (stage_f(1,x)); endfunction function f2(x); return (stage_f(2,x)); endfunction The stage_f function was given earlier http://csg.csail.mit.edu/6.375

  25. x inQ sReg1 sReg2 outQ f0 f1 f2 Problem: What about pipeline bubbles? Red and Green tokens must move even if there is nothing in the inQ! rule sync-pipeline (True); inQ.deq(); sReg1 <= f0(inQ.first()); sReg2 <= f1(sReg1); outQ.enq(f2(sReg2)); endrule Also if there is no token in sReg2 then nothing should be enqueued in the outQ Valid bits or the Maybe type Modify the rule to deal with these conditions http://csg.csail.mit.edu/6.375

  26. data valid/invalid sx1 will get bound to the appropriate part of sReg1 The Maybe type data in the pipeline typedef union tagged { void Invalid; data_T Valid; } Maybe#(type data_T); Registers contain Maybe type values rule sync-pipeline (True); if (inQ.notEmpty()) begin sReg1 <= tagged Valid f0(inQ.first()); inQ.deq(); end elsesReg1 <= tagged Invalid; case (sReg1) matches tagged Valid .sx1: sReg2 <= tagged Valid f1(sx1); tagged Invalid: sReg2 <= tagged Invalid; endcase case (sReg2) matches tagged Valid .sx2: outQ.enq(f2(sx2)); endcase endrule http://csg.csail.mit.edu/6.375

  27. When is this rule enabled? rule sync-pipeline (True); if (inQ.notEmpty()) begin sReg1 <= tagged Valid f0(inQ.first()); inQ.deq(); end elsesReg1 <= tagged Invalid; case (sReg1) matches tagged Valid .sx1: sReg2 <= tagged Valid f1(sx1); tagged Invalid: sReg2 <= tagged Invalid; endcase case (sReg2) matches tagged Valid .sx2: outQ.enq(f2(sx2)); endcase endrule inQ sReg1 sReg2 outQ inQ sReg1 sReg2 outQ NE V VNF NE V VF NE V I NF NE V I F NE I V NF NE I V F NE I I NF NE I I F yes No Yes Yes Yes No Yes yes E V VNF E V VF E V I NF E V I F EI V NF EI V F EI I NF EI I F yes No Yes Yes Yes No Yes1 yes Yes1 = yes but no change http://csg.csail.mit.edu/6.375

  28. Next lecture Folded pipeline for FFT http://csg.csail.mit.edu/6.375

More Related