李海娥

Efficient Search Space Exploration and Feature Description Based on FPGA Floating-Point Arithmetic

1 Introduction
• IP reuse
• Parameterized IP (number of units, pipeline depth, resources) meets the needs of many different designs and reduces design time and risk
• Power
• Floating-point arithmetic
Tianjin University

2 Purpose
How to set the parameters of the floating-point units so that the design:
• Meets the functional requirements
• Saves resources
• Meets the running-time requirement
• Reduces power dissipation

3 Method
• Fully pipelined single-precision floating-point multiplier and adder
• Different parameters (LUTs or DSPs, latency, ports)
• Latency: the number of clock cycles between an operand input and the corresponding result output

Design architecture
Figure 1. Data-path designs for the MAC $c = c + a_i b_i$, $i = 1, \dots, N$.

A Single operation
One AM
One ADD

B Combinations
1 Resources (slice LUTs, registers)
2 Power
3 The maximum frequency of a combination is less than that of its constituent AM and ADD

The two optimization models
• The design space of the system includes: power consumption, resource usage, speed, and latency
• $P = f_p (P_{AM} + P_{ADD}) / (100\,\mathrm{MHz})$
• $L = m L_{AM} + n L_{ADD}$
• $R = m R_{AM} + n R_{ADD}$
• $F \le \min(F_{AM}, F_{ADD})$
• Running-time constraints
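As a minimal sketch of how the four system-level relations on this slide combine per-unit figures, the Python snippet below evaluates P, L, R and the upper bound on F. The `UnitModel` class, the `system_model` function, and every number in the usage lines are hypothetical placeholders, not measurements from the slides. It also assumes that the per-unit power figures are taken at 100 MHz (suggested by the division by 100 MHz) and that m and n simply scale the AM and ADD terms as written in the formulas.

```python
from dataclasses import dataclass


@dataclass
class UnitModel:
    """Per-unit figures for one floating-point core (hypothetical container)."""
    power_at_100mhz: float  # assumed: power measured at a 100 MHz clock
    latency: int            # pipeline latency in clock cycles
    resources: float        # e.g. slice LUTs or slice registers
    fmax_mhz: float         # maximum clock frequency of the unit


def system_model(am, add, m, n, fp_mhz):
    """Evaluate the slide's relations for one multiplier model and one adder model."""
    power = fp_mhz * (am.power_at_100mhz + add.power_at_100mhz) / 100.0  # P
    latency = m * am.latency + n * add.latency                           # L
    resources = m * am.resources + n * add.resources                     # R
    fmax = min(am.fmax_mhz, add.fmax_mhz)                                # bound on F
    return {"power": power, "latency": latency,
            "resources": resources, "fmax": fmax}


# Usage with made-up numbers (illustration only)
am = UnitModel(power_at_100mhz=10.0, latency=4, resources=300.0, fmax_mhz=200.0)
add = UnitModel(power_at_100mhz=6.0, latency=8, resources=500.0, fmax_mhz=250.0)
print(system_model(am, add, m=2, n=3, fp_mhz=150.0))
```

Keeping the per-unit figures in one small record makes it easy to swap in regression-based estimates later without changing the system-level formulas.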

Running-time constraints
$C = N/z_m - 1 + L_m + L_a(\log_2 z_a + 1) + L_a(\log_2 L_a + 1)$
$z_a = z_m = 2^z$
$C - f_p T \le 0$
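The running-time constraint on this slide can be checked numerically. The sketch below is one reading of the formula, with N/z_m - 1 taken as "N divided by z_m, minus one" and z_a = z_m = 2^z. The function names are hypothetical, and the units (frequency in MHz, time budget in microseconds, so that their product is a cycle count) are an assumption, since the slide does not state them.

```python
import math


def cycle_count(n, z, l_m, l_a):
    """Cycle count C for N elements, 2**z multipliers and adders,
    multiplier latency l_m and adder latency l_a (in clock cycles)."""
    units = 2 ** z  # z_a = z_m = 2^z
    return (n / units - 1
            + l_m
            + l_a * (math.log2(units) + 1)
            + l_a * (math.log2(l_a) + 1))


def meets_deadline(cycles, fp_mhz, t_us):
    """True when C - fp*T <= 0 (assumed units: MHz and microseconds)."""
    return cycles - fp_mhz * t_us <= 0


c = cycle_count(n=1024, z=2, l_m=4, l_a=8)
print(c)                              # 315.0
print(meets_deadline(c, 100.0, 4.0))  # True: 315 cycles fit in 400
print(meets_deadline(c, 100.0, 3.0))  # False: 315 cycles exceed 300
```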

3.1.1 Linear models

3.1.2 Modeling based on regression analysis
% Decision variables (YALMIP). Judging from the running-time constraint,
% x and y act as the multiplier and adder latencies; 2^z is the number of units.
x=intvar(1);
y=intvar(1);
z=intvar(1);
fp=sdpvar(1);   % operating frequency
% Objective: power from the linear regression models
f=0.01*fp*1.75*(2^z)*(x+11)+0.01*fp*1.04545*(2^z)*(y+15.08696);
% Constraints, in order: frequency bound, running time, two resource limits, variable ranges
F=[0<=fp<=min(36.28571*(x+2.12598),30.78182*(y+2.30597)), ...
   x+(z+1)*y+2^(10-z)+y*(log(y)/log(2)+1)<3*fp, ...
   (2^z)*639.14286+(2^z)*2.69752*(y+146.01506)<=6000, ...
   (2^z)*92*(x+0.06677)+(2^z)*47.67273*(y+0.18993)<=6000, ...
   2<=y<=12,2<=x<=8,0<=z<=4];
% options = sdpsettings('verbose',1,'solver','bmibnb');
solvesdp(F,f);
disp(double(f));
disp(double(x));
disp(double(y));
disp(double(2^z));
disp(double(fp));
disp(double(1000/fp));
The number of floating-point multipliers and adders is 2^z.
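Because x, y and z take only a few integer values, the linear model can also be cross-checked by exhaustive enumeration. The Python sketch below is such a cross-check under stated assumptions, not the YALMIP solve itself: it copies the coefficients from the script, treats the strict running-time inequality as non-strict, and uses the fact that the objective grows with fp, so the smallest fp allowed by the running-time constraint is optimal for each (x, y, z). The function name `solve_linear_model` is hypothetical.

```python
import math


def solve_linear_model():
    """Enumerate x in [2,8], y in [2,12], z in [0,4]; return the
    lowest-power feasible point as (power, x, y, z, fp), or None."""
    best = None
    for x in range(2, 9):
        for y in range(2, 13):
            for z in range(0, 5):
                n = 2 ** z  # number of multipliers and adders
                # Two resource limits from the script
                res_a = n * 639.14286 + n * 2.69752 * (y + 146.01506)
                res_b = n * 92 * (x + 0.06677) + n * 47.67273 * (y + 0.18993)
                if res_a > 6000 or res_b > 6000:
                    continue
                # Running-time constraint: cycles <= 3*fp, so fp >= cycles/3
                cycles = x + (z + 1) * y + 2 ** (10 - z) + y * (math.log2(y) + 1)
                fp = cycles / 3.0
                # Frequency bound from the two regression lines
                fmax = min(36.28571 * (x + 2.12598), 30.78182 * (y + 2.30597))
                if fp > fmax:
                    continue
                # Objective: power, same expression as f in the script
                power = (0.01 * fp * 1.75 * n * (x + 11)
                         + 0.01 * fp * 1.04545 * n * (y + 15.08696))
                if best is None or power < best[0]:
                    best = (power, x, y, z, fp)
    return best


result = solve_linear_model()
if result is None:
    print("no feasible point")
else:
    power, x, y, z, fp = result
    print("power:", power, "x:", x, "y:", y, "units:", 2 ** z, "fp:", fp)
```

With only 7 × 11 × 5 = 385 integer combinations, enumeration is cheap and gives a reference point for comparing against whatever the solver reports.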

3.2 Polynomial models

3.2.1 Modeling based on simulated annealing
Parameters x[2,8,0],y[2,12,0],z[0,4,0],rf[0,500];
MinFunction 0.01*rf*(2^z)*(93.73807-89.77205*x+40.53269*x^2-8.26548*x^3+0.7934*x^4-0.02916*x^5)+0.01*rf*(2^z)*(-10.71794+24.35363*y-7.40564*y^2+1.08772*y^3-0.07455*y^4+0.00192*y^5);
(2^z)*(-3384.85647+5039.78552*x-2312.30642*x^2+494.61355*x^3-50.02651*x^4+1.93333*x^5)+(2^z)*(122.23825+239.21494*y-68.6165*y^2+8.59996*y^3-0.47702*y^4+0.00944*y^5)<=6000;
(2^z)*(2223.14264-2720.01336*x+1270.69305*x^2-262.053*x^3+25.30682*x^4-0.93333*x^5)+(2^z)*(-219.77249+252.99128*y-73.2325*y^2+12.26375*y^3-0.9434*y^4+0.02675*y^5)<=6000;
x+(z+1)*y+2^(10-z)+y*(log(y)/log(2)+1)<=1.0*rf;
rf<=min(-54.8162+151.86111*x-42.53182*x^2+9.09894*x^3-1.06665*x^4+0.04994*x^5,88.24364-24.82321*y+28.04692*y^2-5.27886*y^3+0.43151*y^4-0.01299*y^5);

4 Verification
• The verification is based on the dot product:
• 1024*1024
1 Slice LUTs <= 9000; slice registers <= 9000; running-time constraints: 0.9 ms, 1.0 ms, 1.1 ms, ...
2 Slice LUTs <= 6000; slice registers <= 6000; running-time constraints: 0.9 ms, 1.0 ms, 1.1 ms, ...

Result 1

Result 2

Thank you, everyone!
