Customizable Soft Vector Processors

Customizable Soft Vector Processors. Peter Yiannacouras, PhD Candidate, Connections 2009. Soft processors in FPGA systems: make FPGA technology more easily accessible and optimize the soft processor to application properties. Software + compiler (weeks) versus HDL + CAD (months).


Presentation Transcript


  1. Customizable Soft Vector Processors. Peter Yiannacouras, PhD Candidate. Connections 2009.

  2. Soft Processors in FPGA Systems
     • Make FPGA technology more easily accessible.
     • Optimize the soft processor to application properties.
     • Soft processor (software + compiler): easier, configurable, design time in weeks. Used in 25% of designs [source: Altera, 2009].
     • Custom HW (HDL + CAD): faster, smaller, less power, design time in months.
     • The two approaches compete.

  3. Data Level Parallelism
     • The same operation is applied to independent data; commonly found in embedded systems.
     • C code: for (i = 0; i < 16; i++) c[i] = a[i] + b[i];
     • The 16 operations, c[0]=a[0]+b[0] through c[15]=a[15]+b[15], are independent of one another.
     • Processor instructions for one element: load r0,a[1]; load r1,b[1]; add r2,r0,r1; store r2,c[1]
     • Exploit using a vector processor: c = a + b.
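The independence claim on this slide can be checked with a small C sketch. The function names (`add_forward`, `add_reverse`, `order_independent`) are hypothetical and not from the presentation; the sketch only assumes `int` arrays of at most 16 elements, as in the slide's loop.

```c
#include <assert.h>
#include <stddef.h>

/* Element-wise add in ascending index order, as in the slide's loop. */
void add_forward(const int *a, const int *b, int *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}

/* The same additions in descending index order. No iteration reads a
   value written by another, so the result must match add_forward. */
void add_reverse(const int *a, const int *b, int *c, size_t n)
{
    assert(a && b && c);
    for (size_t i = n; i-- > 0; )
        c[i] = a[i] + b[i];
}

/* Returns 1 if both orders produce identical output (n <= 16). */
int order_independent(const int *a, const int *b, size_t n)
{
    int fwd[16], rev[16];
    assert(n <= 16);
    add_forward(a, b, fwd, n);
    add_reverse(a, b, rev, n);
    for (size_t i = 0; i < n; i++)
        if (fwd[i] != rev[i])
            return 0;
    return 1;
}
```

Because the iteration order does not affect the result, hardware is free to execute the 16 additions in any order, including all at once.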

  4. Vector Processing Primer (1 vector lane)
     • C code: for (i = 0; i < 16; i++) c[i] = a[i] + b[i];
     • Vectorized code: set vl,16; vload vr0,a; vload vr1,b; vadd vr2,vr0,vr1; vstore vr2,c
     • The vadd computes vr2[i] = vr0[i] + vr1[i] for i = 0 through 15.
     • Each vector instruction holds many units of independent operations.
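To make the five vectorized instructions concrete, here is a minimal software emulation in C. It is a sketch, not the presenter's hardware: the `VecCpu` struct, the register-file size `NUM_VREGS`, and the limit `MAX_VL` are assumptions introduced for illustration. Each emulated instruction walks the elements one at a time, which mirrors the single-lane case.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VL 16     /* assumed maximum vector length */
#define NUM_VREGS 4   /* assumed number of vector registers */

typedef struct {
    size_t vl;                    /* current vector length */
    int vr[NUM_VREGS][MAX_VL];    /* vector register file */
} VecCpu;

/* set vl,n */
void set_vl(VecCpu *cpu, size_t vl)
{
    assert(vl <= MAX_VL);
    cpu->vl = vl;
}

/* vload vrD,mem : copy vl elements from memory into a vector register */
void vload(VecCpu *cpu, int dst, const int *mem)
{
    assert(dst >= 0 && dst < NUM_VREGS);
    for (size_t i = 0; i < cpu->vl; i++)
        cpu->vr[dst][i] = mem[i];
}

/* vadd vrD,vrS1,vrS2 : one element per step, as with a single lane */
void vadd(VecCpu *cpu, int dst, int s1, int s2)
{
    assert(dst >= 0 && dst < NUM_VREGS);
    assert(s1 >= 0 && s1 < NUM_VREGS && s2 >= 0 && s2 < NUM_VREGS);
    for (size_t i = 0; i < cpu->vl; i++)
        cpu->vr[dst][i] = cpu->vr[s1][i] + cpu->vr[s2][i];
}

/* vstore vrS,mem : copy vl elements from a vector register to memory */
void vstore(const VecCpu *cpu, int src, int *mem)
{
    assert(src >= 0 && src < NUM_VREGS);
    for (size_t i = 0; i < cpu->vl; i++)
        mem[i] = cpu->vr[src][i];
}

/* The slide's five-instruction sequence for c = a + b on 16 elements. */
void vector_add16(const int *a, const int *b, int *c)
{
    VecCpu cpu;
    set_vl(&cpu, 16);
    vload(&cpu, 0, a);
    vload(&cpu, 1, b);
    vadd(&cpu, 2, 0, 1);
    vstore(&cpu, 2, c);
}
```

Calling `vector_add16(a, b, c)` on three 16-element arrays fills `c` with the element-wise sums, using five vector instructions in place of the 16-iteration scalar loop.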

  5. Vector Processing Primer (16 vector lanes)
     • C code: for (i = 0; i < 16; i++) c[i] = a[i] + b[i];
     • Vectorized code: set vl,16; vload vr0,a; vload vr1,b; vadd vr2,vr0,vr1; vstore vr2,c
     • The vadd computes vr2[i] = vr0[i] + vr1[i] for i = 0 through 15.
     • Each vector instruction holds many units of independent operations.
     • With 16 vector lanes: 16x speedup.
     • Implemented on an FPGA (soft vector processor). Is it scalable?
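The 16x figure follows from a simple idealized model, sketched below in C. The model is an assumption made for illustration: it charges one cycle per group of `lanes` elements and ignores memory stalls, pipeline startup, and instruction issue, all of which a real soft vector processor would incur. The function names are hypothetical.

```c
#include <assert.h>

/* Idealized cycles for one vector instruction: ceil(vl / lanes). */
unsigned vector_cycles(unsigned vl, unsigned lanes)
{
    assert(lanes > 0);
    return (vl + lanes - 1) / lanes;
}

/* Ideal speedup of `lanes` lanes over a single lane at vector length vl. */
double ideal_speedup(unsigned vl, unsigned lanes)
{
    assert(vl > 0);
    return (double)vector_cycles(vl, 1) / (double)vector_cycles(vl, lanes);
}
```

Under this model a 16-element `vadd` takes 16 cycles on 1 lane and 1 cycle on 16 lanes. Measured scaling is expected to fall short of the ideal, which is the scalability question the slide raises.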

  6. Soft Vector Processor Scalability. 7 configurations: 14x speed, 9x area => coarse-grained!

  7. More Architectural Parameters: Processor Architecture, Instruction Set Architecture, Memory System

  8. Fine-Grained Trade Off Space. Memory System: Weak, Moderate, Good
