L6: Lower Power Architecture Design

L6: Lower Power Architecture Design. 1999. 8.2. Prof. Jun Dong Cho, Sungkyunkwan University. http://vada.skku.ac.kr. Through WAVE PIPELINING. Wave-pipelining on FPGA. Problems with pipelining: balanced partitioning; delay element overhead; Tclk > Tmax - Tmin + clock skew + setup/hold time; increased area, power, and overall latency.

Presentation Transcript


  1. L6: Lower Power Architecture Design. 1999. 8.2. Prof. Jun Dong Cho, Sungkyunkwan University. http://vada.skku.ac.kr SungKyunKwan Univ.

  2. Through WAVE PIPELINING

  3. Wave-pipelining on FPGA
     • Problems with conventional pipelining
     • Balanced partitioning
     • Delay element overhead
     • Tclk > Tmax - Tmin + clock skew + setup/hold time
     • Increased area, power, and overall latency
     • Clock distribution problem
     • Wave pipelining = high throughput without such overhead = ideal pipelining
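
The clock-period bound on this slide can be tried out numerically. The following is a minimal Python sketch that evaluates Tclk > Tmax - Tmin + clock skew + setup/hold time exactly as written above. The function name `min_clock_period` and every delay value are hypothetical, picked only to show how balancing path delays (bringing Tmin closer to Tmax) lowers the bound.

```python
def min_clock_period(t_max, t_min, clock_skew, setup_hold):
    """Lower bound on Tclk from the slide: Tmax - Tmin + skew + setup/hold.

    All arguments share one time unit (nanoseconds in the example below).
    """
    if t_min > t_max:
        raise ValueError("t_min must not exceed t_max")
    return (t_max - t_min) + clock_skew + setup_hold


# Hypothetical path delays in nanoseconds (not measured values).
unbalanced = min_clock_period(t_max=20.0, t_min=8.0, clock_skew=0.5, setup_hold=1.0)
balanced = min_clock_period(t_max=20.0, t_min=18.0, clock_skew=0.5, setup_hold=1.0)

print(f"unbalanced paths: Tclk > {unbalanced} ns")
print(f"balanced paths:   Tclk > {balanced} ns")
```

With these made-up numbers the bound depends on the spread Tmax - Tmin rather than on Tmax alone, which is why delay balancing matters so much for wave pipelining.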

  4. FPGA on WavePipeline
     • LUT delay is similar across different logic functions.
     • Equal delays can therefore be constructed.
     • FPGA element delay (wire, LUT, interconnection)
     • Powerful layout editor
     • Fast design cycle

  5. WP advantages
     • Area efficient: no registers, clock distribution network, or clock buffers needed
     • Low power dissipation
     • Higher throughput
     • Low latency

  6. Disadvantage
     • Degraded performance in certain cases
     • Difficult to achieve sharp rise and fall times in synchronous design
     • Layout is critical for balancing the delay
     • Parameter variation: power supply and temperature dependence

  7. Experimental Results. By Jae-Hyung Lee (이재형), SKKU

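  8. Observation
     • The WP multiplier needed many additional LUTs for delay adjustment, so it showed no large gain in power consumption.
     • On an FPGA, adjusting delay with dedicated delay elements, rather than with LUTs or net delay, would be more effective.
     • In addition, designing a multiplier whose paths all have the same number of logic levels would make WP implementation easier and should yield large gains in power and area over a pipelined structure.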

  9. VON NEUMANN VERSUS HARVARD

  10. Power vs Area of Micro-coded Microprocessor
      • At 1.5 V and a 10 MHz clock rate, instruction and data memory accesses account for 47% of the total power consumption.

  11. Memory Architecture

  12. Exploiting Locality for Low-Power Design
      • A spatially local cluster: a group of algorithm operations that are tightly connected to each other in the flow graph representation.
      • Two nodes are tightly connected to each other on the flow graph representation if the shortest distance between them, in terms of number of edges traversed, is low.
      • Power consumption (mW) in the maximally time-shared and fully-parallel versions of the QMF sub-band coder filter
      • Improvement of a factor of 10.5 at the expense of a 20% increase in area
      • The interconnect elements (buses, multiplexers, and buffers) consume 43% and 28% of the total power in the time-shared and parallel versions, respectively.
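
The "tightly connected" criterion on this slide is a shortest-path measure, so it can be sketched with a breadth-first search. The Python below is only an illustration: the example flow graph, the choice to treat edges as undirected when counting hops, and the `threshold` parameter are all assumptions, since the slide says only that the distance should be "low".

```python
from collections import deque


def undirected(graph):
    """Build an undirected adjacency map from a {node: [successors]} flow graph."""
    adj = {node: set() for node in graph}
    for node, successors in graph.items():
        for succ in successors:
            adj[node].add(succ)
            adj.setdefault(succ, set()).add(node)
    return adj


def edge_distance(graph, src, dst):
    """Fewest edges traversed between src and dst; None if not connected."""
    adj = undirected(graph)
    if src not in adj or dst not in adj:
        return None
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == dst:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def tightly_connected(graph, a, b, threshold=2):
    """Hypothetical test: 'low' distance is taken to mean <= threshold edges."""
    dist = edge_distance(graph, a, b)
    return dist is not None and dist <= threshold


# Hypothetical flow graph of a small filter (operation -> operations it feeds).
flow_graph = {
    "in": ["mul1", "mul2"],
    "mul1": ["add1"],
    "mul2": ["add2"],
    "add1": ["add2"],
    "add2": ["out"],
    "out": [],
}

print(edge_distance(flow_graph, "mul1", "add1"))
print(edge_distance(flow_graph, "mul1", "out"))
print(tightly_connected(flow_graph, "mul1", "out"))
```

Operations that pass such a test pairwise would be candidates for the same spatially local cluster. How clusters are actually formed is not described on the slide.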

  13. Cascade filter layouts: (a) non-local implementation from Hyper; (b) local implementation from Hyper-LP

  14. Frequency Multipliers and Dividers

  15. Low Power DSP
      • Instruction buffer (or cache): exploits locality to reduce program memory accesses
      • Decoded Instruction Buffer
      • Stores the decoding results of the first loop iteration in RAM and reuses them
      • Eliminates the fetch/decode steps
      • 30~40% power saving

  16. Stage-Skip Pipeline
      • The power saving is achieved by stopping the instruction fetch and decode stages of the processor during loop execution, except for the first iteration.
      • DIB = Decoded Instruction Buffer
      • 40% power savings on a DSP or RISC processor.

  17. Stage-Skip Pipeline
      • Selector: selects the output from either the instruction decoder or the DIB
      • The decoded instruction signals for a loop are temporarily stored in the DIB and are reused in each iteration of the loop.
      • The power wasted in the conventional pipeline is saved in our pipeline by stopping the instruction fetching and decoding for each loop execution.

  18. Stage-Skip Pipeline
      • The majority of execution cycles in signal processing programs are spent in loop execution: 40% reduction in power with a 2% area increase.
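
The stage-skip idea from the preceding slides can be mimicked with a toy model that counts how often the fetch/decode stages run. This Python sketch is an assumption-laden illustration, not the processor's actual design: `decode`, `run_loop`, and the four-instruction loop body are hypothetical, and the model counts events only, so it does not reproduce the 40% power figure.

```python
def decode(instr):
    """Hypothetical stand-in for the instruction decoder."""
    return ("decoded", instr)


def run_loop(body, iterations, use_dib):
    """Return (fetch/decode operations, instructions executed) for one loop.

    With use_dib=True, decoded signals from the first iteration are stored
    in a decoded instruction buffer (DIB) and reused in later iterations,
    so the fetch and decode stages stay idle after the first pass.
    """
    dib = None
    fetch_decode_ops = 0
    executed = 0
    for _ in range(iterations):
        if use_dib and dib is not None:
            signals = dib  # selector picks the DIB output
        else:
            signals = []
            for instr in body:
                fetch_decode_ops += 1  # fetch + decode stages active
                signals.append(decode(instr))
            if use_dib:
                dib = signals  # filled during the first iteration
        executed += len(signals)
    return fetch_decode_ops, executed


# Hypothetical loop body and iteration count.
loop_body = ["load", "mac", "store", "branch"]
conventional = run_loop(loop_body, iterations=100, use_dib=False)
stage_skip = run_loop(loop_body, iterations=100, use_dib=True)

print("conventional (fetch/decode, executed):", conventional)
print("stage-skip   (fetch/decode, executed):", stage_skip)
```

Both runs execute the same number of instructions. Only the fetch/decode activity differs, which is the part of the pipeline the slides identify as the source of the saving.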

  19. Two’s complement implementation of an accumulator

  20. Sign magnitude implementation of an accumulator.

  21. Number representation trade-off for arithmetic

  22. Signal statistics for Sign Magnitude implementation of the accumulator datapath assuming random inputs.
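
The number-representation trade-off in the last four slides can be illustrated by counting bit transitions, a common proxy for switching activity. The Python sketch below is not the accumulator datapath from the slides. It only encodes a short, hypothetical sequence of small values that alternate in sign, and compares how many bits toggle between consecutive words in two's complement and in sign magnitude. The function names, the 8-bit width, and the sample sequence are all assumptions.

```python
def twos_complement(value, bits):
    """Encode a signed integer as a two's complement word of `bits` bits."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value out of range")
    return value & ((1 << bits) - 1)


def sign_magnitude(value, bits):
    """Encode a signed integer as a sign bit followed by its magnitude."""
    if abs(value) >= (1 << (bits - 1)):
        raise ValueError("value out of range")
    sign = 1 if value < 0 else 0
    return (sign << (bits - 1)) | abs(value)


def toggles(values, encode, bits):
    """Total number of bit flips between consecutive encoded words."""
    words = [encode(v, bits) for v in values]
    return sum(bin(a ^ b).count("1") for a, b in zip(words, words[1:]))


# Hypothetical small-amplitude signal that keeps crossing zero.
samples = [1, -1, 2, -2, 1, -1]

tc_toggles = toggles(samples, twos_complement, bits=8)
sm_toggles = toggles(samples, sign_magnitude, bits=8)

print("two's complement toggles:", tc_toggles)
print("sign magnitude toggles:  ", sm_toggles)
```

For this sequence, two's complement flips most of the upper bits at every sign change because of sign extension, while sign magnitude flips only the sign bit and a few low-order bits. Different input statistics would give a different picture.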
