## Low Power Architecture and Implementation of Multicore Design

Khushboo Sheth, Kyungseok Kim, Fan Wang, Siddharth Dantu
Advisor: Dr. V. Agrawal
ELEC6270 Low Power Design of Electronic Circuits, Team Project
VLSI D&T Seminar, Nov. 8, 2006

**Project Objectives**
• Design and verify a 16-bit ALU with synchronous clocked inputs and outputs.
• Study the low-voltage power and delay characteristics of the design.
• Redesign the ALU for minimum power and highest speed.

**Components of Power Dissipation**
• Dynamic
  • Power due to signal transitions:
    • Logic power (due to logic transitions)
    • Glitch power (due to glitches)
  • Short-circuit power
• Static
  • Leakage power (due to leakage currents)

**Power Components in a CMOS Circuit**
[Figure: CMOS gate with input v_i(t) and output v_o(t) driving load capacitance C_L between VDD and ground, annotated with R_on, R = large, dynamic power, short-circuit power and leakage power.]
Dynamic power: $P \propto C V_{DD}^2$

**1-Bit ALU Core**
[Figure: 1-bit ALU design: registers Reg A, Reg B and Reg C (DFFs clocked by CLK) around the combinational logic; signals A, B, C, Z, CY, CYIN; internal nodes NX156, NX60, NX16, NX80.]

**1-Bit ALU Core Timing (Vdd = 2.5 V)**
Traced signals: opcode[3:0], C, CY, Z, COMPOUT

| Opcode | Operation |
|---|---|
| 0000 | a + b |
| 0001 | a - b |
| 0010 | equal |
| 0011 | not equal |
| 0100 | xor |
| 0101 | nor |
| 0110 | or |
| 0111 | and |
| 1000 | c <= a |
| 1001 | c <= b |
| 1010 | nand |
| others | all-zero output |

Longest path in the combinational logic: c <= a + b (opcode 0000).

**1-Bit ALU Core: Sweeping Vdd from 2.5 V to 0 V**
[Figure: analog-mode waveforms of output C (node NX156) as Vdd steps through 2.5, 2.0, 1.5, 1.0, 0.5 and 0.0 V; traces labeled Vdd = 2.5 and Vdd = 0.5.]

**1-Bit ALU Core Logic Operation Voltage @200 MHz**
• Supply voltage swept near the PMOS threshold, Vthp = -0.5625 V (vs. NMOS Vthn = 0.365 V).
• Sweep from Vsupply = 0.50 V to 1.00 V (linear increment 0.05 V, 11 points).
• Opcode 1000 (c <= a), analog domain:
  • Vsupply = 0.80 V: wrong operation.
  • Vsupply = 0.85 V: correct operation.
[Figure: input and output waveforms at Vsupply = 0.80 V and 0.85 V, with overshoot and ripples marked.]

**1-Bit ALU Average Power vs. Delay @200 MHz**
[Chart: 1-bit ALU block average power, 1-bit ALU core average power and 1-bit ALU core delay; annotated $P \propto C V_{DD}^2$.]

**16-Bit ALU (Single-Core) Design**
[Figure: input register, combinational logic (16-bit ALU) and output register, clocked by CK; switched capacitance C_ref.]
• Supply voltage = $V_{ref}$
• Total capacitance switched per cycle = $C_{ref}$
• Clock frequency = $f$
• Power consumption: $P_{ref} = C_{ref} V_{ref}^2 f$

**16-Bit ALU Vectors**
Note: Vector 4 activates the critical path (carry-out = 1).

**16-Bit ALU Simulation Results**
Circuit information:
• 694 gates
• Clock frequency applied: 10 MHz
• Temperature: 27 °C
• Vectors applied: 6
• Technology: TSMC025 (Vthn = 0.365 V, Vthp = -0.562 V)
• Simulator: ELDO (SPICE)
• Simulation time: 700 ns

**16-Bit ALU: Correct Functional Operation at 2.5 V, 1.25 V, 0.85 V and 0.625 V for 6 Vectors**

**Circuit Fails at 0.45 V (below the PMOS threshold magnitude, |Vthp| = 0.562 V)**
Simulated single vector pair.

**16-Bit ALU Power Savings and Delay Increase with Reference @2.5 V**

**16-Bit ALU Power Savings and Delay Increase with Reference @1.25 V**

**Impact of Technology on Power Savings**
16-bit ALU simulation setup:
• Supply voltage: 2.5 V
• Transient simulation time: 700 ns
• Vectors applied: 6
• Temperature: 27 °C

**Temperature Influence on Power**
• Circuit information: 734 gates
• Clock frequency applied: 10 MHz; Vdd = 2.5 V
• Vectors applied: 6
• Simulation time: 700 ns
• Technology: TSMC035

**Multicore Design Methodology**
• Lower the supply voltage.
• This slows down the circuit.
• Use parallel computing to gain the speed back.
• Multicore means placing two or more complete cores within a single module.
• This architecture is a "divide and conquer" strategy: by splitting the work among multiple execution cores, a multicore design can perform more work within a given clock cycle.
• A power reduction of more than 60% is observed.
Source: http://www.eng.auburn.edu/~vagrawal/D&TSEMINAR_SPR06/SLIDES/Agrawal_DTSem06.ppt

**Register Parallel Architecture**
[Figure: 16-bit ALU register-parallel architecture: four copies of the combinational logic (Copy 1 to Copy 4) with per-copy registers clocked at f/4 by phases Ck0 to Ck3, a 4-to-1 output multiplexer driven by mux control, and input/output registers on clock CK at frequency f.]

**Control Signals, N = 4**
[Figure: timing diagram of CK and the four phases, Phase 1 to Phase 4.]
Mux control sequence: 00, 01, 10, 11, 00, 01, 10, 11, …

**16-Bit ALU Multicore Power Savings and Delay Increase with Reference @2.5 V**
Circuit information:
• 2617 gates
• Clock frequency applied: 10 MHz
• Temperature: 27 °C
• Vectors applied: 6
• Technology: TSMC025 (Vthn = 0.365 V, Vthp = -0.562 V)
• Simulator: ELDO (SPICE)
• Simulation time: 700 ns

**16-Bit ALU Multicore Power Savings and Delay Increase with Reference @1.25 V**

**Power and Delay Comparison: 2.5 V Reference Design vs. Multicore Design at Different Voltages**

**Summary**
• The single-core ALU design achieves more than 60% power savings at reduced voltage, but at the cost of performance.
• With the 2.5 V reference, power drops faster than the square of the supply voltage ($V_{DD}^2$).
• With the 1.25 V reference, power drops almost exactly as $V_{DD}^2$.
• The multicore design recovers the lost speed at reduced voltage while consuming less power.

**References**
• ELEC6270 Low Power Design of Electronic Circuits, class slides, Dr. V. Agrawal.
• V. Agrawal, "Multi-Core Parallelism for Low-Power Design," VLSI D&T Seminar, Spring 2006.
• www.tomshardware.com
• N. H. E. Weste and D. Harris, CMOS VLSI Design, 3rd ed., Reading, Massachusetts: Addison-Wesley, 2005.
• L. Shang and R. P. Dick, "Thermal crisis: challenges and potential solutions," IEEE Potentials, vol. 25, no. 5, 2006.
• International Technology Roadmap for Semiconductors, http://public.itrs.net
• A. Kanwal, "A Review of Carbon Nanotube Field Effect Transistors," Version 2.0, 2003.
• K. K. Likharev, "Single-electron devices and their applications," Proc. IEEE, vol. 87, no. 4, pp. 606-632, Apr. 1999.
• A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design, Boston: Kluwer Academic Publishers (now Springer), 1995.
• A. Wolfe, "Quad-core processor forecast," TechWeb.
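The dynamic-power relations behind the single-core ($P_{ref} = C_{ref} V_{ref}^2 f$) and multicore slides can be checked numerically. Below is a minimal Python sketch that rests on assumptions not taken from the project: activity factor 1, dynamic power only (no short-circuit or leakage), and a multicore whose switched capacitance is `overhead * N * C_ref` with each copy clocked at f/N. The function names and the `overhead` parameter are hypothetical. The sketch also generates the round-robin mux-control sequence for N = 4.

```python
"""Back-of-envelope power model for the single-core and N-core ALU designs.

Hypothetical assumptions (not from the slides): every node switches once per
clock (activity factor 1), only dynamic power is modeled, and the N-core
design's switched capacitance is overhead * N * C_ref.
"""


def dynamic_power(c: float, v: float, f: float) -> float:
    """Dynamic switching power P = C * V^2 * f."""
    return c * v * v * f


def single_core_ratio(v: float, v_ref: float) -> float:
    """Power at supply v relative to the same design at v_ref (same C and f)."""
    return (v / v_ref) ** 2


def multicore_ratio(v: float, v_ref: float, n: int, overhead: float = 1.0) -> float:
    """Power of an n-core design (each copy clocked at f/n, supply v)
    relative to the single-core reference at v_ref and f.

    Capacitance grows to overhead * n * C_ref while each copy runs at f/n,
    so the n factors cancel and the ratio reduces to overhead * (v / v_ref)^2.
    """
    c_ref, f = 1.0, 1.0  # normalized; both cancel in the ratio
    p_ref = dynamic_power(c_ref, v_ref, f)
    p_multi = dynamic_power(overhead * n * c_ref, v, f / n)
    return p_multi / p_ref


def mux_select(cycle: int, n: int = 4) -> str:
    """Round-robin select code for the n-to-1 output multiplexer."""
    width = max(1, (n - 1).bit_length())  # 2 bits for n = 4
    return format(cycle % n, f"0{width}b")


if __name__ == "__main__":
    v_ref = 2.5
    for v in (2.5, 1.25, 0.85, 0.625):
        saving = 1.0 - single_core_ratio(v, v_ref)
        print(f"Vdd = {v:5.3f} V: ideal V^2 saving vs 2.5 V = {saving:6.1%}")
    print("mux control:", [mux_select(k) for k in range(8)])
```

Under these assumptions the N factors cancel, so an ideal N-core design at the same supply matches the reference power, and all savings come from lowering Vdd. How far Vdd can drop while each copy still fits within N clock cycles has to come from delay simulations like those above, which this sketch does not model. The printed ideal savings (75% at 1.25 V relative to 2.5 V, for example) serve as the $V_{DD}^2$ baseline against which the Summary's "faster than $V_{DD}^2$" observation can be read.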