Techniques for VLSI Circuit Optimization Considering Process Variations

Mahalingam Venkataraman, PhD Defense
Date: 3/23/2009
Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620
Chair: Prof. Babu Joseph
Major Professor: Prof. Nagarajan Ranganathan
Committee Members: Prof. Srinivas Katkoori, Prof. Hao Zheng, Prof. Justin E. Harlow, Prof. Kandethody Ramachandran, Prof. Sanjuktha Bhanja

Outline of Presentation

VLSI Circuit Complexity • Transistor count (Source: Intel): Northwood 55 million, Prescott 125 million, Yonah 151 million, Wolfdale 410 million.

Nanometer Dimensions [Figure: length scale from 1 m down to 100 nm (Source: Spektrum der Wissenschaften), shown next to a 65 nm transistor (Source: Intel). Courtesy: Sill, PGPEE 2008]

Process Variations Process variations, in general, refer to the difference between the intended and the obtained values of voltage and process parameters, that is, between the values before and after fabrication of the circuit. The variations are more pronounced in the nanometer era due to the limitations of the fabrication equipment and the lithography process. Process variations in the nanometer era have an impact on the failure probability and hence on the timing yield of integrated circuits.

VLSI Circuit Optimization • Circuit optimization in the nanometer era is formally defined as the process of designing circuits with the best possible power, delay and noise parameters. • Common methods: transistor/gate sizing, wire sizing, incremental placement, multiple supply and threshold voltages, and buffer insertion. • The relationships among the parameters are conflicting: circuits with optimal power can have poor performance and/or noise values. • Process variations have made the relationships among the conflicting optimization objectives more complex and hence more difficult to optimize.

Motivation: Dissertation Research • Corner based circuit optimization that ignores variation effects can negatively impact timing yield. • Worst case consideration of variations guarantees good yield, but can lead to severe over design. • In this context, there is a strong need to re-invent circuit optimization techniques from a statistical perspective. • The methodology has to consider multiple conflicting objectives, model variation effects without assumptions regarding distributions, and be efficient enough to handle large circuits. • Hence, in this dissertation, we model and develop novel statistical and runtime variation aware solutions for circuit optimization considering process variations.

Statistical Timing Analysis • Static timing analysis: each element delay is a min/max value, and so is the circuit delay. • Statistical timing analysis: each element delay is a PDF/CDF, and so is the circuit delay. • Variation awareness in VLSI started with PDF/CDF propagation in timing analysis. • Circuit optimization frameworks were then built on top of the SSTA engine to optimize performance considering variations.

Mathematical Programming based Circuit Optimization • SSTA based iterative circuit optimization requires a number of complicated operations at each node and hence incurs a prohibitive runtime [Schmidt, EJOR 2000, Karkowski, ICSS 1995]. • Hence, the authors in [Mani, DAC 2005, Mani, ICCD 2004] proposed stochastic mathematical programming based circuit optimization. • Mathematical programs are fast and can handle large circuits. • Several circuit optimization problems, such as gate sizing, buffer insertion and placement, have well defined mathematical programming formulations. • The stochastic programming technique is reasonably fast, but it can be conservative in terms of yield and hence achieves smaller savings in area or power [Buckley, IJFSS 1990].

Fuzzy Mathematical Programming (FMP) • FMP is a special case of mathematical programming with fuzzy variables in the constraints or objective functions. • Variations are modeled as fuzzy numbers. • Similar to stochastic programming, fuzzy programming involves a relaxation step. • FMP has been used to model uncertainty in scheduling, binding, testing, robotics, pattern matching and artificial intelligence. • A fuzzy number (linear, trapezoidal or non-linear) is defined as a number whose precise value is somewhat uncertain.

Motivation: Fuzzy Programming • The author in [Buckley, IJFSS 1990] highlighted that fuzzy programming guarantees solutions better than, or at least as good as, those of stochastic programming, and demonstrated this using Monte-Carlo simulations. • The bound constraints in fuzzy programming allow the FMP to search for the optimal value instead of averaging a list of close to optimal values as in stochastic programming. • Fuzzy programming also handles variation parameters in the objective function, as opposed to only in the constraints as in stochastic programming. • Hence, we planned to use fuzzy programming based modeling and solution for uncertainty aware VLSI circuit optimization.

Motivation: Dynamic Clock Stretching • The proposed statistical design methods (fuzzy or stochastic) are quite effective in the presence of variations and incur reasonable overheads. • However, when no variations occur in the critical paths, the overheads still remain. • To avoid this, we investigate a completely different approach to handling process variations. • A dynamic delay detection and clock stretching technique is proposed to combat the effects of process variations.

Contributions of This Dissertation

Outline of Presentation

Variation Aware Gate Sizing (VA-GS) Gate sizing is one of the simplest yet most effective techniques for improving the power/performance trade-off in VLSI circuits. Increasing the size of a gate increases both performance and power consumption. The problem of gate sizing is well suited to be formulated as a mathematical programming problem. In this work, we formulate variation aware gate sizing as a fuzzy linear programming problem, maximizing timing yield with power and delay as constraints.

Previous Works

Variation Aware Gate Sizing – Outline
Step 1: Formulation of linear models for gate delay and dynamic power as functions of gate sizes.
Step 2: Modeling process variation in gate delay coefficients by treating them as triangular fuzzy numbers.
Step 3: Formulating and solving the LP for deterministic gate sizing with the variation parameters set to the worst case and to the typical case; this yields the bounds for the fuzzy formulation.
Step 4: The bound values generated above are used to convert the fuzzy formulation into a corresponding crisp formulation using symmetric relaxation.
Step 5: The crisp optimization problem is then solved with a commercial nonlinear optimization solver.

Step 1: Power and Timing Models The power consumption of a gate is fitted as a linear function of the gate size (si) only. A linear approximation for gate delay is adopted from [Berkelaar, EDAC 90], where a, b, c are constant coefficients obtained from SPICE simulations, fo(i) is the fan-out of gate i, and si is the size of gate i. The equation describes the gate delay (di) as a function of the gate size (si) and the sizes of its fan-out gates.
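A minimal sketch of the two linear models in Step 1 may help. It assumes the usual form in which delay falls with a gate's own size and rises with the total size of its fan-out gates; the sign convention, the power coefficient `k` and both function names are illustrative assumptions rather than details taken from the slides.

```python
def gate_power(k, size):
    """Power fitted as a linear function of gate size only.

    k is a hypothetical fit coefficient, not a value from the slides.
    """
    return k * size


def gate_delay(a, b, c, size, fanout_sizes):
    """Linear delay model: d_i = a - b*s_i + c * sum(s_j for j in fo(i)).

    Assumed sign convention: upsizing the gate itself reduces delay,
    while larger fan-out gates add load and increase delay.
    """
    return a - b * size + c * sum(fanout_sizes)
```

For example, `gate_delay(1.0, 0.2, 0.1, 2.0, [1.0, 3.0])` combines an intrinsic term, a speed-up from the gate's own size, and a penalty from the two fan-out gates it drives.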

Step 2: Modeling Variations The variations in gate length and oxide thickness are translated to the coefficients b and c in the delay equation. The actual physical variability of these coefficients is unknown, but they closely approximate gate length and oxide thickness [Mani, ICCD 04]. The fuzzy coefficients are modeled as triangular fuzzy numbers of the form (bi, bi - gi, bi + gi) and (ci, ci - hi, ci + hi), where gi and hi represent the maximum variations.
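To make the triangular fuzzy numbers concrete, the sketch below evaluates the membership of a candidate coefficient value in a symmetric triangle centred on the nominal value. The function name and the numbers in the usage note are hypothetical.

```python
def triangular_membership(x, center, spread):
    """Membership of x in the triangular fuzzy number (center, center - spread, center + spread).

    Returns 1.0 at the nominal value and falls linearly to 0.0
    at the maximum variation on either side.
    """
    if spread <= 0:
        raise ValueError("spread must be positive")
    return max(0.0, 1.0 - abs(x - center) / spread)
```

With a nominal coefficient of 1.0 and a maximum variation of 0.25, a value of 1.125 has membership 0.5, and any value at or beyond 1.25 has membership 0.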

Step 3: Deterministic Gate Sizing: LP Formulation In this work, we use a delay constrained power minimization formulation for gate sizing. The deterministic version of the gate sizing optimization problem minimizes the total power Σ Pi subject to Dp ≤ Tspec for every path p, where Pi is the power consumption of gate i, Dp is the delay of path p and Tspec is the required timing specification of the circuit. The variations in delay are transferred to the coefficients b and c in the delay equation.

Step 3: Pre-Processing for Creating Crisp Problem The deterministic LP problem is solved with the gate delay set to the worst case (wc_sizing). Next, the deterministic LP problem is also solved with the gate delay set to the nominal case (nc_sizing). The solutions to these two optimizations provide the lower and upper bound values for the variation aware fuzzy gate sizing problem.

Step 4: Variation Aware Fuzzy Gate Sizing Using these bound values from the pre-processing step and a variation parameter (λ), the fuzzy linear programming problem is converted to a crisp programming problem. The solution to the crisp problem lies between the bound values and represents an overall degree of satisfaction of the variation parameters and the objectives of the optimization problem.

Step 4: Crisp Nonlinear VA-GS Problem In the crisp problem for VA-GS, λ is the variation parameter, nc_sizing and wc_sizing represent the values of the objective function from the deterministic pre-processing optimizations, and λ varies from 0 to 1. The crisp problem maximizes the variation resistance (robustness), bounds the power value and satisfies the delay constraints in an optimal fashion.
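The conversion to a crisp problem can be illustrated on a toy instance with a single gate. This is only a sketch under stated assumptions: power equals the gate size, delay is `a - b*s`, the worst case degrades `b` by `g`, and the crisp constraints follow one common symmetric (Zimmermann-style) form in which λ simultaneously tightens the power budget from wc_sizing toward nc_sizing and degrades the delay coefficient. The slides do not give the exact constraint set, and all names are hypothetical.

```python
def min_size_for_delay(a, b, t_spec):
    """Smallest size s with a - b*s <= t_spec (single gate, no fan-out load)."""
    return (a - t_spec) / b


def solve_crisp(a, b, g, t_spec, tol=1e-9):
    """Maximize lam in [0, 1] for the toy one-gate crisp problem by bisection."""
    # Pre-processing: deterministic bounds. Power equals size in this toy,
    # so the minimum feasible sizes are the nc_sizing and wc_sizing values.
    nc = min_size_for_delay(a, b, t_spec)        # nominal coefficient b
    wc = min_size_for_delay(a, b - g, t_spec)    # worst-case coefficient b - g

    def feasible(lam):
        # Delay constraint uses the coefficient degraded by lam * g;
        # the power budget shrinks from wc toward nc as lam grows.
        size_needed = min_size_for_delay(a, b - lam * g, t_spec)
        budget = wc - lam * (wc - nc)
        return size_needed <= budget

    # Feasibility is monotone in lam here, so bisection finds the largest lam.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    size = min_size_for_delay(a, b - lo * g, t_spec)
    return lo, size, nc, wc
```

The product of λ and the size variable in the delay constraint is what makes the crisp problem nonlinear even though the deterministic problem is linear; a real instance would be handed to a nonlinear solver rather than solved by bisection.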

Step 5: VA-GS Simulation VA-GS was tested on ITC'99 circuits. The formulations were written in AMPL, a mathematical programming language format, and solved with KNITRO, a commercial non-linear optimization solver. A variation of 25% in gate delay was assumed in accordance with [Nassif, ISSCC 2000].

Experimental Results The variation aware fuzzy gate sizing approach provides an average improvement of 18% compared to DWC and 9% compared to stochastic gate sizing without compromising on timing yield.

Monte-Carlo simulation The solution of the fuzzy technique is verified for timing yield using Monte-Carlo simulation. We generated 10,000 copies of each benchmark circuit with random gate delay coefficients and with gate sizes fixed to the solution of the fuzzy approach. The delay coefficients corresponding to gate length and oxide thickness were treated as random numbers within the range from the nominal case to the worst case. The timing yield is defined as the fraction of these random circuits whose delay is less than the Tspec value. The proposed fuzzy approach indicates a timing yield of 99% for the ITC benchmark circuits.
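The yield check described above can be sketched as follows. The sketch assumes a single path of gates, a uniform distribution between the nominal and worst-case coefficients, and the linear delay form `a - b*size + c*load`; the dictionary keys and function names are hypothetical, and the slides do not state which distribution was sampled.

```python
import random


def path_delay(gates, rng):
    """Delay of one sampled circuit with fixed gate sizes and random coefficients."""
    total = 0.0
    for gate in gates:
        # Each coefficient is drawn between its nominal and worst-case value:
        # the worst case is a smaller b (less speed-up) and a larger c (more load delay).
        b = rng.uniform(gate["b"] - gate["g"], gate["b"])
        c = rng.uniform(gate["c"], gate["c"] + gate["h"])
        total += gate["a"] - b * gate["size"] + c * gate["load"]
    return total


def timing_yield(gates, t_spec, samples=10000, seed=0):
    """Fraction of sampled circuits whose path delay is less than t_spec."""
    rng = random.Random(seed)  # fixed seed keeps the estimate reproducible
    passed = sum(path_delay(gates, rng) < t_spec for _ in range(samples))
    return passed / samples
```

A full check would evaluate the critical delay of the whole netlist per sample rather than one path, but the counting logic stays the same.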

Timing Based Placement (TBP) Incremental placement for delay improvement is a crucial step in the post layout timing convergence flow. TBP makes small changes to the cell locations, after wire length driven standard cell placement, with the objective of improving the worst negative slack. Previous works on timing driven placement [Choi, ICCAD 03] have shown significant improvements (up to 20%) in worst negative slack.

Variation Aware TBP The objective of timing based placement is to find optimal locations of the cells in a critical sub-circuit such that the critical delay of the circuit is minimized. The timing based placement technique requires a nonlinear programming approach, as net delay has a quadratic dependence on net length. We proposed two new solutions for variation aware timing based placement: (i) a fuzzy nonlinear program based solution and (ii) a stochastic chance constrained programming based solution.

Taxonomy VA-TBP

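Location Constraints and HPWL [Figure: bounding box of a net with edges leftx, rightx, lowery and uppery] • The variables leftx, rightx, lowery and uppery are defined for every net. • For every cell at location (x, y) connected to the net, the following constraints are required: leftx ≤ x ≤ rightx and lowery ≤ y ≤ uppery. • The half perimeter wire length (HPWL) of this net is then given by HPWL = (rightx - leftx) + (uppery - lowery).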
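As a small worked example of the bounding-box idea, the sketch below computes the tightest leftx, rightx, lowery and uppery for a net from its cell locations and then its half perimeter wire length. In the optimization these are variables tied to the cell coordinates by inequalities; here they are computed directly, and the function names are hypothetical.

```python
def net_bounding_box(cells):
    """Tightest (leftx, rightx, lowery, uppery) enclosing every (x, y) cell on the net."""
    xs = [x for x, _ in cells]
    ys = [y for _, y in cells]
    return min(xs), max(xs), min(ys), max(ys)


def hpwl(cells):
    """Half perimeter wire length: width plus height of the net's bounding box."""
    leftx, rightx, lowery, uppery = net_bounding_box(cells)
    return (rightx - leftx) + (uppery - lowery)
```

For cells at (0, 0), (3, 4) and (1, 2), the bounding box is 3 wide and 4 tall, so the HPWL is 7.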

Variation Aware Fuzzy TBP – Outline
Step 1: Formulation of a linear model for gate delay and a nonlinear model for interconnect delay.
Step 2: Modeling process variation in delay coefficients by treating them as triangular fuzzy numbers.
Step 3: Estimate critical cells and calculate move distance.
Step 4: Formulating and solving the NLP for TBP with the variation parameters set to the worst case and to the typical case; this yields the bounds for the fuzzy formulation.
Step 5: The bound values generated above are used to convert the fuzzy formulation into a corresponding crisp formulation using symmetric relaxation.
Step 6: The crisp optimization problem is then solved with a commercial nonlinear optimization solver.

Step 1: Gate and Interconnect Delay Models • We model gate delay as a linear function of gate size (si) and capacitance (Cpi). In timing based placement, the gate size (si) does not change; only the load seen by the gate changes, due to the change in interconnect length. • The interconnect delay is modeled as a quadratic function of the net length. • Hence, in this work, we model timing based placement as a nonlinear programming problem to maximize timing yield with delay and location constraints.
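The quadratic dependence on net length can be illustrated with the standard distributed-RC (Elmore) approximation. The slides do not spell out the interconnect model, so the formula, the per-unit-length parameters and the function name here are assumptions for illustration.

```python
def wire_delay(r_per_unit, c_per_unit, length):
    """Distributed-RC (Elmore) wire delay: 0.5 * r * c * L**2.

    r_per_unit and c_per_unit are the resistance and capacitance
    per unit length of the wire (hypothetical parameters).
    """
    return 0.5 * r_per_unit * c_per_unit * length ** 2
```

Doubling the net length quadruples this delay, which is why moving a cell on a critical net changes timing nonlinearly.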

Step 3: Pre-Processing

Step 4: Deterministic TBP Formulation In the deterministic version of the incremental timing based placement problem, arr is the arrival time variable of gates and nets, and Tspec is the required timing specification of the circuit. The HPWL and location constraints are not shown here, as they are not affected by process variations. The problem is formulated to maximize the timing specification (a proxy for worst negative slack) with node based required arrival time constraints.

Step 5: Crisp TBP Formulation Using these bound values from the pre-processing step and a variation parameter (λ), the uncertain nonlinear programming problem is converted to a crisp nonlinear problem. The problem aims to maximize the variation resistance (λ) while keeping the timing specification between the bound values (wc_tbp and nc_tbp).

Stochastic Timing Based Placement The stochastic formulation is cast as a robust mathematical program, which captures variation effects on the constraints using the mean and variance of the uncertain parameters. The stochastic chance constrained programming technique models uncertainty in delay using probabilistic constraints.

Probabilistic Constraints The uncertain arrival time constraints are modeled as probabilistic constraints, where η, the probability with which each constraint has to be met, corresponds to the timing yield of the circuit. The probabilistic constraints are relaxed to an equivalent formulation in terms of the mean, the cumulative distribution and the standard deviation.

Stochastic TBP In the resultant stochastic TBP problem, σ is the standard deviation, and its multiplier is the inverse CDF value of the distribution. In accordance with previous works [Prekopa, Kluwer 95], an inverse CDF value of 3 is used for a timing yield of 99.7%.
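The relaxation of a probabilistic constraint to its deterministic equivalent can be sketched as below. The sketch assumes normally distributed delays, which the slides do not state, and the function names are hypothetical. Under that assumption, a multiplier of 3 gives a one-sided probability of about 99.87%, while 99.7% is the familiar two-sided coverage of plus or minus three standard deviations.

```python
from statistics import NormalDist


def deterministic_equivalent_ok(mean, sigma, bound, k=3.0):
    """Check the relaxed constraint mean + k * sigma <= bound."""
    return mean + k * sigma <= bound


def one_sided_yield(k):
    """Probability that a normal variable stays below mean + k * sigma."""
    return NormalDist().cdf(k)


def multiplier_for_yield(eta):
    """Inverse CDF value needed for a one-sided target probability eta."""
    return NormalDist().inv_cdf(eta)
```

A delay with mean 10.0 and standard deviation 0.5 meets a bound of 11.5 with the multiplier of 3, but fails a bound of 11.0.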

Step 6: VA-TBP Simulation VA-TBP was tested on ITC'99 benchmark circuits. The KNITRO solver, available through NEOS, is used for both formulations, which are described in AMPL format.

Experimental Results The variation aware fuzzy placement approach provides an average improvement of 12% compared to DWC, and the stochastic placement methodology provides an average improvement of 10% compared to DWC.

Buffer Insertion and Driver Sizing (BIDS) The impact of interconnect driven performance optimization is increasing in the nanometer era. In prior buffer insertion techniques, wires are divided into smaller segments, which brings the wire delay close to linear in the wire length. It has also been pointed out in [Saxena, TCAD 04] that 35% of the total standard logic cells in a circuit will be buffers at the 65 nm technology level. Further, several works have pointed out that buffer insertion coupled with driver sizing, in the optimization phase, can reduce the number of buffers inserted.

Logic Level Variation Aware BIDS • We formulate the buffer insertion and driver sizing problem at the logic level as a piece-wise linear program with variations modeled as fuzzy numbers. • Piece-wise linear constraints are used for modeling buffer insertion when multiple buffers are to be inserted in a net segment. • A look-up table based approximation is used for net length modeling at the logic level. • The number of buffers and the gate sizes are used as a proxy for dynamic power consumption during BIDS.

Logic Level Net Length Estimation Accurate modeling of the interconnect length at the logic level is crucial to optimization at this level. In this work, we estimate wire length using a fast and accurate look-up table based estimation. Previous works have used Rent's rule to derive upper bounds for interconnection lengths. Rent's rule, however, does not hold true at all levels of the partition hierarchy in the nanometer era. Hence, we use a table based methodology with the number of cells/interconnects and the fan-out count of each cell as the address for look-up.

Logic Level Net Length Estimation The look-up table is created from layout-level wire length results of sample benchmark circuits. Circuits from the MCNC benchmark suite, with gate complexity ranging from 500 to 10000 gates, were used for estimation. Interconnects with the same fan-out count are grouped, and the average net length for each fan-out count is calculated. For each fan-out count, nets are averaged again based on gate count in the second dimension. A maximum fan-out size of 20 is assumed, and all nets with a fan-out count of more than 20 are treated as having a fan-out of 20.
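The table construction described above can be sketched as a two-dimensional average keyed by fan-out count and gate-count bucket. The bucket edges, the sample format and the function names are hypothetical; the slides only fix the 500 to 10000 gate range and the fan-out cap of 20.

```python
from collections import defaultdict

MAX_FANOUT = 20  # nets with more fan-outs are treated as fan-out 20
GATE_COUNT_EDGES = (500, 1000, 2500, 5000, 10000)  # hypothetical bucket edges


def gate_bucket(gate_count):
    """Index of the first bucket whose upper edge covers gate_count (last bucket otherwise)."""
    for index, edge in enumerate(GATE_COUNT_EDGES):
        if gate_count <= edge:
            return index
    return len(GATE_COUNT_EDGES) - 1


def build_table(samples):
    """Average layout-level net length per (fan-out, gate-count bucket).

    samples is an iterable of (fanout, gate_count, net_length) tuples.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for fanout, gate_count, length in samples:
        key = (min(fanout, MAX_FANOUT), gate_bucket(gate_count))
        totals[key][0] += length
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


def estimate_length(table, fanout, gate_count):
    """Look up the estimated net length; raises KeyError for an unseen table address."""
    return table[(min(fanout, MAX_FANOUT), gate_bucket(gate_count))]
```

A logic-level optimizer would call `estimate_length` once per net, using the circuit's gate count and the net's fan-out as the address.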

Deterministic-BIDS • The BIDS problem is formulated to minimize buffer and gate cost with piece-wise required time constraints.

Conversion to Crisp Formulation • Objwc and Objnc from the deterministic-BIDS are the worst case and nominal case objective values. • With these pre-processed objective (Obj) values and a variation resistance parameter (λ), the fuzzy problem is converted to a crisp problem.

Experimental Setup The simulation flow for the fuzzy-BIDS is shown in the figure. Fuzzy-BIDS was tested on ITC'99 benchmark circuits mapped to a user defined technology library. The formulation is written in AMPL, a mathematical programming language format, and solved with KNITRO, an interior point non-linear optimization solver.

Experimental Results The variation aware logic level fuzzy-BIDS approach provides an average improvement of 35% on the number of buffers and gate cost required to meet performance and yield targets
