Techniques for VLSI Circuit Optimization Considering Process Variations
Mahalingam Venkataraman, PhD Defense
Date: 3/23/2009
Department of Computer Science and Engineering
University of South Florida, Tampa, FL, 33620
Chair: Prof. Babu Joseph
Major Professor: Prof. Nagarajan Ranganathan
Committee Members: Prof. Srinivas Katkoori
Prof. Hao Zheng
Prof. Justin E. Harlow
Prof. Kandethody Ramachandran
Prof. Sanjuktha Bhanja
[Figure: 65 nm transistor. Source: Spektrum der Wissenschaften. Courtesy: Sill, PGPEE 2008]
Process variations, in general, refer to the difference between the intended (pre-fabrication) and obtained (post-fabrication) values of the voltage and process parameters of a circuit.
The variations are more pronounced in the nanometer era due to the limitations of fabrication equipment and the lithography process
Process variations in the nanometer era have an impact on the failure probability and hence the timing yield of integrated circuits
Static Timing Analysis
Statistical Timing Analysis
Circuit delay as PDF/CDF
pattern matching and artificial intelligence.
Gate sizing is one of the simplest yet most effective techniques for improving the power/performance trade-off in VLSI circuits
Increasing the size of a gate improves its performance but also increases its power consumption.
The problem of gate sizing is well suited to be formulated as a mathematical programming problem
In this work, we formulate variation aware gate sizing as a fuzzy linear programming problem, maximizing timing yield with power and delay as constraints.
Step 1: Formulation of linear models for gate delay and dynamic power as functions of gate sizes.
Step 2: Modeling process variation in gate delay coefficients by treating them as triangular fuzzy numbers.
Step 3: Formulating and solving the LP for Deterministic Gate Sizing with the variation parameters set to the worst case and the typical case; the two solutions give the bounds for the fuzzy formulation.
Step 4: The bound values generated above are used to convert fuzzy formulation into a corresponding crisp formulation using symmetric relaxation.
Step 5: The crisp optimization problem is then solved through a commercial nonlinear optimization solver.
The power consumption of a gate is fitted as a linear function of the gate size (si) only.
The linear approximation for gate delay is adopted from [Berkelaar, EDAC 90]
where a, b, c : constant coefficients from spice simulations
fo(i): fan-out of gate i;
si: size of gate i;
The delay equation describes gate delay (di) as a function of gate size (si) and the sizes of its fan-out gates
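The delay equation itself did not survive on this slide. A form consistent with the definitions above, and with the linear delay model commonly used in the gate sizing literature, would be the following. The signs on the b and c terms are an assumption here, not taken from the slide:

```latex
d_i = a_i - b_i\, s_i + c_i \sum_{j \in fo(i)} s_j
```

Under this assumed form, increasing the gate's own size si reduces its delay, while larger fan-out gates add load and increase it.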
The variations in gate length and oxide thickness are translated to coefficients b and c in the delay equation
The actual physical variability of these coefficients is unknown, but they closely approximate gate length and oxide thickness [Mani, ICCD 04]
The fuzzy coefficients are modeled as triangular fuzzy numbers of the form (bi, bi–gi, bi+gi) and (ci, ci–hi, ci+hi), where the coefficients gi and hi represent the maximum variations
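As an illustration of the triangular fuzzy numbers described above, the following minimal sketch evaluates the membership degree of a candidate coefficient value. The function name and the numeric values (b_i = 2.0, g_i = 0.5) are hypothetical and chosen only for the example.

```python
def triangular_membership(x, center, spread):
    """Membership degree of x in a symmetric triangular fuzzy number.

    The number peaks at `center` (degree 1) and falls linearly to 0 at
    center - spread and center + spread.
    """
    if spread <= 0:
        raise ValueError("spread must be positive")
    return max(0.0, 1.0 - abs(x - center) / spread)


if __name__ == "__main__":
    # Hypothetical coefficient b_i = 2.0 with maximum variation g_i = 0.5
    b_i, g_i = 2.0, 0.5
    for x in (1.5, 1.75, 2.0, 2.25, 2.5):
        print(x, triangular_membership(x, b_i, g_i))
```

The membership is 1 at the nominal value and drops to 0 at the maximum variation on either side, which is the shape the (bi, bi–gi, bi+gi) notation encodes.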
In this work, we use a delay constrained power minimization formulation for gate sizing
The deterministic version of the gate sizing optimization problem can be shown as
where Pi is the power consumption of gate i, Dp is the delay of path p and Tspec is the required timing specification of the circuit
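The formulation did not survive on this slide. Based only on the symbols defined above, a delay constrained power minimization would take roughly the following shape. This is a sketch under that assumption, with any size bounds on si omitted:

```latex
\begin{aligned}
\text{minimize}   \quad & \sum_{i} P_i \\
\text{subject to} \quad & D_p \le T_{spec} \quad \forall\, p \in \text{paths}
\end{aligned}
```

Since Pi is fitted as a linear function of si and the gate delays are linear in the gate sizes, both the objective and the constraints stay linear, which is what makes the deterministic problem an LP.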
The variations in delay are transferred to the coefficients b and c in the delay equation
The deterministic LP problem is solved with gate delay set to worst case (wc_sizing)
Next, the deterministic LP problem is also solved with delay of a gate set to nominal case (nc_sizing)
The solutions to these optimizations represent the lower and upper bound values for the variation aware fuzzy gate sizing problem
Step 4: Variation Aware Fuzzy Gate Sizing
Using the bound values from the pre-processing step and a variation parameter (λ), the fuzzy linear programming problem is converted to a crisp programming problem.
The solution to the crisp problem lies between the bound values and represents an overall degree of satisfaction of the variation parameters and the objectives of the optimization problem.
The crisp problem for VA-GS is given by,
where λ is the variation parameter, which varies from 0 to 1, and nc_sizing and wc_sizing represent the values of the objective functions from the deterministic pre-processing optimizations.
The crisp problem maximizes the variation resistance (robustness), bounds the power value and satisfies the delay constraints in an optimal fashion
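To make the symmetric relaxation concrete, here is a toy sketch for a single gate driving a fixed load. It is not the formulation from the slides: the constraint forms (delay coefficients shifted toward the worst case by λ, power bound tightened from wc toward nc by λ), the function names and all numbers are assumptions for illustration, and a simple bisection stands in for the nonlinear solver.

```python
def required_size(lam, a, b, c, g, h, load, t_spec):
    """Smallest gate size meeting t_spec when the delay coefficients are
    shifted toward the worst case by a fraction lam (0 = nominal, 1 = worst).

    Assumed delay model: d = a - (b - lam*g)*s + (c + lam*h)*load.
    """
    return (a + (c + lam * h) * load - t_spec) / (b - lam * g)


def solve_crisp(a, b, c, g, h, load, t_spec, p, tol=1e-9):
    """Maximize lam for a one-gate toy problem by bisection.

    Power is modeled as p * size. Returns (lam, size, nc, wc).
    """
    nc = p * required_size(0.0, a, b, c, g, h, load, t_spec)  # nominal-case power
    wc = p * required_size(1.0, a, b, c, g, h, load, t_spec)  # worst-case power

    def feasible(lam):
        # The power bound tightens from wc toward nc as lam grows.
        budget = wc - lam * (wc - nc)
        return p * required_size(lam, a, b, c, g, h, load, t_spec) <= budget

    lo, hi = 0.0, 1.0  # lam = 0 is feasible by construction
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if feasible(mid):
            lo = mid
        else:
            hi = mid
    size = required_size(lo, a, b, c, g, h, load, t_spec)
    return lo, size, nc, wc


if __name__ == "__main__":
    lam, size, nc, wc = solve_crisp(a=10.0, b=2.0, c=1.0, g=0.5, h=0.25,
                                    load=4.0, t_spec=8.0, p=1.0)
    print(lam, size, nc, wc)
```

In this toy the required size grows with λ while the power budget shrinks, so the largest feasible λ is where the two meet; the resulting size falls between the nominal-case and worst-case solutions, mirroring the behavior described above.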
VA-GS was tested on ITC’99 circuits
AMPL – mathematical programming language format.
KNITRO – a commercial non-linear optimization solver.
A variation of 25% in gate delay was assumed in accordance with [Nassif, ISSCC 2000].
The variation aware fuzzy gate sizing approach provides an average improvement of 18% compared to DWC and 9% compared to stochastic gate sizing without compromising on timing yield.
The solution of the fuzzy technique is verified for timing yield values using Monte-Carlo simulation
We generated 10000 copies of each benchmark circuit with random gate delay coefficients and fixed gate sizes from the solution of the fuzzy approach
The delay coefficients corresponding to gate length and oxide thickness were treated as random numbers within the nominal case and worst case range.
The timing yield is defined as the fraction of random circuits whose delay is less than the Tspec value.
The proposed fuzzy approach indicates a timing yield of 99% for the ITC benchmark circuits.
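The Monte-Carlo check described above can be sketched as follows for a toy chain of gates. The chain topology, the uniform sampling between nominal and worst-case coefficients, the assumed delay form d_i = a - b_i*s_i + c_i*(fan-out size) and all numeric values are assumptions for illustration, not the benchmark setup from the slides.

```python
import random


def path_delay(sizes, load, a, b_vals, c_vals):
    """Delay of a gate chain: each gate drives the next one and the last
    gate drives a fixed load. Uses d_i = a - b_i*s_i + c_i*(fan-out size)."""
    total = 0.0
    for i, s in enumerate(sizes):
        fanout = sizes[i + 1] if i + 1 < len(sizes) else load
        total += a - b_vals[i] * s + c_vals[i] * fanout
    return total


def timing_yield(sizes, load, a, b, c, g, h, t_spec, samples=10000, seed=0):
    """Fraction of random circuit copies whose path delay meets t_spec."""
    rng = random.Random(seed)
    n = len(sizes)
    passed = 0
    for _ in range(samples):
        # Coefficients are drawn between the nominal and worst-case values.
        b_vals = [rng.uniform(b - g, b) for _ in range(n)]
        c_vals = [rng.uniform(c, c + h) for _ in range(n)]
        if path_delay(sizes, load, a, b_vals, c_vals) <= t_spec:
            passed += 1
    return passed / samples


if __name__ == "__main__":
    sizes = [2.0, 2.0, 2.0]  # fixed gate sizes, e.g. from an optimizer
    print(timing_yield(sizes, load=2.0, a=10.0, b=2.0, c=1.0,
                       g=0.5, h=0.25, t_spec=26.0))
```

The gate sizes stay fixed across all samples, and only the delay coefficients are randomized, which matches the verification procedure outlined in the bullets.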
Incremental placement for delay improvement is a crucial step in the post layout timing convergence flow
Timing based placement (TBP) makes small changes to the cell locations, after wire length driven standard cell placement, with the objective of improving the worst negative slack
Previous work on timing driven placement [Choi, ICCAD 03] has shown significant improvements (up to 20%) in worst negative slack
The objective of timing based placement is to find optimal locations of cells in a critical sub-circuit such that the critical delay of the circuit is minimized.
The timing based placement technique requires a nonlinear programming approach, as net delay has a quadratic dependence on net length
We propose two new solutions for variation aware timing based placement:
(i) A fuzzy nonlinear program based solution
(ii) A stochastic chance constrained programming based solution
Location Constraints and HPWL
Step 1: Formulation of a linear model for gate delay and a nonlinear model for interconnect delay.
Step 2: Modeling process variation in delay coefficients by treating them as triangular fuzzy numbers.
Step 3: Estimate critical cells and calculate move distance.
Step 4: Formulating and solving the NLP for TBP with the variation parameters set to the worst case and the typical case; the two solutions give the bounds for the fuzzy formulation.
Step 5: The bound values generated above are used to convert fuzzy formulation into a corresponding crisp formulation using symmetric relaxation.
Step 6: The crisp optimization problem is then solved through a commercial nonlinear optimization solver.
The deterministic version of the incremental timing based placement problem can be shown as,
The HPWL and location constraints are not shown here as they are not affected by process variations. Here, arr is the arrival time variable of gates and nets and Tspec is the required timing specification of the circuit
The problem is formulated to maximize the timing specification (a proxy for worst negative slack) with node based required arrival time constraints.
Step 5: Crisp TBP Formulation
Using the bound values from the pre-processing step and a variation parameter (λ), the uncertain nonlinear programming problem is converted to a crisp nonlinear problem.
The problem aims to maximize the variation resistance (λ) while keeping the timing specification between the bound values (wc_tbp and nc_tbp).
The stochastic formulation is cast as a robust mathematical program, which captures variation effects on the constraints using the mean and variance of the uncertain parameters.
The stochastic chance constrained programming technique models uncertainty in delay using probabilistic constraints.
The uncertain arrival time constraints are modeled as probabilistic constraints:
where η, the probability with which the constraint has to be met, corresponds to the timing yield of the circuit
The probabilistic constraints are relaxed to the equivalent formulation with mean, cumulative distribution and standard deviation
The resultant stochastic TBP problem can be shown as,
Here, σ is the standard deviation, and its multiplier is the inverse CDF value of the distribution.
In accordance with previous works [Prekopa, Kluwer 95], an inverse CDF value of 3 is used for a timing yield of 99.7%
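The relaxation of a probabilistic constraint into a mean-plus-scaled-deviation form can be sketched as below, assuming a Gaussian arrival time. The function names and numbers are hypothetical. As a side note, the 99.7% figure matches the two-sided ±3σ coverage of a normal distribution; the one-sided probability at a multiplier of 3 is slightly higher (about 99.87%).

```python
from statistics import NormalDist


def deterministic_equivalent(mean, sigma, eta):
    """Left-hand side of the relaxed constraint  mean + k*sigma <= t_spec,
    where k is the inverse CDF of the standard normal at probability eta."""
    k = NormalDist().inv_cdf(eta)
    return mean + k * sigma


def meets_yield(mean, sigma, t_spec, eta):
    """True if P(arrival <= t_spec) >= eta for a Gaussian arrival time."""
    return deterministic_equivalent(mean, sigma, eta) <= t_spec


if __name__ == "__main__":
    # Hypothetical arrival time: mean 10.0, standard deviation 1.0
    print(NormalDist().cdf(3.0))  # one-sided probability at k = 3
    print(meets_yield(10.0, 1.0, t_spec=13.1, eta=0.99865))
```

Replacing the probability statement with this deterministic inequality is what lets the chance constrained problem be handed to an ordinary nonlinear solver.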
VA-TBP was tested on ITC’99 benchmark circuits
The KNITRO solver, available through NEOS, is used for both formulations, which are described in AMPL format
The variation aware fuzzy placement approach provides an average improvement of 12% compared to DWC, and the stochastic placement methodology provides an average improvement of 10% compared to DWC
The impact of interconnect driven performance optimization is increasing in the nanometer era.
Prior buffer insertion techniques divide wires into smaller segments, which makes the wire delay almost linear in the wire length.
It has also been pointed out in [Saxena, TCAD 04] that 35% of the total standard logic cells in a circuit will be buffers at the 65nm technology level.
Further, several works have pointed out that buffer insertion coupled with driver sizing, in the optimization phase, can reduce the number of buffers inserted.
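The claim that segmenting a wire with buffers makes its delay nearly linear in length can be checked with a small sketch based on the Elmore delay of a distributed RC wire. The per-unit-length resistance r and capacitance c, the fixed per-buffer delay and the numbers are all hypothetical, and buffer loading is folded into that fixed delay as a simplification.

```python
def wire_delay(length, r, c):
    """Elmore delay of an unbuffered distributed RC wire: 0.5 * r * c * L^2."""
    return 0.5 * r * c * length ** 2


def buffered_wire_delay(length, r, c, segments, buffer_delay):
    """Delay when the wire is cut into equal segments, each driven by a buffer.

    Simplification: buffer input/output loading is folded into a fixed
    buffer_delay per segment.
    """
    seg = length / segments
    return segments * (wire_delay(seg, r, c) + buffer_delay)


if __name__ == "__main__":
    r, c = 1.0, 1.0
    for length in (8.0, 16.0, 32.0):
        segments = int(length // 2)  # keep every segment 2 units long
        print(length,
              wire_delay(length, r, c),
              buffered_wire_delay(length, r, c, segments, buffer_delay=0.5))
```

With a fixed segment length, doubling the wire length doubles the buffered delay, whereas the unbuffered delay quadruples.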
Accurate modeling of the interconnect length at the logic level is crucial to optimization at this level
In this work, we estimate wire length using a fast and accurate lookup table based estimation.
Previous works have used Rent’s rule to derive upper bounds for interconnection lengths
Rent’s rule, however, does not hold true at all levels of the partition hierarchy in the nanometer era
Hence, we use a table based methodology with number of cells/interconnects and fan-out count of each cell as the address for look-up
The look-up table is created with layout-level wire length results of sample benchmark circuits
Circuits from the MCNC benchmark suite, with gate complexity ranging from 500 to 10000 gates, were used for estimation
Interconnects with the same fan-out count are grouped and the average net length for each fan-out count is calculated
For each fan-out count, nets are averaged again based on gate count in the second dimension
A maximum fan-out of 20 is assumed; all nets with a fan-out count above 20 are treated as fan-out 20
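The table construction described above could be sketched as follows. The bucketing of gate counts into ranges of 1000, the function names and the sample data are assumptions for illustration; the slides only state that averaging is done per fan-out count and again by gate count.

```python
from collections import defaultdict

MAX_FANOUT = 20  # nets with larger fan-out share the fan-out-20 entry


def build_table(samples, bucket_size=1000):
    """Average net length keyed by (fan-out, gate-count bucket).

    samples: iterable of (gate_count, fanout, net_length) tuples taken
    from placed-and-routed benchmark circuits.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for gate_count, fanout, length in samples:
        key = (min(fanout, MAX_FANOUT), gate_count // bucket_size)
        sums[key][0] += length
        sums[key][1] += 1
    return {key: total / n for key, (total, n) in sums.items()}


def estimate_length(table, gate_count, fanout, bucket_size=1000):
    """Look up the estimated wire length; returns None if no sample exists."""
    key = (min(fanout, MAX_FANOUT), gate_count // bucket_size)
    return table.get(key)


if __name__ == "__main__":
    # Hypothetical layout-level measurements
    samples = [(500, 2, 10.0), (700, 2, 14.0), (1500, 2, 30.0),
               (800, 25, 100.0), (900, 20, 80.0)]
    table = build_table(samples)
    print(estimate_length(table, gate_count=600, fanout=2))
```

A lookup is a single dictionary access, which is what makes this kind of estimate fast enough to call inside a logic-level optimization loop.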
The simulation flow for the fuzzy-BIDS is shown in Figure.
Fuzzy-BIDS was tested on ITC’99 benchmark circuits mapped to a user defined technology library
AMPL – mathematical programming language format
KNITRO – interior point non-linear optimization solver
The variation aware logic level fuzzy-BIDS approach provides an average improvement of 35% in the number of buffers and the gate cost required to meet performance and yield targets
The variation aware buffer insertion at the layout level is formulated to optimize variation resistance with delay and cost (number of buffers and gate sizes) as constraints.
The layout level buffer insertion, however, restricts the candidate buffer locations to avoid repeating the place and route step.
The generation of candidate buffer locations is performed by dividing the routed wires into channels.
Sparse channels were preferred as candidates compared to denser ones.
An incremental legalization step is performed after the layout level buffer insertion to remove overlaps
The benchmark circuits for layout level BIDS were placed and routed using the Cadence Encounter tool to estimate actual wire lengths
Similar to the logic level simulations, the layout level AMPL models were solved with KNITRO nonlinear programming (NLP) solver
The AMPL models were rebuilt for layout level with worst-case, nominal-case and fuzzy modeling
The cost function (number of buffers plus gate size increments) comparing logic and layout level BIDS for various benchmarks is shown in Figure.
The average difference (among all benchmarks) in buffer plus gate cost between logic and layout level simulations is within 10%
Statistical optimization methods (fuzzy, stochastic) have been effective in improving the yield/cost tradeoffs for circuits in the nanometer era
However, statistical design methods incur extra power/delay even in the absence of variations
Hence, solutions which can dynamically detect delay due to variations and perform corrective/preventive action are becoming necessary
Here, we propose a dynamic delay detection and clock stretching technique to prevent timing violations
RAZOR: Dan Ernst et al., MICRO 2003.
The methodology isolates critical paths.
Evaluates the data in two cycles whenever critical paths are activated.
Works well on special designs with few critical paths, but incurs delay overhead on random designs.
Adaptive voltage scaling based on critical path duplication [Burd, ISSCC 2000]
Clock phase adjustment based on dynamic delay buffer cell [Semiao, DDECS 2008]
The dynamic delay buffer and critical path duplication do not consider spatial correlation
Dynamic delay buffer design considers variations in process parameters and ignores temperature and voltage variations
Irrespective of the variations occurring (P, V or T), we would like to investigate solutions at the circuit level to combat variations with significantly lower overhead.
Identify and capture the delay due to process variations early in the clock period
Employ a delay detection circuit to identify if a transition is delayed in the critical paths
Delay the clock (or select a delayed clock) in the event that the arrival of a signal is delayed due to process variations.
An important pre-processing step would be the identification of critical locations (interconnects), halfway in the critical path
In the presence (absence) of variations, the transitions have to be after (before) the negative edge of the clock
The positive level triggered latch shown in the figure captures the value on the critical interconnect while the clock is at its positive level.
If the transition is delayed due to process variations, then the inputs to the XOR gate will be different.
The multiplexor selects the normal (undelayed) or delayed clock for the destination flip-flop based on the value of the XOR gate output
In the proposed approach, the delayed clock can be dynamically selected, in case the signal propagation is delayed in the data path due to process variations.
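The detection and selection logic described above can be captured in a small behavioral sketch. It assumes an ideal latch and that the XOR is evaluated after the transition has settled; the function name, signal encoding and timing values are hypothetical, and no real circuit timing is modeled.

```python
def csl_select_clock(old_value, new_value, transition_time, clock_period):
    """Return which clock the destination flip-flop receives.

    old_value / new_value: logic level (0 or 1) of the critical net before
    and after its transition in this cycle.
    transition_time: when the transition reaches the net, measured from
    the rising clock edge.
    """
    negative_edge = clock_period / 2.0
    # The latch is transparent while the clock is high and holds the value
    # it saw at the negative edge.
    latched = new_value if transition_time <= negative_edge else old_value
    # Later in the cycle the net carries new_value; XOR flags a mismatch.
    mismatch = latched ^ new_value
    return "delayed" if mismatch else "normal"


if __name__ == "__main__":
    print(csl_select_clock(0, 1, transition_time=0.4, clock_period=1.0))
    print(csl_select_clock(0, 1, transition_time=0.6, clock_period=1.0))
```

A transition that arrives before the negative edge leaves both XOR inputs equal and keeps the normal clock, while a late transition makes them differ and selects the delayed clock.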
The delay detection and clock stretching logic (CSL) is added to the critical and near critical paths that can potentially have timing failure due to process variations
Unlike voltage or frequency scaling, the proposed methodology can provide immediate activation and enable prevention of timing failures
Since the detection circuit monitors data transitions on critical interconnects, the methodology is independent of the type of variation (process, voltage or temperature).
A chain of inverters between two flip-flop stages is chosen as the example circuit.
In this circuit, all interconnects in the path switch, which makes the net halfway along the path the natural choice of critical interconnect.
In the context of clock stretching, the issues of short paths and consecutively pipelined critical paths have to be addressed.
In nanometer designs, short paths are usually rare due to the multiple objectives of power, performance and yield
In addition, in this work we only use a small margin for clock stretching (approximately 10%), which minimizes the possibility of short path failures
Second, in pipelined circuits, if a critical path is followed by another critical path in the following pipeline stage, the CSL methodology can cause timing failures.
This is because the delayed clock circuitry reduces the data capture time available in the subsequent pipeline stage.
The simulation flow for timing yield estimation is shown in Figure.
A simple C program was developed to estimate timing yield from the place-and-route and timing analysis reports
The number of critical paths can be reduced by incremental sizing/placement to reduce CSL overhead
Graph showing impact of clock stretching on timing yield.
In this research, we have proposed solutions for improving timing yield considering variations without significant over design.
The fuzzy modeling is shown to effectively model variations in linear, nonlinear and piece-wise linear circuit optimization problems.
Hence, the various algorithms and circuit optimization methods proposed in this dissertation research represent significant additions to the VLSI CAD tools in the context of variation aware design.
The proposed circuit level technique can be used to dynamically detect delay in signals that occur due to variations and stretch the clock to add the required extra slack.
This method takes a totally different approach from previous works and is expected to make a significant impact in industry.
Semiconductor Research Corporation contract 2007-HJ-1596
NSF Computing Research Infrastructure grant CNS-0551621
V. Mahalingam, N. Ranganathan and J.E. Harlow, “Fuzzy Optimization Approach for Gate Sizing in the presence of Process Variations”, IEEE Transactions on VLSI Systems, 16(8), Pages 975-984, Aug 2008
V. Mahalingam and N. Ranganathan, “Timing Based Placement Considering Uncertainty due to Process Variations”, Accepted for Publication (Feb 2009) in IEEE Transactions on VLSI Systems
V. Mahalingam and N. Ranganathan, “Improving Accuracy in Mitchell’s Logarithmic Multiplication using Operand Decomposition”, IEEE Transactions on Computers, 55(12), Pages 1523-1535, Dec 2006
V. Mahalingam, K. Bhattacharya, N. Ranganathan, H. Chakravarthula, R. Murphy and K. Pratt, “An Efficient VLSI Architecture for Accurate Computation of Lucas-Kanade based Optical Flow”, Accepted for Publication (Sep 2008) in IEEE Transactions on VLSI Systems
N. Ranganathan, U. Gupta and V. Mahalingam, “Simultaneous Optimization of Total Power, Crosstalk Noise, and Delay Under Uncertainty”, Great Lakes Symposium on VLSI (GLSVLSI), Pages 171-176, May 2008
V. Mahalingam and N. Ranganathan, “A Fuzzy Optimization Approach for Process Variation Aware Buffer Insertion and Driver Sizing”, IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Pages 329-334, Apr 2008
V. Mahalingam and N. Ranganathan, “Variation Aware Timing based Placement using Fuzzy Programming”, IEEE International Symposium on Quality Electronic Design (ISQED), Pages 327-332, Mar 2007
V. Mahalingam, N. Ranganathan and J.E. Harlow, “A Novel Approach for Variation Aware Power Minimization during Gate Sizing”, IEEE International Symposium on Low Power Electronic Design (ISLPED), Pages 174-179, Oct 2006
V. Mahalingam and N. Ranganathan, “Variation Aware Circuit-Wise Buffer Insertion and Driver Sizing at the Logic Level”, Submitted to Design Automation Conference (DAC), 2009
V. Mahalingam, N. Ranganathan, N. Ahmed and H. Towfique, “A Variation Aware Circuit Design using Dynamic Clock Stretching”, Submitted to IEEE International Symposium on Low Power Electronic Design (ISLPED), 2009