Processing Rate Optimization by Sequential System Floorplanning

Jia Wang¹, Ping-Chih Wu², and Hai Zhou¹. ¹Electrical Engineering & Computer Science, Northwestern University, U.S.A. ²Cadence Design Systems Inc., U.S.A.

Presentation Transcript


  1. Processing Rate Optimization by Sequential System Floorplanning • Jia Wang¹, Ping-Chih Wu², and Hai Zhou¹ • ¹Electrical Engineering & Computer Science, Northwestern University, U.S.A. • ²Cadence Design Systems Inc., U.S.A.

  2. Motivation • Optimize the performance of a sequential system. • Optimize the frequency (clock period). • Minimal period retiming. ([4] Lin et al. ICCAD’03) • Clock skew scheduling. • When the frequency is given but cannot be met by the above methods: • Global interconnects need to be pipelined while the functionality of the system must not change. • Latency insensitive design (LIS). ([6] [7] Carloni et al. ICCAD’99, DAC’00) • Wire-pipelining correcting method. ([5] Nookala et al. DAC’04) • Throughput is traded off for frequency. • These optimizations are applied after delays are estimated. • More optimization possibilities exist in floorplanning and placement when interconnect delays dominate. • Placement driven by sequential timing. ([9] Hurst et al. ICCAD’04) • Floorplanning for throughput. ([8] Casu et al. ISPD’04) ISQED 2006

  3. Processing Rate • How to measure the performance of a sequential system? • Frequency? Throughput varies. • Throughput? Frequency varies. • Use the processing rate to measure the performance. • Defined as the length of the processed input sequence per unit time. • Equal to frequency times throughput in a synchronous system. • An upper bound of the processing rate is derived as the minimum cycle ratio $\min_{C \in G} \sum_{e \in C} w(e) / \sum_{e \in C} d(e)$, where C ranges over the cycles of G. • G is the graph describing the sequential system. For a wire e, w(e) is the number of flip-flops on it and d(e) is its delay. • Independent of the optimization methodology applied afterward. • Independent of the operating frequency.
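As a small illustration of the bound on this slide, the sketch below enumerates the simple cycles of a tiny graph and takes the smallest ratio of total flip-flop count to total delay. It assumes the bound is the minimum cycle ratio that the later slide on direct bound evaluation refers to. The names `simple_cycles` and `rate_bound`, the edge tuple layout `(u, v, w, d)`, and the example graph are hypothetical. Brute-force enumeration is only practical for very small graphs.

```python
from fractions import Fraction

def simple_cycles(n, edges):
    """Yield every simple cycle once, as a list of edge indices.

    edges[i] = (u, v, w, d): a wire from u to v carrying w flip-flops
    with delay d (integers, d > 0). Vertices are 0..n-1.
    """
    out = [[] for _ in range(n)]
    for i, (u, _v, _w, _d) in enumerate(edges):
        out[u].append(i)

    def dfs(start, v, path, on_path):
        for i in out[v]:
            t = edges[i][1]
            if t == start:
                yield path + [i]
            # Only visit vertices larger than start, so each cycle is
            # reported exactly once (rooted at its smallest vertex).
            elif t > start and t not in on_path:
                on_path.add(t)
                yield from dfs(start, t, path + [i], on_path)
                on_path.remove(t)

    for s in range(n):
        yield from dfs(s, s, [], {s})

def rate_bound(n, edges):
    """Minimum over all cycles of (sum of w) / (sum of d)."""
    return min(
        Fraction(sum(edges[i][2] for i in c), sum(edges[i][3] for i in c))
        for c in simple_cycles(n, edges)
    )

# Hypothetical 3-module system: cycle 0->1->2->0 has ratio 3/6,
# cycle 0->1->0 has ratio 2/6, so the bound is 1/3.
example = [(0, 1, 1, 2), (1, 2, 1, 3), (2, 0, 1, 1), (1, 0, 1, 4)]
print(rate_bound(3, example))
```

Exact fractions are used so that ratios of different cycles compare without rounding error.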

  4. Floorplanning for Processing Rate (FPR) • Find a floorplan that maximizes the upper bound. • Intuitively, designs with larger bounds are superior to those with smaller bounds. • Good fidelity between the bound and the processing rate makes our approach effective. • Optimizing the bound means that, at the floorplanning stage, • It is not necessary to decide which methodology to apply later. • It is not necessary to know the operating frequency. • This saves design time, since floorplanning need not be repeated for each subsequent optimization methodology and operating frequency.

  5. Overview of the Floorplanning Algorithm • Simulated annealing (SA) based floorplanner. • Adjacent Constraint Graph (ACG) as the floorplan representation. • A representation for general floorplans. • Common ACG perturbations change the geometric relationships locally. • Local changes enable incremental evaluation of the bound. • The cost function to be optimized includes the area of the floorplan and the processing rate upper bound. • When the floorplan is required to fit into a fixed outline, an outline cost is included in the cost function as well. • W and H are the width and height of the current floorplan, respectively. • W* and H* are the desired width and height, respectively.
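The slide names the ingredients of the cost function but the transcript does not preserve its exact form. The sketch below is therefore only one plausible shape, written under explicit assumptions: the weights `alpha`, `beta`, `gamma`, the use of `1 / bound` to reward a larger bound, and the overshoot-based `outline_cost` are all hypothetical choices, not the authors' formula.

```python
def outline_cost(W, H, W_star, H_star):
    """Assumed penalty: how far the floorplan overshoots the desired outline."""
    return max(0.0, W - W_star) + max(0.0, H - H_star)

def floorplan_cost(W, H, bound, W_star=None, H_star=None,
                   alpha=1.0, beta=1.0, gamma=1.0):
    """Assumed SA cost: smaller is better.

    W, H     : width and height of the current floorplan
    bound    : processing rate upper bound (larger is better, so it
               enters the cost as 1 / bound)
    W_star,
    H_star   : desired outline; the outline term is skipped when absent
    """
    cost = alpha * W * H + beta / bound
    if W_star is not None and H_star is not None:
        cost += gamma * outline_cost(W, H, W_star, H_star)
    return cost

# A 4 x 5 floorplan with bound 0.5, without and with a 3 x 6 outline.
print(floorplan_cost(4, 5, 0.5))
print(floorplan_cost(4, 5, 0.5, W_star=3, H_star=6))
```

In practice the terms would need normalizing so that area and bound contribute on comparable scales. The sketch leaves that to the weights.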

  6. Adjacent Constraint Graph (ACG) • A constraint graph containing both horizontal and vertical constraint edges such that: • There is exactly one constraint relation between every pair of modules. • There are no transitive edges. • There are no crosses. A cross is an edge configuration that, if allowed, may introduce a quadratic number of edges into the graph. • The Reduced ACG simplifies the ACG by removing a group of edges that can be inferred from other edges. • An example: • The floorplan. • Its ACG. • Its Reduced ACG.
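To make the role of horizontal and vertical constraint edges concrete, here is a minimal sketch of the standard packing step for constraint graphs in general: each module's x coordinate is the longest path through horizontal edges, and its y coordinate the longest path through vertical edges. This is not code from the presentation and does not enforce the ACG-specific conditions (no transitive edges, no crosses). The function `pack`, the edge lists, and the three-module example are hypothetical.

```python
from functools import lru_cache

def pack(sizes, h_edges, v_edges):
    """Place modules from constraint edges by longest-path computation.

    sizes   : {module: (width, height)}
    h_edges : (a, b) pairs meaning a is to the left of b
    v_edges : (a, b) pairs meaning a is below b
    Both edge sets are assumed to be acyclic.
    Returns (x, y, W, H) with lower-left coordinates per module.
    """
    def longest(edges, dim):
        preds = {m: [] for m in sizes}
        for a, b in edges:
            preds[b].append(a)

        @lru_cache(maxsize=None)
        def pos(m):
            # A module starts where its furthest-reaching predecessor ends.
            return max((pos(p) + sizes[p][dim] for p in preds[m]), default=0)

        return {m: pos(m) for m in sizes}

    x = longest(h_edges, 0)
    y = longest(v_edges, 1)
    W = max(x[m] + sizes[m][0] for m in sizes)
    H = max(y[m] + sizes[m][1] for m in sizes)
    return x, y, W, H

# A is left of B; C sits above both A and B.
sizes = {"A": (2, 2), "B": (3, 1), "C": (4, 2)}
x, y, W, H = pack(sizes, h_edges=[("A", "B")], v_edges=[("A", "C"), ("B", "C")])
print(x, y, W, H)
```

Because a perturbation changes only a few edges, only the modules downstream of those edges need their positions recomputed, which is what makes local changes cheap to evaluate.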

  7. Direct Bound Evaluation • The minimum cycle ratio problem. • It needs to be solved many times in simulated annealing (SA). • More than 700K times for our largest benchmark. • Previous work [8] only estimates the ratio in SA rather than computing it. • Many polynomial-time algorithms are available. • However, we choose Howard’s algorithm. • Not proved to be polynomial-time, but among the fastest in practice. • Howard’s algorithm finds the ratio iteratively. • Maintain a policy graph. • A sub-graph of G in which exactly one edge starts from each vertex. • Its minimum cycle ratio λ is obtained by enumerating its cycles. • λ is an upper bound of the minimum cycle ratio of G. • Check whether there is a negative cycle in G with edge weights w(e) − λ d(e). • No: then λ is the minimum cycle ratio. • Yes: build a new policy graph containing one such cycle. • Keep a vertex labeling to interleave the above two steps.
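The two alternating steps on this slide can be sketched as follows. This is a simplified illustration, not Howard’s algorithm as implemented by the authors: it uses a plain Bellman-Ford pass for the negative-cycle check instead of the interleaved vertex labeling, and it assumes every vertex has at least one outgoing edge and that w(e) and d(e) are integers with d(e) > 0. All function names are hypothetical. The optional `policy` argument allows a previous final policy to be reused as the starting point, as the incremental evaluation slide describes.

```python
from fractions import Fraction

def policy_min_ratio(n, edges, policy):
    """Minimum cycle ratio of the policy graph (one out-edge per vertex)."""
    best, state = None, [0] * n      # 0 unvisited, 1 on current walk, 2 done
    for s in range(n):
        path, v = [], s
        while state[v] == 0:
            state[v] = 1
            path.append(v)
            v = edges[policy[v]][1]
        if state[v] == 1:            # the walk closed a new cycle at v
            cyc = path[path.index(v):]
            r = Fraction(sum(edges[policy[u]][2] for u in cyc),
                         sum(edges[policy[u]][3] for u in cyc))
            best = r if best is None or r < best else best
        for u in path:
            state[u] = 2
    return best

def negative_cycle(n, edges, lam):
    """Return edge indices of a negative cycle under w - lam*d, or None."""
    dist, pred, x = [Fraction(0)] * n, [None] * n, None
    for _ in range(n):
        x = None
        for i, (u, v, w, d) in enumerate(edges):
            if dist[u] + w - lam * d < dist[v]:
                dist[v], pred[v], x = dist[u] + w - lam * d, i, v
        if x is None:
            return None              # no relaxation: no negative cycle
    for _ in range(n):               # step back until we are on the cycle
        x = edges[pred[x]][0]
    cycle, v = [], x
    while True:
        cycle.append(pred[v])
        v = edges[pred[v]][0]
        if v == x:
            return cycle

def min_cycle_ratio(n, edges, policy=None):
    """Return (minimum cycle ratio of G, final policy)."""
    if policy is None:               # arbitrary start: first out-edge seen
        policy = [None] * n
        for i, (u, _v, _w, _d) in enumerate(edges):
            if policy[u] is None:
                policy[u] = i
    policy = list(policy)
    while True:
        lam = policy_min_ratio(n, edges, policy)
        cyc = negative_cycle(n, edges, lam)
        if cyc is None:
            return lam, policy
        for i in cyc:                # new policy contains the found cycle
            policy[edges[i][0]] = i

edges = [(0, 1, 1, 2), (1, 2, 1, 3), (2, 0, 1, 1), (1, 0, 1, 4)]
lam, final_policy = min_cycle_ratio(3, edges)
print(lam)
```

Each found cycle has a ratio strictly below the current λ, so λ decreases at every round and the loop stops once no negative cycle remains. Passing `final_policy` into the next call, after only the delays have changed, is a cheap way to start from a λ that is likely already close to the answer.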

  8. Incremental Bound Evaluation • The initial policy graph in Howard’s algorithm: • Constructed heuristically. • Intuitively, an initial policy graph with a smaller ratio λ tends to converge more quickly. • The floorplans in simulated annealing: • ACG perturbations change the geometric relationships locally. • Most likely, the cycle ratio will not change much across perturbations. • Reuse the final policy graph as the initial one for the floorplan after perturbation. • This reduces running time by 29% on average. • The time columns are in seconds. • The #iter. columns give the total number of iterations in Howard’s algorithm.

  9. Experiments • Six GSRC floorplanning benchmarks: • n10, n30, n50, n100, n200, n300. • Containing 10, 30, 50, 100, 200, and 300 modules respectively. • Each n-pin net is decomposed into n−1 2-pin nets. • The last pin of the net is treated as the source of the nets after decomposition. • The other n−1 pins are sinks. • Exactly one flip-flop on each net. • Wire delays are computed as Manhattan distances between pins. • Pins are assumed to be at the centers of the modules. • Evaluate the processing rate under different operating frequencies for comparison with [8]. • Frequencies are modeled by critical length. • The distance that a signal travels in one clock cycle. • 30%, 50%, 70%, and 100% of the square root of the total module area.
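The experimental setup on this slide can be written down in a few lines. The sketch below is illustrative only: the function names, the pin ordering within a net, and the sample numbers are hypothetical, while the rules themselves (last pin is the source, Manhattan distance between module centers, critical length as a fraction of the square root of the total module area) follow the slide.

```python
import math

def decompose(net):
    """Split an n-pin net into n-1 two-pin (source, sink) nets.

    The last pin of the net is the source; all other pins are sinks.
    """
    *sinks, source = net
    return [(source, sink) for sink in sinks]

def wire_delay(center_a, center_b):
    """Manhattan distance between two module centers, used as wire delay."""
    return abs(center_a[0] - center_b[0]) + abs(center_a[1] - center_b[1])

def critical_lengths(module_areas, fractions=(0.3, 0.5, 0.7, 1.0)):
    """Critical lengths as fractions of sqrt(total module area)."""
    side = math.sqrt(sum(module_areas))
    return [f * side for f in fractions]

print(decompose(["m1", "m2", "m3"]))   # m3 drives m1 and m2
print(wire_delay((0, 0), (3, 4)))
print(critical_lengths([9, 16]))
```

With exactly one flip-flop on each decomposed net, every edge of the system graph G gets w(e) = 1 and d(e) equal to the Manhattan distance computed above.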

  10. Experiments (Cont.) • Results are reported in the format 1−throughput / white space (%). • The smaller the number, the better. • Dominating solutions are highlighted. • Our approach uses one floorplan for all the frequencies. • [8] uses one floorplan for each frequency.
