
ICCAD’03 Review




  1. ICCAD’03 Review CSE 597B Lin Li

  2. Outline • Overview • Archive download URL • Best paper award • Paper from our group • Interesting tutorial • Paper in related areas • Power and energy optimization • Interconnect-centric SoC design • Reliability issues • Performance optimization • Simulation at the nanometer scale • Other areas in ICCAD

  3. Archive Download URL • Papers and presentation slides can be downloaded from: http://www.iccad.com/archive.html

  4. Best Paper Award • 6C.1 - Noise Analysis for Optical Fiber Communication Systems • Alper Demir • KOC University, Sariyer-Istanbul, Turkey • 8B.1 - Block-Based Static Timing Analysis with Uncertainty • Anirudh Devgan, Chandramouli Kashyap • IBM Research at Austin, IBM Microelectronics

  5. Paper from Our Group • 1A.1 - Adaptive Error Protection for Energy Efficiency • Lin Li, N. Vijaykrishnan, Mahmut Kandemir, Mary Jane Irwin • 3C.1 - Array Composition and Decomposition for Optimizing Embedded Applications • Guilin Chen, Mahmut Kandemir, Ugur Sezer, Avanti Nadgir

  6. Interesting Tutorial • 2C.1 - Design and CAD Challenges in sub-90nm CMOS Technology • Kerry Bernstein, Ching-Te Chuang, Rajiv V. Joshi, Ruchir Puri • IBM T.J. Watson • 11B.1 - Formal Methods for Dynamic Power Management • Rajesh K. Gupta, Sandeep Shukla, Sandy Irani • UCSD, UCI, and VT

  7. 2C.1 - Design and CAD Challenges in sub-90nm CMOS Technology • Introduction • CMOS device scaling • New devices for high-performance logic • Planar device structures • Partially-depleted (PD) SOI • Fully-depleted (FD) SOI • Strained-Si & high-k gate • Emerging technologies • Double-gate MOSFETs • 3D integration and interconnects • Carbon Nanotube Transistor (CNT) • Molecular computing • CAD challenges • Challenges of Advanced device technologies • Major issues • Power crisis • Coping with Variability

  8. 2C.1 - Design and CAD Challenges in sub-90nm CMOS Technology (Cont’d)

  9. 11B.1 - Formal Methods for Dynamic Power Management • Overview of the formal methods that have been explored in solving the system-level Dynamic Power Management (DPM) problem. • Show how formal reasoning frameworks can unify apparently disparate DPM techniques. • Approaches that treat the DPM problem as one of stochastic optimization with probabilistic guarantees on performance.

  10. Power and Energy Optimization • Using dynamic voltage scaling in embedded systems (Section 1B) • Using software techniques in embedded systems (Section 3C) • Energy issues in systems design (Section 7B) • Power-aware design (Section 8C)

  11. 1B.1 - Generalized Network Flow Techniques for Dynamic Voltage Scaling in Hard Real-Time Systems • Vishnu Swaminathan, Krishnendu Chakrabarty ECE@Duke • Energy consumption must be carefully balanced with real-time responsiveness in hard real-time systems. • Present an optimal offline dynamic voltage scaling (DVS) scheme for dynamic power management in such systems.

  12. [Figure: Generalized Network Flow Models for the DVS problem. The network has nodes for jobs (j1 ... jn), speeds (s1h, s1l, s1i, ... snh, snl, sni), and intervals (D1 ... D2n-1), a source s and a sink t, and edge labels of the form (lij, uij, Cij, mij).]

  13. 1B.2 - Approaching the Maximum Energy Saving on Embedded Systems with Multiple Voltages • Shaoxiong Hua, Gang Qu ECE@UMCP • For a multiple-voltage DVS system to serve a set of applications {(ei, di, pi): i=1, 2, …, n} without missing their deadlines, • if the system has m voltages {v1, v2,… ,vm}, determine the value of each vi to minimize the energy consumption. • determine m and the value of each vi.

  14. 1B.2 - Approaching the Maximum Energy Saving on Embedded Systems with Multiple Voltages (Cont’d) • Voltage set-up is the fundamental problem for a multiple-voltage DVS system. • application-specific • 2-voltage DVS system: analytic solutions and a linear search algorithm • m-voltage DVS system: no analytic solution exists, so an approximation method is used • A multiple-voltage system can come very close to the maximum energy saving achievable by DVS.
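To make the 2-voltage set-up problem concrete, here is a minimal sketch under a toy model that is not taken from the paper: speed is assumed proportional to voltage, energy per cycle is assumed proportional to voltage squared, and each application is reduced to a (cycles, deadline) pair. The function names `two_voltage_energy` and `search_low_voltage` are hypothetical, and the search is a plain grid scan rather than the paper's algorithm.

```python
def two_voltage_energy(cycles, deadline, v_low, v_high=1.0):
    """Energy to finish `cycles` of work by `deadline` using only v_low and v_high.

    Toy model (assumption): speed is proportional to voltage, and energy per
    cycle is proportional to voltage squared.
    """
    if cycles / v_high > deadline:
        raise ValueError("deadline cannot be met even at the high voltage")
    if cycles / v_low <= deadline:
        # The low voltage alone is fast enough.
        return cycles * v_low ** 2
    # Otherwise run `hi` cycles at v_high and the rest at v_low so that the
    # task finishes exactly at the deadline:
    #   hi / v_high + (cycles - hi) / v_low = deadline
    hi = (cycles / v_low - deadline) / (1.0 / v_low - 1.0 / v_high)
    return hi * v_high ** 2 + (cycles - hi) * v_low ** 2


def search_low_voltage(tasks, v_high=1.0, steps=100):
    """Scan candidate low voltages on a grid; return (best_v_low, total_energy)."""
    best = None
    for k in range(1, steps + 1):
        v_low = v_high * k / steps
        total = sum(two_voltage_energy(c, d, v_low, v_high) for c, d in tasks)
        if best is None or total < best[1]:
            best = (v_low, total)
    return best


if __name__ == "__main__":
    # Hypothetical workload: (cycles, deadline) pairs in normalized units.
    print(search_low_voltage([(6, 10), (2, 10)]))
```

For a single task the scan settles on the voltage whose speed just meets the deadline, which is the intuition behind choosing the voltage set per application.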

  15. 1B.3 - Combined Dynamic Voltage Scaling and Adaptive Body Biasing for Heterogeneous Distributed Real-Time Embedded Systems • Le Yan, Jiong Luo, Niraj K. Jha EE@Princeton • New scheduling algorithm that combines DVS and adaptive body biasing (ABB) to simultaneously optimize both dynamic power consumption and leakage power consumption for real-time distributed embedded systems.

  16. 1B.3 - Combined Dynamic Voltage Scaling and Adaptive Body Biasing for Heterogeneous Distributed Real-Time Embedded Systems • A novel two-phase approach • Phase I: optimal tradeoff between supply and threshold voltages • Phase II: trade off energy consumption and clock period

  17. 1B.3 - Combined Dynamic Voltage Scaling and Adaptive Body Biasing for Heterogeneous Distributed Real-Time Embedded Systems • [Figure: flowchart of the algorithm, with these boxes and decisions: Initializations; Phase I; "Extensible tasks exist?" (Yes/No, with a Return exit); Phase II; allocate slack to the reference task; allocate slack to each other task; "EST+WCET > LFT?" (Yes/No); invalidate this slack allocation.] • Reference task: the task with the highest energy_derivative • Other tasks receive slack when their energy_derivative is higher than the reference level

  18. 3C.3 - Energy Optimization of Distributed Embedded Processors by Combined Data Compression and Functional Partitioning • Jinfeng Liu, Pai H. Chou ECE@UCI • Goal • Energy minimization for distributed embedded processors • Combined optimization • Selection of optimal compression algorithm • Functional partitioning

  19. 3C.3 - Energy Optimization of Distributed Embedded Processors by Combined Data Compression and Functional Partitioning • [Figure: two pairs of timelines for nodes N1 and N2, each showing RECV, PROC, SEND, and IDLE phases up to a marker D.] • Non-optimal without compression (processors at 150MHz): a bad partitioning scheme that produces extra I/O load • Optimal with compression (processors at 80MHz, with added DECO and COMP phases): the same partitioning could turn out optimal if the data from N1 to N2 can be compressed well

  20. 3C.4 - Energy-Aware Fault Tolerance in Fixed-Priority Real-Time Embedded Systems • Ying Zhang, Krishnendu Chakrabarty, Vishnu Swaminathan ECE@Duke • Goal: low power, fault-tolerant real-time systems • Fault tolerance is achieved via checkpointing • Power management is carried out using dynamic voltage scaling (DVS).

  21. 7B.1 - A Game Theoretic Approach to Dynamic Energy Minimization in Wireless Transceivers • Ali Iranli, Hanif E. Fatemi, Massoud Pedram EE@USC • A hierarchical formulation for energy optimization of wireless transceivers is proposed • A game theoretic approach to solving this energy minimization problem is proposed, which reduces energy consumption by 15% for BER = 10^-5 • The proposed hierarchical framework can be used in general for energy optimization of server-client systems

  22. 7B.1 - A Game Theoretic Approach to Dynamic Energy Minimization in Wireless Transceivers • Transceiver energy optimization as a Stackelberg game • Leader: the transmitter • Leader’s policy: transmit power and modulation level • Leader’s cost function: overall energy consumption • Follower: the receiver • Follower’s policy: truncation length • Follower’s cost function: receiver's energy consumption

  23. 7B.2 - Communication-Aware Task Scheduling and Voltage Selection for Total Systems Energy Minimization • Girish V. Varatkar, Radu Marculescu ECE@CMU • Recent work in ES community: performance and energy are crucial! • Voltage selection • Task scheduling algorithm should use the foresight that voltage selection is going to follow the scheduling step • Schedule should provide the maximum slowing down potential • This work brings the communication aspect into the picture • A ‘communication-centric’ approach • A ‘voltage selection’ approach

  24. 7B.3 - LRU-SEQ: A Novel Replacement Policy for Transition Energy Reduction in Instruction Caches • Praveen G. Kalla, Xiaobo Sharon Hu, Joerg Henkel CSE@Notre Dame • LRU to LRU-SEQ (Sequential LRU) • Constraining sequential fetches to the same bank (same way) avoids bank transitions. • It also increases the sleep time for the banks, overcoming break-even time requirements. • The LRU nature has to be maintained, or else associativity is lost (the hit ratio is affected). • The distance between the last fetched line and the present line is a parameter that will affect the performance of this policy.

  25. 7B.3 - LRU-SEQ: A Novel Replacement Policy for Transition Energy Reduction in Instruction Caches
      FOR (every cache access) DO
        IF (access == HIT) THEN
          P_way = C_way
        ELSE
          dist = abs(Curr_Addr - Prev_Addr);
          IF (dist <= SEQ_DST) THEN C_way = P_way
          ELSE C_way = LRU_Way
          END
        END
        Update LRU state for access.
      END
      P_( ) : Previous_( ), C_( ) : Current_( )
      • State Holder 1: P_way (entire cache) • State Holder 2: P_line (each cache way)
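The pseudocode on slide 25 can be exercised with a small simulation. The sketch below is an illustration under several assumptions rather than the authors' implementation: the class name `LruSeqCache`, the modulo set mapping, the use of line addresses, and the `way_transitions` counter are all hypothetical, and only one global previous-way holder is modeled (the per-way P_line holder from the slide is omitted).

```python
class LruSeqCache:
    """Toy set-associative cache following the slide's LRU-SEQ pseudocode."""

    def __init__(self, num_sets, num_ways, seq_dst):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.seq_dst = seq_dst  # SEQ_DST; a negative value gives plain LRU
        # tags[s][w] is the line address held in way w of set s (None if empty).
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        # lru[s] lists the ways of set s from least to most recently used.
        self.lru = [list(range(num_ways)) for _ in range(num_sets)]
        self.prev_way = None   # P_way
        self.prev_addr = None  # Prev_Addr
        self.way_transitions = 0

    def access(self, line_addr):
        """Access one line address; return True on a hit, False on a miss."""
        s = line_addr % self.num_sets  # assumed set mapping
        hit = line_addr in self.tags[s]
        if hit:
            way = self.tags[s].index(line_addr)
        else:
            sequential = (
                self.prev_addr is not None
                and abs(line_addr - self.prev_addr) <= self.seq_dst
            )
            # Sequential miss: reuse the previous way. Otherwise: LRU victim.
            way = self.prev_way if sequential else self.lru[s][0]
            self.tags[s][way] = line_addr
        # Update LRU state for this access (the touched way becomes MRU).
        self.lru[s].remove(way)
        self.lru[s].append(way)
        if self.prev_way is not None and way != self.prev_way:
            self.way_transitions += 1
        self.prev_way = way
        self.prev_addr = line_addr
        return hit


def count_way_transitions(trace, seq_dst, num_sets=2, num_ways=2):
    """Run a trace of line addresses and count changes of the active way."""
    cache = LruSeqCache(num_sets, num_ways, seq_dst)
    for addr in trace:
        cache.access(addr)
    return cache.way_transitions


if __name__ == "__main__":
    trace = [1, 100, 101]  # hypothetical trace; 100 -> 101 is a sequential fetch
    print("LRU-SEQ:", count_way_transitions(trace, seq_dst=1))
    print("plain LRU:", count_way_transitions(trace, seq_dst=-1))
```

Note that on the sequential miss this literal reading of the pseudocode evicts a recently used line, which is the loss of LRU behavior that slide 24 warns must be kept under control.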

  26. 7B.4 - Compiler-Based Register Name Adjustment for Low-Power Embedded Processors • Peter Petrov, Alex Orailoglu CSE@UCSD • Compiler-driven register name adjustment for low-power was proposed • Register names reassigned without incurring any performance or power overhead • No hardware support required whatsoever • Efficient algorithm for Register Name Adjustment proposed with additional frequency skew enhancing phase

  27. 8C.1 - Leakage Power Optimization Techniques for Ultra Deep Sub-Micron Multi-Level Caches • Nam S. Kim, David Blaauw, Trevor N. Mudge EECS@UMICH • Cost-effective number of VTH values for cache leakage reduction • depends on the target access time, but 1 or 2 high VTHs are enough for leakage reduction • Cache leakage • another design constraint in processor design • trade-off among delay / area / leakage • Incorporating realistic cache miss statistics into the leakage optimization

  28. 8C.1 - Leakage Power Optimization Techniques for Ultra Deep Sub-Micron Multi-Level Caches • Using high-k dielectric reduces gate-oxide leakage • [Figure: ITRS 2002 projections with doubling of the number of transistors every two years.]

  29. 8C.1 - Leakage Power Optimization Techniques for Ultra Deep Sub-Micron Multi-Level Caches • [Figure: cache sub-bank organization, showing the memory cell, bit-line pair, word-line, decoder, sense-amp with I/O circuits, and Abus and Dbus buffers with repeaters, with threshold voltages VTH1 to VTH4 assigned across these components.] • Circuit model based on CACTI • 70nm Berkeley predictive technology model • Interconnect R/C annotated • Repeaters used to minimize interconnect delay

  30. 8C.3 - Dynamic Platform Management for Configurable Platform-Based System-on-Chips • Krishna Sekar, Kanishka Lahiri, Sujit Dey ECE@UCSD • Described design techniques for dynamically customizing a general-purpose configurable platform • Dynamic platform management helps combine benefits of general-purpose & application-specific approaches • Benefits • Improved application performance • More efficient platform resource usage • Improved energy efficiency

  31. 8C.3 - Dynamic Platform Management for Configurable Platform-Based System-on-Chips • [Figure: design spectrum from general-purpose processors, through general-purpose configurable platforms and domain-specific platforms, to ASICs and custom SoCs.] • Toward the general-purpose end: improving flexibility, time-to-market, engineering cost, and time-in-market • Toward the ASIC end: improving performance, power, and size • Platform customization techniques produce customized platforms

  32. 8C.3 - Dynamic Platform Management for Configurable Platform-Based System-on-Chips • [Figure: dynamic platform management flow.] • Inputs: performance objectives, data properties, and processing requirements of each application (Applications 1 to 3), plus power constraints • Output: an optimized platform configuration • General-purpose configurable platform components: programmable voltage regulator, embedded processor, PLD, programmable PLL, on-chip communication architecture, flexible on-chip SRAM, reconfigurable cache, parameterized co-processor

  33. Interconnect-Centric SoC Design

  34. 1A.2 - SAMBA-Bus: A High Performance Bus Architecture for System-on-Chips • Ruibing Lu, Cheng-Kok Koh ECE@Purdue • Single Arbitration, Multiple Bus Accesses • Automatically delivers multiple bus transactions • High bandwidth • Bus transactions can be performed even without explicit bus access grant from the arbiter • Communication latency increases only slightly even with high arbitration latency

  35. 1A.2 - SAMBA-Bus: A High Performance Bus Architecture for System-on-Chips • [Figure: modules M1 to M4 attached to two sub-buses, a forward sub-bus and a backward sub-bus.]

  36. 1A.3 - The Y-Architecture for On-Chip Interconnect: Analysis and Methodology • Hongyu Chen, Chung-Kuan Cheng, Andrew B. Kahng et al. CSE@UCSD • The Y-architecture for on-chip interconnect is based on pervasive use of 0-, 120-, and 240-degree oriented semi-global and global wiring. • Communication capability (throughput of meshes) better than Manhattan architecture and X-architecture. • Better total wire length compared to both H and X clock tree structures and better path length compared to the H tree. • Achieves 8.5% less IR drop than an equally-resourced power network in Manhattan architecture.

  37. 1A.3 - The Y-Architecture for On-Chip Interconnect: Analysis and Methodology 7 x 7 meshes with different interconnect architectures.

  38. Reliability Issues

  39. 3B.4 - Vectorless Analysis of Supply Noise Induced Delay Variation • Sanjay Pant, David Blaauw, Savithri Sundareswaran UMICH, Motorola • Power Supply Integrity Issues • Functional Failure • Voltage fluctuations inject noise in the circuit • Performance Failure • Gate delay is becoming increasingly sensitive to supply voltage • ±10% variation in supply can result in 30% delay increase • Proposed Approach • Vectorless • Conservative in estimating worst-case drop/delay increase • Takes into account both IR and L dI/dt drops

  40. 3B.4 - Vectorless Analysis of Supply Noise Induced Delay Variation • [Figure: existing analysis flow, with blocks labeled power grid, input vectors, input vector search, simulator, worst voltage drop, library characterization, STA, and worst-case timing.] • Voltage Drop Estimation • Worst drop highly dependent on input vectors • Slow simulation times allow only a few vectors to be tried • Worst-Case Voltage Budget Analysis • Highly conservative • Worst-case drop is localized • Ignores voltage shifts between distant driver-receiver pairs

  41. 3B.4 - Vectorless Analysis of Supply Noise Induced Delay Variation • [Figure: a gate connected between the power grid (VDD) and the ground grid (GND), with current i(t) and voltage V(t) waveforms; gate delay is characterized with the VDD and GND voltages as variables.] • Divide chip into blocks • Compute unit pulse response • Express delay/voltage using spatial/temporal superposition • Formulate delay/voltage maximization as linear optimization
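The superposition and maximization steps on slide 41 can be illustrated with a deliberately simplified sketch. It assumes a discrete-time model, one observed node, and only per-block peak-current bounds as constraints; none of these details come from the paper, and the function names are hypothetical. With bounds alone, the linear program is solved by inspection: each current sample is set to its upper bound when its coefficient is positive and to zero otherwise. A fuller formulation with additional linear constraints would need an LP solver.

```python
def superposed_drop(pulse_responses, currents):
    """Voltage drop at the observed node at the final time step.

    pulse_responses[b][k]: drop seen k steps after a unit current pulse in block b.
    currents[b][t]: current drawn by block b at time step t.
    The drop is the sum over blocks (spatial) and delays (temporal).
    """
    total = 0.0
    for h, i in zip(pulse_responses, currents):
        last = len(i) - 1
        for k, hk in enumerate(h):
            if k <= last:
                total += hk * i[last - k]
    return total


def worst_case_drop(pulse_responses, peak_currents, steps):
    """Maximize the drop subject to 0 <= currents[b][t] <= peak_currents[b].

    Because the objective is linear and the only constraints are bounds
    (an assumption of this sketch), each variable sits at the bound picked
    by the sign of its coefficient.
    """
    currents = []
    for h, peak in zip(pulse_responses, peak_currents):
        i = [0.0] * steps
        for k, hk in enumerate(h):
            if k < steps and hk > 0:
                i[steps - 1 - k] = peak
        currents.append(i)
    return superposed_drop(pulse_responses, currents), currents


if __name__ == "__main__":
    # Hypothetical pulse responses for two blocks, two time steps each.
    responses = [[0.5, 0.25], [0.1, -0.05]]
    drop, currents = worst_case_drop(responses, peak_currents=[2.0, 1.0], steps=2)
    print(drop, currents)
```

No input vectors appear anywhere in the computation, which is the sense in which such an analysis is vectorless: the currents themselves are the optimization variables.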

  42. 5B.2 - Fault-Tolerant Techniques for Ambient Intelligent Distributed Systems • Diana Marculescu ECE@CMU • Novel techniques for harnessing redundancy as a way for increasing fault-tolerance • Assume a large number of networked devices • Idle devices can act as surrogates for failing ones via application migration or remapping • Scheduling techniques for optimizing system lifetime • Determine optimal migration schedule, under realistic battery models

  43. 8C.2 - Dynamic Fault-Tolerance and Metrics for Battery Powered, Failure-Prone Systems • Phillip Stanley-Marbell, Diana Marculescu ECE@CMU • Introduce the concept of adaptive fault-tolerance management for failure-prone systems, and a classification of local algorithms for achieving system-wide reliability.

  44. Performance Optimization

  45. 5B.1 - Cache Optimization For Embedded Processor Cores: An Analytical Approach • Arijit Ghosh, Tony Givargis CS@UCI • An efficient algorithm to directly compute cache parameters satisfying desired performance criteria.

  46. 5B.3 - Performance Efficiency of Context-Flow System-On-Chip Platform • Rami Beidas, Jianwen Zhu ECE@Toronto • A new programming model, called context-flow, that is simple, safe, highly parallelizable yet transparent to the underlying architectural details.

  47. Simulation at the Nanometer Scale

  48. 7A.1 - A Probabilistic-Based Design Methodology for Nano-Scale Computation • Iris Bahar, Joseph Mundy, Jie Chen Brown • Based on Markov random fields • Propose a new architectural framework designed to handle faulty processes prevalent with nanoscale devices • Dynamically defect tolerant • Adapts to errors as a natural consequence of probability maximization • Removes need to actually detect faults • Can handle both structure- and signal-based faults

  49. 7A.1 - A Probabilistic-Based Design Methodology for Nano-Scale Computation • Carbon Nanotubes (CNTs) • Excellent conductors • Diodes, FETs, and memory arrays using CNTs have been demonstrated • Physical placement of CNTs is an issue • Alumina substrates have been proposed to fabricate arrays of CNTs • [Figure: carbon nanotubes, showing an on junction and an off junction.]

  50. 7A.1 - A Probabilistic-Based Design Methodology for Nano-Scale Computation • Molecular devices • Direct use of molecules and their electronic states • Conduction achieved by changes in physical configuration or electronic state • Diodes and memory have been demonstrated • [Figure labels: additional electron; switch on.]
