Digital Integrated Circuits: A Design Perspective. System on a Chip Design
Application Specific Integrated Circuits: Introduction Jun-Dong Cho SungKyunKwan Univ. Dept. of ECE, Vada Lab. http://vada.skku.ac.kr
Contents • Why ASIC? • Introduction to System On Chip Design • Hardware and Software Co-design • Low Power ASIC Designs
Why ASIC – Design productivity grows! • Design complexity increases 40% per year • Design productivity increases 15% per year • Integration of a PCB onto a single die
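The two growth rates above compound, so the gap between what can be built and what can be designed widens every year. A minimal sketch of that arithmetic, assuming both quantities are normalized to 1.0 in year 0 (the function name `design_gap` is only illustrative):

```python
def design_gap(years, complexity_growth=0.40, productivity_growth=0.15):
    """Ratio of design complexity to design productivity after `years`.

    Both quantities are normalized to 1.0 at year 0 (an assumption made
    for illustration; the slide gives only the yearly growth rates).
    """
    complexity = (1.0 + complexity_growth) ** years
    productivity = (1.0 + productivity_growth) ** years
    return complexity / productivity


for years in (1, 5, 10):
    print(f"after {years:2d} years: complexity/productivity = {design_gap(years):.2f}x")
```

With these rates the gap grows by about 22% per year, which compounds to roughly a factor of seven over ten years.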
Silicon in 2010 • Die Area: 2.5 x 2.5 cm • Voltage: 0.6 V • Technology: 0.07 µm
ASIC Principles • Value-added ASIC for huge volume opportunities; standard parts for quick time-to-market applications • Economics of Design • Fast Prototyping, Low Volume • Custom Design, Labor Intensive, High Volume • CAD Tools Needed to Achieve the Design Strategies • System-level design: Concept to VHDL/C • Physical design: VHDL/C to silicon, timing closure (Monterey, Magma, Synopsys, Cadence, Avant!) • Design Strategies: Hierarchy; Regularity; Modularity; Locality
ASIC Design Strategies • Design is a continuous tradeoff to achieve performance specs with adequate results in all the other parameters. • Performance specs: function, timing, speed, power • Size of die: manufacturing cost • Time to design: engineering cost and schedule • Ease of test generation & testability: engineering cost, manufacturing cost, schedule
Structured ASIC Designs • Hierarchy: Subdivide the design into many levels of sub-modules • Regularity: Subdivide into the maximum number of similar sub-modules at each level • Modularity: Define sub-modules unambiguously, with well-defined interfaces • Locality: Maximize local connections, keeping critical paths within module boundaries
ASIC Design Options • Programmable Logic • Programmable Interconnect • Reprogrammable Gate Arrays • Sea of Gates & Gate Array Design • Standard Cell Design • Full Custom Mask Design • Symbolic Layout • Process Migration - Retargeting Designs
Why SOC? • SOC specs come from system engineers rather than from RTL descriptions • SOC will bridge the gap between hardware/software and their implementation in novel, energy-efficient silicon architectures. • In SOC design, chips are assembled at the level of IP blocks (reusable designs) and IP interfaces rather than at the gate level
CMOS density now allows complete System-on-a-chip Solutions [Figure: single-chip system block diagram containing a µP core, dedicated logic, a DSP core, analog and A/D blocks, FPGA, and reconfigurable interconnect. Labeled functions: phone book, keypad interface, phonebook RAM & ROM, DMA, S/P, control, protocol, demodulation and sync, Viterbi equalizer, voice recognition, speech quality enhancement, de-interleaver & decoder, RPE-LTP speech decoder, digital down conversion. Source: Brodersen, ICASSP '98] How do we design these chips?
Possible Single-Chip Radio Architectures
Software Radio • GOAL: Simplify the system design process. Seek architectures which are flexible, such that hardware and protocols can be designed independently • APPROACH: Minimize the use of dedicated logic
Universal Radio • GOAL: Maximize bandwidth efficiency and battery life. Seek architectures which perform complex algorithms very fast with minimal energy • APPROACH: Minimize the use of programmable logic
Why is SOC design so scary?
60 GHz SiGe Transceiver for Wireless LAN Applications • A low power 30 GHz LNA is designed as the front end of the receiver. Wideband and high gain response is realized by a 2-stage design using a stagger-tuned technique. The simulated performance predicts a forward gain of |S21| > 20 dB over a 6 GHz range with an input match of |S11| < -30 dB and output match of |S22| < -10 dB. • The mixer consists of a single balanced Gilbert cell. A fully-integrated differential 25 GHz VCO is used, in conjunction with the mixer, to downconvert the RF input to a 5 GHz IF. [Figure: 30 GHz receiver layout consisting of the LNA, mixer and VCO]
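The receiver numbers above can be checked with a few lines of arithmetic. This is only a sketch using the standard conventions that S-parameter magnitudes in dB are 20·log10 of a voltage ratio and that the IF is the difference between the RF and LO frequencies; the helper names are illustrative, not from the slide.

```python
def if_frequency_ghz(rf_ghz, lo_ghz):
    """Difference-frequency output of a downconverting mixer."""
    return abs(rf_ghz - lo_ghz)


def db_to_voltage_ratio(db):
    """Convert an S-parameter magnitude in dB to a linear voltage ratio."""
    return 10.0 ** (db / 20.0)


def reflected_power_fraction(s_db):
    """Fraction of incident power reflected for a given |S11| or |S22| in dB."""
    return 10.0 ** (s_db / 10.0)


print("IF:", if_frequency_ghz(30.0, 25.0), "GHz")
print("|S21| = 20 dB is a voltage gain of", db_to_voltage_ratio(20.0))
print("|S11| = -30 dB reflects", reflected_power_fraction(-30.0), "of the input power")
print("|S22| = -10 dB reflects", reflected_power_fraction(-10.0), "of the output power")
```

A 30 GHz input mixed with the 25 GHz VCO gives the 5 GHz IF quoted on the slide; the -30 dB input match means about 0.1% of the incident power is reflected.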
Wideband CMOS LC VCO • A 1.8 GHz wideband LC VCO implemented in 0.18 µm bulk CMOS has been successfully designed, fabricated, and measured. • This VCO utilizes a 4-bit array of switched capacitors and a small accumulation-mode varactor to achieve a measured tuning range exceeding 2:1 (73%) and a worst-case tuning sensitivity of 270 MHz/V. • The amplitude reference level is programmable by means of a 3-bit DAC. [Figure: VCO die photograph]
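The relation between the "2:1" and "73%" figures can be sketched as follows, assuming the percentage is the tuning span divided by the center frequency (a common convention, but not stated on the slide). The function names are illustrative.

```python
def tuning_range_percent(f_min, f_max):
    """Tuning span as a percentage of the center frequency (assumed convention)."""
    center = 0.5 * (f_min + f_max)
    return 100.0 * (f_max - f_min) / center


def ratio_from_percent(percent):
    """Invert the formula above: f_max / f_min for a given percentage range."""
    x = percent / 100.0
    return (2.0 + x) / (2.0 - x)


print(f"exactly 2:1 -> {tuning_range_percent(1.0, 2.0):.1f}% of center")
print(f"73% of center -> {ratio_from_percent(73.0):.2f}:1")
print("coarse bands selectable with a 4-bit capacitor array:", 2 ** 4)
```

Under this convention a ratio of exactly 2:1 is about 67%, and 73% corresponds to about 2.15:1, which is consistent with a range "exceeding 2:1".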
A High Level View of an Industry Standard Design Flow (source: Hitachi, Prof. R. W. Brodersen)
[Figure: flowchart. Front-End: HDL Entry, Synthesis. Back-End: Floor-plan, Place & Route, Physical Verification (DRC & LVS). A "good?" check follows the stages, and the flow ends in "done".]
Problems with this flow: • Every step can loop to every other step • Each step can take hours or days for a 100,000 line description • The HDL description contains no physical information • Different engineers handle the front-end and back-end design
How have semiconductor companies made this flow work?
A More Accurate Picture of the Standard Flow (Source: IBM Semiconductor, Prof. R. Newton)
Timeline: Architecture 10 months, Front-End 10 months, Back-End 2 months, Fabrication 2 months
• Architecture: Partition the chip into functional units and generate bit-true test vectors to specify the behavior of each unit. TOOLS: Matlab, C, SPW, (VCC). FREEZE the test vectors.
• Front-End: Enter HDL code which matches the test vectors. TOOLS: HDL Simulators, Design Compiler. FREEZE the HDL code.
• Back-End: Create a floor-plan and tweak the tools until a successful mask layout is created. TOOLS: Design Compiler, Floor-planners, Placers, Routers, Clock-tree generators, Physical Verification.
How can we improve this flow?
Common Fabric for IP Blocks • Soft IP blocks are portable, but not as predictable as hard IP. • Hard IP blocks are very predictable, since a specific physical implementation can be characterized, but are hard to port, since they are often tied to a specific process. • A common fabric is required for both portability and predictability. • Wide availability: Cell Based Array, a metal programmable architecture that provides the performance of a standard cell and is optimized for synthesis.
Four main applications • Set-top box: mobile multimedia system, base station for the home local-area network • Digital PCTV: concurrent use of TV, 3D graphics, and Internet services • Set-top box LAN service: wireless home networks, multi-user wireless LAN • Navigation system: steer and control traffic and/or goods transportation
Physical gap • Timing closure problem: layout-driven logic and RT-level synthesis • Energy efficiency requires locality of computation and storage: a match for stream-based data processing of speech, images, and multimedia-system packets. • Next-generation SOC designers must bridge the architectural gap between system specification and energy-efficient IP-based architectures, while CAE vendors and IP providers will bridge the physical gap.
SOC Co-Design Challenges • Current systems are complex and heterogeneous: they contain many different types of components • Half of the chip can be filled with 200 low-power, RISC-like processors (ASIPs) interconnected by field-programmable buses and embedded in 20 Mbytes of distributed DRAM and flash memory; the other half is ASIC • Computational power will not result from multi-GHz clocking but from parallelism, with clock rates below 200 MHz. • This will greatly simplify the design for correct timing, testability, and signal integrity.
Bridging the architectural gap • One million gates of reconfigurable logic, one million gates of hardwired logic • 50 GIPS for programmable components or 500 GIPS for dedicated hardware • Product reliability: design at a level far above the RT level, with reuse factors in excess of 100 • Trade-off: 100 MOPS/watt (microprocessor) versus 100 GOPS/watt (hardwired) • Reconfigurable computing with a large number of computing nodes and a very restricted instruction set (Pleiades)
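The efficiency trade-off above is easier to compare as energy per operation, since operations per second per watt is the same as operations per joule. A minimal sketch (the function name is illustrative):

```python
def energy_per_op_joules(ops_per_second_per_watt):
    """Energy per operation, given an efficiency in operations/second/watt.

    ops/s/W equals ops/J, so the energy per operation is its reciprocal.
    """
    return 1.0 / ops_per_second_per_watt


microprocessor = energy_per_op_joules(100e6)  # 100 MOPS/watt
hardwired = energy_per_op_joules(100e9)       # 100 GOPS/watt

print(f"microprocessor: {microprocessor * 1e9:.1f} nJ per operation")
print(f"hardwired:      {hardwired * 1e12:.1f} pJ per operation")
print(f"ratio:          {microprocessor / hardwired:.0f}x")
```

The figures work out to about 10 nJ per operation for the microprocessor against about 10 pJ for hardwired logic, a factor of 1000.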
Why Lower Power • Portable systems: long battery life, light weight, small form factor • IC priority list: power dissipation, cost, performance • Technology direction: reduced voltage/power designs based on mature high performance IC technology; high integration to minimize size, cost, power, and speed
[Figure: Microprocessor Power Dissipation. Power (W), 0 to 50, versus year, 1980 to 2000. Plotted processors: i286, i386 DX 16, i486 DX 25, i486 DX 50, i486 DX2 66, i486 DX4 100, P5 66, P6 166, P II 300, P III 500, P-PC601 50, P-PC604 133, P-PC750 400, Alpha 21064 200, Alpha 21164, Alpha 21264.]
Power-hungry Applications • Signal Compression: HDTV Standard, ADPCM, Vector Quantization, H.263, 2-D motion estimation, MPEG-2 storage management • Digital Communications: Shaping Filters, Equalizers, Viterbi decoders, Reed-Solomon decoders
New Computing Platforms • SOC power efficiency of more than 10 GOPS/W • Higher on-chip system integration: COTS: 100 W, SOC: 10 W (inter-chip capacitive loads, I/O buffers) • Speed & performance: shorter interconnections, fewer drivers, faster devices, more efficient processing architectures • Mixed signal systems • Reuse of IP blocks • Multiprocessor, configurable computing • Domain-specific, combined memory-logic
Low Power Design Flow I [Figure: flow diagram from system level down to RT level. Recoverable block labels: Function and System Specification; System-Level Partitioning and HW/SW Allocation; System-Level Power Analysis; Behavioral Description; Software Functions; Power-driven Behavioral Transformation; Processor Selection; Behavioral-Level Power Analysis; Power Conscious Behavioral Description; High-Level Synthesis and Optimization; RT-Level Power Analysis; Software Optimization; Software-Level Power Analysis; To RT-Level Design.]
Low Power Design Flow II [Figure: flow diagram from RT level down to switch level. Recoverable block labels: RT-level Description (Controller, Data-path); Logic Synthesis and Optimization; Gate-Level Power Analysis; RTL mapping; RTL Library; Gate-level Description; High-Level Synthesis and Optimization; Switch-Level Power Analysis; Standard cell Library; Macrocells (Processor, Memory, Control and Steering Logic); Switch-level Description.]
Three Factors affecting Energy • Reducing waste by hardware simplification: redundant h/w extraction, locality of reference, demand-driven / data-driven computation, application-specific processing, preservation of data correlations, distributed processing • All-in-one approach (SOC): I/O pin and buffer reduction • Voltage-reducible hardware • 2-D pipelining (systolic arrays) • SIMD: parallel processing, useful for data with parallel structure • VLIW: a flexible approach
IBM’s PowerPC Lower Power Architecture • Optimum supply voltage through hardware parallelism, pipelining, and parallel instruction execution • 603e executes five instructions in parallel (IU, FPU, BPU, LSU, SRU) • FPU is pipelined so a multiply-add instruction can be issued every clock cycle • Low power 3.3-volt design • Use small complex instructions with smaller instruction length • IBM’s PowerPC 603e is RISC • Superscalar: CPI < 1 • 603e issues as many as three instructions per cycle • Low power management • 603e provides four software-controllable power-saving modes. • Copper processor with SOI • IBM’s Blue Logic ASIC: new design reduces power by a factor of 10
Power-Down Techniques Lowering the voltage along with the clock actually alters the energy-per-operation of the microprocessor, reducing the energy required to perform a fixed amount of work
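The statement above follows from the standard first-order model of CMOS dynamic power, P = C·V²·f, in which the energy per cycle is C·V². A minimal sketch comparing three operating points; the effective capacitance, cycle count, and the halved-voltage operating point are assumptions chosen only for illustration, and leakage and short-circuit power are ignored.

```python
def dynamic_power_watts(c_eff_farads, vdd_volts, freq_hz):
    """First-order CMOS dynamic power: P = C * V^2 * f."""
    return c_eff_farads * vdd_volts ** 2 * freq_hz


def energy_for_task_joules(c_eff_farads, vdd_volts, cycles):
    """Energy for a fixed number of cycles: E = C * V^2 * cycles (no f term)."""
    return c_eff_farads * vdd_volts ** 2 * cycles


C_EFF = 1e-9        # assumed effective switched capacitance per cycle (F)
CYCLES = 1_000_000  # assumed fixed amount of work, in clock cycles

# (supply voltage in V, clock frequency in Hz); all values are hypothetical
scenarios = {
    "full speed": (3.3, 100e6),
    "clock halved only": (3.3, 50e6),
    "clock and voltage halved": (1.65, 50e6),
}

for name, (vdd, freq) in scenarios.items():
    power = dynamic_power_watts(C_EFF, vdd, freq)
    energy = energy_for_task_joules(C_EFF, vdd, CYCLES)
    runtime_ms = 1e3 * CYCLES / freq
    print(f"{name:26s} P = {power:.3f} W, t = {runtime_ms:.0f} ms, E = {energy * 1e3:.2f} mJ")
```

Halving only the clock halves the power but doubles the run time, so the energy for the task is unchanged. Lowering the voltage along with the clock is what reduces the energy, quadratically in V under this model.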
Three Co-Design Approaches • Source: N. S. Voros et al., “Hardware-software co-design of embedded systems using multiple formalisms for application development,” IFIP International Conference FORTE/PSTV’98, Nov. ’98 • ASIP co-design: builds a specific programmable processor for an application, and translates the application into software code. H/w and s/w partitioning includes the instruction set design. • H/w s/w synchronous system co-design: a s/w processor as a master controller, and a set of h/w accelerators as co-processors. Vulcan, Codes, Tosca, Cosyma • H/w s/w for distributed systems: mapping of a set of communicating processes onto a set of interconnected processors. Behavioral decomposition, process allocation and communication transformation. Coware (powerful), Siera (reuse), Ptolemy (DSP)
Mixing H/W and S/W • Argument: Mixed hardware/software systems represent the best of both worlds: high performance, flexibility, design reuse, etc. • Counterpoint: From a design standpoint, it is the worst of both worlds • Simulation: Problems of verification and test become harder • Interface: Too many tools, too many interactions, too much heterogeneity • Hardware/software partitioning is “AI-complete”! • (MIT, Stanford: by analogy with "NP-complete") A term used to describe problems in artificial intelligence, to indicate that the solution presupposes a solution to the "strong AI problem" (that is, the synthesis of a human-level intelligence). A problem that is AI-complete is just too hard.
Low power partitioning approach • Different HW resources are invoked according to the instruction executed at a specific point in time • During the execution of an add operation, the ALU and registers are used, but the multiplier is idle. • Non-active resources still consume energy, since the corresponding circuits continue to switch • Calculate the wasted energy • Add application-specific cores and run them selectively: whenever one core is executing, all the other cores are shut down
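The "calculate the wasted energy" step above can be sketched as a walk over an instruction trace that charges every resource not used by the current instruction. The resource names, the instruction-to-resource map, and the per-cycle idle energies below are all hypothetical values for illustration, not data from the slide.

```python
# Hypothetical energy (in pJ) burned per cycle by a resource that is idle
# but still switching.
IDLE_ENERGY_PJ = {"alu": 2, "multiplier": 5, "register_file": 1}

# Hypothetical map from instruction to the resources it actually uses.
RESOURCES_USED = {
    "add": {"alu", "register_file"},
    "mul": {"multiplier", "register_file"},
}


def wasted_energy_pj(trace, resources_used, idle_energy_pj):
    """Sum the idle energy of every resource not used by each instruction."""
    wasted = 0
    for instruction in trace:
        active = resources_used[instruction]
        for resource, energy in idle_energy_pj.items():
            if resource not in active:
                wasted += energy
    return wasted


trace = ["add", "add", "mul", "add"]
print("wasted energy:", wasted_energy_pj(trace, RESOURCES_USED, IDLE_ENERGY_PJ), "pJ")
```

In this toy trace the multiplier idles during three adds and the ALU idles during one multiply, giving 3·5 + 2 = 17 pJ of waste. That waste is what shutting down inactive cores aims to remove.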
ASIP (Application Specific Instruction Processors) Design • Given a set of applications, determine the micro-architecture of the ASIP (i.e., configuration of functional units in datapaths, instruction set) • To accurately evaluate the performance of the processor on a given application, one needs to compile the application program onto the processor datapath and simulate the object code. • The micro-architecture of the processor is a design parameter!
Cross-Disciplinary nature • Software for low power: loop transformation leads to much higher temporal and spatial locality of data. • Code size becomes an important objective; software will eventually become a part of the chip • Behavior-platform-compiler codesign: codesigned with C++ or Java, describing their h/w and s/w implementation. • Multidisciplinary system thinking is required for future designs (e.g., Eindhoven Embedded Systems Institute, http://www.eesi.tue.nl/english)
VLSI Signal Processing Design Methodology • pipelining, parallel processing, retiming, folding, unfolding, look-ahead, relaxed look-ahead, and approximate filtering • bit-serial, bit-parallel and digit-serial architectures, carry save architecture • redundant and residue systems • Viterbi decoder, motion compensation, 2D-filtering, and data transmission systems
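As one concrete instance of the transformations listed above, look-ahead rewrites a recursion so that the feedback loop spans more than one sample, which leaves room for pipelining. The sketch below applies one step of look-ahead to the first-order recursion y[n] = a·y[n-1] + x[n], giving y[n] = a²·y[n-2] + a·x[n-1] + x[n], and checks numerically that both forms agree. The filter, coefficient, and input are assumptions chosen for illustration; the slide only names the technique.

```python
def iir_direct(x, a):
    """Original recursion: y[n] = a*y[n-1] + x[n], with y[-1] = 0."""
    y, previous = [], 0.0
    for sample in x:
        previous = a * previous + sample
        y.append(previous)
    return y


def iir_lookahead(x, a):
    """One-step look-ahead form: y[n] = a^2*y[n-2] + a*x[n-1] + x[n].

    The feedback now reaches back two samples, so the loop can hold two
    pipeline stages. Samples before n = 0 are taken as zero.
    """
    y = []
    for n, sample in enumerate(x):
        x_prev = x[n - 1] if n >= 1 else 0.0
        y_prev2 = y[n - 2] if n >= 2 else 0.0
        y.append(a * a * y_prev2 + a * x_prev + sample)
    return y


x = [1.0, 0.5, -0.25, 2.0, 0.0, 1.5]
direct = iir_direct(x, 0.5)
lookahead = iir_lookahead(x, 0.5)
print(direct)
print(lookahead)
```

The extra multiply-add on x[n-1] is the overhead paid for the shorter effective loop.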
Low Power DSP • DO-LOOP Dominant • VSELP Vocoder: 83.4% • 2D 8x8 DCT: 98.3% • LPC computation: 98.0% • DO-LOOP Power Minimization ==> DSP Power Minimization • VSELP: Vector Sum Excited Linear Prediction • LPC: Linear Prediction Coding
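Why loop power dominates the result can be sketched with an Amdahl-style bound. The example assumes the percentages above are the share of total power spent inside DO-loops (the slide does not state the exact metric) and uses a hypothetical 50% reduction of loop power.

```python
# Loop-dominance figures from the slide, read here as fractions of total power.
LOOP_FRACTION = {
    "VSELP vocoder": 0.834,
    "2D 8x8 DCT": 0.983,
    "LPC computation": 0.980,
}


def overall_saving(loop_fraction, loop_power_reduction):
    """Fraction of total power saved when only the loop power is reduced.

    loop_power_reduction is the fraction of loop power removed (0.0 to 1.0);
    power outside the loops is assumed unchanged.
    """
    return loop_fraction * loop_power_reduction


for name, fraction in LOOP_FRACTION.items():
    saving = overall_saving(fraction, 0.5)  # hypothetical 50% loop-power cut
    print(f"{name:16s} loop power -50% -> total power -{100 * saving:.2f}%")
```

Because the loop share is so high, nearly all of a loop-level saving carries through to the whole program, which is the point of the "==>" on the slide.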
Deep-Submicron Design Flows • Rapid evaluation of complex designs for area and performance • Timing convergence via estimated routing parasitics • In-place timing repair without resynthesis • Shorter design intervals, minimum iterations • Block-level design and place and route • Localized changes without disturbance • Integration of complex projects and design reuse
SOC CAD Companies
• Avant! www.avanticorp.com
• Cadence www.cadence.com
• Duet Tech www.duettech.com
• Escalade www.escalade.com
• LogicVision www.logicvision.com
• Mentor Graphics www.mentor.com
• Palmchip www.palmchip.com
• Sonics www.sonicsinc.com
• Summit Design www.summit-design.com
• Synopsys www.synopsys.com
• Topdown Design Solutions www.topdown.com
• Xynetix Design Systems www.xynetix.com
• Zuken-Redac www.redac.co.uk
Design Technology for Low Power Radio Systems Rhett Davis Dept. of EECS Univ. of Calif. Berkeley http://bwrc.eecs.berkeley.edu
Domain of Interest • Highly integrated system-on-a-chip solutions – SOC’s • Wireless communications with associated processing, e.g. multimedia processing, compression, switching, etc… • Primary computation is high complexity dataflow with a relatively small amount of control