CPE 626: Advanced VLSI Design, L01 • Department of Electrical and Computer Engineering, University of Alabama in Huntsville
Outline • Computer Engineering: Motivation, Present, Future • Computer Engineering Methodology • Power as a Design Constraint • Stored-program Computer: MU0 Example • Digital System Modeling: Motivation
Why Computer Engineering? CHANGE! It is exciting. It has never been more exciting! It impacts every aspect of human life. Photos: ENIAC, 1946 (first general-purpose electronic computer); PC, 2002; PDA, 2002; Bionic, 2002
Why Such Change? • Continuous growth in performance due to advances in technology (CMOS VLSI) and innovations in computer design (RISC, RAID, ILP) • Lower cost due to simpler development and higher volumes • Together these significantly enhanced the capability available to the computer user • Example: today's PC costing less than $1000 has more performance, main memory, and disk storage than a $1 million computer of the 1970s
Computer Engineering Methodology (figure: design cycle) • Evaluate Existing Systems for Bottlenecks -> Simulate New Designs and Organizations -> Implement Next Generation System -> (repeat) • Inputs shown around the cycle: Market, Applications, Benchmarks, Workloads, Technology Trends, Implementation Complexity
Technology Trends • State of the art: Intel Pentium 4, 2.2 GHz, 0.13 microns, 42 million transistors • Reuters, Monday 11 June 2001: Intel engineers have designed and manufactured the world’s smallest and fastest transistor, 0.02 microns in size. This will open the way for microprocessors of 1 billion transistors, running at 20 GHz by 2007.
Pentium III Die Photo • EBL/BBL - Bus logic, Front, Back • MOB - Memory Order Buffer • Packed FPU - MMX Fl. Pt. (SSE) • IEU - Integer Execution Unit • FAU - Fl. Pt. Arithmetic Unit • MIU - Memory Interface Unit • DCU - Data Cache Unit • PMH - Page Miss Handler • DTLB - Data TLB • BAC - Branch Address Calculator • RAT - Register Alias Table • SIMD - Packed Fl. Pt. • RS - Reservation Station • BTB - Branch Target Buffer • IFU - Instruction Fetch Unit (+I$) • ID - Instruction Decode • ROB - Reorder Buffer • MS - Micro-instruction Sequencer • 1st Pentium III, Katmai: 9.5 M transistors, 12.3 x 10.4 mm die in a 0.25-micron process with 5 layers of aluminum
Pentium 4 Die Photo • 42M transistors (PIII: 26M) • 217 mm² (PIII: 106 mm²) • L1 Execution Cache: buffers 12,000 micro-ops • 8KB data cache • 256KB L2$
Future Applications • Desktop: 90% of cycles will be spent on media applications • video encode/decode, polygon & image-based graphics • audio processing, compression, music, speech recognition/synthesis • modulation/demodulation at audio and video rates • Scientific desktops: high-performance FPs and graphics • Commercial servers: support for databases and transaction processing, enhancement for reliability, support for scalability • Embedded computing: special support for graphics or video, power limitations
Future Directions • Conditions • new workloads are characterized by more exploitable parallelism • dominant wire delays on a billion-transistor chip will force hardware to be more distributed • Novel architectural techniques • Exploit parallelism • multiprocessor on chip • simultaneous multithreading • CPU-memory integration • memory latency tolerating techniques • flexible hierarchy to adapt to application • Reconfigurable computing • Goal: develop architectural techniques that exploit semiconductor technology and workload characteristics in order to maximize performance at low cost
Power as a Design Constraint • Power becomes a critical issue • Portable and mobile platforms: battery-operated devices • Desktops, server farms • Reliability? • Power consumption: IT accounts for 10% of power consumption in the US • Power density: 30 W/cm² in Alpha 21364 (3x that of a typical hot plate)
Power as a Design Constraint (cont’d)
• Components of power consumption: dynamic power consumption, power due to short-circuit current during transitions, power due to leakage current
• A (activity of gates) => turn off unused parts or use design techniques to minimize the number of transitions
• Reduce the supply voltage, V
• Reduce the threshold voltage, Vt
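The three components named on this slide are commonly combined in a first-order CMOS power model, P = A*C*V^2*f + tau*A*V*I_short*f + V*I_leak. The sketch below evaluates that model to show why lowering V is so effective. The formula, the function name, and all parameter names and values are assumptions based on the standard textbook model, not taken from the slide.

```python
def power_components(a, c, v, f, tau, i_short, i_leak):
    """First-order CMOS power model (assumed form, see lead-in).

    a       : activity factor of the gates (0..1)
    c       : total switched capacitance in farads
    v       : supply voltage in volts
    f       : clock frequency in hertz
    tau     : duration of the short-circuit current spike in seconds
    i_short : short-circuit current in amperes
    i_leak  : leakage current in amperes
    Returns (dynamic, short_circuit, leakage) in watts.
    """
    dynamic = a * c * v ** 2 * f               # charging/discharging capacitance
    short_circuit = tau * a * v * i_short * f  # both networks briefly conduct
    leakage = v * i_leak                       # flows even when gates are idle
    return dynamic, short_circuit, leakage


# Illustrative (made-up) operating point, evaluated at two supply voltages.
params = dict(a=0.5, c=1e-9, f=1e9, tau=1e-10, i_short=1e-3, i_leak=1e-2)
high = power_components(v=2.0, **params)
low = power_components(v=1.0, **params)
print("dynamic power ratio (2.0 V vs 1.0 V):", high[0] / low[0])
print("leakage power ratio (2.0 V vs 1.0 V):", high[2] / low[2])
```

Because the dynamic term depends on V squared while the other two terms are linear in V, halving the supply voltage cuts dynamic power to a quarter but leakage only to a half.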
Recap: Computer Architecture • Computer Architecture describes the user’s view of the computer: visible registers, data types, instruction set, instruction formats, memory management table structures, exception handling • Computer Organization describes the implementation of the architecture, which is invisible to the user: pipeline structure, caches, TLB, ...
Typical Hierarchy • Transistors • Logic gates, memory cells, special circuits • Single-bit adders, MUXs, flip-flops, decoders, coders • Word-wide adders, MUXs, registers, decoders, buses • ALUs, shifters, register files, memory blocks • Processor, peripheral cells, cache memories, MMUs • Integrated system chips • PCBs • Mobile phones, laptops, PCs, engine controllers (Figure: gate schematic between Vdd and Vss with inputs A and B and output A.B)
MU0 – A Simple Processor • Instruction format • Instruction set
MU0 Datapath Example • Components: Program Counter (PC), Accumulator (ACC), Arithmetic-Logic Unit (ALU), Instruction Register (IR), Instruction Decode and Control Logic • Design principle: memory will be the limiting factor in the design, so each instruction takes exactly the number of clock cycles defined by the number of memory accesses it must make.
MU0 Datapath Design
Assume that each instruction starts when it has arrived in the IR.
Step 1: EX (execute)
• LDA S: ACC <- Mem[S]
• STO S: Mem[S] <- ACC
• ADD S: ACC <- ACC + Mem[S]
• SUB S: ACC <- ACC - Mem[S]
• JMP S: PC <- S
• JGE S: if (ACC >= 0) PC <- S
• JNE S: if (ACC != 0) PC <- S
Step 2: IF (fetch the next instruction)
• Either the PC or the address in the IR is issued to fetch the next instruction
• The address is incremented in the ALU and the value is saved into the PC
Initialization
• A reset input starts instruction execution from a known address; here it is 000hex
• Provide zero at the ALU output and then load it into the PC register
MU0 RTL Organization • Control logic signals: Asel, Bsel, ACCce (ACC change enable), PCce (PC change enable), IRce (IR change enable), ACCoe (ACC output enable), ALUfs (ALU function select), MEMrq (memory request), RnW (read/write), Ex/ft (execute/fetch)
MU0 ALU Design • ALU functions: A+B, A-B, B, B+1, 0 (used only when reset is active) => 4 functions besides the reset-only 0 • Control inputs: Aen (enable operand A), Binv (invert operand B)
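One way to see how Aen and Binv can produce all four functions from a single adder is sketched below. This is an illustrative model under stated assumptions: the carry-in input `cin`, the 16-bit width, the `reset` argument, and the control-value table are not given on the slide and are introduced here for the example.

```python
WIDTH = 16                      # assumed word width
MASK = (1 << WIDTH) - 1


def alu(a, b, aen, binv, cin, reset=False):
    """Adder-based ALU: operand A gated by aen, operand B inverted by binv.

    cin (carry-in) is an assumed extra control input; together with binv
    it forms the two's-complement negation needed for A-B.
    """
    if reset:
        return 0                            # output forced to 0 during reset
    a_op = (a & MASK) if aen else 0         # Aen = 0 removes operand A
    b_op = (~b & MASK) if binv else (b & MASK)
    return (a_op + b_op + cin) & MASK


# Assumed control settings (aen, binv, cin) for each ALU function.
FUNCTIONS = {
    "A+B": (1, 0, 0),
    "A-B": (1, 1, 1),   # A + (~B) + 1
    "B":   (0, 0, 0),
    "B+1": (0, 0, 1),
}

for name, controls in FUNCTIONS.items():
    print(name, alu(9, 4, *controls))
```

In this model, B+1 is the function that increments the address during fetch, and B passes a memory operand through to the accumulator for LDA.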
Digital System Modeling: Motivation • Requirements specification • Functional specification • Testing and verification of the design • Formal verification of the correctness of the design • Automatic synthesis
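Gajski and Kuhn’s Y Chart (figure: three axes, Behavioral, Structural, and Physical/Geometry, crossed by five abstraction levels)
• Architectural level: Systems (behavioral), Processor (structural), Physical Partitions (physical)
• Algorithmic level: Algorithms (behavioral), Hardware Modules (structural), Clusters (physical)
• Functional Block level: Register Transfer (behavioral), ALUs, Registers (structural), Floor Plans (physical)
• Logic level: Logic (behavioral), Gates, FFs (structural), Cell, Module Plans (physical)
• Circuit level: Transfer Functions (behavioral), Transistors (structural), Rectangles (physical)
Domains
• Functional: operations performed by the system
• Structural: how the system is composed
• Geometry: how the system is laid out in physical space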