
Microprocessors



Presentation Transcript


  1. Microprocessors

  2. CMOS transistor on silicon • Transistor • The basic electrical component in digital systems • Acts as an on/off switch • Voltage at “gate” controls whether current flows from source to drain • Don’t confuse this “gate” with a logic gate [Figure: transistor symbol with source, gate, and drain (conducts if gate=1); cross-section of an IC inside its package showing gate, oxide, source, channel, and drain on a silicon substrate]

  3. CMOS transistor implementations • Complementary Metal Oxide Semiconductor • We refer to logic levels • Typically 0 is 0V, 1 is 5V • Two basic CMOS types • nMOS conducts if gate=1 • pMOS conducts if gate=0 • Hence “complementary” • Basic gates • Inverter, NAND, NOR [Figure: nMOS and pMOS transistor symbols; transistor-level schematics of the inverter (F = x'), the NAND gate (F = (xy)'), and the NOR gate (F = (x+y)')]
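The complementary switching behaviour on this slide can be sketched in Python. This is a minimal model rather than a circuit simulator. It assumes the standard CMOS topology (a pMOS pull-up network to 1 and an nMOS pull-down network to 0), which the slide's lost figure presumably showed; the function names `nmos`, `pmos`, `drive`, `inverter`, `nand`, and `nor` are illustrative.

```python
def nmos(gate):
    """nMOS switch: conducts if gate == 1."""
    return gate == 1


def pmos(gate):
    """pMOS switch: conducts if gate == 0."""
    return gate == 0


def drive(pull_up, pull_down):
    """Resolve a gate output from its pull-up (to 1) and pull-down (to 0) networks."""
    # In a well-formed CMOS gate exactly one of the two networks conducts.
    assert pull_up != pull_down, "output would be floating or shorted"
    return 1 if pull_up else 0


def inverter(x):
    # One pMOS pulls up, one nMOS pulls down.
    return drive(pmos(x), nmos(x))


def nand(x, y):
    # Assumed topology: pMOS in parallel (pull-up), nMOS in series (pull-down).
    return drive(pmos(x) or pmos(y), nmos(x) and nmos(y))


def nor(x, y):
    # Assumed topology: pMOS in series (pull-up), nMOS in parallel (pull-down).
    return drive(pmos(x) and pmos(y), nmos(x) or nmos(y))


if __name__ == "__main__":
    for x in (0, 1):
        for y in (0, 1):
            print(x, y, "inv(x)=%d nand=%d nor=%d" % (inverter(x), nand(x, y), nor(x, y)))
```

Because the pull-up and pull-down conditions are logical complements of each other, `drive` never sees both networks conducting, which is the sense in which the two transistor types are "complementary".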

  4. Basic logic gates
       Gate       Equation       x=0   x=1
       Driver     F = x           0     1
       Inverter   F = x’          1     0

       Gate       Equation       xy=00   01   10   11
       AND        F = x y           0     0    0    1
       OR         F = x + y         0     1    1    1
       XOR        F = x ⊕ y         0     1    1    0
       NAND       F = (x y)’        1     1    1    0
       NOR        F = (x+y)’        1     0    0    0
       XNOR       F = x ⊙ y         1     0    0    1

  5. Combinational logic design
       A) Problem description: y is 1 if a is 1, or b and c are 1. z is 1 if b or c is 1, but not both, or if all are 1.
       B) Truth table
            Inputs    Outputs
            a b c     y z
            0 0 0     0 0
            0 0 1     0 1
            0 1 0     0 1
            0 1 1     1 0
            1 0 0     1 0
            1 0 1     1 1
            1 1 0     1 1
            1 1 1     1 1
       C) Output equations
            y = a'bc + ab'c' + ab'c + abc' + abc
            z = a'b'c + a'bc' + ab'c + abc' + abc
       D) Minimized output equations (Karnaugh maps: rows a, columns bc = 00 01 11 10)
            y:  a=0: 0 0 1 0    a=1: 1 1 1 1    y = a + bc
            z:  a=0: 0 1 0 1    a=1: 0 1 1 1    z = ab + b’c + bc’
       E) Logic gates [Figure: gate-level circuit with inputs a, b, c and outputs y, z]
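The step from problem description to minimized equations can be checked exhaustively, since there are only eight input combinations. The sketch below encodes the problem description directly (`y_spec`, `z_spec`) and compares it with the minimized equations (`y_min`, `z_min`); all four function names are illustrative, not from the slides.

```python
from itertools import product


def y_spec(a, b, c):
    """y is 1 if a is 1, or b and c are 1."""
    return int(a == 1 or (b == 1 and c == 1))


def z_spec(a, b, c):
    """z is 1 if b or c is 1, but not both, or if all are 1."""
    return int((b != c) or (a == 1 and b == 1 and c == 1))


def y_min(a, b, c):
    """Minimized equation y = a + bc."""
    return a | (b & c)


def z_min(a, b, c):
    """Minimized equation z = ab + b'c + bc'."""
    nb, nc = 1 - b, 1 - c
    return (a & b) | (nb & c) | (b & nc)


def truth_table():
    """Rows (a, b, c, y, z) in counting order, built from the problem description."""
    return [(a, b, c, y_spec(a, b, c), z_spec(a, b, c))
            for a, b, c in product((0, 1), repeat=3)]


def equations_match():
    """True if the minimized equations agree with the specification on every row."""
    return all(y_min(a, b, c) == y and z_min(a, b, c) == z
               for a, b, c, y, z in truth_table())


if __name__ == "__main__":
    for row in truth_table():
        print(*row)
    print("minimized equations match:", equations_match())
```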

  6. Combinational components • n-bit, m x 1 Multiplexor: inputs I0 ... I(m-1), select S0 ... S(log m), output O; O = I0 if S=0..00, I1 if S=0..01, ..., I(m-1) if S=1..11 • log n x n Decoder: inputs I0 ... I(log n -1), outputs O0 ... O(n-1); O0 =1 if I=0..00, O1 =1 if I=0..01, ..., O(n-1) =1 if I=1..11; with enable input e, all O’s are 0 if e=0 • n-bit Adder: inputs A, B; sum = A+B (first n bits), carry = (n+1)’th bit of A+B; with carry-in input Ci, sum = A + B + Ci • n-bit Comparator: inputs A, B; less = 1 if A<B, equal = 1 if A=B, greater = 1 if A>B • n bit, m function ALU: inputs A, B, select S0 ... S(log m); O = A op B, op determined by S; may have status outputs carry, zero, etc.
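The input/output rules listed for these components translate almost line for line into behavioural models. The following sketch is an illustration under simplifying assumptions: values are plain Python integers, select and decoder inputs are passed as integers rather than as individual wires, and the function names are hypothetical.

```python
def mux(inputs, select):
    """m x 1 multiplexor: O = I[select]."""
    return inputs[select]


def decoder(i, n, enable=1):
    """log n x n decoder: output i is 1, all others 0; all 0 if enable is 0."""
    return [1 if (enable == 1 and k == i) else 0 for k in range(n)]


def adder(a, b, n, carry_in=0):
    """n-bit adder: returns (sum, carry), where sum is the first n bits of
    a + b + carry_in and carry is the (n+1)'th bit."""
    total = a + b + carry_in
    return total & ((1 << n) - 1), (total >> n) & 1


def comparator(a, b):
    """n-bit comparator: returns (less, equal, greater)."""
    return int(a < b), int(a == b), int(a > b)


if __name__ == "__main__":
    print(mux([10, 20, 30, 40], 2))
    print(decoder(2, 4))
    print(adder(200, 100, 8))
    print(comparator(3, 5))
```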

  7. Sequential components • n-bit Register (inputs I, load, clear; output Q): Q = 0 if clear=1, I if load=1 and clock=1, Q(previous) otherwise. • n-bit Shift register (inputs I, shift; output Q): Q = lsb; content shifted; I stored in msb • n-bit Counter (inputs count, clear; output Q): Q = 0 if clear=1, Q(prev)+1 if count=1 and clock=1.
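The update rules above can be modelled with one method call per clock edge. This is a sketch under stated assumptions: each call to `tick` stands for one active clock edge, `clear` is sampled at that edge (the slide does not say whether clear is synchronous), the counter wraps around at n bits, and the class names are hypothetical.

```python
class Register:
    """n-bit register: Q = 0 if clear, I if load, otherwise unchanged."""

    def __init__(self, n):
        self.mask = (1 << n) - 1
        self.q = 0

    def tick(self, i=0, load=0, clear=0):
        if clear:
            self.q = 0
        elif load:
            self.q = i & self.mask
        return self.q


class ShiftRegister:
    """n-bit shift register: Q is the lsb; on shift, the content moves one
    position toward the lsb and the input bit is stored in the msb."""

    def __init__(self, n):
        self.n = n
        self.bits = 0

    @property
    def q(self):
        return self.bits & 1

    def tick(self, i=0, shift=0):
        if shift:
            self.bits = (self.bits >> 1) | ((i & 1) << (self.n - 1))
        return self.q


class Counter:
    """n-bit counter: Q = 0 if clear, Q(prev)+1 if count (wrapping at n bits)."""

    def __init__(self, n):
        self.mask = (1 << n) - 1
        self.q = 0

    def tick(self, count=0, clear=0):
        if clear:
            self.q = 0
        elif count:
            self.q = (self.q + 1) & self.mask
        return self.q


if __name__ == "__main__":
    counter = Counter(2)
    print([counter.tick(count=1) for _ in range(5)])
```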

  8. Gated R-S Latch (clocked S-R flip-flop) • Enb = 1: latch closed (outputs unchanged) • Enb = 0: enabled (outputs depend on inputs)

  9. J-K Flip-flop • How to eliminate the forbidden state? • Idea: use output feedback to guarantee that R and S are never both one • J, K both one yields toggle • Characteristic equation: Q+ = Q K’ + Q’ J
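The characteristic equation Q+ = Q K' + Q' J can be tabulated to confirm the four J-K behaviours (hold, reset, set, toggle). The sketch below evaluates the equation for every combination of Q, J, and K; the function name `jk_next` is illustrative.

```python
def jk_next(q, j, k):
    """Next state of a J-K flip-flop: Q+ = Q K' + Q' J."""
    return (q & (1 - k)) | ((1 - q) & j)


def describe(j, k):
    """Name the behaviour selected by the inputs J and K."""
    return {(0, 0): "hold", (0, 1): "reset", (1, 0): "set", (1, 1): "toggle"}[(j, k)]


if __name__ == "__main__":
    for j in (0, 1):
        for k in (0, 1):
            for q in (0, 1):
                print("J=%d K=%d Q=%d -> Q+=%d (%s)"
                      % (j, k, q, jk_next(q, j, k), describe(j, k)))
```

With J = K = 1 the equation reduces to Q+ = Q', which is the toggle behaviour that replaces the forbidden R = S = 1 state.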

  10. Sequential logic design • Given this implementation model • Sequential logic design quickly reduces to combinational logic design
       A) Problem description: You want to construct a clock divider. Slow down your pre-existing clock so that you output a 1 for every four clock cycles.
       B) State diagram: states 0, 1, 2, 3, with x=0 in states 0, 1, 2 and x=1 in state 3; a=1 advances to the next state (3 wraps to 0), a=0 stays in the current state
       C) Implementation model: combinational logic with inputs a, Q1, Q0 and outputs x, I1, I0; a state register holds Q1 Q0 and is loaded from I1 I0
       D) State table (Moore-type)
            Inputs       Outputs
            Q1 Q0 a      I1 I0 x
            0  0  0      0  0  0
            0  0  1      0  1  0
            0  1  0      0  1  0
            0  1  1      1  0  0
            1  0  0      1  0  0
            1  0  1      1  1  0
            1  1  0      1  1  1
            1  1  1      0  0  1

  11. Sequential logic design (cont.)
       E) Minimized output equations (Karnaugh maps: rows a, columns Q1Q0 = 00 01 11 10)
            I1:  a=0: 0 0 1 1    a=1: 0 1 0 1    I1 = Q1’Q0a + Q1a’ + Q1Q0’
            I0:  a=0: 0 1 1 0    a=1: 1 0 0 1    I0 = Q0a’ + Q0’a
            x:   a=0: 0 0 1 0    a=1: 0 0 1 0    x = Q1Q0
       F) Combinational logic [Figure: gate-level circuit with inputs a, Q1, Q0 and outputs x, I1, I0]
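The minimized equations for I1, I0, and x are enough to simulate the clock divider. In the sketch below each loop iteration stands for one clock cycle: the combinational logic computes the next state and the output, then the state register is updated. The function names and the choice of 00 as the initial state are assumptions made for illustration.

```python
def combinational_logic(q1, q0, a):
    """Minimized equations: returns (I1, I0, x) for current state Q1 Q0 and input a."""
    nq1, nq0, na = 1 - q1, 1 - q0, 1 - a
    i1 = (nq1 & q0 & a) | (q1 & na) | (q1 & nq0)   # I1 = Q1'Q0a + Q1a' + Q1Q0'
    i0 = (q0 & na) | (nq0 & a)                     # I0 = Q0a' + Q0'a
    x = q1 & q0                                    # x  = Q1Q0
    return i1, i0, x


def simulate(a_inputs, q1=0, q0=0):
    """Run the divider for one clock cycle per input value; return the x outputs."""
    outputs = []
    for a in a_inputs:
        i1, i0, x = combinational_logic(q1, q0, a)
        outputs.append(x)
        q1, q0 = i1, i0          # state register loads I1 I0 on the clock edge
    return outputs


if __name__ == "__main__":
    print(simulate([1] * 8))
```

With a held at 1 the output is 1 in one cycle out of every four; with a at 0 the state, and therefore the output, does not change.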

  12. Basic Architecture • Control unit and datapath • Note similarity to single-purpose processor • Key differences • Datapath is general • Control unit doesn’t store the algorithm: the algorithm is “programmed” into the memory [Figure: processor containing a control unit (controller, PC, IR) and a datapath (ALU, registers), linked by control/status signals and connected to I/O and memory]

  13. Datapath Operations • Load • Read memory location into register • ALU operation • Input certain registers through ALU, store back in register • Store • Write register to memory location [Figure: processor and memory; the value 10 is loaded from memory into a register, incremented (+1) to 11 through the ALU, and stored back to memory]

  14. Control Unit • Control unit: configures the datapath operations • Sequence of desired operations (“instructions”) stored in memory: the “program” • Instruction cycle, broken into several sub-operations, each one clock cycle, e.g.: • Fetch: Get next instruction into IR • Decode: Determine what the instruction means • Fetch operands: Move data from memory to datapath register • Execute: Move data through the ALU • Store results: Write data from register to memory [Figure: processor with registers R0 and R1; memory holds the program 100: load R0, M[500]; 101: inc R1, R0; 102: store M[501], R1; and the data M[500] = 10]

  15. Control Unit Sub-Operations • Fetch • Get next instruction into IR • PC: program counter, always points to next instruction • IR: holds the fetched instruction [Figure: processor and program from slide 14; PC = 100, IR = load R0, M[500]]

  16. Control Unit Sub-Operations • Decode • Determine what the instruction means [Figure: processor and program from slide 14; PC = 100, IR = load R0, M[500]]

  17. Control Unit Sub-Operations • Fetch operands • Move data from memory to datapath register [Figure: processor and program from slide 14; PC = 100, IR = load R0, M[500]; the value 10 from M[500] is now in R0]

  18. Control Unit Sub-Operations • Execute • Move data through the ALU • This particular instruction does nothing during this sub-operation [Figure: processor and program from slide 14; PC = 100, IR = load R0, M[500], R0 = 10]

  19. Control Unit Sub-Operations • Store results • Write data from register to memory • This particular instruction does nothing during this sub-operation [Figure: processor and program from slide 14; PC = 100, IR = load R0, M[500], R0 = 10]

  20. Instruction Cycles [Figure: timing diagram; with PC=100, load R0, M[500] passes through Fetch, Decode, Fetch ops, Exec., and Store results on successive clock (clk) cycles, leaving 10 in R0]

  21. Instruction Cycles [Figure: timing diagram continued; with PC=101, inc R1, R0 passes through the same five sub-operations, and the ALU adds 1 to the 10 in R0, leaving 11 in R1]

  22. Instruction Cycles [Figure: timing diagram continued; with PC=102, store M[501], R1 passes through the same five sub-operations, writing 11 to M[501]]
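The three instruction cycles shown on slides 20 to 22 can be replayed with a small simulation. This is a sketch, not the deck's own model: instructions are stored as Python tuples rather than encoded words, `inc R1, R0` is taken to mean R1 = R0 + 1 (inferred from the values 10 and 11 in the figures), and the function name `run` is hypothetical.

```python
def run(memory, pc, count):
    """Execute `count` instructions starting at address `pc`.

    Returns the register contents and a trace of (address, instruction) pairs.
    """
    regs = {"R0": 0, "R1": 0}
    trace = []
    for _ in range(count):
        ir = memory[pc]                  # Fetch: get next instruction into IR
        trace.append((pc, ir))
        pc += 1                          # PC always points to the next instruction
        op, dst, src = ir                # Decode: determine what the instruction means
        if op == "load":
            regs[dst] = memory[src]      # Fetch operands: memory -> datapath register
        elif op == "inc":
            regs[dst] = regs[src] + 1    # Execute: move data through the ALU
        elif op == "store":
            memory[dst] = regs[src]      # Store results: register -> memory
        else:
            raise ValueError("unknown instruction: %r" % (ir,))
    return regs, trace


# Program and data from the slides: addresses 100-102 hold instructions, M[500] = 10.
MEMORY = {
    100: ("load", "R0", 500),
    101: ("inc", "R1", "R0"),
    102: ("store", 501, "R1"),
    500: 10,
}

if __name__ == "__main__":
    memory = dict(MEMORY)
    regs, trace = run(memory, pc=100, count=3)
    print(regs, memory[501])
```

Each instruction uses only some of the five sub-operations; the others do nothing for that instruction, as slides 18 and 19 note for the load.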

  23. Architectural Considerations • N-bit processor • N-bit ALU, registers, buses, memory data interface • Embedded: 8-bit, 16-bit, 32-bit common • Desktop/servers: 32-bit, even 64 • PC size determines address space

  24. Architectural Considerations • Clock frequency • Inverse of clock period • Clock period must be longer than the longest register-to-register delay in the entire processor • Memory access is often the longest

  25. Pipelining: Increasing Instruction Throughput [Figure: three timing diagrams. Non-pipelined dish cleaning: each of dishes 1-8 is washed and then dried before the next dish is washed. Pipelined dish cleaning: each dish is dried while the next dish is washed. Pipelined instruction execution: instructions 1-8 overlap across the stages Fetch-instr., Decode, Fetch ops., Execute, Store res.]
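The throughput gain in these diagrams can be put into numbers. The sketch below assumes an idealized pipeline in which every stage takes exactly one time unit and nothing ever stalls; the function names are illustrative.

```python
def non_pipelined_time(items, stages):
    """Each item passes through every stage before the next item starts."""
    return items * stages


def pipelined_time(items, stages):
    """The first item takes `stages` time units; each later item finishes one
    time unit after the item before it."""
    if items == 0:
        return 0
    return stages + (items - 1)


if __name__ == "__main__":
    # Dish cleaning: 8 dishes, 2 stages (wash, dry).
    print(non_pipelined_time(8, 2), pipelined_time(8, 2))
    # Instruction execution: 8 instructions, 5 stages.
    print(non_pipelined_time(8, 5), pipelined_time(8, 5))
```

Under these assumptions the latency of a single item is unchanged; pipelining only increases how many items finish per unit of time.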

  26. Superscalar and VLIW Architectures • Performance can be improved by: • Faster clock (but there’s a limit) • Pipelining: slice up instruction into stages, overlap stages • Multiple ALUs to support more than one instruction stream • Superscalar • Scalar: non-vector operations • Fetches instructions in batches, executes as many as possible • May require extensive hardware to detect independent instructions • VLIW: each word in memory has multiple independent instructions • Relies on the compiler to detect and schedule instructions • Currently growing in popularity

  27. Two Memory Architectures • Princeton • Fewer memory wires • Harvard • Simultaneous program and data memory access [Figure: Princeton: processor connected to a single memory (program and data); Harvard: processor connected to separate program memory and data memory]

  28. Cache Memory • Memory access may be slow • Cache is small but fast memory close to processor • Holds copy of part of memory • Hits and misses [Figure: processor and cache in fast/expensive technology, usually on the same chip; memory in slower/cheaper technology, usually on a different chip]
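To make "hits and misses" concrete, here is a minimal sketch of one possible cache organization. The slide does not specify an organization, so everything here is an assumption: the cache is direct-mapped, each line holds a single address, and the class name `DirectMappedCache` is hypothetical.

```python
class DirectMappedCache:
    """Direct-mapped cache model that tracks only which addresses are cached."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = [None] * num_lines   # tag stored in each line (None = empty)
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit; on a miss, fill the line and return False."""
        index = address % self.num_lines     # which line the address maps to
        tag = address // self.num_lines      # identifies the address within that line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag               # replace whatever the line held before
        self.misses += 1
        return False


if __name__ == "__main__":
    cache = DirectMappedCache(num_lines=4)
    for address in (0, 1, 2, 0, 1, 4, 0):
        print(address, "hit" if cache.access(address) else "miss")
    print("hits:", cache.hits, "misses:", cache.misses)
```

Addresses 0 and 4 map to the same line in this model, so accessing one evicts the other; that is why the final access to 0 misses even though 0 was cached earlier.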

  29. Programmer’s View • Programmer doesn’t need detailed understanding of architecture • Instead, needs to know what instructions can be executed • Two levels of instructions: • Assembly level • Structured languages (C, C++, Java, etc.) • Most development today done using structured languages • But, some assembly level programming may still be necessary • Drivers: portion of program that communicates with and/or controls (drives) another device • Often have detailed timing considerations, extensive bit manipulation • Assembly level may be best for these

  30. Assembly-Level Instructions • Instruction Set • Defines the legal set of instructions for that processor • Data transfer: memory/register, register/register, I/O, etc. • Arithmetic/logical: move register through ALU and back • Branches: determine next PC value when not just PC+1 [Figure: a program as a sequence of instructions (Instruction 1, Instruction 2, Instruction 3, Instruction 4, ...), each with the fields opcode, operand1, operand2]

  31. A Simple (Trivial) Instruction Set
       Assembly instruct.   First byte   Second byte   Operation
       MOV Rn, direct       0000 Rn      direct        Rn = M(direct)
       MOV direct, Rn       0001 Rn      direct        M(direct) = Rn
       MOV @Rn, Rm          0010 Rn      Rm            M(Rn) = Rm
       MOV Rn, #immed.      0011 Rn      immediate     Rn = immediate
       ADD Rn, Rm           0100 Rn      Rm            Rn = Rn + Rm
       SUB Rn, Rm           0101 Rn      Rm            Rn = Rn - Rm
       JZ Rn, relative      0110 Rn      relative      PC = PC + relative (only if Rn is 0)
       (The four-bit code in the first byte is the opcode; Rn and the second byte are the operands.)
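The two-byte format in the table can be illustrated with a small encoder. This is a sketch under assumptions the slide does not confirm: the register number occupies the low four bits of the first byte, a register operand in the second byte is stored as its plain number, and a negative `relative` offset is stored in 8-bit two's complement. The name `encode` and the dictionary keys are illustrative.

```python
# Opcodes copied from the instruction-set table.
OPCODES = {
    "MOV Rn, direct": 0b0000,
    "MOV direct, Rn": 0b0001,
    "MOV @Rn, Rm": 0b0010,
    "MOV Rn, #immed.": 0b0011,
    "ADD Rn, Rm": 0b0100,
    "SUB Rn, Rm": 0b0101,
    "JZ Rn, relative": 0b0110,
}


def encode(form, rn, second):
    """Encode one instruction as two bytes.

    form   -- a key of OPCODES
    rn     -- register number for the first byte (assumed to fit in 4 bits)
    second -- value of the second byte (address, immediate, Rm, or offset)
    """
    if not 0 <= rn <= 15:
        raise ValueError("register number must fit in 4 bits")
    first_byte = (OPCODES[form] << 4) | rn
    second_byte = second & 0xFF          # assumed 8-bit two's complement for offsets
    return bytes([first_byte, second_byte])


if __name__ == "__main__":
    print(encode("MOV Rn, #immed.", 1, 10).hex())
    print(encode("ADD Rn, Rm", 0, 1).hex())
    print(encode("JZ Rn, relative", 3, -4).hex())
```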

  32. Addressing Modes
       Addressing mode     Operand field      Register-file contents   Memory contents
       Immediate           Data               -                        -
       Register-direct     Register address   Data                     -
       Register indirect   Register address   Memory address           Data
       Direct              Memory address     -                        Data
       Indirect            Memory address     -                        Memory address, then Data
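Each addressing mode is a different chain of lookups from the operand field to the data. The sketch below resolves an operand under each of the five modes; the function name `fetch_operand`, the mode strings, and the example register and memory contents are all made up for illustration.

```python
def fetch_operand(mode, operand, regs, mem):
    """Return the data an instruction operates on, given its operand field."""
    if mode == "immediate":
        return operand                   # operand field holds the data itself
    if mode == "register-direct":
        return regs[operand]             # operand field -> register -> data
    if mode == "register-indirect":
        return mem[regs[operand]]        # operand field -> register -> memory -> data
    if mode == "direct":
        return mem[operand]              # operand field -> memory -> data
    if mode == "indirect":
        return mem[mem[operand]]         # operand field -> memory -> memory -> data
    raise ValueError("unknown addressing mode: %s" % mode)


# Hypothetical machine state used only for this example.
REGS = [40, 2]                           # R0 = 40, R1 = 2
MEM = {2: 99, 40: 77, 77: 123}

if __name__ == "__main__":
    for mode, operand in [("immediate", 5), ("register-direct", 0),
                          ("register-indirect", 1), ("direct", 40), ("indirect", 40)]:
        print(mode, operand, "->", fetch_operand(mode, operand, REGS, MEM))
```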

  33. Sample Programs
       C program:
            int total = 0;
            for (int i=10; i!=0; i--)
               total += i;
            // next instructions...
       Equivalent assembly program:
            0      MOV R0, #0;    // total = 0
            1      MOV R1, #10;   // i = 10
            2      MOV R2, #1;    // constant 1
            3      MOV R3, #0;    // constant 0
            Loop:  JZ R1, Next;   // Done if i=0
            5      ADD R0, R1;    // total += i
            6      SUB R1, R2;    // i--
            7      JZ R3, Loop;   // Jump always
            Next:  // next instructions...
       • Try some others • Handshake: Wait until the value of M[254] is not 0, set M[255] to 1, wait until M[254] is 0, set M[255] to 0 (assume those locations are ports). • (Harder) Count the occurrences of zero in an array stored in memory locations 100 through 199.
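The assembly program can be traced with a small interpreter for the four instructions it uses. This is a sketch with simplifications: instructions are held as tuples instead of encoded bytes, jump targets are absolute addresses (Loop = 4, Next = 8) rather than relative offsets, and register width is not modelled. The names `PROGRAM` and `run` are illustrative.

```python
# The sample program, one tuple per instruction: (operation, Rn, operand).
PROGRAM = [
    ("MOV_IMM", 0, 0),    # 0     MOV R0, #0    total = 0
    ("MOV_IMM", 1, 10),   # 1     MOV R1, #10   i = 10
    ("MOV_IMM", 2, 1),    # 2     MOV R2, #1    constant 1
    ("MOV_IMM", 3, 0),    # 3     MOV R3, #0    constant 0
    ("JZ", 1, 8),         # Loop: JZ R1, Next   done if i = 0
    ("ADD", 0, 1),        # 5     ADD R0, R1    total += i
    ("SUB", 1, 2),        # 6     SUB R1, R2    i--
    ("JZ", 3, 4),         # 7     JZ R3, Loop   jump always (R3 is 0)
]


def run(program, max_steps=1000):
    """Execute until the PC runs past the last instruction; return the registers."""
    regs = [0, 0, 0, 0]
    pc = 0
    steps = 0
    while pc < len(program):
        if steps >= max_steps:
            raise RuntimeError("step limit reached; possible infinite loop")
        op, rn, operand = program[pc]
        pc += 1
        if op == "MOV_IMM":
            regs[rn] = operand
        elif op == "ADD":
            regs[rn] = regs[rn] + regs[operand]
        elif op == "SUB":
            regs[rn] = regs[rn] - regs[operand]
        elif op == "JZ":
            if regs[rn] == 0:
                pc = operand
        else:
            raise ValueError("unknown operation: %s" % op)
        steps += 1
    return regs


if __name__ == "__main__":
    print(run(PROGRAM))
```

The instruction set has no unconditional jump, so the program keeps a zero in R3 and uses `JZ R3, Loop` as its "jump always".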

  34. Application-Specific Instruction-Set Processors (ASIPs) • General-purpose processors • Sometimes too general to be effective in demanding application • e.g., video processing – requires huge video buffers and operations on large arrays of data, inefficient on a GPP • But single-purpose processor has high NRE, not programmable • ASIPs – targeted to a particular domain • Contain architectural features specific to that domain • e.g., embedded control, digital signal processing, video processing, network processing, telecommunications, etc. • Still programmable

  35. A Common ASIP: Microcontroller • For embedded control applications • Reading sensors, setting actuators • Mostly dealing with events (bits): data is present, but not in huge amounts • e.g., VCR, disk drive, digital camera (assuming SPP for image compression), washing machine, microwave oven • Microcontroller features • On-chip peripherals • Timers, analog-digital converters, serial communication, etc. • Tightly integrated for programmer, typically part of register space • On-chip program and data memory • Direct programmer access to many of the chip’s pins • Specialized instructions for bit-manipulation and other low-level operations

  36. Another Common ASIP: Digital Signal Processors (DSP) • For signal processing applications • Large amounts of digitized data, often streaming • Data transformations must be applied fast • e.g., cell-phone voice filter, digital TV, music synthesizer • DSP features • Several instruction execution units • Multiply-accumulate single-cycle instruction, other instrs. • Efficient vector operations (e.g., add two arrays) • Vector ALUs, loop buffers, etc.
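The multiply-accumulate (MAC) operation is the inner step of most filters, which is why DSPs provide it as a single-cycle instruction. The sketch below shows where it appears in a finite impulse response (FIR) filter; the filter is a generic textbook example chosen for illustration, not something taken from the slides, and the function names are hypothetical.

```python
def mac(acc, x, h):
    """One multiply-accumulate step: acc + x * h."""
    return acc + x * h


def fir(signal, coeffs):
    """FIR filter: y[n] = sum over k of coeffs[k] * signal[n - k].

    Samples before the start of the signal are treated as zero.
    """
    output = []
    for n in range(len(signal)):
        acc = 0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc = mac(acc, signal[n - k], h)   # one MAC per filter tap
        output.append(acc)
    return output


if __name__ == "__main__":
    print(fir([1, 2, 3, 4], [1, 1]))
```

Each output sample costs one MAC per coefficient, so the speed of the MAC instruction largely determines how fast streaming data can be filtered.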

  37. Trend: Even More Customized ASIPs • In the past, microprocessors were acquired as chips • Today, we increasingly acquire a processor as Intellectual Property (IP) • e.g., synthesizable VHDL model • Opportunity to add custom datapath hardware and a few custom instructions, or delete a few instructions • Can have significant performance, power and size impacts • Problem: need compiler/debugger for customized ASIP • Remember, most development uses structured languages • One solution: automatic compiler/debugger generation • e.g., www.tensillica.com • Another solution: retargetable compilers • e.g., www.improvsys.com (customized VLIW architectures)

  38. Programmer Considerations • Program and data memory space • Embedded processors often very limited • e.g., 64 Kbytes program, 256 bytes of RAM (expandable) • Registers: How many are there? • Only a direct concern for assembly-level programmers • I/O • How to communicate with external signals? • Interrupts

  39. Selecting a Microprocessor • Issues • Technical: speed, power, size, cost • Other: development environment, prior expertise, licensing, etc. • Speed: how to evaluate a processor’s speed? • Clock speed, but instructions per cycle may differ • Instructions per second, but work per instr. may differ • Dhrystone: Synthetic benchmark, developed in 1984. Dhrystones/sec. • MIPS: 1 MIPS = 1757 Dhrystones per second (based on Digital’s VAX 11/780). A.k.a. Dhrystone MIPS. Commonly used today. • So, 750 MIPS = 750*1757 = 1,317,750 Dhrystones per second • SPEC: set of more realistic benchmarks, but oriented to desktops • EEMBC: EDN Embedded Benchmark Consortium, www.eembc.org • Suites of benchmarks: automotive, consumer electronics, networking, office automation, telecommunications
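The Dhrystone MIPS conversion on this slide is a single multiplication, shown here in both directions. The constant comes from the slide (1 MIPS = 1757 Dhrystones per second); the function names are illustrative.

```python
DHRYSTONES_PER_SECOND_PER_MIPS = 1757   # VAX 11/780 baseline quoted on the slide


def mips_to_dhrystones(mips):
    """Convert a Dhrystone MIPS rating to Dhrystones per second."""
    return mips * DHRYSTONES_PER_SECOND_PER_MIPS


def dhrystones_to_mips(dhrystones_per_second):
    """Convert a measured Dhrystones-per-second score to Dhrystone MIPS."""
    return dhrystones_per_second / DHRYSTONES_PER_SECOND_PER_MIPS


if __name__ == "__main__":
    print(mips_to_dhrystones(750))
    print(dhrystones_to_mips(1317750))
```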

  40. General Purpose Processors Sources: Intel, Motorola, MIPS, ARM, TI, and IBM Website/Datasheet; Embedded Systems Programming, Nov. 1998

  41. Microprocessor Architecture Overview • If you are using a particular microprocessor, now is a good time to review its architecture

  42. Microcontroller catalogue

  43. Microcontroller packaging
