1 / 45

Architecture (I)

Processor Architecture. Architecture (I). Goal. Understand basic computer organization Instruction set architecture Deeply explore the CPU working mechanism How the instruction is executed: sequential and pipeline version Help you programming

leah-joyner
Download Presentation

Architecture (I)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Processor Architecture Architecture (I)

  2. Goal • Understand basic computer organization • Instruction set architecture • Deeply explore the CPU working mechanism • How the instruction is executed: sequential and pipeline version • Help you programming • Fully understand how computer is organized and works will help you write more stable and efficient code.

  3. CPU Design (Why?) • It is interesting. • Aid in understanding how the overall computer system works. • Many design hardware systems containing processors. • Maybe you will work on a processor design.

  4. CPU Design • Instruction set architecture • Logic design • Sequential implementation • Pipelining and initial pipelined implementation • Making the pipeline work • Modern processor design

  5. Suggested Reading • - Chap 4.1, 4.2

  6. Instruction Set Architecture #1 • What is it ? • Assemble Language Abstraction • Machine Language Abstraction • What does it provide? • An abstraction of the real computer, hide the details of implementation • The syntax of computer instructions • The semantics of instructions • The execution model • Programmer-visible computer status Instruction Set Architecture (ISA)

  7. Application Program Compiler OS ISA CPU Design Circuit Design Chip Layout Instruction Set Architecture #2 • Assembly Language View • Processor state • Registers, memory, … • Instructions • addl, movl, leal, … • How instructions are encoded as bytes • Layer of Abstraction • Above: how to program machine • Processor executes instructions in a sequence • Below: what needs to be built • Use tricks to make it run fast • E.g., execute multiple instructions simultaneously

  8. Instruction Set Architecture #3 • ISA define the processor family • Two main kind: RISC and CISC • RISC:SPARC, MIPS, PowerPC • CISC:X86 (or called IA32) • Another divide: Superscalar, VLIW and EPIC • Superscalar: all the above • VLIW: Philips TriMedia • EPIC: IA64 • Under same ISA, there are many different processors • From different manufacturers: • X86 from Intel and AMD and VIA • Different models • 8086, 80386, Pentium, Pentium 4

  9. OF ZF SF Y86 Processor State P258 Figure 4.1 Program registers Condition codes Memory • Program Registers • Same 8 as with IA32. Each 32 bits • Condition Codes • Single-bit flags set by arithmetic or logical instructions • OF: Overflow ZF: Zero SF:Negative • Program Counter • Indicates address of instruction • Memory • Byte-addressable storage array • Words stored in little-endian byte order %eax %esi %ecx %edi PC %edx %esp %ebx %ebp

  10. Y86 Instructions P259 Figure 4.2 • Format (P259) • 1--6 bytes of information read from memory • Can determine instruction length from first byte • Not as many instruction types, and simpler encoding than with IA32 • Each accesses and modifies some part(s) of the program state • Errata: JXX and call are 5 bytes long.

  11. %eax 0 %esi 6 %ecx 1 %edi 7 %edx 2 %esp 4 %ebx 3 %ebp 5 Encoding Registers P261 Figure 4.4 • Each register has 4-bit ID • Same encoding as in IA32, but IA32 using only 3-bit ID • Register ID 8 indicates “no register” • Will use this in our hardware design in multiple places

  12. Generic Form Encoded Representation addl rA, rB 6 0 rA rB Instruction Example P261 Figure 4.3 • Addition Instruction • Add value in register rA to that in register rB • Store result in register rB • Note that Y86 only allows addition to be applied to register data • Set condition codes based on result • e.g., addl %eax,%esi Encoding: 60 06 • Two-byte encoding • First indicates instruction type • Second gives source and destination registers

  13. Instruction Code Function Code addl rA, rB xorl rA, rB andl rA, rB subl rA, rB 6 6 6 6 2 3 1 0 rA rA rA rA rB rB rB rB Arithmetic and Logical Operations • Refer to generically as “OPl” • Encodings differ only by “function code” • Low-order 4 bytes in first instruction word • Set condition codes as side effect • Notice: no multiply or divide operation Add Subtract (rA from rB) And Exclusive-Or

  14. rA rA 8 rB rB rB rrmovl rA, rB 5 3 2 4 0 0 0 0 rA rB mrmovl D(rB), rA rmmovl rA, D(rB) irmovl V, rB V D D Move Operations P259 Figure 4.2 • Like the IA32 movl instruction • Simpler format for memory addresses • Give different names to keep them distinct Register --> Register Immediate --> Register Register --> Memory Memory --> Register Distinct: 清楚的

  15. Move Instruction Examples IA32 Y86 Encoding movl $0xabcd, %edx irmovl $0xabcd, %edx 30 82 cd ab 00 00 movl %esp, %ebx rrmovl %esp, %ebx 20 43 movl -12(%ebp),%ecx mrmovl -12(%ebp),%ecx 50 15 f4 ff ff ff movl %esi,0x41c(%esp) rmmovl %esi,0x41c(%esp) 40 64 1c 04 00 00 movl $0xabcd, (%eax) — movl %eax, 12(%eax,%edx) — movl (%ebp,%eax,4),%ecx —

  16. Jump When Greater Jump When Equal Jump When Not Equal Jump When Less Jump Unconditionally Jump When Greater or Equal Jump When Less or Equal jmp Dest jg Dest jle Dest jge Dest jl Dest je Dest jne Dest Dest Dest Dest Dest Dest Dest Dest 7 7 7 7 7 7 7 6 4 2 1 5 0 3 Jump Instructions P259 Figure 4.2 • Refer to generically as “jXX” • Encodings differ only by “function code” • Based on values of condition codes • Same as IA32 counterparts • Encode full destination address • Unlike PC-relative addressing seen in IA32

  17. Y86 Program Stack Stack “Bottom” • Region of memory holding program data • Used in Y86 (and IA32) for supporting procedure calls • Stack top indicated by %esp • Address of top stack element • Stack grows toward lower addresses • Top element is at lowest address in the stack • When pushing, must first decrement stack pointer • When popping, increment stack pointer • • • Increasing Addresses %esp Stack “Top”

  18. pushl rA popl rA a rA b rA 0 8 0 8 Stack Operations P259 Figure 4.2 • Decrement %esp by 4 • Store word from rA to memory at %esp • Like IA32 • Read word from memory at %esp • Save in rA • Increment %esp by 4 • Like IA32

  19. call Dest Dest ret 8 9 0 0 Subroutine Call and Return P259 Figure 4.2 • Push address of next instruction onto stack • Start executing instructions at Dest • Like IA32 • Pop value from stack • Use as address for next instruction • Like IA32

  20. nop halt 0 1 0 0 Miscellaneous Instructions P259 Figure 4.2 • Don’t do anything • Stop executing instructions • IA32 has comparable instruction, but can’t execute it in user mode • We will use it to stop the simulator

  21. Y86 Program Structure irmovl Stack,%esp # Set up stack rrmovl %esp,%ebp # Set up frame irmovl List,%edx pushl %edx # Push argument call len2 # Call Function halt # Halt .align 4 List: # List of elements .long 5043 .long 6125 .long 7395 .long 0 # Function len2: . . . # Allocate space for stack .pos 0x100 Stack: • Program starts at address 0 • Must set up stack • Make sure don’t overwrite code! • Must initialize data • Can use symbolic names

  22. Assembling Y86 Program • Generates “object code” file eg.yo • Actually looks like disassembler output • unix> yas eg.ys • 0x000: 308400010000 | irmovl Stack,%esp # Set up stack • 0x006: 2045 | rrmovl %esp,%ebp # Set up frame • 0x008: 308218000000 | irmovl List,%edx • 0x00e: a028 | pushl %edx # Push argument • 0x010: 8028000000 | call len2 # Call Function • 0x015: 10 | halt # Halt • 0x018: | .align 4 • 0x018: | List: # List of elements • 0x018: b3130000 | .long 5043 • 0x01c: ed170000 | .long 6125 • 0x020: e31c0000 | .long 7395 • 0x024: 00000000 | .long 0

  23. Simulating Y86 Program • Instruction set simulator • Computes effect of each instruction on processor state • Prints changes in state from original • unix> yis eg.yo • Stopped in 41 steps at PC = 0x16. Exception 'HLT', CC Z=1 S=0 O=0 • Changes to registers: • %eax: 0x00000000 0x00000003 • %ecx: 0x00000000 0x00000003 • %edx: 0x00000000 0x00000028 • %esp: 0x00000000 0x000000fc • %ebp: 0x00000000 0x00000100 • %esi: 0x00000000 0x00000004 • Changes to memory: • 0x00f4: 0x00000000 0x00000100 • 0x00f8: 0x00000000 0x00000015 • 0x00fc: 0x00000000 0x00000018

  24. Summary • Y86 Instruction Set Architecture • Similar state and instructions as IA32 • Simpler encodings • Somewhere between CISC and RISC • How Important is ISA Design? • Less now than before • With enough hardware, can make almost anything go fast • Intel is moving away from IA32 • Does not allow enough parallel execution • Introduced IA64 • 64-bit word sizes (overcome address space limitations) • Radically different style of instruction set with explicit parallelism • Requires sophisticated compilers Radically:根本上

  25. Logic Design • Digital circuit • What is digital circuit? • Know what a CPU will base on? • Hardware Control Language (HCL) • A simple and functional language to describe our CPU implementation • Syntax like C

  26. Category of Circuit • Analog Circuit • Use all the range of Signal • Most part is amplifier • Hard to model and automatic design • Use transistor and capacitance as basis • We will not discuss it here • Digital Circuit • Has only two values, 0 and 1 • Easy to model and design • Use true table and other tools to analyze • Use gate as the basis • The voltage of 1 is differ in different kind circuit. • E.g. TTL circuit using 5 voltage as 1 Amplifier: 放大器 Transistor:晶体管 Capacitance: 电容

  27. 0 1 0 Voltage Time Digital Signals • Use voltage thresholds to extract discrete values from continuous signal • Simplest version: 1-bit signal • Either high range (1) or low range (0) • With guard range between them • Not strongly affected by noise or low quality circuit elements • Can make circuits simple, small, and fast

  28. Overview of Logic Design • Fundamental Hardware Requirements • Communication • How to get values from one place to another • Computation • Storage (Memory) • Clock Signal • Bits are Our Friends • Everything expressed in terms of values 0 and 1 • Communication • Low or high voltage on wire • Computation • Compute Boolean functions • Storage • Clock Signal

  29. Category of Digital Circuit • Combinational Circuit • Without memory. So the circuit can’t have state. Any same input will get the same output at any time. • Needn’t clock signal • Typical application: ALU • Sequential Circuit • = Combinational circuit + memory and clock signal • Have state. Two same inputs may not generate the same output. • Use clock signal to control the run of circuit. • Typical application: CPU

  30. a && b Computing with Logic Gates P272 Figure 4.8 • Outputs are Boolean functions of inputs • Not an assignment operation, just give the circuit a name • Respond continuously to changes in inputs • With some, small delay Falling Delay Rising Delay b Voltage a Time

  31. Acyclic Network Inputs Outputs Combinational Circuits • Acyclic Network of Logic Gates • Continuously responds to changes on inputs • Outputs become (after some delay) Boolean functions of inputs

  32. Bit equal a eq b Bit Equality P272 Figure 4.9 • Generate 1 if a and b are equal • Hardware Control Language (HCL) • Very simple hardware description language • Boolean operations have syntax similar to C logical operations • We’ll use it to describe control logic for processors HCL Expression bool eq = (a&&b)||(!a&&!b)

  33. Bit-Level Multiplexor P273 Figure 4.10 • Control signal s • Data signals a and b • Output a when s=1, b when s=0 • Its name: MUX • Usage: Select one signal from a couple of signals s Bit MUX HCL Expression bool out = (s&&a)||(!s&&b) b out a

  34. b31 Bit equal eq31 a31 b30 Bit equal eq30 a30 Eq B = Eq b1 Bit equal eq1 A a1 b0 Bit equal eq0 a0 Word Equality P274 Figure 4.11 Word-Level Representation • 32-bit word size • HCL representation • Equality operation • Generates Boolean value HCL Representation bool Eq = (A == B)

  35. s b31 out31 a31 s b30 out30 MUX B Out a30 A b0 out0 a0 Word Multiplexor P275 Figure 4.12 Word-Level Representation • Select input word A or B depending on control signal s • HCL representation • Case expression • Series of test : value pairs • Output value for first successful test HCL Representation int Out = [ s : A; 1 : B; ];

  36. MIN3 C Min3 B A s1 s0 MUX4 D0 D1 Out4 D2 D3 HCL Word-Level Examples P277, P266 Minimum of 3 Words • Find minimum of three input words • HCL case expression • Final case guarantees match int Min3 = [ A < B && A < C : A; B < A && B < C : B; 1 : C; ]; 4-Way Multiplexor • Select one of 4 inputs based on two control bits • HCL case expression • Simplify tests by assuming sequential matching int Out4 = [ !s1&&!s0: D0; !s1 : D1; !s0 : D2; 1 : D3; ];

  37. 2 1 0 3 Y Y Y Y A A A A X ^ Y X & Y X + Y X - Y X X X X B B B B OF OF OF OF ZF ZF ZF ZF SF SF SF SF A L U A L U A L U A L U Arithmetic Logic Unit P278 Figure 4.13 • Combinational logic • Continuously responding to inputs • Control signal selects function computed • Corresponding to 4 arithmetic/logical operations in Y86 • Also computes values for condition codes • We will use it as a basic component for our CPU

  38. I O Clock Storage • Registers • Hold single words or bits • Loaded as clock rises • Not onlyprogram registers • Random-access memories • Hold multiple words • E.g. register file, memory • Possible multiple read or write ports • Read word when address input changes • Write word as clock rises

  39. Rising clock State = y y Output = y   Register Operation P279 Figure 4.14 • Stores data bits • For most of time acts as barrier between input and output • As clock rises, loads input State = x x Input = y Output = x Barrier: 屏障

  40. Comb. Logic 0 MUX 0 Out In 1 Load Clock Clock Load A L U x0 x1 x2 x3 x4 x5 In x0 x0+x1 x0+x1+x2 x3 x3+x4 x3+x4+x5 Out State Machine Example • Accumulator circuit • Load or accumulate on each cycle

  41. valA Register file srcA A valW Read ports W dstW Write port valB srcB B Clock Random-Access Memory P280 • Stores multiple words of memory • Address input specifies which word to read or write • Register file • Holds values of program registers • %eax, %esp, etc. • Register identifier serves as address • ID 8 implies no read or write performed • Multiple Ports • Can read and/or write multiple words in one cycle • Each has separate address and data input/output

  42. Register file Register file y 2 valW valW W W dstW dstW x valA Register file srcA A 2 Clock Clock valB srcB B x 2 Rising clock y   2 Register File Timing • Reading • Like combinational logic • Output data generated based on input address • After some delay • Writing • Like register • Update only as clock rises x 2

  43. Hardware Control Language • Very simple hardware description language • Can only express limited aspects of hardware operation • Parts we want to explore and modify • Data Types • bool: Boolean • a, b, c, … • int: words • A, B, C, … • Does not specify word size---bytes, 32-bit words, … • Statements • bool a = bool-expr ; • int A = int-expr ;

  44. HCL Operations • Classify by type of value returned • Boolean Expressions • Logic Operations • a && b, a || b, !a • Word Comparisons • A == B, A != B, A < B, A <= B, A >= B, A > B • Set Membership • A in { B, C, D } • Same as A == B || A == C || A == D • Word Expressions • Case expressions • [ a : A; b : B; c : C ] • Evaluate test expressions a, b, c, … in sequence • Return word expression A, B, C, … for first successful test

  45. Summary • Computation • Performed by combinational logic • Computes Boolean functions • Continuously reacts to input changes • Storage • Registers • Hold single words • Loaded as clock rises • Random-access memories • Hold multiple words • Possible multiple read or write ports • Read word when address input changes • Write word as clock rises

More Related