1 / 73

Lecture 10 Y86 – a RISC Architecture

Lecture 10 Y86 – a RISC Architecture. CSCE 212 Computer Architecture. Topics Structures from Lec10 Y86, a RISC architecture. Feb 16, 2012. Overview. Last Time Test 1 Time before - Aggregate Data Array layout in memory Structures New Arrays of Structures Linux Memory Layout

lucia
Download Presentation

Lecture 10 Y86 – a RISC Architecture

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 10Y86 – a RISC Architecture CSCE 212 Computer Architecture Topics • Structures from Lec10 • Y86, a RISC architecture Feb 16, 2012

  2. Overview Last Time • Test 1 • Time before - Aggregate Data • Array layout in memory • Structures New • Arrays of Structures • Linux Memory Layout • A restricted Intel Architecture Y86 - RISC • HCL • Test 1 Next Time: • New Lab - datalab

  3. What is Computer Architecture? “Computer Architecture is those aspects of the instruction set available to programmers, independent of the hardware on which the instruction set was implemented.” The term computer architecture was first used in 1964 by Gene Amdahl, G. Anne Blaauw, and Frederick Brooks, Jr., the designers of the IBM System/360. Fred Brooks “Mythical Man-Month” Y86 is a hypothetical machine that is a simplification of the IA32 architecture so that we can fully describe its implementation.

  4. Application Program Compiler OS ISA CPU Design Circuit Design Chip Layout Instruction Set Architecture Assembly Language View • Processor state • Registers, memory, … • Instructions • addl, movl, leal, … • How instructions are encoded as bytes Layer of Abstraction • Above: how to program machine • Processor executes instructions in a sequence • Below: what needs to be built • Use variety of tricks to make it run fast • E.g., execute multiple instructions simultaneously

  5. OF ZF SF Y86 Processor State Program registers Condition codes Memory • Program Registers • Same 8 as with IA32. Each 32 bits • Condition Codes • Single-bit flags set by arithmetic or logical instructions • OF: Overflow ZF: Zero SF:Negative • Program Counter • Indicates address of instruction • Memory • Byte-addressable storage array • Words stored in little-endian byte order %eax %esi %ecx %edi PC %edx %esp %ebx %ebp

  6. Y86 Instructions Format • 1--6 bytes of information read from memory • Can determine instruction length from first byte • Not as many instruction types, and simpler encoding than with IA32 • Each accesses and modifies some part(s) of the program state

  7. %eax 0 %esi 6 %ecx 1 %edi 7 %edx 2 %esp 4 %ebx 3 %ebp 5 Encoding Registers Each register has 4-bit ID • Same encoding as in IA32 Register ID 8 indicates “no register” • Will use this in our hardware design in multiple places

  8. Generic Form Encoded Representation addl rA, rB 6 0 rA rB Instruction Example Addition Instruction • Add value in register rA to that in register rB • Store result in register rB • Note that Y86 only allows addition to be applied to register data • Set condition codes based on result • e.g., addl %eax,%esi Encoding: 60 06 • Two-byte encoding • First indicates instruction type • Second gives source and destination registers

  9. Instruction Code Function Code addl rA, rB xorl rA, rB andl rA, rB subl rA, rB 6 6 6 6 2 3 1 0 rA rA rA rA rB rB rB rB Arithmetic and Logical Operations • Refer to generically as “OPl” • Encodings differ only by “function code” • Low-order 4 bytes in first instruction word • Set condition codes as side effect Add Subtract (rA from rB) And Exclusive-Or

  10. rA 8 rA rB rB rB rrmovl rA, rB 4 3 2 5 0 0 0 0 rA rB irmovl V, rB mrmovl D(rB), rA rmmovl rA, D(rB) V D D Move Operations • Like the IA32 movl instruction • Simpler format for memory addresses • Give different names to keep them distinct Register --> Register Immediate --> Register Register --> Memory Memory --> Register

  11. Move Instruction Examples IA32 Y86 Encoding movl $0xabcd, %edx irmovl $0xabcd, %edx 30 82 cd ab 00 00 movl %esp, %ebx rrmovl %esp, %ebx 20 43 movl -12(%ebp),%ecx mrmovl -12(%ebp),%ecx 50 15 f4 ff ff ff movl %esi,0x41c(%esp) rmmovl %esi,0x41c(%esp) 40 64 1c 04 00 00 movl $0xabcd, (%eax) — movl %eax, 12(%eax,%edx) — movl (%ebp,%eax,4),%ecx —

  12. Jump When Equal Jump When Not Equal Jump Unconditionally Jump When Greater Jump When Greater or Equal Jump When Less or Equal Jump When Less jge Dest jl Dest je Dest jmp Dest jg Dest jne Dest jle Dest Dest Dest Dest Dest Dest Dest Dest 7 7 7 7 7 7 7 0 4 3 2 1 5 6 Jump Instructions • Refer to generically as “jXX” • Encodings differ only by “function code” • Based on values of condition codes • Same as IA32 counterparts • Encode full destination address • Unlike PC-relative addressing seen in IA32

  13. Y86 Program Stack Stack “Bottom” • Region of memory holding program data • Used in Y86 (and IA32) for supporting procedure calls • Stack top indicated by %esp • Address of top stack element • Stack grows toward lower addresses • Top element is at highest address in the stack • When pushing, must first decrement stack pointer • When popping, increment stack pointer • • • Increasing Addresses %esp Stack “Top”

  14. pushl rA popl rA a rA b rA 0 8 0 8 Stack Operations • Decrement %esp by 4 • Store word from rA to memory at %esp • Like IA32 • Read word from memory at %esp • Save in rA • Increment %esp by 4 • Like IA32

  15. call Dest Dest ret 8 9 0 0 Subroutine Call and Return • Push address of next instruction onto stack • Start executing instructions at Dest • Like IA32 • Pop value from stack • Use as address for next instruction • Like IA32

  16. nop halt 0 1 0 0 Miscellaneous Instructions • Don’t do anything • Stop executing instructions • IA32 has comparable instruction, but can’t execute it in user mode • We will use it to stop the simulator

  17. a 5043 6125  3 7395 0 Writing Y86 Code Try to Use C Compiler as Much as Possible • Write code in C • Compile for IA32 with gcc -S • Transliterate into Y86 Coding Example • Find number of elements in null-terminated list int len1(int a[]);

  18. First Try Write typical array code Compile with gcc -O2 -S Problem Hard to do array indexing on Y86 Since don’t have scaled addressing modes Y86 Code Generation Example /* Find number of elements in null-terminated list */ int len1(int a[]) { int len; for (len = 0; a[len]; len++) ; return len; } L18: incl %eax cmpl $0,(%edx,%eax,4) jne L18

  19. Second Try Write with pointer code Compile with gcc -O2 -S Result Don’t need to do indexed addressing Y86 Code Generation Example #2 /* Find number of elements in null-terminated list */ int len2(int a[]) { int len = 0; while (*a++) len++; return len; } L24: movl (%edx),%eax incl %ecx L26: addl $4,%edx testl %eax,%eax jne L24

  20. IA32 Code Setup Y86 Code Setup Y86 Code Generation Example #3 len2: pushl %ebp # Save %ebp xorl %ecx,%ecx # len = 0 rrmovl %esp,%ebp # Set frame mrmovl 8(%ebp),%edx # Get a mrmovl (%edx),%eax # Get *a jmp L26 # Goto entry len2: pushl %ebp xorl %ecx,%ecx movl %esp,%ebp movl 8(%ebp),%edx movl (%edx),%eax jmp L26

  21. IA32 Code Loop + Finish Y86 Code Loop + Finish Y86 Code Generation Example #4 L24: movl (%edx),%eax incl %ecx L26: addl $4,%edx testl %eax,%eax jne L24 movl %ebp,%esp movl %ecx,%eax popl %ebp ret L24: mrmovl (%edx),%eax # Get *a irmovl $1,%esi addl %esi,%ecx # len++ L26: # Entry: irmovl $4,%esi addl %esi,%edx # a++ andl %eax,%eax # *a == 0? jne L24 # No--Loop rrmovl %ebp,%esp # Pop rrmovl %ecx,%eax # Rtn len popl %ebp ret

  22. Y86 Program Structure irmovl Stack,%esp # Set up stack rrmovl %esp,%ebp # Set up frame irmovl List,%edx pushl %edx # Push argument call len2 # Call Function halt # Halt .align 4 List: # List of elements .long 5043 .long 6125 .long 7395 .long 0 # Function len2: . . . # Allocate space for stack .pos 0x100 Stack: • Program starts at address 0 • Must set up stack • Make sure don’t overwrite code! • Must initialize data • Can use symbolic names

  23. Assembling Y86 Program • Generates “object code” file eg.yo • Actually looks like disassembler output • unix> yas eg.ys • 0x000: 308400010000 | irmovl Stack,%esp # Set up stack • 0x006: 2045 | rrmovl %esp,%ebp # Set up frame • 0x008: 308218000000 | irmovl List,%edx • 0x00e: a028 | pushl %edx # Push argument • 0x010: 8028000000 | call len2 # Call Function • 0x015: 10 | halt # Halt • 0x018: | .align 4 • 0x018: | List: # List of elements • 0x018: b3130000 | .long 5043 • 0x01c: ed170000 | .long 6125 • 0x020: e31c0000 | .long 7395 • 0x024: 00000000 | .long 0

  24. Simulating Y86 Program • Instruction set simulator • Computes effect of each instruction on processor state • Prints changes in state from original • unix> yis eg.yo • Stopped in 41 steps at PC = 0x16. Exception 'HLT', CC Z=1 S=0 O=0 • Changes to registers: • %eax: 0x00000000 0x00000003 • %ecx: 0x00000000 0x00000003 • %edx: 0x00000000 0x00000028 • %esp: 0x00000000 0x000000fc • %ebp: 0x00000000 0x00000100 • %esi: 0x00000000 0x00000004 • Changes to memory: • 0x00f4: 0x00000000 0x00000100 • 0x00f8: 0x00000000 0x00000015 • 0x00fc: 0x00000000 0x00000018

  25. CISC Instruction Sets • Complex Instruction Set Computer • Dominant style through mid-80’s Stack-oriented instruction set • Use stack to pass arguments, save program counter • Explicit push and pop instructions Arithmetic instructions can access memory • addl %eax, 12(%ebx,%ecx,4) • requires memory read and write • Complex address calculation Condition codes • Set as side effect of arithmetic and logical instructions Philosophy • Add instructions to perform “typical” programming tasks

  26. RISC Instruction Sets • Reduced Instruction Set Computer • Internal project at IBM, later popularized by Hennessy (Stanford) and Patterson (Berkeley) Fewer, simpler instructions • Might take more to get given task done • Can execute them with small and fast hardware Register-oriented instruction set • Many more (typically 32) registers • Use for arguments, return pointer, temporaries Only load and store instructions can access memory • Similar to Y86 mrmovl and rmmovl No Condition codes • Test instructions return 0/1 in register

  27. MIN3 C Min3 B A s1 s0 MUX4 D0 D1 Out4 D2 D3 HCL Word-Level Examples Minimum of 3 Words • Find minimum of three input words • HCL case expression • Final case guarantees match int Min3 = [ A < B && A < C : A; B < A && B < C : B; 1 : C; ]; 4-Way Multiplexor • Select one of 4 inputs based on two control bits • HCL case expression • Simplify tests by assuming sequential matching int Out4 = [ !s1&&!s0: D0; !s1 : D1; !s0 : D2; 1 : D3; ];

  28. MIN3 C Min3 B A s1 s0 MUX4 D0 D1 Out4 D2 D3 HCL Word-Level Examples Minimum of 3 Words • Find minimum of three input words • HCL case expression • Final case guarantees match int Min3 = [ A < B && A < C : A; B < A && B < C : B; 1 : C; ]; 4-Way Multiplexor • Select one of 4 inputs based on two control bits • HCL case expression • Simplify tests by assuming sequential matching int Out4 = [ !s1&&!s0: D0; !s1 : D1; !s0 : D2; 1 : D3; ];

  29. Hardware Control Language • Very simple hardware description language • Can only express limited aspects of hardware operation • Parts we want to explore and modify Data Types • bool: Boolean • a, b, c, … • int: words • A, B, C, … • Does not specify word size---bytes, 32-bit words, … Statements • bool a = bool-expr ; • int A = int-expr ;

  30. HCL Operations • Classify by type of value returned Boolean Expressions • Logic Operations • a && b, a || b, !a • Word Comparisons • A == B, A != B, A < B, A <= B, A >= B, A > B • Set Membership • A in { B, C, D } • Same as A == B || A == C || A == D Word Expressions • Case expressions • [ a : A; b : B; c : C ] • Evaluate test expressions a, b, c, … in sequence • Return word expression A, B, C, … for first successful test

  31. Byte 0 1 2 3 4 5 nop 0 0 addl 6 0 halt 1 0 subl 6 1 rrmovl rA, rB 2 0 rA rB andl 6 2 irmovl V, rB 3 0 8 rB V xorl 6 3 rmmovl rA, D(rB) 4 0 rA rB D jmp 7 0 mrmovl D(rB), rA 5 0 rA rB D jle 7 1 OPl rA, rB 6 fn rA rB jl 7 2 jXX Dest 7 fn Dest je 7 3 call Dest 8 0 Dest jne 7 4 ret 9 0 jge 7 5 pushl rA A 0 rA 8 jg 7 6 popl rA B 0 rA 8 Y86 Instruction Set

  32. valA Register file srcA A valW W dstW valB srcB B Clock fun A MUX 0 A L U = B 1 Clock Building Blocks Combinational Logic • Compute Boolean functions of inputs • Continuously respond to input changes • Operate on data and implement control Storage Elements • Store bits • Addressable memories • Non-addressable registers • Loaded only as clock rises

  33. Hardware Control Language • Very simple hardware description language • Can only express limited aspects of hardware operation • Parts we want to explore and modify Data Types • bool: Boolean • a, b, c, … • int: words • A, B, C, … • Does not specify word size---bytes, 32-bit words, … Statements • bool a = bool-expr ; • int A = int-expr ;

  34. HCL Operations • Classify by type of value returned Boolean Expressions • Logic Operations • a && b, a || b, !a • Word Comparisons • A == B, A != B, A < B, A <= B, A >= B, A > B • Set Membership • A in { B, C, D } • Same as A == B || A == C || A == D Word Expressions • Case expressions • [ a : A; b : B; c : C ] • Evaluate test expressions a, b, c, … in sequence • Return word expression A, B, C, … for first successful test

  35. newPC SEQ Hardware Structure PC valE , valM Write back valM State • Program counter register (PC) • Condition code register (CC) • Register File • Memories • Access same memory space • Data: for reading/writing program data • Instruction: for reading instructions Instruction Flow • Read instruction at address specified by PC • Process through stages • Update program counter Data Data Memory memory memory Addr , Data valE CC CC ALU ALU Execute Bch aluA , aluB valA , valB srcA , srcB Decode A A B B dstA , dstB M M Register Register Register Register file file file file E E icode , ifun valP rA , rB valC Instruction PC Instruction PC memory increment Fetch memory increment PC

  36. newPC SEQ Stages PC valE , valM Write back valM Fetch • Read instruction from instruction memory Decode • Read program registers Execute • Compute value or address Memory • Read or write data Write Back • Write program registers PC • Update program counter Data Data Memory memory memory Addr , Data valE CC CC ALU ALU Execute Bch aluA , aluB valA , valB srcA , srcB Decode A A B B dstA , dstB M M Register Register Register Register file file file file E E icode , ifun valP rA , rB valC Instruction PC Instruction PC memory increment Fetch memory increment PC

  37. Optional Optional D icode 5 0 rA rB ifun rA rB valC Instruction Decoding Instruction Format • Instruction byte icode:ifun • Optional register byte rA:rB • Optional constant word valC

  38. Fetch Read 2 bytes Decode Read operand registers Execute Perform operation Set condition codes Memory Do nothing Write back Update register PC Update Increment PC by 2 OPl rA, rB 6 fn rA rB Executing Arith./Logical Operation

  39. Fetch icode:ifun  M1[PC] Read instruction byte rA:rB  M1[PC+1] Read register byte valP  PC+2 Compute next PC Decode valA  R[rA] Read operand A valB  R[rB] Read operand B Execute valE  valB OP valA Perform ALU operation Set CC Set condition code register Memory Write back R[rB]  valE Write back result PC update PC  valP Update PC Stage Computation: Arith/Log. Ops OPl rA, rB • Formulate instruction execution as sequence of simple steps • Use same general form for all instructions

  40. Fetch Read 6 bytes Decode Read operand registers Execute Compute effective address Memory Write to memory Write back Do nothing PC Update Increment PC by 6 rA rB 4 0 rmmovl rA, D(rB) D Executing rmmovl

  41. Fetch icode:ifun  M1[PC] Read instruction byte rA:rB  M1[PC+1] Read register byte valC  M4[PC+2] Read displacement D valP  PC+6 Compute next PC Decode valA  R[rA] Read operand A valB  R[rB] Read operand B Execute valE  valB + valC Compute effective address Memory M4[valE]  valA Write value to memory Write back PC update PC  valP Update PC Stage Computation: rmmovl rmmovl rA, D(rB) • Use ALU for address computation

  42. Fetch Read 2 bytes Decode Read stack pointer Execute Increment stack pointer by 4 Memory Read from old stack pointer Write back Update stack pointer Write result to register PC Update Increment PC by 2 popl rA b rA 0 8 Executing popl

  43. Fetch icode:ifun  M1[PC] Read instruction byte rA:rB  M1[PC+1] Read register byte valP  PC+2 Compute next PC Decode valA  R[%esp] Read stack pointer valB  R [%esp] Read stack pointer Execute valE  valB + 4 Increment stack pointer Memory valM  M4[valA] Read from stack Write back R[%esp]  valE Update stack pointer R[rA]  valM Write back result PC update PC  valP Update PC Stage Computation: popl popl rA • Use ALU to increment stack pointer • Must update two registers • Popped value • New stack pointer

  44. Fetch Read 5 bytes Increment PC by 5 Decode Do nothing Execute Determine whether to take branch based on jump condition and condition codes Memory Do nothing Write back Do nothing PC Update Set PC to Dest if branch taken or to incremented PC if not branch jXX Dest Dest Not taken fall thru: target: Taken 7 XX XX fn XX XX Executing Jumps

  45. Fetch icode:ifun  M1[PC] Read instruction byte valC  M4[PC+1] Read destination address valP  PC+5 Fall through address Decode Execute Bch  Cond(CC,ifun) Take branch? Memory Write back PC update PC  Bch ? valC : valP Update PC Stage Computation: Jumps jXX Dest • Compute both addresses • Choose based on setting of condition codes and branch condition

  46. Fetch Read 5 bytes Increment PC by 5 Decode Read stack pointer Execute Decrement stack pointer by 4 Memory Write incremented PC to new value of stack pointer Write back Update stack pointer PC Update Set PC to Dest call Dest Dest return: target: 8 XX XX 0 XX XX Executing call

  47. Fetch icode:ifun  M1[PC] Read instruction byte valC  M4[PC+1] Read destination address valP  PC+5 Compute return point Decode valB  R[%esp] Read stack pointer Execute valE  valB + –4 Decrement stack pointer Memory M4[valE]  valP Write return value on stack Write back R[%esp]  valE Update stack pointer PC update PC  valC Set PC to destination Stage Computation: call call Dest • Use ALU to decrement stack pointer • Store incremented PC

  48. Fetch Read 1 byte Decode Read stack pointer Execute Increment stack pointer by 4 Memory Read return address from old stack pointer Write back Update stack pointer PC Update Set PC to return address 9 XX 0 XX Executing ret ret return:

  49. Fetch icode:ifun  M1[PC] Read instruction byte Decode valA  R[%esp] Read operand stack pointer valB  R[%esp] Read operand stack pointer Execute valE  valB + 4 Increment stack pointer Memory valM  M4[valA] Read return address Write back R[%esp]  valE Update stack pointer PC update PC  valM Set PC to return address Stage Computation: ret ret • Use ALU to increment stack pointer • Read return address from memory

  50. Computation Steps OPl rA, rB • All instructions follow same general pattern • Differ in what gets computed on each step Fetch icode,ifun icode:ifun  M1[PC] Read instruction byte rA,rB rA:rB  M1[PC+1] Read register byte valC [Read constant word] valP valP  PC+2 Compute next PC Decode valA, srcA valA  R[rA] Read operand A valB, srcB valB  R[rB] Read operand B Execute valE valE  valB OP valA Perform ALU operation Cond code Set CC Set condition code register Memory valM [Memory read/write] Write back dstE R[rB]  valE Write back ALU result dstM [Write back memory result] PC update PC PC  valP Update PC

More Related