
RISC Processors

RISC Processors. Chapter 14 S. Dandamudi. Introduction Evolution of CISC processors RISC design principles PowerPC processor Architecture Addressing modes Instruction set. Itanium processor Architecture Addressing modes Instruction set Instruction-level parallelism Branch handling



Presentation Transcript


  1. RISC Processors Chapter 14 S. Dandamudi

  2. Outline • Introduction • Evolution of CISC processors • RISC design principles • PowerPC processor • Architecture • Addressing modes • Instruction set • Itanium processor • Architecture • Addressing modes • Instruction set • Instruction-level parallelism • Branch handling • Speculative execution

  3. Introduction • CISC • Complex instruction set • Pentium is the most popular example • RISC • Simple instructions • Reduced complexity • Modern processors use this design philosophy • PowerPC, MIPS, SPARC, Intel Itanium • Borrow some features from CISC • No precise definition • We can identify some common characteristics S. Dandamudi

  4. Evolution of CISC Designs • Motivation to efficiently use expensive resources • Processor • Memory • High density code • Complex instructions • Hardware complexity is handled by microprogramming • Microprogramming is also helpful to • Reduce the impact of memory access latency • Offers flexibility • Low-cost members of the same family • Tailored to high-level language constructs S. Dandamudi

  5. Evolution of CISC Designs (cont’d) S. Dandamudi

  6. Evolution of CISC Designs (cont’d) Example • Autoincrement addressing mode of VAX • Performs the following actions: (R2) = (R2) + R3; R2 = R2 + 1 • RISC equivalent R4 = (R2) R4 = R4 + R3 (R2) = R4 R2 = R2 + 1 S. Dandamudi
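
The equivalence claimed on this slide can be checked with a small simulation. This is a minimal sketch, assuming a dictionary-based register file and memory; the function names and starting values are made up for illustration and model only the actions listed on the slide.

```python
def cisc_autoincrement_add(regs, mem):
    """One complex instruction: (R2) = (R2) + R3; R2 = R2 + 1."""
    mem[regs["R2"]] = mem[regs["R2"]] + regs["R3"]
    regs["R2"] += 1


def risc_sequence(regs, mem):
    """The four simple instructions from the slide, one per line."""
    regs["R4"] = mem[regs["R2"]]          # R4 = (R2)     load
    regs["R4"] = regs["R4"] + regs["R3"]  # R4 = R4 + R3  register-to-register add
    mem[regs["R2"]] = regs["R4"]          # (R2) = R4     store
    regs["R2"] = regs["R2"] + 1           # R2 = R2 + 1   pointer update


def run(step):
    """Run one variant on a fresh (hypothetical) machine state."""
    regs = {"R2": 100, "R3": 5, "R4": 0}
    mem = {100: 7, 101: 0}
    step(regs, mem)
    return regs["R2"], mem[100]
```

Both variants leave R2 and the memory word in the same state; the RISC version needs an extra scratch register (R4) and touches memory only in the load and the store.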

  7. Why RISC? • Simple instructions are preferred • Complex instructions are mostly ignored by compilers • Due to semantic gap • Simple data structures • Complex data structures are used relatively infrequently • Better to support a few simple data types efficiently • Synthesize complex ones • Simple addressing modes • Complex addressing modes lead to variable length instructions • Lead to inefficient instruction decoding and scheduling S. Dandamudi

  8. Why RISC? (cont’d) • Large register set • Efficient support for procedure calls and returns • Patterson and Sequin’s study • Procedure call/return: 12-15% of HLL statements • Constitute 31-33% of machine language instructions • Generate nearly half (45%) of memory references • Small activation record • Tanenbaum’s study • Only 1.25% of the calls have more than 6 arguments • More than 93% have less than 6 local scalar variables • Large register set can avoid memory references S. Dandamudi

  9. RISC Design Principles • Simple operations • Simple instructions that can execute in one cycle • Register-to-register operations • Only load and store operations access memory • Rest of the operations on a register-to-register basis • Simple addressing modes • A few addressing modes (1 or 2) • Large number of registers • Needed to support register-to-register operations • Minimize the procedure call and return overhead S. Dandamudi

  10. RISC Design Principles (cont’d) Register windows storing activation records S. Dandamudi

  11. RISC Design Principles (cont’d) • Fixed-length instructions • Facilitates efficient instruction execution • Simple instruction format • Fixed boundaries for various fields • opcode, source operands,… • Other features • Tend to use Harvard architecture • Pipelining is visible at the architecture level S. Dandamudi

  12. PowerPC • Registers • 32 general-purpose registers (GPR0 – GPR31) • 32 floating-point registers (FPR0 – FPR31) • Condition register (CR) • Similar to Pentium’s flags register • Divided into 8 CR fields (4 bits each) • “Less than” (LT), “greater than” (GT), “equal to” (EQ), summary overflow (SO) • CR1 is for floating-point exceptions • Other CR fields can be used for integer or FP exceptions • Branch instructions can test a specific CR field bit
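
A sketch of how the eight 4-bit CR fields could be picked apart in software. It assumes the usual PowerPC bit numbering, in which bit 0 is the most significant bit, so field 0 (CR0) is the top nibble of the 32-bit register; the helper names are hypothetical.

```python
def cr_field(cr, n):
    """Return the 4-bit CR field n (0-7); field 0 is the most significant nibble."""
    if not 0 <= n <= 7:
        raise ValueError("CR field number must be 0..7")
    return (cr >> (28 - 4 * n)) & 0xF


def cr_flags(cr, n):
    """Decode a field into its LT, GT, EQ, SO bits (in that order, MSB first)."""
    f = cr_field(cr, n)
    return {
        "LT": bool(f & 0b1000),
        "GT": bool(f & 0b0100),
        "EQ": bool(f & 0b0010),
        "SO": bool(f & 0b0001),
    }
```

For example, `cr_flags(0x20000000, 0)` reports only EQ set in CR0, which is the state a branch instruction would test after a compare of two equal values.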

  13. PowerPC (cont’d) S. Dandamudi

  14. PowerPC (cont’d) • XER register serves two distinct purposes • Bits 0, 1, and 2 are used to capture • Summary overflow (SO), overflow (OV), carry (CA) • OV and CA are similar to Pentium’s overflow and carry • Once set, SO can be cleared only by a special instruction • Bits 25 to 31 (7 bits) • Specify the number of bytes to be transferred between memory and registers • Used by two instructions • Load string word indexed (lswx) • Store string word indexed (stswx) • Can load/store all 32 registers (GPR0-GPR31)

  15. PowerPC (cont’d) • Link register (LR) • Used to store the procedure return address • Stores the effective address of the instruction following the procedure call instruction • Procedure calls use the branch instructions • Example: b = branch, bl = procedure call • Count register (CTR) • Maintains loop count value • Similar to Pentium's ECX register • Branch instructions can test the value • 32-bit PowerPC implementations use segmentation like the Pentium S. Dandamudi

  16. PowerPC (cont’d) • Addressing modes • Load/store instructions support three addressing modes • Can use GPRs • Register Indirect • Effective address = contents of rA or 0 • Specifying 0 generates address 0 • Register Indirect with Immediate Index • Effective address = Contents of rA or 0 + imm16 • Register Indirect with Index • Effective address = Contents of rA or 0 + contents of rB S. Dandamudi
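
The three effective-address calculations can be sketched as follows. This is an illustration only: it assumes a 32-bit implementation, a Python list standing in for the GPRs, and an immediate that has already been sign-extended to a Python integer. Note that "rA or 0" means register number 0 selects the literal value 0, not the contents of GPR0.

```python
MASK32 = 0xFFFFFFFF


def base(gpr, ra):
    """'Contents of rA or 0': register number 0 stands for the constant 0."""
    return 0 if ra == 0 else gpr[ra]


def ea_register_indirect(gpr, ra):
    return base(gpr, ra) & MASK32


def ea_immediate_index(gpr, ra, simm16):
    """simm16 is assumed to be already sign-extended (-32768..32767)."""
    return (base(gpr, ra) + simm16) & MASK32


def ea_index(gpr, ra, rb):
    return (base(gpr, ra) + gpr[rb]) & MASK32
```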

  17. PowerPC (cont’d) Instruction format S. Dandamudi

  18. PowerPC (cont’d) • Bits 0-5 • Specify primary opcode • Other fields specify suboperations • Depends on instruction type • AA bit • 1 (use absolute address) • 0 (use relative address) • LK bit • 0 (no link --- branch) • 1 (link --- turns branch into a procedure call) S. Dandamudi

  19. PowerPC Instruction Set • Data Transfer instructions • Byte loads lbz rD,disp(rA) ;Load byte and zero lbzu rD,disp(rA) ;Load byte and zero with update • Effective address = contents of rA + disp lbzx rD,rA,rB ;Load byte and zero indexed lbzux rD,rA,rB ;Load byte and zero with update indexed • Effective address = contents of rA + contents of rB • Upper three bytes of rD are zeroed • Update versions: rA ← effective address

  20. PowerPC Instruction Set (cont’d) • Similar instructions for halfword and word loads lhz, lhzu, lhzx, lhzux lwz, lwzu, lwzx, lwzux • For halfword loads, sign extension is possible lha, lhau, lhax, lhaux • Multiword load lmw rD,disp(rA) • Loads n consecutive words starting at EA into registers rD, …, r31 (n = 32 − D)

  21. PowerPC Instruction Set (cont’d) • Similar instructions for store stb, stbu, stbx, stbux sth, sthu, sthx, sthux stw, stwu, stwx, stwux • Multiword store stmw rS,disp(rA) • Stores the n words in registers rS, …, r31 to consecutive memory locations starting at EA

  22. PowerPC Instruction Set (cont’d) Arithmetic Instructions • Add instructions add rD,rA,rB ; rD ← rA + rB • Status and overflow bits of CR0 and XER are not altered add. rD,rA,rB ; alters LT,GT,EQ,SO of CR0 addo rD,rA,rB ; alters SO,OV of XER addo. rD,rA,rB ; alters LT,GT,EQ,SO of CR0 and SO,OV of XER • These four instructions do not alter the CA bit of XER

  23. PowerPC Instruction Set (cont’d) • To alter the CA bit, use adde rD,rA,rB (add extended; the incoming CA is also added to the sum) • To alter the other bits as well, use adde., addeo, addeo. • Immediate operand version addi rD,rA,Simm16 • We can use addi to implement other instructions li rD,value as addi rD,0,value la rD,disp(rA) as addi rD,rA,disp subi rD,rA,value as addi rD,rA,-value
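
The way an assembler might rewrite these simplified mnemonics into addi can be sketched as a small text expansion. The `expand` function and its argument order are hypothetical; only the three mappings themselves come from the slide.

```python
def expand(mnemonic, *ops):
    """Rewrite a simplified mnemonic as the equivalent addi instruction."""
    if mnemonic == "li":        # li rD,value
        rd, value = ops
        return f"addi {rd},0,{value}"
    if mnemonic == "la":        # la rD,disp(rA)
        rd, disp, ra = ops
        return f"addi {rd},{ra},{disp}"
    if mnemonic == "subi":      # subi rD,rA,value
        rd, ra, value = ops
        return f"addi {rd},{ra},{-value}"
    raise ValueError(f"no addi expansion known for {mnemonic!r}")
```

The li case works because a 0 in the rA position of addi means the constant 0 rather than the contents of GPR0.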

  24. PowerPC Instruction Set (cont’d) • Subtract instructions subf rD,rA,rB ; rD ← rB - rA • subf = subtract from • Like add, other forms are available subf., subfo, subfo. • Negate instruction neg rD,rA ; rD ← 0 - rA

  25. PowerPC Instruction Set (cont’d) • Multiply instructions • Two instructions to get upper and lower 32 bits of the 64-bit result mullw rD,rA,rB ; signed/unsigned multiply • Stores the lower-order 32 bits of the result • Use the following to get the upper 32 bits mulhw rD,rA,rB ; signed mulhwu rD,rA,rB ; unsigned • Immediate form mulli rD,rA,Simm16 • Stores only lower 32 bits of the 48-bit result S. Dandamudi
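
A sketch of the three register multiplies, modelling 32-bit registers as Python integers (the helper `to_signed32` is an assumption of this sketch, not a PowerPC instruction). It also shows why a single mullw serves both signed and unsigned operands: the low 32 bits of the product are the same either way, and only the upper half differs.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Interpret a 32-bit pattern as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mullw(ra, rb):
    """Low-order 32 bits of the product (same for signed and unsigned)."""
    return (ra * rb) & MASK32


def mulhw(ra, rb):
    """High-order 32 bits of the signed 64-bit product."""
    return ((to_signed32(ra) * to_signed32(rb)) >> 32) & MASK32


def mulhwu(ra, rb):
    """High-order 32 bits of the unsigned 64-bit product."""
    return (((ra & MASK32) * (rb & MASK32)) >> 32) & MASK32
```

With ra = 0xFFFFFFFF and rb = 2, the low word is 0xFFFFFFFE in both interpretations, while the high word is 0xFFFFFFFF when ra is read as -1 and 1 when it is read as 4294967295.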

  26. PowerPC Instruction Set (cont’d) • Divide instructions • Two divide instructions • Signed (divw) divw rD,rA,rB ; rD = rA/rB • Unsigned (divwu) • Both give only quotient • For quotient and remainder, use divw rD,rA,rB ; quotient in rD mullw rX,rD,rB subf rC,rX,rA ; remainder in rC S. Dandamudi
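
The divw / mullw / subf sequence for obtaining a remainder can be traced with the following sketch. It assumes 32-bit registers held as Python integers and a signed divide that truncates toward zero; division by zero is not modelled and simply raises a Python exception here.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def divw(ra, rb):
    """Signed quotient rA / rB, truncated toward zero."""
    a, b = to_signed32(ra), to_signed32(rb)
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q & MASK32


def mullw(ra, rb):
    return (ra * rb) & MASK32


def subf(ra, rb):
    """Subtract from: result = rB - rA."""
    return (rb - ra) & MASK32


def quotient_and_remainder(ra, rb):
    rd = divw(ra, rb)    # divw  rD,rA,rB   quotient in rD
    rx = mullw(rd, rb)   # mullw rX,rD,rB
    rc = subf(rx, ra)    # subf  rC,rX,rA   remainder in rC = rA - rX
    return to_signed32(rd), to_signed32(rc)
```

Because the quotient truncates toward zero, the remainder takes the sign of the dividend: dividing -17 by 5 gives quotient -3 and remainder -2.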

  27. PowerPC Instruction Set (cont’d) • Logical instructions and rD,rS,rB and. rD,rS,rB andi. rD,rS,Uimm16 andis. rD,rS,Uimm16 andc rD,rS,rB andc. rD,rS,rB • andis. = shift Uimm16 left by 16 bit positions before ANDing • andc = complement rB before ANDing • Dot versions update the LT, GT, EQ, SO bits of CR0 • Logical OR has the corresponding versions (or, or., ori, oris, orc, orc.) • Move register instruction is implemented using OR mr rA,rS is equivalent to or rA,rS,rS • NOP is implemented as ori 0,0,0
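
The difference between the immediate AND forms is easiest to see on a concrete value. In this sketch the function names are stand-ins for andi., andis., and andc (the CR0 update of the dot forms is not modelled); andis. places the 16-bit immediate in the upper halfword, that is, shifts it left by 16 bit positions.

```python
MASK32 = 0xFFFFFFFF


def andi(rs, uimm16):
    """AND with the immediate in the lower halfword."""
    return rs & (uimm16 & 0xFFFF)


def andis(rs, uimm16):
    """AND with the immediate shifted into the upper halfword."""
    return rs & ((uimm16 & 0xFFFF) << 16)


def andc(rs, rb):
    """AND with the complement of rB."""
    return rs & ~rb & MASK32
```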

  28. PowerPC Instruction Set (cont’d) • Other logical operations • NAND • nand • nand. • NOR • nor • nor. • XOR • xor, xor. • xori, xoris • Equivalence (exclusive-NOR) • eqv • eqv. S. Dandamudi

  29. PowerPC Instruction Set (cont’d) • Shift and Rotate instructions • Shift left slw rA,rS,rB ; shift left word • Shift left the word in rS by rB positions and store result in rA • Vacated bits are filled with zeros • Also have the dot version slw. • Shift right srw srw. (logical) sraw sraw. (arithmetic) • Rotate left instructions rlwnm rA,rS,rB,MB,ME rotlw rA,rS,rB ≡ rlwnm rA,rS,rB,0,31
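
A sketch of rotate-left-then-AND-with-mask, showing why rotlw is just rlwnm with a full mask. It assumes PowerPC bit numbering (bit 0 is the most significant bit), a mask of ones from bit MB through bit ME that wraps around when MB > ME, and a rotate count taken from the low five bits of rB.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit value left by n positions."""
    n &= 31
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def mask(mb, me):
    """Ones from bit mb through bit me (bit 0 = MSB); wraps if mb > me."""
    from_mb = MASK32 >> mb                  # ones in bits mb..31
    up_to_me = (MASK32 << (31 - me)) & MASK32  # ones in bits 0..me
    return (from_mb & up_to_me) if mb <= me else (from_mb | up_to_me)


def rlwnm(rs, rb, mb, me):
    """Rotate rS left by the low 5 bits of rB, then AND with the mask."""
    return rotl32(rs, rb & 31) & mask(mb, me)


def rotlw(rs, rb):
    """Simplified mnemonic: the mask covers all 32 bits."""
    return rlwnm(rs, rb, 0, 31)
```

With MB = 24 and ME = 31 the same instruction extracts a byte: rotating 0x12345678 left by 8 and keeping bits 24 to 31 yields 0x12.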

  30. PowerPC Instruction Set (cont’d) • Compare instructions • Two versions: • For signed and unsigned • Two formats • Register and immediate • Register compare cmp crfD,rA,rB • Updates LT (rA < rB), GT (rA > rB), EQ, SO bits in the crfD • If crfD is not specified, CR0 is used • Immediate version cmpi crfD,rA,Simm16

  31. PowerPC Instruction Set (cont’d) • Branch Instructions • Used for both branch (LK = 0) and procedure calls (LK = 1) • Can use absolute (AA = 1) or relative address (AA = 0) b target (AA=0, LK=0) Branch ba target (AA=1, LK=0) Branch Absolute bl target (AA=0, LK=1) Branch then link bla target (AA=1, LK=1) Branch Absolute then link • The last two are procedure calls • Three types of conditional branches • Direct address • Register indirect • CTR or LR S. Dandamudi

  32. PowerPC Instruction Set (cont’d) • Conditional branch instructions (direct address) bc BO,BI,target (AA=0, LK=0) Branch Conditional bca BO,BI,target (AA=1, LK=0) Branch Conditional Absolute bcl BO,BI,target (AA=0, LK=1) Branch Conditional then link bcla BO,BI,target (AA=1, LK=1) Branch Conditional Absolute then link • BO = branch options (5 bits): specifies the branch condition • BI = branch input (5 bits): specifies a bit in the CR

  33. PowerPC Instruction Set (cont’d) • Nine different branch conditions can be specified • Decrement CTR; branch if CTR ≠ 0 AND cond = false • Decrement CTR; branch if CTR = 0 AND cond = false • Decrement CTR; branch if CTR ≠ 0 AND cond = true • Decrement CTR; branch if CTR = 0 AND cond = true • Branch if cond = false • Branch if cond = true • Decrement CTR; branch if CTR ≠ 0 • Decrement CTR; branch if CTR = 0 • Branch always
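
The nine conditions fall out of four independent BO bits. The slides do not give the BO encoding, so the bit layout in this sketch is an assumption based on the commonly documented 32-bit PowerPC format and should be checked against the architecture manual: the most significant bit disables the condition test, the next gives the condition value to match, the next disables the CTR decrement, the next selects CTR = 0 rather than CTR ≠ 0, and the last is a prediction hint that is ignored here.

```python
def branch_taken(bo, ctr, cond_bit):
    """Return (taken, new_ctr) for a 5-bit BO, the CTR value, and the CR bit named by BI."""
    ignore_cond = bool(bo & 0b10000)   # assumed: 1 = do not test the condition
    cond_value = bool(bo & 0b01000)    # assumed: value the CR bit must have
    no_ctr = bool(bo & 0b00100)        # assumed: 1 = do not decrement or test CTR
    want_zero = bool(bo & 0b00010)     # assumed: 1 = branch if CTR = 0, else CTR != 0

    if not no_ctr:
        ctr = (ctr - 1) & 0xFFFFFFFF   # decrement happens before the test
    ctr_ok = no_ctr or ((ctr == 0) == want_zero)
    cond_ok = ignore_cond or (bool(cond_bit) == cond_value)
    return (ctr_ok and cond_ok), ctr
```

For instance, BO = 0b10000 gives the "decrement CTR; branch if CTR ≠ 0" loop-closing branch, and BO = 0b10100 gives "branch always".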

  34. PowerPC Instruction Set (cont’d) • LR-based branch instructions bclr BO,BI (LK=0) Branch Conditional to Link Register bclrl BO,BI (LK=1) Branch Conditional to Link Register then Link • Target address is taken from LR • Used to return from procedure calls • CTR-based branch instructions bcctr BO,BI (LK=0) bcctrl BO,BI (LK=1) • CTR instead of LR is used to get target S. Dandamudi

  35. Itanium • Intel’s 64-bit processor • RISC based • Based on EPIC design philosophy • Explicit Parallel Instruction Computing • Support for ILP • 3-instruction wide word • Speculative computation • Hides memory latency • Predication • Improves branch handling • Large number of registers • 128 integer and 128 FP • Aids in efficient procedure calls S. Dandamudi

  36. Itanium (cont’d) S. Dandamudi

  37. Itanium (cont’d) • Registers • 128 general-purpose registers (gr0 – gr127) • 64 bits wide • NaT (Not-a-Thing) bit • Used in speculative loading • Divided into static and stacked • Static • First 32 registers (gr0 – gr31) • gr0 is read-only (always provides zero) • Stacked • Remaining registers (gr32 – gr127) • Available for programs • Used as register stack frame

  38. Itanium (cont’d) • Registers • Branch registers • 8 in total (br0 – br7) • 64-bit wide • Specify target address for • Conditional branches • Procedure calls • Return • User mask register • Alignment, byte ordering, … • Other registers • Predicate register, Application registers, Current frame marker S. Dandamudi

  39. Itanium (cont’d) • Addressing modes • Load/store instructions can access memory • Specify three registers: r1, r2, r3 • r2 and r3 are used to compute effective address • r1 receives/supplies data • Register indirect addressing • Effective address = contents of r3 • Register indirect with immediate addressing • Effective address = contents of r3 + imm9 • r3 = Effective address • Register indirect with index addressing • Effective address = contents of r3 + contents of r2 • r3 = Effective address

  40. Itanium (cont’d) • Instruction Format [(qp)] mnemonic[.comp] dests = srcs • qp = qualifying predicate • Specifies a predicate register • 64 1-bit registers • Executed if the specified PR is 1 • Otherwise, instruction is treated as NOP • mnemonic • Identifies an instruction (e.g., compare) • comp • Gives more information to completely specify instruction • E.g., Type of comparison is equality S. Dandamudi

  41. Itanium (cont’d) S. Dandamudi

  42. Itanium (cont’d) S. Dandamudi

  43. Itanium (cont’d) • Examples • Simple instruction add r1 = r2,r3 • Predicated instruction (p4) add r1 = r2,r3 • Three-operand add add r1 = r2,r3,1 • Compare instructions cmp.eq p3 = r2,r4 cmp.gt p2,p3 = r3,r4 • Branch instruction br.cloop.sptk loop_back
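
How a qualifying predicate gates an instruction can be sketched with a toy model. It assumes a list of 64 one-bit predicate registers and a dictionary of general registers; the two-target compare is modelled as writing the comparison result to the first predicate and its complement to the second. All function names and register values are hypothetical.

```python
MASK64 = (1 << 64) - 1


def cmp_gt(preds, p_true, p_false, a, b):
    """Model of cmp.gt p_true,p_false = a,b: write the result and its complement."""
    result = a > b
    preds[p_true] = result
    preds[p_false] = not result


def predicated_add(regs, preds, qp, r1, r2, r3):
    """Model of (qp) add r1 = r2,r3: treated as a NOP unless predicate qp is 1."""
    if preds[qp]:
        regs[r1] = (regs[r2] + regs[r3]) & MASK64


def demo():
    regs = {"r1": 0, "r2": 10, "r3": 7, "r4": 3}
    preds = [False] * 64
    cmp_gt(preds, 2, 3, regs["r3"], regs["r4"])        # p2 = (7 > 3), p3 = not p2
    predicated_add(regs, preds, 3, "r1", "r2", "r3")   # p3 is 0: acts as a NOP
    after_skipped = regs["r1"]
    predicated_add(regs, preds, 2, "r1", "r2", "r3")   # p2 is 1: executes
    return after_skipped, regs["r1"]
```

Both arms of an if/else can be issued this way under complementary predicates, with no branch instruction in between.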

  44. Instruction-level Parallelism • Itanium provides • Runtime support for explicit parallelism • Compiler/assembler can indicate parallelism • Instruction groups • Large number of registers • Instruction groups • Set of instructions that do not have conflicting dependencies • Can be executed in parallel • Compiler/assembler can indicate this by ;; notation S. Dandamudi

  45. Instruction-level Parallelism • Example: Logical expression with four terms if (r10 || r11 || r12 || r13) { /* if-block code */ } can be done using or-tree evaluation or r1 = r10,r11 /* Group 1 */ or r2 = r12,r13 ;; or r3 = r1,r2 /* Group 2 */ Other instructions /* Group 3 */ • Processor can execute as many instructions from group as it can • Depends on the available resources S. Dandamudi
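
The or-tree idea generalizes to any number of terms: pair the operands at each level so that every OR within a level is independent of the others. In this sketch each level corresponds to one instruction group; the function name and the returned group count are illustrative, not part of the Itanium toolchain.

```python
def or_tree(values):
    """OR the values pairwise, level by level.

    Returns (result, groups), where groups is the number of dependent
    levels, i.e. how many instruction groups the evaluation needs.
    """
    level = list(values)
    if not level:
        raise ValueError("need at least one term")
    groups = 0
    while len(level) > 1:
        # All ORs in this level are independent: one instruction group.
        nxt = [level[i] | level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])   # odd term carries over to the next level
        level = nxt
        groups += 1
    return level[0], groups
```

For the four-term expression on the slide this yields two groups (two ORs, then one), whereas a left-to-right chain would need three dependent ORs.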

  46. Itanium Instruction Bundle • Each instruction is encoded using 41 bits • Three instructions are bundled together • 128-bit Instruction bundle • No conflicting dependencies among the three instructions • Aids in instruction–level parallelism • 5-bit template • Specifies mapping of instruction slots to execution instruction types • Six instruction types • Integer ALU, non-ALU integer, memory, branch, FP, extended S. Dandamudi
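
The bundle arithmetic is 5 + 3 × 41 = 128 bits. The sketch below packs and unpacks such a bundle as a Python integer; placing the template in the five low-order bits, followed by slots 0, 1, and 2, is an assumption of this example that the slide does not state.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Combine a 5-bit template and three 41-bit instruction slots into 128 bits."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instructions")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("each instruction must fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + SLOT_BITS * i)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [(bundle >> (TEMPLATE_BITS + SLOT_BITS * i)) & SLOT_MASK for i in range(3)]
    return template, slots
```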

  47. Itanium Instructions • Data transfer instructions • Load and store instructions are more complicated than a typical RISC processor • Load instructions (qp) ldSZ.ldtype.ldhint r1=[r3] (qp) ldSZ.ldtype.ldhint r1=[r3],r2 (qp) ldSZ.ldtype.ldhint r1=[r3],imm9 • Loads SZ bytes from memory • SZ can be 1, 2, 4, or 8 to load 1, 2, 4, or 8 bytes • Example: ld8 r5 = [r6] • ldtype: special load operations (advanced, speculative) • ldhint: locality of memory access

  48. Itanium Instructions (cont’d) • ldtype • This completer can be used to specify special load operations • Advanced ld8.a r5 = [r6] • Speculative ld8.s r5 = [r6] • ldhint • Locality of memory access • None: temporal locality, level 1 • nt1: no temporal locality, level 1 • nta: no temporal locality, all levels

  49. Itanium Instructions (cont’d) • Store instructions • Simpler than load instructions (qp) stSZ.sttype.sthint [r3] = r1 (qp) stSZ.sttype.sthint [r3] = r1,imm9 • Move instructions (qp) mov r1 = r3 (qp) mov r1 = imm22 (qp) mov r1 = imm64 • First two are pseudo-instructions • Implemented using other processor instructions

  50. Itanium Instructions (cont’d) • Arithmetic instructions • Simpler than load instructions (qp) add r1 = r2,r3 (qp) add r1 = r2,r3,1 (qp) add r1 = imm,r4 • imm can be imm14 or imm22 • Move instruction (qp) mov r1 = r3 implemented as (qp) add r1 = 0,r3 • Move instruction (qp) mov r1 = imm22 implemented as (qp) add r1 = imm22,r0
