Chapter 2
CprE 381 Computer Organization and Assembly Level Programming, Fall 2013. Chapter 2. Instructions: Language of the Computer. Zhao Zhang Iowa State University Revised from original slides provided by MKP. Review of Week 4. MIPS procedure/function call convention Leaf and non-leaf examples

CprE 381 Computer Organization and Assembly Level Programming, Fall 2013

Chapter 2

Instructions: Language of the Computer

Zhao Zhang

Iowa State University

Revised from original slides provided by MKP


Review of Week 4

  • MIPS procedure/function call convention

  • Leaf and non-leaf examples

  • Clearing array example

  • String copy example

  • Other issues:

    • Load 32-bit immediate

    • Assembler, loader, and compiler effects

§2.8 Supporting Procedures in Computer Hardware

Chapter 2 — Instructions: Language of the Computer — 2


Announcements

  • Exam 1 on Friday Oct. 4

  • Course review on Wednesday Oct. 2

  • HW4 is due on Sep. 27

  • HW5 will be due on Oct. 11

    • Do HW5 as exercise before Exam 1

    • No HW and quizzes next week

  • Lab 2 demo is due this week and Lab 3 demo due next week

  • Lab 4 starts next week, due in one week


Exam 1

  • Open book and open notes; calculators are allowed

  • E-book reader is allowed

    • Must be put in airplane mode

  • Coverage

    • Chapter 1, Computer Abstractions and Technology

    • Chapter 2, Instructions: Language of the Computer

    • Some contents from Appendix B

    • MIPS floating-point instructions


Exam Question Types

  • Short conceptual questions

  • Calculation: speedup, power saving, CPI, etc.

  • MIPS assembly programming

    • Translate C statements to MIPS (arithmetic, load/store, branch and jump, others)

    • Translate C functions to MIPS (call convention)

  • Among others

  • Suggestions

    • Review slides and textbook

    • Review homework and quizzes


Overview for Week 5, Sep. 23 - 27

  • Bubble sorting example

    • It will be used in Mini-Projects

  • Floating point instructions

  • ARM and x86 instruction set overview


Classic Bubble Sorting

  • Bubble sort: swap two adjacent elements if they are out of order

  • Pass over the array n times; on each pass the largest remaining element floats to the top

  • Look at the first pass over five elements

    1st try: 5 3 8 2 7 => 3 5 8 2 7

    2nd try: 3 5 8 2 7 => 3 5 8 2 7

    3rd try: 3 5 8 2 7 => 3 5 2 8 7

    4th try: 3 5 2 8 7 => 3 5 2 7 8


Classic Bubble Sorting

  • Pass i only has to check for (n-i) swaps

    • In each pass, an element may float up until it meets a larger element

    • The sorted sub-array increments by one

      1st pass: 5 3 8 2 7 => 3 5 2 7 8

      2nd pass: 3 5 2 7 8 => 3 2 5 7 8

      3rd pass: 3 2 5 7 8 => 2 3 5 7 8

      4th pass: 2 3 5 7 8 => 2 3 5 7 8
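The classic algorithm above can be sketched in C as follows. This is a minimal illustration, not code from the slides: the names classic_bubble_sort and swap_adjacent are made up here, and the inner bound n - pass reflects the point that pass i only needs (n-i) comparisons.

```c
#include <stddef.h>

/* Swap v[k] and v[k+1] (hypothetical helper for this sketch). */
static void swap_adjacent(int v[], size_t k)
{
    int temp = v[k];
    v[k] = v[k + 1];
    v[k + 1] = temp;
}

/* Classic bubble sort: pass number `pass` (counted from 1) makes only
 * n - pass comparisons, because the last pass - 1 slots already hold
 * their final values. */
void classic_bubble_sort(int v[], size_t n)
{
    for (size_t pass = 1; pass < n; pass++) {
        for (size_t k = 0; k < n - pass; k++) {
            if (v[k] > v[k + 1])
                swap_adjacent(v, k);
        }
    }
}
```

Calling classic_bubble_sort on the five-element array 5 3 8 2 7 goes through the same intermediate states as the passes listed above.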


Revised Bubble Sorting

  • The textbook bubble-sort is optimized to reduce comparisons

    void sort (int v[], int n)
    {
        int i, j;
        for (i = 0; i < n; i++) {
            for (j = i - 1; j >= 0 && v[j] > v[j+1]; j--)
                swap(v, j);
        }
    }


Revised Bubble Sorting

  • The classic version lets the largest element float to the top of the unsorted sub-array

  • The revised version lets an element move to its right place in the sorted sub-array

    1st pass: 5 3 8 2 7 => 3 5 8 2 7

    2nd pass: 3 5 8 2 7 => 3 5 8 2 7

    3rd pass: 3 5 8 2 7 => 2 3 5 8 7

    4th pass: 2 3 5 8 7 => 2 3 5 7 8
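To check the trace above by hand, one outer-loop iteration of the revised sort can be pulled out into its own function. This is only a sketch: sort_pass is a hypothetical name introduced here, while swap and sort follow the C code shown in the slides.

```c
/* Swap v[k] and v[k+1], as in the slides' swap(v, k). */
static void swap(int v[], int k)
{
    int temp = v[k];
    v[k] = v[k + 1];
    v[k + 1] = temp;
}

/* One outer-loop iteration: move v[i] left until the element
 * before it is no longer larger. */
void sort_pass(int v[], int i)
{
    for (int j = i - 1; j >= 0 && v[j] > v[j + 1]; j--)
        swap(v, j);
}

/* The full sort is just sort_pass applied for i = 0 .. n-1. */
void sort(int v[], int n)
{
    for (int i = 0; i < n; i++)
        sort_pass(v, i);
}
```

Starting from 5 3 8 2 7, calling sort_pass with i = 1, 2, 3, 4 in turn yields the four states listed as the 1st to 4th pass.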


The Swap Function

  • The swap function is a leaf function

    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

    • v in $a0, k in $a1, temp in $t0

§2.13 A C Sort Example to Put It All Together


The Swap Function

swap: sll $t1, $a1, 2 # $t1 = k * 4

add $t1, $a0, $t1 # $t1 = v+(k*4)

# (address of v[k])

lw $t0, 0($t1) # $t0 (temp) = v[k]

lw $t2, 4($t1) # $t2 = v[k+1]

sw $t2, 0($t1) # v[k] = $t2 (v[k+1])

sw $t0, 4($t1) # v[k+1] = $t0 (temp)

jr $ra # return to calling routine


The Sort Function

for (i = 0; i < n; i++) {
    for (j = i - 1; j >= 0 && v[j] > v[j+1]; j--)
        swap(v, j);
}

  • Save $ra to stack, as it’s a non-leaf function

  • Assign i and j to $s0 and $s1

    • They must be preserved when calling swap()

  • Move v, n from $a0 and $a1 to $s2 and $s3

    • They must be preserved, too

    • $a0 and $a1 are used when calling swap()

  • We need a stack frame of 5 words or 20 bytes


Sort Prologue and Epilogue

sort: addi $sp,$sp, -20 # make room on stack for 5 registers

sw $ra, 16($sp) # save $ra on stack

sw $s3,12($sp) # save $s3 on stack

sw $s2, 8($sp) # save $s2 on stack

sw $s1, 4($sp) # save $s1 on stack

sw $s0, 0($sp) # save $s0 on stack

… # procedure body

exit1: lw $s0, 0($sp) # restore $s0 from stack

lw $s1, 4($sp) # restore $s1 from stack

lw $s2, 8($sp) # restore $s2 from stack

lw $s3,12($sp) # restore $s3 from stack

lw $ra,16($sp) # restore $ra from stack

addi $sp,$sp, 20 # restore stack pointer

jr $ra # return to calling routine

  • Entry: Get a frame, save $ra and $s3-$s0

  • Exit: Restore $s0-$s3 and $ra, free the frame


Sort Function Body

A new pseudo instruction

move rd, rs

is equivalent to

add rd, rs, $zero

Example

move $s2, $a0 # $s2 = $a0

move $s3, $a1 # $s3 = $a1

No use of pseudo assembly instructions in Exam 1
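To make the expansion concrete, here is a minimal C sketch that builds the 32-bit machine word for add rd, rs, $zero using the standard MIPS R-format layout (opcode 0, funct 0x20). The names encode_add, encode_move and the REG_ constants are made up for this illustration, and a real assembler may pick a different expansion for move, such as addu.

```c
#include <stdint.h>

/* Register numbers from the standard MIPS register numbering. */
enum { REG_ZERO = 0, REG_A0 = 4, REG_A1 = 5, REG_S2 = 18, REG_S3 = 19 };

/* Encode "add rd, rs, rt" as an R-format word:
 * opcode (6) | rs (5) | rt (5) | rd (5) | shamt (5) | funct (6),
 * with opcode = 0, shamt = 0, funct = 0x20. */
uint32_t encode_add(unsigned rd, unsigned rs, unsigned rt)
{
    return ((uint32_t)rs << 21) | ((uint32_t)rt << 16) |
           ((uint32_t)rd << 11) | 0x20u;
}

/* "move rd, rs" is treated as "add rd, rs, $zero", as on the slide. */
uint32_t encode_move(unsigned rd, unsigned rs)
{
    return encode_add(rd, rs, REG_ZERO);
}
```

Under these assumptions, encode_move(REG_S2, REG_A0) gives the same word as add $s2, $a0, $zero, which is why the pseudo-instruction needs no hardware support of its own.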


Sort Function Body

         # Move params
         move $s2, $a0            # save $a0 into $s2
         move $s3, $a1            # save $a1 into $s3
         # Outer loop
         move $s0, $zero          # i = 0
for1tst: slt  $t0, $s0, $s3       # $t0 = 0 if $s0 ≥ $s3 (i ≥ n)
         beq  $t0, $zero, exit1   # go to exit1 if $s0 ≥ $s3 (i ≥ n)
         # Inner loop
         addi $s1, $s0, -1        # j = i - 1
for2tst: slti $t0, $s1, 0         # $t0 = 1 if $s1 < 0 (j < 0)
         bne  $t0, $zero, exit2   # go to exit2 if $s1 < 0 (j < 0)
         sll  $t1, $s1, 2         # $t1 = j * 4
         add  $t2, $s2, $t1       # $t2 = v + (j * 4)
         lw   $t3, 0($t2)         # $t3 = v[j]
         lw   $t4, 4($t2)         # $t4 = v[j + 1]
         slt  $t0, $t4, $t3       # $t0 = 0 if $t4 ≥ $t3
         beq  $t0, $zero, exit2   # go to exit2 if $t4 ≥ $t3
         # Pass params & call
         move $a0, $s2            # 1st param of swap is v (old $a0)
         move $a1, $s1            # 2nd param of swap is j
         jal  swap                # call swap procedure
         # Inner loop
         addi $s1, $s1, -1        # j -= 1
         j    for2tst             # jump to test of inner loop
         # Outer loop
exit2:   addi $s0, $s0, 1         # i += 1
         j    for1tst             # jump to test of outer loop


Sort Function Optimized

Old version:

void sort(int v[], int n)
{
    int i, j;
    for (i = 0; i < n; i++) {
        for (j = i - 1; j >= 0 && v[j] > v[j+1]; j--)
            swap(v, j);
    }
}

New version:

void sort(int v[], int n)
{
    int *pi, *pj;
    for (pi = v; pi < &v[n]; pi++)
        for (pj = pi - 1; pj >= v && swap(pj); pj--)
            {}
}


New Swap Function

  • A more efficient swap function that reduces memory loads

    // swap two adjacent elements if they are

    // out of order. Return 1 if swapped, 0

    // otherwise

    int swap(int *p)

    {

    if (p[0] > p[1]) {

    int tmp = p[0];

    p[0] = p[1];

    p[1] = tmp;

    return 1;

    }

    else

    return 0;

    }


New Swap Function

  • A new swap function

    swap:

    lw $t0, 0($a0) # load p[0]

    lw $t1, 4($a0) # load p[1]

    slt $t2, $t1, $t0 # p[1] < p[0]?

    beq $t2, $zero, else # no: return 0

    sw $t1, 0($a0) # swap

    sw $t0, 4($a0) # swap

    addi $v0, $zero, 1 # $v0 = 1

    jr $ra

    else:

    addi $v0, $zero, 0 # $v0 = 0

    jr $ra


New Sort Function

The sort() function optimized

  • Register usage

    • $s0: v

    • $s1: &v[n]

    • $s2: pi

    • $s3: pj

  • Need a frame of 5 words to save $ra and $s0-$s3


Sort Prologue and Epilogue

sort:
    addi $sp, $sp, -20 # frame of 5 words
    sw $ra, 16($sp)
    sw $s3, 12($sp)
    sw $s2, 8($sp)
    sw $s1, 4($sp)
    sw $s0, 0($sp)

    … # MIPS code for sort function body

    lw $s0, 0($sp)
    lw $s1, 4($sp)
    lw $s2, 8($sp)
    lw $s3, 12($sp)
    lw $ra, 16($sp)
    addi $sp, $sp, 20 # release frame
    jr $ra


New Sort: Outer Loop

for (pi = v; pi < &v[n]; pi++)
    for (pj = pi - 1; pj >= v && swap(pj); pj--)   // C code for the inner loop
        {}

    add $s0, $a0, $zero # $s0 = v
    sll $a1, $a1, 2 # $a1 = 4*n
    add $s1, $s0, $a1 # $s1 = &v[n]
    add $s2, $s0, $zero # pi = v
    j for1_tst
for1_loop:
    … # MIPS code for the inner loop
    addi $s2, $s2, 4 # pi++
for1_tst:
    slt $t0, $s2, $s1 # pi < &v[n]?
    bne $t0, $zero, for1_loop # yes? repeat


New Sort: Inner Loop

for (pj = pi - 1; pj >= v && swap(pj); pj--)

{}

addi $s3, $s2, -4 # pj = pi-1

j for2_tst

for2_loop:

addi $s3, $s3, -4 # pj--

for2_tst:

slt $t0, $s3, $s0 # pj < v?

bne $t0, $zero,for2_exit # yes? exit

add $a0, $s3, $zero # $a0 = pj

jal swap # swap(pj)

bne $v0, $zero,for2_loop # ret 1? cont

for2_exit:
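For reference, the pointer-based sort and the compare-and-swap helper can be tried together in plain C. This is a sketch rather than the slides' exact code: the loops are shifted by one element (pi starts at v + 1 and the inner loop calls swap(pj - 1)) so that the C version never forms a pointer before the start of the array, which ISO C leaves undefined. The MIPS version has no such restriction because it works on raw addresses.

```c
/* Swap two adjacent elements if they are out of order.
 * Returns 1 if swapped, 0 otherwise (same contract as the slides). */
int swap(int *p)
{
    if (p[0] > p[1]) {
        int tmp = p[0];
        p[0] = p[1];
        p[1] = tmp;
        return 1;
    }
    return 0;
}

/* Pointer-based sort. For each pi, keep moving the element left
 * while swap() reports that a swap took place. */
void sort(int v[], int n)
{
    if (n < 2)
        return;                 /* nothing to do, and keeps v + 1 valid */
    for (int *pi = v + 1; pi < v + n; pi++)
        for (int *pj = pi; pj > v && swap(pj - 1); pj--)
            ;                   /* all work happens in the loop condition */
}
```

Because swap() both compares and exchanges, the loop body is empty; the return value alone decides whether the inner loop continues.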


Lab Mini-Projects

  • You will use the sorting code to test your CPU design in the lab mini-projects

  • Use the new sorting code

    • The new code is more optimized

    • It will simplify the debugging


FP Instructions in MIPS

Reading: Textbook Ch. 3.5 and pages B-71 to B-80

  • FP hardware is coprocessor 1

    • Adjunct processor that extends the ISA

  • Separate FP registers

    • 32 single-precision: $f0, $f1, … $f31

    • Paired for double-precision: $f0/$f1, $f2/$f3, …

      • Release 2 of MIPS ISA supports 32 × 64-bit FP registers

Chapter 3 — Arithmetic for Computers — 25


FP Instructions in MIPS

  • FP instructions operate only on FP registers

    • Programs generally don’t do integer ops on FP data, or vice versa

    • More registers with minimal code-size impact


FP Instructions in MIPS

  • FP load and store instructions

    • lwc1, ldc1, swc1, sdc1

      • e.g., ldc1 $f8, 32($sp)

    • lwc1, swc1: Load/store single-precision

    • ldc1, sdc1: Load/store double-precision


FP Instructions in MIPS

  • Single-precision arithmetic

    • add.s, sub.s, mul.s, div.s

      • e.g., add.s $f0, $f1, $f6

  • Double-precision arithmetic

    • add.d, sub.d, mul.d, div.d

      • e.g., mul.d $f4, $f4, $f6


FP Instructions in MIPS

  • Single- and double-precision comparison

    • c.xx.s, c.xx.d (xx is eq, lt, le, …)

    • Sets or clears FP condition-code bit

      • e.g. c.lt.s $f3, $f4

  • Branch on FP condition code true or false

    • bc1t, bc1f

      • e.g., bc1t TargetLabel


MIPS Call Convention: FP

  • The first two FP parameters in registers

    • 1st parameter in $f12 or $f12:$f13

      • A double-precision parameter takes two registers

    • 2nd FP parameter in $f14 or $f14:$f15

    • Extra parameters on the stack

  • $f0 stores single-precision FP return value

  • $f0:$f1 stores double-precision FP return value

  • $f0-$f19 are FP temporary registers

  • $f20-$f31 are FP saved temporary registers


FP Example: °F to °C

  • C code:

    float f2c (float fahr)

    { return ((5.0/9.0) * (fahr - 32.0));}

  • fahr in $f12, result in $f0

  • Assume literals in global memory space, e.g. const5 for 5.0 and const9 for 9.0

    • Can FP immediate be encoded in MIPS instructions?


FP Example: °F to °C

  • Compiled MIPS code:

    f2c: lwc1  $f16, const5($gp)
         lwc1  $f18, const9($gp)
         div.s $f16, $f16, $f18
         lwc1  $f18, const32($gp)
         sub.s $f18, $f12, $f18
         mul.s $f0, $f16, $f18
         jr    $ra
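One detail worth noting: the MIPS listing works in single precision throughout (div.s, sub.s, mul.s), whereas the C expression (5.0/9.0) uses double literals, which C evaluates in double precision before converting the result to float. A minimal sketch of a C version that matches the single-precision listing more closely, assuming float literals are acceptable here:

```c
/* Single-precision variant of f2c: every literal is a float, so each
 * operation can map onto a .s instruction as in the MIPS listing. */
float f2c(float fahr)
{
    return (5.0f / 9.0f) * (fahr - 32.0f);
}
```

With either version, f2c(32.0f) is exactly 0 and f2c(212.0f) is 100 up to rounding error, so the difference only shows up in the last bits of the result.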


FP Example: Function Call

extern float fahr, cel;

cel = f2c(fahr);

Assume fahr is at 100($gp), cel is at 104($gp)

lwc1 $f12, 100($gp) # load 1st parameter

jal f2c

swc1 $f0, 104($gp) # save return value


FP Example: Max

double max(double x, double y)

{

return (x > y) ? x : y;

}

max:

c.lt.d $f14, $f12 # y < x?

bc1f else # if false, do else

mov.d $f0, $f12 # $f0:$f1 = x

jr $ra

else:

mov.d $f0, $f14 # $f0:$f1 = y

jr $ra


FP Example: Max

  • How to call max?

    • Assume a, b, c at 100($gp), 108($gp), and 116($gp)

      extern double a, b, c;

      c = max(a, b);

      ldc1 $f12, 100($gp) # $f12:$f13 = a

      ldc1 $f14, 108($gp) # $f14:$f15 = b

      jal max

      sdc1 $f0, 116($gp) # c = $f0:$f1


FP Example: Search Value

int search(double X[], int size, double value)

{

for (int i = 0; i < size; i++)

if (X[i] == value)

return 1;

return 0;

}

Note 1: There are integer and FP parameters, and the return value is integer

Note 2: A real program may search for a value within a range, e.g. [value - delta, value + delta]
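Note 2 can be sketched as follows. The function name search_within and the delta parameter are assumptions made for this illustration; the slides only give the exact-match version.

```c
/* Return 1 if some X[i] lies in [value - delta, value + delta],
 * 0 otherwise. Exact equality on floating-point data is fragile,
 * so a tolerance is usually what a real program wants. */
int search_within(const double X[], int size, double value, double delta)
{
    for (int i = 0; i < size; i++)
        if (X[i] >= value - delta && X[i] <= value + delta)
            return 1;
    return 0;
}
```

For example, with X = {1.0, 2.5, 4.0}, searching for 2.4 with delta 0.2 succeeds, while searching for 3.0 with the same delta does not.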


FP Example: Search Value

search:

add $t0, $zero, $zero # i = 0

j for_cond

for_loop:

sll $t1, $t0, 3 # $t1 = 8*i

add $t1, $a0, $t1 # $t1 = &X[i]

ldc1 $f2, 0($t1) # $f2:$f3 = X[i] (double)

c.eq.d $f2, $f12 # X[i] == value?

bc1f endif # if false, skip

addi $v0, $zero, 1 # $v0 = 1

jr $ra # return

endif:

addi $t0, $t0, 1 # i++

for_cond:

slt $t1, $t0, $a1 # i < size?

bne $t1, $zero, for_loop # repeat if true

add $v0, $zero, $zero # to return 0

jr $ra


FP Example: Array Multiplication

  • X = X + Y × Z

    • All 32 × 32 matrices, 64-bit double-precision elements

  • C code:

    void mm (double x[][32], double y[][32], double z[][32])
    {
        int i, j, k;
        for (i = 0; i != 32; i = i + 1)
            for (j = 0; j != 32; j = j + 1)
                for (k = 0; k != 32; k = k + 1)
                    x[i][j] = x[i][j] + y[i][k] * z[k][j];
    }

    • Addresses of x, y, z in $a0, $a1, $a2, and i, j, k in $s0, $s1, $s2


FP Example: Array Multiplication

  • MIPS code:

        li   $t1, 32         # $t1 = 32 (row size/loop end)
        li   $s0, 0          # i = 0; initialize 1st for loop
    L1: li   $s1, 0          # j = 0; restart 2nd for loop
    L2: li   $s2, 0          # k = 0; restart 3rd for loop
        sll  $t2, $s0, 5     # $t2 = i * 32 (size of row of x)
        addu $t2, $t2, $s1   # $t2 = i * size(row) + j
        sll  $t2, $t2, 3     # $t2 = byte offset of [i][j]
        addu $t2, $a0, $t2   # $t2 = byte address of x[i][j]
        l.d  $f4, 0($t2)     # $f4 = 8 bytes of x[i][j]
    L3: sll  $t0, $s2, 5     # $t0 = k * 32 (size of row of z)
        addu $t0, $t0, $s1   # $t0 = k * size(row) + j
        sll  $t0, $t0, 3     # $t0 = byte offset of [k][j]
        addu $t0, $a2, $t0   # $t0 = byte address of z[k][j]
        l.d  $f16, 0($t0)    # $f16 = 8 bytes of z[k][j]
        …


FP Example: Array Multiplication

        …
        sll   $t0, $s0, 5      # $t0 = i*32 (size of row of y)
        addu  $t0, $t0, $s2    # $t0 = i*size(row) + k
        sll   $t0, $t0, 3      # $t0 = byte offset of [i][k]
        addu  $t0, $a1, $t0    # $t0 = byte address of y[i][k]
        l.d   $f18, 0($t0)     # $f18 = 8 bytes of y[i][k]
        mul.d $f16, $f18, $f16 # $f16 = y[i][k] * z[k][j]
        add.d $f4, $f4, $f16   # $f4 = x[i][j] + y[i][k]*z[k][j]
        addiu $s2, $s2, 1      # k = k + 1
        bne   $s2, $t1, L3     # if (k != 32) go to L3
        s.d   $f4, 0($t2)      # x[i][j] = $f4
        addiu $s1, $s1, 1      # j = j + 1
        bne   $s1, $t1, L2     # if (j != 32) go to L2
        addiu $s0, $s0, 1      # i = i + 1
        bne   $s0, $t1, L1     # if (i != 32) go to L1
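The address arithmetic that recurs in this listing (shift left by 5, add the column, shift left by 3) is the row-major offset (row × 32 + col) × 8. A small C sketch of that calculation, where element_offset is a name made up for this illustration and 8-byte doubles are assumed:

```c
#include <stddef.h>

/* Byte offset of element [row][col] in a 32 x 32 matrix of 8-byte
 * doubles stored in row-major order. The two shifts mirror the
 * "sll ..., 5" and "sll ..., 3" instructions in the MIPS code. */
size_t element_offset(size_t row, size_t col)
{
    size_t index = (row << 5) + col;   /* row * 32 + col   */
    return index << 3;                 /* index * 8 bytes  */
}
```

Adding this offset to the base address of the matrix gives the same address the MIPS code computes into $t2 or $t0 before each l.d.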


ARM & MIPS Similarities

  • ARM: the most popular embedded core

  • Similar basic set of instructions to MIPS

§2.16 Real Stuff: ARM Instructions


Compare and Branch in ARM

  • Uses condition codes for result of an arithmetic/logical instruction

    • Negative, zero, carry, overflow

    • Compare instructions to set condition codes without keeping the result

  • Each instruction can be conditional

    • Top 4 bits of instruction word: condition value

    • Can avoid branches over single instructions
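As a rough model of what a compare instruction does, the sketch below computes the four condition codes for a 32-bit subtraction a - b without keeping the result. The struct and function names are hypothetical; the carry flag follows the ARM convention that C is set when the subtraction needs no borrow.

```c
#include <stdint.h>

/* The four condition codes: negative, zero, carry, overflow. */
struct flags {
    int n, z, c, v;
};

/* Model of a compare: compute a - b, set the flags, discard the result. */
struct flags compare(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;                      /* wraps modulo 2^32 */
    struct flags f;
    f.n = (int)((r >> 31) & 1u);             /* sign bit of the result */
    f.z = (r == 0);
    f.c = (a >= b);                          /* no borrow needed */
    f.v = (int)((((a ^ b) & (a ^ r)) >> 31) & 1u); /* signed overflow */
    return f;
}

/* Signed "less than" is true when N and V differ. */
int signed_less_than(struct flags f)
{
    return f.n != f.v;
}
```

A conditional instruction then only has to test a combination of these bits, which is how a short if-body can execute without a branch around it.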


Instruction Encoding


The Intel x86 ISA

  • Evolution with backward compatibility

    • 8080 (1974): 8-bit microprocessor

      • Accumulator, plus 3 index-register pairs

    • 8086 (1978): 16-bit extension to 8080

      • Complex instruction set (CISC)

    • 8087 (1980): floating-point coprocessor

      • Adds FP instructions and register stack

    • 80286 (1982): 24-bit addresses, MMU

      • Segmented memory mapping and protection

    • 80386 (1985): 32-bit extension (now IA-32)

      • Additional addressing modes and operations

      • Paged memory mapping as well as segments

§2.17 Real Stuff: x86 Instructions


The Intel x86 ISA

  • Further evolution…

    • i486 (1989): pipelined, on-chip caches and FPU

      • Compatible competitors: AMD, Cyrix, …

    • Pentium (1993): superscalar, 64-bit datapath

      • Later versions added MMX (Multi-Media eXtension) instructions

      • The infamous FDIV bug

    • Pentium Pro (1995), Pentium II (1997)

      • New microarchitecture (see Colwell, The Pentium Chronicles)

    • Pentium III (1999)

      • Added SSE (Streaming SIMD Extensions) and associated registers

    • Pentium 4 (2001)

      • New microarchitecture

      • Added SSE2 instructions


The Intel x86 ISA

  • And further…

    • AMD64 (2003): extended architecture to 64 bits

    • EM64T – Extended Memory 64 Technology (2004)

      • AMD64 adopted by Intel (with refinements)

      • Added SSE3 instructions

    • Intel Core (2006)

      • Added SSE4 instructions, virtual machine support

    • AMD64 (announced 2007): SSE5 instructions

      • Intel declined to follow, instead…

    • Advanced Vector Extension (announced 2008)

      • Longer SSE registers, more instructions

  • If Intel didn’t extend with compatibility, its competitors would!

    • Technical elegance ≠ market success


Basic x86 Registers


Basic x86 Addressing Modes

  • Two operands per instruction

  • Memory addressing modes

    • Address in register

    • Address = Rbase + displacement

    • Address = Rbase + 2^scale × Rindex (scale = 0, 1, 2, or 3)

    • Address = Rbase + 2^scale × Rindex + displacement
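The most general of these modes can be written out as a small C function. This is only a sketch of the arithmetic, with a hypothetical function name and 32-bit addresses assumed; the simpler modes fall out by passing zero for the parts that are not used.

```c
#include <stdint.h>

/* Effective address = base + 2^scale * index + displacement,
 * computed modulo 2^32. scale must be 0, 1, 2, or 3. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           unsigned scale, int32_t displacement)
{
    return base + (index << scale) + (uint32_t)displacement;
}
```

With scale = 2 the index counts 4-byte words, so a single instruction can address element i of an int array without a separate shift, unlike the sll/add pair needed in MIPS.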


x86 Instruction Encoding

  • Variable length encoding

    • Postfix bytes specify addressing mode

    • Prefix bytes modify operation

      • Operand length, repetition, locking, …


Implementing IA-32

  • Complex instruction set makes implementation difficult

    • Hardware translates instructions to simpler microoperations

      • Simple instructions: 1–1

      • Complex instructions: 1–many

    • Microengine similar to RISC

    • Market share makes this economically viable

  • Comparable performance to RISC

    • Compilers avoid complex instructions


Fallacies

  • Powerful instruction ⇒ higher performance

    • Fewer instructions required

    • But complex instructions are hard to implement

      • May slow down all instructions, including simple ones

    • Compilers are good at making fast code from simple instructions

  • Use assembly code for high performance

    • But modern compilers are better at dealing with modern processors

    • More lines of code ⇒ more errors and less productivity

§2.18 Fallacies and Pitfalls


Fallacies

  • Backward compatibility ⇒ instruction set doesn’t change

    • But they do accrete more instructions

[Figure: x86 instruction set]


Pitfalls

  • Sequential words are not at sequential addresses

    • Increment by 4, not by 1!

  • Keeping a pointer to an automatic variable after procedure returns

    • e.g., passing pointer back via an argument

    • Pointer becomes invalid when stack popped
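Both pitfalls can be seen from C. The sketch below uses hypothetical function names: word_stride measures the byte distance between consecutive ints, and fill_answer shows one safe alternative to returning a pointer to an automatic variable, with the unsafe pattern kept in a comment only.

```c
#include <stddef.h>

/* Pitfall 1: consecutive words are sizeof(int) bytes apart
 * (4 on MIPS32), not 1. */
size_t word_stride(void)
{
    int words[2] = {0, 0};
    return (size_t)((char *)&words[1] - (char *)&words[0]);
}

/* Pitfall 2 (do NOT do this): `local` lives in the stack frame of
 * bad(), and that frame is popped when bad() returns.
 *
 *     int *bad(void) { int local = 42; return &local; }
 */

/* Safer: the caller owns the storage and passes its address in. */
void fill_answer(int *out)
{
    *out = 42;
}
```

In fill_answer the pointer refers to a variable in the caller's frame, which is still live after the call returns.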


Concluding Remarks

  • Design principles

    1. Simplicity favors regularity

    2. Smaller is faster

    3. Make the common case fast

    4. Good design demands good compromises

  • Layers of software/hardware

    • Compiler, assembler, hardware

  • MIPS: typical of RISC ISAs

    • cf. x86

§2.19 Concluding Remarks


Concluding Remarks

  • Measure MIPS instruction executions in benchmark programs

    • Consider making the common case fast

    • Consider compromises
