250 likes | 535 Views
CS 105 Tour of Black Holes of Computing. Machine-Level Programming: X86-64. Topics Registers Stack Function Calls Local Storage. X86-64.ppt. Interesting Features of x86-64. Pointers and long Integers are 64 bits Arithmetic ops support 8, 16, 32, and 64-bit ints
E N D
CS 105Tour of Black Holes of Computing Machine-Level Programming:X86-64 Topics • Registers • Stack • Function Calls • Local Storage X86-64.ppt
Interesting Features of x86-64 • Pointers and long Integers are 64 bits • Arithmetic ops support 8, 16, 32, and 64-bit ints • Set of general purpose registers is 16 • Full compatibility with X32 • Conditional ops are implemented using conditional move instructions (when possible) – faster than branch • Much of pgm state held in registers • Up to 6 arguments passed via registers (not stack) • Some procedures do not need to access the stack at all?? • Floating-point ops implemented using register-oriented instructions rather than stack-based • With move to x86-64, gcc updated to take advantage of many architecture changes
Data Representations: IA32 + x86-64 Sizes of C Objects (in Bytes) C Data Type Typical 32-bit Intel IA32 x86-64 • unsigned 4 4 4 • int 4 4 4 • long int 4 4 8 • char 1 1 1 • short 2 2 2 • float 4 4 4 • double 8 8 8 • long double 8 10/12 16 • char * 4 4 8Or any other pointer x86-64: 64 bit ints and 64 bit pointers
x86-64 Integer Registers - 16 %rax %r8 %eax %r8d • Extend existing registers. Add 8 new ones, word (16 bit and byte (8 bit) registers still available, first 2 bytes available (rax, rbx, rcx, rdx) • Make %ebp/%rbp general purpose, note 32 bit names %rbx %r9 %ebx %r9d %rcx %r10 %ecx %r10d %rdx %r11 %edx %r11d %rsi %r12 %esi %r12d %rdi %r13 %edi %r13d %rsp %r14 %esp %r14d %rbp %r15 %ebp %r15d
Instructions • Long word l (4 Bytes) ↔ Quad word q (8 Bytes) • New instructions: • movl → movq • addl → addq • sall → salq • etc. • 32-bit instructions that generate 32-bit results • Set higher order bits of destination register to 0 • Example: addl
Swap in 32-bit Mode swap: pushl %ebp movl %esp,%ebp pushl %ebx movl 12(%ebp),%ecx movl 8(%ebp),%edx movl (%ecx),%eax movl (%edx),%ebx movl %eax,(%edx) movl %ebx,(%ecx) movl -4(%ebp),%ebx movl %ebp,%esp popl %ebp ret void swap(int *xp, int *yp) { int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0; } Setup Body Finish
Swap in 64-bit Mode Operands passed in registers (why useful?) • First (xp) in %rdi, second (yp) in %rsi • 64-bit pointers No stack operations required, actually no stack frame 32-bit data • Data held in registers %eax and %edx • movl operation void swap(int *xp, int *yp) { int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0; } swap: movl (%rdi), %edx movl (%rsi), %eax movl %eax, (%rdi) movl %edx, (%rsi) retq
x86-64 Integer Registers, 6 used for arguments %rax %r8 Return value Argument #5 %rbx %r9 Callee saved Argument #6 %rcx %r10 Callee saved Argument #4 %rdx %r11 Used for linking Argument #3 %rsi %r12 C: Callee saved Argument #2 %rdi %r13 Argument #1 Callee saved %rsp %r14 Stack pointer Callee saved %rbp %r15 Callee saved Callee saved
Interesting Features of Stack Frame Allocate entire frame at once • All stack accesses can be relative to %rsp • Do so by decrementing stack pointer • Can delay allocation, since safe to temporarily use red zone- 128 bytes available below the stack Simple deallocation • Increment stack pointer • No base/frame pointer needed
x86-64 Registers Arguments passed to functions via registers • If more than 6 integral parameters, then pass rest on stack • These registers can be used as caller-saved as well All references to stack frame via stack pointer • Eliminates need to update %ebp/%rbp Other Registers • R12-R15, Rbx, Rbp, calleesaved • 2 or 3 have special uses • Rax holds return value for a function • Rsp is the stack pointer
Reading Condition Codes: x86-64 • SetX Instructions: • Set single byte based on combination of condition codes • Does not alter remaining 3 bytes int gt (long x, long y) { return x > y; } long lgt (long x, long y) { return x > y; } Body (same for both) xorl %eax, %eax # eax = 0 cmpq %rsi, %rdi # Compare x and y setg %al # al = x > y Will disappear Blackboard?
Reading Condition Codes: x86-64 • SetX Instructions: • Set single byte based on combination of condition codes • Does not alter remaining 3 bytes int gt (long x, long y) { return x > y; } long lgt (long x, long y) { return x > y; } Body (same for both) xorl %eax, %eax # eax = 0 cmpq %rsi, %rdi # Compare x and y setg %al # al = x > y Is %rax zero? Yes: 32-bit instructions set high order 32 bits to 0!
Conditionals: x86-64 int absdiff( int x, int y) { int result; if (x > y) { result = x-y; } else { result = y-x; } return result; } absdiff: # x in %edi, y in %esi movl %edi, %eax # eax = x movl %esi, %edx # edx = y subl %esi, %eax # eax = x-y subl %edi, %edx # edx = y-x cmpl %esi, %edi # x:y cmovle %edx, %eax # eax=edx if <= ret Will disappear Blackboard?
Conditionals: x86-64 Conditional move instruction • cmovCsrc, dest • Move value from src to dest if condition C holds • More efficient than conditional branching (simple control flow) • But overhead: both branches are evaluated int absdiff( int x, int y) { int result; if (x > y) { result = x-y; } else { result = y-x; } return result; } absdiff: # x in %edi, y in %esi movl %edi, %eax # eax = x movl %esi, %edx # edx = y subl %esi, %eax # eax = x-y subl %edi, %edx # edx = y-x cmpl %esi, %edi # x:y cmovle %edx, %eax # eax=edx if <= ret
x86-64 Procedure Summary Heavy use of registers • Parameter passing • More temporaries since more registers Minimal use of stack • Sometimes none • Allocate/deallocate entire block Many tricky optimizations • What kind of stack frame to use • Calling with jump • Various allocation techniques
IA32 Example Addresses not drawn to scale FF Stack address range ~232 $esp 0xffffbcd0 p3 0x65586008 p1 0x55585008 p4 0x1904a110 p2 0x1904a008 &p2 0x18049760 beyond 0x08049744 big_array 0x18049780 huge_array 0x08049760 main() 0x080483c6 useless() 0x08049744 final malloc() 0x006be166 80 Heap Data malloc() is dynamically linked address determined at runtime Text 08 00
x86-64 Example Addresses not drawn to scale 00007F Stack address range ~247 $rsp 0x7ffffff8d1f8 p3 0x2aaabaadd010 p1 0x2aaaaaadc010 p4 0x000011501120 p2 0x000011501010 &p2 0x000010500a60 beyond 0x000000500a44 big_array 0x000010500a80 huge_array 0x000000500a50 main() 0x000000400510 useless() 0x000000400500 final malloc() 0x00386ae6a170 000030 Heap Data malloc() is dynamically linked address determined at runtime Text 000000
x86-64 Summary Pointers and long integers are 64 bit 16 Registers, with 64, 32, 16, 8, bit uses Minimal use of stack • Parameters passed in registers • Allocate/deallocate entire block for a procedure • Can use 128 bytes passed (beyond) stack pointer (red zone) • can use stack without messing with stack pointer • No Frame Pointer Many tricky optimizations • What kind of stack frame to use • Calling with jump • Various allocation techniques • Compilers need to be smarter
x86-64 Locals in the Red Zone Avoiding Stack Pointer Change • Can hold all information within small window beyond stack pointer swap_a: movq (%rdi), %rax movq %rax, -24(%rsp) movq (%rsi), %rax movq %rax, -16(%rsp) movq -16(%rsp), %rax movq %rax, (%rdi) movq -24(%rsp), %rax movq %rax, (%rsi) ret /* Swap, using local array */ void swap_a(long *xp, long *yp) { volatile long loc[2]; loc[0] = *xp; loc[1] = *yp; *xp = loc[1]; *yp = loc[0]; } rtn Ptr %rsp unused −8 loc[1] −16 loc[0] −24
x86-64 Stack Frame Example • Keeps values of a and i in callee save registers • Must set up stack frame to save these registers swap_ele_su: movq %rbx, -16(%rsp) movslq %esi,%rbx movq %r12, -8(%rsp) movq %rdi, %r12 leaq (%rdi,%rbx,8), %rdi subq $16, %rsp leaq 8(%rdi), %rsi call swap movq (%r12,%rbx,8), %rax addq %rax, sum(%rip) movq (%rsp), %rbx movq 8(%rsp), %r12 addq $16, %rsp ret long sum = 0; /* Swap a[i] & a[i+1] */ void swap_ele_su (long a[], int i) { swap(&a[i], &a[i+1]); sum += a[i]; } Blackboard?
Understanding x86-64 Stack Frame swap_ele_su: movq %rbx, -16(%rsp) # Save %rbx movslq %esi,%rbx # Extend & save i movq %r12, -8(%rsp) # Save %r12 movq %rdi, %r12 # Save a – array pointer leaq (%rdi,%rbx,8), %rdi # Create &a[i] subq $16, %rsp # Allocate stack frame leaq 8(%rdi), %rsi # &a[i+1] call swap # swap() movq (%r12,%rbx,8), %rax # a[i] addq %rax, sum(%rip) # sum += a[i] movq (%rsp), %rbx # Restore %rbx movq 8(%rsp), %r12 # Restore %r12 addq $16, %rsp # Deallocate stack frame ret
%rsp rtn addr rtn addr +8 %r12 %r12 −8 %rsp %rbx %rbx −16 Understanding x86-64 Stack Frame swap_ele_su: movq %rbx, -16(%rsp) # Save %rbx movslq %esi,%rbx # Extend & save i movq %r12, -8(%rsp) # Save %r12 movq %rdi, %r12 # Save a leaq (%rdi,%rbx,8), %rdi # &a[i] subq $16, %rsp # Allocate stack frame leaq 8(%rdi), %rsi # &a[i+1] call swap # swap() movq (%r12,%rbx,8), %rax # a[i] addq %rax, sum(%rip) # sum += a[i] movq (%rsp), %rbx # Restore %rbx movq 8(%rsp), %r12 # Restore %r12 addq $16, %rsp # Deallocate stack frame ret
Interesting Features of Stack Frame Allocate entire frame at once • All stack accesses can be relative to %rsp • Do by decrementing stack pointer • Can delay allocation, since safe to temporarily use red zone Simple deallocation • Increment stack pointer • No base/frame pointer needed