1 / 23

Machine-Level Programming: X86-64

CS 105 Tour of Black Holes of Computing. Machine-Level Programming: X86-64. Topics Registers Stack Function Calls Local Storage. X86-64.ppt. Interesting Features of x86-64. Pointers and long Integers are 64 bits Arithmetic ops support 8, 16, 32, and 64-bit ints

herman
Download Presentation

Machine-Level Programming: X86-64

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS 105Tour of Black Holes of Computing Machine-Level Programming:X86-64 Topics • Registers • Stack • Function Calls • Local Storage X86-64.ppt

  2. Interesting Features of x86-64 • Pointers and long Integers are 64 bits • Arithmetic ops support 8, 16, 32, and 64-bit ints • Set of general purpose registers is 16 • Full compatibility with X32 • Conditional ops are implemented using conditional move instructions (when possible) – faster than branch • Much of pgm state held in registers • Up to 6 arguments passed via registers (not stack) • Some procedures do not need to access the stack at all?? • Floating-point ops implemented using register-oriented instructions rather than stack-based • With move to x86-64, gcc updated to take advantage of many architecture changes

  3. Data Representations: IA32 + x86-64 Sizes of C Objects (in Bytes) C Data Type Typical 32-bit Intel IA32 x86-64 • unsigned 4 4 4 • int 4 4 4 • long int 4 4 8 • char 1 1 1 • short 2 2 2 • float 4 4 4 • double 8 8 8 • long double 8 10/12 16 • char * 4 4 8Or any other pointer x86-64: 64 bit ints and 64 bit pointers

  4. x86-64 Integer Registers - 16 %rax %r8 %eax %r8d • Extend existing registers. Add 8 new ones, word (16 bit and byte (8 bit) registers still available, first 2 bytes available (rax, rbx, rcx, rdx) • Make %ebp/%rbp general purpose, note 32 bit names %rbx %r9 %ebx %r9d %rcx %r10 %ecx %r10d %rdx %r11 %edx %r11d %rsi %r12 %esi %r12d %rdi %r13 %edi %r13d %rsp %r14 %esp %r14d %rbp %r15 %ebp %r15d

  5. Instructions • Long word l (4 Bytes) ↔ Quad word q (8 Bytes) • New instructions: • movl → movq • addl → addq • sall → salq • etc. • 32-bit instructions that generate 32-bit results • Set higher order bits of destination register to 0 • Example: addl

  6. Swap in 32-bit Mode swap: pushl %ebp movl %esp,%ebp pushl %ebx movl 12(%ebp),%ecx movl 8(%ebp),%edx movl (%ecx),%eax movl (%edx),%ebx movl %eax,(%edx) movl %ebx,(%ecx) movl -4(%ebp),%ebx movl %ebp,%esp popl %ebp ret void swap(int *xp, int *yp) { int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0; } Setup Body Finish

  7. Swap in 64-bit Mode Operands passed in registers (why useful?) • First (xp) in %rdi, second (yp) in %rsi • 64-bit pointers No stack operations required, actually no stack frame 32-bit data • Data held in registers %eax and %edx • movl operation void swap(int *xp, int *yp) { int t0 = *xp; int t1 = *yp; *xp = t1; *yp = t0; } swap: movl (%rdi), %edx movl (%rsi), %eax movl %eax, (%rdi) movl %edx, (%rsi) retq

  8. x86-64 Integer Registers, 6 used for arguments %rax %r8 Return value Argument #5 %rbx %r9 Callee saved Argument #6 %rcx %r10 Callee saved Argument #4 %rdx %r11 Used for linking Argument #3 %rsi %r12 C: Callee saved Argument #2 %rdi %r13 Argument #1 Callee saved %rsp %r14 Stack pointer Callee saved %rbp %r15 Callee saved Callee saved

  9. Interesting Features of Stack Frame Allocate entire frame at once • All stack accesses can be relative to %rsp • Do so by decrementing stack pointer • Can delay allocation, since safe to temporarily use red zone- 128 bytes available below the stack Simple deallocation • Increment stack pointer • No base/frame pointer needed

  10. x86-64 Registers Arguments passed to functions via registers • If more than 6 integral parameters, then pass rest on stack • These registers can be used as caller-saved as well All references to stack frame via stack pointer • Eliminates need to update %ebp/%rbp Other Registers • R12-R15, Rbx, Rbp, calleesaved • 2 or 3 have special uses • Rax holds return value for a function • Rsp is the stack pointer

  11. Reading Condition Codes: x86-64 • SetX Instructions: • Set single byte based on combination of condition codes • Does not alter remaining 3 bytes int gt (long x, long y) { return x > y; } long lgt (long x, long y) { return x > y; } Body (same for both) xorl %eax, %eax # eax = 0 cmpq %rsi, %rdi # Compare x and y setg %al # al = x > y Will disappear Blackboard?

  12. Reading Condition Codes: x86-64 • SetX Instructions: • Set single byte based on combination of condition codes • Does not alter remaining 3 bytes int gt (long x, long y) { return x > y; } long lgt (long x, long y) { return x > y; } Body (same for both) xorl %eax, %eax # eax = 0 cmpq %rsi, %rdi # Compare x and y setg %al # al = x > y Is %rax zero? Yes: 32-bit instructions set high order 32 bits to 0!

  13. Conditionals: x86-64 int absdiff( int x, int y) { int result; if (x > y) { result = x-y; } else { result = y-x; } return result; } absdiff: # x in %edi, y in %esi movl %edi, %eax # eax = x movl %esi, %edx # edx = y subl %esi, %eax # eax = x-y subl %edi, %edx # edx = y-x cmpl %esi, %edi # x:y cmovle %edx, %eax # eax=edx if <= ret Will disappear Blackboard?

  14. Conditionals: x86-64 Conditional move instruction • cmovCsrc, dest • Move value from src to dest if condition C holds • More efficient than conditional branching (simple control flow) • But overhead: both branches are evaluated int absdiff( int x, int y) { int result; if (x > y) { result = x-y; } else { result = y-x; } return result; } absdiff: # x in %edi, y in %esi movl %edi, %eax # eax = x movl %esi, %edx # edx = y subl %esi, %eax # eax = x-y subl %edi, %edx # edx = y-x cmpl %esi, %edi # x:y cmovle %edx, %eax # eax=edx if <= ret

  15. x86-64 Procedure Summary Heavy use of registers • Parameter passing • More temporaries since more registers Minimal use of stack • Sometimes none • Allocate/deallocate entire block Many tricky optimizations • What kind of stack frame to use • Calling with jump • Various allocation techniques

  16. IA32 Example Addresses not drawn to scale FF Stack address range ~232 $esp 0xffffbcd0 p3 0x65586008 p1 0x55585008 p4 0x1904a110 p2 0x1904a008 &p2 0x18049760 beyond 0x08049744 big_array 0x18049780 huge_array 0x08049760 main() 0x080483c6 useless() 0x08049744 final malloc() 0x006be166 80 Heap Data malloc() is dynamically linked address determined at runtime Text 08 00

  17. x86-64 Example Addresses not drawn to scale 00007F Stack address range ~247 $rsp 0x7ffffff8d1f8 p3 0x2aaabaadd010 p1 0x2aaaaaadc010 p4 0x000011501120 p2 0x000011501010 &p2 0x000010500a60 beyond 0x000000500a44 big_array 0x000010500a80 huge_array 0x000000500a50 main() 0x000000400510 useless() 0x000000400500 final malloc() 0x00386ae6a170 000030 Heap Data malloc() is dynamically linked address determined at runtime Text 000000

  18. x86-64 Summary Pointers and long integers are 64 bit 16 Registers, with 64, 32, 16, 8, bit uses Minimal use of stack • Parameters passed in registers • Allocate/deallocate entire block for a procedure • Can use 128 bytes passed (beyond) stack pointer (red zone) • can use stack without messing with stack pointer • No Frame Pointer Many tricky optimizations • What kind of stack frame to use • Calling with jump • Various allocation techniques • Compilers need to be smarter

  19. x86-64 Locals in the Red Zone Avoiding Stack Pointer Change • Can hold all information within small window beyond stack pointer swap_a: movq (%rdi), %rax movq %rax, -24(%rsp) movq (%rsi), %rax movq %rax, -16(%rsp) movq -16(%rsp), %rax movq %rax, (%rdi) movq -24(%rsp), %rax movq %rax, (%rsi) ret /* Swap, using local array */ void swap_a(long *xp, long *yp) { volatile long loc[2]; loc[0] = *xp; loc[1] = *yp; *xp = loc[1]; *yp = loc[0]; } rtn Ptr %rsp unused −8 loc[1] −16 loc[0] −24

  20. x86-64 Stack Frame Example • Keeps values of a and i in callee save registers • Must set up stack frame to save these registers swap_ele_su: movq %rbx, -16(%rsp) movslq %esi,%rbx movq %r12, -8(%rsp) movq %rdi, %r12 leaq (%rdi,%rbx,8), %rdi subq $16, %rsp leaq 8(%rdi), %rsi call swap movq (%r12,%rbx,8), %rax addq %rax, sum(%rip) movq (%rsp), %rbx movq 8(%rsp), %r12 addq $16, %rsp ret long sum = 0; /* Swap a[i] & a[i+1] */ void swap_ele_su (long a[], int i) { swap(&a[i], &a[i+1]); sum += a[i]; } Blackboard?

  21. Understanding x86-64 Stack Frame swap_ele_su: movq %rbx, -16(%rsp) # Save %rbx movslq %esi,%rbx # Extend & save i movq %r12, -8(%rsp) # Save %r12 movq %rdi, %r12 # Save a – array pointer leaq (%rdi,%rbx,8), %rdi # Create &a[i] subq $16, %rsp # Allocate stack frame leaq 8(%rdi), %rsi # &a[i+1] call swap # swap() movq (%r12,%rbx,8), %rax # a[i] addq %rax, sum(%rip) # sum += a[i] movq (%rsp), %rbx # Restore %rbx movq 8(%rsp), %r12 # Restore %r12 addq $16, %rsp # Deallocate stack frame ret

  22. %rsp rtn addr rtn addr +8 %r12 %r12 −8 %rsp %rbx %rbx −16 Understanding x86-64 Stack Frame swap_ele_su: movq %rbx, -16(%rsp) # Save %rbx movslq %esi,%rbx # Extend & save i movq %r12, -8(%rsp) # Save %r12 movq %rdi, %r12 # Save a leaq (%rdi,%rbx,8), %rdi # &a[i] subq $16, %rsp # Allocate stack frame leaq 8(%rdi), %rsi # &a[i+1] call swap # swap() movq (%r12,%rbx,8), %rax # a[i] addq %rax, sum(%rip) # sum += a[i] movq (%rsp), %rbx # Restore %rbx movq 8(%rsp), %r12 # Restore %r12 addq $16, %rsp # Deallocate stack frame ret

  23. Interesting Features of Stack Frame Allocate entire frame at once • All stack accesses can be relative to %rsp • Do by decrementing stack pointer • Can delay allocation, since safe to temporarily use red zone Simple deallocation • Increment stack pointer • No base/frame pointer needed

More Related