
Machine-Level Representation of Programs I



Presentation Transcript


  1. Machine-Level Representation of Programs I

  2. Outline • Compiler drivers • History of the Intel IA-32 architecture • Assembly code and object code • Memory and registers • Addressing modes • Data formats • Suggested reading: Chapters 1.2, 1.4.1, 1.7.3, 3.1, 3.2, 3.3, 3.4.1

  3. The Hello Program • It begins life as a high-level C program • Can be read and understood by human beings • The individual C statements must be translated by a compiler driver • So that the hello program can run on a computer system

  4. The Hello Program • The C program is translated into • A sequence of low-level machine-language instructions • These instructions are then packaged in a form called an executable object program • Object programs are stored as binary disk files • Also referred to as executable object files

  5. The Context of a Compiler (gcc), Figure 1.3 P5 • hello.c: source program (text) → Preprocessor (cpp) • hello.i: modified source program (text) → Compiler (cc1) • hello.s: assembly program (text) → Assembler (as) • hello.o: relocatable object program (binary) → Linker (ld) • hello: executable object program (binary)

  6. Characteristics of high-level programming languages • Abstraction • Productive • Reliable • Type checking • Often as efficient as hand-written code • Can be compiled and executed on many different machines, whereas assembly code is highly machine specific

  7. Characteristics of assembly languages • The programmer manages memory explicitly • Low-level instructions carry out the computation • Highly machine specific

  8. Why should we understand assembly code? • Understand the optimization capabilities of the compiler • Analyze the underlying inefficiencies in the code • Sometimes we need to know the run-time behavior of a program

  9. From writing assembly code to understanding assembly code • A different set of skills • Transformations • Relation between source code and assembly code • Reverse engineering • Trying to understand the process by which a system was created • By studying the system and • By working backward

  10. A Historical Perspective • Long evolutionary development • Started from rather primitive 16-bit processors • Added more features • Taking advantage of technology improvements • Satisfying the demands for higher performance and for supporting more advanced operating systems • Laden with features providing backward compatibility that are now obsolete

  11. X86 family (year, number of transistors) • 8086 (1978, 29K) • The heart of the IBM PC & DOS • 1M bytes addressable, 640K for users • 80286 (1982, 134K) • More (now obsolete) addressing modes • Basis of the IBM PC-AT & Windows

  12. X86 family • i386 (1985, 275K) • 32-bit architecture, flat addressing model • Able to support a Unix operating system • i486 (1989, 1.9M) • Integrated the floating-point unit onto the processor chip

  13. X86 family • Pentium (1993, 3.1M) • PentiumPro (1995, 6.5M) • P6 microarchitecture • Conditional move instructions • Pentium/MMX (1997, 4.5M) • New class of instructions for manipulating vectors of integers

  14. X86 family • Pentium II (1997, 7M) • Implemented the MMX instructions within the P6 microarchitecture • Pentium III (1999, 8.2M) • New class of instructions for manipulating vectors of floating-point numbers (SSE, Streaming SIMD Extensions)

  15. X86 family • Pentium 4(2001, 42M) • Netburst microarchitecture • 144 new SSE2 instructions

  16. X86 family • Advanced Micro Devices (AMD) • Now a close competitor to Intel • Developing its own extension to 64 bits

  17. X86 family • Transmeta • Introduced the Crusoe™ processor in January 2000 • Radically different approach to implementation • Translates x86 code into “Very Long Instruction Word” (VLIW) code • High degree of parallelism • Shooting for the low-power market, such as laptop computers

  18. Hardware Organization Figure 1.4 P7 • CPU: Central Processing Unit • ALU: Arithmetic/Logic Unit • PC: Program Counter • USB: Universal Serial Bus

  19. Virtual spaces • A linear array of bytes • Each with its own unique address (array index) starting at zero • Figure: a column of byte contents labeled with addresses 0x0, 0x1, 0x2, ..., 0xfffffffe, 0xffffffff

  20. Data layout • Object model in C • Different data types can be declared

  21. Data layout • Object model in assembly • A large, byte-addressable array • No distinction even between signed and unsigned integers • Holds code, user data, and OS data • Run-time stack for managing procedure calls and returns • Blocks of memory allocated by the user
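The point that memory carries no signed/unsigned distinction can be illustrated with a minimal C sketch. It assumes 32-bit two's-complement integers as on IA-32; the names `view_as_signed` and `demo_untyped_bytes` are hypothetical and not part of the slides.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret the same 4 bytes as a signed value (a plain byte copy,
 * no arithmetic conversion). */
int32_t view_as_signed(uint32_t bits) {
    int32_t v;
    memcpy(&v, &bits, sizeof v);
    return v;
}

/* One bit pattern, two readings, one addition. */
void demo_untyped_bytes(void) {
    uint32_t bits = 0xFFFFFFFFu;
    assert(bits == 4294967295u);           /* read as unsigned */
    assert(view_as_signed(bits) == -1);    /* read as signed   */

    /* Unsigned addition wraps modulo 2^32 ... */
    uint32_t sum_bits = bits + 5u;
    assert(sum_bits == 4u);
    /* ... and the signed reading (-1 + 5) lands on the same result bits. */
    assert(view_as_signed(bits) + 5 == view_as_signed(sum_bits));
}
```

The type only exists in the C source; at the machine level the same addition instruction serves both readings.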

  22. Figure 1.13 P17

  23. Operations in C constructs • Arithmetic expression evaluation • Loops • Procedure calls and returns • Translated into sequences of instructions

  24. Operations in assembly instructions • Each instruction performs only a very elementary operation • Normally executed one by one, in sequence • Operate on data stored in registers • Transfer data between memory and a register • Conditionally branch to a new instruction address
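To make the contrast between C constructs and elementary instructions concrete, here is a small sketch that writes one loop twice: once as structured C and once in "goto form", where each statement is roughly one elementary step (compare, branch, add, jump). The function names are hypothetical, and a real compiler may arrange the steps differently.

```c
#include <assert.h>

/* Structured C: sum of 1..n. */
int sum_loop(int n) {
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}

/* The same computation with only elementary steps and explicit branches. */
int sum_goto(int n) {
    int total = 0;
    int i = 1;
test:
    if (i > n) goto done;   /* compare, then conditional branch */
    total += i;             /* arithmetic on values held in "registers" */
    i += 1;
    goto test;              /* unconditional jump back to the test */
done:
    return total;
}

void demo_sum(void) {
    assert(sum_loop(5) == 15);
    assert(sum_goto(5) == 15);
    assert(sum_goto(0) == 0);
}
```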

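  25. Assembly Programmer’s View, Figure 3.2 P136 • Registers: %eax (%ah, %al), %edx (%dh, %dl), %ecx (%ch, %cl), %ebx (%bh, %bl), %esi, %edi, %esp, %ebp, %eip, %eflags • Between registers and memory: addresses, data, instructions • Memory, with address marks FF, C0, BF, 80, 7F, 40, 3F, 08, 00 from top to bottom: Stack, Heap, DLLs, Heap, Data, Text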
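The figure pairs %eax with the byte registers %ah and %al. As a rough sketch of that relationship, assuming the usual IA-32 layout (%al is bits 0 to 7 of %eax, %ah is bits 8 to 15), the byte registers can be modeled as bit fields of one 32-bit value. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* %al: the low byte of %eax. */
uint8_t low_byte_al(uint32_t eax) {
    return (uint8_t)(eax & 0xFFu);
}

/* %ah: the second-lowest byte of %eax. */
uint8_t high_byte_ah(uint32_t eax) {
    return (uint8_t)((eax >> 8) & 0xFFu);
}

void demo_subregisters(void) {
    uint32_t eax = 0x12345678u;
    assert(low_byte_al(eax) == 0x78);
    assert(high_byte_ah(eax) == 0x56);
}
```

The same pattern applies to %edx, %ecx, and %ebx with their %dh/%dl, %ch/%cl, and %bh/%bl parts.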

  26. Programmer-Visible States P129 • Program Counter(%eip) • Address of the next instruction • Register File • Heavily used program data • Integer and floating-point

  27. Programmer-Visible States • Condition code registers • Hold status information about the most recently executed arithmetic instruction • Used to implement conditional changes in the control flow
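As an illustration of what "status information" means, the sketch below computes four flags for a 32-bit addition. The flag names (ZF, SF, CF, OF) are the standard IA-32 ones and are not named on the slide; the struct and function are hypothetical modeling aids, not how the hardware is implemented.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int zf;  /* zero flag: result is 0                    */
    int sf;  /* sign flag: result has its top bit set      */
    int cf;  /* carry flag: unsigned overflow              */
    int of;  /* overflow flag: two's-complement overflow   */
} Flags;

/* Flags that a 32-bit add of a and b would leave behind. */
Flags flags_after_add(uint32_t a, uint32_t b) {
    uint32_t t = a + b;                 /* wraps modulo 2^32 */
    uint32_t sa = a >> 31, sb = b >> 31, st = t >> 31;
    Flags f;
    f.zf = (t == 0);
    f.sf = (int)st;
    f.cf = (t < a);                     /* wrapped around => carry out */
    f.of = (sa == sb) && (st != sa);    /* same-sign inputs, different-sign result */
    return f;
}

void demo_flags(void) {
    Flags f = flags_after_add(0xFFFFFFFFu, 1u);   /* -1 + 1 */
    assert(f.zf == 1 && f.sf == 0 && f.cf == 1 && f.of == 0);

    f = flags_after_add(0x7FFFFFFFu, 1u);         /* largest positive + 1 */
    assert(f.zf == 0 && f.sf == 1 && f.cf == 0 && f.of == 1);
}
```

A conditional branch then tests some combination of these bits rather than re-examining the operands.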

  28. Code Examples P130

  29. Code Examples P131

  30. Code Examples

  31. C Code • Add two signed integers • int t = x+y;

  32. Assembly Code • Operands: • x: Register %eax • y: Memory M[%ebp+8] • t: Register %eax • Instruction • addl 8(%ebp),%eax • Adds two 4-byte integers • Similar to the expression x += y • The function value is returned in %eax
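To reproduce this kind of listing, the C statement can be wrapped in a small function and compiled with GCC. The sketch below assumes a file named sum.c (a hypothetical name) and a GCC installation that can target 32-bit x86; the instructions, registers, and addresses actually produced depend on the compiler version and options.

```c
#include <assert.h>

/* sum.c (hypothetical file name) */
int sum(int x, int y) {
    int t = x + y;   /* the statement from the C Code slide */
    return t;        /* on IA-32 the integer result comes back in %eax */
}

/*
 * Possible ways to inspect the translation:
 *   gcc -m32 -O1 -S sum.c    writes the assembly code to sum.s
 *   gcc -m32 -O1 -c sum.c    writes the object code to sum.o
 *   objdump -d sum.o         disassembles the object code
 */

void check_sum(void) {
    assert(sum(1, 3) == 4);
    assert(sum(-2, 2) == 0);
}
```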

  33. Object Code • 3-byte instruction • Stored at address 0x80483b7 • 0x80483b7: 03 45 08
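The three bytes 03 45 08 can be read field by field. The sketch below assumes the standard IA-32 instruction format (opcode byte, then a ModRM byte split into mod, reg, and r/m fields, then an 8-bit displacement); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned mod;  /* bits 7..6: addressing form         */
    unsigned reg;  /* bits 5..3: register operand        */
    unsigned rm;   /* bits 2..0: base register / r/m     */
} ModRM;

ModRM split_modrm(uint8_t byte) {
    ModRM f;
    f.mod = (byte >> 6) & 0x3u;
    f.reg = (byte >> 3) & 0x7u;
    f.rm  = byte & 0x7u;
    return f;
}

void demo_decode_addl(void) {
    const uint8_t code[3] = {0x03, 0x45, 0x08};
    ModRM f = split_modrm(code[1]);

    assert(code[0] == 0x03);  /* opcode: add a register/memory operand into a register */
    assert(f.mod == 1);       /* 01: memory operand with an 8-bit displacement */
    assert(f.reg == 0);       /* 000: %eax */
    assert(f.rm == 5);        /* 101: %ebp as the base register */
    assert(code[2] == 8);     /* displacement 8, giving 8(%ebp) */
}
```

Read this way, the bytes spell out addl 8(%ebp),%eax, matching the assembly slide.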

  34. Operands P137 • In high-level languages • Either constants • Or variables • Example • A = A + 4 (A is a variable, 4 is a constant)

  35. Operands • Counterparts in assembly languages • Immediate (constant) • Register (variable) • Memory (variable) • Example • movl 8(%ebp),%eax (memory operand, register operand) • addl $4,%eax (immediate operand, register operand)

  36. Simple Addressing Modes • Immediate • Represents a constant • The format is $imm ($4, $0xffffffff) • Registers • The fastest storage units in computer systems • Typically 32 bits long • Register mode Ea • The value stored in the register • Denoted R[Ea]
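The two-instruction example can be traced with a toy machine model. This is a minimal sketch: the `Regs` struct, the 32-byte memory, and the starting values (%ebp = 16, A = 10) are all assumptions chosen for illustration, and `load32`/`store32` are hypothetical helpers that copy 4 bytes in the host's byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t eax, ebp; } Regs;

/* Read 4 bytes of memory M starting at addr. */
uint32_t load32(const uint8_t *M, uint32_t addr) {
    uint32_t v;
    memcpy(&v, M + addr, sizeof v);
    return v;
}

/* Write 4 bytes of memory M starting at addr. */
void store32(uint8_t *M, uint32_t addr, uint32_t v) {
    memcpy(M + addr, &v, sizeof v);
}

/* A = A + 4, with A living in memory at 8(%ebp). */
void run_a_plus_4(Regs *R, const uint8_t *M) {
    R->eax = load32(M, R->ebp + 8);  /* movl 8(%ebp),%eax : memory operand to register */
    R->eax += 4;                     /* addl $4,%eax      : immediate added to register */
}

void demo_operands(void) {
    uint8_t M[32] = {0};
    Regs R = { .eax = 0, .ebp = 16 };
    store32(M, R.ebp + 8, 10);       /* A = 10 */
    run_a_plus_4(&R, M);
    assert(R.eax == 14);
}
```

Note that the sum ends up in %eax only; writing it back to A would take a further move to memory.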

  37. Virtual spaces • A linear array of bytes • Each with its own unique address (array index) starting at zero • Figure: a column of byte contents labeled with addresses 0x0, 0x1, 0x2, ..., 0xfffffffe, 0xffffffff

  38. Memory References • The array is denoted M • If addr is a memory address • M[addr] is the content of the memory starting at addr • addr is used as an array index • How many bytes are there in M[addr]? • It depends on the context
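The remark that the size of M[addr] depends on context can be sketched by reading 1, 2, or 4 bytes from the same address. The example assumes IA-32's little-endian byte order (the byte at the lowest address is the least significant); `load_le` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Value of M[addr] when nbytes (1, 2, or 4) bytes are read, little-endian. */
uint32_t load_le(const uint8_t *M, uint32_t addr, unsigned nbytes) {
    uint32_t v = 0;
    for (unsigned i = 0; i < nbytes; i++)
        v |= (uint32_t)M[addr + i] << (8 * i);
    return v;
}

void demo_widths(void) {
    const uint8_t M[4] = {0x78, 0x56, 0x34, 0x12};
    assert(load_le(M, 0, 1) == 0x78u);        /* byte access        */
    assert(load_le(M, 0, 2) == 0x5678u);      /* word access        */
    assert(load_le(M, 0, 4) == 0x12345678u);  /* double-word access */
}
```

In assembly code the context is supplied by the instruction itself, for example through the size suffix on a mov.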

  39. Memory Addressing Mode • An expression for • a memory address (or an array index) • Most general form • imm(Eb, Ei, s) • The scale factor s must be 1, 2, 4, or 8 • The address represented by the above form • imm + R[Eb] + R[Ei] * s • The operand value is • M[imm + R[Eb] + R[Ei] * s]
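The address formula translates directly into code. This is a minimal sketch in which the caller passes the register contents R[Eb] and R[Ei] as plain numbers; the function name and the sample register values are assumptions. Arithmetic is done modulo 2^32, so a negative displacement can be passed as its two's-complement bit pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Address denoted by imm(Eb, Ei, s): imm + R[Eb] + R[Ei] * s. */
uint32_t effective_address(uint32_t imm, uint32_t base,
                           uint32_t index, uint32_t scale) {
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return imm + base + index * scale;
}

void demo_effective_address(void) {
    /* 8(%ebp) with %ebp assumed to hold 0x1000: no index register. */
    assert(effective_address(8, 0x1000, 0, 1) == 0x1008u);
    /* -4(%ebp): the displacement wraps around modulo 2^32. */
    assert(effective_address((uint32_t)-4, 0x1000, 0, 1) == 0xFFCu);
    /* (%eax,%edx,4) with %eax = 0x100 and %edx = 3. */
    assert(effective_address(0, 0x100, 3, 4) == 0x10Cu);
}
```

The simpler forms such as (Eb) or imm(Eb) are the same formula with the missing parts taken as zero.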

  40. Addressing Mode Figure 3.3 P137

  41. Practice problem 3.1 P138
      Operand          Value   Comment
      %eax             0x100   Register
      (%eax)           0xFF    Address 0x100
      $0x108           0x108   Immediate
      0x108            0x13    Absolute address
      260(%ecx,%edx)   0x13    Address 0x108
      (%eax,%edx,4)    0x11    Address 0x10C
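The answers in this table can be checked mechanically. The sketch below assumes the machine state that the table implies (%eax = 0x100, %ecx = 0x1, %edx = 0x3, and memory holding 0xFF, 0x13, 0x11 at addresses 0x100, 0x108, 0x10C); the slide itself does not list these givens, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Register contents assumed for the problem. */
enum { EAX = 0x100, ECX = 0x1, EDX = 0x3 };

/* Sparse model of memory M: only the addresses the table touches. */
uint32_t mem_read(uint32_t addr) {
    switch (addr) {
    case 0x100: return 0xFF;
    case 0x108: return 0x13;
    case 0x10C: return 0x11;
    default:    return 0;   /* not specified in this sketch */
    }
}

/* imm + R[Eb] + R[Ei] * s */
uint32_t ea(uint32_t imm, uint32_t base, uint32_t index, uint32_t scale) {
    return imm + base + index * scale;
}

void check_practice_3_1(void) {
    assert(EAX == 0x100);                            /* %eax: register value     */
    assert(mem_read(EAX) == 0xFF);                   /* (%eax): M[0x100]         */
    assert(mem_read(0x108) == 0x13);                 /* 0x108: absolute address  */
    assert(ea(260, ECX, EDX, 1) == 0x108);           /* 260 is 0x104             */
    assert(mem_read(ea(260, ECX, EDX, 1)) == 0x13);  /* 260(%ecx,%edx)           */
    assert(ea(0, EAX, EDX, 4) == 0x10C);
    assert(mem_read(ea(0, EAX, EDX, 4)) == 0x11);    /* (%eax,%edx,4)            */
}
```

The immediate $0x108 needs no lookup at all: its value is the constant itself, which is what separates it from the absolute address 0x108.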

  42. Data Formats Figure 3.1 P135

  43. Data Formats • Data movement instructions • mov (general) • movb (move byte, 1 byte) • movw (move word, 2 bytes) • movl (move double word, 4 bytes)
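The effect of the size suffix can be sketched by modeling a move into a 32-bit register. The example assumes IA-32 behavior, where a byte or word move replaces only the low 1 or 2 bytes of the destination register and leaves the rest unchanged; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* movb: replace the low byte only. */
uint32_t movb_into(uint32_t reg, uint8_t src) {
    return (reg & 0xFFFFFF00u) | src;
}

/* movw: replace the low 2 bytes only. */
uint32_t movw_into(uint32_t reg, uint16_t src) {
    return (reg & 0xFFFF0000u) | src;
}

/* movl: replace all 4 bytes. */
uint32_t movl_into(uint32_t reg, uint32_t src) {
    (void)reg;   /* the old contents are fully overwritten */
    return src;
}

void demo_sized_moves(void) {
    uint32_t reg = 0x12345678u;
    assert(movb_into(reg, 0xAB) == 0x123456ABu);
    assert(movw_into(reg, 0xCDEF) == 0x1234CDEFu);
    assert(movl_into(reg, 0xDEADBEEFu) == 0xDEADBEEFu);
}
```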
