1 / 44

Computer System Chapter 7. Linking

Computer System Chapter 7. Linking. Lynn Choi Korea University. A Simpl e Program Translation. m.c. ASCII source file. Translator. Binary executable object file (memory image on disk). p. Problems: Efficiency: small change requires complete recompilation

lovie
Download Presentation

Computer System Chapter 7. Linking

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Computer SystemChapter 7. Linking Lynn Choi Korea University

  2. A Simple Program Translation m.c ASCII source file Translator Binary executable object file (memory image on disk) p • Problems: • Efficiency: small change requires complete recompilation • Modularity: hard to share common functions (e.g. printf) • Solution: • Separate compilation: use linker

  3. A Better Scheme Using a Linker m.c a.c Translators Translators Separately compiled relocatable object files m.o a.o Linker (ld) Executable object file (contains code and data for all functions defined in m.c and a.c) p

  4. Compiler Driver • Compiler driver coordinates all steps in the translation and linking process. • Typically included with each compilation system (e.g., gcc) • Invokes preprocessor (cpp), compiler (cc1), assembler (as), and linker (ld). • Passes command line arguments to appropriate phases • Example: create executable p from m.c and a.c: bass> gcc -O2 -v -o p m.c a.c cpp [args] m.c /tmp/cca07630.i cc1 /tmp/cca07630.i m.c -O2 [args] -o /tmp/cca07630.s as [args] -o /tmp/cca076301.o /tmp/cca07630.s <similar process for a.c> ld -o p [system obj files] /tmp/cca076301.o /tmp/cca076302.o bass>

  5. Example Program 1 /* main.c */ void swap(); int buf[2] = {1, 2}; int main() { swap(); return 0; } /* swap.c */ extern int buf[]; int *bufp0 = &buf[0]; int *bufp1; void swap() { int temp; bufp1 = &buf[1]; temp = *bufp0; *bufp0 = *bufp1; *bufp1 = temp; } • Unix> gcc –O2 –g –o p main.c swap.c • This generates the following compilation processes • cpp [other arguments] main.c /tmp/main.i • c11 /tmp/main.i main.c –O2 [other arguments] –o /tmp/main.s • as [other arguments] –o /tmp/main.o /tmp/main.s • (The driver goes through the same process to generate swap.o.) • Ld –o p [system object files and args] /tmp/main.o /tmp/swap.o

  6. Linker • Linking • The process of collecting and combining various pieces of code and data into a single file that can be loaded (copied) into memory and executed. • Linking time • Can be done at compile time, i.e. when the source code is translated • Or, at load time, i.e. when the program is loaded into memory • Or, even at run time. • Static Linker • Performs the linking at compile time. • Takes a collection of relocatable object files and command line arguments and generate a fully linked executable object file that can be loaded and run. • Performs two main tasks • Symbol resolution: associate each symbol reference with exactly one symbol definition • Relocation: relocate code and data sections and modify symbol references to the relocated memory locations • Dynamic Linker • Performs the linking at load time or at run time. Will be discussed later..

  7. Object Files • Relocatable object file • Contains binary code and data in a form that can be combined with other relocatable object files at compile time to create an executable object file • Compilers and assemblers generate relocatable object files • Executable object file • Contains binary code and data in a form that can be copied directly into memory and executed • Linkers generate executable object files • Shared object file • A special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or at run time

  8. What Does a Linker Do? • Merges object files • Merges multiple relocatable (.o) object files into a single executable object file that can loaded and executed by the loader. • Resolves external references • As part of the merging process, resolves external references. • External reference: reference to a symbol defined in another object file. • Relocates symbols • Relocates symbols from their relative locations in the .o files to new absolute positions in the executable. • Updates all references to these symbols to reflect their new positions. • References can be in either code or data • code: a(); /* reference to symbol a */ • data: int *xp=&x; /* reference to symbol x */

  9. Why Linkers? • Modularity • Program can be written as a collection of smaller source files, rather than one monolithic mass. • Can build libraries of common functions (more on this later) • e.g., Math library, standard C library • Efficiency • Time: • Change one source file, compile, and then relink. • No need to recompile other source files. • Space: • Libraries of common functions can be aggregated into a single file... • Yet executable files and running memory images contain only code for the functions they actually use.

  10. Object File Format • Object file format varies from system to system • Examples: a.out (early UNIX systems), COFF (Common Object File Format, used by early versions of System V), PE (Portable Executable, a variation of COFF used by Microsoft Windows), ELF (Used by modern UNIX including Linux) • ELF (Executable and Linkable Format) • Standard binary format for object files • Derives from AT&T System V Unix • Later adopted by BSD Unix variants and Linux • One unified format for • Relocatable object files (.o), • Executable object files • Shared object files (.so) • Generic name: ELF binaries • Better support for shared libraries than old a.out formats.

  11. ELF Object File Format • ELF header • Word size, byte ordering of the system, the machine type (e.g. IA32) • The size of ELF header, the object file type (.e.g. relocatable, executable, or shared), the file offset of the section header table and the size and number of entries in the section header table. • Section header table • Contains the locations and sizes of various sections • A fixed size entry for each section • Segment header table • Page size, virtual and physical addresses of memory segments (sections), segment sizes. • .text section: machine code • .data section: initialized global variables • Local C variables are maintained at run time on the stack, and do not appear in .data or .bss section • .bss section: uninitialized global variables • “Block Storage Start” or “Better Save Space!” • Just a place holder and do not occupy space 0 ELF header Segment header table (required for executables) .text section .data section .bss section .symtab .rel.txt .rel.data .debug Section header table (required for relocatables)

  12. ELF Object File Format (cont) 0 ELF header Program header table (required for executables) .text section .data section .bss section .symtab .rel.text .rel.data .debug .line .strtab Section header table (required for relocatables) • .symtab section • Symbol table • Information about function and global variables • .rel.text section • Relocation info for .text section • Addresses of instructions that will need to be modified in the executable • .rel.data section • Relocation info for .data section • Addresses of pointer data that will need to be modified in the merged executable • .debug section • Info for symbolic debugging (gcc -g) • .line section • Mapping between line numbers in source code and machine instructions in the .text section • .strtab section • String table for symbols in the .symtab and .debug

  13. Life and Scope of an Object • Life vs. scope • Life of an object determines whether the object is still in memory (of the process) whereas the scope of an object determines whether the object can be accessed at this position • It is possible that an object is live but not visible. • It is not possible that an object is visible but not live. • Local variables • Variables defined inside a function • The scope of these variables is only within this function • The life of these variables ends when this function completes • So when we call the function again, storage for variables is created andvalues are reinitialized. • Static local variables - If we want the value to be extent throughout the life of a program, we can define the local variable as "static." • Initialization is performed only at the first call and data is retained between func calls.

  14. Life and Scope of an Object • Global variables • Variables defined outside a function • The scope of these variables is throughout the entire program • The life of these variables ends when the program completes • Static variables • Static variables are local in scope to their module in which they are defined, but life is throughout the program. • Static local variables: static variables inside a function cannot be called from outside the function (because it's not in scope) but is alive and exists in memory. • Static variables: if a static variable is defined in a global space (say at beginning of file) then this variable will be accessible only in this file (file scope) • If you have a global variable and you are distributing your files as a library and you want others not to access your global variable, you may make it static by just prefixing keyword static

  15. Symbols • Three kinds of linker symbols • Global symbols that are defined by module mand that can be referenced by other modules. • Nonstatic C functions and nonstatic global variables • Functions or global variables that are defined without the C static attribute. • Global symbols that are referenced by module m, but defined by other module • These are called externals. • Local symbols that are defined and referenced exclusively by module m • Static C functions and static variables • Local procedure variables that are defined with C static attribute are not managed on the stack. Instead, the compiler allocates space in .data or in .bss • Local linker symbol ≠ local program variable

  16. ELF Symbol Table typedef struct { int name; /* string table offset */ int value; /* section offset, or VM address */ int size; /* object size in bytes */ char type:4, /* data, func, section, or src file name (4 bits) */ binding:4; /* local or global (4 bits) */ char reserved; /* unused */ char section; /* section header index, ABS, UNDEF, */ /* or COMMON */ } Elf_Symbol; Num: Value Size Type Bind Ot Ndx Name 8: 0 8 OBJECT GLOBAL 0 3 buf 9: 0 17 FUNC GLOBAL 0 1 main 10: 0 0 NOTYPE GLOBAL 0 UND swap Three entries in the symbol table for main.o

  17. Strong and Weak Symbols • Symbol resolution • The linker associates each reference with exactly one symbol definition • References to local symbols defined in the same module is straightforward • Resolving references to global variables is trickier • The same symbol might be defined by multiple object files • Program symbols are either strong or weak • strong: procedures and initialized globals • weak: uninitialized globals • At compile time, the compiler exports each global symbol to the assembler as either strong or weak, and the assembler encodes this information implicitly in the symbol table of the relocatable object file p1.c p2.c weak int foo=5; p1() { } int foo; p2() { } strong strong strong

  18. Linker’s Symbol Resolution Rules • Rule 1. A strong symbol can only appear once. • Rule 2. A weak symbol can be overridden by a strong symbol of the same name. • references to the weak symbol resolve to the strong symbol. • Rule 3. If there are multiple weak symbols, the linker can pick an arbitrary one.

  19. Linker Puzzles References to x will refer to the same initialized variable. Nightmare scenario: two identical weak structs, compiled by different compilers with different alignment rules. int x; p1() {} p1() {} Link time error: two strong symbols (p1) References to x will refer to the same uninitialized int. Is this what you really want? int x; p1() {} int x; p2() {} int x; int y; p1() {} double x; p2() {} Writes to x in p2 might overwrite y! Evil! int x=7; int y=5; p1() {} double x; p2() {} Writes to x in p2 will overwrite y! Nasty! int x=7; p1() {} int x; p2() {}

  20. Relocation • After the symbol resolution phase • Once the symbol resolution step has been completed, the linker associates each symbol reference in the code with exactly one symbol definition • Linker knows the exact sizes of the code and data sections in each object module • Relocation consists of 2 steps • Relocating sections and symbol definitions • Merges all sections of the same type into a new aggregate section • .data sections from all the input modules are merged into a single .data section • Assign run-time memory addresses to the new aggregate sections and to each symbol defined internally in each module • Relocating symbol references within sections • Modify external references so that they point to the correct run-time addresses • To perform this step, the linker relies on relocation entries in the relocatable object modules

  21. Example C Program m.c a.c extern int e; int *ep=&e; int x=15; int y; int a() { return *ep+x+y; } int e=7; int main() { int r = a(); exit(0); }

  22. Relocating Sections Relocatable Object Files Executable Object File system code 0 .text headers .data system data system code main() .text a() main() .text m.o more system code .data int e = 7 system data .data int e = 7 a() int *ep = &e .text int x = 15 .bss a.o .data int *ep = &e uninitialized data int x = 15 .symtab .debug .bss int y

  23. Resolving External References Def of local symbol e Ref to external symbol e Def of local symbol ep Defs of local symbols x and y Ref to external symbol exit (defined in libc.so) Ref to external symbol a Def of local symbol a Refs of local symbols ep,x,y • Symbols are lexical entities that name functions and variables. • Each symbol has a value (typically a memory address). • Code consists of symbol definitions and references. • References can be either localorexternal. m.c a.c extern int e; int *ep=&e; int x=15; int y; int a() { return *ep+x+y; } int e=7; int main() { int r = a(); exit(0); }

  24. m.o Relocation Info m.c int e=7; int main() { int r = a(); exit(0); } Disassembly of section .text: 00000000 <main>: 00000000 <main>: 0: 55 pushl %ebp 1: 89 e5 movl %esp,%ebp 3: e8 fc ff ff ff call 4 <main+0x4> 4: R_386_PC32 a 8: 6a 00 pushl $0x0 a: e8 fc ff ff ff call b <main+0xb> b: R_386_PC32 exit f: 90 nop Disassembly of section .data: 00000000 <e>: 0: 07 00 00 00 source: objdump

  25. a.o Relocation Info (.text) a.c Disassembly of section .text: 00000000 <a>: 0: 55 pushl %ebp 1: 8b 15 00 0000 movl 0x0,%edx 6: 00 3: R_386_32 ep 7: a1 00 00 00 00 movl 0x0,%eax 8: R_386_32 x c: 89 e5 movl %esp,%ebp e: 03 02 addl (%edx),%eax 10: 89 ec movl %ebp,%esp 12: 03 05 00 00 00 addl 0x0,%eax 17: 00 14: R_386_32 y 18: 5d popl %ebp 19: c3 ret extern int e; int *ep=&e; int x=15; int y; int a() { return *ep+x+y; }

  26. a.o Relocation Info (.data) a.c Disassembly of section .data: 00000000 <ep>: 0: 00 00 00 00 0: R_386_32 e 00000004 <x>: 4: 0f 00 00 00 extern int e; int *ep=&e; int x=15; int y; int a() { return *ep+x+y; }

  27. Executable After Relocation (.text) 08048530 <main>: 8048530: 55 pushl %ebp 8048531: 89 e5 movl %esp,%ebp 8048533: e8 08 00 00 00 call 8048540 <a> 8048538: 6a 00 pushl $0x0 804853a: e8 35 ff ff ff call 8048474 <_init+0x94> 804853f: 90 nop 08048540 <a>: 8048540: 55 pushl %ebp 8048541: 8b 15 1c a0 04 movl 0x804a01c,%edx 8048546: 08 8048547: a1 20 a0 04 08 movl 0x804a020,%eax 804854c: 89 e5 movl %esp,%ebp 804854e: 03 02 addl (%edx),%eax 8048550: 89 ec movl %ebp,%esp 8048552: 03 05 d0 a3 04 addl 0x804a3d0,%eax 8048557: 08 8048558: 5d popl %ebp 8048559: c3 ret

  28. Executable After Relocation (.data) m.c int e=7; int main() { int r = a(); exit(0); } Disassembly of section .data: 0804a018 <e>: 804a018: 07 00 00 00 0804a01c <ep>: 804a01c: 18 a0 04 08 0804a020 <x>: 804a020: 0f 00 00 00 a.c extern int e; int *ep=&e; int x=15; int y; int a() { return *ep+x+y; }

  29. Linking with Static Libraries • Static library • A package of related object modules, which can be supplied as input to the linker at compile time • The linker copies only the object modules in the library that are actually referenced by the program • Examples • libc.a: ANSI C standard C library that includes printf, scanf, strcpy, etc. • libm.a: ANSI C math library • Instead of • unix> gcc main.c /usr/lib/printf.o /usr/lib/scanf.o … • Do • unix> gcc main.c /usr/lib/libc.a • To create a library, use AR tool • unix> gcc –c addvec.c multvec.c • unix> ar rcs libvector.a addvec.o multvet.o

  30. Source files main2.c vector.h Linking with Static Libraries Translators (cpp, cc1, as) Static libraries libvector.a libc.a Relocatable object files printf.o and any other modules called by printf.o main2.o addvec.o Linker (ld) Fully linked executable object file p2

  31. Packaging Commonly Used Functions • How to package functions commonly used by programmers? • Math, I/O, memory management, string manipulation, etc. • Awkward, given the linker framework so far: • Option 1: Put all functions in a single source file • Programmers link big object file into their programs • Space and time inefficient • Each executable object file occupies disk space as well as memory space • Any change in one function would require the recompilation of the entire source file • Option 2: Put each function in a separate source file • Programmers explicitly link appropriate binaries into their programs • More efficient, but burdensome on the programmer • Solution: static libraries (.aarchive files) • Concatenate related relocatable object files into a single file with an index (called an archive). • Enhance linker so that it tries to resolve unresolved external references by looking for the symbols in one or more archives. • If an archive member file resolves reference, link the member file into executable.

  32. Static Libraries (archives) p1.c p2.c Translator Translator static library (archive) of relocatable object files concatenated into one file. p1.o p2.o libc.a Linker (ld) executable object file (only contains code and data for libc functions that are actually called from p1.c and p2.c) p Further improves modularity and efficiency by packaging commonly used functions [e.g., C standard library (libc), math library (libm)] Linker selects only the .o files in the archive that are actually needed by the program.

  33. Creating Static Libraries atoi.c printf.c random.c ... Translator Translator Translator atoi.o printf.o random.o ar rs libc.a \ atoi.o printf.o … random.o Archiver (ar) C standard library libc.a • Archiver allows incremental updates: • Recompile function that changes and replace .o file in archive.

  34. Commonly Used Libraries • libc.a (the C standard library) • 8 MB archive of 900 object files. • Standard I/O, memory allocation, signal handling, string handling, data and time, random numbers, integer math • libm.a (the C math library) • 1 MB archive of 226 object files. • floating point math (sin, cos, tan, log, exp, sqrt, …) % ar -t /usr/lib/libc.a | sort … fork.o … fprintf.o fpu_control.o fputc.o freopen.o fscanf.o fseek.o fstab.o … % ar -t /usr/lib/libm.a | sort … e_acos.o e_acosf.o e_acosh.o e_acoshf.o e_acoshl.o e_acosl.o e_asin.o e_asinf.o e_asinl.o …

  35. Using Static Libraries • Linker’s algorithm for resolving external references: • Scan .o files and .a files in the command line order. • During the scan, keep a list of the current unresolved references. • As each new .o or .a file obj is encountered, try to resolve each unresolved reference in the list against the symbols in obj. • If there exist any entries in the unresolved list at end of scan, then error. • Problem: • Command line order matters! • Moral: put libraries at the end of the command line. bass> gcc -L. libtest.o –lmine.a bass> gcc -L. –lmine.a libtest.o libtest.o: In function `main': libtest.o(.text+0x4): undefined reference to `libfun'

  36. Executable Object File • After the linking with static libraries • The input C program (in ASCII text) file has been transformed into a single binary file that contains all of the information needed to load the program into memory and run. • The format of an executable object file • Similar to the format of a relocatable object file, except the followings • ELF header includes program’s entry point, which is the address of the 1st instruction • .init section defines a small function called _init that will be called by the program’s initialization code • No relocation information • Segment header table describes • Mapping between the segments in the executable object files and the memory segments in the virtual address space • Read/write/executable permission (alignment information between segments)

  37. Executable Object File 0 ELF header Segment header table Read-only memory segment (code segment) .init .text .rodata .data Read/write memory segment (data segment) .bss .symtab .debug Symbol table and debugging info are not loaded into memory .line .strtab Describes object file sections Section header table

  38. Loading Executable Binaries Executable object file for example program p 0 ELF header Process image Segment header table (required for executables) init and shared lib segments .text section .data section .text segment (r/o) .bss section .symtab .debug .data segment (initialized r/w) .line .strtab Section header table (required for relocatables) .bss segment (uninitialized r/w)

  39. Memory invisible to user code Linux Run-time Memory Image Kernel virtual memory 0xc0000000 User stack (created at runtime) %esp (stack pointer) Memory-mapped region for shared libraries 0x40000000 brk Run-time heap (created by malloc) Read/write segment (.data, .bss) Loaded from the executable file Read-only segment (.init, .text, .rodata) 0x08048000 Unused 0

  40. Startup Routines for C program • When the loader (execve) runs • It creates the memory image by copying chunks of the executable object files into the code and data segments (guided by the segment header table) • The loader jumps to the program’s entry point, • Which is the address of the _start symbol. • The startup code at the _start address is defined in the object file crt1.o • 0x080480c0 <_start>: /* entry point */ • call __libc_init_first /* startup code in .text */ • call _init /* startup code in .init */ • Initialization routines from .text and .init sections • call atexit /* startup code in .text */ • Appends a list of routines that should be called when the application calls the exit function • call main /* application code */ • exit function runs the functions registered by atexit and returns control to OS by calling _exit • call _exit /* return control to OS */

  41. Shared Libraries • Static libraries have the following disadvantages: • Potential for duplicating lots of common code in the executable files on a file system. • Every C program needs the standard C library • Potential for duplicating lots of code in the text segment of each process. • Minor bug fixes of system libraries require each application to explicitly relink. • Solution: Shared libraries • Whose members are dynamically loaded and linked at run-time. • Called shared objects (.so) in UNIX or dynamic link libraries (DLL) in Windows • Shared library routines can be shared by multiple processes. • There is exactly one .so file for a particular library in any given file system. • The code and data in this .so file are shared by all the executable object files • A single copy of the .text section of a shared library in memory can be shared by multiple processes • Dynamic linking can occur when executable is first loaded and run. • Common case for Linux, handled automatically by ld-linux.so. • Dynamic linking can also occur after program has begun. • An application can request the dynamic linker to load and link shared libraries. • In Linux, this is done explicitly by user with dlopen().

  42. Dynamic Linking with Shared Library main2.c vector.h Translators (cpp, cc1, as) libc.so libvector.so Relocatable object file Relocation and symbol table info main2.o Linker (ld) Partially linked executable object file (on disk) p2 Loader (execve) libc.so libvector.so Code and data Fully linked executable in memory Dynamic linker (ld-linux.so) gcc –shared –fPIC –o libvector.so addvec.c multvec.c gcc –o p2 main2.c ./libvector.so loader loads and runs the dynamic linker, and then passes control to the application

  43. The Complete Picture m.c a.c Translator Translator m.o a.o libwhatever.a Static Linker (ld) p libc.so libm.so Loader/Dynamic Linker (ld-linux.so) p’

  44. Homework 3 • Read Chapter 3 (from Computer Organization and Design Textbook) • Exercise (from Computer Systems Textbook) • 7.6 • 7.9 • 7.12 • 7.13 • 7.14

More Related