
Linker and Loader


tamar

Presentation Transcript


  1. Linker and Loader

  2. Building a program in four stages (C program) • Preprocessing (preprocessor) • Processes include files, conditional compilation directives, and macros. • Command: $ cpp hello.c hello.i • hello.c is the source program; hello.i is the preprocessed source (ASCII text) • Compiling (compiler) • Takes the output of the preprocessor and generates assembly source code. • Command: $ cc -S hello.i -o hello.s • Assembly (assembler) • Takes the assembly source code and assembles it into machine code; addresses are still offsets from the start of each section. • The assembler output is stored in a relocatable object file. • Command: $ as -o hello.o hello.s • Linking (linker) • Takes one or more object files or libraries as input and combines them into a single (usually executable) file. • The Linux command for the linker is ld
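
To make the first three stages concrete, here is a minimal sketch of a source file that gives the preprocessor something to do. The file name `hello.c`, the macros `SQUARE` and `LOG`, and the function `area` are assumptions made for illustration; they do not come from the slides. With GCC, each stage can be stopped on its own: `gcc -E hello.c -o hello.i` (preprocess), `gcc -S hello.i -o hello.s` (compile), `gcc -c hello.s -o hello.o` (assemble).

```c
#include <assert.h>
#include <stdio.h>

/* Macro: expanded textually by the preprocessor. */
#define SQUARE(x) ((x) * (x))

/* Conditional compilation: only one of these definitions reaches the compiler. */
#ifdef VERBOSE
#define LOG(msg) puts(msg)
#else
#define LOG(msg) ((void)0)
#endif

/* After preprocessing, the return statement reads: return ((side) * (side)); */
int area(int side)
{
    assert(side >= 0);
    LOG("computing area");
    return SQUARE(side);
}
```

This file defines no `main`, so it goes through preprocessing, compilation, and assembly on its own, but the final link step only succeeds once some other object file supplies `main`. That is exactly the situation the linker is there to handle.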

  3. Example C Program • main.c: void swap(); int buf[2] = {1, 2}; int main() { swap(); return 0; } • swap.c: extern int buf[]; int *bufp0 = &buf[0]; static int *bufp1; void swap() { int temp; bufp1 = &buf[1]; temp = *bufp0; *bufp0 = *bufp1; *bufp1 = temp; }

  4. Static Linking • Programs are translated and linked using a compiler driver: • unix> gcc -O2 -g -o p main.c swap.c • unix> ./p • Flow: main.c and swap.c (source files) → translators (cpp, cc1, as), run once per source file → main.o and swap.o (separately compiled relocatable object files) → linker (ld) → p (fully linked executable object file; contains code and data for all functions defined in main.c and swap.c)

  5. Static Linking • The Unix ld program takes as input a collection of relocatable object files and command-line arguments, and generates as output a fully linked executable object file that can be loaded. • A relocatable object file consists of code and data sections. • The code section contains read-only binary instructions. • The data sections contain global variables: initialized ones in .data, uninitialized ones in .bss.

  6. Three Kinds of Object Files (Modules) • Relocatable object file (.o file) • Contains code and data in a form that can be combined with other relocatable object files to form an executable object file. • Each .o file is produced from exactly one source (.c) file • Executable object file (a.out file) • Contains code and data in a form that can be copied directly into memory and then executed. • Shared object file (.so file) • Special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run time. • Called dynamic link libraries (DLLs) on Windows

  7. Executable and Linkable Format (ELF) • Standard binary format for object files • Originally proposed by AT&T System V Unix • Later adopted by BSD Unix variants and Linux • One unified format for • Relocatable object files (.o), • Executable object files (a.out) • Shared object files (.so) • Generic name: ELF binaries

  8. ELF Object File Format • ELF header • Word size, byte ordering, file type (.o, exec, .so), machine type, etc. • Segment header table • Page size, virtual addresses of memory segments (sections), segment sizes. • .text section • Code • .rodata section • Read-only data: jump tables, ... • .data section • Initialized global variables • .bss section • Uninitialized global variables • “Block Started by Symbol” • “Better Save Space” • File layout, from offset 0: ELF header, segment header table (required for executables), .text, .rodata, .data, .bss, .symtab, .rel.text, .rel.data, .debug, section header table
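
As a rough illustration of the section list above, the following sketch annotates where a typical compiler places each kind of definition. All identifiers are hypothetical, and the placements are the usual ones for GCC on Linux, not a guarantee; `readelf -S` or `objdump -h` on the resulting .o file shows what a given toolchain actually did.

```c
#include <assert.h>

int initialized_counter = 42;        /* initialized global: .data */
int zeroed_counter;                  /* uninitialized global: .bss (may appear as COMMON in a .o) */
static int private_calls;            /* uninitialized static: .bss, local linker symbol */
const int limits[3] = {10, 20, 30};  /* read-only data: .rodata */

/* The machine code of this function lands in .text. */
int bump(void)
{
    int step = 1;                    /* automatic variable: lives on the stack, in no section */
    private_calls += step;
    assert(private_calls > 0);
    return private_calls;
}
```

Variables in .bss take no space in the file; only their sizes are recorded, which is where the "Better Save Space" mnemonic comes from.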

  9. ELF Object File Format (cont.) • .symtab section • Symbol table • Procedure and static variable names • Section names and locations • .rel.text section • Relocation info for .text section • Addresses of instructions that will need to be modified in the executable • Instructions for modifying them • .rel.data section • Relocation info for .data section • Addresses of pointer data that will need to be modified in the merged executable • .debug section • Info for symbolic debugging (gcc -g) • Section header table • Offsets and sizes of each section
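
The ELF header fields named on the previous slide (word size, byte ordering, file type) sit at fixed offsets at the start of the file. The sketch below decodes them from a raw byte buffer. The struct `elf_summary` and the function `summarize_elf` are hypothetical names; the offsets and values are the standard ones (magic bytes at 0 to 3, class at byte 4, data encoding at byte 5, 16-bit file type at offset 16).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary type; field names are illustrative only. */
struct elf_summary {
    int bits;           /* 32 or 64 */
    int little_endian;  /* 1 if least-significant byte comes first */
    const char *kind;   /* "relocatable", "executable", "shared", or "other" */
};

/* Decode the leading ELF header fields.
 * Returns 0 on success, -1 if the buffer is too short or is not ELF. */
int summarize_elf(const unsigned char *buf, size_t len, struct elf_summary *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 18 || memcmp(buf, "\x7f" "ELF", 4) != 0)
        return -1;                                  /* magic number missing */
    if ((buf[4] != 1 && buf[4] != 2) || (buf[5] != 1 && buf[5] != 2))
        return -1;                                  /* invalid class or data encoding */

    out->bits = (buf[4] == 1) ? 32 : 64;            /* EI_CLASS */
    out->little_endian = (buf[5] == 1);             /* EI_DATA */

    /* e_type is stored in the file's own byte order. */
    unsigned type = out->little_endian
        ? (unsigned)buf[16] | ((unsigned)buf[17] << 8)
        : ((unsigned)buf[16] << 8) | (unsigned)buf[17];

    switch (type) {
    case 1:  out->kind = "relocatable"; break;      /* ET_REL: a .o file */
    case 2:  out->kind = "executable";  break;      /* ET_EXEC */
    case 3:  out->kind = "shared";      break;      /* ET_DYN: .so, also position-independent executables */
    default: out->kind = "other";       break;
    }
    return 0;
}
```

Reading the first 18 bytes of any .o, executable, or .so file into a buffer and passing it to `summarize_elf` gives the same information that `readelf -h` prints in its first few lines.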

  10. Building an Executable Involves Two Tasks • The linker performs two main tasks to build an executable • Symbol resolution • Object files define and reference symbols. • Symbol resolution associates each symbol reference with exactly one definition. • Relocation • The compiler and assembler generate code and data sections that start at address zero. • The linker relocates these sections by associating a memory location with each symbol definition, then updating every reference to point to that location.

  11. Linker Symbols • Global symbols • Symbols defined by module m that can be referenced by other modules. • E.g.: non-static C functions and non-static global variables. • External symbols • Global symbols that are referenced by module m but defined by some other module. • Local symbols • Symbols that are defined and referenced exclusively by module m. • E.g.: C functions and variables defined with the static attribute. • Local linker symbols are not local program variables

  12. Resolving Symbols • main.c: int buf[2] = {1, 2}; (buf: global) int main() { swap(); return 0; } (main: global; swap: external) • swap.c: extern int buf[]; (buf: external) int *bufp0 = &buf[0]; (bufp0: global) static int *bufp1; (bufp1: local) void swap() { int temp; bufp1 = &buf[1]; temp = *bufp0; *bufp0 = *bufp1; *bufp1 = temp; } (swap: global) • The linker knows nothing of temp

  13. Relocating Code and Data • Relocatable object files • System code (.text) and system data (.data) • main.o: main() in .text; int buf[2]={1,2} in .data • swap.o: swap() in .text; int *bufp0=&buf[0] in .data; static int *bufp1 in .bss • Executable object file, from address 0 • Headers • .text: system code, main(), swap(), more system code • .data: system data, int buf[2]={1,2}, int *bufp0=&buf[0] • .bss: int *bufp1 • .symtab • .debug • Even though bufp1 is private to swap.c, it requires allocation in .bss

  14. Symbol Table • It is built by the assembler from the symbols the compiler exports in the assembly-language .s file. • The assembler converts the .s file into a relocatable .o object file. • This .o file is in ELF format and contains a section called .symtab. • Each .symtab entry records one symbol's name, value (address or section offset), size, type, binding, and section.
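
For reference, a sketch of what one symbol table entry holds, modeled on the standard 64-bit ELF layout (`Elf64_Sym` in `<elf.h>` on Linux). The struct name `sym_entry`, the enum constants, and the two helper functions are hypothetical stand-ins so that the block compiles without `<elf.h>`; the field order and the binding and type encodings follow the ELF specification.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative mirror of an ELF64 symbol table entry. */
struct sym_entry {
    uint32_t name;   /* byte offset of the symbol's name in the string table */
    uint8_t  info;   /* binding in the high 4 bits, type in the low 4 bits */
    uint8_t  other;  /* visibility */
    uint16_t shndx;  /* index of the section that defines the symbol */
    uint64_t value;  /* section offset (.o file) or address (executable) */
    uint64_t size;   /* size of the object in bytes */
};

enum { BIND_LOCAL = 0, BIND_GLOBAL = 1, BIND_WEAK = 2 };   /* STB_* values */
enum { TYPE_NOTYPE = 0, TYPE_OBJECT = 1, TYPE_FUNC = 2 };  /* STT_* values */

/* Extract the binding (local, global, weak) from the packed info byte. */
int sym_binding(const struct sym_entry *s)
{
    assert(s != 0);
    return s->info >> 4;
}

/* Extract the type (data object, function, ...) from the packed info byte. */
int sym_type(const struct sym_entry *s)
{
    assert(s != 0);
    return s->info & 0x0f;
}
```

Running `readelf -s main.o` prints these same fields for each symbol in a real object file, one row per entry.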

  15. Example Symbol Table Entries

  16. Symbol Resolution • The linker associates each symbol reference with its definition. • Symbol resolution is straightforward for references to local symbols that are defined in the same module. • When the assembler finds a symbol that is not defined in the current module, it generates a symbol table entry and defers resolution to the linker. • If the linker cannot find a definition in any input module, it prints an error message.

  17. Example

  18. Symbol Resolution for Multiply Defined Global Symbols • If the same symbol is defined in multiple object files, the linker must either pick one definition or print an error. • Which of the two happens depends on the type of each symbol (strong or weak) and on the rules for multiply defined symbols. • The compiler exports each symbol to the assembler as either strong or weak.

  19. Strong and Weak Symbols • Strong Symbols • Functions and initialized global variables are Strong Symbols. • Weak Symbols • Uninitialized global variables are weak symbols

  20. Linker’s Symbol Rules • Rule 1: Multiple strong symbols with the same name are not allowed • Each item can be defined only once • Otherwise: linker error • Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol • References to the weak symbols resolve to the strong symbol • Rule 3: If there are multiple weak symbols, pick an arbitrary one • Can override this with gcc -fno-common, which turns duplicate uninitialized globals into a link error (GCC 10 and later enable it by default)
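
The three rules can be sketched as a small decision function over all the definitions of one symbol name. This is a toy model, not how ld is implemented; the types `strength` and `definition` and the function `choose_definition` are hypothetical, and "arbitrary" in Rule 3 is modeled as "the first weak definition seen".

```c
#include <assert.h>
#include <stddef.h>

enum strength { WEAK, STRONG };

/* One definition of a symbol name, as found in one object file. */
struct definition {
    const char *module;       /* hypothetical module name, e.g. "p1.o" */
    enum strength strength;
};

/* Apply the three rules to every definition of ONE symbol name.
 * Returns the index of the chosen definition,
 * -1 if more than one strong definition exists (Rule 1),
 * -2 if there are no definitions at all (unresolved reference). */
int choose_definition(const struct definition *defs, size_t n)
{
    assert(defs != NULL || n == 0);
    int chosen = -2;
    int strong_seen = 0;

    for (size_t i = 0; i < n; i++) {
        if (defs[i].strength == STRONG) {
            if (strong_seen)
                return -1;            /* Rule 1: two strong symbols */
            strong_seen = 1;
            chosen = (int)i;          /* Rule 2: strong overrides any weak */
        } else if (chosen == -2) {
            chosen = (int)i;          /* Rule 3: pick a weak one (here: the first) */
        }
    }
    return chosen;
}
```

Note that the function never compares types or sizes, and neither does a real linker: that is why a weak `double x` and a strong `int x` can silently resolve to the same storage.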

  21. Example

  22. Linker Puzzles • Module 1: int x; p1() {} Module 2: p1() {} → Link-time error: two strong symbols (p1) • Module 1: int x; p1() {} Module 2: int x; p2() {} → References to x will refer to the same uninitialized int. Is this what you really want? • Module 1: int x; int y; p1() {} Module 2: double x; p2() {} → Writes to x in p2 might overwrite y! Evil! • Module 1: int x=7; int y=5; p1() {} Module 2: double x; p2() {} → Writes to x in p2 will overwrite y! Nasty! • Module 1: int x=7; p1() {} Module 2: int x; p2() {} → References to x will refer to the same initialized variable. • Nightmare scenario: two identical weak structs, compiled by different compilers with different alignment rules.

  23. Linking with Static Libraries • The linker reads relocatable object files, links them, and generates an executable object file. • A static library is a collection of commonly used object modules. • These libraries are supplied as input to the linker. • The linker copies only the object modules that the program references and links them in. • Common functions such as atoi(), printf(), scanf(), and strcpy() are available in libc.a. • Functions such as sin(), cos(), and sqrt() are available in libm.a.

  24. Static Library Linking: Different Approaches • Approach 1: Put all functions into a single source file • Programmers link big object file into their programs • Space and time inefficient • Approach 2: Put each function in a separate source file • Programmers explicitly link appropriate binaries into their programs • More efficient, but burdensome on the programmer

  25. Solution: Static Libraries • Static libraries (.a archive files) • Concatenate related relocatable object files into a single file (called an archive) with a header that describes the size and location of each member object module. • Enhance the linker so that it tries to resolve unresolved external references by looking for the symbols in one or more archives. • The linker copies only the object modules that the program references, which reduces the size of the executable on disk and in memory. • If an archive member file resolves a reference, link it into the executable.

  26. Creating Static Libraries • Flow: atoi.c, printf.c, random.c, ... → translator (one per source file) → atoi.o, printf.o, random.o, ... → archiver (ar) → libc.a (C standard library) • unix> ar rs libc.a atoi.o printf.o … random.o • The archiver allows incremental updates • Recompile the function that changes and replace its .o file in the archive.

  27. Commonly Used Libraries • libc.a (the C standard library) • 8 MB archive of 1392 object files. • I/O, memory allocation, signal handling, string handling, date and time, random numbers, integer math • libm.a (the C math library) • 1 MB archive of 401 object files. • Floating-point math (sin, cos, tan, log, exp, sqrt, …)

  28. Symbol Resolution with Static Libraries • The programmer lists the .o files and .a files on the command line for linking: $ gcc main.c /usr/lib/libc.a /usr/lib/libm.a • The linker scans the inputs from left to right, in the order given, to resolve references • The linker maintains three sets • E: set of relocatable object files that will be merged to form the executable • U: set of unresolved references (symbols referenced but not yet defined) • D: set of symbols that have been defined in previous input files

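  29. Steps • For each input file f, the linker determines whether f is an object file or an archive • If f is an object file, the linker adds it to E, updates U and D to reflect the symbol definitions and references in f, and proceeds to the next input file • If f is an archive, the linker attempts to match the unresolved symbols in U against the symbols defined by the archive's members. Each member that defines a symbol in U is added to E, and U and D are updated; this repeats until U and D stop changing. Members that were not added are discarded. • If U is non-empty once all input files have been scanned, the linker prints an error. • If the .a files are independent of each other, they can be listed in any order • But order is important if one archive depends on another: a library must come after the files that reference its symbols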
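
The scanning steps above can be sketched as a toy simulation that tracks U, D, and the size of E. Everything here is hypothetical scaffolding (the `symset`, `object`, `input`, and `linkstate` types, the fixed capacity of 32 names, and the function names); only the algorithm follows the slide. Objects are described simply by the names they define and reference.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMS 32

/* Toy fixed-capacity set of symbol names. */
struct symset { const char *names[MAX_SYMS]; int count; };

static int set_has(const struct symset *s, const char *name)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0)
            return 1;
    return 0;
}

static void set_add(struct symset *s, const char *name)
{
    if (set_has(s, name))
        return;
    assert(s->count < MAX_SYMS);
    s->names[s->count++] = name;
}

static void set_remove(struct symset *s, const char *name)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->names[i], name) == 0) {
            s->names[i] = s->names[--s->count];
            return;
        }
}

/* One relocatable object: NULL-terminated lists of defined and referenced names. */
struct object { const char *name; const char **defs; const char **refs; };

/* One command-line input: a single object (nmembers == 1) or an archive. */
struct input { int is_archive; const struct object *members; int nmembers; };

/* U and D as on the slide; E is tracked only by its size. */
struct linkstate { struct symset U, D; int nmerged; };

/* Add object o to E and update U and D with its definitions and references. */
static void add_to_E(struct linkstate *st, const struct object *o)
{
    st->nmerged++;
    for (const char **d = o->defs; *d; d++) {
        set_add(&st->D, *d);
        set_remove(&st->U, *d);
    }
    for (const char **r = o->refs; *r; r++)
        if (!set_has(&st->D, *r))
            set_add(&st->U, *r);
}

/* Does archive member o define any symbol that is currently unresolved? */
static int resolves_something(const struct linkstate *st, const struct object *o)
{
    for (const char **d = o->defs; *d; d++)
        if (set_has(&st->U, *d))
            return 1;
    return 0;
}

/* Scan inputs left to right; return how many symbols remain in U (0 = success). */
int scan_inputs(const struct input *in, int n, struct linkstate *st)
{
    memset(st, 0, sizeof *st);
    for (int i = 0; i < n; i++) {
        if (!in[i].is_archive) {
            add_to_E(st, &in[i].members[0]);
            continue;
        }
        for (int changed = 1; changed; ) {   /* repeat until U and D stop changing */
            changed = 0;
            for (int m = 0; m < in[i].nmembers; m++)
                if (resolves_something(st, &in[i].members[m])) {
                    add_to_E(st, &in[i].members[m]);
                    changed = 1;
                }
        }
    }
    return st->U.count;
}
```

Feeding the function a `main.o` that references `printf` followed by an archive that defines it leaves U empty. Reversing the order scans the archive while U is still empty, so no member is pulled in and `printf` stays unresolved, which is the ordering problem the last two bullets describe.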

  30. Relocation • After symbol resolution, the linker performs relocation: it merges the input modules and assigns a run-time address to each symbol. • This is done in two steps • Relocating sections and symbol definitions • The linker merges all sections of the same type into a new aggregate section of that type (.data or .text) • The linker assigns a run-time memory address to each new section and to each symbol defined in it • Relocating symbol references within sections • Uses the relocation entries • The linker modifies every symbol reference in code and data so that it points to the correct run-time address
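
The second relocation step can be sketched as a function that patches one 32-bit reference, given a relocation entry and the run-time addresses chosen in the first step. The struct `reloc` and the function `apply_reloc` are hypothetical simplifications of a real ELF relocation entry; the arithmetic shown (symbol address plus addend, minus the address of the reference for PC-relative entries) is the usual form for 32-bit absolute and PC-relative relocations, and the value is written in host byte order for simplicity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relocation entry, reduced to the fields the arithmetic needs. */
struct reloc {
    uint32_t offset;       /* position of the reference within its section */
    int32_t  addend;       /* constant stored with the entry */
    int      pc_relative;  /* 1: PC-relative reference, 0: absolute address */
};

/* Patch one 32-bit reference in `section`, which will be loaded at
 * run-time address `section_addr`, to refer to `symbol_addr`. */
void apply_reloc(uint8_t *section, uint32_t section_addr,
                 const struct reloc *r, uint32_t symbol_addr)
{
    assert(section != 0 && r != 0);
    uint32_t value = symbol_addr + (uint32_t)r->addend;
    if (r->pc_relative)
        value -= section_addr + r->offset;   /* run-time address of the reference itself */
    memcpy(section + r->offset, &value, sizeof value);
}
```

As an example with made-up addresses: a PC-relative call operand at offset 0xf of a .text section placed at 0x4004d0, targeting a function at 0x4004e8 with addend -4, is patched to 0x4004e8 - 4 - 0x4004df = 5, the distance the CPU adds to the program counter when the call executes.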

  31. ELF executable object file

  32. Details of Segment header table

  33. Loading Executable Object Files • $ ./a.out or $ ./exe • The shell invokes the loader by calling the execve function (a system call). • The loader copies the code and data of the executable object file from disk into memory and runs the program from its entry point.
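
What the shell does when a program name is typed can be sketched with the POSIX `fork`, `execve`, and `waitpid` calls. The helper name `run_and_wait` is hypothetical, and real shells do considerably more (path search, job control, redirection); this only shows where the loader gets invoked.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Run the program at argv[0] in a child process.
 * Returns its exit status, or -1 if it could not be run or did not exit normally. */
int run_and_wait(char *const argv[])
{
    assert(argv != 0 && argv[0] != 0);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: execve asks the kernel to load the executable over this
         * process image and jump to its entry point. It returns only on error. */
        execve(argv[0], argv, environ);
        _exit(127);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with an argument vector such as `{"/bin/sh", "-c", "exit 7", NULL}` returns 7 on a POSIX system: the child's memory image is replaced by the shell executable, which runs and exits with that status.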

  34. Shared Libraries • Static libraries have the following disadvantages: • Duplication in the stored executables (every program needs the standard libc) • Duplication in the running executables • Minor bug fixes of system libraries require each application to be explicitly relinked • Modern solution: shared libraries • Object files that contain code and data that are loaded and linked into an application dynamically, at either load time or run time • Also called: dynamic link libraries, DLLs, .so files
