Generating Programs and Linking

Professor Rick Han

Department of Computer Science

University of Colorado at Boulder

CSCI 3753 Announcements
  • Moodle - posted last Thursday’s lecture
  • Programming shell assignment 0 due Thursday at 11:55 pm, not 11 am
  • Introduction to Operating Systems
  • Read Chapters 3 and 4 in the textbook
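Operating System Architecture

[Figure: layered diagram. Applications (App1, App2, App3) run on top of the system libraries and tools (compilers, shells, GUIs), which they reach through the POSIX, Win32, Java, and C library APIs. The libraries reach the OS “kernel” (scheduler, VM, file system, device manager) through the system call API. The kernel manages the hardware: CPU, memory, disk, display, mouse, and other I/O.]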

What is an Application?
  • A software program consists of a sequence of code instructions and data
    • for now, let a simple app = a program
  • The computer executes the instructions one at a time, in sequence
    • code instructions operate on data

[Figure: Program P1, made up of Code and Data.]

Loading and Executing a Program

[Figure: the disk holds the P1 and P2 binaries, each made up of code and data. The OS loader copies program P1's binary into main memory. The CPU, with its program counter (PC), registers, and ALU, fetches code and data from main memory and writes data back.]
Loading and Executing a Program

[Figure: the OS loader copies program P1's binary from disk into main memory. The code of the binary executable consists of machine code instructions, for example:
  • shift register R1 left by 2 and put the result in address A
  • invoke low-level system call n to the OS: syscall n
  • jump to address B]

Generating a Program’s Binary Executable
  • We write source code in a high-level language like C or Java, and use tools like compilers to create a program’s binary executable

[Figure: file P1.c (source code) → Compiler → P1.s → Assembler → P1.o → Linker → program P1’s binary executable (code and data). gcc can generate the output of any of these stages.]

  • Technically, there is a preprocessing step before the compiler.
  • “gcc -c” will generate relocatable object files and not run the linker.
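
As a rough illustration of the stages above, consider a small hypothetical P1.c. The file contents, the macro PERCENT, and the function percent_of are made up for this sketch. The gcc options in the comment (-E, -S, -c, -o) are the standard ones for stopping after each stage.

```c
/* P1.c: a hypothetical source file, used only to illustrate the stages.
 *
 *   gcc -E P1.c -o P1.i   preprocess only (expands #include and #define)
 *   gcc -S P1.c           stop after compiling, producing assembly in P1.s
 *   gcc -c P1.c           stop after assembling, producing P1.o (no linking)
 *   gcc P1.c -o P1        run every stage, including the linker
 */
#include <assert.h>   /* pulled in during the preprocessing step */

#define PERCENT 100   /* replaced by the preprocessor before compilation */

/* Code: becomes machine instructions in the .text section of P1.o. */
int percent_of(int part, int whole)
{
    assert(whole != 0);              /* guard against division by zero */
    return part * PERCENT / whole;
}
```

Since this file has no main(), "gcc -c P1.c" is the natural way to build it on its own. Linking it into an executable would need another object file that defines main().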

Linking Multiple Object Files Into an Executable

[Figure: file P1.c (source code) → Compiler (cc1) → P1.s → Assembler (as) → P1.o → Linker (ld) → P1 or P1.exe (code and data). The linker also takes foo2.o and foo3.o as input.]

  • the linker combines multiple .o object files into one binary executable file
    • why split a program into multiple objects and then relink them?
    • breaking up a program into multiple files, and compiling them separately, reduces the amount of recompilation needed when a single file is edited
      • you don’t have to recompile the entire program, just the object file of the changed source file, and then relink the object files

Linking Multiple Object Files Into an Executable

[Figure: the same compile, assemble, and link pipeline: P1.c → Compiler (cc1) → P1.s → Assembler (as) → P1.o → Linker (ld) → P1 or P1.exe, with foo2.o and foo3.o as additional linker inputs.]

  • in combining multiple object files, the linker must
    • resolve references to variables and functions defined in other object files - this is called symbol resolution
    • relocate each object’s internal addresses so that the executable’s combination of objects is consistent in its memory references
      • each object’s code and data are compiled in their own private world, starting at address zero

Linker Resolves Unknown Symbols

P1.c

    extern void f1(...);
    int globalvar1=0;

    main(...) {
        -----
        f1(...)
        -----
    }

  • the P1.o object file will contain a list of unknown symbols, e.g. f1, in a symbol table

foo2.c

    extern int globalvar1;

    void f1(...) {
        ----
    }

    void f2(...) {
        ----
        globalvar1 = 4;
        ----
    }

  • foo2.o’s symbol table lists unknown symbols, e.g. globalvar1

Linker Resolves Unknown Symbols
  • An ELF relocatable object file contains the following sections:
    • ELF header (type, size, size/# sections)
    • code (.text)
    • data (.data, .bss, .rodata)
      • .data = initialized global variables
      • .bss = uninitialized global variables (does not actually occupy space on disk, just a placeholder)
      • .rodata = read-only data, e.g. string constants
    • symbol table (.symtab)
    • relocation info (.rel.text, .rel.data)
    • debug symbol table (.debug, only if the “-g” compile flag is used)
    • line info (.line, maps C source line numbers to .text instructions, only if “-g” is used)
    • string table (.strtab, holds the names for the symbol tables)

[Figure: layout of an ELF relocatable object file, top to bottom: ELF header, .text, .rodata, .data, .bss, .symtab, .rel.text, .rel.data, .debug, .line, .strtab, section header table.]

Linker Resolves Unknown Symbols
  • Symbol table contains 3 types of symbols:
    • global symbols - defined in this object
    • global symbols referenced but not defined here
    • local symbols defined and referenced exclusively by this object, e.g. static global variables and functions
      • local symbols are not equivalent to local variables, which get allocated on the stack at run time
Linker Resolves Unknown Symbols
  • The symbol table informs the linker where symbols referenced or referenceable by each object file can be found:
    • if another file references globalvar1, then look here for info
    • if this file references f1, then another object file’s symbol table will mention f1

    extern float f1();      /* global symbol referenced here but defined elsewhere */
    int globalvar1=0;       /* global symbol defined here */

    void f2(...) {          /* global symbol defined here */
        static int x=-1;    /* “local” symbol */
        -----
    }
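
The three kinds of symbols can be seen together in one small, compilable file. This is only a sketch: the helper call_count and the body of f2 are invented for illustration, and strlen stands in for "a function defined in some other object" because the C library provides it.

```c
#include <assert.h>
#include <string.h>   /* declares strlen, which is defined in the C library */

int globalvar1 = 0;              /* global symbol defined in this object */

/* Local symbol: a static function is visible only inside this file. */
static int call_count(void)
{
    static int x = -1;           /* local symbol: kept in a data section, not on the stack */
    x = x + 1;
    return x;
}

/* Global symbol defined in this object. */
int f2(const char *s)
{
    assert(s != NULL);
    globalvar1 = call_count();   /* counts calls, starting from 0 */
    return (int)strlen(s);       /* strlen: referenced here, defined elsewhere */
}
```

After compiling with "gcc -c", a tool such as nm would typically list strlen as undefined (U), f2 and globalvar1 as global definitions, and call_count and x as local ones. The exact letters and names depend on the platform and compiler.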

Linker Resolves Unknown Symbols
  • Each entry in the ELF symbol table looks like:

    typedef struct {
        int name;        /* string table offset */
        int value;       /* section offset or VM address */
        int size;        /* object size in bytes */
        char type:4,     /* data, func, section or src file name (4 bits) */
             binding:4;  /* local or global (4 bits) */
        char reserved;   /* unused */
        char section;    /* section header index, ABS, UNDEF */
    } ELF_Symbol;

  • the section field is where we flag the undefined status (UNDEF)
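
A minimal sketch of how a linker might use such an entry to find the symbols it still has to resolve. The struct follows the simplified layout shown on the slide (with unsigned char for the bit-fields). The constants SEC_UNDEF, BIND_LOCAL, and BIND_GLOBAL and both functions are hypothetical names chosen for this example, not part of any real ELF header.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants for this sketch; real ELF headers define their own. */
enum { SEC_UNDEF = 0 };
enum { BIND_LOCAL = 0, BIND_GLOBAL = 1 };

typedef struct {
    int name;                          /* string table offset */
    int value;                         /* section offset or VM address */
    int size;                          /* object size in bytes */
    unsigned char type:4, binding:4;   /* kind of symbol; local or global */
    char reserved;                     /* unused */
    char section;                      /* section header index, or SEC_UNDEF */
} ELF_Symbol;

/* Returns 1 if the symbol is referenced in this object but defined elsewhere. */
int is_undefined(const ELF_Symbol *sym)
{
    assert(sym != NULL);
    return sym->section == SEC_UNDEF;
}

/* Counts the global symbols that must be found in other input files. */
int count_unresolved(const ELF_Symbol *table, int n)
{
    int count = 0;
    for (int i = 0; i < n; i++)
        if (table[i].binding == BIND_GLOBAL && is_undefined(&table[i]))
            count++;
    return count;
}
```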

Linker Resolves Unknown Symbols
  • During linking, the linker goes through each input object file and determines if unknown symbols are defined in other object files

[Figure: the linker takes the relocatable object files P1.o, P2.o, and P3.o, each with code, data, and a .symtab. Function f1() in P1.o is referenced but not defined, hence unknown. The linker checks the other symbol tables: defined in P2? No. Defined in P3? Yes.]

Linker Resolves Unknown Symbols
  • What if two object files use the same name for a global variable?
    • Linker resolves multiply defined global symbols
    • functions and initialized global variables are defined as strong symbols, while uninitialized global variables are weak symbols

Rule 1: multiple strong symbols are not allowed

Rule 2: choose the strong symbol over the weak symbol

Rule 3: given multiple weak symbols, choose any one
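
The three rules can be written down as a small decision function. This is a sketch under simplifying assumptions: the Definition type and choose_definition are hypothetical, the function looks at all definitions of a single symbol name at once, and "choose any one" is implemented as "choose the first".

```c
#include <assert.h>
#include <stddef.h>

/* One definition of a symbol, as seen in one object file (hypothetical model). */
typedef struct {
    const char *file;   /* object file providing the definition */
    int is_strong;      /* 1 = function or initialized global, 0 = uninitialized global */
} Definition;

/* Applies the three rules to all definitions of ONE symbol name.
 * Returns the index of the chosen definition, or -1 if linking must fail. */
int choose_definition(const Definition *defs, int n)
{
    assert(defs != NULL && n > 0);
    int chosen = 0;        /* Rule 3: only weak symbols so far, take the first */
    int strong_seen = 0;
    for (int i = 0; i < n; i++) {
        if (!defs[i].is_strong)
            continue;
        if (strong_seen)
            return -1;     /* Rule 1: multiple strong symbols are not allowed */
        strong_seen = 1;
        chosen = i;        /* Rule 2: the strong symbol beats any weak one */
    }
    return chosen;
}
```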

Linker Resolves Unknown Symbols
  • Linking with static libraries
    • Bundle many related .o files together into a single file called a library or .a file
      • e.g. the C library libc.a contains printf(), strcpy(), random(), atoi(), etc.
      • the library is created using the ar archive tool
    • the library is input to the linker as one file
    • linker can accept multiple libraries
    • linker copies only those object modules in the library that are referenced by the application program
    • Example: gcc main.c /usr/lib/libm.a /usr/lib/libc.a
Linker Resolves Unknown Symbols
  • a static library is a collection of relocatable object modules
    • group together related object modules
    • within each object, can further group related functions
    • if an application links to libfoo.a, and only calls a function in foo3.o, then only foo3.o will be linked into the program

[Figure: libfoo.a contains the object modules foo1.o, foo2.o, foo3.o, and foo4.o.]

Linker Resolves Unknown Symbols
  • Linker scans object files and libraries sequentially, left to right on the command line, to resolve unknown symbols
    • for each input file on the command line, the linker
      • updates a list of defined symbols with the object’s defined symbols
      • tries to resolve the undefined symbols (from the object and from the list of previously undefined symbols) with the list of previously defined symbols
      • carries over the lists of defined and undefined symbols to the next input file
    • so the linker searches a library only for symbols that are already on the undefined list when it reaches that library!
      • it doesn’t go back over the entire set of input files to resolve an unknown symbol
      • if a symbol is first referenced after the library that defines it has been scanned, then the linker won’t be able to resolve the symbol!
      • Thus, order on the command line is important - put libraries last!
Linker Resolves Unknown Symbols
  • Example: gcc libfoo.a main.c
    • main.c calls a function f1 defined in libfoo.a
    • scanning left to right, when linker hits libfoo.a, there are no unresolved symbols, so no object modules are copied
    • when linker hits main.c, f1 is unresolved and gets added to unresolved list
    • Since there are no more input files, the linker stops and generates a linking error:

/tmp/something.o: In function ‘main’:

/tmp/something.o: undefined reference to ‘f1’

Linker Resolves Unknown Symbols
  • Example: gcc main.c libfoo.a
    • main.c calls a function f1 defined in libfoo.a
    • scanning left to right, when linker hits main.c, it will add f1 to the list of unresolved references
    • when linker next hits libfoo.a, it will look for f1 in the library’s object modules, see that it is found, and add the object module to the linked program
    • No errors are generated. A binary executable is generated.
  • Lesson #1: the order of linking can be important, so put libraries at the end of the command line
  • Lesson #2: an undefined symbol error can also mean that you
    • didn’t link in the right libraries, or didn’t add the right library path
    • forgot to define the symbol somewhere in your code
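
The left-to-right scan and the effect of command-line order can be simulated in a few lines. This is a toy model, not how ld is implemented: the Input and SymSet types and the link_scan function are hypothetical, each input lists at most four symbols, and a library is treated as a single object module that is copied only if it defines a symbol that is currently undefined.

```c
#include <assert.h>
#include <string.h>

#define MAX_SYMS 16

/* Simplified model of one linker input (hypothetical). */
typedef struct {
    const char *name;        /* file name, for reference only */
    int is_library;          /* 1 = archive (.a), 0 = plain object (.o) */
    const char *defines[4];  /* NULL-terminated list of defined symbols */
    const char *needs[4];    /* NULL-terminated list of referenced symbols */
} Input;

typedef struct { const char *items[MAX_SYMS]; int count; } SymSet;

static int set_find(const SymSet *s, const char *sym)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->items[i], sym) == 0)
            return i;
    return -1;
}

static void set_add(SymSet *s, const char *sym)
{
    if (set_find(s, sym) >= 0)
        return;
    assert(s->count < MAX_SYMS);
    s->items[s->count++] = sym;
}

static void set_remove(SymSet *s, const char *sym)
{
    int i = set_find(s, sym);
    if (i >= 0)
        s->items[i] = s->items[--s->count];
}

/* Scans the inputs left to right; returns the number of symbols still undefined. */
int link_scan(const Input *inputs, int n)
{
    SymSet defined = {0}, undefined = {0};
    for (int i = 0; i < n; i++) {
        const Input *in = &inputs[i];
        if (in->is_library) {
            /* A library is used only if it resolves something already undefined. */
            int wanted = 0;
            for (int j = 0; j < 4 && in->defines[j]; j++)
                if (set_find(&undefined, in->defines[j]) >= 0)
                    wanted = 1;
            if (!wanted)
                continue;   /* nothing copied; the linker never comes back */
        }
        for (int j = 0; j < 4 && in->defines[j]; j++) {
            set_add(&defined, in->defines[j]);
            set_remove(&undefined, in->defines[j]);
        }
        for (int j = 0; j < 4 && in->needs[j]; j++)
            if (set_find(&defined, in->needs[j]) < 0)
                set_add(&undefined, in->needs[j]);
    }
    return undefined.count;
}
```

With libfoo.a listed before main.o, link_scan reports one unresolved symbol (f1). With main.o listed first, it reports none, matching the two gcc examples above.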
Linker Relocates Addresses
  • After resolving symbols, the linker relocates addresses when combining the different object modules
    • merges separate code .text sections into a single .text section
    • merges separate .data sections into a single .data section
    • each section is assigned a memory address
    • then each symbol reference in the code and data sections is reassigned to the correct memory address
      • looks at .rel.text and .rel.data to find the relocation entries for references that need address translation
    • these are virtual memory addresses, which are mapped to real (physical) memory addresses when the program is loaded and run
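
The address arithmetic behind merging .text sections can be sketched as follows. The ObjectText type, the relocated_address function, and the base address used in the usage note are assumptions for illustration. The sketch also ignores alignment padding, which a real linker would take into account.

```c
#include <assert.h>

/* Summary of one input object's .text section (hypothetical model). */
typedef struct {
    unsigned size;         /* bytes of code in this object's .text */
    unsigned sym_offset;   /* offset of one symbol from the start of that .text,
                              i.e. its address in the object's private world */
} ObjectText;

/* Returns the final address of object `which`'s symbol once the .text
 * sections are laid out back to back, starting at address `base`. */
unsigned relocated_address(const ObjectText *objs, int n, int which, unsigned base)
{
    assert(which >= 0 && which < n);
    unsigned section_start = base;
    for (int i = 0; i < which; i++)
        section_start += objs[i].size;   /* skip the objects placed earlier */
    return section_start + objs[which].sym_offset;
}
```

For example, with three objects of 0x100, 0x80, and 0x40 bytes merged at an assumed base of 0x400000, a symbol at offset 0x20 in the second object ends up at 0x400120. Every reference to that symbol listed in the relocation entries would then be patched to this address.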
Linked ELF Executable Object File
  • An ELF executable object file contains the following sections:
    • ELF header (type, size, size/# sections)
    • segment header table
    • .init (initialization code that runs at program startup; the program’s entry point, i.e. the address of the first instruction, is recorded in the ELF header)
    • other sections are similar to those of the relocatable object file
    • Note the absence of .rel.text and .rel.data - they’ve been relocated!
  • Ready to be loaded into memory and run
    • only sections through .bss are loaded into memory
    • .symtab and below are not loaded into memory
    • code section is read-only
    • .data and .bss are read/write

[Figure: layout of an ELF executable object file, top to bottom: ELF header, segment header table, .init, .text, .rodata, .data, .bss, .symtab, .debug, .line, .strtab, section header table.]

Loading Executable Object Files
  • Run-time memory image
  • Essentially code, data, stack, and heap
  • Code and data loaded from executable file
  • Stack grows downward, heap grows upward

[Figure: run-time memory, top to bottom: user stack, unallocated space, heap, read/write segment (.data, .bss), read-only segment (.init, .text, .rodata).]
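
One way to look at these regions from inside a running program is to collect one sample address from each. This is a rough sketch: the Layout type, sample_layout, and both globals are hypothetical names, and the actual addresses and their relative order depend on the platform, so the code makes no claim about which is higher or lower.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

int initialized_global = 42;   /* lives in .data (read/write) */
int uninitialized_global;      /* lives in .bss (read/write) */

/* One sample address from each region of the run-time memory image. */
typedef struct {
    uintptr_t code, data, bss, heap, stack;
} Layout;

Layout sample_layout(void)
{
    int local = 0;               /* lives on the user stack */
    void *block = malloc(16);    /* lives on the heap */
    assert(block != NULL);

    Layout l;
    l.code  = (uintptr_t)&sample_layout;         /* read-only code */
    l.data  = (uintptr_t)&initialized_global;
    l.bss   = (uintptr_t)&uninitialized_global;
    l.heap  = (uintptr_t)block;
    l.stack = (uintptr_t)&local;

    free(block);                 /* only the address was needed */
    return l;
}
```

Printing the five fields from a small main() and comparing them with the figure is a quick way to see where a given system places each region.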