Compiling C Programs

COMP 40: Machine Structure and Assembly Language Programming (Spring 2014). Compiling C Programs. Noah Mendelsohn, Tufts University. Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah

Presentation Transcript


  1. COMP 40: Machine Structure and Assembly Language Programming (Spring 2014) Compiling C Programs Noah Mendelsohn, Tufts University Email: noah@cs.tufts.edu Web: http://www.cs.tufts.edu/~noah

  2. How do we get from source to executable program?

  3. Executable files
     • Executable file:
        • A single file with all code ready to run at a fixed address in memory
        • Typically the same address for all programs
     • Requirements
        • Code divided into multiple source files (.c files and .h files)
        • Functions in shared .c files need to show up in lots of executables
        • Often we want to share only the compiled versions (.o files) [you don’t have the source for printf() but you use it all the time]
     • The challenge
        • In different executables using the same shared code…
        • … the same functions and global variables may wind up at different addresses…
        • … but we still need to make references work across source files

  4. Resolving external references
     two_plus_one.c:
         #include <stdio.h>

         int sum(int a, int b);    /* defined in arith.c */

         int main(int argc, char *argv[])
         {
             printf("The sum is %d\n", sum(1, 2));
             return 0;
         }
     arith.c:
         int sum(int a, int b)
         {
             return a + b;
         }
     [Figure: the executable two_plus_one contains both the call to sum(1,2) and the code for sum().] How do we know where sum() wound up?

  5. From source code to executable (simplified)
     • gcc -c arith.c produces arith.o (relocatable object code for sum())
     • gcc -c two_plus_one.c produces two_plus_one.o (relocatable object code for main())

  6. Relocatable .o files
     • Contain machine code
     • References within the file are resolved
     • References to other files are not resolved
     • Some address fields may need adjusting later, depending on the final location in the executable program
     • Include lists of:
        1) Names and addresses of defined externals
        2) Names and referents of things needing relocation

  7. Linking .o files to create executable
     [Figure: gcc -o two_plus_one two_plus_one.o arith.o combines two_plus_one.o (relocatable object code for main()) and arith.o (relocatable object code for sum()) into the executable program two_plus_one.]

  8. Linking .o files to create executable
     gcc actually runs a program named “ld” to create the executable.

  9. Linking .o files to create executable
     To create the executable:
     • Code from all .o files is collected into one executable
     • A fixed load address is assumed
     • All references are resolved: address fields in code and variables are updated

  10. Linking .o files to create executable
      The executable contains all the code, with references resolved, loadable at a fixed address. It is ready to be invoked using the exec() family of system calls, or from the command line (the shell uses exec()).

  11. Linking .o files to create executable
      The default name for an executable (when gcc is not given -o) is a.out, so programmers sometimes informally refer to any executable as an “a.out”.

  12. We left out two important steps!

  13. Before the compiler even sees the code…
      …the preprocessor rewrites the code, handling all #define, #include and #ifdef directives and macro substitution. These are gone before the compiler sees the code.
      Input to the preprocessor:
          #include <stdio.h>
          #define TWO 2

          int sum(int a, int b);

          int main(int argc, char *argv[])
          {
              printf("The sum is %d\n", sum(1, TWO));
              return 0;
          }

  14. We also left out the assembler step
      • The object code in a .o is binary (not human-readable)
      • Assembly language is a human-readable form of machine code
         • Symbolic names for machine instructions
         • Symbolic labels for addresses (like variables and branch targets in code)
         • Etc.
      • When you run gcc -c it actually does three steps:
         • Run the preprocessor
         • Run the compiler itself to create an assembler file
         • Run the assembler to create a .o
      • Normally, we do these steps together, but you can use switches to run them separately

  15. Common invocations of gcc
      • gcc -c two_plus_one.c: runs the preprocessor, compiler & assembler to make two_plus_one.o
      • gcc -c arith.c: same; makes arith.o
      • gcc -o two_plus_one two_plus_one.o arith.o: uses ld to link the .o files + system libraries to make the two_plus_one executable
      • gcc -E two_plus_one.c: runs just the preprocessor
      • gcc -S two_plus_one.c: runs just the preprocessor & compiler, producing assembly language in a .s file
      • gcc -c two_plus_one.s: notices the .s extension and runs just the assembler

  16. Putting it All Together

  17. Compiling a program
      [Figure: each source file goes through the same pipeline, and the linker combines the results.]
      • two_plus_one.c (main) → Preprocessor (cpp) → preprocessed source → Compiler → assembler source → Assembler (as) → .o file
      • arith.c (sum) → Preprocessor (cpp) → preprocessed source → Compiler → assembler source → Assembler (as) → .o file
      • Both .o files → Linker (ld) → two_plus_one (executable)

  18. Shared Libraries (not required for COMP 40) (these slides on shared libraries were used in COMP 111… you may find them interesting to read)

  19. Ooops! Where does printf come from?
      Routines like printf live in libraries.
      [Figure: gcc -o two_plus_one two_plus_one.o arith.o libc.a links two_plus_one.o (relocatable object code for main()), arith.o (relocatable object code for sum()) and the library libc.a into the executable program two_plus_one.]

  20. Ooops! Where does printf come from?
      Routines like printf live in libraries. These are created with the “ar” command, which packages several .o files together into a “.a” archive, or library. You can list the .a file along with your separate .o files, and ld will pull from it any .o files it needs.

  21. Ooops! Where does printf come from?
      printf used to live in the system library named libc.a, which the compiler links into the executable automatically (so you don’t have to list it).

  22. Why shared libraries?
      • Problem: if printf is linked from libc.a, then we get a separate copy in each program that uses printf
      • Idea: what if we could have one copy, and use memory mapping to put it into every executable that needs it?
      • Challenges:
         • We can’t link it when ld builds the rest of the executable: we can only note that we need it
         • The same copy is likely to be mapped at different addresses in different programs

  23. Why shared libraries?
      We’ll use printf as an example even though it’s built in to the system… Compile the source with -fPIC to make a position-independent .o file.
      • Solution: compiler, linker and OS work together to support shared libraries
         • gcc -c -fPIC printf.c generates “position-independent code” that can load at any address
         • gcc -shared -o libc.so printf.o xxx.o obj3.o creates the shared library
         • gcc -o two_plus_one two_plus_one.o arith.o libc.so

  24. Why shared libraries?
      Link that printf.o and any other files with the -shared option to create a shared library (.so) file.

  25. Why shared libraries?
      The linker recognizes .so files… instead of including the code, it leaves a little stub that tells the OS to find and map the shared copy of the .so file when exec loads the program. (Actually, libc.so is so widely used that it’s linked automatically, so you don’t need to list it as you would your own .so libraries.)

  26. Memory mapping allows sharing of .so libraries
      libc.so (with printf code) shows up at different locations in the two programs.
      [Figure: main memory holding the operating system and two processes, Angry Birds and a browser, run by the CPU. Each process has argv/environ, a call stack, a heap (malloc’d), a mapping of libc.so, static uninitialized data, static initialized data, and text (its code).]

  27. Memory mapping allows sharing of .so libraries
      Only one copy lives in memory… everyone shares it!
      [Figure: the same memory layout of the Angry Birds and browser processes, with both processes’ libc.so regions backed by a single copy of libc.so in main memory.]

  28. Memory mapping allows sharing of .so libraries
      Memory mapping hardware can do this… but the code must be position-independent!
      [Figure: the same memory layout, with one copy of libc.so mapped into both processes at different addresses.]
