About the C compilation process and the linking of libraries

Are C libraries linked together with the object code, or first with the source code and only later with the object code? I mean, look at the compilation diagram found on the Cardiff School of Computer Science & Informatics website:
It seems "strange" that the libraries are linked only after the object code has been generated. I mean, we refer to them in the source code when we write the #includes!
So how does this actually work? Thanks!

That diagram is correct.
When you #include a header, it essentially copies that header into your file. A header is a list of types and function declarations and constants, etc., but doesn't contain any actual code (C++ and inline functions notwithstanding).
Let's have an example: library.h
int foo(int num);
library.c
int foo(int num)
{
    return num * 2;
}
yourcode.c
#include <stdio.h>
#include "library.h"
int main(void)
{
    printf("%d\n", foo(100));
    return 0;
}
When you #include library.h, you get the declaration of foo(). The compiler, at this point, knows nothing else about foo() or what it does. The compiler can freely insert calls to foo() despite this. The linker, seeing a call to foo() in yourcode.c, and seeing the code in library.c, knows that any calls to foo() should go to that code.
In other words, the compiler tells the linker what function to call, and the linker (given all the object code) knows where that function actually is.
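To make that concrete, here is roughly how you would build the example above with gcc (the exact commands are just an illustration; any C toolchain follows the same pattern):
gcc -c yourcode.c                     # compile only: produces yourcode.o, which contains an unresolved reference to foo
gcc -c library.c                      # compile only: produces library.o, which contains the code for foo
gcc yourcode.o library.o -o program   # link: the call to foo in yourcode.o is resolved against library.o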
(Thanks to Fiddling Bits for corrections.)

Includes from libraries normally contain only the library interface - so in the simplest case the .h file provided with the library contains the function declarations, and the compiled functions are in the library file. So you compile your sources against the function declarations from the library headers, and then the linker adds the compiled library functions to your executable.

It might be instructive to look at what each piece in the tool-chain does, using the boxes in your image.
pre-processor
This is really a text editor doing a bunch of substitutions (OK, really, really oversimplified). Some of the things that the pre-processor does are:
performs simple textual substitution of #defines. So if we have #define PI 3.1415 in our file and then later on we have a line such as angle = angle * PI / 180; the pre-processor will convert this line into angle = angle * 3.1415 / 180;
any time we encounter an #include, we can imagine that the pre-processor goes and gets the entire contents of that file and pastes those contents in place of the #include (and then goes back and performs the substitutions).
we can also pass options to the compiler with the #pragma directive.
Finally, we can see the results of running the pre-processor by using the -E option to gcc.
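As a quick illustration, assuming gcc and a hypothetical prog.c that uses the PI example above:
gcc -E prog.c -o prog.i    # run only the pre-processor; prog.i is the expanded source
grep 3.1415 prog.i         # the PI macro has been replaced by its value everywhere it was used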
compiler
The output of the pre-processor is still text, but it now contains everything that the compiler needs to be able to process the file. Now the compiler does a lot of things (and I normally break this box up when I describe the process). The compiler will process the text, do a lexical analysis of it, pass it to the parser that verifies that the program satisfies the grammar of the language, output an intermediate representation of the language, perform optimizations and produce assembly code.
We can see the result of running everything up to (but not including) the assembler by using the -S option to gcc.
assembler
The output of the compiler is an assembly listing, which is then passed to an assembler (most commonly gas, the GNU assembler, on Linux) that converts the assembly code into machine code. In addition, one task of the assembler is to build a list of undefined references (e.g., a library function, or a function that you wrote that is implemented in another source file).
We can see the output of the assembler by using the -c option to gcc.
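One way to actually see that list of undefined references is to inspect the object file with nm (a sketch, assuming a GNU/Linux toolchain and the library example from the first answer):
gcc -c yourcode.c
nm yourcode.o
# 0000000000000000 T main    <- defined in this object file
#                  U foo     <- referenced but undefined: the linker must find it elsewhere
#                  U printf  <- likewise, will come from the C library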
linker
The input to the linker is the output from the assembler (files typically called object files, using a .o extension), as well as various libraries. Conceptually, the linker is responsible for hooking everything together, including fixing up the calls to functions that are found in libraries. Normally, the program that performs the linking on Linux is ld, and we can see the results of linking just by running gcc without any special command-line options.
I have simplified the discussion of the linker, but I hope I have given you a flavor of what it does.
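If you are curious, you can watch gcc hand the object files over to the linker by adding -v (only a sketch; the exact command line printed differs between systems):
gcc -v yourcode.o library.o -o program
# among the output you will see the link step being invoked (via collect2/ld), passing your .o files
# plus the C runtime start files and default libraries such as -lc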
The only issue that I have with the image you referenced is that I would have moved the "Object Code" label to immediately below the assembler box, and at the same time moved the arrow labeled "Libraries" down. I feel this would make it clearer that the object code from the assembler is combined with the libraries by the linker to make an executable.


Where are the header functions defined? [duplicate]

When I include some function from a header file in a C++ program, does the entire header file's code get copied into the final executable, or is machine code generated only for the specific function? For example, if I call std::sort from the <algorithm> header in C++, is machine code generated only for the sort() function or for the entire <algorithm> header file?
I think that a similar question exists somewhere on Stack Overflow, but I have tried my best to find it (I glanced over it once, but lost the link). If you can point me to that, it would be wonderful.
You're mixing two distinct issues here:
Header files, handled by the preprocessor
Selective linking of code by the C++ linker
Header files
These are simply copied verbatim by the preprocessor into the place that includes them. All the code of algorithm is copied into the .cpp file when you #include <algorithm>.
Selective linking
Most modern linkers won't link in functions that aren't getting called in your application. I.e. write a function foo and never call it - its code won't get into the executable. So if you #include <algorithm> and only use sort here's what happens:
The preprocessor shoves the whole algorithm file into your source file
You call only sort
The linker analyzes this and only adds the code of sort (and the functions it calls, if any) to the executable. The code for the other algorithms doesn't get added.
That said, C++ templates complicate the matter a bit further. It's a complex issue to explain here, but in a nutshell - templates get expanded by the compiler for all the types that you're actually using. So if you have a vector of int and a vector of string, the compiler will generate two copies of the whole code for the vector class in your code. Since you are using it (otherwise the compiler wouldn't generate it), the linker also places it into the executable.
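The selective-linking part is easy to observe with ordinary functions too. A minimal sketch, assuming a GNU toolchain and hypothetical files: used.c defines used(), unused.c defines unused(), and main.c calls only used():
gcc -c used.c unused.c
ar rcs libextra.a used.o unused.o   # a static library is just an archive of object files
gcc main.c -L. -lextra -o prog
nm prog | grep used
# only used appears: the linker pulled used.o out of the archive because main referenced it,
# and never touched unused.o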
In fact, the entire file is copied into the .cpp file, and it depends on the compiler/linker whether it picks up only the 'needed' functions or all of them.
In general, a simplified summary:
the debug configuration tends to compile in all non-template functions,
the release configuration strips all unneeded functions.
Plus it depends on attributes: a function declared for export will never be stripped.
On the other hand, template function variants are 'generated' when used, so only the ones you explicitly use are compiled in.
EDIT: header file code isn't generated, but in most cases hand-written.
If you #include a header file in your source code, it acts as if the text in that header was written in place of the #include preprocessor directive.
Generally headers contain declarations, i.e. information about what's inside a library. This way the compiler allows you to call things for which the code exists outside the current compilation unit (e.g. the .cpp file you are including the header from). When the program is linked into an executable that you can run, the linker decides what to include, usually based on what your program actually uses. Libraries may also be linked dynamically, meaning that the executable file does not actually include the library code but the library is linked at runtime.
It depends on the compiler. Most compilers today do flow analysis to prune out uncalled functions. http://en.wikipedia.org/wiki/Data-flow_analysis

How do I get the full assembly code from a C file?

I'm currently trying to figure out the way to produce equivalent assembly code from corresponding C source file.
I've been using the C language for several years, but have little experience with assembly language.
I was able to output the assembly code using the -S option in gcc. However, the resulting assembly code contained call instructions which in turn jump to other functions like _exp. This is not what I wanted; I needed fully functional assembly code in a single file, with no dependencies on other code.
Is it possible to achieve what I'm looking for?
To better describe the problem, I'm showing you my code here:
#include <math.h>
float sigmoid(float i){
    return 1/(1+exp(-i));
}
The platform I am working on is Windows 10 64-bit, the compiler I'm using is cl.exe from MSbuild.
My initial objective was to see, at a lowest level possible, how computers calculate mathematical functions. The level where I decided to observe the calculation process is assembly code, and the mathematical function I've chosen was sigmoid defined as above.
_exp is the standard math library function double exp(double); apparently you're on a platform that prepends a leading underscore to C symbol names.
Given a .s that calls some library functions, build it the same way you would a .c file that calls library functions:
gcc foo.S -o foo -lm
You'll get a dynamic executable by default.
But if you really want all the code in one file with no external dependencies, you can link your .c into a static executable and disassemble that.
gcc -O3 -march=native foo.c -o foo -static -lm
objdump -drwC -Mintel foo > foo.s
There's no guarantee that the _exp implementation in libm.a (static library) is identical to the one you'd get in libm.so or libm.dll or whatever, because it's a different file. This is especially true for a function like memcpy where dynamic-linker tricks are often used to select an optimal version (for your CPU) at run-time.
It is not possible in general. There are exceptions, sure - I could craft one, so that means other folks can too - but it isn't an interesting program.
Normally your C program - your main() entry point - is only a fraction of the code. There is a bootstrap that contains the actual entry point the operating system uses to launch your program; it does some things that prepare your virtual memory space so that your program can run (zeroing .bss and other such things). That bootstrap is often, and arguably should be, written in assembly language (otherwise you get a chicken-and-egg problem), but it is not an assembly file you will see unless you go find the sources for the C library; you usually just get an object file as part of the toolchain, along with other compiler libraries, etc.
Then, if you make any C library calls, or write code that results in a compiler support call (performing a divide on a platform that doesn't support divide, performing floating point on a platform that doesn't have floating-point hardware, etc.), that is another object that came from some other C or assembly source that is part of the library or compiler sources, and it is not something you will see during the compile/assemble/link (the chain in toolchain) process.
So except for specifically crafted trivial programs, or specifically crafted tools for this purpose (on specific, likely bare-metal, platforms), you will not see your whole program turn into one big assembly source file before it gets assembled and then linked.
If you are not on bare metal, then there is of course the operating-system layer, which you certainly would not get to see as part of your source code. Ultimately the C library calls that need the system will have a place where they make those system calls, all compiled to objects/libraries before you use them, and the assembly sources for the operating-system side are part of some other source and build process somewhere else.
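You can at least see these extra pieces being handed to the linker, even though their source is not part of your project. A sketch, assuming gcc on Linux:
gcc -v hello.c -o hello
# the link line printed by -v includes startup objects such as crt1.o (or Scrt1.o), crti.o, crtn.o
# and libraries such as -lc, none of which came from your own source tree
gcc -print-file-name=crt1.o    # where the toolchain keeps the bootstrap object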

How do the preprocessor's and linker's jobs not interfere? [duplicate]

Okay, until this morning I was thoroughly confused between these terms. I guess I have got the difference, hopefully.
Firstly, the confusion was this: since the preprocessor already includes the header files - which contain the functions - into the code, what library functions does the linker link to the object file produced by the assembler/compiler? Part of the confusion primarily arose from my ignorance about the difference between a header file and a library.
After a bit of googling, and stack-overflowing (is that the term? :p), I gathered that the header file mostly contains the function declarations whereas the actual implementation is in another binary file called the library (I am still not 100% sure about this).
So, suppose in the following program:-
#include <stdio.h>
int main()
{
    printf("whatever");
    return 0;
}
The preprocessor includes the contents of the header file in the code. The compiler (or compiler + assembler) does its work, and then finally the linker combines this object file with another object file which actually stores how printf() works.
Am I correct in my understanding? I may be way off...so could you please help me?
Edit: I have always wondered about the C++ STL. It always confused me as to what it exactly is, a collection of all those headers or what? Now after reading the responses, can I say that STL is an object file/something that resembles an object file?
And also, I have wondered where I could read the function definitions of functions like pow(), sqrt(), etc. I would open the header files and not find anything. So, is the function definition in the library, in binary, unreadable form?
A C source file goes through two main stages, (1) the preprocessor stage where the C source code is processed by the preprocessor utility which looks for preprocessor directives and performs those actions and (2) the compilation stage where the processed C source code is then actually compiled to produce object code files.
The preprocessor is a utility that does text manipulation. It takes as input a file that contains text (usually C source code) that may contain preprocessor directives and outputs a modified version of the file by applying any directives found to the text input to generate a text output.
The file does not have to be C source code, because the preprocessor is just doing text manipulation. I have seen the C preprocessor used to extend the make utility by allowing preprocessor directives to be included in a makefile. The makefile with the C preprocessor directives is run through the C preprocessor utility and the resulting output is then fed into make to do the actual build of the make target.
Libraries and linking
A library is a file that contains object code of various functions. It is a way to package the output from several source files when they are compiled into a single file. Many times a library file is provided along with a header file (include file), typically with a .h file extension. The header file contains the function declarations, global variable declarations, as well as preprocessor directives needed for the library. So to use the library, you include the header file provided using the #include directive and you link with the library file.
A nice feature of a library file is that you are providing the compiled version of your source code and not the source code itself. On the other hand since the library file contains compiled source code, the compiler used to generate the library file must be compatible with the compiler being used to compile your own source code files.
There are two types of libraries commonly used. The first and older type is the static library. The second and more recent is the dynamic library (Dynamic Link Library or DLL in Windows and Shared Library or SO in Linux). The difference between the two is when the functions in the library are bound to the executable that is using the library file.
The linker is a utility that takes the various object files and library files and creates the executable file. When an external or global function or variable is used in a C source file, a kind of marker is used to tell the linker that the address of the function or variable needs to be inserted at that point.
The C compiler only knows what is in the source it compiles and does not know what is in other files such as object files or libraries. So the linker's job is to take the various object files and libraries and to make the final connections between parts by replacing the markers with actual connections. So a linker is a utility that "links" together the various components, replacing the marker for a global function or variable in the object files and libraries with a link to the actual object code that was generated for that global function or variable.
During the linker stage is when the difference between a static library and a dynamic or shared library becomes evident. When a static library is used, the actual object code of the library is included in the application executable. When a dynamic or shared library is used, the object code included in the application executable is code to find the shared library and connect with it when the application is run.
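As an illustration of the two kinds of library (file names are hypothetical; assuming gcc on Linux, and in a real project you would normally build only one of the two):
# static library: an archive of object files, copied into the executable at link time
gcc -c mylib.c
ar rcs libmylib.a mylib.o
gcc main.c -L. -lmylib -o app_static
# shared library: only a reference is recorded; the loader finds the code at run time
gcc -fPIC -shared mylib.c -o libmylib.so
gcc main.c -L. -lmylib -o app_shared
ldd app_shared    # lists the shared libraries the loader must locate when the program starts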
In some cases the same global function name may be used in several different object files or libraries so the linker will normally just use the first one it comes across and issue a warning about others found.
Summary of compile and link
So the basic process for a compile and link of a C program is:
preprocessor utility generates the C source to be compiled
compiler compiles the C source into object code generating a set of object files
linker links the various object files along with any libraries into an executable file
The above is the basic process; however, when using dynamic libraries it can get more complicated, especially if the application being built also generates dynamic libraries of its own.
The loader
There is also the stage of when the application is actually loaded into memory and execution starts. An operating system provides a utility, the loader, which reads the application executable file and loads it into memory and then starts the application running. The starting point or entry point for the executable is specified in the executable file so after the loader reads the executable file into memory it will then start the executable running by jumping to the entry point memory address.
One problem the linker can run into is that sometimes it may come across a marker when it is processing the object code files that requires an actual memory address. However the linker does not know the actual memory address because the address will vary depending on where in memory the application is loaded. So the linker marks that as something for the loader utility to fix when the loader is loading the executable into memory and getting ready to start it running.
With modern CPUs with hardware supported virtual address to physical address mapping or translation, this issue of actual memory address is seldom a problem. Each application is loaded at the same virtual address and the hardware address translation deals with the actual, physical address. However older CPUs or lower cost CPUs such as micro-controllers that are lacking the memory management unit (MMU) hardware support for address translation still need this issue addressed.
Entry points and the C Runtime
A final topic is the C Runtime, the main() function, and the executable entry point.
The C Runtime is object code provided by the compiler manufacturer that contains the entry point for an application that is written in C. The main() function is the entry point provided by the programmer writing the application however this is not the entry point that the loader sees. The main() function is called by the C Runtime after the application is started and the C Runtime code sets up the environment for the application.
The C Runtime is not the Standard C Library. The purpose of the C Runtime is to manage the runtime environment for the application. The purpose of the Standard C Library is to provide a set of useful utility functions so that a programmer doesn't have to create their own.
When the loader loads the application and jumps to the entry point provided by the C Runtime, the C Runtime then performs the various initialization actions needed to provide the proper runtime environment for the application. Once this is done, the C Runtime then calls the main() function so that the code created by the application developer or programmer starts to run. When the main() returns or when the exit() function is called, the C Runtime performs any actions needed to clean up and close out the application.
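On Linux with a GNU toolchain you can see this split between the real entry point and main() for yourself (a sketch; the binary must not be stripped, and the names differ on other platforms):
gcc hello.c -o hello
nm hello | grep -w -e _start -e main   # _start comes from the C Runtime, main is yours
readelf -h hello | grep Entry          # the entry point recorded in the executable corresponds to _start, not main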
This is an extremely common source of confusion. I think the easiest way to understand what's happening is to take a simple example. Forget about libraries for a moment and consider the following:
$ cat main.c
extern int foo( void );
int main( void ) { return foo(); }
$ cat foo.c
int foo( void ) { return 0; }
$ cc -c main.c
$ cc -c foo.c
$ cc main.o foo.o
The declaration extern int foo( void ) is performing exactly the same function as the header file of a library. foo.o is performing the function of the library. If you understand this example, and why neither cc main.c nor cc main.o work, then you understand the difference between header files and libraries.
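For completeness, this is roughly the failure you can expect if foo.o is left out (the exact message depends on the linker; this is typical of GNU ld):
$ cc main.o
main.o: in function `main':
main.c:(.text+0x...): undefined reference to `foo'
collect2: error: ld returned 1 exit status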
Yes, almost correct. Except that the linker does not link only object files, but also libraries - in this case, it's the C standard library (libc) that is linked to your object file. The rest of your assumptions appear to be true about the compilation stages and the difference between a header and a library.

Do I need to compile the header files in a C program?

Sometimes I see someone compile a C program like this:
gcc -o hello hello.c hello.h
As I know, we just need to put the header files into the C program like:
#include "somefile"
and compile the C program: gcc -o hello hello.c.
When do we need to compile the header files or why?
Firstly, in general:
If these .h files are indeed typical C-style header files (as opposed to being something completely different that just happens to be named with .h extension), then no, there's no reason to "compile" these header files independently. Header files are intended to be included into implementation files, not fed to the compiler as independent translation units.
Since a typical header file usually contains only declarations that can be safely repeated in each translation unit, it is perfectly expected that "compiling" a header file will have no harmful consequences. But at the same time it will not achieve anything useful.
Basically, compiling hello.h as a standalone translation unit is equivalent to creating a degenerate dummy.c file consisting only of the #include "hello.h" directive, and feeding that dummy.c file to the compiler. It will compile, but it will serve no meaningful purpose.
Secondly, specifically for GCC:
Many compilers will treat files differently depending on the file name extension. GCC has special treatment for files with .h extension when they are supplied to the compiler as command-line arguments. Instead of treating it as a regular translation unit, GCC creates a precompiled header file for that .h file.
You can read about it here: http://gcc.gnu.org/onlinedocs/gcc/Precompiled-Headers.html
So, this is the reason you might see .h files being fed directly to GCC.
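A quick way to see this special treatment in action (assuming a reasonably recent gcc; file names are just an example):
gcc hello.h       # no object file or executable is produced...
ls                # ...but hello.h.gch, a precompiled header, has appeared
gcc -H -c hello.c # with -H, gcc lists the headers it used; the precompiled hello.h.gch should be shown marked with '!'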
Okay, let's understand the difference between active and passive code.
Active code is the implementation of functions, procedures, methods, i.e. the pieces of code that should be compiled to executable machine code. We store it in .c files, and of course we need to compile it.
Passive code is not executed itself, but is needed to explain to the different modules how to communicate with each other. Usually, .h files contain only prototypes (function headers) and structures.
An exception is macros, which formally can contain active pieces, but you should understand that they are applied at a very early stage of the build (preprocessing) by simple substitution. By compile time, the macros have already been substituted into your .c file.
Another exception is C++ templates, which should be implemented in .h files. But here the story is similar to macros: they are substituted at an early stage (instantiation) and, formally, each instantiation is another type.
In conclusion, I think that if the modules are formed properly, we should never need to compile the header files.
When we include a header file like this: #include <header.h> or #include "header.h", the preprocessor takes it as input and includes the entire file in the source code: the preprocessor replaces the #include directive with the contents of the specified file.
You can check this with the -E flag to GCC, which generates the temporary .i file (the preprocessed source), or you can use the cpp program (on Linux) directly, which is invoked automatically by the compiler driver when we run GCC.
So the header actually gets compiled along with your source code; there is no need to compile it separately.
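For instance, with a small hello-world style program (assuming gcc), you can see just how much text a single #include brings in:
gcc -E hello.c -o hello.i
wc -l hello.c hello.i    # hello.c is a handful of lines; hello.i is typically thousands of lines,
                         # almost all of them the pasted-in contents of <stdio.h>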
In some systems, attempts to speed up the assembly of fully resolved '.c' files refer to the pre-assembly of include files as "compiling header files". However, it is an optimization technique that is not necessary for actual C development.
Such a technique basically computed the include statements and kept a cache of the flattened includes. Normally the C toolchain will cut-and-paste in the included files recursively, and then pass the entire item off to the compiler. With a pre-compiled header cache, the tool chain will check to see if any of the inputs (defines, headers, etc) have changed. If not, then it will provide the already flattened text file snippets to the compiler.
Such systems were intended to speed up development; however, many such systems were quite brittle. As computers sped up, and source code management techniques changed, fewer of the header pre-compilers are actually used in the common project.
Until you actually need compilation optimization, I highly recommend you avoid pre-compiling headers.
I think we do need to preprocess (maybe not call it compile) the header file, because from my understanding, during the compile stage the header file is included in the .c file. For example, in test.h we have
typedef enum{
    a,
    b,
    c
}test_t;
and in test.c we have
void foo()
{
    test_t test;
    ...
}
During the compile, I think the compiler will put the code in the header file and the .c file together, and the code in the header file will be preprocessed and substituted into the .c file. Meanwhile, we'd better define the include path in the makefile.
You don't need to compile header files. Doing so doesn't actually produce anything useful, so there's no point in trying to run the result. However, it is a good way to check for typos, mistakes and bugs, which makes things easier later.
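If checking a header for errors is all you want, gcc can do that without producing any output file; a small sketch:
gcc -fsyntax-only -x c header.h   # parse the header as C and report any errors, generating nothing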

What are the differences between a compiler and a linker?

What is the difference between a compiler and a linker in C?
The compiler converts code written in a human-readable programming language into a machine code representation which is understood by your processor. This step creates object files.
Once this step is done by the compiler, another step is needed to create a working executable that can be invoked and run - that is, to associate the function calls (for example) that your compiled code needs to invoke in order to work. For example, your code could call sprintf, which is a routine in the C standard library. Your code has nothing that performs the actual service provided by sprintf; it just records that it must be called, while the actual code resides somewhere in the common C library. To perform this linkage (and many others), the linker must be invoked. After linking, you obtain the actual executable that can run.
A compiler generates object code files (machine language) from source code.
A linker combines these object code files into an executable.
Many IDEs invoke them in succession, so you never actually see the linker at work. Some languages/compilers do not have a distinct linker and linking is done by the compiler as part of its work.
In simple words: the linker comes into play whenever a '.obj' file needs to be linked with its library functions, because the compiler doesn't understand what scanf or printf etc. actually are - the compiler just converts the '.c' file into a '.obj' file (if there are no errors) without understanding the library functions we used. So to turn the '.obj' file into an '.exe' (executable file) we need the linker, because it resolves the library functions.
