Is linker used for simple C programs? - c

Suppose I have a C source file which does not contain any reference to any other file. You may assume it only contains -
int main(void) {
int a=5, b=10;
}
Will this source file go to the linker? What will be the task of the linker in this case?

It will because linker will be invoked to form the runnable executable. No matter it's one source file or many, each translation unit will be first compile to object file, and then linked against the C's runtime to form the executable program. So even you see only one source file, it is still linked to the runtime by the linker.

The linker is Always needed, also if you don't use any explicit library. Any program needs anyway to include in his binary the OS basic startup instructions, and the linker add them to your executable

Related

How do the preprocessor's and linker's jobs not interfere? [duplicate]

Okay, until this morning I was thoroughly confused between these terms. I guess I have got the difference, hopefully.
Firstly, the confusion was that since the preprocessor already includes the header files into the code which contains the functions, what library functions does linker link to the object file produced by the assembler/compiler? Part of the confusion primarily arose due to my ignorance about the difference between a header file and a library.
After a bit of googling, and stack-overflowing (is that the term? :p), I gathered that the header file mostly contains the function declarations whereas the actual implementation is in another binary file called the library (I am still not 100% sure about this).
So, suppose in the following program:-
#include<stdio.h>
int main()
{
printf("whatever");
return 0;
}
The preprocessor includes the contents of the header file in the code. The compiler/compiler+assembler does its work, and then finally linker combines this object file with another object file which actually has stored the way printf() works.
Am I correct in my understanding? I may be way off...so could you please help me?
Edit: I have always wondered about the C++ STL. It always confused me as to what it exactly is, a collection of all those headers or what? Now after reading the responses, can I say that STL is an object file/something that resembles an object file?
And also, I thought where I could read the function definitions of functions like pow(), sqrt() etc etc. I would open the header files and not find anything. So, is the function definition in the library in binary unreadable form?
A C source file goes through two main stages, (1) the preprocessor stage where the C source code is processed by the preprocessor utility which looks for preprocessor directives and performs those actions and (2) the compilation stage where the processed C source code is then actually compiled to produce object code files.
The preprocessor is a utility that does text manipulation. It takes as input a file that contains text (usually C source code) that may contain preprocessor directives and outputs a modified version of the file by applying any directives found to the text input to generate a text output.
The file does not have to be C source code because the preprocessor is doing text manipulation. I have seen the C Preprocssor used to extend the make utility by allowing preprossor directives to be included in a make file. The make file with the C Preprocessor directives is run through the C Preprocessor utility and the resulting output then fed into make to do the actual build of the make target.
Libraries and linking
A library is a file that contains object code of various functions. It is a way to package the output from several source files when they are compiled into a single file. Many times a library file is provided along with a header file (include file), typically with a .h file extension. The header file contains the function declarations, global variable declarations, as well as preprocessor directives needed for the library. So to use the library, you include the header file provided using the #include directive and you link with the library file.
A nice feature of a library file is that you are providing the compiled version of your source code and not the source code itself. On the other hand since the library file contains compiled source code, the compiler used to generate the library file must be compatible with the compiler being used to compile your own source code files.
There are two types of libraries commonly used. The first and older type is the static library. The second and more recent is the dynamic library (Dynamic Link Library or DLL in Windows and Shared Library or SO in Linux). The difference between the two is when the functions in the library are bound to the executable that is using the library file.
The linker is a utility that takes the various object files and library files to create the executable file. When an external or global function or variable is used the C source file, a kind of marker is used to tell the linker that the address of the function or variable needs to be inserted at that point.
The C compiler only knows what is in the source it compiles and does not know what is in other files such as object files or libraries. So the linker's job is to take the various object files and libraries and to make the final connections between parts by replacing the markers with actual connections. So a linker is a utility that "links" together the various components, replacing the marker for a global function or variable in the object files and libraries with a link to the actual object code that was generated for that global function or variable.
During the linker stage is when the difference between a static library and a dynamic or shared library becomes evident. When a static library is used, the actual object code of the library is included in the application executable. When a dynamic or shared library is used, the object code included in the application executable is code to find the shared library and connect with it when the application is run.
In some cases the same global function name may be used in several different object files or libraries so the linker will normally just use the first one it comes across and issue a warning about others found.
Summary of compile and link
So the basic process for a compile and link of a C program is:
preprocessor utility generates the C source to be compiled
compiler compiles the C source into object code generating a set of object files
linker links the various object files along with any libraries into executable file
The above is the basic process however when using dynamic libraries it can get more complicated especially if part of the application being generated has dynamic libraries that it is generating.
The loader
There is also the stage of when the application is actually loaded into memory and execution starts. An operating system provides a utility, the loader, which reads the application executable file and loads it into memory and then starts the application running. The starting point or entry point for the executable is specified in the executable file so after the loader reads the executable file into memory it will then start the executable running by jumping to the entry point memory address.
One problem the linker can run into is that sometimes it may come across a marker when it is processing the object code files that requires an actual memory address. However the linker does not know the actual memory address because the address will vary depending on where in memory the application is loaded. So the linker marks that as something for the loader utility to fix when the loader is loading the executable into memory and getting ready to start it running.
With modern CPUs with hardware supported virtual address to physical address mapping or translation, this issue of actual memory address is seldom a problem. Each application is loaded at the same virtual address and the hardware address translation deals with the actual, physical address. However older CPUs or lower cost CPUs such as micro-controllers that are lacking the memory management unit (MMU) hardware support for address translation still need this issue addressed.
Entry points and the C Runtime
A final topic is the C Runtime and the main() and the executable entry point.
The C Runtime is object code provided by the compiler manufacturer that contains the entry point for an application that is written in C. The main() function is the entry point provided by the programmer writing the application however this is not the entry point that the loader sees. The main() function is called by the C Runtime after the application is started and the C Runtime code sets up the environment for the application.
The C Runtime is not the Standard C Library. The purpose of the C Runtime is to manage the runtime environment for the application. The purpose of the Standard C Library is to provide a set of useful utility functions so that a programmer doesn't have to create their own.
When the loader loads the application and jumps to the entry point provided by the C Runtime, the C Runtime then performs the various initialization actions needed to provide the proper runtime environment for the application. Once this is done, the C Runtime then calls the main() function so that the code created by the application developer or programmer starts to run. When the main() returns or when the exit() function is called, the C Runtime performs any actions needed to clean up and close out the application.
This is an extremely common source of confusion. I think the easiest way to understand what's happening is to take a simple example. Forget about libraries for a moment and consider the following:
$ cat main.c
extern int foo( void );
int main( void ) { return foo(); }
$ cat foo.c
int foo( void ) { return 0; }
$ cc -c main.c
$ cc -c foo.c
$ cc main.o foo.o
The declaration extern int foo( void ) is performing exactly the same function as the header file of a library. foo.o is performing the function of the library. If you understand this example, and why neither cc main.c nor cc main.o work, then you understand the difference between header files and libraries.
Yes, almost correct. Except that the linker does not links object files, but also libraries - in thise case, it's the C standard library (libc) is what is linked to your object file. The rest of your assumptions appear to be true about the compilation stages + difference between a header and a library.

Does the linker refer to the main code

Let assume I am having three source files main.c, a.c and b.c. In the main.c are called some of the functions (not all) that are defined in a.c. None of the functions defined in b.c are called (used) by main.c. In main.c is the main function. Then we have a makefile that compiles all the source files(main.c, a.c and b.c) and then links them to produce executable file, in my case intel hex file. My question is: Does the linker know in which file the main function resides and knowing that to determine what part of the object files to link together? I mean if the linker produces the exe file based only on the recipe of the rule to make the target then no matter how many functions are called in our application code the size of the executable will be the same because the recipe says to link all the object files. For example we compile the three source files and we get three object files: main.o a.o and b.o (the bigger the object files are, the bigger the exe file is). I know you would say if you dont want anything from the b.c then do not include it in the build. But it means that every time I want to change the application (include/exclide modules) I need to change the makefile too. And another thing is how the linker knows what part of the object file to take, does it understand the C language? I hope you understand my question, excuse my bad English.
1) Does the linker know in which file the main function resides and knowing that to determine what part of the object files to link together?
Maybe there are options of your toolchain (compiler/linker) to enable this kind of optimizations, I mean removing unused functions from link, but I have big doubt for global functions (could be possible for static functions).
2) And another thing is how the linker knows what part of the object file to take, does it understand the C language?
Linker may detect if a function or variable is not used by the application (once again, check the available options), but it is not really the objective of this tool. However if you compile/link some functions as library functions (see options), you can generate a "library" file and then link this library with other object files. The functions of the library will then be included by the linker ONLY if they are used.
What I suggest: use compilation flags (#ifdef...) to include or exclude parts of code from compilation/link.
If you want only those functions in the executable that are eventually called from main, use a library of object files.
Basically the smallest unit the linker will extract from a library is the object file. Whatever symbols are in that object file will also be resolved, until all symbols are resolved.
In other words, if none of the symbols in an object file are needed, it won't end up in the result. If at least one symbol is needed, it will get linked in its entirety.
No, the linker does not understand C. Note that a lot of language compilers create object files (C++, FORTRAN, ..., and assemblers). A linker resolves symbols, which are names attached to values.
John Levine has written a book, "Linkers and Loaders", available on the 'net, which will give you an in-depth understanding of linkers, symbols, and object files.

How the compiler knows where my main function is?

I am working on a project that contains multiple modules (source files, header files, libraries). One of the files in all that soup contains my main function.
My questions are:
How the compiler knows which modules to compile and which not?
How does the compiler recognize the module with the main() inside?
The compiler itself doesn't care about what file contains which functions; main() is not special. However, in the linking stage, all these symbols from different files (and compilation units, possibly) are matched. The linker has a hidden "template" which has code at a fixed address that the OS will always call when you run a program. That code will call your main; hence, the linker looks for a main in all files. If it isn't there, you get an unresolved symbol error, exactly like if you used a function that you forgot to implement.
The same as for any other function applies to main: You can only have one implementation; having two main in two files that get linked together, you get a linker error, because the linker can't decide which of these to use.
How the compiler knows which modules to compile and which not?
It does not. You tell him which ones you want to compile, typically though the compilation statement(s) present in a makefile.
How does the compiler recognize the module with the main() inside?
Altogether it's a big process, already answered in this related question.
To summarize, while compiling a program with standard C library, the entry point of your program is set to _start. Now that has a reference to main() function internally. So, at compilation time, there is no (need for) checking the presence of main(). At linking time, linker should be able to locate one instance of main() which it can link to. That way, main() will serve as the entry point to your program.
So, to answer
How the compiler knows where my main function is?
It does (and need) not. It's the job of a linker, specifically.
The assembly code (often referred as startup code by embedded people) that starts up the program specifically calls main().
The prototype for main() is included in compiler documentation.
When you compile a program, an object file is produced. The object file from your source code is then linked with a startup runtime component (usually called crt0.o[bj]) and the C library components, etc.
If main() is changed to an unrecognizable signature, the compilation unit will complain about an unresolved external reference to _main or __main.

About C compilation process and the linking of libraries

Are C libraries linked with object code or first with source code so only later with object code? I mean, look at the image found at Cardiff School of Computer Science & Informatics's website
:
It's "strange" that after generating object-code the libraries are being linked. I mean, we use the source code while putting the includes!
So.. How this actually works? Thanks!
That diagram is correct.
When you #include a header, it essentially copies that header into your file. A header is a list of types and function declarations and constants, etc., but doesn't contain any actual code (C++ and inline functions notwithstanding).
Let's have an example: library.h
int foo(int num);
library.c
int foo(int num)
{
return num * 2;
}
yourcode.c
#include <stdio.h>
#include "library.h"
int main(void)
{
printf("%d\n", foo(100));
return 0;
}
When you #include library.h, you get the declaration of foo(). The compiler, at this point, knows nothing else about foo() or what it does. The compiler can freely insert calls to foo() despite this. The linker, seeing a call to foo() in youcode.c, and seeing the code in library.c, knows that any calls to foo() should go to that code.
In other words, the compiler tells the linker what function to call, and the linker (given all the object code) knows where that function actually is.
(Thanks to Fiddling Bits for corrections.)
Includes from libraries normally contain only library interface - so in the simplest case the .h file provided with the library contains function declaration, and the compiled function is in the library file. So you compile the sources with provided library functions declarations from library headers, and then linker adds the compiler library functions to your executable.
It might be instructive to look at what each piece in the tool-chain does, so using the boxes in your image.
pre-processor
This is really a text-editor doing a bunch of substitutions (ok, really really oversimplified). Some of the things that the pre-processor does is:
performs simple textual based substitution on #defines. So if we have #define PI 3.1415 in our file and then later on we have a line such as angle = angle * PI / 180; the pre=processor will convert this line into angle = angle * 3.1414 / 180;
anytime we encounter an #include, we can imagine that the pre-processor goes and gets the entire contents of that file and pastes the contents on the file on to where the #include is. (and then we go back and perform the substitutions.
we can also pass options to the compiler with the #pragma directive.
Finally, we can see the results of running the pre-processor by using the -E option to gcc.
compiler
The output of the pre-processor is still text, and it not contains everything that the compiler needs to be able to process the file. Now the compiler does a lot of things (and I normally break the box up when I describe this process). The compiler will process the text, do a lexical analysis of it, pass it to the parser that verifies that the program satisfies the grammar of the language, output an intermediate representation of the language, perform optimization and produce assembly code.
We can see the results of running up to the assembler by using the -s option to gcc.
assembler
The output of the compiler is an assembly listing, which is then passed to an assembler (most commonly `gas' (GNU assembler) on Linux), that converts the assembly code into machine code. In addition, on task of the assembler is to build a list of undefined referenced (i.e. a library function of a function that you wrote that is implemented in another source file.)
We can see the results of getting the output of the assembler by using the -c option to gcc.
linker
The input to the linker will be the output from the assembler (typically called object files and use an extention 'o'), as well as various libraries. Conceptually, the linker is responsible for hooking everything together, including fixing up the calls to functions that are found in libraries. Normally, the program that performs the linking in Linux is ld, and we can see the results of linking just by running gcc without any special command line options.
I have simplified the discussion of the linker, I hope I gave you a flavor of what the linker does.
The only issue that I have with the image you referenced, is that I would have move the phase "Object Code" to immediately below the assembler box, and at the same time I would move the arrow labeled "Libraries" down. I feel that this would indicate that the object code from the assembler is combined with libraries and these are combined by the linker to make an executable.
The Compilation Process of C with

What are the differences between a compiler and a linker?

What is the difference between a compiler and a linker in C?
The compiler converts code written in a human-readable programming language into a machine code representation which is understood by your processor. This step creates object files.
Once this step is done by the compiler, another step is needed to create a working executable that can be invoked and run, that is, associate the function calls (for example) that your compiled code needs to invoke in order to work. For example, your code could call sprintf, which is a routine in the C standard library. Your code has nothing that does the actual service provided by sprintf, it just reports that it must be called, but the actual code resides somewhere in the common C library. To perform this (and many others) linkages, the linker must be invoked. After linking, you obtain the actual executable that can run.
A compiler generates object code files (machine language) from source code.
A linker combines these object code files into an executable.
Many IDEs invoke them in succession, so you never actually see the linker at work. Some languages/compilers do not have a distinct linker and linking is done by the compiler as part of its work.
In Simple words -> Linker comes into act whenever a '.obj' file needs to be linked with its library functions as compiler doesn't understand what is (scanf or printf..etc) , compiler just converts '.c' file to '.obj' file if there's no error without understanding library functions we used. So To make 'obj' file to 'exe'(executable file) we need linker because it makes compiler understand of library functions.

Resources