How does the compiler know where my main function is? - c

I am working on a project that contains multiple modules (source files, header files, libraries). One of the files in all that soup contains my main function.
My questions are:
How does the compiler know which modules to compile and which not?
How does the compiler recognize the module with the main() inside?

The compiler itself doesn't care which file contains which functions; to the compiler, main() is not special. In the linking stage, however, the symbols from all the different files (and compilation units, possibly) are matched up. The linker has a hidden "template" containing startup code at a fixed address that the OS calls when you run the program. That code calls your main; hence, the linker looks for a main in all the files. If it isn't there, you get an unresolved symbol error, exactly as if you had used a function that you forgot to implement.
The same rule applies to main as to any other function: you can have only one implementation. If two files that get linked together each define main, you get a linker error, because the linker can't decide which one to use.
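To make the "hidden template" idea concrete, here is a minimal sketch that simulates the startup code in plain C. The names program_main and simulated_start are hypothetical; real startup code is toolchain-specific, usually written in assembly, and does considerably more setup than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the program's main(). */
int program_main(int argc, char **argv)
{
    (void)argv;
    return argc - 1;   /* 0 when only the program name was passed */
}

/* Simulated startup code: build the argument vector, call the entry
   function, and hand its result back as the process exit status. */
int simulated_start(int (*entry)(int, char **))
{
    assert(entry != NULL);   /* a real linker reports a missing entry as an unresolved symbol */
    char name[] = "prog";
    char *argv[] = { name, NULL };
    return entry(1, argv);
}
```

Calling simulated_start(program_main) plays the role of the OS jumping into the startup code, which then calls into your own code. In a real build the reference to main is fixed up by the linker rather than passed as a pointer.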

How does the compiler know which modules to compile and which not?
It does not. You tell it which ones you want to compile, typically through the compilation statement(s) present in a makefile.
How does the compiler recognize the module with the main() inside?
Altogether it's a big process, already answered in this related question.
To summarize: when a program is compiled against the standard C library, its entry point is set to _start, which in turn contains a reference to the main() function. So, at compilation time, there is no check (and no need to check) for the presence of main(). At link time, the linker must be able to locate one definition of main() that it can link to. That way, main() serves as the entry point to your own code.
So, to answer
How does the compiler know where my main function is?
It does not, and it need not. That is specifically the linker's job.

The assembly code that starts up the program (often referred to as startup code by embedded people) specifically calls main().
The prototype for main() is included in compiler documentation.
When you compile a program, an object file is produced. The object file from your source code is then linked with a startup runtime component (usually called crt0.o[bj]) and the C library components, etc.
If main() is missing, or renamed to something the startup code does not reference, the link step will complain about an unresolved external reference to _main or __main.

Related

Does the linker refer to the main code

Let's assume I have three source files: main.c, a.c and b.c. main.c calls some (not all) of the functions defined in a.c. None of the functions defined in b.c are called by main.c. The main function is in main.c. Then we have a makefile that compiles all the source files (main.c, a.c and b.c) and links them to produce an executable file, in my case an Intel HEX file.
My question is: does the linker know which file the main function resides in, and does it use that to determine which parts of the object files to link together? If the linker produces the executable based only on the recipe of the rule that makes the target, then no matter how many functions our application code calls, the size of the executable will be the same, because the recipe says to link all the object files. For example, we compile the three source files and get three object files: main.o, a.o and b.o (the bigger the object files are, the bigger the executable is).
I know you would say that if I don't want anything from b.c, I should not include it in the build. But that means that every time I want to change the application (include/exclude modules) I need to change the makefile too. Another thing: how does the linker know which part of an object file to take? Does it understand the C language? I hope you understand my question; excuse my bad English.
1) Does the linker know in which file the main function resides and knowing that to determine what part of the object files to link together?
Maybe your toolchain (compiler/linker) has options that enable this kind of optimization, that is, removing unused functions at link time, but I have serious doubts that it works for global functions (it could be possible for static functions).
2) And another thing is how the linker knows what part of the object file to take, does it understand the C language?
The linker may detect that a function or variable is not used by the application (once again, check the available options), but that is not really the purpose of this tool. However, if you compile/link some functions as library functions (see options), you can generate a "library" file and then link this library with the other object files. The functions of the library will then be included by the linker ONLY if they are used.
What I suggest: use compilation flags (#ifdef...) to include or exclude parts of code from compilation/link.
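As a rough sketch of the #ifdef suggestion: the flag name ENABLE_MODULE_B and the functions below are made up for illustration. Building with -DENABLE_MODULE_B would compile the module's code in; without the flag it never reaches the object file at all.

```c
#include <assert.h>

/* ENABLE_MODULE_B is a hypothetical flag, set with -DENABLE_MODULE_B. */
#ifdef ENABLE_MODULE_B
int module_b_scale(int x) { return 2 * x; }
#endif

int process(int x)
{
#ifdef ENABLE_MODULE_B
    return module_b_scale(x);   /* compiled only when the flag is set */
#else
    return x;                   /* module B is absent from the object file */
#endif
}

/* Small self-check showing which variant was compiled in. */
void self_check(void)
{
#ifdef ENABLE_MODULE_B
    assert(process(21) == 42);
#else
    assert(process(21) == 21);
#endif
}
```

The trade-off is that the choice is made per build, so switching modules means recompiling with different flags rather than editing the list of files in the makefile.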
If you want only those functions in the executable that are eventually called from main, use a library of object files.
Basically the smallest unit the linker will extract from a library is the object file. Whatever symbols are in that object file will also be resolved, until all symbols are resolved.
In other words, if none of the symbols in an object file are needed, it won't end up in the result. If at least one symbol is needed, it will get linked in its entirety.
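The "whole object file or nothing" rule can be sketched as a small simulation. This is a toy model, not how any particular linker is implemented; struct member, pull_for_symbol and the symbol names used with them are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_SYMS 4

/* Hypothetical model of one object file inside a static library. */
struct member {
    const char *name;
    const char *defines[MAX_SYMS];  /* symbols it defines, NULL-terminated */
    const char *needs[MAX_SYMS];    /* symbols it references, NULL-terminated */
    bool pulled;                    /* true once it is part of the link */
};

static bool defines_symbol(const struct member *m, const char *sym)
{
    for (int i = 0; i < MAX_SYMS && m->defines[i]; i++)
        if (strcmp(m->defines[i], sym) == 0)
            return true;
    return false;
}

/* Pull in the member that defines `sym`, then resolve whatever that member
   itself references. Returns the number of members newly pulled in. */
int pull_for_symbol(struct member *lib, int n, const char *sym)
{
    assert(lib != NULL && sym != NULL);
    for (int i = 0; i < n; i++) {
        if (!defines_symbol(&lib[i], sym))
            continue;
        if (lib[i].pulled)
            return 0;               /* already linked in */
        lib[i].pulled = true;       /* the whole object file comes along */
        int count = 1;
        for (int j = 0; j < MAX_SYMS && lib[i].needs[j]; j++)
            count += pull_for_symbol(lib, n, lib[i].needs[j]);
        return count;
    }
    return 0;                       /* not defined in this library */
}
```

With a library holding a.o (defining a_used and a_unused, needing helper), b.o (defining b_func) and util.o (defining helper), asking for a_used pulls in a.o and util.o. a_unused comes along because it lives in a.o, while b.o is left out entirely.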
No, the linker does not understand C. Note that a lot of language compilers create object files (C++, FORTRAN, ..., and assemblers). A linker resolves symbols, which are names attached to values.
John Levine has written a book, "Linkers and Loaders", available on the 'net, which will give you an in-depth understanding of linkers, symbols, and object files.

Compiler warning not generated for multiple definitions

The problem I am facing is that a function with the same signature is defined in two .c files, and this does not give a compile-time error. I have included the declaration in a .h file, which is included in both .c files.
For example:
int add(int x, int y) { return x+y;}
The same definition is given in two .c files (say A.c and B.c), with the declaration in one .h file that is included in both A.c and B.c. Why does this not give a compile-time error, and how can I make it give one?
Even the linker does not give any error; it looks like it is taking the first definition.
I am using the GCC compiler (MinGW).
I found another pattern in this.
If I use this in the header file:
#ifndef H_H_
#define H_H_
the linker does not give a warning, but if I don't use it, the linker gives a warning, which is expected.
This situation is undefined behaviour with no diagnostic required.
Consult your linker's documentation to see if it has any options to report multiple definition of functions.
The compiler doesn't analyze your program as a whole. It simply processes one .c file at a time. If the declaration in the .h file matches the definition in the .c file, then everything is good as far as the compiler is concerned.
The linker will detect that the function was defined twice and will generate a "duplicate symbol" error.
The compiler sees each source file separately from the others. It includes the contents of the header file(s) into A.c and then generates an object file A.obj from A.c. A.obj contains the symbols of the variables and functions defined in A.c. The compiler then processes B.c on its own, without checking the contents of A.c or any other source file. It again starts by including the header file(s) into B.c and then generates B.obj, which likewise contains the symbols of the variables and functions defined in B.c.
As a result, you will not get errors at compile time, because the compiler cannot detect the duplicated function. It is the linker's job to check that the symbols are consistent and that there are no duplicates. The linker takes all the generated object files in order to produce an executable, and it must assign a unique memory address to each symbol. For example, if at some point in your code (say, in the main function) a function from A.c is called, that call is translated into a jump to the address in memory where the function is located. Now imagine that two functions with the same name coexist in the executable, each at a different address. How could the call be resolved to the one you intended? For that reason, if the linker finds a duplicated symbol, it signals an error.
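The duplicate check described above can be sketched with a toy symbol table. This is only an illustration under simplifying assumptions: struct symtab, symtab_define and the addresses are invented, and a real linker's tables also track sections, sizes, weak symbols and more.

```c
#include <assert.h>
#include <string.h>

#define TABLE_CAPACITY 16

/* Hypothetical, much-simplified global symbol table of a linker. */
struct symtab {
    const char *names[TABLE_CAPACITY];
    unsigned    addrs[TABLE_CAPACITY];
    int         count;
};

enum add_result { ADD_OK, ADD_DUPLICATE, ADD_FULL };

/* Record that `name` is defined at `addr`; refuse a second definition. */
enum add_result symtab_define(struct symtab *t, const char *name, unsigned addr)
{
    assert(t != NULL && name != NULL);
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return ADD_DUPLICATE;   /* two object files define the same name */
    if (t->count == TABLE_CAPACITY)
        return ADD_FULL;
    t->names[t->count] = name;
    t->addrs[t->count] = addr;
    t->count++;
    return ADD_OK;
}
```

Defining add once from A.obj succeeds; a second definition of add from B.obj is rejected. That rejection corresponds to the "duplicate symbol" or "multiple definition" error you would expect at link time.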
As @Matt-McNabb says: consult your linker's documentation.
The only other cause I can come up with is that the linker compares the binary code of the two functions, finds they are identical, and ignores one. You can check this by slightly changing the code in one of the files, for example to 'return y+x'.

What is responsible for ensuring all symbols are known/defined?

Is it the C preprocessor, compiler, or linkage editor?
To tell you the truth, it is the programmer.
The answer you are looking for is... it depends. Sometimes it's the compiler, sometimes it's the linker, and sometimes it doesn't happen until the program is loaded.
The preprocessor:
handles directives for source file inclusion (#include), macro definitions (#define), and conditional inclusion (#if).
...
The language of preprocessor directives is agnostic to the grammar of C, so the C preprocessor can also be used independently to process other kinds of text files.
The linker:
takes one or more objects generated by a compiler and combines them into a single executable program.
...
Computer programs typically comprise several parts or modules; all these parts/modules need not be contained within a single object file, and in such case refer to each other by means of symbols. Typically, an object file can contain three kinds of symbols:
defined symbols, which allow it to be called by other modules,
undefined symbols, which call the other modules where these symbols are defined, and
local symbols, used internally within the object file to facilitate relocation.
When a program comprises multiple object files, the linker combines these files into a unified executable program, resolving the symbols as it goes along.
In environments which allow dynamic linking, it is possible that executable code still contains undefined symbols, plus a list of objects or libraries that will provide definitions for these.
The programmer must make sure everything is defined somewhere. The programmer is RESPONSIBLE for doing so.
Various tools will complain along the way if they notice anything missing:
The compiler will notice certain things missing, and will error out if it can realize that something's not there.
The linker will error out if it can't fix up a reference that's not in a library somewhere.
At run time there is a loader that pulls the relevant shared libraries into the process's memory space. The loader is the last thing that gets a crack at fixing up symbols before the program gets to run any code, and it will throw errors if it can't find a shared library/dll, or if the interface for the library that was used at link-time doesn't match up correctly with the available library.
None of these tools is RESPONSIBLE for making sure everything is defined. They are just the things that will notice if things are NOT defined, and will be the ones throwing the error message.
For symbols with internal linkage or no linkage: the compiler.
For symbols with external linkage: the linker, either the "traditional" one, or the runtime linker.
Note that the dynamic/runtime linker may choose to do its job lazily, resolving symbols only when they are used (e.g: when a function is called for the first time).
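A small sketch of the three kinds of linkage mentioned above may help; the names call_count and next_ticket are invented for the example.

```c
#include <assert.h>

/* Internal linkage: visible only inside this translation unit, so the
   compiler alone can verify that it is defined. */
static int call_count = 0;

/* External linkage: the name is exported to the linker and may be
   referenced from other object files. */
int next_ticket(void)
{
    int step = 1;            /* no linkage: an ordinary local variable */
    call_count += step;
    assert(call_count > 0);
    return call_count;
}
```

A typo in call_count or step is caught by the compiler. A typo in a call to next_ticket made from another file would only surface at link time, assuming that file had a matching declaration in scope.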

How to get all symbol conflict from 2 static libs in VC8

Say I have 2 static libs
ex1.a
ex2.a
Both libs define the same 10 functions.
When compiling a sample test program, say "test.c", I link with both static libs ex1.a and ex2.a.
In "test.c" I call only 3 functions, and then I get the
linker error "same symbols defined in both ex1.a and ex2.a libraries". This is OK.
My questions are:
1. Why does this error list only 3 functions as multiply defined? Why does it not list all 10 functions?
2. In VC8, how can I list all multiply defined symbols without actually calling those functions in the test code?
Thanks,
That's because the linker tries to resolve a symbol name only when it links code that contains a call to that function. Only when the code contains function calls does the linker try to resolve them, either in the test code or in the libraries linked with it, and that is when it finds multiple definitions. If no function is called, then I guess there is no problem.
What you are experiencing is the optimizing part of the linker: by default it won't include code that isn't referenced. The compiler creates multiple object files, most likely with unresolved dependencies (calls that couldn't be satisfied by the code included). So the linker takes all the object files passed to it and tries to find solutions for the unresolved dependencies. If it fails, it checks the available library files. If there are multiple options with the exact same name/signature, it starts complaining because it can't decide which one to pick (for identical code this won't matter, but imagine different implementations doing different "behind the scenes" work on memory, such as debug and release builds).
The only (and possibly easiest) way I can think of to detect all these multiple definitions is to create another static library project that includes all the source files used in both static libs. When creating a library, the linker includes everything called or exported; you don't need specific code calling the functions for the linker to see and include everything, as long as it's exported.
However I still don't understand what you're actually trying to accomplish as a whole. Trying to find code shared between two libraries?
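Another way to see every clash, sketched here under assumptions, is to compare the two libraries' symbol lists directly instead of relying on the linker. The function report_common_symbols is hypothetical, and the name lists are assumed to come from a symbol-dumping tool such as dumpbin or nm.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Print and count every name that appears in both symbol lists. */
int report_common_symbols(const char *const *a, int na,
                          const char *const *b, int nb)
{
    assert(a != NULL && b != NULL);
    int common = 0;
    for (int i = 0; i < na; i++) {
        for (int j = 0; j < nb; j++) {
            if (strcmp(a[i], b[j]) == 0) {
                printf("defined in both: %s\n", a[i]);
                common++;
                break;          /* count each name from the first list once */
            }
        }
    }
    return common;
}
```

This quadratic comparison is fine for a handful of symbols; for large libraries, sorting both lists first would be the more sensible design.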

Can GCC not complain about undefined references?

Under what situation is it possible for GCC to not throw an "undefined reference" link error message when trying to call made-up functions?
For example, a situation in which this C code is compiled and linked by GCC:
void function()
{
made_up_function_name();
return;
}
...even though made_up_function_name is not present anywhere in the code (not headers, source files, declarations, nor any third party library).
Can that kind of code be accepted and compiled by GCC under certain conditions, without touching the actual code? If so, which?
Thanks.
EDIT: no previous declarations or mentions to made_up_function_name are present anywhere else. Meaning that a grep -R of the whole filesystem will only show that exact single line of code.
Yes, it is possible to avoid reporting undefined references by using the --unresolved-symbols linker option.
g++ mm.cpp -Wl,--unresolved-symbols=ignore-in-object-files
From man ld
--unresolved-symbols=method
Determine how to handle unresolved symbols. There are four
possible values for method:
ignore-all
Do not report any unresolved symbols.
report-all
Report all unresolved symbols. This is the default.
ignore-in-object-files
Report unresolved symbols that are contained in shared
libraries, but ignore them if they come from regular object
files.
ignore-in-shared-libs
Report unresolved symbols that come from regular object
files, but ignore them if they come from shared libraries. This
can be useful when creating a dynamic binary and it is known
that all the shared libraries that it should be referencing
are included on the linker's command line.
The behaviour for shared libraries on their own can also be
controlled by the --[no-]allow-shlib-undefined option.
Normally the linker will generate an error message for each
reported unresolved symbol but the option --warn-unresolved-symbols can
change this to a warning.
TL;DR: It can be made not to complain, but you don't want that. Your code will crash if you force the linker to ignore the problem. It'd be counterproductive.
Your code relies on the ancient C (pre-C99) allowing functions to be implicitly declared at their point of use. Your code is semantically equivalent to the following code:
void function()
{
int made_up_function_name(); // The implicit declaration (returns int, parameters unspecified)
made_up_function_name(); // Call the function
return;
}
The linker rightfully complains that the object file that contains the compiled function() refers to a symbol that wasn't found anywhere else. You have to fix it by providing the implementation for made_up_function_name() or by removing the nonsensical call. That's all there's to it. No linker-fiddling involved.
If you declare the prototype of the function before using it, it should compile. The error while linking will remain, however.
void made_up_function_name();
void function()
{
made_up_function_name();
return;
}
When you build with the linker flag -r or --relocatable, it will also not produce any "undefined reference" link error messages.
This is because -r combines the input objects into a new object file that is meant to be linked at a later stage.
And then there is this nastiness with the -D flag passed to GCC.
$cat undefined.c
void function()
{
made_up_function_name();
return;
}
int main(){
}
$gcc undefined.c -Dmade_up_function_name=atexit
$
Just imagine looking for the definition of made_up_function_name- it appears nowhere yet "does things" in the code.
I can't think of a nice reason to do this exact thing in code.
The -D flag is a powerful tool for changing code at compile time.
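To show what the -D trick does to the source, here is a sketch that writes the equivalent #define in the file itself. The target real_handler and the counter handler_calls are hypothetical names for this example; the answer above used atexit as the target instead.

```c
#include <assert.h>

/* Same effect as passing -Dmade_up_function_name=real_handler on the
   command line; real_handler is a name invented for this sketch. */
#ifndef made_up_function_name
#define made_up_function_name real_handler
#endif

int handler_calls = 0;

void real_handler(void)
{
    handler_calls++;
}

void function(void)
{
    made_up_function_name();   /* the preprocessor rewrites this to real_handler() */
    assert(handler_calls > 0);
}
```

After preprocessing, the object file only ever refers to real_handler, so the linker never sees the made-up name at all.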
If function() is never called, it might not be included in the executable, and the function called from it is not searched for either.
The "standard" algorithm according to which POSIX linkers operate leaves open the possibility that the code will compile and link without any errors. See here for details: https://stackoverflow.com/a/11894098/187690
In order to exploit that possibility, the object file that contains your function (let's call it f.o) should be placed into a library. That library should be mentioned on the command line of the compiler (and/or linker), but by that point no other object file (mentioned earlier on the command line) should have made any calls to function or to any other function present in f.o. Under such circumstances the linker will see no reason to retrieve f.o from the library. The linker will completely ignore f.o, completely ignore function and, therefore, remain completely oblivious to the call to made_up_function_name. The code will compile and link even though made_up_function_name is not defined anywhere.

Resources