Yes, I actually want to get that error. I am using MSVC (from the command prompt). I would like to have a .lib that requires the program linking against it to define an external symbol. I must be misunderstanding something about static linking, because my approach seems legitimate to me:
I have a file that looks roughly like this:
extern INFO_BLOCK user_setup;
int
crtInit()
{
SetInfoBlock(&user_setup);
return 0;
}
When I try to use the .obj of this file in compilation with the main module,
cl main.c file.obj reports unresolved externals. That is the desired behaviour.
Nevertheless, once I pack file.obj into a library with lib file.obj, even using /include:user_data (which frankly I don't trust to be of any use in this case),
linking against the .lib with cl main.c /link file.lib does not report missing externals, and that is the problem. I need the programmer to define that symbol. Does extern get stripped away once you put your .obj in a .lib? Where am I wrong?
If main.c does not contain any reference to crtInit there is no reason for the linker to pull that function into the generated binary - Thus it will not "see" the unresolved reference to user_setup at all.
When you mention an object file to the linker, you force it to include the object file into the binary, regardless of whether it is needed by the program or not.
In contrast, when you mention a library to the linker, it will only use the library to resolve references that are still unresolved at that point, by pulling in the object files from the library that define them. If nothing is unresolved at that point (or nothing unresolved is satisfied by a symbol in the library), the linker will not use anything from the library at all.
The above is also the reason why many linkers are a bit picky about the order of libraries when linking (typically from specific to generic - or "user" to "system"), because linkers are normally single pass and will only pull in what they "see" needed at this specific point in the linking process.
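The difference between naming an object file and naming a library can be made concrete with a toy model. This is only a sketch, not how link.exe or ld is implemented: the names toy_object, toy_link, toy_add_object and toy_add_archive are invented for illustration, and each toy object defines exactly one symbol and references at most one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 16

/* Toy object file: defines one symbol, references at most one. */
struct toy_object {
    const char *defines;
    const char *needs;          /* NULL if it references nothing */
};

/* Toy link state: symbols defined so far and symbols still unresolved. */
struct toy_link {
    const char *defined[MAX_SYMS];
    size_t n_defined;
    const char *undefined[MAX_SYMS];
    size_t n_undefined;
};

static int has(const char *const *set, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i], name) == 0)
            return 1;
    return 0;
}

static void drop_undefined(struct toy_link *l, const char *name)
{
    for (size_t i = 0; i < l->n_undefined; i++) {
        if (strcmp(l->undefined[i], name) == 0) {
            l->undefined[i] = l->undefined[--l->n_undefined];
            return;
        }
    }
}

/* An object named on the command line is always linked in. */
void toy_add_object(struct toy_link *l, const struct toy_object *o)
{
    assert(l->n_defined < MAX_SYMS && l->n_undefined < MAX_SYMS);
    l->defined[l->n_defined++] = o->defines;
    drop_undefined(l, o->defines);
    if (o->needs != NULL &&
        !has(l->defined, l->n_defined, o->needs) &&
        !has(l->undefined, l->n_undefined, o->needs))
        l->undefined[l->n_undefined++] = o->needs;
}

/* A library member is linked in only if it defines a symbol that is
   unresolved at this point. Returns the number of members pulled in. */
size_t toy_add_archive(struct toy_link *l,
                       const struct toy_object *members, size_t n)
{
    size_t pulled = 0;
    int progress = 1;
    while (progress) {
        progress = 0;
        for (size_t i = 0; i < n; i++) {
            if (has(l->undefined, l->n_undefined, members[i].defines)) {
                toy_add_object(l, &members[i]);
                pulled++;
                progress = 1;
            }
        }
    }
    return pulled;
}
```

In this model, adding a main object and then the crtInit object with toy_add_object leaves user_setup unresolved, while offering the same crtInit object through toy_add_archive pulls in nothing and leaves nothing unresolved. If the main object references crtInit, the member is pulled in and user_setup shows up as unresolved again, which is the behaviour the question is after.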
I have a .so library and while building it I didn't get any undefined reference errors.
But now I am building an executable using the .so file and I can see the undefined reference errors during the linking stage as shown below:
xy.so: undefined reference to `MICRO_TO_NANO_ULL'
I referred to this and this but couldn't really understand the dynamic linking.
Also, reading from here led to more confusion:
Dynamic linking is accomplished by placing the name of a sharable
library in the executable image. Actual linking with the library
routines does not occur until the image is run, when both the
executable and the library are placed in memory. An advantage of
dynamic linking is that multiple programs can share a single copy of
the library.
My questions are:
Doesn't dynamic linking mean that when I start the executable using
./executable_name, if the linker is not able to locate the .so
file on which the executable depends, it should crash?
What actually is dynamic linking if all external entity references are
resolved while building? Is it some sort of pre-check performed by dynamic linker? Else
dynamic linker can make use of
LD_LIBRARY_PATH to get additional libraries to resolve the undefined
symbols.
Doesn't dynamic linking mean that when I start the executable using ./executable_name, if the linker is not able to locate the .so file on which the executable depends, it should crash?
No, the linker will exit with a "No such file or directory" message.
Imagine it like this:
Your executable stores somewhere a list of shared libraries it needs.
Think of the (dynamic) linker as a normal program.
The linker opens your executable.
The linker reads this list. For each file:
It tries to find this file in the linker paths.
If it finds the file, it "loads" it.
If it can't find the file, the open() call fails with errno set to ENOENT ("No such file or directory"). The linker then prints a message saying that it can't find the library and terminates your executable.
While the executable runs, the linker dynamically searches for symbols in the loaded shared libraries.
When it can't find a symbol, it prints a message and the executable terminates.
You can for example set LD_DEBUG=all to inspect what linker is doing. You can also inspect your executable under strace to see all the open calls.
What actually is dynamic linking if all external entity references are resolved while
building?
Dynamic linking means that when you run the executable, the dynamic linker loads each shared library it needs.
When building, the link step is kind enough to check for you that all symbols you use in your program exist in the shared libraries. This is just for safety. You can disable this check, for example with the linker option --unresolved-symbols=ignore-in-shared-libs.
Is it some sort of pre-check performed by dynamic linker?
Yes.
Else dynamic linker can make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
LD_LIBRARY_PATH is just a colon-separated list of paths to search for the shared library. Paths in LD_LIBRARY_PATH are simply processed before the standard paths. That's all. It doesn't get "additional libraries", it gets additional paths to search for the libraries; the libraries stay the same.
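To illustrate the "additional paths, same libraries" point, here is a rough sketch of a search over a colon-separated directory list. It is not the dynamic linker's actual code: find_in_path_list is a hypothetical helper, empty entries are simply skipped here, and the real search also consults other locations such as the paths recorded in the executable and the system defaults.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Search each directory in a colon-separated list for `name`.
   On success, write the full path to `out` and return 1.
   If no directory contains the file, set `out` to "" and return 0. */
int find_in_path_list(const char *path_list, const char *name,
                      char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    const char *p = path_list;
    while (p != NULL && *p != '\0') {
        const char *colon = strchr(p, ':');
        size_t len = colon ? (size_t)(colon - p) : strlen(p);
        if (len > 0) {
            /* Build "<dir>/<name>" and check whether it exists. */
            int n = snprintf(out, out_size, "%.*s/%s", (int)len, p, name);
            if (n > 0 && (size_t)n < out_size && access(out, F_OK) == 0)
                return 1;
        }
        p = colon ? colon + 1 : NULL;
    }
    out[0] = '\0';
    return 0;
}
```

The library name being looked up never changes; only the list of directories does, which is all that setting LD_LIBRARY_PATH influences.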
It looks like there is a #define missing when you compile your shared library. This error
xy.so: undefined reference to `MICRO_TO_NANO_ULL'
means, that something like
#define MICRO_TO_NANO_ULL(usec) ((unsigned long long)(usec) * 1000)
should be present, but is not.
The compiler assumes then, that it is an external function and creates an (undefined) symbol for it, while it should be resolved at compile time by a preprocessor macro.
If you include the correct file (grep for the macro name) or put an appropriate definition at the top of your source file, then the linker error should vanish.
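As a sketch of what the fixed source could look like: the macro body below is a guess based on the macro's name (the real definition lives in whatever header the project provides), and timeout_ns is a hypothetical caller added only to show the macro being expanded by the preprocessor instead of being treated as an external function.

```c
#include <assert.h>
#include <limits.h>

/* Assumed definition; the argument is parenthesized so that
   expressions such as MICRO_TO_NANO_ULL(a + b) expand correctly. */
#define MICRO_TO_NANO_ULL(usec) ((unsigned long long)(usec) * 1000ULL)

/* Hypothetical caller: with the macro visible, no undefined symbol
   named MICRO_TO_NANO_ULL is emitted into the object file. */
unsigned long long timeout_ns(unsigned long long timeout_us)
{
    assert(timeout_us <= ULLONG_MAX / 1000ULL);  /* guard against overflow */
    return MICRO_TO_NANO_ULL(timeout_us);
}
```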
Doesn't dynamic linking mean that when I start the executable using ./executable_name, if the linker is not able to locate the .so file on which the executable depends, it should crash?
Yes. If the .so file is not present at run-time.
What actually is dynamic linking if all external entity references are resolved while building? Is it some sort of pre-check performed by dynamic linker? Else dynamic linker can make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
It allows for libraries to be upgraded and have applications still be able to use the library, and it reduces memory usage by loading one copy of the library instead of one in every application that uses it.
The linker just creates references to these symbols so that the underlying variables or functions can be used later. It does not link the variables and functions directly into the executable.
The dynamic linker does not pull in any libraries unless those libraries are specified in the executable (or by extension any library the executable depends on). If you provide an LD_LIBRARY_PATH directory with a .so file of an entirely different version than what the executable requires the executable can crash.
In your case, it seems that a required macro definition has not been found and the compiler is using implicit declaration rules. You can catch this at compile time by compiling your code with -pedantic -pedantic-errors (assuming you're using GCC), which should turn the implicit declaration into an error in C99 or later modes.
Doesn't dynamic linking mean that when I start the executable using
./executable_name, if the linker is not able to locate the .so file
on which the executable depends, it should crash?
It will crash. The time of crash does depend on the way you call a certain exported function from the .so file.
You might retrieve all exported functions via function pointers yourself by using dlopen, dlsym and co. In this case the program will crash at the first call if it does not find the exported method.
If the executable just calls an exported method from a shared object (declared in its header), the dynamic linker uses the information about the method to be called that is stored in the executable (see the second answer) and crashes if it does not find the library or if the symbols do not match.
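The first variant, retrieving a function by hand, could look roughly like the sketch below. call_from_library is a hypothetical helper, and it assumes the looked-up symbol is a function of type int(void). Unlike a program that skips the checks and calls through a null pointer, this version reports the failure. On older glibc versions it may need to be linked with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Load `lib_path`, look up `symbol` (assumed to be an int(void)
   function), call it and store its return value in *result.
   Returns 0 on success, -1 if the library or the symbol is missing. */
int call_from_library(const char *lib_path, const char *symbol, int *result)
{
    assert(result != NULL);

    void *handle = dlopen(lib_path, RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlerror();                              /* clear any stale error */
    int (*fn)(void);
    *(void **)(&fn) = dlsym(handle, symbol);
    const char *err = dlerror();
    if (err != NULL) {
        fprintf(stderr, "dlsym failed: %s\n", err);
        dlclose(handle);
        return -1;
    }

    *result = fn();
    dlclose(handle);
    return 0;
}
```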
What actually is dynamic linking if all external entity references are resolved while building? Is it some sort of pre-check performed by dynamic linker? Else dynamic linker can make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
You need to differentiate between the actual linking and the dynamic linking. Starting off with the actual linking:
In case of linking a static library, the actual linking will copy all code from the method to be called inside the executable/library using it.
When linking a dynamic library you will not copy code but symbols. The symbols contain offsets or other information pointing to the actual code in the dynamic library. If the executable invokes a method which is not exported by the dynamic library, though, it will already fail at the actual linking step.
Now, when starting your executable, the OS will at some point try to load the shared object in which the code actually resides into memory. If it does not find it, or if it is incompatible (i.e. the executable was linked against a library with different exports), it might still fail at runtime.
Let's assume I have three source files: main.c, a.c and b.c. main.c calls some (not all) of the functions defined in a.c. None of the functions defined in b.c are called by main.c. The main function is in main.c. Then we have a makefile that compiles all the source files (main.c, a.c and b.c) and links them to produce an executable file, in my case an Intel HEX file.
My question is: does the linker know in which file the main function resides, and does it use that to determine which parts of the object files to link together? I mean, if the linker produces the executable based only on the recipe of the rule that makes the target, then no matter how many functions our application code calls, the size of the executable will be the same, because the recipe says to link all the object files. For example, we compile the three source files and get three object files: main.o, a.o and b.o (the bigger the object files are, the bigger the executable is).
I know you would say that if I don't want anything from b.c, I should not include it in the build. But that means that every time I want to change the application (include/exclude modules) I need to change the makefile too. Another thing: how does the linker know which part of an object file to take? Does it understand the C language? I hope you understand my question; excuse my bad English.
1) Does the linker know in which file the main function resides and knowing that to determine what part of the object files to link together?
Your toolchain (compiler/linker) may have options to enable this kind of optimization, that is, removing unused functions from the link, but I have serious doubts that it works for global functions (it could be possible for static functions).
2) And another thing is how the linker knows what part of the object file to take, does it understand the C language?
Linker may detect if a function or variable is not used by the application (once again, check the available options), but it is not really the objective of this tool. However if you compile/link some functions as library functions (see options), you can generate a "library" file and then link this library with other object files. The functions of the library will then be included by the linker ONLY if they are used.
What I suggest: use compilation flags (#ifdef...) to include or exclude parts of code from compilation/link.
If you want only those functions in the executable that are eventually called from main, use a library of object files.
Basically the smallest unit the linker will extract from a library is the object file. Whatever symbols are in that object file will also be resolved, until all symbols are resolved.
In other words, if none of the symbols in an object file are needed, it won't end up in the result. If at least one symbol is needed, it will get linked in its entirety.
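The "all or nothing per object file" rule can be sketched with a small model. The names toy_member and toy_linked_symbol_count are invented for illustration, and the model ignores references that the pulled-in members make themselves:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_MEMBER_SYMS 4

/* Toy library member: an object file and the symbols it defines. */
struct toy_member {
    const char *file;
    const char *symbols[MAX_MEMBER_SYMS];
    size_t n_symbols;
};

static int member_defines(const struct toy_member *m, const char *name)
{
    for (size_t i = 0; i < m->n_symbols; i++)
        if (strcmp(m->symbols[i], name) == 0)
            return 1;
    return 0;
}

/* Count the symbols that end up in the output when the program
   references the names in `needed`. */
size_t toy_linked_symbol_count(const struct toy_member *lib, size_t n_members,
                               const char *const *needed, size_t n_needed)
{
    size_t total = 0;
    for (size_t i = 0; i < n_members; i++) {
        assert(lib[i].n_symbols <= MAX_MEMBER_SYMS);
        for (size_t j = 0; j < n_needed; j++) {
            if (member_defines(&lib[i], needed[j])) {
                total += lib[i].n_symbols;  /* the whole member comes along */
                break;
            }
        }
    }
    return total;
}
```

With a member a.o defining three functions and a member b.o defining two, referencing a single function from a.o brings in all three of its symbols and none from b.o. This is why splitting a library into small object files keeps the executable small.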
No, the linker does not understand C. Note that a lot of language compilers create object files (C++, FORTRAN, ..., and assemblers). A linker resolves symbols, which are names attached to values.
John Levine has written a book, "Linkers and Loaders", available on the 'net, which will give you an in-depth understanding of linkers, symbols, and object files.
I recently ran nm -m -p -g on the System.B.dylib library from the iOS SDK4.3 and was surprised to find a lot of symbols marked (undefined) (external). Why and when would an undefined symbol be marked external? I can understand an undefined external symbol marked lazy or weak, but these aren't. Many of the pthread_xxx functions fall in this category. When I link with this library, however, all symbols are resolved. The pthread_xxx symbols are defined in one of the libraries in the /usr/lib/system folder, so I assume they are satisfied from there. How does that work during linking?
It's been a while since I was an nm and ld C-coding ninja, but I think this only means that there are other libraries this one links against.
Usually this is how dynamic linking works. If you were to nm a static archive of System.B, you would not observe this behavior. System.B.dylib on its own would not do much unless it is part of an ensemble of dynamic and static libraries whose functions it makes use of. If you now try to build your final binary BUT omit the library path '/usr/lib/system', then your linker will cry foul and exit with an error telling you that it cannot find a reference to pthread_XXX() (using your example above). During the final assembly of the binary, it needs to make sure it knows the location of each and every function used.
HTH
Say I have 2 static libs
ex1.a
ex2.a
In both libs I define the same 10 functions.
When compiling a sample test program, say "test.c", I link with both static libs ex1.a and ex2.a.
In "test.c" I call only 3 functions, and then I get the
linker error "same symbols defined in both ex1.a and ex2.a libraries". This is OK.
My Question here is :
1. Why does this error list only 3 functions as multiply defined? Why doesn't it list all 10 functions?
In VC8, how can I list all multiply defined symbols without actually calling those functions in the test code?
Thanks,
That's because the linker only tries to resolve a symbol name when it links code that contains a call to that function. Only when the code has some function calls would the linker try to resolve them in either the test code or the linked libraries, and that's when it would find multiple definitions. If no function is called, then I guess there is no problem.
What you are experiencing is the optimizing part of the linker: by default it won't include code that isn't referenced. The compiler will create multiple object files, most likely with unresolved dependencies (calls that couldn't be satisfied by the code included). So the linker takes all object files passed to it and tries to find solutions for the unresolved dependencies. If it fails, it will check the available library files. If there are multiple options with the exact same name/signature, it will start complaining because it won't be able to decide which one to pick (for identical code this won't matter, but imagine different implementations using different "behind the scenes" work on memory, such as debug and release builds).
The only (and possibly easiest way) I could think of to detect all these multiple definitions would be creating another static library project including all source files used in both static libs. When creating a library the linker will include everything called or exported - you won't need specific code calling the stuff for the linker to see/include everything as long as it's exported.
However I still don't understand what you're actually trying to accomplish as a whole. Trying to find code shared between two libraries?
Under what situation is it possible for GCC to not throw an "undefined reference" link error message when trying to call made-up functions?
For example, a situation in which this C code is compiled and linked by GCC:
void function()
{
made_up_function_name();
return;
}
...even though made_up_function_name is not present anywhere in the code (not headers, source files, declarations, nor any third party library).
Can that kind of code be accepted and compiled by GCC under certain conditions, without touching the actual code? If so, which?
Thanks.
EDIT: no previous declarations or mentions to made_up_function_name are present anywhere else. Meaning that a grep -R of the whole filesystem will only show that exact single line of code.
Yes, it is possible to avoid reporting undefined references - using --unresolved-symbols linker option.
g++ mm.cpp -Wl,--unresolved-symbols=ignore-in-object-files
From man ld
--unresolved-symbols=method
Determine how to handle unresolved symbols. There are four
possible values for method:
ignore-all
Do not report any unresolved symbols.
report-all
Report all unresolved symbols. This is the default.
ignore-in-object-files
Report unresolved symbols that are contained in shared
libraries, but ignore them if they come from regular object
files.
ignore-in-shared-libs
Report unresolved symbols that come from regular object
files, but ignore them if they come from shared libraries. This
can be useful when creating a dynamic binary and it is known
that all the shared libraries that it should be referencing
are included on the linker's command line.
The behaviour for shared libraries on their own can also be
controlled by the --[no-]allow-shlib-undefined option.
Normally the linker will generate an error message for each
reported unresolved symbol but the option --warn-unresolved-symbols can
change this to a warning.
TL;DR: The linker can be told not to complain, but you don't want that. Your code will crash if you force the linker to ignore the problem. It'd be counterproductive.
Your code relies on the ancient C (pre-C99) allowing functions to be implicitly declared at their point of use. Your code is semantically equivalent to the following code:
void function()
{
int made_up_function_name(); // The implicit declaration
made_up_function_name(); // Call the function
return;
}
The linker rightfully complains that the object file that contains the compiled function() refers to a symbol that wasn't found anywhere else. You have to fix it by providing the implementation for made_up_function_name() or by removing the nonsensical call. That's all there's to it. No linker-fiddling involved.
If you declare the prototype of the function before using it, it should compile. The link error will remain, however.
void made_up_function_name();
void function()
{
made_up_function_name();
return;
}
When you build with the linker flag -r or --relocatable it will also not produce any "undefined reference" link error messages.
This is because -r will link different objects in a new object file to be linked at a later stage.
And then there is this nastiness with the -D flag passed to GCC.
$cat undefined.c
void function()
{
made_up_function_name();
return;
}
int main(){
}
$gcc undefined.c -Dmade_up_function_name=atexit
$
Just imagine looking for the definition of made_up_function_name: it appears nowhere, yet it "does things" in the code.
I can't think of a nice reason to do this exact thing in code.
The -D flag is a powerful tool for changing code at compile time.
If function() is never called, it might not be included in the executable, and the function called from it is not searched for either.
The "standard" algorithm according to which POSIX linkers operate leaves open the possibility that the code will compile and link without any errors. See here for details: https://stackoverflow.com/a/11894098/187690
In order to exploit that possibility the object file that contains your function (let's call it f.o) should be placed into a library. That library should be mentioned in the command line of the compiler (and/or linker), but by that moment no other object file (mentioned earlier in the command line) should have made any calls to function or any other function present in f.o. Under such circumstances linker will see no reason to retrieve f.o from the library. Linker will completely ignore f.o, completely ignore function and, therefore, remain completely oblivious of the call to made_up_function_name. The code will compile even though made_up_function_name is not defined anywhere.