About the internal logic of C compiler - c

When we build a programe,some symbols are to be resolved at link time(like those in a .lib),
but some can be resolved at run time(those in a .dll),
my doubt is that how does the compiler know about this, or how do we notify the compiler about this?

When you link your code, the compiler searches both static and dynamic libraries for the undefined symbols. If it finds a dynamic symbol exported by a dynamic library, then it defers symbol resolution to runtime; if it finds a static symbol it resolves the symbol right away; and if it doesn't find the symbol at all, it reports an error (unless you're compiling a shared library, in which case it's OK).
You can examine the dynamic symbols exported by a shared library using nm -D.

You must declare a prototype for functions whose bodies are not available at compile time.
You do this by including the appropriate header (.h file) which will contain a definition like so:
int foo(int bar);
Note the lack of a body there.
Often with shared libraries there is also a layer of indirection where a struct containing function pointers is formed. When the library is loaded, it adjusts the function pointers to reference the functions contained in the shared library.

Those that can be resolved at link time are; those that can't are then searched for in shared libraries at run time.

The linker does the job.
For static functions, the linker include the libraries into your excutable. Calls are to fixed positions in memory.
For dynamic libraries, the linker put a runtime "searcher" for the library. Dynamic Libraries publish the list of functions and its relative memory addresses. So, the runtime can fill the list of function pointers to them.
The original code for dynamic functions could be compiled as a call to a function pointer.
[indeed, that's the job of the linker: replace the function calls to its references to produce the executable].

The compiler needs to know the function declaration at compile time. The linker will then link to the declaration at link time to make aa executable.
For dynamically loaded libraries you insert a code to fetch the symbols at runtime using dlopen dlsym and dlclose. Now these function calls search for the symbols and if they are not found in the dynamic libraries they return error. Hence you need to handle this error as well. A dynamic library loading doesnt ensure symbols have been be resolved and linked. It still has to be present when the dynamic library is loaded.
EDIT : Fixed terrible grammar

Related

Undefined reference to error when .so libraries are involved while building the executable

I have a .so library and while building it I didn't get any undefined reference errors.
But now I am building an executable using the .so file and I can see the undefined reference errors during the linking stage as shown below:
xy.so: undefined reference to `MICRO_TO_NANO_ULL'
I referred to this and this but couldn't really understand the dynamic linking.
Also reading from here lead to more confusion:
Dynamic linking is accomplished by placing the name of a sharable
library in the executable image. Actual linking with the library
routines does not occur until the image is run, when both the
executable and the library are placed in memory. An advantage of
dynamic linking is that multiple programs can share a single copy of
the library.
My questions are:
Doesn't dynamic linking means that when I start the executable using
./executable_name then if the linker not able to locate the .so
file on which executable depends it should crash?
What actually is dynamic linking if all external entity references are
resolved while building? Is it some sort of pre-check performed by dynamic linker? Else
dynamic linker can make use of
LD_LIBRARY_PATH to get additional libraries to resolve the undefined
symbols.
Doesn't dynamic linking means that when I start the executable using ./executable_name then if the linker not able to locate the .so file on which executable depends it should crash?
No, linker will exit with "No such file or directory" message.
Imagine it like this:
Your executable stores somewhere a list of shared libraries it needs.
Linker, think of it as a normal program.
Linker opens your executable.
Linker reads this list. For each file.
It tries to find this file in linker paths.
If it finds the file, it "loads" it.
If it can't find the file, it get's errno with No Such file or directory from open() call. And then prints a message that it can't find the library and terminates your executable.
When running the executable, linker dynamically searches for a symbol in shared libraries.
When it can't find a symbol, it prints some message and the executable teerminates.
You can for example set LD_DEBUG=all to inspect what linker is doing. You can also inspect your executable under strace to see all the open calls.
What actually is dynamic linking if all external entity references are resolved while
building?
Dynamic linking is when you run the executable then the linker loads each shared library.
When building, your compiler is kind enough to check for you, that all symbols that you use in your program exist in shared libraries. This is just for safety. You can for example disable this check with ex. --unresolved-symbols=ignore-in-shared-libs.
Is it some sort of pre-check performed by dynamic linker?
Yes.
Else dynamic linker can make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
LD_LIBRARY_PATH is just a comma separated list of paths to search for the shared library. Paths in LD_LIBRARY_PATH are just processed before standard paths. That's all. It doesn't get "additional libraries", it gets additional paths to search for the libraries - libraries stay the same.
It looks like there is a #define missing when you compile your shared library. This error
xy.so: undefined reference to `MICRO_TO_NANO_ULL'
means, that something like
#define MICRO_TO_NANO_ULL(sec) ((unsigned long long)sec * 1000)
should be present, but is not.
The compiler assumes then, that it is an external function and creates an (undefined) symbol for it, while it should be resolved at compile time by a preprocessor macro.
If you include the correct file (grep for the macro name) or put an appropriate definition at the top of your source file, then the linker error should vanish.
Doesn't dynamic linking means that when I start the executable using ./executable_name then if the linker not able to locate the .so file on which executable depends it should crash?
Yes. If the .so file is not present at run-time.
What actually is dynamic linking if all external entity references are resolved while building? Is it some sort of pre-check performed by dynamic linker? Else dynamic linker can make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
It allows for libraries to be upgraded and have applications still be able to use the library, and it reduces memory usage by loading one copy of the library instead of one in every application that uses it.
The linker just creates references to these symbols so that the underlying variables or functions can be used later. It does not link the variables and functions directly into the executable.
The dynamic linker does not pull in any libraries unless those libraries are specified in the executable (or by extension any library the executable depends on). If you provide an LD_LIBRARY_PATH directory with a .so file of an entirely different version than what the executable requires the executable can crash.
In your case, it seems as if a required macro definition has not been found and the compiler is using implicit declaration rules. You can easily fix this by compiling your code with -pedantic -pedantic-errors (assuming you're using GCC).
Doesn't dynamic linking means that when I start the executable using
./executable_name then if the linker not able to locate the .so file
on which executable depends it should crash?
It will crash. The time of crash does depend on the way you call a certain exported function from the .so file.
You might retrieve all exported functions via functions pointers by yourself by using dlopen dlysm and co. In this case the program will crash at first call in case it does not find the exported method.
In case of the executable just calling an exported method from a shared object (part of it's header) the dynamic linker uses the information of the method to be called in it's executable (see second answer) and crashes in case of not finding the lib or a mismatch in symbols.
What actually is dynamic linking if all external entity references are resolved while building? Is it some sort of pre-check performed by dynamic linker? Else dynamic linker can make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
You need to differentiate between the actual linking and the dynamic linking. Starting off with the actual linking:
In case of linking a static library, the actual linking will copy all code from the method to be called inside the executable/library using it.
When linking a dynamic library you will not copy code but symbols. The symbols contain offsets or other information pointing to the acual code in the dynamic library. If the executable does invoke a method which is not exported by the dynamic library though, it will already fail at the actual linking part.
Now when starting your executable, the OS will at some point try to load the shared object into memory where the code actually resides in. If it does not find it or also if it is imcotable (i.e.: the executable was linked to a library using different exports), it might still fail at runtime.

Symbol confusion in dynamic linked library when app code has same symbol [duplicate]

I have two dynamically loadable libraries lib_smtp.so and and libpop.so etc. Both have a global variable named protocol which is initialized to "SMTP" and "POP" respectively. I have another static library libhttp.a where protocol is initialized to "HTTP".
Now for some reason i need to compile all dynamic linkable and loadable libraries statically and include in the executable. Doing so i am getting error "multiple definition of symbol" during linking of static libraries.
I am curious to know how linker resolves duplicate symbols during dynamic linking where all three mentioned libraries are getting linked ?
Is there some way i can do the same statically as linker is doing in dynamic linking ie without any conflict add all static libraries to executable which have same symbols? if not, why the process is different for statically linked libraries.
Dynamic linking in modern Linux and several other operating systems is based on the ELF binary format. The (ELF) dynamic libraries on which an executable or other shared library relies are prioritized. To resolve a given symbol, the dynamic linker checks each library in priority order until it finds one that defines the symbol.
That can be dicey when multiple dynamic objects define the same symbol and also multiple dynamic objects use that symbol. It can then be the case that the symbol is resolved differently in different dynamic objects.
Full details are out of scope for SO, but I don't know a better technical explanation than the one in Ulrich Drepper's paper "How to Write Shared Libraries".
In dynamic linking some facility called "symbol visibility" kicks in. Essentially this allows to expose only certain symbols across the object's (object in the sense of shared object) boundaries. It is good style to compile and link shared objects with symbols being hidden by default and only expose those explicitly that are required by callees.
Symbol visibility is applied during linking and so far only implemented in dynamic linkers. It's certainly possible to also have it in static linkage, Apple's GCC variant implements so called Mach-O relocateable object files which can be statically linked with visibility applied. But I don't know if the vanilla GCC, binutils ld or the gold linker can do this for plain old ELF.

How does linker resolves duplicate symbols in dynamically loadable libraries?

I have two dynamically loadable libraries lib_smtp.so and and libpop.so etc. Both have a global variable named protocol which is initialized to "SMTP" and "POP" respectively. I have another static library libhttp.a where protocol is initialized to "HTTP".
Now for some reason i need to compile all dynamic linkable and loadable libraries statically and include in the executable. Doing so i am getting error "multiple definition of symbol" during linking of static libraries.
I am curious to know how linker resolves duplicate symbols during dynamic linking where all three mentioned libraries are getting linked ?
Is there some way i can do the same statically as linker is doing in dynamic linking ie without any conflict add all static libraries to executable which have same symbols? if not, why the process is different for statically linked libraries.
Dynamic linking in modern Linux and several other operating systems is based on the ELF binary format. The (ELF) dynamic libraries on which an executable or other shared library relies are prioritized. To resolve a given symbol, the dynamic linker checks each library in priority order until it finds one that defines the symbol.
That can be dicey when multiple dynamic objects define the same symbol and also multiple dynamic objects use that symbol. It can then be the case that the symbol is resolved differently in different dynamic objects.
Full details are out of scope for SO, but I don't know a better technical explanation than the one in Ulrich Drepper's paper "How to Write Shared Libraries".
In dynamic linking some facility called "symbol visibility" kicks in. Essentially this allows to expose only certain symbols across the object's (object in the sense of shared object) boundaries. It is good style to compile and link shared objects with symbols being hidden by default and only expose those explicitly that are required by callees.
Symbol visibility is applied during linking and so far only implemented in dynamic linkers. It's certainly possible to also have it in static linkage, Apple's GCC variant implements so called Mach-O relocateable object files which can be statically linked with visibility applied. But I don't know if the vanilla GCC, binutils ld or the gold linker can do this for plain old ELF.

Symbol import in a dynamically loaded library (DSO)

Let's say I am creating shared object library libz.so which includes a header file lets say stdio.h. The stdio.h code which is part of the libc library is linked in statically in the system. How does the dynamic linker resolve the symbol references from DSO to the statically linked libc file?
For example:
Lets say I compile z.c file with following code into a SO.
#include <stdio.h>
int foo(void){
printf("hello world!\n");
}
How would dynamic linker know about the location of printf in the statically linked libc and patch the printf address at run time?
In general, assuming that static library uses ABI compatible with shared libraries (which is not the case on all platforms), then symbol resolution is delegated to dynamic linker.
If your DSO has been linked with libc statically, it may include its own copy of required code from libc and other libraries. That may potentially create issues if DSO is loaded from another executable that uses libc.
If your DSO was linked with libc dynamically, then the DSO would add required shared library on its dynamic dependency list and would leave the symbols unresolved until it's loaded.
Let's assume, that we're dealing with the latter case as it's what typically happens.
When you load DSO, runtime linker recursively loads required shared libraries and for each unresolved symbol tries to figure out its location. If your executable, for instance, has statically linked printf and your DSO wants to us it, dynamic linker would resolve references from DSO to point to printf's location in the executable. If the executable is dynamically linked with libc, printf references would point to the code in libc.so.
You can get a peek at what's going on if you set LD_DEBUG=all in the environment and run the app that loads your DSO (or any dynamically linked executable). You will see what dynamic linker is doing as it happens. The output is verbose, but should give you general idea of what happens under the hood.

Dynamic library (.so) : Link time and run time

What is the main difference between using a .so file a link time or using it at run time (dlopen() etc) ?
What kind of validation is performed when used at link time ?
What is the role of the header file that lists the methods exposed out of .so and used in the target binary ?
How does the address space looks in both the cases ?
At link time the compiler will only check that the symbols are available in the library and will specify which library to link against later (in case you are linking against multiple libraries providing the same symbol)
The header file will only tell the compiler of what function prototypes are available and (depending on the programming language used) how to translate them into symbols. e.g. C++ extern "C".
If you are linking against a library then the linker will create a global address translation table in the executable which will be populated with the symbol addresses at runtime when the library is loaded. If you are opening the library with dlopen then you are responsible of creating variables holding the symbol pointers yourself via dlsym, but it allows you more flexibility like e.g. changing those during runtime, loading plugins or other functions that are unavailable at compile time.

Resources