Why do we need the shared library at compile time? - c

Why do we need the shared library to be present while compiling my executable? My reasoning is that since the shared library is not included in my executable and is loaded at runtime, it should not be needed at compile time. Or am I missing something?
#include <stdio.h>

int addNumbers(int, int); // prototype should be enough, no?

int main(int argc, char* argv[]) {
    int sum = addNumbers(1, 2);
    printf("sum is %d\n", sum);
    return 0;
}
I had libfoo.so in my current directory, but when I renamed it to libfar.so I found that the shared library must be present at build time, or the build fails:
gcc -o main main.c -L. -lfoo gives main.c:(.text+0x28): undefined reference to 'addNumbers'
I think it should be enough to only have the name of the shared library. The shared library itself should not be needed, since it is found via LD_LIBRARY_PATH and loaded dynamically at runtime. Is there something else needed besides the name of the shared lib?

Nothing is needed at compile time, because C has a notion of separate compilation of translation units. But once all the different sources have been compiled, it is time to link everything together. The notion of a shared library is not present in the standard, but it is now a common thing, so here is how a common linker proceeds:
- it looks in all compiled modules for identifiers with external linkage, either defined or only declared
- it looks in libraries (both static and dynamic) for identifiers already used and not defined. It then links in the modules from static libraries and stores references to dynamic libraries. But at least on Unix-likes, it needs to access the shared library for the required (declared and not defined) identifiers, in order to make sure they are already defined there or can be found in other linked libraries, be they static or dynamic.
This produces the executable file. Then at load time, the dynamic loader knows all the dynamic modules that are required, loads them into memory (if they are not already there) along with the actual executable, and builds a (virtual) memory map.
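A quick way to see that split in practice (a sketch, reusing the libfoo.so example from the question): the compilation step alone succeeds without the library; only the link step needs it.
gcc -c main.c                  # compile only: produces main.o, no library needed
gcc -o main main.o -L. -lfoo   # link step: this is the part that fails if libfoo.so is missing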

gcc -o main main.c -L. -lfoo
This command does (at least) two steps: it compiles main.c into an object file and links all resources into an executable main. The error you see comes from the last step, the linker.
The linker is responsible for generating the final executable machine code. It requires the shared library because it needs to generate the machine code that loads the library and calls any functions used from it.
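Once the link succeeds, the executable records a dependency on the library rather than a copy of its code. One way to confirm this (a sketch, assuming an ELF system with binutils installed):
readelf -d main | grep NEEDED   # shows an entry like: (NEEDED) Shared library: [libfoo.so]
ldd main                        # lists the libraries the dynamic loader will search for at run time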

Related

Undefined reference when linking with shared object

I'm investigating the topic of shared libraries. The way I understood it, when linking a source file with a shared library to form an executable, unresolved symbols remain unresolved until their first call, when lazy binding resolves them. Based on that, I assumed that using a function that isn't defined anywhere would not produce a linker error, as the resolving job is left to the dynamic linker. But when I typed the following commands in the terminal:
gcc -c foo.c -fPIC
gcc -shared foo.o -o libfoos.so
gcc main.c -Wl,-rpath=. libfoos.so
I got an "undefined reference to 'foo2'" error.
This was all done with the following files in the same directory:
foo.h:
#ifndef __FOO_H__
#define __FOO_H__
int foo(int num);
#endif /* __FOO_H__ */
main.c:
#include <stdio.h>
#include "foo.h"
int main()
{
    int a = 5;
    printf("%d * %d = %d\n", a, a, foo(a));
    printf("%d + %d = %d\n", a, a, foo2(a));
    return (0);
}
and foo.c:
#include "foo.h"
int foo(int num)
{
    return (num * num);
}
So my questions are:
Is it true that symbols remain unresolved until they are called for the first time? If so, then how come I'm getting an error at linking time?
I'm guessing that maybe some check needs to be made as to the very existence of the symbols (foo and foo2 in my example) in the shared library, already at linking time. If so, then why not resolve them already at the same time, since we're accessing some information in the library anyway?
Thanks!
Is it true that symbols remain unresolved until they are called for the first time?
I think you may be confusing the requirements and semantics of the source language (C) with the execution semantics of dynamic shared object formats and implementations, such as ELF.
The C language does not specify when symbols are resolved, only that there must be a definition for each identifier that is used to access an object or call a function.
Different DSO formats have different properties in and around this. With ELF, for example, resolution of dynamic symbols can be deferred until the symbol is first referenced, or it can be performed immediately upon loading the DSO. This is configurable both at runtime and at compile time. The semantics of other DSO formats may be different in this and other regards.
Bottom line: no, it is not necessarily true that dynamic symbols are resolved only when they are first referenced, but that might be the default for your particular implementation and environment.
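With ELF you can observe both behaviors. A sketch, assuming the foo2 call has been removed so the link succeeds (LD_BIND_NOW and the -z lazy/now flags are standard glibc/GNU ld mechanisms):
gcc main.c -Wl,-rpath=. libfoos.so -Wl,-z,lazy   # defer binding until first call (commonly the default)
gcc main.c -Wl,-rpath=. libfoos.so -Wl,-z,now    # resolve all dynamic symbols at load time
LD_BIND_NOW=1 ./a.out                            # force immediate binding at run time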
If so, then how come I'm getting an error at linking time?
The linker is checking the C language requirements at build time. It is perfectly reasonable and in fact desirable for it to do so even when building shared objects, for if there is an unresolvable symbol used then one would like to know about the problem and fix it before people try to use the program. This is not related to whether dynamic symbol resolution is deferred at runtime.
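That build-time check can even be relaxed if you really want the failure deferred. For example with GNU ld (a sketch; under lazy binding the program would then run until the first foo2 call and fail there):
gcc main.c -Wl,-rpath=. libfoos.so -Wl,--unresolved-symbols=ignore-all
./a.out    # expected to fail at run time with a symbol lookup error for foo2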
I'm guessing that maybe some check needs to be made as to the very existence of the symbols (foo and foo2 in my example) in the shared library, already at linking time.
Yes, that's basically it.
If so, then why not resolve them already at the same time, since we're accessing some information in the library anyway?
How do you know that doesn't happen?
In a DSO system that does not feature symbol relocation, that can be done and is done. The dynamism in such a DSO system is primarily in whether a given library is loaded at all. DSOs in such a system have fixed load addresses and the symbols exported from them also have fixed addresses. This allows executables to be (much) smaller and for system memory to be used (much) more efficiently, relative to statically-linked executables.
But there are big practical problems with such an approach. For example, you have to contend with address-space collisions between different DSOs, updating DSOs is difficult and risky, and having well-known addresses is a security risk. Therefore, most modern DSO systems feature symbol relocation. In such a system, DSOs' load addresses are determined dynamically, at runtime, and typically even the relative offsets represented by their exported symbols are not fixed. This is the kind of DSO system that supports deferred symbol resolution, and with such a system, symbols from other DSOs cannot be resolved at build time because they are not known until run time, and they might even vary from run to run.
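You can see that the executable carries unresolved dynamic symbols rather than fixed addresses (a sketch, again assuming ELF and binutils):
nm -D a.out | grep ' U '    # 'U' marks symbols left undefined, to be resolved by the dynamic linker at run time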

"Undefined reference to" error when .so libraries are involved while building the executable

I have a .so library and while building it I didn't get any undefined reference errors.
But now I am building an executable using the .so file, and I see undefined reference errors during the linking stage, as shown below:
xy.so: undefined reference to `MICRO_TO_NANO_ULL'
I referred to this and this but couldn't really understand dynamic linking. Also, reading from here led to more confusion:
Dynamic linking is accomplished by placing the name of a sharable
library in the executable image. Actual linking with the library
routines does not occur until the image is run, when both the
executable and the library are placed in memory. An advantage of
dynamic linking is that multiple programs can share a single copy of
the library.
My questions are:
Doesn't dynamic linking mean that when I start the executable using ./executable_name, then if the linker is not able to locate the .so file on which the executable depends, it should crash?
What actually is dynamic linking if all external entity references are resolved while building? Is it some sort of pre-check performed by the dynamic linker? Otherwise, the dynamic linker can make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
Doesn't dynamic linking mean that when I start the executable using ./executable_name, then if the linker is not able to locate the .so file on which the executable depends, it should crash?
No, the dynamic linker will exit with a "No such file or directory" message.
Imagine it like this:
- Your executable stores somewhere a list of the shared libraries it needs.
- The dynamic linker (think of it as a normal program) opens your executable and reads this list.
- For each file in the list, it tries to find the file in the linker search paths.
- If it finds the file, it "loads" it.
- If it can't find the file, the open() call fails with errno set to "No such file or directory", and the linker prints a message that it can't find the library and terminates your executable.
- When the executable is running, the linker dynamically searches for each symbol in the loaded shared libraries.
- When it can't find a symbol, it prints a message and the executable terminates.
You can, for example, set LD_DEBUG=all to inspect what the linker is doing. You can also run your executable under strace to see all the open calls.
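For example (a sketch; LD_DEBUG and strace are standard on glibc-based Linux):
LD_DEBUG=libs ./executable_name                  # trace the library search and load decisions
strace -e trace=open,openat ./executable_name    # watch the loader probing each candidate path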
What actually is dynamic linking if all external entity references are resolved while building?
Dynamic linking means that when you run the executable, the linker loads each shared library.
When building, your compiler is kind enough to check for you that all the symbols you use in your program exist in the shared libraries. This is just for safety. You can disable this check with, for example, --unresolved-symbols=ignore-in-shared-libs.
Is it some sort of pre-check performed by dynamic linker?
Yes.
Else dynamic linker can make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
LD_LIBRARY_PATH is just a colon-separated list of paths to search for shared libraries. Paths in LD_LIBRARY_PATH are simply searched before the standard paths. That's all. It doesn't get "additional libraries"; it gets additional paths to search for the libraries - the libraries stay the same.
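For example (hypothetical paths, just to illustrate the format):
LD_LIBRARY_PATH=/opt/mylibs:. ./executable_name   # search /opt/mylibs and the current directory before the standard paths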
It looks like there is a #define missing when you compile your shared library. This error
xy.so: undefined reference to `MICRO_TO_NANO_ULL'
means, that something like
#define MICRO_TO_NANO_ULL(sec) ((unsigned long long)sec * 1000)
should be present, but is not.
The compiler then assumes that it is an external function and creates an (undefined) symbol for it, while it should have been resolved at compile time by a preprocessor macro.
If you include the correct file (grep for the macro name) or put an appropriate definition at the top of your source file, then the linker error should vanish.
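A minimal reproduction of that failure mode (a hypothetical file, just for illustration):
/* no header included, so the preprocessor never sees MICRO_TO_NANO_ULL */
unsigned long long to_nanos(unsigned long long usec)
{
    /* Under implicit-declaration rules the compiler treats this as a call
       to an external function named MICRO_TO_NANO_ULL, leaving an
       undefined symbol in the object file for the linker to complain about. */
    return MICRO_TO_NANO_ULL(usec);
}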
Doesn't dynamic linking mean that when I start the executable using ./executable_name, then if the linker is not able to locate the .so file on which the executable depends, it should crash?
Yes. If the .so file is not present at run-time.
What actually is dynamic linking if all external entity references are resolved while building? Is it some sort of pre-check performed by the dynamic linker? Otherwise, the dynamic linker can make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
It allows for libraries to be upgraded and have applications still be able to use the library, and it reduces memory usage by loading one copy of the library instead of one in every application that uses it.
The linker just creates references to these symbols so that the underlying variables or functions can be used later. It does not link the variables and functions directly into the executable.
The dynamic linker does not pull in any libraries unless those libraries are specified in the executable (or, by extension, in any library the executable depends on). If you provide an LD_LIBRARY_PATH directory with a .so file of an entirely different version than what the executable requires, the executable can crash.
In your case, it seems as if a required macro definition has not been found and the compiler is falling back on implicit declaration rules. You can catch this at compile time by compiling your code with -pedantic -pedantic-errors (assuming you're using GCC).
Doesn't dynamic linking mean that when I start the executable using ./executable_name, then if the linker is not able to locate the .so file on which the executable depends, it should crash?
It will crash. The time of the crash depends on the way you call a certain exported function from the .so file.
You might retrieve all exported functions via function pointers yourself by using dlopen, dlsym and co. In this case the program will crash at the first call if it does not find the exported method.
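A minimal sketch of that pattern (assuming a libfoo.so exporting addNumbers, as in the first question; link with -ldl on older glibc):
#include <stdio.h>
#include <dlfcn.h>

int main(void) {
    /* Load the library at run time instead of recording it at link time. */
    void *handle = dlopen("./libfoo.so", RTLD_LAZY);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }
    /* Look up the symbol by name; a missing export shows up here. */
    int (*add)(int, int) = (int (*)(int, int))dlsym(handle, "addNumbers");
    if (!add) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        dlclose(handle);
        return 1;
    }
    printf("sum is %d\n", add(1, 2));
    dlclose(handle);
    return 0;
}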
In the case of the executable just calling an exported method from a shared object (declared in its header), the dynamic linker uses the information about the method to be called that is stored in the executable (see the second answer) and crashes if it does not find the lib or if there is a mismatch in symbols.
What actually is dynamic linking if all external entity references are resolved while building? Is it some sort of pre-check performed by dynamic linker? Else dynamic linker can make use of LD_LIBRARY_PATH to get additional libraries to resolve the undefined symbols.
You need to differentiate between the actual linking and the dynamic linking. Starting off with the actual linking:
In the case of linking a static library, the actual linking will copy all the code of the methods to be called into the executable/library using it.
When linking a dynamic library you will not copy code but symbols. The symbols contain offsets or other information pointing to the actual code in the dynamic library. If the executable invokes a method which is not exported by the dynamic library, though, it will already fail at the actual linking step.
Now when starting your executable, the OS will at some point try to load the shared object into memory, where the code actually resides. If it does not find it, or if it is incompatible (i.e. the executable was linked against a library with different exports), it may still fail at runtime.

Which functions are included in executable file compiled by gcc with "-static" on? And which functions are not?

When a C program is compiled with GCC's -static option on, the final executable file would include tons of C's standard functions. For example, my whole program is like
#include <stdio.h>

int main(int argc, char *argv[]) {
    printf("Hello, world!\n");
    return 0;
}
I checked the compiled executable file, and functions like strcmp(), mktime(), realloc(), etc. are included in it, even though my program never calls them. However, some functions in stdlib.h are missing, like rand(), system(), etc. My experiment environment: Ubuntu 14.04 (with Linux kernel 3.13.0); GCC 4.8.2. I would like to know which C functions are included in the executable file when -static is turned on.
Static linking means that ALL the libraries your program needs are linked into and included in your executable at compile time. In other words, your program will be larger, but it will be very independent (portable), as the executable will contain all the libraries it needs to run.
This means that with -static you will have ALL the functions defined in your included libraries. You didn't write the include declarations, but printf() alone already pulls in a large amount of library code.
In other words, we cannot tell you which libraries are included in your program when using -static, because it will vary from program to program.
Static libs are archives of object files. Linking them only brings in those archive members that resolve undefined symbol references, and this works recursively (e.g., you may call a(), which calls b(), which calls c()).
If each archive member defined exactly one symbol (e.g., a.o only defines a(), etc.), you'd get only those symbols that were needed (recursively). Practically, an archive member may also define other symbols (e.g., a.o may define a() and a variable), so you'll get the symbols that resolve undefined symbol references, along with any symbols that happen to share an object file with a needed symbol definition.
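You can see exactly which archive members the linker pulled in, and why, by asking GNU ld for a map file (a sketch; hello.c stands in for the question's program, and the map section name is GNU ld's):
gcc -static -o hello hello.c -Wl,-Map=hello.map
grep -A1 'Archive member' hello.map | head    # lists each libc.a member included and the symbol that caused it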

Does the linker refer to the main code

Let's assume I have three source files: main.c, a.c and b.c. main.c calls some of the functions (not all) that are defined in a.c. None of the functions defined in b.c are called (used) by main.c. The main function is in main.c. Then we have a makefile that compiles all the source files (main.c, a.c and b.c) and links them to produce an executable file, in my case an Intel HEX file.
My question is: does the linker know in which file the main function resides, and does it use that knowledge to determine which parts of the object files to link together? I mean, if the linker produces the exe file based only on the recipe of the rule to make the target, then no matter how many functions are called in our application code, the size of the executable will be the same, because the recipe says to link all the object files. For example, we compile the three source files and get three object files: main.o, a.o and b.o (the bigger the object files are, the bigger the exe file is).
I know you would say: if you don't want anything from b.c, then do not include it in the build. But that means that every time I want to change the application (include/exclude modules) I need to change the makefile too. And another thing: how does the linker know which part of an object file to take - does it understand the C language? I hope you understand my question; excuse my bad English.
1) Does the linker know in which file the main function resides, and does it use that to determine which parts of the object files to link together?
Maybe there are options in your toolchain (compiler/linker) to enable this kind of optimization, I mean removing unused functions from the link, but I have big doubts for global functions (it could be possible for static functions).
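With GCC and GNU ld specifically, the usual combination for this (a sketch; these flags exist in mainline GCC/binutils) is to place each function in its own section and let the linker discard the unreferenced ones:
gcc -ffunction-sections -fdata-sections -c main.c a.c b.c
gcc -Wl,--gc-sections -o prog main.o a.o b.o    # unreferenced sections are dropped from the link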
2) And another thing: how does the linker know which part of the object file to take? Does it understand the C language?
The linker may detect that a function or variable is not used by the application (once again, check the available options), but that is not really the objective of this tool. However, if you compile/link some functions as library functions (see options), you can generate a "library" file and then link this library with the other object files. The functions of the library will then be included by the linker ONLY if they are used.
What I suggest: use compilation flags (#ifdef...) to include or exclude parts of code from compilation/link.
If you want only those functions in the executable that are eventually called from main, use a library of object files.
Basically the smallest unit the linker will extract from a library is the object file. Whatever symbols are in that object file will also be resolved, until all symbols are resolved.
In other words, if none of the symbols in an object file are needed, it won't end up in the result. If at least one symbol is needed, it will get linked in its entirety.
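Applied to the files in the question (a sketch; it assumes a.c and b.c each define only their own functions), a "library of object files" looks like this:
gcc -c main.c a.c b.c
ar rcs libmods.a a.o b.o
gcc -o prog main.o -L. -lmods    # b.o is left out because main.o needs nothing from it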
No, the linker does not understand C. Note that a lot of language compilers create object files (C++, FORTRAN, ..., and assemblers). A linker resolves symbols, which are names attached to values.
John Levine has written a book, "Linkers and Loaders", available on the 'net, which will give you an in-depth understanding of linkers, symbols, and object files.

Why create a .a file from .o for static linking?

Consider this code:
one.c:
#include <stdio.h>
int one() {
    printf("one!\n");
    return 1;
}
two.c:
#include <stdio.h>
int two() {
    printf("two!\n");
    return 2;
}
prog.c:
#include <stdio.h>

int one();
int two();

int main(int argc, char *argv[])
{
    one();
    two();
    return 0;
}
I want to link these programs together. So I do this:
gcc -c -o one.o one.c
gcc -c -o two.o two.c
gcc -o a.out prog.c one.o two.o
This works just fine.
Or I could create a static library:
ar rcs libone.a one.o
ar rcs libtwo.a two.o
gcc prog.c libone.a libtwo.a
gcc -L. prog.c -lone -ltwo
So my question is: why would I use the second version - the one where I created ".a" files - rather than linking my ".o" files? They both seem to be statically linking, so is there an advantage or architectural difference in one vs. the other?
Typically libraries are collections of object files that can be used in multiple programs.
In your example there is no advantage, but you might have done:
ar rcs liboneandtwo.a one.o two.o
Then linking your program becomes simpler:
gcc -L. prog.c -loneandtwo
It's really a matter of packaging. Do you have a set of object files that naturally form a set of related functionality that can be reused in multiple programs? If so, then they can sensibly be archived into a static library, otherwise there probably isn't any advantage.
There is one important difference in the final link step. Any object files that you linked will be included in the final program. Object files that are in libraries are only included if they help resolve any undefined symbols in other object files. If they don't, they won't be linked into the final executable.
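You can verify this with the example above (a sketch using nm from binutils): if the call to two() is removed from prog.c, two.o is still copied in when named directly on the command line, but skipped when it only sits in an archive:
gcc -o direct prog.c one.o two.o
nm direct | grep ' T two'    # still present: explicitly named .o files are always linked in
gcc -L. -o viaar prog.c -lone -ltwo
nm viaar | grep ' T two'     # no output: the unused archive member was skipped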
The difference would be in the size of the executable, although maybe not for your example.
When linking to a library, only the bits that are used by your executable are incorporated. When linking an object file, you take the whole thing.
For example, if your executable had to include every math function in the math library when you only use one, it would be much bigger than it needed to be and contain a lot of unused code.
It is interesting to contrast this with the dynamic linking model of Windows. There, the OS has to load in their entirety all the DLLs (dynamically linked libraries) that your executable uses, which can lead to bloat in RAM. The advantage of such a model is that your executable is itself smaller, and the linked DLLs may already be in memory, used by some other executable, so they don't need to be loaded again.
In static linking, the library functions are loaded separately for each executable.
Technically, the result is exactly the same. Usually, you create libraries for utility functions, so instead of feeding the linker dozens of object files, you just have to link the library.
BTW, it makes absolutely no sense to create a .a file that contains just one .o file.
You can put a collection of files in an archive (.a) file for later reuse. The standard library is a good example.
Sometimes it makes sense to organize big projects into libraries.
The primary advantage is that when you have to link, you can just specify one library instead of all the separate object files. There's also a minor advantage in managing the files: you deal with one library instead of a bunch of object files. At one time, this also gave significant savings in disk space, but current hard drive prices make that less important.
Whenever I am asked this question (by freshers on my team), "why (or sometimes even a 'what is') a .a?", I use the answer below, which uses the .zip as an analogy.
"A dotAy is like a zip file of all the dotOhs which you would want to link while building your exe/lib. Savings on disk space, plus one need not type the names of all dotOhs involved."
So far, this has seemed to make them understand. ;)
